Anthropic Releases Claude 4.6 Sonnet with 1 Million Token Context to Solve Complex Coding and Search for Developers

Anthropic is officially entering its ‘Thinking’ era. Today, the company announced Claude 4.6 Sonnet, a model designed to transform how devs and data scientists handle complex logic. Alongside this release comes Improved Web Search with Dynamic Filtering, a feature that uses internal code execution to verify facts in real-time.

https://www.anthropic.com/news/claude-sonnet-4-6

Adaptive Thinking: A New Logic Engine

The core update in Claude 4.6 Sonnet is the Adaptive Thinking engine. Accessed via the extended thinking API, this allows the model to ‘pause’ and reason through a problem before generating a final response.

Instead of jumping straight to code, the model creates internal monologues to test logic paths. You can see this in the new Thought interface. For a dev debugging a complex race condition, this means the model identifies the root cause in its ‘thinking’ stage rather than guessing in the code output.

This improves data cleaning tasks. When processing a messy dataset, 4.6 Sonnet spends more compute time analyzing edge cases and schema inconsistencies. This process significantly reduces the ‘hallucinations’ common in faster, non-reasoning models.
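Conceptually, this works like a budgeting decision made before generation: the harder the task looks, the more "thinking" compute is allocated. The sketch below is purely illustrative; the effort levels, budget values, and complexity heuristic are invented for this example and are not Anthropic's actual parameters.

```python
# Illustrative only: hypothetical effort levels mapped to thinking-token budgets.
EFFORT_BUDGETS = {"low": 1_024, "medium": 8_192, "high": 32_768}

def thinking_budget(effort: str, task_complexity: float) -> int:
    """Pick a reasoning budget, scaling down when the task looks simple."""
    base = EFFORT_BUDGETS.get(effort, EFFORT_BUDGETS["medium"])
    # Adaptive behaviour: spend fewer thinking tokens on simpler inputs.
    scale = min(1.0, max(0.1, task_complexity))
    return int(base * scale)

budget = thinking_budget("high", 0.5)
```

The point of the sketch is the shape of the trade-off, not the numbers: more allocated reasoning means fewer guessed answers, at the cost of latency and tokens.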

The Benchmarks: Closing the Gap with Opus

The performance data for 4.6 Sonnet shows it is now breathing down the neck of the flagship Opus model. In many categories, it is the most efficient ‘workhorse’ model currently available.

| Benchmark Category | Claude 3.5 Sonnet | Claude 4.6 Sonnet | Key Improvement |
|---|---|---|---|
| SWE-bench Verified | 49.0% | 79.6% | Optimized for complex bug fixing and multi-file editing. |
| OSWorld (Computer Use) | 14.9% | 72.5% | Massive gain in autonomous UI navigation and tool usage. |
| MATH | 71.1% | 88.0% | Enhanced reasoning for advanced algorithmic logic. |
| BrowseComp (Search) | 33.3% | 46.6% | Improved accuracy via native Python-based dynamic filtering. |

The 72.5% score on OSWorld is a major highlight. It suggests that Claude 4.6 Sonnet can now navigate spreadsheets, web browsers, and local files with near-human accuracy. This makes it a prime candidate for building autonomous ‘Computer Use’ agents.

Search Meets Python: Dynamic Filtering

Anthropic’s Improved Web Search with Dynamic Filtering changes how AI interacts with the live web. Most AI search tools simply scrape the first few results they find.

Claude 4.6 Sonnet takes a different path. It uses a Python code execution sandbox to post-process search results. If you search for a library update from 2025, the model writes and runs code to filter out any results that are older than your specified date. It also filters by Site Authority, prioritizing technical hubs like GitHub, Stack Overflow, and official documentation.

This means fewer outdated code snippets. The model performs a ‘Multi-Step Retrieval.’ It does an initial search, parses the HTML, and applies filters to ensure the ‘Noise-to-Signal’ ratio remains low. This increased search accuracy from 33.3% to 46.6% in internal testing.
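The post-processing step can be pictured as a small filtering script: given raw search hits carrying a date and a domain, drop stale results and rank the remainder by source authority. The field names, hit data, and authority ordering below are illustrative assumptions, not Anthropic's actual implementation.

```python
from datetime import date

# Hypothetical search hits; the field names are illustrative only.
HITS = [
    {"url": "https://stackoverflow.com/q/1", "published": date(2025, 3, 1)},
    {"url": "https://randomblog.example/post", "published": date(2025, 4, 2)},
    {"url": "https://github.com/org/lib/releases", "published": date(2023, 1, 9)},
]

# Assumed authority ranking: technical hubs first, unknown domains last.
AUTHORITY = {"github.com": 0, "stackoverflow.com": 1, "docs.python.org": 2}

def dynamic_filter(hits, min_date):
    """Drop results older than min_date, then sort by site authority."""
    fresh = [h for h in hits if h["published"] >= min_date]
    domain = lambda h: h["url"].split("/")[2]
    return sorted(fresh, key=lambda h: AUTHORITY.get(domain(h), 99))

filtered = dynamic_filter(HITS, date(2025, 1, 1))
```

The 2023 GitHub release notes are dropped for staleness, and of the two remaining 2025 hits, the higher-authority domain is ranked first.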

Scaling and Pricing for Production

Anthropic is positioning 4.6 Sonnet as the primary model for production-grade applications. It now features a 1M token context window in beta. This allows developers to feed an entire repository or a massive technical library into the prompt without losing coherence.

Pricing and Availability:

Input Cost: $3 per 1M tokens.

Output Cost: $15 per 1M tokens.

Platforms: Available on the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI.
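At these rates, per-call cost is simple arithmetic. The sketch below uses only the base prices quoted above; note that providers sometimes bill very long-context requests at a higher rate, which this estimate ignores.

```python
INPUT_PER_MTOK = 3.00    # $ per 1M input tokens
OUTPUT_PER_MTOK = 15.00  # $ per 1M output tokens

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of a single API call at base list prices."""
    return (input_tokens / 1_000_000) * INPUT_PER_MTOK \
         + (output_tokens / 1_000_000) * OUTPUT_PER_MTOK

# A 200K-token prompt with a 4K-token response: $0.60 + $0.06 = $0.66
cost = call_cost(200_000, 4_000)
```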

The model also shows improved adherence to System Prompts. This is critical for devs building agents that require strict JSON formatting or specific ‘persona’ constraints.


Key Takeaways

Adaptive Thinking Engine: Replacing the old binary ‘extended thinking’ mode, Claude 4.6 Sonnet introduces Adaptive Thinking. Using the new effort parameter, the model can dynamically decide how much reasoning is required for a task, optimizing the balance between speed, cost, and intelligence.

Frontier Agentic Performance: The model sets new industry benchmarks for autonomous agents, scoring 79.6% on SWE-bench Verified for coding and 72.5% on OSWorld for computer use. These scores indicate it can now navigate complex software and UI environments with near-human accuracy.

1 Million Token Context Window: Now available in beta, the context window has expanded to 1M tokens. This allows AI devs to ingest entire multi-repo codebases or massive technical archives in a single prompt without the model losing focus or ‘forgetting’ instructions.

Search via Native Code Execution: The new Improved Web Search with Dynamic Filtering allows Claude to write and run Python code to post-process search results. This ensures the model can programmatically filter for the most recent and authoritative sources (like GitHub or official docs) before generating a response.

Production-Ready Efficiency: Claude 4.6 Sonnet maintains a competitive price of $3 per 1M input tokens and $15 per 1M output tokens. Combined with the new Context Compaction API, developers can now build long-running agents that maintain ‘infinite’ conversation history more cost-effectively.

Check out the Technical details here. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter. Wait! are you on telegram? now you can join us on telegram as well.
The post Anthropic Releases Claude 4.6 Sonnet with 1 Million Token Context to Solve Complex Coding and Search for Developers appeared first on MarkTechPost.

Cloudflare Releases Agents SDK v0.5.0 with Rewritten @cloudflare/ai-chat and New Rust-Powered Infire Engine for Optimized Edge Inference Performance

Cloudflare has released the Agents SDK v0.5.0 to address the limitations of stateless serverless functions in AI development. In standard serverless architectures, every LLM call requires rebuilding the session context from scratch, which increases latency and token consumption. The new release provides a vertically integrated execution layer where compute, state, and inference coexist at the network edge.

The SDK allows developers to build agents that maintain state over long durations, moving beyond simple request-response cycles. This is achieved through two primary technologies: Durable Objects, which provide persistent state and identity, and Infire, a custom-built Rust inference engine designed to optimize edge resources. For devs, this architecture removes the need to manage external database connections or WebSocket servers for state synchronization.

State Management via Durable Objects

The Agents SDK relies on Durable Objects (DO) to provide persistent identity and memory for every agent instance. In traditional serverless models, functions have no memory of previous events unless they query an external database like RDS or DynamoDB, which often adds 50ms to 200ms of latency.

A Durable Object is a stateful micro-server running on Cloudflare’s network with its own private storage. When an agent is instantiated using the Agents SDK, it is assigned a stable ID. All subsequent requests for that user are routed to the same physical instance, allowing the agent to keep its state in memory. Each agent includes an embedded SQLite database with a 1GB storage limit per instance, enabling zero-latency reads and writes for conversation history and task logs.

Durable Objects are single-threaded, which simplifies concurrency management. This design ensures that only one event is processed at a time for a specific agent instance, eliminating race conditions. If an agent receives multiple inputs simultaneously, they are queued and processed atomically, ensuring the state remains consistent during complex operations.
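The single-instance serialization guarantee can be illustrated with a per-agent lock that forces concurrent events through one at a time. This is a Python analogy only; real Durable Objects are JavaScript objects on Cloudflare's runtime, where the serialization is provided by the platform rather than an explicit lock.

```python
import asyncio

class AgentInstance:
    """Toy analogy of a Durable Object: one lock per agent serializes events."""
    def __init__(self):
        self.counter = 0
        self._lock = asyncio.Lock()

    async def handle(self, _event):
        async with self._lock:            # only one event mutates state at a time
            current = self.counter
            await asyncio.sleep(0)        # yield to the event loop, as real I/O would
            self.counter = current + 1    # no lost updates despite the read-modify-write

async def main():
    agent = AgentInstance()
    # 100 "simultaneous" inputs are queued behind the lock and applied atomically.
    await asyncio.gather(*(agent.handle(i) for i in range(100)))
    return agent.counter

result = asyncio.run(main())
```

Without the lock, the read-yield-write sequence would lose updates under concurrency; with it, all 100 events are applied.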

Infire: Optimizing Inference with Rust

For the inference layer, Cloudflare developed Infire, an LLM engine written in Rust that replaces Python-based stacks like vLLM. Python engines often face performance bottlenecks due to the Global Interpreter Lock (GIL) and garbage collection pauses. Infire is designed to maximize GPU utilization on H100 hardware by reducing CPU overhead.

The engine utilizes Granular CUDA Graphs and Just-In-Time (JIT) compilation. Instead of launching GPU kernels sequentially, Infire compiles a dedicated CUDA graph for every possible batch size on the fly. This allows the driver to execute work as a single monolithic structure, cutting CPU overhead by 82%. Benchmarks show that Infire is 7% faster than vLLM 0.10.0 on unloaded machines, utilizing only 25% CPU compared to vLLM’s >140%.

| Metric | vLLM 0.10.0 (Python) | Infire (Rust) | Improvement |
|---|---|---|---|
| Throughput Speed | Baseline | 7% faster | +7% |
| CPU Overhead | >140% CPU usage | 25% CPU usage | -82% |
| Startup Latency | High (cold start) | <4 seconds (Llama 3 8B) | Significant |

Infire also uses Paged KV Caching, which breaks memory into non-contiguous blocks to prevent fragmentation. This enables ‘continuous batching,’ where the engine processes new prompts while simultaneously finishing previous generations without a performance drop. This architecture allows Cloudflare to maintain a 99.99% warm request rate for inference.

Code Mode and Token Efficiency

Standard AI agents typically use ‘tool calling,’ where the LLM outputs a JSON object to trigger a function. This process requires a back-and-forth between the LLM and the execution environment for every tool used. Cloudflare’s ‘Code Mode’ changes this by asking the LLM to write a TypeScript program that orchestrates multiple tools at once.

This code executes in a secure V8 isolate sandbox. For complex tasks, such as searching 10 different files, Code Mode provides an 87.5% reduction in token usage. Because intermediate results stay within the sandbox and are not sent back to the LLM for every step, the process is both faster and more cost-effective.
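The saving comes from collapsing N round trips into one. The arithmetic below is a back-of-the-envelope sketch; the per-step overhead and program size are invented numbers chosen to reproduce the article's 87.5% figure, not measured values.

```python
def tool_calling_tokens(steps: int, overhead_per_step: int = 800) -> int:
    """Classic tool calling: each step's context is echoed back through the LLM."""
    return steps * overhead_per_step

def code_mode_tokens(program_tokens: int = 1000) -> int:
    """Code Mode: one generated program orchestrates every step in the sandbox."""
    return program_tokens

steps = 10  # e.g. searching 10 different files
saving = 1 - code_mode_tokens() / tool_calling_tokens(steps)  # fraction of tokens saved
```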

Code Mode also improves security through ‘secure bindings.’ The sandbox has no internet access; it can only interact with Model Context Protocol (MCP) servers through specific bindings in the environment object. These bindings hide sensitive API keys from the LLM, preventing the model from accidentally leaking credentials in its generated code.
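The binding pattern can be sketched as a wrapper that keeps the credential in a closure, so generated code can call the tool but never sees the key in any value it handles. This is a simplified Python analogy of the environment-object bindings, not Cloudflare's actual mechanism (and unlike a V8 isolate, Python closures are technically introspectable).

```python
def make_binding(api_key: str):
    """Return a callable tool; the credential lives in the closure, not on the object."""
    def call_mcp(action: str) -> str:
        # The key is used here internally and never included in the result.
        assert api_key, "binding requires a configured credential"
        return f"ok:{action}"
    return call_mcp

# The sandboxed environment object exposes only the binding, never the raw key.
env = {"MCP_SERVER": make_binding("secret-key-123")}

# Generated code can invoke the tool...
result = env["MCP_SERVER"]("dns.list_records")
# ...but the callable carries no attribute exposing the plaintext key.
assert not hasattr(env["MCP_SERVER"], "api_key")
```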

February 2026: The v0.5.0 Release

The Agents SDK reached version 0.5.0. This release introduced several utilities for production-ready agents:

this.retry(): A new method for retrying asynchronous operations with exponential backoff and jitter.

Protocol Suppression: Developers can now suppress JSON text frames on a per-connection basis using the shouldSendProtocolMessages hook. This is useful for IoT or MQTT clients that cannot process JSON data.

Stable AI Chat: The @cloudflare/ai-chat package reached version 0.1.0, adding message persistence to SQLite and a “Row Size Guard” that performs automatic compaction when messages approach the 2MB SQLite limit.

| Feature | Description |
|---|---|
| this.retry() | Automatic retries for external API calls. |
| Data Parts | Attaching typed JSON blobs to chat messages. |
| Tool Approval | Persistent approval state that survives hibernation. |
| Synchronous Getters | getQueue() and getSchedule() no longer require Promises. |
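this.retry() itself is a TypeScript API, but the underlying pattern, exponential backoff with jitter, is language-agnostic. Here is a minimal Python sketch of the same idea (the sleep function is injectable here purely so the example runs instantly; production code would pass time.sleep):

```python
import random

def retry(fn, attempts=5, base_delay=0.1, max_delay=5.0, sleep=None):
    """Retry fn with exponential backoff and full jitter between attempts."""
    sleep = sleep or (lambda s: None)  # injectable for testing; time.sleep in production
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # exhausted: surface the last error
            # Exponential backoff capped at max_delay, with full jitter.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))

calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

result = retry(flaky)  # succeeds on the third attempt
```

Jitter matters because, without it, many clients retrying a failed dependency back off on the same schedule and hammer it in synchronized waves.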

Key Takeaways

Stateful Persistence at the Edge: Unlike traditional stateless serverless functions, the Agents SDK uses Durable Objects to provide agents with a permanent identity and memory. This allows each agent to maintain its own state in an embedded SQLite database with 1GB of storage, enabling zero-latency data access without external database calls.

High-Efficiency Rust Inference: Cloudflare’s Infire inference engine, written in Rust, optimizes GPU utilization by using Granular CUDA Graphs to reduce CPU overhead by 82%. Benchmarks show it is 7% faster than Python-based vLLM 0.10.0 and uses Paged KV Caching to maintain a 99.99% warm request rate, significantly reducing cold start latencies.

Token Optimization via Code Mode: ‘Code Mode’ allows agents to write and execute TypeScript programs in a secure V8 isolate rather than making multiple individual tool calls. This deterministic approach reduces token consumption by 87.5% for complex tasks and keeps intermediate data within the sandbox to improve both speed and security.

Universal Tool Integration: The platform fully supports the Model Context Protocol (MCP), a standard that acts as a universal translator for AI tools. Cloudflare has deployed 13 official MCP servers that allow agents to securely manage infrastructure components like DNS, R2 storage, and Workers KV through natural language commands.

Production-Ready Utilities (v0.5.0): The February 2026 release introduced critical reliability features, including a this.retry() utility for asynchronous operations with exponential backoff and jitter. It also added protocol suppression, which allows agents to communicate with binary-only IoT devices and lightweight embedded systems that cannot process standard JSON text frames.

Check out the Technical details.
The post Cloudflare Releases Agents SDK v0.5.0 with Rewritten @cloudflare/ai-chat and New Rust-Powered Infire Engine for Optimized Edge Inference Performance appeared first on MarkTechPost.

How to Build an Advanced, Interactive Exploratory Data Analysis Workflow Using PyGWalker and Feature-Engineered Data

In this tutorial, we demonstrate how to move beyond static, code-heavy charts and build a genuinely interactive exploratory data analysis workflow directly in the notebook using PyGWalker. We start by preparing the Titanic dataset for large-scale interactive querying, engineering analysis-ready features that reveal the underlying structure of the data while enabling both detailed row-level exploration and high-level aggregated views for deeper insight. Embedding a Tableau-style drag-and-drop interface directly in the notebook enables rapid hypothesis testing, intuitive cohort comparisons, and efficient data-quality inspection, all without the friction of switching between code and visualization tools.

```python
import sys, subprocess, json, math, os
from pathlib import Path

def pip_install(pkgs):
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-q"] + pkgs)

pip_install([
    "pygwalker>=0.4.9",
    "duckdb>=0.10.0",
    "pandas>=2.0.0",
    "numpy>=1.24.0",
    "seaborn>=0.13.0"
])

import numpy as np
import pandas as pd
import seaborn as sns

df_raw = sns.load_dataset("titanic").copy()
print("Raw shape:", df_raw.shape)
display(df_raw.head(3))
```

We set up a clean and reproducible Colab environment by installing all required dependencies for interactive EDA. We load the Titanic dataset and perform an initial sanity check to understand its raw structure and scale. It establishes a stable foundation before any transformation or visualization begins.

```python
def make_safe_bucket(series, bins=None, labels=None, q=None, prefix="bucket"):
    s = pd.to_numeric(series, errors="coerce")
    if q is not None:
        try:
            cuts = pd.qcut(s, q=q, duplicates="drop")
            return cuts.astype("string").fillna("Unknown")
        except Exception:
            pass
    if bins is not None:
        cuts = pd.cut(s, bins=bins, labels=labels, include_lowest=True)
        return cuts.astype("string").fillna("Unknown")
    return s.astype("float64")

def preprocess_titanic_advanced(df):
    out = df.copy()
    out.columns = [c.strip().lower().replace(" ", "_") for c in out.columns]

    for c in ["survived", "pclass", "sibsp", "parch"]:
        if c in out.columns:
            out[c] = pd.to_numeric(out[c], errors="coerce").fillna(-1).astype("int64")

    if "age" in out.columns:
        out["age"] = pd.to_numeric(out["age"], errors="coerce").astype("float64")
        out["age_is_missing"] = out["age"].isna()
        out["age_bucket"] = make_safe_bucket(
            out["age"],
            bins=[0, 12, 18, 30, 45, 60, 120],
            labels=["child", "teen", "young_adult", "adult", "mid_age", "senior"],
        )

    if "fare" in out.columns:
        out["fare"] = pd.to_numeric(out["fare"], errors="coerce").astype("float64")
        out["fare_is_missing"] = out["fare"].isna()
        out["log_fare"] = np.log1p(out["fare"].fillna(0))
        out["fare_bucket"] = make_safe_bucket(out["fare"], q=8)

    for c in ["sex", "class", "who", "embarked", "alone", "adult_male"]:
        if c in out.columns:
            out[c] = out[c].astype("string").fillna("Unknown")

    if "cabin" in out.columns:
        out["deck"] = out["cabin"].astype("string").str.strip().str[0].fillna("Unknown")
        out["deck_is_missing"] = out["cabin"].isna()
    else:
        out["deck"] = "Unknown"
        out["deck_is_missing"] = True

    if "ticket" in out.columns:
        t = out["ticket"].astype("string")
        out["ticket_len"] = t.str.len().fillna(0).astype("int64")
        out["ticket_has_alpha"] = t.str.contains(r"[A-Za-z]", regex=True, na=False)
        out["ticket_prefix"] = t.str.extract(r"^([A-Za-z./\s]+)", expand=False).fillna("None").str.strip()
        out["ticket_prefix"] = out["ticket_prefix"].replace("", "None").astype("string")

    if "sibsp" in out.columns and "parch" in out.columns:
        out["family_size"] = (out["sibsp"] + out["parch"] + 1).astype("int64")
        out["is_alone"] = (out["family_size"] == 1)

    if "name" in out.columns:
        title = out["name"].astype("string").str.extract(r",\s*([^.]+)\.", expand=False).fillna("Unknown").str.strip()
        vc = title.value_counts(dropna=False)
        keep = set(vc[vc >= 15].index.tolist())
        out["title"] = title.where(title.isin(keep), other="Rare").astype("string")
    else:
        out["title"] = "Unknown"

    out["segment"] = (
        out["sex"].fillna("Unknown").astype("string")
        + " | "
        + out["class"].fillna("Unknown").astype("string")
        + " | "
        + out["age_bucket"].fillna("Unknown").astype("string")
    )

    for c in out.columns:
        if out[c].dtype == bool:
            out[c] = out[c].astype("int64")
        if out[c].dtype == "object":
            out[c] = out[c].astype("string")

    return out

df = preprocess_titanic_advanced(df_raw)
print("Prepped shape:", df.shape)
display(df.head(3))
```

We focus on advanced preprocessing and feature engineering to convert the raw data into an analysis-ready form. We create robust, DuckDB-safe features such as buckets, segments, and engineered categorical signals that enhance downstream exploration. We ensure the dataset is stable, expressive, and suitable for interactive querying.

```python
def data_quality_report(df):
    rows = []
    n = len(df)
    for c in df.columns:
        s = df[c]
        miss = int(s.isna().sum())
        miss_pct = (miss / n * 100.0) if n else 0.0
        nunique = int(s.nunique(dropna=True))
        dtype = str(s.dtype)
        sample = s.dropna().head(3).tolist()
        rows.append({
            "col": c,
            "dtype": dtype,
            "missing": miss,
            "missing_%": round(miss_pct, 2),
            "nunique": nunique,
            "sample_values": sample
        })
    return pd.DataFrame(rows).sort_values(["missing", "nunique"], ascending=[False, False])

dq = data_quality_report(df)
display(dq.head(20))

RANDOM_SEED = 42
MAX_ROWS_FOR_UI = 200_000

df_for_ui = df
if len(df_for_ui) > MAX_ROWS_FOR_UI:
    df_for_ui = df_for_ui.sample(MAX_ROWS_FOR_UI, random_state=RANDOM_SEED).reset_index(drop=True)

agg = (
    df.groupby(["segment", "deck", "embarked"], dropna=False)
    .agg(
        n=("survived", "size"),
        survival_rate=("survived", "mean"),
        avg_fare=("fare", "mean"),
        avg_age=("age", "mean"),
    )
    .reset_index()
)

for c in ["survival_rate", "avg_fare", "avg_age"]:
    agg[c] = agg[c].astype("float64")

Path("/content").mkdir(parents=True, exist_ok=True)
df_for_ui.to_csv("/content/titanic_prepped_for_ui.csv", index=False)
agg.to_csv("/content/titanic_agg_segment_deck_embarked.csv", index=False)
```

We evaluate data quality and generate a structured overview of missingness, cardinality, and data types. We prepare both a row-level dataset and an aggregated cohort-level table to support fast comparative analysis. The dual representation allows us to explore detailed patterns and high-level trends simultaneously.

```python
import pygwalker as pyg

SPEC_PATH = Path("/content/pygwalker_spec_titanic.json")

def load_spec(path):
    if path.exists():
        try:
            return json.loads(path.read_text())
        except Exception:
            return None
    return None

def save_spec(path, spec_obj):
    try:
        if isinstance(spec_obj, str):
            spec_obj = json.loads(spec_obj)
        path.write_text(json.dumps(spec_obj, indent=2))
        return True
    except Exception:
        return False

def launch_pygwalker(df, spec_path):
    spec = load_spec(spec_path)
    kwargs = {}
    if spec is not None:
        kwargs["spec"] = spec

    try:
        walker = pyg.walk(df, use_kernel_calc=True, **kwargs)
    except TypeError:
        walker = pyg.walk(df, **kwargs) if spec is not None else pyg.walk(df)

    captured = None
    for attr in ["spec", "_spec"]:
        if hasattr(walker, attr):
            try:
                captured = getattr(walker, attr)
                break
            except Exception:
                pass
    for meth in ["to_spec", "export_spec", "get_spec"]:
        if captured is None and hasattr(walker, meth):
            try:
                captured = getattr(walker, meth)()
                break
            except Exception:
                pass

    if captured is not None:
        save_spec(spec_path, captured)

    return walker

walker_rows = launch_pygwalker(df_for_ui, SPEC_PATH)
walker_agg = pyg.walk(agg)
```

We integrate PyGWalker to transform our prepared tables into a fully interactive, drag-and-drop analytical interface. We persist the visualization specification so that dashboard layouts and encodings survive notebook reruns. It turns the notebook into a reusable, BI-style exploration environment.

```python
HTML_PATH = Path("/content/pygwalker_titanic_dashboard.html")

def export_html_best_effort(df, spec_path, out_path):
    spec = load_spec(spec_path)
    html = None

    try:
        html = pyg.walk(df, spec=spec, return_html=True) if spec is not None else pyg.walk(df, return_html=True)
    except Exception:
        html = None

    if html is None:
        for fn in ["to_html", "export_html"]:
            if hasattr(pyg, fn):
                try:
                    f = getattr(pyg, fn)
                    html = f(df, spec=spec) if spec is not None else f(df)
                    break
                except Exception:
                    continue

    if html is None:
        return None

    if not isinstance(html, str):
        html = str(html)

    out_path.write_text(html, encoding="utf-8")
    return out_path

export_html_best_effort(df_for_ui, SPEC_PATH, HTML_PATH)
```

We extend the workflow by exporting the interactive dashboard as a standalone HTML artifact. We ensure the analysis can be shared or reviewed without requiring a Python environment or Colab session. It completes the pipeline from raw data to distributable, interactive insight.

Interactive EDA Dashboard

In conclusion, we established a robust pattern for advanced EDA that scales far beyond the Titanic dataset while remaining fully notebook-native. We showed how careful preprocessing, type safety, and feature design allow PyGWalker to operate reliably on complex data, and how combining detailed records with aggregated summaries unlocks powerful analytical workflows. Instead of treating visualization as an afterthought, we used it as a first-class interactive layer, allowing us to iterate, validate assumptions, and extract insights in real time.

Check out the Full Codes here.
The post How to Build an Advanced, Interactive Exploratory Data Analysis Workflow Using PyGWalker and Feature-Engineered Data appeared first on MarkTechPost.

How to Build Human-in-the-Loop Plan-and-Execute AI Agents with Explici …

In this tutorial, we build a human-in-the-loop travel booking agent that treats the user as a teammate rather than a passive observer. We design the system so the agent first reasons openly by drafting a structured travel plan, then deliberately pauses before taking any action. We expose this proposed plan in a live interface where we can inspect, edit, or reject it, and only after explicit approval do we allow the agent to execute tools. By combining LangGraph interrupts with a Streamlit frontend, we create a workflow that makes agent reasoning visible, controllable, and trustworthy instead of opaque and autonomous.

```python
!pip -q install -U langgraph openai streamlit pydantic
!npm -q install -g localtunnel

import os, getpass, textwrap, json, uuid, time

if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OPENAI_API_KEY (hidden input): ")
os.environ.setdefault("OPENAI_MODEL", "gpt-4.1-mini")
```

We set up the execution environment by installing all required libraries and utilities needed for agent orchestration and UI exposure. We securely collect the OpenAI API key at runtime so it is never hardcoded or leaked in the notebook. We also configure the model selection upfront to keep the rest of the pipeline clean and reproducible.

```python
app_code = r'''
import os, json, uuid
import streamlit as st
from typing import TypedDict, List, Dict, Any, Optional
from pydantic import BaseModel, Field
from openai import OpenAI

from langgraph.graph import StateGraph, START, END
from langgraph.types import Command, interrupt
from langgraph.checkpoint.memory import InMemorySaver

def tool_search_flights(origin: str, destination: str, depart_date: str, return_date: str, budget_usd: int) -> Dict[str, Any]:
    options = [
        {"airline": "SkyJet", "route": f"{origin}->{destination}", "depart": depart_date, "return": return_date, "price_usd": int(budget_usd*0.55)},
        {"airline": "AeroBlue", "route": f"{origin}->{destination}", "depart": depart_date, "return": return_date, "price_usd": int(budget_usd*0.70)},
        {"airline": "Nimbus Air", "route": f"{origin}->{destination}", "depart": depart_date, "return": return_date, "price_usd": int(budget_usd*0.62)},
    ]
    options = sorted(options, key=lambda x: x["price_usd"])
    return {"tool": "search_flights", "top_options": options[:2]}

def tool_search_hotels(city: str, nights: int, budget_usd: int, preferences: List[str]) -> Dict[str, Any]:
    base = max(60, int(budget_usd / max(nights, 1)))
    picks = [
        {"name": "Central Boutique", "city": city, "nightly_usd": int(base*0.95), "notes": ["walkable", "great reviews"]},
        {"name": "Riverside Stay", "city": city, "nightly_usd": int(base*0.80), "notes": ["quiet", "good value"]},
        {"name": "Modern Loft Hotel", "city": city, "nightly_usd": int(base*1.10), "notes": ["new", "gym"]},
    ]
    if "luxury" in [p.lower() for p in preferences]:
        picks = sorted(picks, key=lambda x: -x["nightly_usd"])
    else:
        picks = sorted(picks, key=lambda x: x["nightly_usd"])
    return {"tool": "search_hotels", "top_options": picks[:2]}

def tool_build_day_by_day(city: str, days: int, vibe: str) -> Dict[str, Any]:
    blocks = []
    for d in range(1, days+1):
        blocks.append({
            "day": d,
            "morning": f"{city}: coffee + a must-see landmark",
            "afternoon": f"{city}: {vibe} activity + local lunch",
            "evening": f"{city}: sunset spot + dinner + optional night walk"
        })
    return {"tool": "draft_itinerary", "days": blocks}
'''
```

We define the Streamlit application core and implement safe, deterministic tool functions that simulate flights, hotels, and itinerary generation. We design these tools to behave like real-world APIs while still running fully in a Colab environment. We ensure all tool outputs are structured so they can be audited before execution.

```python
app_code += r'''
class TravelPlan(BaseModel):
    trip_title: str = Field(..., description="Short human-friendly title")
    origin: str
    destination: str
    depart_date: str
    return_date: str
    travelers: int = 1
    budget_usd: int = 1500
    preferences: List[str] = Field(default_factory=list)
    vibe: str = "balanced"
    lodging_nights: int = 4
    daily_outline: List[Dict[str, Any]] = Field(default_factory=list)
    tool_calls: List[Dict[str, Any]] = Field(default_factory=list)

class State(TypedDict):
    user_request: str
    plan: Dict[str, Any]
    approval: Dict[str, Any]
    execution: Dict[str, Any]

def make_llm_plan(state: State) -> Dict[str, Any]:
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    model = os.environ.get("OPENAI_MODEL", "gpt-4.1-mini")

    sys = (
        "You are a travel planning agent. "
        "Return a JSON travel plan that matches the provided schema. "
        "Be realistic, concise, and include a tool_calls list describing what you want executed "
        "(e.g., search_flights, search_hotels, draft_itinerary)."
    )

    schema = TravelPlan.model_json_schema()

    resp = client.responses.create(
        model=model,
        input=[
            {"role": "system", "content": sys},
            {"role": "user", "content": state["user_request"]},
            {"role": "user", "content": f"Schema (JSON): {json.dumps(schema)}"}
        ],
    )

    text = resp.output_text.strip()
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("Model did not return JSON. Try again or change model.")
    raw = text[start:end+1]
    plan_obj = json.loads(raw)

    plan = TravelPlan(**plan_obj).model_dump()

    if not plan.get("tool_calls"):
        plan["tool_calls"] = [
            {"name": "search_flights", "args": {"origin": plan["origin"], "destination": plan["destination"], "depart_date": plan["depart_date"], "return_date": plan["return_date"], "budget_usd": plan["budget_usd"]}},
            {"name": "search_hotels", "args": {"city": plan["destination"], "nights": plan["lodging_nights"], "budget_usd": int(plan["budget_usd"]*0.35), "preferences": plan["preferences"]}},
            {"name": "draft_itinerary", "args": {"city": plan["destination"], "days": max(2, plan["lodging_nights"]+1), "vibe": plan["vibe"]}},
        ]

    return {"plan": plan}

def wait_for_approval(state: State) -> Dict[str, Any]:
    payload = {
        "kind": "approval",
        "message": "Review/edit the plan. Approve to execute tools.",
        "plan": state["plan"],
    }
    decision = interrupt(payload)
    return {"approval": decision}

def execute_tools(state: State) -> Dict[str, Any]:
    approval = state.get("approval") or {}
    if not approval.get("approved"):
        return {"execution": {"status": "not_executed", "reason": "User rejected or did not approve."}}

    plan = approval.get("edited_plan") or state["plan"]
    tool_calls = plan.get("tool_calls", [])

    results = []
    for call in tool_calls:
        name = call.get("name")
        args = call.get("args", {})
        if name == "search_flights":
            results.append(tool_search_flights(**args))
        elif name == "search_hotels":
            results.append(tool_search_hotels(**args))
        elif name == "draft_itinerary":
            results.append(tool_build_day_by_day(**args))
        else:
            results.append({"tool": name, "error": "Unknown tool (blocked for safety).", "args": args})

    return {"execution": {"status": "executed", "tool_results": results, "final_plan": plan}}
'''
```

We formalize the agent’s reasoning using a strict schema that requires the model to output an explicit travel plan rather than free-form text. We generate the plan using the OpenAI model and validate it before allowing it into the workflow. We also auto-inject tool calls if the model omits them to guarantee a complete execution path.

```python
app_code += r'''
def build_graph():
    builder = StateGraph(State)
    builder.add_node("plan", make_llm_plan)
    builder.add_node("approve", wait_for_approval)
    builder.add_node("execute", execute_tools)

    builder.add_edge(START, "plan")
    builder.add_edge("plan", "approve")
    builder.add_edge("approve", "execute")
    builder.add_edge("execute", END)

    memory = InMemorySaver()
    graph = builder.compile(checkpointer=memory)
    return graph

st.set_page_config(page_title="Plan → Approve → Execute Travel Agent", layout="wide")
st.title("Human-in-the-Loop Travel Booking Agent (Plan → Approve/Edit → Execute)")

with st.sidebar:
    st.header("Runtime")
    if st.button("New Session / Thread"):
        st.session_state.thread_id = str(uuid.uuid4())
        st.session_state.ran_once = False
        st.session_state.interrupt_payload = None
        st.session_state.last_execution = None

thread_id = st.session_state.get("thread_id") or str(uuid.uuid4())
st.session_state.thread_id = thread_id

graph = build_graph()
config = {"configurable": {"thread_id": thread_id}}

st.caption(f"Thread ID: {thread_id}")

req = st.text_area(
    "Describe your trip request",
    value=st.session_state.get("user_request", "Plan a 5-day trip from Dubai to Istanbul in April. Budget $1800. Prefer museums, street food, and a relaxed pace."),
    height=120
)
st.session_state.user_request = req

colA, colB = st.columns([1, 1])
run_plan = colA.button("1) Generate Plan (LLM)")
resume_btn = colB.button("2) Resume After Approval")

if run_plan:
    st.session_state.ran_once = True
    st.session_state.interrupt_payload = None
    st.session_state.last_execution = None

    initial = {"user_request": req, "plan": {}, "approval": {}, "execution": {}}
    out = graph.invoke(initial, config=config)

    if "__interrupt__" in out and out["__interrupt__"]:
        st.session_state.interrupt_payload = out["__interrupt__"][0].value
    else:
        st.session_state.last_execution = out.get("execution")

payload = st.session_state.get("interrupt_payload")

if payload:
    st.subheader("Plan proposed by agent (editable)")
    plan = payload.get("plan", {})
    left, right = st.columns([1, 1])

    with left:
        st.write("**Edit JSON (advanced):**")
        edited_text = st.text_area("Plan JSON", value=json.dumps(plan, indent=2), height=420)

    with right:
        st.write("**Quick actions:**")
        approved = st.radio("Decision", options=["Approve", "Reject"], index=0)
        st.write("Tip: If you edit JSON, keep it valid. You can also reject and re-run planning.")

    try:
        edited_plan = json.loads(edited_text)
        json_ok = True
    except Exception as e:
        json_ok = False
        st.error(f"Invalid JSON: {e}")

    if resume_btn:
        if not json_ok:
            st.stop()

        decision = {
            "approved": (approved == "Approve"),
            "edited_plan": edited_plan
        }
        out2 = graph.invoke(Command(resume=decision), config=config)
        st.session_state.interrupt_payload = None
        st.session_state.last_execution = out2.get("execution")

exec_result = st.session_state.get("last_execution")
if exec_result:
    st.subheader("Execution result")
    st.json(exec_result)
    if exec_result.get("status") == "executed":
        st.success("Tools executed only AFTER approval")
    else:
        st.warning("Not executed (rejected or not approved).")
'''
```

We construct the LangGraph workflow by separating planning, approval, and execution into distinct nodes. We deliberately interrupt the graph after planning so we can review and control the agent’s intent. We only allow tool execution to proceed when explicit human approval is provided.

import pathlib
pathlib.Path("app.py").write_text(app_code)

!streamlit run app.py --server.port 8501 --server.address 0.0.0.0 & sleep 2
!lt --port 8501

We connect the agent workflow to a live Streamlit interface that supports editing, approval, and rejection of plans. We persist the state across runs using a thread identifier so the agent behaves consistently across interactions. We finally launch the app and make it publicly available, enabling real human-in-the-loop collaboration.

In conclusion, we demonstrated how plan-and-execute agents become significantly more reliable when humans remain in the loop at the right moment. We showed that interrupts are not just a technical feature but a design primitive for building trust, accountability, and collaboration into agent systems. By separating planning from execution and inserting a clear approval boundary, we ensured that tools run only with human consent and context. This pattern scales beyond travel planning to any high-stakes automation, giving us agents that think with us rather than act for us.

Check out the Full Codes here. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter. Wait! are you on telegram? now you can join us on telegram as well.
The post How to Build Human-in-the-Loop Plan-and-Execute AI Agents with Explicit User Approval Using LangGraph and Streamlit appeared first on MarkTechPost.

Google DeepMind Proposes New Framework for Intelligent AI Delegation t …

The AI industry is currently obsessed with ‘agents’—autonomous programs that do more than just chat. However, most current multi-agent systems rely on brittle, hard-coded heuristics that fail when the environment changes.

Google DeepMind researchers have proposed a new solution. The research team argued that for the ‘agentic web’ to scale, agents must move beyond simple task-splitting and adopt human-like organizational principles such as authority, responsibility, and accountability.

Defining ‘Intelligent’ Delegation

In standard software, a subroutine is just ‘outsourced’. Intelligent delegation is different. It is a sequence of decisions where a delegator transfers authority and responsibility to a delegatee. This process involves risk assessment, capability matching, and establishing trust.

The 5 Pillars of the Framework

To build this, the research team identified 5 core requirements mapped to specific technical protocols:

Framework Pillar | Technical Implementation | Core Function
Dynamic Assessment | Task Decomposition & Assignment | Granularly inferring agent state and capacity.
Adaptive Execution | Adaptive Coordination | Handling context shifts and runtime failures.
Structural Transparency | Monitoring & Verifiable Completion | Auditing both the process and the final outcome.
Scalable Market | Trust, Reputation & Multi-objective Optimization | Efficient, trusted coordination in open markets.
Systemic Resilience | Security & Permission Handling | Preventing cascading failures and malicious use.

Engineering Strategy: ‘Contract-First’ Decomposition

The most significant shift is contract-first decomposition. Under this principle, a delegator only assigns a task if the outcome can be precisely verified.

If a task is too subjective or complex to verify—like ‘write a compelling research paper’—the system must recursively decompose it. This continues until the sub-tasks match available verification tools, such as unit tests or formal mathematical proofs.
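A minimal sketch of how contract-first decomposition could look in code. All names here (`Task`, `VERIFIERS`, `contract_first`, the toy splitter) are illustrative assumptions, not APIs from the paper: the only point is that delegation stops recursing exactly when a sub-task matches an available verifier.

```python
# Hypothetical sketch: delegate only verifiable tasks; recursively split the rest.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Task:
    goal: str
    verifier: Optional[str] = None       # e.g. "unit_test", "formal_proof"
    subtasks: List["Task"] = field(default_factory=list)

# Verification tools assumed to be available to the delegator.
VERIFIERS = {"unit_test", "formal_proof"}

def contract_first(task: Task, split) -> Task:
    """If the outcome is checkable, delegate as-is; otherwise decompose."""
    if task.verifier in VERIFIERS:
        return task                       # checkable outcome -> safe to assign
    task.subtasks = [contract_first(t, split) for t in split(task)]
    return task

# Toy splitter: a subjective writing goal decomposes into checkable pieces.
def toy_split(task: Task):
    return [
        Task("generate plots from the dataset", verifier="unit_test"),
        Task("check that every citation resolves", verifier="unit_test"),
    ]

root = contract_first(Task("write a compelling research paper"), toy_split)
leaf_ok = all(t.verifier in VERIFIERS for t in root.subtasks)
```

Here the vague root task is never assigned directly; only its two verifiable leaves would be delegated.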

Recursive Verification: The Chain of Custody

In a delegation chain, such as A → B → C, accountability is transitive.

Agent B is responsible for verifying the work of C.

When Agent B returns the result to A, it must provide a full chain of cryptographically signed attestations.

Agent A then performs a 2-stage check: verifying B’s direct work and verifying that B correctly verified C.
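The two-stage check can be sketched with a toy attestation chain. This is an illustrative assumption, not the paper's protocol: each agent signs its payload, B embeds C's attestation in its own, and A verifies both hops. A real system would use asymmetric signatures rather than the shared-secret HMACs used here for brevity.

```python
# Toy delegation chain A -> B -> C with signed attestations (illustrative only).
import hashlib
import hmac
import json

KEYS = {"A": b"key-a", "B": b"key-b", "C": b"key-c"}  # hypothetical per-agent keys

def attest(agent, payload):
    # Sign a canonical JSON serialization of the payload.
    msg = json.dumps(payload, sort_keys=True).encode()
    sig = hmac.new(KEYS[agent], msg, hashlib.sha256).hexdigest()
    return {"agent": agent, "payload": payload, "sig": sig}

def verify(att):
    msg = json.dumps(att["payload"], sort_keys=True).encode()
    expect = hmac.new(KEYS[att["agent"]], msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(att["sig"], expect)

# C does the work; B verifies C and attests over C's attestation.
c_att = attest("C", {"task": "sum", "result": 42})
b_att = attest("B", {"task": "sum", "verified_child": c_att})

def a_accepts(b_att):
    if not verify(b_att):                 # Stage 1: B's direct work is authentic
        return False
    return verify(b_att["payload"]["verified_child"])  # Stage 2: B verified C

chain_ok = a_accepts(b_att)
```

Tampering with any link (for example, altering C's result without re-signing) makes the chain fail at A.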

Security: Tokens and Tunnels

Scaling these chains introduces massive security risks, including Data Exfiltration, Backdoor Implanting, and Model Extraction.

To protect the network, the DeepMind team suggests Delegation Capability Tokens (DCTs). Based on technologies like Macaroons or Biscuits, these tokens use ‘cryptographic caveats’ to enforce the principle of least privilege. For example, an agent might receive a token that allows it to READ a specific Google Drive folder but forbids any WRITE operations.
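A minimal Macaroon-style sketch of caveat-based attenuation. The token format, field names, and checks below are assumptions for illustration, and the HMAC chaining that makes real Macaroons tamper-evident is omitted; the point is only that appending caveats can ever narrow, never widen, a delegatee's authority.

```python
# Illustrative caveat-based token attenuation (not the DCT wire format).

def attenuate(token, **caveats):
    # A derived token carries all parent caveats plus the new ones.
    new = dict(token)
    new["caveats"] = {**token.get("caveats", {}), **caveats}
    return new

def allowed(token, action, resource):
    # A request must satisfy every caveat on the token.
    c = token.get("caveats", {})
    if "actions" in c and action not in c["actions"]:
        return False
    if "resource_prefix" in c and not resource.startswith(c["resource_prefix"]):
        return False
    return True

root = {"holder": "agent-b", "caveats": {}}
# Delegate read-only access to a single folder (least privilege).
scoped = attenuate(root, actions={"READ"}, resource_prefix="/drive/reports/")

can_read = allowed(scoped, "READ", "/drive/reports/q3.xlsx")
can_write = allowed(scoped, "WRITE", "/drive/reports/q3.xlsx")
```

The scoped token permits the READ but rejects the WRITE, matching the Google Drive example above.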

Evaluating Current Protocols

The research team analyzed whether current industry standards are ready for this framework. While these protocols provide a base, they all have ‘missing pieces’ for high-stakes delegation.

MCP (Model Context Protocol): Standardizes how models connect to tools. The Gap: It lacks a policy layer to govern permissions across deep delegation chains.

A2A (Agent-to-Agent): Manages discovery and task lifecycles. The Gap: It lacks standardized headers for Zero-Knowledge Proofs (ZKPs) or digital signature chains.

AP2 (Agent Payments Protocol): Authorizes agents to spend funds. The Gap: It cannot natively verify the quality of the work before releasing payment.

UCP (Universal Commerce Protocol): Standardizes commercial transactions. The Gap: It is optimized for shopping/fulfillment, not abstract computational tasks.

Key Takeaways

Move Beyond Heuristics: Current AI delegation relies on simple, hard-coded heuristics that are brittle and cannot dynamically adapt to environmental changes or unexpected failures. Intelligent delegation requires an adaptive framework that incorporates the transfer of authority, responsibility, and accountability.

‘Contract-First’ Task Decomposition: For complex goals, delegators should use a ‘contract-first’ approach, where tasks are decomposed until the sub-units match specific, automated verification capabilities, such as unit tests or formal proofs.

Transitive Accountability in Chains: In long delegation chains (e.g., A → B → C), responsibility is transitive. Agent B is responsible for the work of C, and Agent A must verify both B’s direct work and that B correctly verified C’s attestations.

Attenuated Security via Tokens: To prevent systemic breaches and the ‘confused deputy problem,’ agents should use Delegation Capability Tokens (DCTs) that provide attenuated authorization. This ensures agents operate under the principle of least privilege, with access restricted to specific subsets of resources and allowable operations.

Check out the Paper here.
The post Google DeepMind Proposes New Framework for Intelligent AI Delegation to Secure the Emerging Agentic Web for Future Economies appeared first on MarkTechPost.

Alibaba Qwen Team Releases Qwen3.5-397B MoE Model with 17B Active Para …

Alibaba Cloud just updated the open-source landscape. Today, the Qwen team released Qwen3.5, the newest generation of their large language model (LLM) family. The most powerful version is Qwen3.5-397B-A17B. This model is a sparse Mixture-of-Experts (MoE) system. It combines massive reasoning power with high efficiency.

Qwen3.5 is a native vision-language model. It is designed specifically for AI agents. It can see, code, and reason across 201 languages.

https://qwen.ai/blog?id=qwen3.5

The Core Architecture: 397B Total, 17B Active

The technical specifications of Qwen3.5-397B-A17B are impressive. The model contains 397B total parameters. However, it uses a sparse MoE design. This means it only activates 17B parameters during any single forward pass.

This 17B activation count is the most important number for devs. It gives the model the intelligence of a 400B-class system while running at the speed of a much smaller one. The Qwen team reports an 8.6x to 19.0x increase in decoding throughput compared to previous generations. This efficiency addresses the high cost of running large-scale AI.

https://qwen.ai/blog?id=qwen3.5

Efficient Hybrid Architecture: Gated Delta Networks

Qwen3.5 does not use a standard Transformer design. It uses an ‘Efficient Hybrid Architecture.’ Most LLMs rely only on Attention mechanisms. These can become slow with long text. Qwen3.5 combines Gated Delta Networks (linear attention) with Mixture-of-Experts (MoE).

The model consists of 60 layers. The hidden dimension size is 4,096. These layers follow a specific ‘Hidden Layout.’ The layout groups layers into sets of 4.

3 blocks use Gated DeltaNet-plus-MoE.

1 block uses Gated Attention-plus-MoE.

This pattern repeats 15 times to reach 60 layers.
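The layout described above can be sketched as a simple list construction (block names here are shorthand for the article's descriptions, not Qwen's internal identifiers):

```python
# Build the reported 3:1 hidden layout: each group of 4 layers holds three
# Gated DeltaNet + MoE blocks and one Gated Attention + MoE block,
# and the group repeats 15 times to reach 60 layers.
GROUP = ["deltanet_moe"] * 3 + ["attention_moe"]
layout = GROUP * 15

n_layers = len(layout)                     # 60 layers total
n_deltanet = layout.count("deltanet_moe")  # 45 linear-attention blocks
n_attention = layout.count("attention_moe")  # 15 standard-attention blocks
```

So three quarters of the depth uses the cheaper linear-attention path, which is where the throughput gains come from.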

Technical details include:

Gated DeltaNet: It uses 64 linear attention heads for Values (V). It uses 16 heads for Queries and Keys (QK).

MoE Structure: The model has 512 total experts. Each token activates 10 routed experts and 1 shared expert. This equals 11 active experts per token.

Vocabulary: The model uses a padded vocabulary of 248,320 tokens.
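Back-of-envelope arithmetic on the published figures makes the sparsity concrete (pure arithmetic on the numbers above, nothing model-specific):

```python
# Sparsity of Qwen3.5-397B-A17B from the quoted specifications.
total_experts = 512
active_experts = 10 + 1                  # 10 routed + 1 shared per token
expert_sparsity = active_experts / total_experts   # ~2.1% of experts per token

total_params = 397e9
active_params = 17e9
active_fraction = active_params / total_params     # ~4.3% of weights per pass
```

Only about one in 23 parameters participates in any forward pass, which is why the model can decode like a far smaller one.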

Native Multimodal Training: Early Fusion

Qwen3.5 is a native vision-language model. Many other models add vision capabilities later. Qwen3.5 used ‘Early Fusion’ training. This means the model learned from images and text at the same time.

The training used trillions of multimodal tokens. This makes Qwen3.5 better at visual reasoning than previous Qwen3-VL versions. It is highly capable of ‘agentic’ tasks. For example, it can look at a UI screenshot and generate the exact HTML and CSS code. It can also analyze long videos with second-level accuracy.

The model supports the Model Context Protocol (MCP). It also handles complex function-calling. These features are vital for building agents that control apps or browse the web. In the IFBench test, it scored 76.5. This score beats many proprietary models.

https://qwen.ai/blog?id=qwen3.5

Solving the Memory Wall: 1M Context Length

Long-form data processing is a core feature of Qwen3.5. The base model has a native context window of 262,144 (256K) tokens. The hosted Qwen3.5-Plus version goes even further. It supports 1M tokens.

The Alibaba Qwen team used a new asynchronous Reinforcement Learning (RL) framework for this. It ensures the model stays accurate even at the end of a 1M-token document. For devs, this means you can feed an entire codebase into one prompt. You do not always need a complex Retrieval-Augmented Generation (RAG) system.

Performance and Benchmarks

The model excels in technical fields. It achieved high scores on Humanity’s Last Exam (HLE-Verified). This is a difficult benchmark for AI knowledge.

Coding: It shows parity with top-tier closed-source models.

Math: The model uses ‘Adaptive Tool Use.’ It can write Python code to solve math problems. It then runs the code to verify the answer.

Languages: It supports 201 different languages and dialects. This is a big jump from the 119 languages in the previous version.

Key Takeaways

Hybrid Efficiency (MoE + Gated Delta Networks): Qwen3.5 uses a 3:1 ratio of Gated Delta Networks (linear attention) to standard Gated Attention blocks across 60 layers. This hybrid design allows for an 8.6x to 19.0x increase in decoding throughput compared to previous generations.

Massive Scale, Low Footprint: The Qwen3.5-397B-A17B features 397B total parameters but only activates 17B per token. You get 400B-class intelligence with the inference speed and memory requirements of a much smaller model.

Native Multimodal Foundation: Unlike ‘bolted-on’ vision models, Qwen3.5 was trained via Early Fusion on trillions of text and image tokens simultaneously. This makes it a top-tier visual agent, scoring 76.5 on IFBench for following complex instructions in visual contexts.

1M Token Context: While the base model supports a native 256k token context, the hosted Qwen3.5-Plus handles up to 1M tokens. This massive window allows devs to process entire codebases or 2-hour videos without needing complex RAG pipelines.

Check out the Technical details, Model Weights and GitHub Repo.
The post Alibaba Qwen Team Releases Qwen3.5-397B MoE Model with 17B Active Parameters and 1M Token Context for AI agents appeared first on MarkTechPost.

Meet ‘Kani-TTS-2’: A 400M Param Open Source Text-to-Speech Model t …

The landscape of generative audio is shifting toward efficiency. A new open-source contender, Kani-TTS-2, has been released by the team at nineninesix.ai. This model marks a departure from heavy, compute-expensive TTS systems. Instead, it treats audio as a language, delivering high-fidelity speech synthesis with a remarkably small footprint.

Kani-TTS-2 offers a lean, high-performance alternative to closed-source APIs. It is currently available on Hugging Face in both English (EN) and Portuguese (PT) versions.

The Architecture: LFM2 and NanoCodec

Kani-TTS-2 follows the ‘Audio-as-Language’ philosophy. The model does not use traditional mel-spectrogram pipelines. Instead, it converts raw audio into discrete tokens using a neural codec.

The system relies on a two-stage process:

The Language Backbone: The model is built on LiquidAI’s LFM2 (350M) architecture. This backbone generates ‘audio intent’ by predicting the next audio tokens. Because LFM (Liquid Foundation Models) are designed for efficiency, they provide a faster alternative to standard transformers.

The Neural Codec: It uses the NVIDIA NanoCodec to turn those tokens into 22kHz waveforms.

By using this architecture, the model captures human-like prosody—the rhythm and intonation of speech—without the ‘robotic’ artifacts found in older TTS systems.

Efficiency: 10,000 Hours in 6 Hours

The training metrics for Kani-TTS-2 are a masterclass in optimization. The English model was trained on 10,000 hours of high-quality speech data.

While that scale is impressive, the speed of training is the real story. The research team trained the model in only 6 hours using a cluster of 8 NVIDIA H100 GPUs. This proves that massive datasets no longer require weeks of compute time when paired with efficient architectures like LFM2.

Zero-Shot Voice Cloning and Performance

The standout feature for developers is zero-shot voice cloning. Unlike traditional models that require fine-tuning for new voices, Kani-TTS-2 uses speaker embeddings.

How it works: You provide a short reference audio clip.

The result: The model extracts the unique characteristics of that voice and applies them to the generated text instantly.

From a deployment perspective, the model is highly accessible:

Parameter Count: 400M (0.4B) parameters.

Speed: It features a Real-Time Factor (RTF) of 0.2. This means it can generate 10 seconds of speech in roughly 2 seconds.

Hardware: It requires only 3GB of VRAM, making it compatible with consumer-grade GPUs like the RTX 3060 or 4050.

License: Released under the Apache 2.0 license, allowing for commercial use.
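The Real-Time Factor figure above translates directly into latency budgets. A quick sketch of the arithmetic (the helper name is ours, not part of the model's API):

```python
# RTF = generation time / audio duration, so generation time = duration * RTF.
def generation_time(audio_seconds, rtf=0.2):
    """Seconds of compute needed to synthesize the given audio length."""
    return audio_seconds * rtf

t10 = generation_time(10)   # a 10-second clip takes ~2 seconds to generate
t60 = generation_time(60)   # a 1-minute clip takes ~12 seconds
```

Any RTF below 1.0 means the model synthesizes faster than playback, which is what makes streaming use cases viable on consumer GPUs.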

Key Takeaways

Efficient Architecture: The model uses a 400M parameter backbone based on LiquidAI’s LFM2 (350M). This ‘Audio-as-Language’ approach treats speech as discrete tokens, allowing for faster processing and more human-like intonation compared to traditional architectures.

Rapid Training at Scale: Kani-TTS-2-EN was trained on 10,000 hours of high-quality speech data in just 6 hours using 8 NVIDIA H100 GPUs.

Instant Zero-Shot Cloning: There is no need for fine-tuning to replicate a specific voice. By providing a short reference audio clip, the model uses speaker embeddings to instantly synthesize text in the target speaker’s voice.

High Performance on Edge Hardware: With a Real-Time Factor (RTF) of 0.2, the model can generate 10 seconds of audio in approximately 2 seconds. It requires only 3GB of VRAM, making it fully functional on consumer-grade GPUs like the RTX 3060.

Developer-Friendly Licensing: Released under the Apache 2.0 license, Kani-TTS-2 is ready for commercial integration. It offers a local-first, low-latency alternative to expensive closed-source TTS APIs.

Check out the Model Weights.
The post Meet ‘Kani-TTS-2’: A 400M Param Open Source Text-to-Speech Model that Runs in 3GB VRAM with Voice Cloning Support appeared first on MarkTechPost.

Moonshot AI Launches Kimi Claw: Native OpenClaw on Kimi.com with 5,000 …

Moonshot AI has officially brought the power of the OpenClaw framework directly to the browser. The newly rebranded Kimi Claw is now native to kimi.com, providing developers and data scientists with a persistent, 24/7 AI agent environment.

This update moves the project from a local setup to a cloud-native powerhouse. This means the infrastructure for complex agents is now fully managed and ready to scale.

ClawHub: A Global Skill Registry

The core of Kimi Claw’s versatility is ClawHub. This library features over 5,000 community-contributed skills.

Modular Architecture: Each ‘skill’ is a functional extension that allows the AI to interact with external tools.

Instant Orchestration: Developers can discover, call, and chain these skills within the kimi.com interface.

No-Code Integration: Instead of writing custom API wrappers, engineers can leverage existing skills to connect their agents to third-party services immediately.

40GB Cloud Storage for Data Workflows

Data scientists often face memory limits in standard chat interfaces. Kimi Claw addresses this by providing 40GB of dedicated cloud storage.

Persistent Context: Store large datasets, technical documentation, and code repositories directly in your tab.

RAG Ready: This space facilitates high-volume Retrieval-Augmented Generation (RAG), allowing the model to ground its responses in your specific files across sessions.

Large-Scale File Management: The 40GB limit enables the AI to handle complex, data-heavy projects that were previously restricted to local environments.

Pro-Grade Search with Real-Time Data

To solve the knowledge cutoff problem, Kimi Claw integrates Pro-Grade Search. This feature allows the agent to fetch live, high-quality data from sources like Yahoo Finance.

Structured Data Fetching: The AI does not just browse the web; it retrieves specific data points to inform its reasoning.

Grounding: By pulling live financial or technical data, the agent significantly reduces hallucinations and provides up-to-the-minute accuracy for time-sensitive tasks.

‘Bring Your Own Claw’ (BYOC) & Multi-App Bridging

For devs who already have a custom setup, Kimi Claw offers a ‘Bring Your Own Claw’ (BYOC) feature.

Hybrid Connectivity: Connect your third-party OpenClaw to kimi.com to maintain control over your local configuration while using the native cloud interface.

Telegram Integration: You can bridge your AI setup to messaging apps like Telegram. This allows your agent to participate in group chats, execute skills, and provide automated updates outside of the browser.

Automation Pipelines: With 24/7 uptime, these bridged agents can monitor workflows and trigger notifications autonomously.

Kimi Claw simplifies the process of building and deploying agents. By combining a massive skill library with significant storage and real-time data access, Moonshot AI is turning the browser tab into a professional-grade development environment.

Key Takeaways

Native Cloud Integration: Kimi Claw is now officially native to kimi.com, providing a persistent, 24/7 environment that lives in your browser tab and eliminates the need for local hardware management.

Extensive Skill Ecosystem: Developers can access ClawHub, a library of 5,000+ community skills, allowing for the instant discovery and chaining of pre-built functions into complex agentic workflows.

High-Capacity Storage: The platform provides 40GB of cloud storage, enabling data scientists to manage large datasets and maintain deep context for RAG (Retrieval-Augmented Generation) operations.

Live Financial Grounding: Through Pro-Grade Search, the AI can fetch real-time, high-quality data from sources like Yahoo Finance, reducing hallucinations and providing accurate market information.

Flexible Connectivity (BYOC): The ‘Bring Your Own Claw’ feature allows engineers to connect third-party OpenClaw setups or bridge their AI agents to external platforms like Telegram group chats.

Check out the Technical details and Try it here.
The post Moonshot AI Launches Kimi Claw: Native OpenClaw on Kimi.com with 5,000 Community Skills and 40GB Cloud Storage Now appeared first on MarkTechPost.

Getting Started with OpenClaw and Connecting It with WhatsApp

OpenClaw is a self-hosted personal AI assistant that runs on your own devices and communicates through the apps you already use—such as WhatsApp, Telegram, Slack, Discord, and more. It can answer questions, automate tasks, interact with your files and services, and even speak or listen on supported devices, all while keeping you in control of your data.

Rather than being just another chatbot, OpenClaw acts as a true personal assistant that fits into your daily workflow. In just a few months, this open-source project has surged in popularity, crossing 150,000+ stars on GitHub. In this article, we’ll walk through how to get started with OpenClaw and connect it to WhatsApp.

What can OpenClaw do?

OpenClaw is built to fit seamlessly into your existing digital life. It connects with 50+ integrations, letting you chat with your assistant from apps like WhatsApp, Telegram, Slack, or Discord, while controlling and automating tasks from your desktop. You can use cloud or local AI models of your choice, manage notes and tasks, control music and smart home devices, trigger automations, and even interact with files, browsers, and APIs—all from a single assistant you own.

Beyond chat, OpenClaw acts as a powerful automation and productivity hub. It works with popular tools like Notion, Obsidian, GitHub, Spotify, Gmail, and Home Assistant, supports voice interaction and a live visual Canvas, and runs across macOS, Windows, Linux, iOS, and Android. Whether you’re scheduling tasks, controlling devices, generating content, or automating workflows, OpenClaw brings everything together under one private, extensible AI assistant.

Installing OpenClaw

You can head over to openclaw.ai to access the code and follow the quick start guide. OpenClaw supports macOS, Windows, and Linux, and provides a simple one-liner that installs Node.js along with all required dependencies for you:

curl -fsSL https://openclaw.ai/install.cmd -o install.cmd && install.cmd && del install.cmd

After running the command, OpenClaw will guide you through an onboarding process. During setup, you’ll see security-related warnings explaining that the assistant can access local files and execute actions. This is expected behavior—since OpenClaw is designed to act autonomously, it also highlights the importance of staying cautious about prompts and permissions.

Configuring the LLM 

Once the setup is complete, the next step is to choose an LLM provider. OpenClaw supports multiple providers, including OpenAI, Google, Anthropic, Minimax, and others.

After selecting your provider, you’ll be prompted to enter the corresponding API key. Once the key is verified, you can choose the specific model you want to use. In this setup, we’ll be using GPT-5.1.

Adding Skills

During configuration, OpenClaw also lets you add skills, which define what the agent can do beyond basic conversation. OpenClaw uses AgentSkills-compatible skill folders to teach the assistant how to work with different tools and services.

Each skill lives in its own directory and includes a SKILL.md file with YAML frontmatter and usage instructions. By default, OpenClaw loads bundled skills and any local overrides, then filters them at startup based on your environment, configuration, and available binaries.

OpenClaw also supports ClawHub, a lightweight skill registry. When enabled, the agent can automatically search for relevant skills and install them on demand.

Another popular option is https://skills.sh/. You can simply search for the skill you need, copy the provided command, and ask the agent to run it. Once executed, the new skill is added and immediately available to OpenClaw.

Configuring the Chat Channel

The final step is to configure the channel where you want to run the agent. In this walkthrough, we’ll use WhatsApp. During setup, OpenClaw will ask for your phone number and then display a QR code. Scanning this QR code links your WhatsApp account to OpenClaw.

Once connected, you can message OpenClaw from WhatsApp—or any other supported chat app—and it will respond directly in the same conversation.

Once the setup is complete, OpenClaw will open a local web page in your browser with a unique gateway token. Make sure to keep this token safe and handy, as it will be required later.

Running OpenClaw Gateway

Next, we’ll start the OpenClaw Gateway, which acts as the control plane for OpenClaw. The Gateway runs a WebSocket server that manages channels, nodes, sessions, and hooks.

To start the Gateway, run the following command:

openclaw gateway

Once the Gateway is running, refresh the earlier local web page that displayed the token. This will open the OpenClaw Gateway dashboard.

From the dashboard, navigate to the Overview section and enter the Gateway token you saved earlier to complete the connection.

Once this is done, you can start using OpenClaw either from the chat interface in the Gateway dashboard or by messaging the bot directly on WhatsApp.

Note that OpenClaw responds to messages sent to yourself on WhatsApp, so make sure you’re chatting with your own number when testing the setup.

The post Getting Started with OpenClaw and Connecting It with WhatsApp appeared first on MarkTechPost.

How to Build a Self-Organizing Agent Memory System for Long-Term AI Re …

In this tutorial, we build a self-organizing memory system for an agent that goes beyond storing raw conversation history and instead structures interactions into persistent, meaningful knowledge units. We design the system so that reasoning and memory management are clearly separated, allowing a dedicated component to extract, compress, and organize information. At the same time, the main agent focuses on responding to the user. We use structured storage with SQLite, scene-based grouping, and summary consolidation, and we show how an agent can maintain useful context over long horizons without relying on opaque vector-only retrieval.

import sqlite3
import json
import re
from datetime import datetime
from typing import List, Dict
from getpass import getpass
from openai import OpenAI

OPENAI_API_KEY = getpass("Enter your OpenAI API key: ").strip()
client = OpenAI(api_key=OPENAI_API_KEY)

def llm(prompt, temperature=0.1, max_tokens=500):
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
        max_tokens=max_tokens
    ).choices[0].message.content.strip()

We set up the core runtime by importing all required libraries and securely collecting the API key at execution time. We initialize the language model client and define a single helper function that standardizes all model calls. We ensure that every downstream component relies on this shared interface for consistent generation behavior.

class MemoryDB:
    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.row_factory = sqlite3.Row
        self._init_schema()

    def _init_schema(self):
        self.db.execute("""
            CREATE TABLE mem_cells (
                id INTEGER PRIMARY KEY,
                scene TEXT,
                cell_type TEXT,
                salience REAL,
                content TEXT,
                created_at TEXT
            )
        """)

        self.db.execute("""
            CREATE TABLE mem_scenes (
                scene TEXT PRIMARY KEY,
                summary TEXT,
                updated_at TEXT
            )
        """)

        self.db.execute("""
            CREATE VIRTUAL TABLE mem_cells_fts
            USING fts5(content, scene, cell_type)
        """)

    def insert_cell(self, cell):
        self.db.execute(
            "INSERT INTO mem_cells VALUES(NULL,?,?,?,?,?)",
            (
                cell["scene"],
                cell["cell_type"],
                cell["salience"],
                json.dumps(cell["content"]),
                datetime.utcnow().isoformat()
            )
        )
        self.db.execute(
            "INSERT INTO mem_cells_fts VALUES(?,?,?)",
            (
                json.dumps(cell["content"]),
                cell["scene"],
                cell["cell_type"]
            )
        )
        self.db.commit()

We define a structured memory database that persists information across interactions. We create tables for atomic memory units, higher-level scenes, and a full-text search index to enable symbolic retrieval. We also implement the logic to insert new memory entries in a normalized and queryable form.

    def get_scene(self, scene):
        return self.db.execute(
            "SELECT * FROM mem_scenes WHERE scene=?", (scene,)
        ).fetchone()

    def upsert_scene(self, scene, summary):
        self.db.execute("""
            INSERT INTO mem_scenes VALUES(?,?,?)
            ON CONFLICT(scene) DO UPDATE SET
                summary=excluded.summary,
                updated_at=excluded.updated_at
        """, (scene, summary, datetime.utcnow().isoformat()))
        self.db.commit()

    def retrieve_scene_context(self, query, limit=6):
        tokens = re.findall(r"[a-zA-Z0-9]+", query)
        if not tokens:
            return []

        fts_query = " OR ".join(tokens)

        rows = self.db.execute("""
            SELECT scene, content FROM mem_cells_fts
            WHERE mem_cells_fts MATCH ?
            LIMIT ?
        """, (fts_query, limit)).fetchall()

        if not rows:
            rows = self.db.execute("""
                SELECT scene, content FROM mem_cells
                ORDER BY salience DESC
                LIMIT ?
            """, (limit,)).fetchall()

        return rows

    def retrieve_scene_summary(self, scene):
        row = self.get_scene(scene)
        return row["summary"] if row else ""

We focus on memory retrieval and scene maintenance logic. We implement safe full-text search by sanitizing user queries and adding a fallback strategy when no lexical matches are found. We also expose helper methods to fetch consolidated scene summaries for long-horizon context building.

class MemoryManager:
    def __init__(self, db: MemoryDB):
        self.db = db

    def extract_cells(self, user, assistant) -> List[Dict]:
        prompt = f"""
        Convert this interaction into structured memory cells.

        Return JSON array with objects containing:
        - scene
        - cell_type (fact, plan, preference, decision, task, risk)
        - salience (0-1)
        - content (compressed, factual)

        User: {user}
        Assistant: {assistant}
        """
        raw = llm(prompt)
        # Strip markdown code fences the LLM may wrap around the JSON.
        raw = re.sub(r"```json|```", "", raw)

        try:
            cells = json.loads(raw)
            return cells if isinstance(cells, list) else []
        except Exception:
            return []

    def consolidate_scene(self, scene):
        rows = self.db.db.execute(
            "SELECT content FROM mem_cells WHERE scene=? ORDER BY salience DESC",
            (scene,)
        ).fetchall()

        if not rows:
            return

        cells = [json.loads(r["content"]) for r in rows]

        prompt = f"""
        Summarize this memory scene in under 100 words.
        Keep it stable and reusable for future reasoning.

        Cells:
        {cells}
        """
        summary = llm(prompt, temperature=0.05)
        self.db.upsert_scene(scene, summary)

    def update(self, user, assistant):
        cells = self.extract_cells(user, assistant)

        for cell in cells:
            self.db.insert_cell(cell)

        for scene in set(c["scene"] for c in cells):
            self.consolidate_scene(scene)

We implement the dedicated memory management component responsible for structuring experience. We extract compact memory representations from interactions, store them, and periodically consolidate them into stable scene summaries. We ensure that memory evolves incrementally without interfering with the agent’s response flow.

class WorkerAgent:
    def __init__(self, db: MemoryDB, mem_manager: MemoryManager):
        self.db = db
        self.mem_manager = mem_manager

    def answer(self, user_input):
        recalled = self.db.retrieve_scene_context(user_input)
        scenes = set(r["scene"] for r in recalled)

        summaries = "\n".join(
            f"[{scene}]\n{self.db.retrieve_scene_summary(scene)}"
            for scene in scenes
        )

        prompt = f"""
        You are an intelligent agent with long-term memory.

        Relevant memory:
        {summaries}

        User: {user_input}
        """
        assistant_reply = llm(prompt)
        self.mem_manager.update(user_input, assistant_reply)
        return assistant_reply

db = MemoryDB()
memory_manager = MemoryManager(db)
agent = WorkerAgent(db, memory_manager)

print(agent.answer("We are building an agent that remembers projects long term."))
print(agent.answer("It should organize conversations into topics automatically."))
print(agent.answer("This memory system should support future reasoning."))

for row in db.db.execute("SELECT * FROM mem_scenes"):
    print(dict(row))

We define the worker agent that performs reasoning while remaining memory-aware. We retrieve relevant scenes, assemble contextual summaries, and generate responses grounded in long-term knowledge. We then close the loop by passing the interaction back to the memory manager so the system continuously improves over time.

In this tutorial, we demonstrated how an agent can actively curate its own memory and turn past interactions into stable, reusable knowledge rather than ephemeral chat logs. We enabled memory to evolve through consolidation and selective recall, which supports more consistent and grounded reasoning across sessions. This approach provides a practical foundation for building long-lived agentic systems, and it can be naturally extended with mechanisms for forgetting, richer relational memory, or graph-based orchestration as the system grows in complexity.

Check out the Full Codes. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter. Wait! are you on telegram? now you can join us on telegram as well.
The post How to Build a Self-Organizing Agent Memory System for Long-Term AI Reasoning  appeared first on MarkTechPost.

Google AI Introduces the WebMCP to Enable Direct and Structured Websit …

Google is officially turning Chrome into a playground for AI agents. For years, AI ‘browsers’ have relied on a messy process: taking screenshots of websites, running them through vision models, and guessing where to click. This method is slow, breaks easily, and consumes massive amounts of compute.

Google has introduced a better way: the Web Model Context Protocol (WebMCP). Announced alongside the Early Preview Program (EPP), this protocol allows websites to communicate directly to AI models. Instead of the AI ‘guessing’ how to use a site, the site tells the AI exactly what tools are available.

The End of Screen Scraping

Current AI agents treat the web like a picture. They ‘look’ at the UI and try to find the ‘Submit’ button. If the button moves 5 pixels, the agent might fail.

WebMCP replaces this guesswork with structured data. It turns a website into a set of capabilities. For developers, this means you no longer have to worry about an AI breaking your frontend. You simply define what the AI can do, and Chrome handles the communication.

How WebMCP Works: 2 Integration Paths

AI Devs can choose between 2 ways to make a site ‘agent-ready.’

1. The Declarative Approach (HTML)

This is the simplest method for web developers. You can expose a website’s functions by adding new attributes to your standard HTML.

Attributes: Use toolname and tooldescription inside your <form> tags.

The Benefit: Chrome automatically reads these tags and creates a schema for the AI. If you have a ‘Book Flight’ form, the AI sees it as a structured tool with specific inputs.

Event Handling: When an AI fills the form, it triggers a SubmitEvent.agentInvoked. This allows your backend to know a machine—not a human—is making the request.

2. The Imperative Approach (JavaScript)

For complex apps, the Imperative API provides deeper control. This allows for multi-step workflows that a simple form cannot handle.

The Method: Use navigator.modelContext.registerTool().

The Logic: You define a tool name, a description, and a JSON schema for inputs.

Real-time Execution: When the AI agent wants to ‘Add to Cart,’ it calls your registered JavaScript function. This happens within the user’s current session, meaning the AI doesn’t need to re-login or bypass security headers.

Why the Early Preview Program (EPP) Matters

Google is not releasing this to everyone at once. It is using the Early Preview Program (EPP) to gather data from first movers. Developers who join the EPP get early access to Chrome 146 features.

This is a critical phase for data scientists. By testing in the EPP, you can see how different Large Language Models (LLMs) interpret your tool descriptions. If a description is too vague, the model might hallucinate. The EPP allows engineers to fine-tune these descriptions before the protocol becomes a global standard.

Performance and Efficiency

The technical shift here is massive. Moving from vision-based browsing to WebMCP-based interaction offers 3 key improvements:

Lower Latency: No more waiting for screenshots to upload and be processed by a vision model.

Higher Accuracy: Models interact with structured JSON data, which reduces errors to nearly 0%.

Reduced Costs: Sending text-based schemas is much cheaper than sending high-resolution images to an LLM.

The Technical Stack: navigator.modelContext

For AI devs, the core aspect of this update lives in the new modelContext object. Here is the breakdown of the 4 primary methods:

Method             | Purpose
registerTool()     | Makes a function visible to the AI agent.
unregisterTool()   | Removes a function from the AI's reach.
provideContext()   | Sends extra metadata (like user preferences) to the agent.
clearContext()     | Wipes the shared data to ensure privacy.

Security First

A common concern for software engineers is security. WebMCP is designed as a ‘permission-first’ protocol. The AI agent cannot execute a tool without the browser acting as a mediator. In many cases, Chrome will prompt the user to ‘Allow AI to book this flight?’ before the final action is taken. This keeps the user in control while allowing the agent to do the heavy lifting.

Key Takeaways

Standardizing the ‘Agentic Web’: The Web Model Context Protocol (WebMCP) is a new standard that allows AI agents to interact with websites as structured toolkits rather than just ‘looking’ at pixels. This replaces slow, error-prone screen scraping with direct, reliable communication.

Dual Integration Paths: Developers can make sites ‘AI-ready’ via two methods: a Declarative API (using simple HTML attributes like toolname in forms) or an Imperative API (using JavaScript’s navigator.modelContext.registerTool() for complex, multi-step workflows).

Massive Efficiency Gains: By using structured JSON schemas instead of vision-based processing (screenshots), WebMCP leads to a 67% reduction in computational overhead and pushes task accuracy to approximately 98%.

Built-in Security and Privacy: The protocol is ‘permission-first.’ The browser acts as a secure proxy, requiring user confirmation before an AI agent can execute sensitive tools. It also includes methods like clearContext() to wipe shared session data.

Early Access via EPP: The Early Preview Program (EPP) allows software engineers and data scientists to test these features in Chrome 146.

Check out the Technical details.
The post Google AI Introduces the WebMCP to Enable Direct and Structured Website Interactions for New AI Agents appeared first on MarkTechPost.

Exa AI Introduces Exa Instant: A Sub-200ms Neural Search Engine Design …

In the world of Large Language Models (LLMs), speed is the only feature that matters once accuracy is solved. For a human, waiting 1 second for a search result is fine. For an AI agent performing 10 sequential searches to solve a complex task, a 1-second delay per search creates a 10-second lag. This latency kills the user experience.

Exa, the search engine startup formerly known as Metaphor, just released Exa Instant. It is a search model designed to provide the world’s web data to AI agents in under 200ms. For software engineers and data scientists building Retrieval-Augmented Generation (RAG) pipelines, this removes the biggest bottleneck in agentic workflows.

https://exa.ai/blog/exa-instant

Why Latency is the Enemy of RAG

When you build a RAG application, your system follows a loop: the user asks a question, your system searches the web for context, and the LLM processes that context. If the search step takes 700ms to 1000ms, the total ‘time to first token’ becomes sluggish.
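The arithmetic behind that claim is worth making explicit. A minimal sketch (the per-search latencies are the figures discussed in this article; the comparison itself is ours):

```python
# Back-of-the-envelope: cumulative search lag for an agent that issues
# its searches sequentially inside one reasoning loop.
searches = 10

slow_ms = 1000   # upper end cited for wrapper-style search APIs
fast_ms = 200    # Exa Instant's stated upper bound

slow_total_s = searches * slow_ms / 1000
fast_total_s = searches * fast_ms / 1000

print(slow_total_s)  # → 10.0 seconds of accumulated search lag
print(fast_total_s)  # → 2.0 seconds
```

At ten sequential lookups, the search layer alone decides whether the agent feels interactive or sluggish.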

Exa Instant delivers results with a latency between 100ms and 200ms. In tests conducted from the us-west-1 (Northern California) region, the network latency was roughly 50ms. This speed allows agents to perform multiple searches in a single ‘thought’ process without the user feeling a delay.

No More ‘Wrapping’ Google

Most search APIs available today are ‘wrappers.’ They send a query to a traditional search engine like Google or Bing, scrape the results, and send them back to you. This adds layers of overhead.

Exa Instant is different. It is built on a proprietary, end-to-end neural search and retrieval stack. Instead of matching keywords, Exa uses embeddings and transformers to understand the meaning of a query. This neural approach ensures the results are relevant to the AI’s intent, not just the specific words used. By owning the entire stack from the crawler to the inference engine, Exa can optimize for speed in ways that ‘wrapper’ APIs cannot.

Benchmarking the Speed

The Exa team benchmarked Exa Instant against other popular options like Tavily Ultra Fast and Brave. To ensure the tests were fair and avoided ‘cached’ results, the team used the SealQA query dataset. They also added random words generated by GPT-5 to each query to force the engine to perform a fresh search every time.
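The cache-busting idea is simple to sketch. The helper below is hypothetical (the Exa team used GPT-5-generated words; here we just draw from a fixed pool) but shows the mechanic of forcing a fresh search per query:

```python
import random

def cache_bust(query, pool, k=2, seed=None):
    """Append k random nonce words so each benchmark run misses any result cache."""
    rng = random.Random(seed)
    return query + " " + " ".join(rng.sample(pool, k))

# Illustrative pool; any rare, unrelated tokens work.
nonsense_pool = ["zephyr", "quartz", "lattice", "ember", "cobalt", "fjord"]

q = cache_bust("latest EU AI regulation summary", nonsense_pool, k=2, seed=7)
print(q)
```

Each decorated query is effectively unique, so the measured latency reflects live retrieval rather than a cached response.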

The results showed that Exa Instant is up to 15x faster than competitors. While Exa offers other models like Exa Fast and Exa Auto for higher-quality reasoning, Exa Instant is the clear choice for real-time applications where every millisecond counts.

Pricing and Developer Integration

The transition to Exa Instant is simple. The API is accessible through the dashboard.exa.ai platform.

Cost: Exa Instant is priced at $5 per 1,000 requests.

Capacity: It searches the same massive index of the web as Exa’s more powerful models.

Accuracy: While designed for speed, it maintains high relevance. For specialized entity searches, Exa’s Websets product remains the gold standard, proving to be 20x more correct than Google for complex queries.

The API returns clean content ready for LLMs, removing the need for developers to write custom scraping or HTML cleaning code.
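As a hedged illustration, a call to such an endpoint might be assembled like this. The endpoint path, auth header, model name ("exa-instant"), and field names below are assumptions for illustration only; check Exa's API reference for the actual schema:

```python
import json

def build_instant_search_request(query, api_key, num_results=5):
    """Build (url, headers, body) for a hypothetical Exa Instant search call."""
    url = "https://api.exa.ai/search"           # assumed endpoint
    headers = {
        "x-api-key": api_key,                   # assumed auth header
        "Content-Type": "application/json",
    }
    payload = {
        "query": query,
        "numResults": num_results,
        "model": "exa-instant",                 # assumed model selector
        "contents": {"text": True},             # ask for LLM-ready text, not raw HTML
    }
    return url, headers, json.dumps(payload)

url, headers, body = build_instant_search_request("state of RAG in 2025", "<EXA_API_KEY>")
print(url)
```

The returned body would then be POSTed with any HTTP client; the point is that the response is already parsed text, so no scraping layer sits between the agent and the content.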

Key Takeaways

Sub-200ms Latency for Real-Time Agents: Exa Instant is optimized for ‘agentic’ workflows where speed is a bottleneck. By delivering results in under 200ms (and network latency as low as 50ms), it allows AI agents to perform multi-step reasoning and parallel searches without the lag associated with traditional search engines.

Proprietary Neural Stack vs. ‘Wrappers‘: Unlike many search APIs that simply ‘wrap’ Google or Bing (adding 700ms+ of overhead), Exa Instant is built on a proprietary, end-to-end neural search engine. It uses a custom transformer-based architecture to index and retrieve web data, offering up to 15x faster performance than existing alternatives like Tavily or Brave.

Cost-Efficient Scaling: The model is designed to make search a ‘primitive’ rather than an expensive luxury. It is priced at $5 per 1,000 requests, allowing developers to integrate real-time web lookups at every step of an agent’s thought process without breaking the budget.

Semantic Intent over Keywords: Exa Instant leverages embeddings to prioritize the ‘meaning’ of a query rather than exact word matches. This is particularly effective for RAG (Retrieval-Augmented Generation) applications, where finding ‘link-worthy’ content that fits an LLM’s context is more valuable than simple keyword hits.

Optimized for LLM Consumption: The API provides more than just URLs; it offers clean, parsed HTML, Markdown, and token-efficient highlights. This reduces the need for custom scraping scripts and minimizes the number of tokens the LLM needs to process, further speeding up the entire pipeline.

Check out the Technical details.
The post Exa AI Introduces Exa Instant: A Sub-200ms Neural Search Engine Designed to Eliminate Bottlenecks for Real-Time Agentic Workflows appeared first on MarkTechPost.

[In-Depth Guide] The Complete CTGAN + SDV Pipeline for High-Fidelity S …

In this tutorial, we build a complete, production-grade synthetic data pipeline using CTGAN and the SDV ecosystem. We start from raw mixed-type tabular data and progressively move toward constrained generation, conditional sampling, statistical validation, and downstream utility testing. Rather than stopping at sample generation, we focus on understanding how well synthetic data preserves structure, distributions, and predictive signal. This tutorial demonstrates how CTGAN can be used responsibly and rigorously in real-world data science workflows.

!pip -q install "ctgan" "sdv" "sdmetrics" "scikit-learn" "pandas" "numpy" "matplotlib"

import numpy as np
import pandas as pd
import warnings
warnings.filterwarnings("ignore")

import ctgan, sdv, sdmetrics
from ctgan import load_demo, CTGAN

from sdv.metadata import SingleTableMetadata
from sdv.single_table import CTGANSynthesizer

from sdv.cag import Inequality, FixedCombinations
from sdv.sampling import Condition

from sdmetrics.reports.single_table import DiagnosticReport, QualityReport

from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

import matplotlib.pyplot as plt

print("Versions:")
print("ctgan:", ctgan.__version__)
print("sdv:", sdv.__version__)
print("sdmetrics:", sdmetrics.__version__)

We set up the environment by installing all required libraries and importing the full dependency stack. We explicitly load CTGAN, SDV, SDMetrics, and downstream ML tooling to ensure compatibility across the pipeline. We also surface library versions to make the experiment reproducible and debuggable.

real = load_demo().copy()
real.columns = [c.strip().replace(" ", "_") for c in real.columns]

target_col = "income"
real[target_col] = real[target_col].astype(str)

categorical_cols = real.select_dtypes(include=["object"]).columns.tolist()
numerical_cols = [c for c in real.columns if c not in categorical_cols]

print("Rows:", len(real), "Cols:", len(real.columns))
print("Categorical:", len(categorical_cols), "Numerical:", len(numerical_cols))
display(real.head())

ctgan_model = CTGAN(
    epochs=30,
    batch_size=500,
    verbose=True
)
ctgan_model.fit(real, discrete_columns=categorical_cols)
synthetic_ctgan = ctgan_model.sample(5000)
print("Standalone CTGAN sample:")
display(synthetic_ctgan.head())

We load the CTGAN Adult demo dataset and perform minimal normalization on column names and data types. We explicitly identify categorical and numerical columns, which is critical for both CTGAN training and evaluation. We then train a baseline standalone CTGAN model and generate synthetic samples for comparison.

metadata = SingleTableMetadata()
metadata.detect_from_dataframe(data=real)
metadata.update_column(column_name=target_col, sdtype="categorical")

constraints = []

if len(numerical_cols) >= 2:
    col_lo, col_hi = numerical_cols[0], numerical_cols[1]
    constraints.append(Inequality(low_column_name=col_lo, high_column_name=col_hi))
    print(f"Added Inequality constraint: {col_hi} > {col_lo}")

if len(categorical_cols) >= 2:
    c1, c2 = categorical_cols[0], categorical_cols[1]
    constraints.append(FixedCombinations(column_names=[c1, c2]))
    print(f"Added FixedCombinations constraint on: [{c1}, {c2}]")

synth = CTGANSynthesizer(
    metadata=metadata,
    epochs=30,
    batch_size=500
)

if constraints:
    synth.add_constraints(constraints)

synth.fit(real)

synthetic_sdv = synth.sample(num_rows=5000)
print("SDV CTGANSynthesizer sample:")
display(synthetic_sdv.head())

We construct a formal metadata object and attach explicit semantic types to the dataset. We introduce structural constraints using SDV’s constraint graph system, enforcing numeric inequalities and validity of categorical combinations. We then train a CTGAN-based SDV synthesizer that respects these constraints during generation.

loss_df = synth.get_loss_values()
display(loss_df.tail())

# Column names vary across SDV/CTGAN versions, so probe for the right ones.
x_candidates = ["epoch", "step", "steps", "iteration", "iter", "batch", "update"]
xcol = next((c for c in x_candidates if c in loss_df.columns), None)

g_candidates = ["generator_loss", "gen_loss", "g_loss"]
d_candidates = ["discriminator_loss", "disc_loss", "d_loss"]
gcol = next((c for c in g_candidates if c in loss_df.columns), None)
dcol = next((c for c in d_candidates if c in loss_df.columns), None)

plt.figure(figsize=(10, 4))

if xcol is None:
    x = np.arange(len(loss_df))
else:
    x = loss_df[xcol].to_numpy()

if gcol is not None:
    plt.plot(x, loss_df[gcol].to_numpy(), label=gcol)
if dcol is not None:
    plt.plot(x, loss_df[dcol].to_numpy(), label=dcol)

plt.xlabel(xcol if xcol is not None else "index")
plt.ylabel("loss")
plt.legend()
plt.title("CTGAN training losses (SDV wrapper)")
plt.show()

cond_col = categorical_cols[0]
common_value = real[cond_col].value_counts().index[0]
conditions = [Condition({cond_col: common_value}, num_rows=2000)]

synthetic_cond = synth.sample_from_conditions(
    conditions=conditions,
    max_tries_per_batch=200,
    batch_size=5000
)

print("Conditional sampling requested:", 2000, "got:", len(synthetic_cond))
print("Conditional sample distribution (top 5):")
print(synthetic_cond[cond_col].value_counts().head(5))
display(synthetic_cond.head())

We extract and visualize the dynamics of generator and discriminator losses using a version-robust plotting strategy. We perform conditional sampling to generate data under specific attribute constraints and verify that the conditions are satisfied. This demonstrates how CTGAN behaves under guided generation scenarios.

metadata_dict = metadata.to_dict()

diagnostic = DiagnosticReport()
diagnostic.generate(real_data=real, synthetic_data=synthetic_sdv, metadata=metadata_dict, verbose=True)
print("Diagnostic score:", diagnostic.get_score())

quality = QualityReport()
quality.generate(real_data=real, synthetic_data=synthetic_sdv, metadata=metadata_dict, verbose=True)
print("Quality score:", quality.get_score())

def show_report_details(report, title):
    print(f"\n===== {title} details =====")
    props = report.get_properties()
    for p in props:
        print(f"\n--- {p} ---")
        details = report.get_details(property_name=p)
        try:
            display(details.head(10))
        except Exception:
            display(details)

show_report_details(diagnostic, "DiagnosticReport")
show_report_details(quality, "QualityReport")

train_real, test_real = train_test_split(
    real, test_size=0.25, random_state=42, stratify=real[target_col]
)

def make_pipeline(cat_cols, num_cols):
    pre = ColumnTransformer(
        transformers=[
            ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
            ("num", "passthrough", num_cols),
        ],
        remainder="drop"
    )
    clf = LogisticRegression(max_iter=200)
    return Pipeline([("pre", pre), ("clf", clf)])

pipe_syn = make_pipeline(categorical_cols, numerical_cols)
pipe_syn.fit(synthetic_sdv.drop(columns=[target_col]), synthetic_sdv[target_col])

proba_syn = pipe_syn.predict_proba(test_real.drop(columns=[target_col]))[:, 1]
y_true = (test_real[target_col].astype(str).str.contains(">")).astype(int)
auc_syn = roc_auc_score(y_true, proba_syn)
print("Synthetic-train -> Real-test AUC:", auc_syn)

pipe_real = make_pipeline(categorical_cols, numerical_cols)
pipe_real.fit(train_real.drop(columns=[target_col]), train_real[target_col])

proba_real = pipe_real.predict_proba(test_real.drop(columns=[target_col]))[:, 1]
auc_real = roc_auc_score(y_true, proba_real)
print("Real-train -> Real-test AUC:", auc_real)

model_path = "ctgan_sdv_synth.pkl"
synth.save(model_path)
print("Saved synthesizer to:", model_path)

# Reload via the synthesizer's own classmethod.
synth_loaded = CTGANSynthesizer.load(model_path)

synthetic_loaded = synth_loaded.sample(1000)
print("Loaded synthesizer sample:")
display(synthetic_loaded.head())

We evaluate synthetic data using SDMetrics diagnostic and quality reports and a property-level inspection. We validate downstream usefulness by training a classifier on synthetic data and testing it on real data. Finally, we serialize the trained synthesizer and confirm that it can be reloaded and sampled reliably.

In conclusion, we demonstrated that synthetic data generation with CTGAN becomes significantly more powerful when paired with metadata, constraints, and rigorous evaluation. By validating both statistical similarity and downstream task performance, we ensured that the synthetic data is not only realistic but also useful. This pipeline serves as a strong foundation for privacy-preserving analytics, data sharing, and simulation workflows. With careful configuration and evaluation, CTGAN can be safely deployed in real-world data science systems.

Check out the Full Codes here.
The post [In-Depth Guide] The Complete CTGAN + SDV Pipeline for High-Fidelity Synthetic Data appeared first on MarkTechPost.

Kyutai Releases Hibiki-Zero: A 3B Parameter Simultaneous Speech-to-Spee …

Kyutai has released Hibiki-Zero, a new model for simultaneous speech-to-speech translation (S2ST) and speech-to-text translation (S2TT). The system translates source speech into a target language in real-time. It handles non-monotonic word dependencies during the process. Unlike previous models, Hibiki-Zero does not require word-level aligned data for training. This eliminates a major bottleneck in scaling AI translation to more languages.

Traditional approaches rely on supervised training with word-level alignments. These alignments are difficult to collect at scale. Developers usually depend on synthetic alignments and language-specific heuristics. Hibiki-Zero removes this complexity by using a novel reinforcement learning (RL) strategy to optimize latency.

https://kyutai.org/blog/2026-02-12-hibiki-zero

A Multistream Architecture

Hibiki-Zero is a decoder-only model. It uses a multistream architecture to model sequences of tokens jointly. The model handles 3 specific streams:

Source Stream: Audio tokens from the input speech.

Target Stream: Generated audio tokens for the translated speech.

Inner Monologue: A stream of padded text tokens that match the target audio.

The system uses the Mimi neural audio codec. Mimi is a causal and streaming codec that encodes waveforms into discrete tokens. It operates at a framerate of 12.5 Hz. The model uses an RQ-Transformer to model these audio streams.

The architectural specs include:

Total Parameters: 3B.

Temporal Transformer: 28 layers with a latent dimension of 2048.

Depth Transformer: 6 layers per codebook with a latent dimension of 1024.

Context Window: 4min.

Audio Codebooks: 16 levels for high-quality speech.
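To put the codec numbers in perspective, a quick back-of-the-envelope calculation (the figures come from the specs above; the arithmetic is ours):

```python
# Rough token-rate arithmetic for the Mimi codec settings quoted above.
frame_rate_hz = 12.5     # Mimi frames per second
codebooks = 16           # audio codebook levels per frame
context_minutes = 4      # stated context window

tokens_per_second = frame_rate_hz * codebooks
frames_in_context = frame_rate_hz * context_minutes * 60

print(tokens_per_second)   # → 200.0 audio tokens per second of speech
print(frames_in_context)   # → 3000.0 temporal positions in the 4-minute window
```

This is why the RQ-Transformer split matters: the temporal transformer only has to model 12.5 positions per second, while the small depth transformer handles the 16 codebook tokens within each frame.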

Training Without Human Interpretation Data

Hibiki-Zero is trained in 2 main stages:

Coarse Alignment Training: The model first trains on sentence-level aligned data. This data ensures that the ith sentence in the target is a translation of the ith sentence in the source. The research team uses a technique to insert artificial silence in the target speech to delay its content relative to the source.

Reinforcement Learning (RL): The model uses Group Relative Policy Optimization (GRPO) to refine its policy. This stage reduces translation latency while preserving quality.

The RL process uses process rewards based only on the BLEU score. It computes intermediate rewards at multiple points during translation. A hyperparameter α balances the trade-off between speed and accuracy. A lower α reduces latency but may slightly decrease quality.
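As a sketch of the idea (this is standard GRPO in our notation; Kyutai's exact reward shaping may differ): for each source segment, the policy samples a group of $G$ candidate translations, each is scored with a BLEU-based reward $r_i$, and the advantage of each candidate is its group-normalized reward:

$$A_i = \frac{r_i - \mathrm{mean}(r_1, \dots, r_G)}{\mathrm{std}(r_1, \dots, r_G)}$$

The policy is then updated with a clipped PPO-style objective using these advantages, so no separate learned critic is required. The latency weight plausibly enters through the reward itself, penalizing candidates that wait too long before emitting target tokens.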

Scaling to Italian in Record Time

The researchers demonstrated how easily Hibiki-Zero adapts to new languages. They added Italian as an input language using less than 1000h of speech data.

They performed supervised fine-tuning followed by the GRPO process.

The model reached a quality and latency trade-off similar to Meta’s Seamless model.

It surpassed Seamless in speaker similarity by over 30 points.

Performance and Results

Hibiki-Zero achieves state-of-the-art results across 5 X-to-English tasks. It was tested on the Audio-NTREX-4L long-form benchmark, which includes 15h of speech per TTS system.

Metric                  | Hibiki-Zero (French) | Seamless (French)
ASR-BLEU (↑)            | 28.7                 | 23.9
Speaker Similarity (↑)  | 61.3                 | 44.4
Average Lag (LAAL) (↓)  | 2.3                  | 6.2

In short-form tasks (Europarl-ST), Hibiki-Zero reached an ASR-BLEU of 34.6 with a lag of 2.8 seconds. Human raters also scored the model significantly higher than baselines for speech naturalness and voice transfer.

https://kyutai.org/blog/2026-02-12-hibiki-zero

Key Takeaways

Zero Aligned Data Requirement: Hibiki-Zero eliminates the need for expensive, hand-crafted word-level alignments between source and target speech, which were previously the biggest bottleneck in scaling simultaneous translation to new languages.

GRPO-Driven Latency Optimization: The model uses Group Relative Policy Optimization (GRPO) and a simple reward system based only on BLEU scores to automatically learn an efficient translation policy, balancing high translation quality with low latency.

Coarse-to-Fine Training Strategy: The training pipeline starts with sentence-level aligned data to teach the model base translation at high latency, followed by a reinforcement learning phase that “teaches” the model when to speak and when to listen.

Superior Voice and Naturalness: In benchmarking against previous state-of-the-art systems like Seamless, Hibiki-Zero achieved a 30-point lead in speaker similarity and significantly higher scores in speech naturalness and audio quality across five language tasks.

Rapid New Language Adaptation: The architecture is highly portable; researchers demonstrated that Hibiki-Zero could be adapted to a new input language (Italian) with less than 1,000 hours of speech data while maintaining its original performance on other languages.

Check out the Paper, Technical details, Repo and Samples.
The post Kyutai Releases Hibiki-Zero: A 3B Parameter Simultaneous Speech-to-Speech Translation Model Using GRPO Reinforcement Learning Without Any Word-Level Aligned Data appeared first on MarkTechPost.

Customize AI agent browsing with proxies, profiles, and extensions in …

AI agents that browse the web need more than basic page navigation. Our customers tell us they need agents that maintain session state across interactions, route traffic through corporate proxy infrastructure, and run with custom browser configurations. AgentCore Browser provides a secure, isolated browser environment for your agents to interact with web applications. Until now, each AgentCore Browser session started from a blank slate with default settings and direct internet access, limiting what agents could accomplish in real-world enterprise environments.
Today, we are announcing three new capabilities that address these requirements: proxy configuration, browser profiles, and browser extensions. Together, these features give you fine-grained control over how your AI agents interact with the web.
These three capabilities give you control over how AgentCore Browser sessions connect to the internet, what state they retain, and how they behave. Proxy configuration lets you route browser traffic through your own proxy servers, providing IP stability and integration with corporate network infrastructure. Browser profiles persist cookies and local storage across sessions, so agents can resume authenticated workflows without repeating login flows. Browser extensions load Chrome extensions into sessions to customize browser behavior for your use case. This post will walk through each capability with configuration examples and practical use cases to help you get started.
How persistent browser profiles keep AI Agents running smoothly
Customers building agents for e-commerce testing, authenticated workflows, and multi-step user journeys need browser sessions that remember state. Without persistent profiles, agents are required to re-authenticate and rebuild context at the start of every session, adding latency and fragility to automated workflows. Browser profiles solve this by saving and restoring cookies and local storage between sessions, so an agent that logged into a portal yesterday can pick up where it left off today.
IP stability is another common requirement. Healthcare and financial portals validate sessions based on source IP address, and rotating AWS IP addresses cause frequent re-authentication cycles that break long-running workflows. Proxy support lets you route traffic through servers with stable egress IPs, maintaining session continuity and meeting IP allowlisting requirements. Organizations that route traffic through corporate proxies need to extend this practice to AI agents for browser sessions. Proxy configuration enables access to internal webpages and resources that require proxy-based connectivity.
Browser extensions allow custom configurations such as ad blocking, authentication helpers, or other browser-level customization. When combined with proxy logging, these capabilities help provide access control and audit evidence that may support compliance programs such as FedRAMP, HITRUST, and PCI.
Feature 1: Proxy configuration
AgentCore Browser now supports routing browser traffic through your own external proxy servers. When you create a browser session with proxy configuration, AgentCore configures the browser to route HTTP and HTTPS traffic through your specified proxy servers.
How it works
You call StartBrowserSession with a proxyConfiguration specifying your proxy server. If using authentication, AgentCore retrieves proxy credentials from AWS Secrets Manager. The browser session starts with your proxy configuration applied, and browser traffic routes through your proxy server based on your domain routing rules.
Getting started with proxies
Complete these prerequisites before proceeding.
Step 1: Create a credentials secret (if your proxy requires authentication)

import boto3
import json

client = boto3.client('secretsmanager')
client.create_secret(
    Name='my-proxy-credentials',
    SecretString=json.dumps({
        'username': '<your-username>',
        'password': '<your-password>'
    })
)

Step 2: Create a browser session with proxy configuration 

session_client = boto3.client('bedrock-agentcore', region_name='<region>')

response = session_client.start_browser_session(
    browserIdentifier="aws.browser.v1",
    name="my-proxy-session",
    proxyConfiguration={
        "proxies": [{
            "externalProxy": {
                "server": "<your-proxy-hostname>",
                "port": 8080,
                "credentials": {
                    "basicAuth": {
                        "secretArn": "arn:aws:secretsmanager:<region>:<account-id>:secret:<secret-name>"
                    }
                }
            }
        }]
    }
)
print(f"Session ID: {response['sessionId']}")

The credentials field is optional for proxies without authentication.
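As an illustrative sketch, a proxy entry for an unauthenticated proxy simply omits the credentials block; the hostname below is a placeholder, not a real endpoint:

```python
# Minimal proxyConfiguration for a proxy without authentication
# (hostname is a placeholder; no Secrets Manager wiring is needed)
proxy_configuration = {
    "proxies": [{
        "externalProxy": {
            "server": "proxy.example.com",  # placeholder hostname
            "port": 8080
        }
    }]
}
```

This is the same structure as the authenticated example, minus the credentials field.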
Domain-based routing
Use domainPatterns to route specific domains through designated proxies, and bypass.domainPatterns for domains that should connect directly:

proxyConfiguration={
    "proxies": [
        {
            "externalProxy": {
                "server": "corp-proxy.example.com",
                "port": 8080,
                "domainPatterns": [".company.com", ".internal.corp"]
            }
        },
        {
            "externalProxy": {
                "server": "general-proxy.example.com",
                "port": 8080
            }
        }
    ],
    "bypass": {
        "domainPatterns": [".amazonaws.com"]
    }
}

With this configuration, requests to *.company.com and *.internal.corp route through the corporate proxy, requests to *.amazonaws.com bypass all proxies, and everything else routes through the general proxy. The domains shown are examples: a request that matches bypass.domainPatterns connects directly, while a request that matches a proxy's domainPatterns routes through that proxy (first match wins, based on array order).
Routing precedence
When AgentCore Browser processes an outbound request, it walks through three tiers of routing rules to decide where to send the traffic. It first checks the bypass list. If the destination domain matches a bypass.domainPatterns entry, the request connects directly to the internet without using any proxy. If the domain does not match a bypass rule, AgentCore checks each proxy’s domainPatterns in order and routes the request through the first proxy whose pattern matches. If no proxy pattern matches either, the request falls through to the default proxy, which is the proxy entry that has no domainPatterns defined.
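The three-tier precedence above can be sketched in plain Python. This is an illustrative model of the matching order only, not the service's actual implementation; the helper names and the leading-dot pattern semantics (a pattern like ".company.com" matching the domain and its subdomains) are assumptions for illustration:

```python
# Illustrative sketch of AgentCore Browser's three-tier routing precedence.
# Not the service's real code; pattern semantics are assumed for illustration.

def matches(domain: str, pattern: str) -> bool:
    # Assume ".company.com" matches "company.com" and any subdomain of it
    return domain == pattern.lstrip(".") or domain.endswith(pattern)

def resolve_route(domain: str, config: dict) -> str:
    # Tier 1: bypass rules connect directly, using no proxy
    for pattern in config.get("bypass", {}).get("domainPatterns", []):
        if matches(domain, pattern):
            return "direct"
    # Tier 2: first proxy whose domainPatterns match wins (array order)
    default_proxy = None
    for proxy in config["proxies"]:
        ext = proxy["externalProxy"]
        patterns = ext.get("domainPatterns")
        if patterns is None:
            if default_proxy is None:
                default_proxy = ext["server"]  # entry with no patterns is the default
            continue
        if any(matches(domain, p) for p in patterns):
            return ext["server"]
    # Tier 3: fall through to the default proxy
    return default_proxy or "direct"

config = {
    "proxies": [
        {"externalProxy": {"server": "corp-proxy.example.com", "port": 8080,
                           "domainPatterns": [".company.com", ".internal.corp"]}},
        {"externalProxy": {"server": "general-proxy.example.com", "port": 8080}},
    ],
    "bypass": {"domainPatterns": [".amazonaws.com"]},
}
```

Under this sketch, app.company.com resolves to the corporate proxy, s3.amazonaws.com connects directly, and example.org falls through to the general proxy.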
Test the new proxy feature with this code example.
Feature 2: Browser profiles
Browser profiles let you persist and reuse session data across multiple browser sessions, including cookies and local storage. An agent that authenticates with a web portal in one session can restore that state in a later session without logging in again. This is useful for authenticated workflows where re-login adds latency, e-commerce testing where shopping carts and form data need to survive between sessions, and multi-step user journeys that span multiple browser invocations.
The profile lifecycle has four stages. You start by calling create_browser_profile() to create a named profile. At the end of a session, you call save_browser_session_profile() to capture the current cookies and local storage into that profile. When you start a new session, you pass the profile identifier in the profileConfiguration parameter of start_browser_session(), which restores the saved state into the new browser. When you no longer need the profile, you call delete_browser_profile() to clean it up.
The following example shows an agent that adds items to a shopping cart in one session and verifies they persist in a subsequent session.
Complete these prerequisites before proceeding.

import boto3

control_client = boto3.client('bedrock-agentcore-control', region_name='<region>')  # replace with your Region

session_client = boto3.client('bedrock-agentcore', region_name='<region>')  # replace with your Region

# Create a browser profile
profile = control_client.create_browser_profile(name="ecommerce_profile")
profile_id = profile['profileId']

# Session 1: Add items to cart
session1 = session_client.start_browser_session(
    browserIdentifier="aws.browser.v1",
    name="shopping-session-1"
)
# ... agent navigates and adds items to cart ...

# Save session state to profile
session_client.save_browser_session_profile(
    sessionId=session1['sessionId'],
    browserIdentifier="aws.browser.v1",
    profileIdentifier=profile_id
)
session_client.stop_browser_session(sessionId=session1['sessionId'], browserIdentifier="aws.browser.v1")

# Session 2: Resume with saved profile
session2 = session_client.start_browser_session(
    browserIdentifier="aws.browser.v1",
    name="shopping-session-2",
    profileConfiguration={"profileIdentifier": profile_id}
)
# Cart items from Session 1 are now available

Test the new profile feature with this code example.
Feature 3: Browser extensions
Browser extensions let you load Chrome extensions into AgentCore Browser sessions to customize how the browser behaves. You package extensions as ZIP files, upload them to Amazon Simple Storage Service (Amazon S3), and reference them when starting a browser session. This provides access to functionality available through the Chrome extension API, from proxy routing and ad blocking to authentication helpers and content modification. For example, you can inject authentication tokens for internal applications, remove ads and tracking scripts that interfere with agent navigation, or modify page content to improve how agents interact with a site.
Your extension should follow the standard Chromium extension format and adhere to Chromium extension guidelines.
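To illustrate the packaging step, the sketch below writes a minimal Manifest V3 extension and zips it with Python's standard library. The manifest fields shown are the bare minimum; a real extension would declare permissions, scripts, and other keys:

```python
# Sketch: package a minimal Manifest V3 extension as a ZIP (stdlib only)
import json
import zipfile
from pathlib import Path

ext_dir = Path("my-extension")
ext_dir.mkdir(exist_ok=True)

# Minimal manifest.json; real extensions declare permissions, scripts, etc.
manifest = {
    "manifest_version": 3,
    "name": "My Agent Helper",
    "version": "1.0.0",
}
(ext_dir / "manifest.json").write_text(json.dumps(manifest, indent=2))

# Zip the directory contents so manifest.json sits at the archive root,
# as Chromium expects for an extension package
with zipfile.ZipFile("my-extension.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for path in ext_dir.rglob("*"):
        zf.write(path, path.relative_to(ext_dir))
```

The resulting my-extension.zip is what you upload to Amazon S3 in the next step.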
Complete these prerequisites before proceeding.

Upload the extension to Amazon S3:

# Upload extension to S3

import boto3

s3 = boto3.client('s3')
s3.upload_file(
    'my-extension.zip',
    'amzn-s3-demo-bucket-extensions',
    'extensions/my-extension.zip'
)

Then, start a session with the extension, pointing to the Amazon S3 bucket where you’ve uploaded the zip file:

import boto3

region = "<region>"  # replace with your Region
client = boto3.client('bedrock-agentcore', region_name=region)

response = client.start_browser_session(
    browserIdentifier="aws.browser.v1",
    name="my-session-with-extensions",
    sessionTimeoutSeconds=1800,
    viewPort={
        'height': 1080,
        'width': 1920
    },
    extensions=[
        {
            "location": {
                "s3": {
                    "bucket": "amzn-s3-demo-bucket-extensions",
                    "prefix": "extensions/my-extension.zip"
                }
            }
        },
        {
            "location": {
                "s3": {
                    "bucket": "amzn-s3-demo-bucket-extensions",
                    "prefix": "extensions/another-extension.zip",
                    "versionId": "abc123"  # Optional - for versioned S3 buckets
                }
            }
        }
    ]
)

print(f"Session ID: {response['sessionId']}")
print(f"Status: {response['status']}")
print(f"Automation Stream: {response['streams']['automationStream']['streamEndpoint']}")

Test the new extensions feature with this code example.
Conclusion
Proxy configuration, browser profiles, and browser extensions give AgentCore Browser the proxy routing, session persistence, and extensibility controls that customers need to deploy AI agents that browse the web in production. You can route traffic through your corporate proxy infrastructure, maintain session continuity across interactions, and customize browser behavior with extensions, all while keeping credentials secure in AWS Secrets Manager. Customers can carry e-commerce context across sessions, build and test their own extensions in a secure environment before release, and connect browser sessions into their networks through proxies.
To get started, see the tutorials in the Amazon Bedrock AgentCore samples repository and the Amazon Bedrock AgentCore Browser documentation. For more information about pricing, visit Amazon Bedrock AgentCore Pricing.

About the Authors

Joshua Samuel
Joshua Samuel is a Senior AI/ML Specialist Solutions Architect at AWS who accelerates enterprise transformation through AI/ML, and generative AI solutions, based in Melbourne, Australia. A passionate disrupter, he specializes in agentic AI and coding techniques – Anything that makes builders faster and happier. Outside work, he tinkers with home automation and AI coding projects, and enjoys life with his wife, kids and dog.

Evandro Franco
Evandro Franco is a Sr. Data Scientist working on Amazon Web Services. He is part of the Global GTM team that helps AWS customers overcome business challenges related to AI/ML on top of AWS, mainly on Amazon Bedrock AgentCore and Strands Agents. He has more than 18 years of experience working with technology, from software development, infrastructure, serverless, to machine learning. In his free time, Evandro enjoys playing with his son, mainly building some funny Lego bricks.

Kosti Vasilakakis
Kosti Vasilakakis is a Principal PM at AWS on the Agentic AI team, where he has led the design and development of several Bedrock AgentCore services from the ground up, including Runtime, Browser, Code Interpreter, and Identity. He previously worked on Amazon SageMaker since its early days, launching AI/ML capabilities now used by thousands of companies worldwide. Earlier in his career, Kosti was a data scientist. Outside of work, he builds personal productivity automations, plays tennis, and enjoys life with his wife and kids.

Yan Marim
Yan Marim is a Sr. GenAI Specialist Solutions Architect at Amazon Web Services, based in Brazil. As part of the LATAM Specialist team, he guides customers through their generative AI adoption journey, focusing on Amazon Bedrock and agentic AI solutions. In his free time, Yan enjoys spending quality time with his wife and dog, and watching soccer games.

Kevin Orellana
Kevin Orellana is a Software Development Engineer at Amazon Web Services on the Bedrock AgentCore team, based in Seattle. He builds and operates core infrastructure powering agentic AI capabilities, including Browser, Code Interpreter, and Runtime. Earlier in his career, Kevin worked on the Bedrock inference team hosting frontier models. In his free time, he enjoys hiking with his Goldendoodle, experimenting with multi-agent simulations, and working toward building a personal AI assistant that speaks English, Spanish, and Mandarin.