Ant Group Releases Ling 2.0: A Reasoning-First MoE Language Model Series Built on the Principle that Each Activation Enhances Reasoning Capability

How do you build a language model that grows in capacity but keeps the computation for each token almost unchanged? The Inclusion AI team at Ant Group is pushing sparse large models in a methodical way with the release of Ling 2.0. Ling 2.0 is a reasoning-first language model family built on the idea that each activation should translate directly into stronger reasoning behavior. It is one of the latest approaches that shows how to keep activation small while moving from 16B to 1T total parameters without rewriting the recipe. The series has three versions: Ling mini 2.0 at 16B total with 1.4B activated, Ling flash 2.0 in the 100B class with 6.1B activated, and Ling 1T with 1T total and about 50B active per token.

Sparse MoE as the central design

Every Ling 2.0 model uses the same sparse Mixture of Experts layer. Each layer has 256 routed experts and one shared expert. The router picks 8 routed experts for every token, and the shared expert is always on, so about 9 of 257 experts are used per token. That is roughly 3.5 percent activation, which matches the 1/32 activation ratio. The research team reports about 7 times the efficiency of an equivalent dense model, because only a small part of the network is trained and served per token while a very large parameter pool remains available.
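To make the activation arithmetic concrete, here is a minimal sketch of top-k routing and the resulting activation ratio. The routing function and random scores are illustrative stand-ins, not Ling 2.0's actual router code; only the counts (256 routed experts, 1 shared expert, 8 selected per token) come from the description above.

import numpy as np

def route_token(router_logits, k=8):
    # Pick the top-k routed experts for one token (sigmoid scoring, in the aux-loss-free spirit)
    scores = 1.0 / (1.0 + np.exp(-router_logits))
    topk = np.argsort(scores)[-k:]
    return topk, scores[topk]

n_routed, n_shared, k = 256, 1, 8
experts, gates = route_token(np.random.randn(n_routed), k=k)

# Activation accounting: 8 routed experts plus 1 shared expert out of 257 per token
active_fraction = (k + n_shared) / (n_routed + n_shared)
print(f"experts used per token: {k + n_shared}, activation is about {active_fraction:.1%}")
print(f"routed activation ratio: {k}/{n_routed} = 1/{n_routed // k}")

Running it prints roughly 3.5 percent activation and the 1/32 routed ratio mentioned above.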

https://arxiv.org/abs/2510.22115

Ling 2.0 brings coordinated advances across four layers of the stack, covering model architecture, pre-training, post-training, and the underlying FP8 infrastructure:

Model architecture: The architecture is chosen using Ling Scaling Laws, not by trial and error. To support the Ling Scaling Laws, the team runs what they call the Ling Wind Tunnel, a fixed set of small MoE runs trained under the same data and routing rules, then fitted to power laws to predict loss, activation and expert balance at much larger sizes. This gives them a low cost way to choose 1/32 activation, 256 routed experts and 1 shared expert before committing GPUs to 1T scale. Routing is aux-loss-free with sigmoid scoring, and the stack uses QK Norm, MTP loss and partial RoPE to keep depth stable. Because the same law picked the shape, Ling mini 2.0, Ling flash 2.0 and Ling 1T share a consistent architecture across sizes (a toy power-law fit in this spirit is sketched after this list).

Pre training: The series is trained on more than 20T tokens, starting with 4K context and a mix in which reasoning heavy sources such as math and code gradually increase to almost half of the corpus. A later mid training stage extends context to about 32K on a selected 150B token slice, then injects another 600B tokens of high quality chain of thought, before finally stretching to 128K with YaRN while preserving short context quality. This pipeline ensures that long context and reasoning are introduced early, not just added at the SFT step. 

Post training: Alignment is separated into a capability pass and a preference pass. First, Decoupled Fine Tuning teaches the model to switch between quick responses and deep reasoning through different system prompts, then an evolutionary CoT stage expands and diversifies chains, and finally a sentence level policy optimization with a Group Arena Reward aligns outputs to human judgments at fine granularity. This staged alignment is what lets a non thinking base reach strong math, code and instruction performance without inflating every answer.

Infrastructure: Ling 2.0 trains natively in FP8 with safeguards, keeping the loss curve within a small gap of BF16 while gaining about 15 percent utilization on the reported hardware. The larger speedups, around 40 percent, come from heterogeneous pipeline parallelism, interleaved one forward one backward execution, and partitioning that is aware of the MTP block, not from precision alone. Together with Warmup Stable Merge, which replaces learning rate decay by merging checkpoints, this systems stack makes 1T scale runs practical on existing clusters.
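As a rough illustration of the wind tunnel idea referenced in the architecture item above, the sketch below fits a power law to a handful of hypothetical small-run results and extrapolates it. The compute and loss numbers are invented for illustration and are not Ling Scaling Law coefficients.

import numpy as np

# Hypothetical "wind tunnel" results: pairs of (training compute, validation loss) from small runs
compute = np.array([1e18, 3e18, 1e19, 3e19, 1e20])
loss = np.array([3.10, 2.92, 2.74, 2.60, 2.47])

# Fit loss ~ a * compute**slope by linear regression in log-log space
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
predict = lambda c: float(np.exp(intercept) * c ** slope)

# Extrapolate to a much larger budget before committing GPUs to it
print(f"fitted exponent: {slope:.3f}")
print(f"predicted loss at 1e24 FLOPs: {predict(1e24):.3f}")

Warmup Stable Merge is described only at a high level here, so the following is a plain checkpoint-averaging sketch under that assumption; the file names and equal weights are placeholders, not the published procedure.

import torch

def merge_checkpoints(paths, weights=None):
    # Average parameter tensors from several recent checkpoints, a simple stand-in for WSM-style merging
    weights = weights or [1.0 / len(paths)] * len(paths)
    merged = None
    for w, path in zip(weights, paths):
        state = torch.load(path, map_location="cpu")
        if merged is None:
            merged = {k: w * v.float() for k, v in state.items()}
        else:
            for k, v in state.items():
                merged[k] += w * v.float()
    return merged

# Hypothetical checkpoint files from the stable phase of training:
# merged = merge_checkpoints(["step_100000.pt", "step_101000.pt", "step_102000.pt"])
# torch.save(merged, "merged.pt")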

Understanding the Results

Evaluations are consistent in pattern: small activation MoE models deliver competitive quality while keeping per token compute low. Ling mini 2.0 has 16B total parameters, activates 1.4B per token, and is reported to perform in the 7B to 8B dense band. Ling flash 2.0 keeps the same 1/32 activation recipe, has 100B total parameters, and activates 6.1B per token. Ling 1T is the flagship non thinking model; it has 1T total parameters and about 50B active per token, preserving the 1/32 sparsity and extending the same Ling Scaling Laws to trillion scale.


Key Takeaways

Ling 2.0 is built around a 1/32 activation MoE architecture, selected using Ling Scaling Laws so that 256 routed experts plus 1 shared expert stay optimal from 16B up to 1T.

Ling mini 2.0 has 16B total parameters with 1.4B activated per token and is reported to match 7B to 8B dense models while generating at more than 300 tokens per second in simple QA on H20.

Ling flash 2.0 keeps the same recipe, has 6.1B active parameters and sits in the 100B range, giving a higher capacity option without increasing per token compute.

Ling 1T exposes the full design, 1T total parameters with about 50B active per token, 128K context, and an Evo CoT plus LPO style post training stack to push efficient reasoning.

Across all sizes, efficiency gains above 7 times over dense baselines come from the combination of sparse activation, FP8 training, and a shared training schedule, so quality scales predictably without re tuning compute.

Editorial Comments

This release demonstrates a complete sparse MoE stack. Ling Scaling Laws identify a 1/32 activation as optimal, the architecture locks in 256 routed experts plus 1 shared expert, and the same shape is used from 16B to 1T. Training, context extension and preference optimization are all aligned to that choice, so small activation does not block math, code or long context, and FP8 plus heterogeneous pipelines keep cost in a practical range. It is a clear signal that trillion scale reasoning can be organized around fixed sparsity instead of growing dense compute.

Check out the Weights on HF, Repo and Paper.
The post Ant Group Releases Ling 2.0: A Reasoning-First MoE Language Model Series Built on the Principle that Each Activation Enhances Reasoning Capability appeared first on MarkTechPost.

How to Build Ethically Aligned Autonomous Agents through Value-Guided Reasoning and Self-Correcting Decision-Making Using Open-Source Models

In this tutorial, we explore how we can build an autonomous agent that aligns its actions with ethical and organizational values. We use open-source Hugging Face models running locally in Colab to simulate a decision-making process that balances goal achievement with moral reasoning. Through this implementation, we demonstrate how we can integrate a “policy” model that proposes actions and an “ethics judge” model that evaluates and aligns them, allowing us to see value alignment in practice without depending on any APIs. Check out the FULL CODES here.

!pip install -q transformers torch accelerate sentencepiece

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, AutoModelForCausalLM

def generate_seq2seq(model, tokenizer, prompt, max_new_tokens=128):
    # Move inputs to the same device as the model so GPU execution works
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            top_p=0.9,
            temperature=0.7,
            pad_token_id=tokenizer.eos_token_id if tokenizer.eos_token_id is not None else tokenizer.pad_token_id,
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

def generate_causal(model, tokenizer, prompt, max_new_tokens=128):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            top_p=0.9,
            temperature=0.7,
            pad_token_id=tokenizer.eos_token_id if tokenizer.eos_token_id is not None else tokenizer.pad_token_id,
        )
    full_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return full_text[len(prompt):].strip()

We begin by setting up our environment and importing essential libraries from Hugging Face. We define two helper functions that generate text using sequence-to-sequence and causal models. This allows us to easily produce both reasoning-based and creative outputs later in the tutorial. Check out the FULL CODES here.

policy_model_name = "distilgpt2"
judge_model_name = "google/flan-t5-small"

policy_tokenizer = AutoTokenizer.from_pretrained(policy_model_name)
policy_model = AutoModelForCausalLM.from_pretrained(policy_model_name)

judge_tokenizer = AutoTokenizer.from_pretrained(judge_model_name)
judge_model = AutoModelForSeq2SeqLM.from_pretrained(judge_model_name)

device = "cuda" if torch.cuda.is_available() else "cpu"
policy_model = policy_model.to(device)
judge_model = judge_model.to(device)

if policy_tokenizer.pad_token is None:
    policy_tokenizer.pad_token = policy_tokenizer.eos_token
if judge_tokenizer.pad_token is None:
    judge_tokenizer.pad_token = judge_tokenizer.eos_token

We load two small open-source models—distilgpt2 as our action generator and flan-t5-small as our ethics reviewer. We prepare both models and tokenizers for CPU or GPU execution, ensuring smooth performance in Colab. This setup provides the foundation for the agent’s reasoning and ethical evaluation. Check out the FULL CODES here.

class EthicalAgent:
    def __init__(self, policy_model, policy_tok, judge_model, judge_tok):
        self.policy_model = policy_model
        self.policy_tok = policy_tok
        self.judge_model = judge_model
        self.judge_tok = judge_tok

    def propose_actions(self, user_goal, context, n_candidates=3):
        base_prompt = (
            "You are an autonomous operations agent. "
            "Given the goal and context, list a specific next action you will take:\n\n"
            f"Goal: {user_goal}\nContext: {context}\nAction:"
        )
        candidates = []
        for _ in range(n_candidates):
            action = generate_causal(self.policy_model, self.policy_tok, base_prompt, max_new_tokens=40)
            action = action.split("\n")[0]
            candidates.append(action.strip())
        return list(dict.fromkeys(candidates))

    def judge_action(self, action, org_values):
        judge_prompt = (
            "You are the Ethics & Compliance Reviewer.\n"
            "Evaluate the proposed agent action.\n"
            "Return fields:\n"
            "RiskLevel (LOW/MED/HIGH),\n"
            "Issues (short bullet-style text),\n"
            "Recommendation (approve / modify / reject).\n\n"
            f"ORG_VALUES:\n{org_values}\n\n"
            f"ACTION:\n{action}\n\n"
            "Answer in this format:\n"
            "RiskLevel: ...\nIssues: ...\nRecommendation: ..."
        )
        verdict = generate_seq2seq(self.judge_model, self.judge_tok, judge_prompt, max_new_tokens=128)
        return verdict.strip()

    def align_action(self, action, verdict, org_values):
        align_prompt = (
            "You are an Ethics Alignment Assistant.\n"
            "Your job is to FIX the proposed action so it follows ORG_VALUES.\n"
            "Keep it effective but safe, legal, and respectful.\n\n"
            f"ORG_VALUES:\n{org_values}\n\n"
            f"ORIGINAL_ACTION:\n{action}\n\n"
            f"VERDICT_FROM_REVIEWER:\n{verdict}\n\n"
            "Rewrite ONLY IF NEEDED. If original is fine, return it unchanged. "
            "Return just the final aligned action:"
        )
        aligned = generate_seq2seq(self.judge_model, self.judge_tok, align_prompt, max_new_tokens=128)
        return aligned.strip()

We define the core agent class that generates, evaluates, and refines actions. Here, we design methods for proposing candidate actions, evaluating their ethical compliance, and rewriting them to align with values. This structure helps us modularize reasoning, judgment, and correction into clear functional steps. Check out the FULL CODES here.

    # (continuation of the EthicalAgent class)
    def decide(self, user_goal, context, org_values, n_candidates=3):
        proposals = self.propose_actions(user_goal, context, n_candidates=n_candidates)
        scored = []
        for act in proposals:
            verdict = self.judge_action(act, org_values)
            aligned_act = self.align_action(act, verdict, org_values)
            scored.append({"original_action": act, "review": verdict, "aligned_action": aligned_act})

        def extract_risk(vtext):
            for line in vtext.splitlines():
                if "RiskLevel" in line:
                    lvl = line.split(":", 1)[-1].strip().upper()
                    if "LOW" in lvl:
                        return 0
                    if "MED" in lvl:
                        return 1
                    if "HIGH" in lvl:
                        return 2
            return 3

        scored_sorted = sorted(scored, key=lambda x: extract_risk(x["review"]))
        final_choice = scored_sorted[0]
        report = {
            "goal": user_goal,
            "context": context,
            "org_values": org_values,
            "candidates_evaluated": scored,
            "final_plan": final_choice["aligned_action"],
            "final_plan_rationale": final_choice["review"],
        }
        return report

We implement the complete decision-making pipeline that links generation, judgment, and alignment. We assign risk scores to each candidate action and automatically choose the most ethically aligned one. This section captures how the agent can self-assess and improve its choices before finalizing an action. Check out the FULL CODES here.

org_values_text = (
    "- Respect privacy; do not access personal data without consent.\n"
    "- Follow all laws and safety policies.\n"
    "- Avoid discrimination, harassment, or harmful manipulation.\n"
    "- Be transparent and truthful with stakeholders.\n"
    "- Prioritize user well-being and long-term trust over short-term gain."
)

demo_goal = "Increase customer adoption of the new financial product."
demo_context = (
    "The agent works for a bank outreach team. The target customers are small family businesses. "
    "Regulations require honest disclosure of risks and fees. Cold-calling minors or lying about terms is illegal."
)

agent = EthicalAgent(policy_model, policy_tokenizer, judge_model, judge_tokenizer)
report = agent.decide(demo_goal, demo_context, org_values_text, n_candidates=4)

def pretty_report(r):
    print("=== ETHICAL DECISION REPORT ===")
    print(f"Goal: {r['goal']}\n")
    print(f"Context: {r['context']}\n")
    print("Org Values:")
    print(r["org_values"])
    print("\n--- Candidate Evaluations ---")
    for i, cand in enumerate(r["candidates_evaluated"], 1):
        print(f"\nCandidate {i}:")
        print("Original Action:")
        print("  ", cand["original_action"])
        print("Ethics Review:")
        print(cand["review"])
        print("Aligned Action:")
        print("  ", cand["aligned_action"])
    print("\n--- Final Plan Selected ---")
    print(r["final_plan"])
    print("\nWhy this plan is acceptable (review snippet):")
    print(r["final_plan_rationale"])

pretty_report(report)

We define organizational values, create a real-world scenario, and run the ethical agent to generate its final plan. Finally, we print a detailed report showing candidate actions, reviews, and the selected ethical decision. Through this, we observe how our agent integrates ethics directly into its reasoning process.

In conclusion, we clearly understand how an agent can reason not only about what to do but also about whether to do it. We witness how the system learns to identify risks, correct itself, and align its actions with human and organizational principles. This exercise helps us realize that value alignment and ethics are not abstract ideas but practical mechanisms we can embed into agentic systems to make them safer, fairer, and more trustworthy.

Check out the FULL CODES here.
The post How to Build Ethically Aligned Autonomous Agents through Value-Guided Reasoning and Self-Correcting Decision-Making Using Open-Source Models appeared first on MarkTechPost.

IBM AI Team Releases Granite 4.0 Nano Series: Compact and Open-Source Small Models Built for AI at the Edge

Small models are often blocked by poor instruction tuning, weak tool use formats, and missing governance. The IBM AI team released Granite 4.0 Nano, a small model family that targets local and edge inference with enterprise controls and open licensing. The family includes 8 models in two sizes, 350M and about 1B, with both hybrid SSM and transformer variants, each in base and instruct form. The Granite 4.0 Nano models are released under the Apache 2.0 license with native architecture support on popular runtimes such as vLLM, llama.cpp, and MLX.

https://huggingface.co/blog/ibm-granite/granite-4-nano

What is new in Granite 4.0 Nano series?

Granite 4.0 Nano consists of four model lines and their base counterparts. Granite 4.0 H 1B uses a hybrid SSM based architecture and is about 1.5B parameters. Granite 4.0 H 350M uses the same hybrid approach at 350M. For maximum runtime portability, IBM also provides Granite 4.0 1B and Granite 4.0 350M as pure transformer versions.

| Granite release | Sizes in release | Architecture | License and governance | Key notes |
|---|---|---|---|---|
| Granite 13B, first watsonx Granite models | 13B base, 13B instruct, later 13B chat | Decoder only transformer, 8K context | IBM enterprise terms, client protections | First public Granite models for watsonx, curated enterprise data, English focus |
| Granite Code Models (open) | 3B, 8B, 20B, 34B code, base and instruct | Decoder only transformer, 2 stage code training on 116 languages | Apache 2.0 | First fully open Granite line, for code intelligence, paper 2405.04324, available on HF and GitHub |
| Granite 3.0 Language Models | 2B and 8B, base and instruct | Transformer, 128K context for instruct | Apache 2.0 | Business LLMs for RAG, tool use, summarization, shipped on watsonx and HF |
| Granite 3.1 Language Models (HF) | 1B A400M, 3B A800M, 2B, 8B | Transformer, 128K context | Apache 2.0 | Size ladder for enterprise tasks, both base and instruct, same Granite data recipe |
| Granite 3.2 Language Models (HF) | 2B instruct, 8B instruct | Transformer, 128K, better long prompt | Apache 2.0 | Iterative quality bump on 3.x, keeps business alignment |
| Granite 3.3 Language Models (HF) | 2B base, 2B instruct, 8B base, 8B instruct, all 128K | Decoder only transformer | Apache 2.0 | Latest 3.x line on HF before 4.0, adds FIM and better instruction following |
| Granite 4.0 Language Models | 3B micro, 3B H micro, 7B H tiny, 32B H small, plus transformer variants | Hybrid Mamba 2 plus transformer for H, pure transformer for compatibility | Apache 2.0, ISO 42001, cryptographically signed | Start of hybrid generation, lower memory, agent friendly, same governance across sizes |
| Granite 4.0 Nano Language Models | 1B H, 1B H instruct, 350M H, 350M H instruct, 2B transformer, 2B transformer instruct, 0.4B transformer, 0.4B transformer instruct, total 8 | H models are hybrid SSM plus transformer, non H are pure transformer | Apache 2.0, ISO 42001, signed, same 4.0 pipeline | Smallest Granite models, made for edge, local and browser, run on vLLM, llama.cpp, MLX, watsonx |

Table created by Marktechpost.com

Architecture and training

The H variants interleave SSM layers with transformer layers. This hybrid design reduces memory growth compared with pure attention while preserving the generality of transformer blocks. The Nano models did not use a reduced data pipeline; they were trained with the same Granite 4.0 methodology on more than 15T tokens, then instruction tuned to deliver solid tool use and instruction following. This carries strengths from the larger Granite 4.0 models over to sub 2B scales.
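A quick way to see why interleaving SSM blocks with attention blocks reduces memory growth is to compare KV cache size, since SSM blocks keep a constant-size state and do not grow a cache with sequence length. The layer counts, head sizes, and mixing ratio below are illustrative assumptions, not IBM's published Granite 4.0 Nano configuration.

def kv_cache_bytes(n_attn_layers, seq_len, n_kv_heads=8, head_dim=64, bytes_per_el=2):
    # K and V caches: 2 tensors per attention layer, each of shape [seq_len, n_kv_heads, head_dim]
    return 2 * n_attn_layers * seq_len * n_kv_heads * head_dim * bytes_per_el

seq_len = 32_000
pure_attention = kv_cache_bytes(n_attn_layers=24, seq_len=seq_len)
hybrid = kv_cache_bytes(n_attn_layers=4, seq_len=seq_len)  # most layers are SSM, cache-free in this sketch
print(f"pure attention: {pure_attention / 1e9:.2f} GB, hybrid: {hybrid / 1e9:.2f} GB")

With these assumed numbers the hybrid stack carries roughly one sixth of the KV cache, which is the kind of saving that matters on edge devices.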

Benchmarks and competitive context

IBM compares Granite 4.0 Nano with other models under 2B parameters, including Qwen, Gemma, and LiquidAI LFM. Reported aggregates show a significant increase in capability across general knowledge, math, code, and safety at similar parameter budgets. On agent tasks, the models outperform several peers on IFEval and on the Berkeley Function Calling Leaderboard v3.


Key Takeaways

IBM released 8 Granite 4.0 Nano models, 350M and about 1B each, in hybrid SSM and transformer variants, in base and instruct, all under Apache 2.0.

The hybrid H models, Granite 4.0 H 1B at about 1.5B parameters and Granite 4.0 H 350M at about 350M, reuse the Granite 4.0 training recipe on more than 15T tokens, so capability is inherited from the larger family and not a reduced data branch.

IBM team reports that Granite 4.0 Nano is competitive with other sub 2B models such as Qwen, Gemma and LiquidAI LFM on general, math, code and safety, and that it outperforms on IFEval and BFCLv3 which matter for tool using agents.

All Granite 4.0 models, including Nano, are cryptographically signed, ISO 42001 certified and released for enterprise use, which gives provenance and governance that typical small community models do not provide.

The models are available on Hugging Face and IBM watsonx.ai with runtime support for vLLM, llama.cpp and MLX, which makes local, edge and browser level deployments realistic for early AI engineers and software teams.

Editorial Comments

IBM is doing the right thing here, it is taking the same Granite 4.0 training pipeline, the same 15T token scale, the same hybrid Mamba 2 plus transformer architecture, and pushing it down to 350M and about 1B so that edge and on device workloads can use the exact governance and provenance story that the larger Granite models already have. The models are Apache 2.0, ISO 42001 aligned, cryptographically signed, and already runnable on vLLM, llama.cpp and MLX. Overall, this is a clean and auditable way to run small LLMs.

Check out the Model Weights on HF and Technical details.
The post IBM AI Team Releases Granite 4.0 Nano Series: Compact and Open-Source Small Models Built for AI at the Edge appeared first on MarkTechPost.

Reduce CAPTCHAs for AI agents browsing the web with Web Bot Auth (Preview)

AI agents need to browse the web on your behalf. When your agent visits a website to gather information, complete a form, or verify data, it encounters the same defenses designed to stop unwanted bots: CAPTCHAs, rate limits, and outright blocks.
Today, we are excited to share that AWS has a solution. Amazon Bedrock AgentCore Browser, our secure, cloud-based browser for AI agents to interact with websites, now supports Web Bot Auth (in preview), a draft IETF protocol that gives agents verifiable cryptographic identities.
CAPTCHA friction
Customers tell us that CAPTCHA friction is one of the biggest obstacles to reliable browser-based agentic workflows. Your agent halts mid-task, waiting for human intervention to solve a puzzle that proves you’re not a bot – except your agent is a bot, and that’s the point. CAPTCHAs exist for good reason. Websites face constant challenges protecting their content, inventory and reviews. Web Application Firewalls (WAFs) and bot detection services protect these sites, but they treat nearly all automated traffic as suspicious because they have no reliable way to distinguish legitimate agents from malicious ones.
Some automation providers try to solve CAPTCHAs programmatically, using computer vision models to read distorted text or clicking through image grids until the puzzle clears. This approach is brittle and expensive, and it bypasses controls that domain owners intended for their content. Other approaches rely on IP allowlists or User-Agent strings. IP allowlists break when you run agents in cloud environments where addresses change frequently. User-Agent strings can be spoofed by anyone, so they provide no verification, and they create a risk of people impersonating well trusted strings. Both methods require manual coordination with every website you want to access, which does not scale.
Web Bot Auth: Cryptographic identity for agents browsing the web
Web Bot Auth is a draft IETF protocol that gives agents verifiable cryptographic identities. When you enable Web Bot Auth in AgentCore Browser, we issue cryptographic credentials that websites can verify. The agent presents these credentials with every request. The WAF may now additionally check the signature, confirm it matches a trusted directory, and allow the request through if verified bots are allowed by the domain owner and other WAF checks are clear.
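Conceptually, the agent holds a private key and attaches a detached signature over selected request components, in the spirit of the draft HTTP Message Signatures mechanism that Web Bot Auth builds on. The sketch below is a simplified illustration using Ed25519; the header contents, signature base, and key management are not the exact wire format AgentCore Browser uses, and the key ID is a made-up example.

import base64
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

private_key = Ed25519PrivateKey.generate()  # in the real flow, credentials are issued and managed for the agent

def sign_request(method: str, authority: str, path: str, key_id: str) -> dict:
    # Build a simplified signature base over request components and sign it
    signature_base = f"@method: {method}\n@authority: {authority}\n@path: {path}"
    signature = private_key.sign(signature_base.encode())
    return {
        "Signature-Input": f'sig1=("@method" "@authority" "@path");keyid="{key_id}"',
        "Signature": "sig1=:" + base64.b64encode(signature).decode() + ":",
    }

headers = sign_request("GET", "example.com", "/", key_id="agentcore-demo-key")
print(headers)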
AgentCore is working with Cloudflare, HUMAN Security, and Akamai Technologies to support this verification flow. These providers protect millions of websites. When you create an AgentCore Browser with signing enabled in the configuration, we automatically register your agent’s signature directory with these providers. Many domains already configure their WAFs to allow verified bots by default, which means you can see an immediate reduction in CAPTCHAs on those domains without additional setup.
How domain owners control access
WAF providers give website owners three levels of control using Web Bot Auth:

Block all bots – Some sites choose to block automated traffic entirely. Web Bot Auth does not bypass this – if a domain wants no automation, that choice is respected.
Allow verified bots – Many domains configure their WAF to allow any bot that presents a valid cryptographic signature. This is the default policy for a growing number of sites protected by Cloudflare, HUMAN Security, and Akamai Technologies. When you enable signing, as a parameter in the AgentCore Browser configuration, this policy will apply to your agents.
Allow specific verified bots to conduct only specific actions – For example, a financial services company automating vendor portal access can share its unique directory with those vendors. The vendor can create rules like “allow FinCo agents at 100 requests per minute, don’t allow them to create new accounts, and block all other signed agents.” This gives websites granular control while preserving the benefits of cryptographic verification.

Today’s preview release of Web Bot Auth support in AgentCore Browser helps reduce friction with CAPTCHAs on domains that allow verified bots, by making your agent appear as a verified bot. Once the Web Bot Auth protocol is finalized, AgentCore intends to transition to customer-specific keys, so AgentCore users can use the tier of control that allows only specified verified bots.
Using the Web Bot Auth protocol
To enable the browser to sign requests using the Web Bot Auth protocol, create a browser tool with the browserSigning configuration:

import boto3

cp_client = boto3.client("bedrock-agentcore-control")
response = cp_client.create_browser(
    name="signed_browser",
    description="Browser tool with Web Bot Auth enabled",
    networkConfiguration={
        "networkMode": "PUBLIC"
    },
    executionRoleArn="arn:aws:iam::123456789012:role/AgentCoreExecutionRole",
    browserSigning={
        "enabled": True
    }
)
browserId = response["browserId"]

Pass the browser identifier to your agent framework. Here is an example using Strands Agents:

from strands import Agent
from strands_tools.browser import AgentCoreBrowser

agent_core_browser = AgentCoreBrowser(
    region="us-west-2",
    identifier=browserId
)
strands_agent = Agent(
    tools=[agent_core_browser.browser],
    model="anthropic.claude-4-5-haiku-20251001-v1:0",
    system_prompt="You are a website analyst. Use the browser tool efficiently."
)
result = strands_agent("Analyze the website at https://example.com/")

The agent is now configured to use the new browser tool that signs every HTTP request. Websites protected by Cloudflare, HUMAN Security, or Akamai Technologies can verify the signature and allow the request through without presenting a CAPTCHA, if the domain owner allows verified bots.
Protocol development
The Web Bot Auth protocol is gaining industry momentum because it solves a real problem: legitimate automation is indistinguishable from abuse without verifiable identity. You can read the draft protocol specification, HTTP Message Signatures for automated traffic Architecture. The architecture defines how agents generate signatures, how WAFs verify them, and how key directories enable discovery. Amazon is working with Cloudflare and many popular WAF providers to help finalize the customer-specific key directory format and work towards finalizing the draft.
Conclusion
Amazon Bedrock AgentCore Browser is generally available, with the Web Bot Auth feature available in preview. AgentCore Browser signing requests using the Web Bot Auth protocol help reduce friction with CAPTCHA across domains that allow verified bots. As the protocol finalizes, AgentCore Browser intends to issue customer-specific keys and directories, so you can prove your agent’s identity to specific websites and establish trust relationships directly with the domains you need to access.
Web Bot Auth enables agents to prove their identity when challenged, reduces operational friction in automated workflows, and gives website owners control over which agents access their resources. Amazon Bedrock AgentCore Browser support for Web Bot Auth (Preview) provides the infrastructure layer that makes this possible.

About the authors
Veda Raman is a Senior Specialist Solutions Architect for generative AI and machine learning at AWS. Veda works with customers to help them architect efficient, secure, and scalable machine learning applications. Veda specializes in generative AI services like Amazon Bedrock and Amazon SageMaker.
Kosti Vasilakakis is a Principal PM at AWS on the Agentic AI team, where he has led the design and development of several Bedrock AgentCore services from the ground up, including Runtime, Browser, Code Interpreter, and Identity. He previously worked on Amazon SageMaker since its early days, launching AI/ML capabilities now used by thousands of companies worldwide. Earlier in his career, Kosti was a data scientist. Outside of work, he builds personal productivity automations, plays tennis, and enjoys life with his wife and kids.
Joshua Samuel is a Senior AI/ML Specialist Solutions Architect at AWS who accelerates enterprise transformation through AI/ML, and generative AI solutions, based in Melbourne, Australia. A passionate disrupter, he specializes in agentic AI and coding techniques – Anything that makes builders faster and happier.

Microsoft Releases Agent Lightning: A New AI Framework that Enables Reinforcement Learning (RL)-based Training of LLMs for Any AI Agent

How do you convert real agent traces into reinforcement learning (RL) transitions to improve policy LLMs without changing your existing agent stack? The Microsoft AI team has released Agent Lightning to help optimize multi-agent systems. Agent Lightning is an open-source framework that makes reinforcement learning work for any AI agent without rewrites. It separates training from execution, defines a unified trace format, and introduces LightningRL, a hierarchical method that converts complex agent runs into transitions that standard single turn RL trainers can optimize.

What does Agent Lightning do?

The framework models an agent as a decision process. It formalizes the agent as a partially observable Markov decision process where the observation is the current input to the policy LLM, the action is the model call, and the reward can be terminal or intermediate. From each run it extracts only the calls made by the policy model, along with inputs, outputs, and rewards. This trims away other framework noise and yields clean transitions for training.

LightningRL performs credit assignment across multi step episodes, then optimizes the policy with a single turn RL objective. The research team describes compatibility with single turn RL methods. In practice, teams often use trainers that implement PPO or GRPO, such as VeRL, which fits this interface.

https://arxiv.org/pdf/2508.03680v1

System architecture

Agent Lightning uses Training Agent Disaggregation. A Lightning Server runs training and serving, and exposes an OpenAI like API for the updated model. A Lightning Client runs the agent runtime where it already lives, captures traces of prompts, tool calls, and rewards, and streams them back to the server. This keeps tools, browsers, shells, and other dependencies close to production while the GPU training stays in the server tier.


The runtime supports two tracing paths. A default path uses OpenTelemetry spans, so you can pipe agent telemetry through standard collectors. There is also a lightweight embedded tracer for teams that do not want to deploy OpenTelemetry. Both paths end up in the same store for training.


Unified data interface

Agent Lightning records each model call and each tool call as a span with inputs, outputs, and metadata. The algorithm layer adapts spans into ordered triplets of prompt, response, and reward. This selective extraction lets you optimize one agent in a multi agent workflow, or multiple agents at once, without touching orchestration code. The same traces can also drive automatic prompt optimization or supervised finetuning.
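The following is a minimal sketch of that idea: filter a trace down to policy-model spans and emit (prompt, response, reward) triplets. The span schema and field names are invented for illustration and are not Agent Lightning's actual data format.

from dataclasses import dataclass

@dataclass
class Span:
    kind: str      # "llm_call" or "tool_call", a hypothetical label
    inputs: str
    outputs: str
    reward: float = 0.0

def spans_to_transitions(spans):
    # Keep only policy-model calls and emit (prompt, response, reward) triplets
    return [(s.inputs, s.outputs, s.reward) for s in spans if s.kind == "llm_call"]

trace = [
    Span("llm_call", "User asks for total revenue by region. Write SQL.", "SELECT region, SUM(revenue) ..."),
    Span("tool_call", "execute_sql(...)", "rows: 12"),
    Span("llm_call", "Rows returned, summarize for the user.", "Revenue is highest in EMEA ...", reward=1.0),
]
print(spans_to_transitions(trace))

Only the two model calls survive the extraction, which is what lets one agent in a larger workflow be optimized without touching the orchestration code.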


Experiments and datasets

The research team reports three tasks. For text to SQL, the team uses the Spider benchmark. Spider contains more than 10,000 questions across 200 databases that span 138 domains. The policy model is Llama 3.2 3B Instruct. The implementation uses LangChain with a writer agent, a rewriter agent, and a checker. The writer and the rewriter are optimized, and the checker is left fixed. Rewards improve steadily during training and at test time.


For retrieval augmented generation, the setup uses the MuSiQue benchmark and a Wikipedia scale index with about 21 million documents. The retriever uses BGE embeddings with cosine similarity. The agent is built with the OpenAI Agents SDK. The reward is a weighted sum of a format score and an F1 correctness score. Reward curves show stable gains during training and evaluation with the same base model.
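As a sketch of such a reward, the function below combines a format flag with a token-level F1 score. The 0.2 and 0.8 weights are placeholders, since the exact weighting is not specified here, and the F1 implementation is a generic one rather than the paper's evaluation code.

def f1_score(prediction: str, reference: str) -> float:
    pred, ref = prediction.lower().split(), reference.lower().split()
    common = sum(min(pred.count(t), ref.count(t)) for t in set(pred))
    if not common:
        return 0.0
    precision, recall = common / len(pred), common / len(ref)
    return 2 * precision * recall / (precision + recall)

def rag_reward(answer: str, reference: str, well_formatted: bool,
               w_format: float = 0.2, w_f1: float = 0.8) -> float:
    # Weighted sum of a format score and an F1 correctness score
    return w_format * float(well_formatted) + w_f1 * f1_score(answer, reference)

print(rag_reward("The capital of France is Paris", "Paris", well_formatted=True))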


For math question answering with tool use, the agent is implemented with AutoGen and calls a calculator tool. The dataset is Calc X. The base model again is Llama 3.2 3B Instruct. Training improves the ability to invoke tools correctly and integrate results into final answers.


Key Takeaways

Agent Lightning uses Training Agent Disaggregation and a unified trace interface, so existing agents in LangChain, OpenAI Agents SDK, AutoGen, or CrewAI connect with near zero code change.

LightningRL converts trajectories to transitions. It applies credit assignment to multi step runs, then optimizes the policy with single turn RL methods such as PPO or GRPO in standard trainers.

Automatic Intermediate Rewarding, AIR, supplies dense feedback. AIR turns system signals such as tool return status into intermediate rewards to reduce sparse reward issues in long workflows.

The research evaluates text to SQL on Spider, RAG on MuSiQue with a Wikipedia scale index using BGE embeddings and cosine similarity, and math tool use on Calc X, all with Llama 3.2 3B Instruct as the base model.

The runtime records traces through OpenTelemetry, streams them to the training server, and exposes an OpenAI compatible endpoint for updated models, enabling scalable rollouts without moving tools.

Editorial Comments

Agent Lightning is a practical bridge between agent execution and reinforcement learning, not another framework rewrite. It formalizes agent runs as a Markov decision process (MDP), introduces LightningRL for credit assignment, and extracts transitions that slot into single turn RL trainers. The Training Agent Disaggregation design separates a client that runs the agent from a server that trains and serves an OpenAI compatible endpoint, so teams keep existing stacks. Automatic Intermediate Rewarding converts runtime signals into dense feedback, reducing sparse rewards in long workflows. Overall, Agent Lightning is a clean, minimal-integration path to make agents learn from their own traces.

Check out the Paper and Repo.
The post Microsoft Releases Agent Lightning: A New AI Framework that Enables Reinforcement Learning (RL)-based Training of LLMs for Any AI Agent appeared first on MarkTechPost.

Liquid AI Releases LFM2-ColBERT-350M: A New Small Model that brings Late Interaction Retrieval to Multilingual and Cross-Lingual RAG

Can a compact late interaction retriever index once and deliver accurate cross lingual search with fast inference? Liquid AI released LFM2-ColBERT-350M, a compact late interaction retriever for multilingual and cross-lingual search. Documents can be indexed in one language, queries can be written in many languages, and the system retrieves with high accuracy. The Liquid AI team reports inference speed on par with models that are 2.3 times smaller, which is attributed to the LFM2 backbone. The model is available with a Hugging Face demo and a detailed model card for integration in retrieval augmented generation systems.

https://www.liquid.ai/blog/lfm2-colbert-350m-one-model-to-embed-them-all

What late interaction means and why it matters

Most production systems use bi-encoders for speed or cross encoders for accuracy. Late interaction aims to combine both advantages. Queries and documents are encoded separately at the token level. The system compares token vectors at query time using operations such as MaxSim. This preserves fine grained token interactions without the full cost of joint cross attention. It allows pre-computation for documents and improves precision at ranking time. It can serve as a first stage retriever and also as a ranker in one pass.
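A small sketch of MaxSim scoring makes the mechanics concrete: each query token takes its maximum cosine similarity over the document token vectors, and those per-token maxima are summed. The tensors below are random stand-ins; only the MaxSim operation and the 128-dimensional token embeddings mirror the model specification described below.

import torch
import torch.nn.functional as F

def maxsim_score(query_emb: torch.Tensor, doc_emb: torch.Tensor) -> torch.Tensor:
    # query_emb: [num_query_tokens, dim], doc_emb: [num_doc_tokens, dim]
    q = F.normalize(query_emb, dim=-1)
    d = F.normalize(doc_emb, dim=-1)
    sim = q @ d.T                       # token-to-token cosine similarities
    return sim.max(dim=-1).values.sum() # MaxSim: best match per query token, then sum

query = torch.randn(12, 128)    # 12 query tokens, 128-dim embeddings
doc = torch.randn(300, 128)     # precomputed document token embeddings
print(maxsim_score(query, doc).item())

Because the document side can be encoded and stored ahead of time, only the query encoding and the MaxSim comparison happen at query time.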

Model specification

LFM2-ColBERT-350M has 350 million total parameters. There are 25 layers, with 18 convolution blocks, 6 attention blocks, and 1 dense layer. The context length is 32k tokens. The vocabulary size is 65,536. The similarity function is MaxSim. The output dimensionality is 128. Training precision is BF16. The license is LFM Open License v1.0.

https://huggingface.co/LiquidAI/LFM2-ColBERT-350M

Languages, supported and evaluated

The model supports 8 languages. These are English, Arabic, Chinese, French, German, Japanese, Korean, and Spanish. The evaluation adds Italian and Portuguese, which brings the matrix to 9 languages for cross comparisons of document and query languages. This distinction is relevant when planning deployments that must cover specific customer markets.


Evaluation setup and key results

Liquid AI extends the NanoBEIR benchmark with Japanese and Korean and publishes the extension for reproducibility. On this setup, LFM2-ColBERT-350M shows stronger multilingual capability than the baseline late interaction model in this class, which is GTE-ModernColBERT-v1 at 150M parameters. The largest gains appear in German, Arabic, Korean, and Japanese, while English performance is maintained.

Key Takeaways

Token-level scoring with MaxSim preserves fine-grained interactions while keeping separate encoders, so document embeddings can be precomputed and queried efficiently.

Documents can be indexed in one language and retrieved in many. The model card lists 8 supported languages, while evaluations span 9 languages for cross-lingual pairs.

On the NanoBEIR multilingual extension, LFM2-ColBERT-350M outperforms the prior late-interaction baseline (GTE-ModernColBERT-v1 at 150M) and maintains English performance.

Inference speed is reported on par with models 2.3× smaller across batch sizes, attributed to the LFM2 backbone.

Editorial Notes

Liquid AI’s LFM2-ColBERT-350M applies late interaction ColBERT with MaxSim: it encodes queries and documents separately, then scores token vectors at query time, which preserves token level interactions and enables precomputed document embeddings at scale. It targets multilingual and cross lingual retrieval, index once and query in many languages, with evaluations described on a NanoBEIR multilingual extension. The Liquid AI team reports inference speed on par with models 2.3 times smaller, attributed to the LFM2 backbone. Overall, late interaction at the nano scale looks production ready for multilingual RAG trials.

Check out the Model Weights, Demo and Technical details.
The post Liquid AI Releases LFM2-ColBERT-350M: A New Small Model that brings Late Interaction Retrieval to Multilingual and Cross-Lingual RAG appeared first on MarkTechPost.

How Exploration Agents like Q-Learning, UCB, and MCTS Collaboratively Learn Intelligent Problem-Solving Strategies in Dynamic Grid Environments

In this tutorial, we explore how exploration strategies shape intelligent decision-making through agent-based problem solving. We build and train three agents, Q-Learning with epsilon-greedy exploration, Upper Confidence Bound (UCB), and Monte Carlo Tree Search (MCTS), to navigate a grid world and reach a goal efficiently while avoiding obstacles. Also, we experiment with different ways of balancing exploration and exploitation, visualize learning curves, and compare how each agent adapts and performs under uncertainty. Check out the FULL CODES here.

import numpy as np
import random
from collections import defaultdict, deque
import math
import matplotlib.pyplot as plt
from typing import List, Tuple, Dict

class GridWorld:
    # Action set shared by the environment and agents: up, down, left, right
    MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]

    def __init__(self, size=10, n_obstacles=15):
        self.size = size
        self.grid = np.zeros((size, size))
        self.start = (0, 0)
        self.goal = (size - 1, size - 1)
        obstacles = set()
        while len(obstacles) < n_obstacles:
            obs = (random.randint(0, size - 1), random.randint(0, size - 1))
            if obs not in [self.start, self.goal]:
                obstacles.add(obs)
                self.grid[obs] = 1
        self.reset()

    def reset(self):
        self.agent_pos = self.start
        return self.agent_pos

    def step(self, action):
        # Apply the move if it stays on the grid and does not hit an obstacle
        move = self.MOVES[action]
        new_pos = (self.agent_pos[0] + move[0], self.agent_pos[1] + move[1])
        if (0 <= new_pos[0] < self.size and 0 <= new_pos[1] < self.size
                and self.grid[new_pos] == 0):
            self.agent_pos = new_pos
        if self.agent_pos == self.goal:
            reward, done = 100, True
        else:
            reward, done = -1, False
        return self.agent_pos, reward, done

    def get_valid_actions(self, state):
        valid = []
        for i, move in enumerate(self.MOVES):
            new_pos = (state[0] + move[0], state[1] + move[1])
            if (0 <= new_pos[0] < self.size and 0 <= new_pos[1] < self.size
                    and self.grid[new_pos] == 0):
                valid.append(i)
        return valid

We begin by creating a grid world environment that challenges our agent to reach a goal while avoiding obstacles. We design its structure, define movement rules, and ensure realistic navigation boundaries to simulate an interactive problem-solving space. This forms the foundation where our exploration agents will operate and learn. Check out the FULL CODES here.

class QLearningAgent:
    def __init__(self, n_actions=4, alpha=0.1, gamma=0.95, epsilon=1.0):
        self.n_actions = n_actions
        self.alpha = alpha
        self.gamma = gamma
        self.epsilon = epsilon
        self.q_table = defaultdict(lambda: np.zeros(n_actions))

    def get_action(self, state, valid_actions):
        if random.random() < self.epsilon:
            return random.choice(valid_actions)
        else:
            q_values = self.q_table[state]
            valid_q = [(a, q_values[a]) for a in valid_actions]
            return max(valid_q, key=lambda x: x[1])[0]

    def update(self, state, action, reward, next_state, valid_next_actions):
        current_q = self.q_table[state][action]
        if valid_next_actions:
            max_next_q = max([self.q_table[next_state][a] for a in valid_next_actions])
        else:
            max_next_q = 0
        new_q = current_q + self.alpha * (reward + self.gamma * max_next_q - current_q)
        self.q_table[state][action] = new_q

    def decay_epsilon(self, decay_rate=0.995):
        self.epsilon = max(0.01, self.epsilon * decay_rate)

We implement the Q-Learning agent that learns through experience, guided by an epsilon-greedy policy. We observe how it explores random actions early on and gradually focuses on the most rewarding paths. Through iterative updates, it learns to balance exploration and exploitation effectively.

class UCBAgent:
    def __init__(self, n_actions=4, c=2.0, gamma=0.95):
        self.n_actions = n_actions
        self.c = c
        self.gamma = gamma
        self.q_values = defaultdict(lambda: np.zeros(n_actions))
        self.action_counts = defaultdict(lambda: np.zeros(n_actions))
        self.total_counts = defaultdict(int)

    def get_action(self, state, valid_actions):
        self.total_counts[state] += 1
        ucb_values = []
        for action in valid_actions:
            q = self.q_values[state][action]
            count = self.action_counts[state][action]
            if count == 0:
                return action
            exploration_bonus = self.c * math.sqrt(math.log(self.total_counts[state]) / count)
            ucb_values.append((action, q + exploration_bonus))
        return max(ucb_values, key=lambda x: x[1])[0]

    def update(self, state, action, reward, next_state, valid_next_actions):
        self.action_counts[state][action] += 1
        count = self.action_counts[state][action]
        current_q = self.q_values[state][action]
        if valid_next_actions:
            max_next_q = max([self.q_values[next_state][a] for a in valid_next_actions])
        else:
            max_next_q = 0
        target = reward + self.gamma * max_next_q
        self.q_values[state][action] += (target - current_q) / count

We develop the UCB agent that uses confidence bounds to guide its exploration decisions. We watch how it strategically tries less-visited actions while prioritizing those that yield higher rewards. This approach helps us understand a more mathematically grounded exploration strategy. Check out the FULL CODES here.

class MCTSNode:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}
        self.visits = 0
        self.value = 0.0

    def is_fully_expanded(self, valid_actions):
        return len(self.children) == len(valid_actions)

    def best_child(self, c=1.4):
        choices = [(action, child.value / child.visits +
                    c * math.sqrt(2 * math.log(self.visits) / child.visits))
                   for action, child in self.children.items()]
        return max(choices, key=lambda x: x[1])

class MCTSAgent:
    def __init__(self, env, n_simulations=50):
        self.env = env
        self.n_simulations = n_simulations

    def search(self, state):
        root = MCTSNode(state)
        for _ in range(self.n_simulations):
            node = root
            sim_env = GridWorld(size=self.env.size)
            sim_env.grid = self.env.grid.copy()
            sim_env.agent_pos = state
            # Selection: descend through fully expanded nodes using UCT
            while node.is_fully_expanded(sim_env.get_valid_actions(node.state)) and node.children:
                action, _ = node.best_child()
                node = node.children[action]
                sim_env.agent_pos = node.state
            # Expansion: try one untried action from the current node
            valid_actions = sim_env.get_valid_actions(node.state)
            if valid_actions and not node.is_fully_expanded(valid_actions):
                untried = [a for a in valid_actions if a not in node.children]
                action = random.choice(untried)
                next_state, _, _ = sim_env.step(action)
                child = MCTSNode(next_state, parent=node)
                node.children[action] = child
                node = child
            # Simulation: random rollout up to a fixed depth
            total_reward = 0
            depth = 0
            while depth < 20:
                valid = sim_env.get_valid_actions(sim_env.agent_pos)
                if not valid:
                    break
                action = random.choice(valid)
                _, reward, done = sim_env.step(action)
                total_reward += reward
                depth += 1
                if done:
                    break
            # Backpropagation: propagate the rollout return up to the root
            while node:
                node.visits += 1
                node.value += total_reward
                node = node.parent
        if root.children:
            return max(root.children.items(), key=lambda x: x[1].visits)[0]
        return random.choice(self.env.get_valid_actions(state))

We construct the Monte Carlo Tree Search (MCTS) agent to simulate and plan multiple potential future outcomes. We see how it builds a search tree, expands promising branches, and backpropagates results to refine decisions. This allows the agent to plan intelligently before acting. Check out the FULL CODES here.

def train_agent(agent, env, episodes=500, max_steps=100, agent_type="standard"):
    rewards_history = []
    for episode in range(episodes):
        state = env.reset()
        total_reward = 0
        for step in range(max_steps):
            valid_actions = env.get_valid_actions(state)
            if agent_type == "mcts":
                action = agent.search(state)
            else:
                action = agent.get_action(state, valid_actions)
            next_state, reward, done = env.step(action)
            total_reward += reward
            if agent_type != "mcts":
                valid_next = env.get_valid_actions(next_state)
                agent.update(state, action, reward, next_state, valid_next)
            state = next_state
            if done:
                break
        rewards_history.append(total_reward)
        if hasattr(agent, 'decay_epsilon'):
            agent.decay_epsilon()
        if (episode + 1) % 100 == 0:
            avg_reward = np.mean(rewards_history[-100:])
            print(f"Episode {episode+1}/{episodes}, Avg Reward: {avg_reward:.2f}")
    return rewards_history

if __name__ == "__main__":
    print("=" * 70)
    print("Problem Solving via Exploration Agents Tutorial")
    print("=" * 70)
    env = GridWorld(size=8, n_obstacles=10)
    agents_config = {
        'Q-Learning (ε-greedy)': (QLearningAgent(), 'standard'),
        'UCB Agent': (UCBAgent(), 'standard'),
        'MCTS Agent': (MCTSAgent(env, n_simulations=30), 'mcts')
    }
    results = {}
    for name, (agent, agent_type) in agents_config.items():
        print(f"\nTraining {name}...")
        rewards = train_agent(agent, GridWorld(size=8, n_obstacles=10),
                              episodes=300, agent_type=agent_type)
        results[name] = rewards
    plt.figure(figsize=(12, 5))
    plt.subplot(1, 2, 1)
    for name, rewards in results.items():
        smoothed = np.convolve(rewards, np.ones(20) / 20, mode='valid')
        plt.plot(smoothed, label=name, linewidth=2)
    plt.xlabel('Episode')
    plt.ylabel('Reward (smoothed)')
    plt.title('Agent Performance Comparison')
    plt.legend()
    plt.grid(alpha=0.3)
    plt.subplot(1, 2, 2)
    for name, rewards in results.items():
        avg_last_100 = np.mean(rewards[-100:])
        plt.bar(name, avg_last_100, alpha=0.7)
    plt.ylabel('Average Reward (Last 100 Episodes)')
    plt.title('Final Performance')
    plt.xticks(rotation=15, ha='right')
    plt.grid(axis='y', alpha=0.3)
    plt.tight_layout()
    plt.show()
    print("=" * 70)
    print("Tutorial Complete!")
    print("Key Concepts Demonstrated:")
    print("1. Epsilon-Greedy exploration")
    print("2. UCB strategy")
    print("3. MCTS-based planning")
    print("=" * 70)

We train all three agents in our grid world and visualize their learning progress and performance. We analyze how each strategy, Q-Learning, UCB, and MCTS, adapts to the environment over time. Finally, we compare results and gain insights into which exploration approach leads to faster, more reliable problem-solving.

In conclusion, we successfully implemented and compared three exploration-driven agents, each demonstrating a unique strategy for solving the same navigation challenge. We observe how epsilon-greedy enables gradual learning through randomness, UCB balances confidence with curiosity, and MCTS leverages simulated rollouts for foresight and planning. This exercise helps us appreciate how different exploration mechanisms influence convergence, adaptability, and efficiency in reinforcement learning.

Check out the FULL CODES here.
The post How Exploration Agents like Q-Learning, UCB, and MCTS Collaboratively Learn Intelligent Problem-Solving Strategies in Dynamic Grid Environments appeared first on MarkTechPost.

MiniMax Releases MiniMax M2: A Mini Open Model Built for Max Coding and Agentic Workflows at 8% Claude Sonnet Price and ~2x Faster

Can an open source MoE truly power agentic coding workflows at a fraction of flagship model costs while sustaining long-horizon tool use across MCP, shell, browser, retrieval, and code? The MiniMax team has just released MiniMax-M2, a mixture of experts (MoE) model optimized for coding and agent workflows. The weights are published on Hugging Face under the MIT license, and the model is positioned for end to end tool use, multi file editing, and long horizon plans. It lists 229B total parameters with about 10B active per token, which keeps memory and latency in check during agent loops.

https://github.com/MiniMax-AI/MiniMax-M2

Architecture and why activation size matters

MiniMax-M2 is a compact MoE that routes to about 10B active parameters per token. The smaller activations reduce memory pressure and tail latency in plan, act, and verify loops, and allow more concurrent runs in CI, browse, and retrieval chains. This is the performance budget that enables the speed and cost claims relative to dense models of similar quality.

MiniMax-M2 is an interleaved thinking model. It wraps internal reasoning in <think>…</think> blocks, and the model card instructs users to keep these blocks in the conversation history across turns. Removing these segments harms quality in multi step tasks and tool chains. This requirement is explicit on the model page on Hugging Face.
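A minimal illustration of that requirement, assuming a simple chat-message list: the assistant turn is appended with its <think> segment intact rather than stripped before the next request. The message structure below is a generic stand-in, not MiniMax's API schema, and the conversation content is invented.

history = [{"role": "user", "content": "Refactor utils.py and run the tests."}]

def add_assistant_turn(history, reply_with_think):
    # Keep the <think>...</think> segment; stripping it degrades multi-step and tool-use quality
    history.append({"role": "assistant", "content": reply_with_think})

add_assistant_turn(
    history,
    "<think>Plan: inspect utils.py, apply the edit, then run pytest.</think>I refactored the module and the tests pass.",
)
history.append({"role": "user", "content": "Now add type hints to the public functions."})

# The earlier assistant turn still carries its interleaved reasoning when the next request is sent
print("<think>" in history[1]["content"])  # True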

Benchmarks that target coding and agents

The MiniMax team reports a set of agent and code evaluations that are closer to developer workflows than static QA. On Terminal Bench, the table shows 46.3. On Multi SWE Bench, it shows 36.2. On BrowseComp, it shows 44.0. SWE Bench Verified is listed at 69.4 with a scaffold detail, OpenHands with 128k context and 100 steps.


MiniMax’s official announcement stresses pricing at about 8% of Claude Sonnet and near 2x speed, plus a free access window. The same note provides the specific token prices and the trial deadline.

Comparison M1 vs M2

| Aspect | MiniMax M1 | MiniMax M2 |
|---|---|---|
| Total parameters | 456B total | 229B in model card metadata, model card text says 230B total |
| Active parameters per token | 45.9B active | 10B active |
| Core design | Hybrid Mixture of Experts with Lightning Attention | Sparse Mixture of Experts targeting coding and agent workflows |
| Thinking format | Thinking budget variants 40k and 80k in RL training, no think tag protocol required | Interleaved thinking with <think>…</think> segments that must be preserved across turns |
| Benchmarks highlighted | AIME, LiveCodeBench, SWE-bench Verified, TAU-bench, long context MRCR, MMLU-Pro | Terminal-Bench, Multi SWE-Bench, SWE-bench Verified, BrowseComp, GAIA text only, Artificial Analysis intelligence suite |
| Inference defaults | temperature 1.0, top p 0.95 | model card shows temperature 1.0, top p 0.95, top k 40, launch page shows top k 20 |
| Serving guidance | vLLM recommended, Transformers path also documented | vLLM and SGLang recommended, tool calling guide provided |
| Primary focus | Long context reasoning, efficient scaling of test time compute, CISPO reinforcement learning | Agent and code native workflows across shell, browser, retrieval, and code runners |

Key Takeaways

M2 ships as open weights on Hugging Face under MIT, with safetensors in F32, BF16, and FP8 F8_E4M3.

The model is a compact MoE with 229B total parameters and ~10B active per token, which the card ties to lower memory use and steadier tail latency in plan, act, verify loops typical of agents.

Outputs wrap internal reasoning in <think>…</think> and the model card explicitly instructs retaining these segments in conversation history, warning that removal degrades multi-step and tool-use performance.

Reported results cover Terminal-Bench, (Multi-)SWE-Bench, BrowseComp, and others, with scaffold notes for reproducibility, and day-0 serving is documented for SGLang and vLLM with concrete deploy guides.

Editorial Notes

MiniMax M2 lands with open weights under MIT, a mixture of experts design with 229B total parameters and about 10B activated per token, which targets agent loops and coding tasks with lower memory and steadier latency. It ships on Hugging Face in safetensors with FP32, BF16, and FP8 formats, and provides deployment notes plus a chat template. The API documents Anthropic compatible endpoints and lists pricing with a limited free window for evaluation. vLLM and SGLang recipes are available for local serving and benchmarking. Overall, MiniMax M2 is a very solid open release.

Check out the API Doc, Weights and Repo.
The post MiniMax Releases MiniMax M2: A Mini Open Model Built for Max Coding and Agentic Workflows at 8% Claude Sonnet Price and ~2x Faster appeared first on MarkTechPost.

Zhipu AI Releases ‘Glyph’: An AI Framework for Scaling the Context Length through Visual-Text Compression

Can we render long texts as images and use a VLM to achieve 3 to 4 times token compression, preserving accuracy while scaling a 128K context toward 1M-token workloads? A team of researchers from Zhipu AI released Glyph, an AI framework for scaling the context length through visual-text compression. It renders long textual sequences into images and processes them using vision language models. The system renders ultra long text into page images, then a vision language model (VLM) processes those pages end to end. Each visual token encodes many characters, so the effective token sequence shortens while semantics are preserved. Glyph achieves 3 to 4 times token compression on long text sequences without performance degradation, enabling significant gains in memory efficiency, training throughput, and inference speed.

https://arxiv.org/pdf/2510.17800

Why Glyph?

Conventional methods expand positional encodings or modify attention, but compute and memory still scale with token count. Retrieval trims inputs, yet it risks missing evidence and adds latency. Glyph changes the representation: it converts text to images and shifts the burden to a VLM that has already learned OCR, layout, and reasoning. This increases information density per token, so a fixed token budget covers more of the original context. Under extreme compression, the research team shows that a 128K-context VLM can address tasks that originate from text at the 1M-token level.

https://arxiv.org/pdf/2510.17800

System design and training

The method has three stages: continual pre-training, LLM-driven rendering search, and post-training. Continual pre-training exposes the VLM to large corpora of rendered long text with diverse typography and styles. The objective aligns visual and textual representations and transfers long-context skills from text tokens to visual tokens. The rendering search is a genetic loop driven by an LLM. It mutates page size, dpi, font family, font size, line height, alignment, indent, and spacing, and evaluates candidates on a validation set to optimize accuracy and compression jointly. Post-training uses supervised fine-tuning and reinforcement learning with Group Relative Policy Optimization, plus an auxiliary OCR alignment task. The OCR loss improves character fidelity when fonts are small and spacing is tight.
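To make the rendering step concrete, here is a minimal sketch of turning a long string into page images with Pillow; the page size, margins, wrapping, and font are illustrative assumptions, not Glyph's searched rendering configuration.

# Minimal text-to-image rendering sketch, assuming Pillow is installed.
# Page size, margins, and font are illustrative, not Glyph's searched values.
from PIL import Image, ImageDraw, ImageFont
import textwrap

def render_pages(text: str, chars_per_line: int = 90, lines_per_page: int = 55,
                 page_size=(1240, 1754)) -> list:
    """Wrap text into lines, pack lines into pages, and draw each page as an image."""
    font = ImageFont.load_default()
    lines = textwrap.wrap(text, width=chars_per_line)
    pages = []
    for start in range(0, len(lines), lines_per_page):
        page = Image.new("RGB", page_size, "white")
        draw = ImageDraw.Draw(page)
        y = 40
        for line in lines[start:start + lines_per_page]:
            draw.text((40, y), line, fill="black", font=font)
            y += 30
        pages.append(page)
    return pages

pages = render_pages("long document text " * 2000)
print(f"Rendered {len(pages)} page images")  # each page would later be fed to the VLM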

https://arxiv.org/pdf/2510.17800

Results, performance and efficiency…

LongBench and MRCR establish accuracy and compression under long dialogue histories and document tasks. The model achieves an average effective compression ratio about 3.3 on LongBench, with some tasks near 5, and about 3.0 on MRCR. These gains scale with longer inputs, since every visual token carries more characters. Reported speedups versus the text backbone at 128K inputs are about 4.8 times for prefill, about 4.4 times for decoding, and about 2 times for supervised fine tuning throughput. The Ruler benchmark confirms that higher dpi at inference time improves scores, since crisper glyphs help OCR and layout parsing. The research team reports dpi 72 with average compression 4.0 and maximum 7.7 on specific sub tasks, dpi 96 with average compression 2.2 and maximum 4.4, and dpi 120 with average 1.2 and maximum 2.8. The 7.7 maximum belongs to Ruler, not to MRCR.

https://arxiv.org/pdf/2510.17800

So, what? Applications

Glyph also benefits multimodal document understanding. Training on rendered pages improves performance on MMLongBench-Doc relative to the base visual model, which indicates that the rendering objective is a useful pretext for real document tasks that include figures and layout. The main failure mode is sensitivity to aggressive typography: very small fonts and tight spacing degrade character accuracy, especially for rare alphanumeric strings, and the research team excludes the UUID subtask on Ruler for this reason. The approach also assumes server-side rendering and a VLM with strong OCR and layout priors.

Key Takeaways

Glyph renders long text into images, then a vision language model processes those pages. This reframes long-context modeling as a multimodal problem and preserves semantics while reducing tokens.

The research team reports 3 to 4 times token compression with accuracy comparable to strong 8B text baselines on long-context benchmarks.

Prefill speedup is about 4.8 times, decoding speedup is about 4.4 times, and supervised fine tuning throughput is about 2 times, measured at 128K inputs.

The system uses continual pretraining on rendered pages, an LLM driven genetic search over rendering parameters, then supervised fine tuning and reinforcement learning with GRPO, plus an OCR alignment objective.

Evaluations include LongBench, MRCR, and Ruler, with an extreme case showing a 128K context VLM addressing 1M token level tasks. Code and model card are public on GitHub and Hugging Face.

Editorial Comments

Glyph treats long context scaling as visual text compression, it renders long sequences into images and lets a VLM process them, reducing tokens while preserving semantics. The research team claims 3 to 4 times token compression with accuracy comparable to Qwen3 8B baselines, about 4 times faster prefilling and decoding, and about 2 times faster SFT throughput. The pipeline is disciplined, continual pre training on rendered pages, an LLM genetic rendering search over typography, then post training. The approach is pragmatic for million token workloads under extreme compression, yet it depends on OCR and typography choices, which remain knobs. Overall, visual text compression offers a concrete path to scale long context while controlling compute and memory.

Check out the Paper, Weights and Repo.
The post Zhipu AI Releases ‘Glyph’: An AI Framework for Scaling the Context Length through Visual-Text Compression appeared first on MarkTechPost.

Hosting NVIDIA speech NIM models on Amazon SageMaker AI: Parakeet ASR

This post was written with NVIDIA and the authors would like to thank Adi Margolin, Eliuth Triana, and Maryam Motamedi for their collaboration.
Organizations today face the challenge of processing large volumes of audio data–from customer calls and meeting recordings to podcasts and voice messages–to unlock valuable insights. Automatic Speech Recognition (ASR) is a critical first step in this process, converting speech to text so that further analysis can be performed. However, running ASR at scale is computationally intensive and can be expensive. This is where asynchronous inference on Amazon SageMaker AI comes in. By deploying state-of-the-art ASR models (like NVIDIA Parakeet models) on SageMaker AI with asynchronous endpoints, you can handle large audio files and batch workloads efficiently. With asynchronous inference, long-running requests can be processed in the background (with results delivered later); it also supports auto-scaling to zero when there’s no work and handles spikes in demand without blocking other jobs.
In this blog post, we’ll explore how to host the NVIDIA Parakeet ASR model on SageMaker AI and integrate it into an asynchronous pipeline for scalable audio processing. We’ll also highlight the benefits of Parakeet’s architecture and the NVIDIA Riva toolkit for speech AI, and discuss how to use NVIDIA NIM for deployment on AWS.
NVIDIA speech AI technologies: Parakeet ASR and Riva Framework
NVIDIA offers a comprehensive suite of speech AI technologies, combining high-performance models with efficient deployment solutions. At its core, the Parakeet ASR model family represents state-of-the-art speech recognition, achieving industry-leading accuracy with low word error rates (WERs). The architecture pairs a Fast Conformer encoder with a CTC or transducer decoder, enabling 2.4× faster processing than standard Conformers while maintaining accuracy.
NVIDIA speech NIM is a collection of GPU-accelerated microservices for building customizable speech AI applications. NVIDIA speech models deliver accurate transcription and natural, expressive voices in over 36 languages, making them well suited for customer service, contact centers, accessibility, and global enterprise workflows. Developers can fine-tune and customize models for specific languages, accents, domains, and vocabularies, supporting accuracy and brand voice alignment.
Seamless integration with LLMs and the NVIDIA NeMo Retriever makes NVIDIA models well suited for agentic AI applications, helping your organization stand out with more secure, high-performing voice AI. The NIM framework delivers these services as containerized solutions, making deployment straightforward through Docker containers that include the necessary dependencies and optimizations.
This combination of high-performance models and deployment tools provides organizations with a complete solution for implementing speech recognition at scale.
Solution overview
The architecture illustrated in the diagram showcases a comprehensive asynchronous inference pipeline designed specifically for ASR and summarization workloads. The solution provides a robust, scalable, and cost-effective processing pipeline.

Architecture components
The architecture consists of five key components working together to create an efficient audio processing pipeline. At its core, the SageMaker AI asynchronous endpoint hosts the Parakeet ASR model with auto scaling capabilities that can scale to zero when idle for cost optimization.

The data ingestion process begins when audio files are uploaded to Amazon Simple Storage Service (Amazon S3), triggering AWS Lambda functions that process metadata and initiate the workflow.
For event processing, the SageMaker endpoint automatically sends Amazon Simple Notification Service (Amazon SNS) success and failure notifications through separate topics, enabling proper handling of completed and failed transcriptions.
Successfully transcribed content on Amazon S3 moves to Amazon Bedrock LLMs for intelligent summarization and additional processing like classification and insights extraction.
Finally, a comprehensive tracking system using Amazon DynamoDB stores workflow status and metadata, enabling real-time monitoring and analytics of the entire pipeline.
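As a rough illustration of how a Lambda function might submit work to the asynchronous endpoint in this pipeline, the snippet below uses the boto3 SageMaker runtime client; the endpoint name, bucket, and object key are placeholder assumptions.

# Rough sketch of submitting an async inference request with boto3.
# Endpoint name, bucket, and object key are placeholder assumptions.
import boto3

smr = boto3.client("sagemaker-runtime")

def submit_audio(bucket: str, key: str) -> str:
    """Queue an audio file that already sits in S3 for asynchronous transcription."""
    response = smr.invoke_endpoint_async(
        EndpointName="parakeet-asr-async-endpoint",  # assumed endpoint name
        InputLocation=f"s3://{bucket}/{key}",        # async endpoints read their input from S3
        ContentType="audio/wav",
    )
    # The InferenceId can be written to DynamoDB for status tracking.
    return response["InferenceId"]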

Detailed implementation walkthrough
In this section, we will provide the detailed walkthrough of the solution implementation.
SageMaker asynchronous endpoint prerequisites
To run the example notebooks, you need an AWS account with an AWS Identity and Access Management (IAM) role that has least-privilege permissions to manage the resources created. For details, refer to Create an AWS account. You might need to request a service quota increase for the corresponding SageMaker async hosting instances. In this example, we need one ml.g5.xlarge SageMaker async hosting instance and an ml.g5.xlarge SageMaker notebook instance. You can also choose a different integrated development environment (IDE), but make sure the environment has GPU compute resources for local testing.
SageMaker asynchronous endpoint configuration
When you deploy a custom model like Parakeet, SageMaker has a couple of options:

Use a NIM container provided by NVIDIA
Use a large model inference (LMI) container
Use a prebuilt PyTorch container

We’ll provide examples for all three approaches.
Using an NVIDIA NIM container
NVIDIA NIM provides a streamlined approach to deploying optimized AI models through containerized solutions. Our implementation takes this concept further by creating a unified SageMaker AI endpoint that intelligently routes between HTTP and gRPC protocols to help maximize both performance and capabilities while simplifying the deployment process.
Innovative dual-protocol architecture
The key innovation is the combined HTTP + gRPC architecture that exposes a single SageMaker AI endpoint with intelligent routing capabilities. This design addresses the common challenge of choosing between protocol efficiency and feature completeness by automatically selecting the optimal transport method. The HTTP route is optimized for simple transcription tasks with files under 5MB, providing faster processing and lower latency for common use cases. Meanwhile, the gRPC route supports larger files (SageMaker AI real-time endpoints support a max payload of 25MB) and advanced features like speaker diarization with precise word-level timing information. The system’s auto-routing functionality analyzes incoming requests to determine file size and requested features, then automatically selects the most appropriate protocol without requiring manual configuration. For applications that need explicit control, the endpoint also supports forced routing through /invocations/http for simple transcription or /invocations/grpc when speaker diarization is required. This flexibility allows both automated optimization and fine-grained control based on specific application requirements.
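A simplified sketch of that routing decision is shown below; the 5MB threshold comes from the post, while the function and argument names are illustrative assumptions rather than the actual container code.

# Simplified routing sketch: HTTP for small, plain transcription requests,
# gRPC for large files or speaker diarization. Names are illustrative only.
from typing import Optional

HTTP_MAX_BYTES = 5 * 1024 * 1024  # 5MB threshold mentioned in the post

def choose_route(payload_size_bytes: int, wants_diarization: bool,
                 forced_route: Optional[str] = None) -> str:
    if forced_route in ("http", "grpc"):
        return forced_route                  # explicit /invocations/http or /invocations/grpc
    if wants_diarization or payload_size_bytes > HTTP_MAX_BYTES:
        return "grpc"                        # larger payloads and word-level timing features
    return "http"                            # low-latency path for simple transcription

print(choose_route(2_000_000, wants_diarization=False))  # http
print(choose_route(8_000_000, wants_diarization=True))   # grpc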
Advanced speech recognition and speaker diarization capabilities
The NIM container enables a comprehensive audio processing pipeline that seamlessly combines speech recognition with speaker identification through the NVIDIA Riva integrated capabilities. The container handles audio preprocessing, including format conversion and segmentation, while ASR and speaker diarization processes run concurrently on the same audio stream. Results are automatically aligned using overlapping time segments, with each transcribed segment receiving appropriate speaker labels (for example, Speaker_0, Speaker_1). The inference handler processes audio files through the complete pipeline, initializing both ASR and speaker diarization services, running them in parallel, and aligning transcription segments with speaker labels. The output includes the full transcription, timestamped segments with speaker attribution, confidence scores, and total speaker count in a structured JSON format.
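The structured output could look roughly like the following; the field names approximate the format described above and are not the container's exact schema.

# Illustrative shape of the diarized transcription output described above.
# Field names approximate the described JSON; they are not the exact schema.
example_output = {
    "transcription": "hello thanks for calling how can I help you today",
    "segments": [
        {"start": 0.00, "end": 1.20, "speaker": "Speaker_0",
         "text": "hello", "confidence": 0.97},
        {"start": 1.35, "end": 4.10, "speaker": "Speaker_1",
         "text": "thanks for calling how can I help you today", "confidence": 0.94},
    ],
    "speaker_count": 2,
}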
Implementation and deployment
The implementation extends NVIDIA parakeet-1-1b-ctc-en-us NIM container as the foundation, adding a Python aiohttp server that seamlessly manages the complete NIM lifecycle by automatically starting and monitoring the service. The server handles protocol adaptation by translating SageMaker inference requests to appropriate NIM APIs, implements the intelligent routing logic that analyzes request characteristics, and provides comprehensive error handling with detailed error messages and fallback mechanisms for robust production deployment. The containerized solution streamlines deployment through standard Docker and AWS CLI commands, featuring a pre-configured Docker file with the necessary dependencies and optimizations. The system accepts multiple input formats including multipart form-data (recommended for maximum compatibility), JSON with base64 encoding for simple integration scenarios, and raw binary uploads for direct audio processing.
For detailed implementation instructions and working examples, teams can reference the complete implementation and deployment notebook in the AWS samples repository, which provides comprehensive guidance on deploying Parakeet ASR with NIM on SageMaker AI using the bring your own container (BYOC) approach. For organizations with specific architectural preferences, separate HTTP-only and gRPC-only implementations are also available, providing simpler deployment models for teams with well-defined use cases while the combined implementation offers maximum flexibility and automatic optimization.
AWS customers can deploy these models either as production-grade NVIDIA NIM containers directly from SageMaker Marketplace or JumpStart, or open source NVIDIA models available on Hugging Face, which can be deployed through custom containers on SageMaker or Amazon Elastic Kubernetes Service (Amazon EKS). This allows organizations to choose between fully managed, enterprise-tier endpoints with auto-scaling and security, or flexible open-source development for research or constrained use cases.
Using an AWS LMI container
LMI containers are designed to simplify hosting large models on AWS. These containers include optimized inference engines like vLLM, FasterTransformer, or TensorRT-LLM that can automatically handle things like model parallelism, quantization, and batching for large models. The LMI container is essentially a pre-configured Docker image that runs an inference server (for example a Python server with these optimizations) and allows you to specify model parameters by using environment variables.
To use the LMI container for Parakeet, we would typically:

Choose the appropriate LMI image: AWS provides different LMI images for different frameworks. For Parakeet, we might use the DJLServing image for efficient inference. Alternatively, NVIDIA Triton Inference Server (which Riva uses) is an option if we package the model in ONNX or TensorRT format.
Specify the model configuration: With LMI, we often provide a model_id (if pulling from Hugging Face Hub) or a path to our model, along with configuration for how to load it (number of GPUs, tensor parallel degree, quantization bits). The container then downloads the model and initializes it with the specified settings. We can also download our own model files from Amazon S3 instead of using the Hub.
Define the inference handler: The LMI container might require a small handler script or configuration to tell it how to process requests. For ASR, this might involve reading the audio input, passing it to the model, and returning text.

AWS LMI containers deliver high performance and scalability through advanced optimization techniques, including continuous batching, tensor parallelism, and state-of-the-art quantization methods. LMI containers integrate multiple inference backends (vLLM and TensorRT-LLM) through a single unified configuration, helping users seamlessly experiment with and switch between frameworks to find the optimal performance stack for their specific use case.
Using a SageMaker PyTorch container
SageMaker offers PyTorch Deep Learning Containers (DLCs) that come with PyTorch and many common libraries pre-installed. In this example, we demonstrated how to extend our prebuilt container to install necessary packages for the model. You can download the model directly from Hugging Face during the endpoint creation or download the Parakeet model artifacts, packaging it with necessary configuration files into a model.tar.gz archive, and uploading it to Amazon S3. Along with the model artifacts, an inference.py script is required as the entry point script to define model loading and inference logic, including audio preprocessing and transcription handling. When using the SageMaker Python SDK to create a PyTorchModel, the SDK will automatically repackage the model archive to include the inference script under /opt/ml/model/code/inference.py, while keeping model artifacts in /opt/ml/model/ on the endpoint. Once the endpoint is deployed successfully, it can be invoked through the predict API by sending audio files as byte streams to get transcription results.
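A minimal inference.py skeleton for the PyTorch DLC path might look like the following; the handler function names follow the SageMaker PyTorch inference toolkit convention, while the NeMo model file, the soundfile dependency, and the preprocessing details are assumptions for illustration, so prefer the actual notebook's script.

# inference.py sketch for the SageMaker PyTorch container (bring-your-own script).
# The model file name, soundfile dependency, and preprocessing are illustrative assumptions.
import io
import json
import soundfile as sf  # assumed to be installed via requirements.txt

def model_fn(model_dir):
    # Load the ASR model once per worker; here we assume a NeMo-style checkpoint
    # packaged inside model.tar.gz. Replace with the actual loading logic.
    import nemo.collections.asr as nemo_asr
    return nemo_asr.models.ASRModel.restore_from(f"{model_dir}/parakeet.nemo")

def input_fn(request_body, content_type="audio/wav"):
    # SageMaker passes the raw request bytes; decode them into a waveform.
    audio, sample_rate = sf.read(io.BytesIO(request_body))
    return audio, sample_rate

def predict_fn(data, model):
    audio, _ = data
    # Recent NeMo ASR versions accept in-memory audio; older versions may
    # require writing a temporary WAV file first.
    return model.transcribe([audio])[0]

def output_fn(prediction, accept="application/json"):
    return json.dumps({"transcription": str(prediction)})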
For SageMaker real-time endpoints, the maximum payload size is currently 25MB, so make sure the container is also configured to accept requests of that size. If you plan to reuse the same model behind an asynchronous endpoint, note that async endpoints support input files up to 1GB and response times of up to 1 hour, so configure the container for that payload size and timeout accordingly. When using the PyTorch containers, here are some key configuration parameters to consider (a configuration sketch follows the list):

SAGEMAKER_MODEL_SERVER_WORKERS: Sets the number of TorchServe workers, which determines how many copies of the model are loaded into GPU memory.
TS_DEFAULT_RESPONSE_TIMEOUT: Sets the timeout for TorchServe workers; for long audio processing, set it to a higher value.
TS_MAX_REQUEST_SIZE: Sets the maximum request size in bytes; use roughly 1GB for async endpoints.
TS_MAX_RESPONSE_SIZE: Sets the maximum response size in bytes.
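
The sketch below shows how these variables might be passed when constructing the model with the SageMaker Python SDK; the image versions, IAM role, and S3 path are placeholder assumptions.

# Sketch of wiring the TorchServe environment variables into a PyTorchModel.
# Role, model data path, and framework versions are placeholder assumptions.
from sagemaker.pytorch import PyTorchModel

pytorch_model = PyTorchModel(
    model_data="s3://my-bucket/parakeet/model.tar.gz",  # assumed artifact location
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # assumed role
    entry_point="inference.py",
    framework_version="2.1",
    py_version="py310",
    env={
        "SAGEMAKER_MODEL_SERVER_WORKERS": "1",
        "TS_DEFAULT_RESPONSE_TIMEOUT": "3600",  # allow long audio processing
        "TS_MAX_REQUEST_SIZE": "1000000000",    # ~1GB for async payloads
        "TS_MAX_RESPONSE_SIZE": "1000000000",
    },
)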

In the example notebook, we also showcase how to leverage the SageMaker local session provided by the SageMaker Python SDK. It helps you create estimators and run training, processing, and inference jobs locally using Docker containers instead of managed AWS infrastructure, providing a fast way to test and debug your machine learning scripts before scaling to production.
CDK pipeline prerequisites
Before deploying this solution, make sure you have:

AWS CLI configured with appropriate permissions – Installation Guide
AWS Cloud Development Kit (AWS CDK) installed – Installation Guide
Node.js 18+ and Python 3.9+ installed
Docker – Installation Guide
SageMaker endpoint deployed with your ML model (Parakeet ASR models or similar)
Amazon SNS topics created for success and failure notifications

CDK pipeline setup
The solution deployment begins with provisioning the necessary AWS resources using Infrastructure as Code (IaC) principles. AWS CDK creates the foundational components including:

DynamoDB Table: Configured for on-demand capacity to track invocation metadata, processing status, and results
S3 Buckets: Secure storage for input audio files, transcription outputs, and summarization results
SNS topics: Separate topics for success and failure event handling
Lambda functions: Serverless functions for metadata processing, status updates, and workflow orchestration
IAM roles and policies: Appropriate permissions for cross-service communication and resource access

Environment setup
Clone the repository and install dependencies:

# Install degit, a library for downloading specific sub directories
npm install -g degit

# Clone just the specific folder
npx degit aws-samples/genai-ml-platform-examples/infrastructure/automated-speech-recognition-async-pipeline-sagemaker-ai/sagemaker-async-batch-inference-cdk sagemaker-async-batch-inference-cdk

# Navigate to folder
cd sagemaker-async-batch-inference-cdk

# Install Node.js dependencies
npm install

# Set up Python virtual environment
python3 -m venv .venv
source .venv/bin/activate

# On Windows:
.venv\Scripts\activate

pip install -r requirements.txt

Configuration
Update the SageMaker endpoint configuration in bin/aws-blog-sagemaker.ts:

vim bin/aws-blog-sagemaker.ts

# Change the endpoint name
sageMakerConfig: {
  endpointName: 'your-sagemaker-endpoint-name',
  enableSageMakerAccess: true
}

If you have followed the notebook to deploy the endpoint, you should have created the two SNS topics. Otherwise, make sure you create the correct SNS topics using CLI:

# Create SNS topics
aws sns create-topic --name success-inf
aws sns create-topic --name failed-inf

Build and deploy
Before you deploy the AWS CloudFormation template, make sure Docker is running.

# Compile TypeScript to JavaScript
npm run build

# Bootstrap CDK (first time only)
npx cdk bootstrap

# Deploy the stack
npx cdk deploy

Verify deployment
After successful deployment, note the output values:

DynamoDB table name for status tracking
Lambda function ARNs for processing and status updates
SNS topic ARNs for notifications

Submit audio file for processing
Processing Audio Files
Update the upload_audio_invoke_lambda.sh

LAMBDA_ARN="YOUR_LAMBDA_FUNCTION_ARN"
S3_BUCKET="YOUR_S3_BUCKET_ARN"

Run the Script:
AWS_PROFILE=default ./scripts/upload_audio_invoke_lambda.sh
This script will:

Download a sample audio file
Upload the audio file to your s3 bucket
Send the bucket path to Lambda and trigger the transcription and summarization pipeline

Monitoring progress
You can check the result in DynamoDB table using the following command:

aws dynamodb scan --table-name YOUR_DYNAMODB_TABLE_NAME

Check processing status in the DynamoDB table:

submitted: Successfully queued for inference
completed: Transcription completed successfully
failed: Processing encountered an error

Audio processing and workflow orchestration
The core processing workflow follows an event-driven pattern:
Initial processing and metadata extraction: When audio files are uploaded to S3, the triggered Lambda function analyzes the file metadata, validates format compatibility, and creates detailed invocation records in DynamoDB. This facilitates comprehensive tracking from the moment audio content enters the system.
Asynchronous Speech Recognition: Audio files are processed through the SageMaker endpoint using optimized ASR models. The asynchronous process can handle various file sizes and durations without timeout concerns. Each processing request is assigned a unique identifier for tracking purposes.
Success path processing: Upon successful transcription, the system automatically initiates the summarization workflow. The transcribed text is sent to Amazon Bedrock, where advanced language models generate contextually appropriate summaries based on configurable parameters such as summary length, focus areas, and output format.
Error handling and recovery: Failed processing attempts trigger dedicated Lambda functions that log detailed error information, update processing status, and can initiate retry logic for transient failures. This robust error handling results in minimal data loss and provides clear visibility into processing issues.
Real-world applications
Customer service analytics: Organizations can process thousands of customer service call recordings to generate transcriptions and summaries, enabling sentiment analysis, quality assurance, and insights extraction at scale.
Meeting and conference processing: Enterprise teams can automatically transcribe and summarize meeting recordings, creating searchable archives and actionable summaries for participants and stakeholders.
Media and content processing: Media companies can process podcast episodes, interviews, and video content to generate transcriptions and summaries for improved accessibility and content discoverability.
Compliance and legal documentation: Legal and compliance teams can process recorded depositions, hearings, and interviews to create accurate transcriptions and summaries for case preparation and documentation.
Cleanup
Once you have used the solution, remove the SageMaker endpoints to prevent incurring additional costs. You can use the provided code to delete real-time and asynchronous inference endpoints, respectively:

# Delete the real-time inference endpoint
real_time_predictor.delete_endpoint()

# Delete the asynchronous inference endpoint
async_predictor.delete_endpoint()

You should also delete all the resources created by the CDK stack.

# Delete CDK Stack
cdk destroy

Conclusion
The integration of powerful NVIDIA speech AI technologies with AWS cloud infrastructure creates a comprehensive solution for large-scale audio processing. By combining Parakeet ASR’s industry-leading accuracy and speed with NVIDIA Riva’s optimized deployment framework on the Amazon SageMaker asynchronous inference pipeline, organizations can achieve both high-performance speech recognition and cost-effective scaling. The solution leverages the managed services of AWS (SageMaker AI, Lambda, S3, and Bedrock) to create an automated, scalable pipeline for processing audio content. With features like auto scaling to zero, comprehensive error handling, and real-time monitoring through DynamoDB, organizations can focus on extracting business value from their audio content rather than managing infrastructure complexity. Whether processing customer service calls, meeting recordings, or media content, this architecture delivers reliable, efficient, and cost-effective audio processing capabilities. To experience the full potential of this solution, we encourage you to explore the solution and reach out to us if you have any specific business requirements and would like to customise the solution for your use case.

About the authors
Melanie Li, PhD, is a Senior Generative AI Specialist Solutions Architect at AWS based in Sydney, Australia, where her focus is on working with customers to build solutions using state-of-the-art AI/ML tools. She has been actively involved in multiple generative AI initiatives across APJ, harnessing the power of LLMs. Prior to joining AWS, Dr. Li held data science roles in the financial and retail industries.
Tony Trinh is a Senior AI/ML Specialist Architect at AWS. With 13+ years of experience in the IT industry, Tony specializes in architecting scalable, compliance-driven AI and ML solutions—particularly in generative AI, MLOps, and cloud-native data platforms. As part of his PhD, he’s doing research in Multimodal AI and Spatial AI. In his spare time, Tony enjoys hiking, swimming and experimenting with home improvement.
Alick Wong is a Senior Solutions Architect at Amazon Web Services, where he helps startups and digital-native businesses modernize, optimize, and scale their platforms in the cloud. Drawing on his experience as a former startup CTO, he works closely with founders and engineering leaders to drive growth and innovation on AWS.
Andrew Smith is a Sr. Cloud Support Engineer in the SageMaker, Vision & Other team at AWS, based in Sydney, Australia. He supports customers using many AI/ML services on AWS with expertise in working with Amazon SageMaker. Outside of work, he enjoys spending time with friends and family as well as learning about different technologies.
Derrick Choo is a Senior AI/ML Specialist Solutions Architect at AWS who accelerates enterprise digital transformation through cloud adoption, AI/ML, and generative AI solutions. He specializes in full-stack development and ML, designing end-to-end solutions spanning frontend interfaces, IoT applications, data integrations, and ML models, with a particular focus on computer vision and multi-modal systems.
Tim Ma is a Principal Specialist in Generative AI at AWS, where he collaborates with customers to design and deploy cutting-edge machine learning solutions. He also leads go-to-market strategies for generative AI services, helping organizations harness the potential of advanced AI technologies.
Curt Lockhart is an AI Solutions Architect at NVIDIA, where he helps customers deploy language and vision models to build end to end AI workflows using NVIDIA’s tooling on AWS. He enjoys making complex AI feel approachable and spending his time exploring the art, music, and outdoors of the Pacific Northwest.
Francesco Ciannella is a senior engineer at NVIDIA, where he works on conversational AI solutions built around large language models (LLMs) and audio language models (ALMs). He holds a M.S. in engineering of telecommunications from the University of Rome “La Sapienza” and an M.S. in language technologies from the School of Computer Science at Carnegie Mellon University.

How to Build an Agentic Decision-Tree RAG System with Intelligent Quer …

In this tutorial, we build an advanced Agentic Retrieval-Augmented Generation (RAG) system that goes beyond simple question answering. We design it to intelligently route queries to the right knowledge sources, perform self-checks to assess answer quality, and iteratively refine responses for improved accuracy. We implement the entire system using open-source tools like FAISS, SentenceTransformers, and Flan-T5. As we progress, we explore how routing, retrieval, generation, and self-evaluation combine to form a decision-tree-style RAG pipeline that mimics real-world agentic reasoning. Check out the FULL CODES here.

print("Setting up dependencies...")
import subprocess
import sys

def install_packages():
    packages = ['sentence-transformers', 'transformers', 'torch', 'faiss-cpu', 'numpy', 'accelerate']
    for package in packages:
        print(f"Installing {package}...")
        subprocess.check_call([sys.executable, '-m', 'pip', 'install', '-q', package])

try:
    import faiss
except ImportError:
    install_packages()

print("✓ All dependencies installed! Importing modules...\n")
import torch
import numpy as np
from sentence_transformers import SentenceTransformer
from transformers import pipeline
import faiss
from typing import List, Dict, Tuple
import warnings
warnings.filterwarnings('ignore')
print("✓ All modules loaded successfully!\n")

We begin by installing all necessary dependencies, including Transformers, FAISS, and SentenceTransformers, to ensure smooth local execution. We verify installations and install essential modules such as NumPy, PyTorch, and FAISS for embedding, retrieval, and generation. We confirm that all libraries load successfully before moving ahead with the main pipeline. Check out the FULL CODES here.

class VectorStore:
    def __init__(self, embedding_model='all-MiniLM-L6-v2'):
        print(f"Loading embedding model: {embedding_model}...")
        self.embedder = SentenceTransformer(embedding_model)
        self.documents = []
        self.index = None

    def add_documents(self, docs: List[str], sources: List[str]):
        self.documents = [{"text": doc, "source": src} for doc, src in zip(docs, sources)]
        embeddings = self.embedder.encode(docs, show_progress_bar=False)
        dimension = embeddings.shape[1]
        self.index = faiss.IndexFlatL2(dimension)
        self.index.add(embeddings.astype('float32'))
        print(f"✓ Indexed {len(docs)} documents\n")

    def search(self, query: str, k: int = 3) -> List[Dict]:
        query_vec = self.embedder.encode([query]).astype('float32')
        distances, indices = self.index.search(query_vec, k)
        return [self.documents[i] for i in indices[0]]

We design the VectorStore class to store and retrieve documents efficiently using FAISS-based similarity search. We embed each document using a transformer model and build an index for fast retrieval. This allows us to quickly fetch the most relevant context for any incoming query. Check out the FULL CODES here.

class QueryRouter:
    def __init__(self):
        self.categories = {
            'technical': ['how', 'implement', 'code', 'function', 'algorithm', 'debug'],
            'factual': ['what', 'who', 'when', 'where', 'define', 'explain'],
            'comparative': ['compare', 'difference', 'versus', 'vs', 'better', 'which'],
            'procedural': ['steps', 'process', 'guide', 'tutorial', 'how to']
        }

    def route(self, query: str) -> str:
        query_lower = query.lower()
        scores = {}
        for category, keywords in self.categories.items():
            score = sum(1 for kw in keywords if kw in query_lower)
            scores[category] = score
        best_category = max(scores, key=scores.get)
        return best_category if scores[best_category] > 0 else 'factual'

We introduce the QueryRouter class to classify queries by intent, technical, factual, comparative, or procedural. We use keyword matching to determine which category best fits the input question. This routing step ensures that the retrieval strategy adapts dynamically to different query styles. Check out the FULL CODES here.

class AnswerGenerator:
    def __init__(self, model_name='google/flan-t5-base'):
        print(f"Loading generation model: {model_name}...")
        self.generator = pipeline('text2text-generation', model=model_name,
                                  device=0 if torch.cuda.is_available() else -1, max_length=256)
        device_type = "GPU" if torch.cuda.is_available() else "CPU"
        print(f"✓ Generator ready (using {device_type})\n")

    def generate(self, query: str, context: List[Dict], query_type: str) -> str:
        context_text = "\n\n".join([f"[{doc['source']}]: {doc['text']}" for doc in context])
        prompt = f"""Answer the question based on the context below.

Context:
{context_text}

Question: {query}

Answer:"""
        answer = self.generator(prompt, max_length=200, do_sample=False)[0]['generated_text']
        return answer.strip()

    def self_check(self, query: str, answer: str, context: List[Dict]) -> Tuple[bool, str]:
        if len(answer) < 10:
            return False, "Answer too short - needs more detail"
        context_keywords = set()
        for doc in context:
            context_keywords.update(doc['text'].lower().split()[:20])
        answer_words = set(answer.lower().split())
        overlap = len(context_keywords.intersection(answer_words))
        if overlap < 2:
            return False, "Answer not grounded in context - needs more evidence"
        query_keywords = set(query.lower().split())
        if len(query_keywords.intersection(answer_words)) < 1:
            return False, "Answer doesn't address the query - rephrase needed"
        return True, "Answer quality acceptable"

We build the AnswerGenerator class to handle answer creation and self-evaluation. Using the Flan-T5 model, we generate text responses grounded in the retrieved documents. We then perform a self-check on answer length, context grounding, and query relevance, ensuring the output is meaningful and accurate. Check out the FULL CODES here.

class AgenticRAG:
    def __init__(self):
        self.vector_store = VectorStore()
        self.router = QueryRouter()
        self.generator = AnswerGenerator()
        self.max_iterations = 2

    def add_knowledge(self, documents: List[str], sources: List[str]):
        self.vector_store.add_documents(documents, sources)

    def query(self, question: str, verbose: bool = True) -> Dict:
        if verbose:
            print(f"\n{'='*60}")
            print(f"Query: {question}")
            print(f"{'='*60}")
        query_type = self.router.route(question)
        if verbose:
            print(f"Route: {query_type.upper()} query detected")
        k_docs = {'technical': 2, 'comparative': 4, 'procedural': 3}.get(query_type, 3)
        iteration = 0
        answer_accepted = False
        while iteration < self.max_iterations and not answer_accepted:
            iteration += 1
            if verbose:
                print(f"\nIteration {iteration}")
            context = self.vector_store.search(question, k=k_docs)
            if verbose:
                print(f"Retrieved {len(context)} documents from sources:")
                for doc in context:
                    print(f" - {doc['source']}")
            answer = self.generator.generate(question, context, query_type)
            if verbose:
                print(f"Generated answer: {answer[:100]}...")
            answer_accepted, feedback = self.generator.self_check(question, answer, context)
            if verbose:
                status = "✓ ACCEPTED" if answer_accepted else "✗ REJECTED"
                print(f"Self-check: {status}")
                print(f"Feedback: {feedback}")
            if not answer_accepted and iteration < self.max_iterations:
                question = f"{question} (provide more specific details)"
                k_docs += 1
        return {'answer': answer, 'query_type': query_type, 'iterations': iteration,
                'accepted': answer_accepted, 'sources': [doc['source'] for doc in context]}

We combine all components into the AgenticRAG system, which orchestrates routing, retrieval, generation, and quality checking. The system iteratively refines its answers based on self-evaluation feedback, adjusting the query or expanding context when necessary. This creates a feedback-driven decision-tree RAG that automatically improves performance. Check out the FULL CODES here.

def main():
    print("\n" + "="*60)
    print("AGENTIC RAG WITH ROUTING & SELF-CHECK")
    print("="*60 + "\n")
    documents = [
        "RAG (Retrieval-Augmented Generation) combines information retrieval with text generation. It retrieves relevant documents and uses them as context for generating accurate answers."
    ]
    sources = ["Python Documentation", "ML Textbook", "Neural Networks Guide", "Deep Learning Paper", "Transformer Architecture", "RAG Research Paper"]
    rag = AgenticRAG()
    rag.add_knowledge(documents, sources)
    test_queries = ["What is Python?", "How does machine learning work?", "Compare neural networks and deep learning"]
    for query in test_queries:
        result = rag.query(query, verbose=True)
        print(f"\n{'='*60}")
        print("FINAL RESULT:")
        print(f"Answer: {result['answer']}")
        print(f"Query Type: {result['query_type']}")
        print(f"Iterations: {result['iterations']}")
        print(f"Accepted: {result['accepted']}")
        print(f"{'='*60}\n")

if __name__ == "__main__":
    main()

We finalize the demo by loading a small knowledge base and running test queries through the Agentic RAG pipeline. We observe how the model routes, retrieves, and refines answers step by step, printing intermediate results for transparency. By the end, we confirm that our system successfully delivers accurate, self-validated answers using only local computation.

In conclusion, we create a fully functional Agentic RAG framework that autonomously retrieves, reasons, and refines its answers. We witness how the system dynamically routes different query types, evaluates its own responses, and improves them through iterative feedback, all within a lightweight, local environment. Through this exercise, we deepen our understanding of RAG architectures and also experience how agentic components can transform static retrieval systems into self-improving intelligent agents.

Check out the FULL CODES here.
The post How to Build an Agentic Decision-Tree RAG System with Intelligent Query Routing, Self-Checking, and Iterative Refinement? appeared first on MarkTechPost.

Meet ‘kvcached’: A Machine Learning Library to Enable Virtualized, …

Large language model serving often wastes GPU memory because engines pre-reserve large static KV cache regions per model, even when requests are bursty or idle. Meet 'kvcached', a library that enables a virtualized, elastic KV cache for LLM serving on shared GPUs. kvcached was developed by researchers from Berkeley's Sky Computing Lab (University of California, Berkeley) in close collaboration with Rice University and UCLA, with valuable input from collaborators and colleagues at NVIDIA, Intel Corporation, and Stanford University. It introduces an OS-style virtual memory abstraction for the KV cache that lets serving engines reserve contiguous virtual space first, then back only the active portions with physical GPU pages on demand. This decoupling raises memory utilization, reduces cold starts, and enables multiple models to time share and space share a device without heavy engine rewrites.

https://github.com/ovg-project/kvcached

What kvcached changes?

With kvcached, an engine creates a KV cache pool that is contiguous in the virtual address space. As tokens arrive, the library maps physical GPU pages lazily at a fine granularity using CUDA virtual memory APIs. When requests complete or models go idle, pages unmap and return to a shared pool, which other colocated models can immediately reuse. This preserves simple pointer arithmetic in kernels, and removes the need for per engine user level paging. The project targets SGLang and vLLM integration, and it is released under the Apache 2.0 license. Installation and a one command quick start are documented in the Git repository.
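The idea can be pictured with a toy simulation. This is an illustrative Python sketch of the reserve-virtual, map-physical-on-demand pattern, not kvcached's actual API or its CUDA-level implementation.

# Toy simulation of an elastic KV pool: reserve a large virtual range up front,
# back only touched pages with "physical" blocks on demand, and return them to a
# shared pool when a request finishes. Illustrative only; kvcached does this with
# CUDA virtual memory APIs at the GPU level.
class SharedPhysicalPool:
    def __init__(self, total_pages: int):
        self.free_pages = list(range(total_pages))

    def allocate(self) -> int:
        return self.free_pages.pop()

    def release(self, page: int) -> None:
        self.free_pages.append(page)

class VirtualKVCache:
    def __init__(self, pool: SharedPhysicalPool, virtual_pages: int):
        self.pool = pool
        self.virtual_pages = virtual_pages  # contiguous virtual reservation
        self.mapping = {}                   # virtual page index -> physical page

    def touch(self, vpage: int) -> None:
        if vpage not in self.mapping:       # map lazily when tokens first land here
            self.mapping[vpage] = self.pool.allocate()

    def free_all(self) -> None:             # request finished or model went idle
        for ppage in self.mapping.values():
            self.pool.release(ppage)
        self.mapping.clear()

pool = SharedPhysicalPool(total_pages=8)
model_a, model_b = VirtualKVCache(pool, 1024), VirtualKVCache(pool, 1024)
model_a.touch(0); model_a.touch(1)
model_a.free_all()                          # pages are immediately reusable by model_b
model_b.touch(0)
print(len(pool.free_pages), "pages free in the shared pool")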

https://yifanqiao.notion.site/Solve-the-GPU-Cost-Crisis-with-kvcached-289da9d1f4d68034b17bf2774201b141

How does it impact at scale?

Production workloads host many models with long tail traffic and spiky bursts. Static reservations leave memory stranded and slow down time to first token when models must be activated or swapped. The Prism research paper shows that multi-LLM serving requires cross model memory coordination at runtime, not just compute scheduling. Prism implements on demand mapping of physical to virtual pages and a two level scheduler, and reports more than 2 times cost savings and 3.3 times higher TTFT SLO attainment versus prior systems on real traces. kvcached focuses on the memory coordination primitive, and provides a reusable component that brings this capability to mainstream engines.

https://www.arxiv.org/pdf/2505.04021

Performance signals

The kvcached team reports 1.2 times to 28 times faster time to first token in multi-model serving, due to immediate reuse of freed pages and the removal of large static allocations. These numbers come from multi-LLM scenarios where activation latency and memory headroom dominate tail latency. The research team notes kvcached's compatibility with SGLang and vLLM and describes elastic KV allocation as the core mechanism.

https://yifanqiao.notion.site/Solve-the-GPU-Cost-Crisis-with-kvcached-289da9d1f4d68034b17bf2774201b141

How is it related to recent research?

Recent work has moved from fixed partitioning to virtual memory based methods for KV management. Prism extends VMM based allocation to multi-LLM settings with cross model coordination and scheduling. Prior efforts like vAttention explore CUDA VMM for single model serving to avoid fragmentation without PagedAttention. The arc is clear, use virtual memory to keep KV contiguous in virtual space, then map physical pages elastically as the workload evolves. kvcached operationalizes this idea as a library, which simplifies adoption inside existing engines.

https://www.arxiv.org/pdf/2505.04021

Practical Applications for Devs

Colocation across models: Engines can colocate several small or medium models on one device. When one model goes idle, its KV pages free quickly and another model can expand its working set without restart. This reduces head of line blocking during bursts and improves TTFT SLO attainment.

Activation behavior: Prism reports activation times of about 0.7 seconds for an 8B model and about 1.5 seconds for a 70B model with streaming activation. kvcached benefits from similar principles because virtual reservations allow engines to prepare address ranges in advance, then map pages as tokens arrive.

Autoscaling for serverless LLM: Fine grained page mapping makes it feasible to scale replicas more frequently and to run cold models in a warm state with minimal memory footprint. This enables tighter autoscaling loops and reduces the blast radius of hot spots.

Offloading and future work: Virtual memory opens the door to KV offload to host memory or NVMe when the access pattern allows it. NVIDIA's recent guide on managed memory for KV offload on GH200-class systems shows how unified address spaces can extend capacity at acceptable overheads. The kvcached maintainers also discuss offload and compaction directions in public threads. Verify throughput and latency in your own pipeline, since access locality and PCIe topology have strong effects.

https://www.arxiv.org/pdf/2505.04021

Key Takeaways

kvcached virtualizes the KV cache using GPU virtual memory, engines reserve contiguous virtual space and map physical pages on demand, enabling elastic allocation and reclamation under dynamic loads.

It integrates with mainstream inference engines, specifically SGLang and vLLM, and is released under Apache 2.0, making adoption and modification straightforward for production serving stacks.

Public benchmarks report 1.2 times to 28 times faster time to first token in multi model serving due to immediate reuse of freed KV pages and the removal of large static reservations.

Prism shows that cross model memory coordination, implemented via on demand mapping and two level scheduling, delivers more than 2 times cost savings and 3.3 times higher TTFT SLO attainment on real traces, kvcached supplies the memory primitive that mainstream engines can reuse.

For clusters that host many models with bursty, long tail traffic, virtualized KV cache allows safe colocation, faster activation, and tighter autoscaling, with reported activation around 0.7 seconds for an 8B model and 1.5 seconds for a 70B model in the Prism evaluation.

Editorial Comments

kvcached is an effective approach toward GPU memory virtualization for LLM serving, not a full operating system, and that clarity matters. The library reserves virtual address space for the KV cache, then maps physical pages on demand, which enables elastic sharing across models with minimal engine changes. This aligns with evidence that cross model memory coordination is essential for multi model workloads and improves SLO attainment and cost under real traces. Overall, kvcached advances GPU memory coordination for LLM serving, production value depends on per cluster validation.

Check out the GitHub Repo, Paper 1, Paper 2 and Technical details.
The post Meet ‘kvcached’: A Machine Learning Library to Enable Virtualized, Elastic KV Cache for LLM Serving on Shared GPUs appeared first on MarkTechPost.

5 Common LLM Parameters Explained with Examples

Large language models (LLMs) offer several parameters that let you fine-tune their behavior and control how they generate responses. If a model isn’t producing the desired output, the issue often lies in how these parameters are configured. In this tutorial, we’ll explore some of the most commonly used ones — max_completion_tokens, temperature, top_p, presence_penalty, and frequency_penalty — and understand how each influences the model’s output.

Installing the dependencies

pip install openai pandas matplotlib

Loading OpenAI API Key

import os
from getpass import getpass
os.environ['OPENAI_API_KEY'] = getpass('Enter OpenAI API Key: ')

Initializing the Model

from openai import OpenAI
model = "gpt-4.1"
client = OpenAI()

Max Tokens

Max Tokens is the maximum number of tokens the model can generate during a run. The model will try to stay within this limit across all turns. If it exceeds the specified number, the run will stop and be marked as incomplete.

A smaller value (like 16) limits the model to very short answers, while a higher value (like 80) allows it to generate more detailed and complete responses. Increasing this parameter gives the model more room to elaborate, explain, or format its output more naturally.

prompt = "What is the most popular French cheese?"
for tokens in [16, 30, 80]:
    print(f"\n--- max_completion_tokens = {tokens} ---")
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "developer", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        max_completion_tokens=tokens
    )
    print(response.choices[0].message.content)

Temperature

In Large Language Models (LLMs), the temperature parameter controls the diversity and randomness of generated outputs. Lower temperature values make the model more deterministic and focused on the most probable responses — ideal for tasks that require accuracy and consistency. Higher values, on the other hand, introduce creativity and variety by allowing the model to explore less likely options. Technically, temperature scales the probabilities of predicted tokens in the softmax function: increasing it flattens the distribution (more diverse outputs), while decreasing it sharpens the distribution (more predictable outputs).
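As a quick numeric illustration of that scaling, the snippet below applies temperature to a few made-up logits; the values are invented for demonstration and do not come from any model.

# Numeric sketch of temperature scaling with made-up logits.
import numpy as np

def softmax_with_temperature(logits: np.ndarray, temperature: float) -> np.ndarray:
    scaled = logits / temperature
    scaled -= scaled.max()              # subtract the max for numerical stability
    probs = np.exp(scaled)
    return probs / probs.sum()

logits = np.array([4.0, 3.0, 1.0])      # hypothetical scores for three candidate tokens
for t in [0.2, 1.0, 1.5]:
    print(t, np.round(softmax_with_temperature(logits, t), 3))
# Low temperature sharpens the distribution toward the top token;
# high temperature flattens it so less likely tokens are sampled more often.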

In this code, we’re prompting the LLM to give 10 different responses (n_choices = 10) for the same question — “What is one intriguing place worth visiting?” — across a range of temperature values. By doing this, we can observe how the diversity of answers changes with temperature. Lower temperatures will likely produce similar or repeated responses, while higher temperatures will show a broader and more varied distribution of places.

prompt = "What is one intriguing place worth visiting? Give a single-word answer and think globally."

temperatures = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.5]
n_choices = 10
results = {}

for temp in temperatures:
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        temperature=temp,
        n=n_choices
    )

    # Collect all n responses in a list
    results[temp] = [response.choices[i].message.content.strip() for i in range(n_choices)]

# Display results
for temp, responses in results.items():
    print(f"\n--- temperature = {temp} ---")
    print(responses)

As we can see, once the temperature rises to 0.6 the responses become more diverse, moving beyond the repeated single answer "Petra." At a higher temperature of 1.5, the distribution shifts further, and we see responses like Kyoto and Machu Picchu as well.

Top P

Top P (also known as nucleus sampling) is a parameter that controls how many tokens the model considers based on a cumulative probability threshold. It helps the model focus on the most likely tokens, often improving coherence and output quality.

In the following visualization, we first set a temperature value and then apply Top P = 0.5 (50%), meaning only the top 50% of the probability mass is kept. Note that when temperature = 0, the output is deterministic, so Top P has no effect.

The generation process works as follows (a short numeric sketch follows these steps):

Apply the temperature to adjust the token probabilities.

Use Top P to retain only the most probable tokens that together make up 50% of the total probability mass.

Renormalize the remaining probabilities before sampling.
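
Here is a small numeric sketch of those three steps using made-up, already temperature-scaled probabilities rather than real model outputs.

# Numeric sketch of nucleus (top-p) filtering after temperature scaling.
# The probabilities are invented for illustration.
import numpy as np

probs = np.array([0.42, 0.25, 0.15, 0.10, 0.08])      # already temperature-scaled
order = np.argsort(probs)[::-1]                       # sort tokens by probability
cumulative = np.cumsum(probs[order])
keep = order[: np.searchsorted(cumulative, 0.5) + 1]  # smallest set covering top_p = 0.5

filtered = np.zeros_like(probs)
filtered[keep] = probs[keep]
filtered /= filtered.sum()                            # renormalize before sampling
print("kept token indices:", keep, "renormalized:", np.round(filtered, 3))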

We’ll visualize how the token probability distribution changes across different temperature values for the question:“What is one intriguing place worth visiting?”

prompt = "What is one intriguing place worth visiting? Give a single-word answer and think globally."

temperatures = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.5]
n_choices = 10
results_ = {}

for temp in temperatures:
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        temperature=temp,
        n=n_choices,
        top_p=0.5
    )

    # Collect all n responses in a list
    results_[temp] = [response.choices[i].message.content.strip() for i in range(n_choices)]

# Display results
for temp, responses in results_.items():
    print(f"\n--- temperature = {temp} ---")
    print(responses)

Since Petra consistently accounted for more than 50% of the total response probability, applying Top P = 0.5 filters out all other options. As a result, the model only selects “Petra” as the final output in every case.

Frequency Penalty

Frequency Penalty controls how much the model avoids repeating the same words or phrases in its output.

Range: -2 to 2

Default: 0

When the frequency penalty is higher, the model gets penalized for using words it has already used before. This encourages it to choose new and different words, making the text more varied and less repetitive.

In simple terms — a higher frequency penalty = less repetition and more creativity.

We’ll test this using the prompt:

“List 10 possible titles for a fantasy book. Give the titles only and each title on a new line.”

prompt = "List 10 possible titles for a fantasy book. Give the titles only and each title on a new line."
frequency_penalties = [-2.0, -1.0, 0.0, 0.5, 1.0, 1.5, 2.0]
results = {}

for fp in frequency_penalties:
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        frequency_penalty=fp,
        temperature=0.2
    )

    text = response.choices[0].message.content
    items = [line.strip("- ").strip() for line in text.split("\n") if line.strip()]
    results[fp] = items

# Display results
for fp, items in results.items():
    print(f"\n--- frequency_penalty = {fp} ---")
    print(items)

Low frequency penalties (-2 to 0): Titles tend to repeat, with familiar patterns like “The Shadow Weaver’s Oath”, “Crown of Ember and Ice”, and “The Last Dragon’s Heir” appearing frequently.

Moderate penalties (0.5 to 1.5): Some repetition remains, but the model starts generating more varied and creative titles.

High penalty (2.0): The first three titles are still the same, but after that, the model produces diverse, unique, and imaginative book names (e.g., “Whisperwind Chronicles: Rise of the Phoenix Queen”, “Ashes Beneath the Willow Tree”).

Presence Penalty

Presence Penalty controls how much the model avoids repeating words or phrases that have already appeared in the text.

Range: -2 to 2

Default: 0

A higher presence penalty encourages the model to use a wider variety of words, making the output more diverse and creative.

Unlike the frequency penalty, which accumulates with each repetition, the presence penalty is applied once to any word that has already appeared, reducing the chance it will be repeated in the output. This helps the model produce text with more variety and originality.

prompt = "List 10 possible titles for a fantasy book. Give the titles only and each title on a new line."
presence_penalties = [-2.0, -1.0, 0.0, 0.5, 1.0, 1.5, 2.0]
results = {}

for pp in presence_penalties:
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        presence_penalty=pp,
        temperature=0.2
    )

    text = response.choices[0].message.content
    items = [line.strip("- ").strip() for line in text.split("\n") if line.strip()]
    results[pp] = items

# Display results
for pp, items in results.items():
    print(f"\n--- presence_penalty = {pp} ---")
    print(items)

Low to Moderate Penalty (-2.0 to 0.5): Titles are somewhat varied, with some repetition of common fantasy patterns like “The Shadow Weaver’s Oath”, “The Last Dragon’s Heir”, “Crown of Ember and Ice”.

Medium Penalty (1.0 to 1.5): The first few popular titles remain, while later titles show more creativity and unique combinations. Examples: “Ashes of the Fallen Kingdom”, “Secrets of the Starbound Forest”, “Daughter of Storm and Stone”.

Maximum Penalty (2.0): Top three titles stay the same, but the rest become highly diverse and imaginative. Examples: “Moonfire and Thorn”, “Veil of Starlit Ashes”, “The Midnight Blade”.

Check out the FULL CODES here.
The post 5 Common LLM Parameters Explained with Examples appeared first on MarkTechPost.

How to Build, Train, and Compare Multiple Reinforcement Learning Agents in a Custom Trading Environment Using Stable-Baselines3

In this tutorial, we explore advanced applications of Stable-Baselines3 in reinforcement learning. We design a fully functional, custom trading environment, integrate multiple algorithms such as PPO and A2C, and develop our own training callbacks for performance tracking. As we progress, we train, evaluate, and visualize agent performance to compare algorithmic efficiency, learning curves, and decision strategies, all within a streamlined workflow that runs entirely offline.

!pip install stable-baselines3[extra] gymnasium pygame
import numpy as np
import gymnasium as gym
from gymnasium import spaces
import matplotlib.pyplot as plt
from stable_baselines3 import PPO, A2C, DQN, SAC
from stable_baselines3.common.env_checker import check_env
from stable_baselines3.common.callbacks import BaseCallback
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.monitor import Monitor
import torch

class TradingEnv(gym.Env):
    def __init__(self, max_steps=200):
        super().__init__()
        self.max_steps = max_steps
        self.action_space = spaces.Discrete(3)
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(5,), dtype=np.float32)
        self.reset()

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.current_step = 0
        self.balance = 1000.0
        self.shares = 0
        self.price = 100.0
        self.price_history = [self.price]
        return self._get_obs(), {}

    def _get_obs(self):
        price_trend = np.mean(self.price_history[-5:]) if len(self.price_history) >= 5 else self.price
        return np.array([
            self.balance / 1000.0,
            self.shares / 10.0,
            self.price / 100.0,
            price_trend / 100.0,
            self.current_step / self.max_steps
        ], dtype=np.float32)

    def step(self, action):
        self.current_step += 1
        trend = 0.001 * np.sin(self.current_step / 20)
        self.price *= (1 + trend + np.random.normal(0, 0.02))
        self.price = np.clip(self.price, 50, 200)
        self.price_history.append(self.price)
        reward = 0
        if action == 1 and self.balance >= self.price:
            shares_to_buy = int(self.balance / self.price)
            cost = shares_to_buy * self.price
            self.balance -= cost
            self.shares += shares_to_buy
            reward = -0.01
        elif action == 2 and self.shares > 0:
            revenue = self.shares * self.price
            self.balance += revenue
            self.shares = 0
            reward = 0.01
        portfolio_value = self.balance + self.shares * self.price
        reward += (portfolio_value - 1000) / 1000
        terminated = self.current_step >= self.max_steps
        truncated = False
        return self._get_obs(), reward, terminated, truncated, {"portfolio": portfolio_value}

    def render(self):
        print(f"Step: {self.current_step}, Balance: ${self.balance:.2f}, Shares: {self.shares}, Price: ${self.price:.2f}")

We define our custom TradingEnv, where an agent learns to make buy, sell, or hold decisions based on simulated price movements. We set up the observation and action spaces, implement the reward structure, and ensure our environment reflects a realistic market scenario with fluctuating trends and noise.

class ProgressCallback(BaseCallback):
    def __init__(self, check_freq=1000, verbose=1):
        super().__init__(verbose)
        self.check_freq = check_freq
        self.rewards = []

    def _on_step(self):
        if self.n_calls % self.check_freq == 0:
            mean_reward = np.mean([ep_info["r"] for ep_info in self.model.ep_info_buffer])
            self.rewards.append(mean_reward)
            if self.verbose:
                print(f"Steps: {self.n_calls}, Mean Reward: {mean_reward:.2f}")
        return True

print("=" * 60)
print("Setting up custom trading environment...")
env = TradingEnv()
check_env(env, warn=True)
print("✓ Environment validation passed!")
env = Monitor(env)
vec_env = DummyVecEnv([lambda: env])
vec_env = VecNormalize(vec_env, norm_obs=True, norm_reward=True)

Here, we create a ProgressCallback to monitor training progress and record mean rewards at regular intervals. We then validate our custom environment using Stable-Baselines3’s built-in checker, wrap it for monitoring and normalization, and prepare it for training across multiple algorithms.

print("\n" + "=" * 60)
print("Training multiple RL algorithms...")
algorithms = {
    "PPO": PPO("MlpPolicy", vec_env, verbose=0, learning_rate=3e-4, n_steps=2048),
    "A2C": A2C("MlpPolicy", vec_env, verbose=0, learning_rate=7e-4),
}
results = {}
for name, model in algorithms.items():
    print(f"\nTraining {name}...")
    callback = ProgressCallback(check_freq=2000, verbose=0)
    model.learn(total_timesteps=50000, callback=callback, progress_bar=True)
    results[name] = {"model": model, "rewards": callback.rewards}
    print(f"✓ {name} training complete!")

print("\n" + "=" * 60)
print("Evaluating trained models...")
eval_env = Monitor(TradingEnv())
for name, data in results.items():
    mean_reward, std_reward = evaluate_policy(data["model"], eval_env, n_eval_episodes=20, deterministic=True)
    results[name]["eval_mean"] = mean_reward
    results[name]["eval_std"] = std_reward
    print(f"{name}: Mean Reward = {mean_reward:.2f} +/- {std_reward:.2f}")

We train and evaluate two different reinforcement learning algorithms, PPO and A2C, on our trading environment. We log their performance metrics, capture mean rewards, and compare how efficiently each agent learns profitable trading strategies through consistent exploration and exploitation.

print("\n" + "=" * 60)
print("Generating visualizations...")
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
ax = axes[0, 0]
for name, data in results.items():
    ax.plot(data["rewards"], label=name, linewidth=2)
ax.set_xlabel("Training Checkpoints (x1000 steps)")
ax.set_ylabel("Mean Episode Reward")
ax.set_title("Training Progress Comparison")
ax.legend()
ax.grid(True, alpha=0.3)

ax = axes[0, 1]
names = list(results.keys())
means = [results[n]["eval_mean"] for n in names]
stds = [results[n]["eval_std"] for n in names]
ax.bar(names, means, yerr=stds, capsize=10, alpha=0.7, color=['#1f77b4', '#ff7f0e'])
ax.set_ylabel("Mean Reward")
ax.set_title("Evaluation Performance (20 episodes)")
ax.grid(True, alpha=0.3, axis='y')

ax = axes[1, 0]
best_model = max(results.items(), key=lambda x: x[1]["eval_mean"])[1]["model"]
obs = eval_env.reset()[0]
portfolio_values = [1000]
for _ in range(200):
    action, _ = best_model.predict(obs, deterministic=True)
    obs, reward, done, truncated, info = eval_env.step(action)
    portfolio_values.append(info.get("portfolio", portfolio_values[-1]))
    if done:
        break
ax.plot(portfolio_values, linewidth=2, color='green')
ax.axhline(y=1000, color='red', linestyle='--', label='Initial Value')
ax.set_xlabel("Steps")
ax.set_ylabel("Portfolio Value ($)")
ax.set_title(f"Best Model ({max(results.items(), key=lambda x: x[1]['eval_mean'])[0]}) Episode")
ax.legend()
ax.grid(True, alpha=0.3)

We visualize our training results by plotting learning curves, evaluation scores, and portfolio trajectories for the best-performing model. We also analyze how the agent’s actions translate into portfolio growth, which helps us interpret model behavior and assess decision consistency during simulated trading sessions.

ax = axes[1, 1]
obs = eval_env.reset()[0]
actions = []
for _ in range(200):
    action, _ = best_model.predict(obs, deterministic=True)
    actions.append(action)
    obs, _, done, truncated, _ = eval_env.step(action)
    if done:
        break
action_names = ['Hold', 'Buy', 'Sell']
action_counts = [actions.count(i) for i in range(3)]
ax.pie(action_counts, labels=action_names, autopct='%1.1f%%', startangle=90, colors=['#ff9999', '#66b3ff', '#99ff99'])
ax.set_title("Action Distribution (Best Model)")
plt.tight_layout()
plt.savefig('sb3_advanced_results.png', dpi=150, bbox_inches='tight')
print("✓ Visualizations saved as 'sb3_advanced_results.png'")
plt.show()

print("\n" + "=" * 60)
print("Saving and loading models...")
best_name = max(results.items(), key=lambda x: x[1]["eval_mean"])[0]
best_model = results[best_name]["model"]
best_model.save(f"best_trading_model_{best_name}")
vec_env.save("vec_normalize.pkl")
# Reload with the class that matches the winning algorithm (PPO or A2C)
loaded_model = type(best_model).load(f"best_trading_model_{best_name}")
print(f"✓ Best model ({best_name}) saved and loaded successfully!")
print("\n" + "=" * 60)
print("TUTORIAL COMPLETE!")
print(f"Best performing algorithm: {best_name}")
print(f"Final evaluation score: {results[best_name]['eval_mean']:.2f}")
print("=" * 60)

Finally, we visualize the action distribution of the best agent to understand its trading tendencies and save the top-performing model for reuse. We demonstrate model loading, confirm the best algorithm, and complete the tutorial with a clear summary of performance outcomes and insights gained.
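One practical note when reusing the saved artifacts: observations were normalized by VecNormalize during training, so inference should reload the saved statistics and wrap a fresh environment the same way. A minimal sketch, assuming PPO came out on top and the file names used above:

from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

# Rebuild the environment and reload the normalization statistics saved earlier.
inference_env = DummyVecEnv([lambda: TradingEnv()])
inference_env = VecNormalize.load("vec_normalize.pkl", inference_env)
inference_env.training = False      # freeze the running statistics at inference time
inference_env.norm_reward = False   # report raw rewards rather than normalized ones

# "best_trading_model_PPO" assumes PPO was the winning algorithm; adjust if A2C won.
model = PPO.load("best_trading_model_PPO", env=inference_env)
obs = inference_env.reset()
action, _ = model.predict(obs, deterministic=True)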

In conclusion, we have created, trained, and compared multiple reinforcement learning agents in a realistic trading simulation using Stable-Baselines3. We observe how each algorithm adapts to market dynamics, visualize their learning trends, and identify the most profitable strategy. This hands-on implementation strengthens our understanding of RL pipelines and demonstrates how customizable, efficient, and scalable Stable-Baselines3 can be for complex, domain-specific tasks such as financial modeling.

The post How to Build, Train, and Compare Multiple Reinforcement Learning Agents in a Custom Trading Environment Using Stable-Baselines3 appeared first on MarkTechPost.

A New AI Research from Anthropic and Thinking Machines Lab Stress Tests Model Specs and Reveals Character Differences among Language Models

AI companies use model specifications to define target behaviors during training and evaluation. Do current specs state the intended behaviors with enough precision, and do frontier models exhibit distinct behavioral profiles under the same spec? A team of researchers from Anthropic, Thinking Machines Lab, and Constellation presents a systematic method that stress tests model specs using value tradeoff scenarios, then quantifies cross model disagreement as a signal of gaps or contradictions in the spec. The research team analyzed 12 frontier LLMs from Anthropic, OpenAI, Google, and xAI and linked high disagreement to specification violations, missing guidance on response quality, and evaluator ambiguity. The team also released a public dataset for independent auditing.

Model specifications are the written rules that alignment systems try to enforce. If a spec is complete and precise, models trained to follow it should not diverge widely on the same input. The research team operationalizes this intuition. It generates more than 300,000 scenarios that force a choice between two legitimate values, such as social equity and business effectiveness. It then scores responses on a 0 to 6 spectrum using value spectrum rubrics and measures disagreement as the standard deviation across models. High disagreement localizes the spec clauses that need clarification or additional examples.
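As a rough illustration of the statistic (a sketch, not the authors' code), the per-scenario disagreement can be computed as the standard deviation of the 0 to 6 rubric scores across the 12 models, taking the maximum over the two value dimensions of the tradeoff:

import numpy as np

# Hypothetical rubric scores for one scenario: 12 models x 2 value dimensions,
# each on the 0-6 spectrum. A sketch of the disagreement statistic, not the paper's code.
scores = np.array([
    [5, 1], [4, 2], [6, 0], [3, 3], [5, 2], [2, 4],
    [4, 1], [5, 1], [1, 5], [3, 2], [6, 1], [4, 3],
], dtype=float)

disagreement = scores.std(axis=0).max()      # max standard deviation over the two value dimensions
print(f"disagreement = {disagreement:.2f}")  # high values flag spec clauses that need clarification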

https://arxiv.org/pdf/2510.07686

So, what is the method used in this research?

The research team starts from a taxonomy of 3,307 fine-grained values observed in natural Claude traffic, which is more granular than typical model specs. For each pair of values, they generate a neutral query and two biased variants that lean toward one value. They build value spectrum rubrics that map positions from 0, which means strongly opposing the value, to 6, which means strongly favoring the value. They classify responses from 12 models against these rubrics and define disagreement as the maximum standard deviation across the two value dimensions. To remove near duplicates while keeping the hard cases, they use a disagreement-weighted k-center selection with Gemini embeddings and a greedy 2-approximation algorithm.
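The deduplication step is a standard greedy 2-approximation to k-center, here weighted so that high-disagreement scenarios are preferentially retained. A rough sketch under stated assumptions (scenario embeddings and disagreement scores as inputs; the exact weighting used in the paper may differ):

import numpy as np

def weighted_k_center(embeddings, disagreement, k):
    """Greedy 2-approximation to k-center, weighting distances by disagreement.

    embeddings:   (n, d) array of scenario embeddings (the paper uses Gemini embeddings)
    disagreement: (n,) array of per-scenario disagreement scores
    k:            number of scenarios to keep
    Sketch only; not the authors' implementation.
    """
    selected = [int(np.argmax(disagreement))]   # start from the most contested scenario
    min_dist = np.linalg.norm(embeddings - embeddings[selected[0]], axis=1)
    for _ in range(k - 1):
        # pick the scenario farthest (in disagreement-weighted distance) from the current set
        nxt = int(np.argmax(min_dist * disagreement))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(embeddings - embeddings[nxt], axis=1))
    return selected

# Toy usage with random data
rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 32))
dis = rng.uniform(0, 3, size=1000)
keep = weighted_k_center(emb, dis, k=50)
print(len(keep), "scenarios retained")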

https://arxiv.org/pdf/2510.07686

Scale and releases

The dataset on Hugging Face exposes three subsets: the default split has about 132,000 rows, the complete split about 411,000 rows, and the judge evaluations split about 24,600 rows. The dataset card lists the modality, the format (parquet), and the license (Apache 2.0).
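For independent auditing, the splits can be pulled with the Hugging Face datasets library. The repository id and subset name below are placeholders, not the actual identifiers; substitute the ones listed on the dataset card:

# Sketch of loading one of the released parquet subsets with the datasets library.
# "example-org/value-tradeoff-scenarios" and "default" are placeholder names;
# replace them with the repository id and subset listed on the Hugging Face card.
from datasets import load_dataset

ds = load_dataset("example-org/value-tradeoff-scenarios", name="default", split="train")
print(ds.num_rows)        # the default subset is reported at roughly 132,000 rows
print(ds.column_names)    # inspect the available fields before filtering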

Understanding the Results

Disagreement predicts spec violations: When five OpenAI models are tested against the public OpenAI model spec, high disagreement scenarios show 5 to 13 times higher rates of frequent non-compliance. The research team interprets the pattern as evidence of contradictions and ambiguities in the spec text rather than idiosyncrasies of a single model.

Specs lack granularity on quality inside the safe region: Some scenarios produce responses that all pass compliance, yet differ in helpfulness. For instance, one model refuses and offers safe alternatives, while another only refuses. The spec accepts both, which indicates missing guidance on quality standards.

Evaluator models disagree on compliance: Three LLM judges, Claude 4 Sonnet, o3, and Gemini 2.5 Pro, show only moderate agreement, with a Fleiss’ kappa near 0.42. The accompanying blog post attributes the conflicts to interpretive differences such as conscientious pushback versus transformation exceptions.
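Fleiss’ kappa for the three judges can be reproduced on any table of compliance labels with statsmodels; a small sketch with made-up labels (rows are scenarios, columns are the three judge models):

import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Hypothetical compliance labels: 0 = non-compliant, 1 = compliant.
# Real labels would come from the released judge-evaluations subset.
labels = np.array([
    [1, 1, 1],
    [1, 0, 1],
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 0],
    [1, 0, 0],
])

# aggregate_raters turns per-rater labels into per-category counts for each scenario
counts, _ = aggregate_raters(labels)
print(f"Fleiss' kappa = {fleiss_kappa(counts):.2f}")  # the study reports roughly 0.42 on its real data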

https://alignment.anthropic.com/2025/stress-testing-model-specs/

Provider level character patterns: Aggregating high disagreement scenarios reveals consistent value preferences. Claude models prioritize ethical responsibility and intellectual integrity and objectivity. OpenAI models tend to favor efficiency and resource optimization. Gemini 2.5 Pro and Grok more often emphasize emotional depth and authentic connection. Other values, such as business effectiveness, personal growth and wellbeing, and social equity and justice, show mixed patterns across providers.

Refusals and false positives: The analysis shows topic sensitive refusal spikes. It documents false positive refusals, including legitimate synthetic biology study plans and standard Rust unsafe types that are often safe in context. Claude models are the most cautious by rate of refusal and often provide alternative suggestions, and o3 most often issues direct refusals without elaboration. All models show high refusal rates on child grooming risks.

https://alignment.anthropic.com/2025/stress-testing-model-specs/

Outliers reveal misalignment and over-conservatism: Grok 4 and Claude 3.5 Sonnet produce the most outlier responses, but for different reasons. Grok is more permissive on requests that others consider harmful, while Claude 3.5 sometimes over-rejects benign content. Outlier mining is a useful lens for locating both safety gaps and excessive filtering.

https://alignment.anthropic.com/2025/stress-testing-model-specs/

Key Takeaways

Method and scale: The study stress-tests model specs using value-tradeoff scenarios generated from a 3,307-value taxonomy, producing 300,000+ scenarios and evaluating 12 frontier LLMs across Anthropic, OpenAI, Google, and xAI.

Disagreement ⇒ spec problems: High cross-model disagreement strongly predicts issues in specs, including contradictions and coverage gaps. In tests against the OpenAI model spec, high-disagreement items show 5 to 13× higher rates of frequent non-compliance.

Public release: The team released a dataset for independent auditing and reproduction.

Provider-level behavior: Aggregated results reveal systematic value preferences, for example Claude prioritizes ethical responsibility, OpenAI favors efficiency and resource optimization, and Gemini and Grok emphasize emotional depth. Some values, such as business effectiveness and social equity and justice, show mixed patterns.

Refusals and outliers: High-disagreement slices expose both false-positive refusals on benign topics and permissive responses on risky ones. Outlier analysis identifies cases where one model diverges from at least 9 of the other 11, useful for pinpointing misalignment and over-conservatism.

Editorial Comments

This research turns disagreement into a measurable diagnostic for spec quality, not a vibe. The research team generates more than 300,000 value tradeoff scenarios, scores responses on a 0 to 6 rubric, then uses cross model standard deviation to locate specification gaps. High disagreement predicts frequent non-compliance at 5 to 13 times the baseline rate under the OpenAI model spec. Judge models show only moderate agreement (Fleiss’ kappa near 0.42), which exposes interpretive ambiguity. Provider level value patterns are clear: Claude favors ethical responsibility, OpenAI favors efficiency and resource optimization, and Gemini and Grok emphasize emotional depth and authentic connection. The dataset enables reproduction. Deploy this method to debug specs before deployment, not after.

Check out the Paper, Dataset, and Technical details.
The post A New AI Research from Anthropic and Thinking Machines Lab Stress Tests Model Specs and Reveal Character Differences among Language Models appeared first on MarkTechPost.