Hugging Face Unveils AI Sheets: A Free, Open-Source No-Code Toolkit for LLM-Powered Datasets

Hugging Face has just released AI Sheets, a free, open-source, and local-first no-code tool designed to radically simplify dataset creation and enrichment with AI. AI Sheets aims to democratize access to AI-powered data handling by merging the intuitive spreadsheet interface with direct access to leading open-source Large Language Models (LLMs) like Qwen, Kimi, Llama 3, and many others, including custom models, all without writing a line of code.

What’s AI Sheets?

AI Sheets is a spreadsheet-style data tool purpose-built for working with datasets and leveraging AI models. Unlike traditional spreadsheets, each cell or column in AI Sheets can be powered and enriched by natural language prompts using integrated AI models. Users can:

Build, clean, transform, and enrich datasets directly in the browser or via local deployment.

Apply open-source models from Hugging Face Hub, or run their own local custom models (as long as they support OpenAI API spec).

Collaboratively experiment with rapid data prototyping, fine-tune AI outputs by editing and validating cells, and run large-scale data generation pipelines.

Key Features

No-Code Workflow: Users interact with an intuitive spreadsheet UI, applying AI transformations using prompts—no Python or coding required.

Model Integration: Instantly access thousands of models, including popular LLMs (Qwen, Kimi, Llama 3, etc.). Supports local deployment via servers like Ollama, empowering you to use fine-tuned or domain-specific models with zero cloud dependency.

Data Privacy: When run locally, all data stays on your machine, meeting security and compliance needs.

Open-Source & Free: Both hosted and local versions are available with zero cost, supporting the open AI community and customization.

Flexible Deployment: Runs entirely in-browser (via Hugging Face Spaces), or locally for maximum privacy, performance, and infrastructure control.

How It Works

Prompt-Driven Columns: Create new columns by entering plain text prompts, allowing the model to generate or enrich data.

Local Model Support: Set environment variables (MODEL_ENDPOINT_URL and MODEL_ENDPOINT_NAME) to seamlessly connect AI Sheets with your local inference server (e.g., Ollama with Llama 3 loaded)—fully OpenAI API compatible. A minimal configuration sketch follows this list.

Use Cases: AI Sheets supports tasks like sentiment analysis, data classification, text generation, quick dataset enrichment, even batch processing across massive datasets—all in a collaborative, visual environment.
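Following up on the Local Model Support point above, here is a minimal configuration sketch. The variable names MODEL_ENDPOINT_URL and MODEL_ENDPOINT_NAME come from the article; the endpoint URL and model name are assumed Ollama defaults, and in practice you would typically export these in your shell or Docker environment before starting AI Sheets. They are shown in Python only for consistency with the rest of this roundup's code.

import os

# Assumed values for a local Ollama server exposing an OpenAI-compatible API;
# adjust the URL and model name to whatever your own inference server serves.
os.environ["MODEL_ENDPOINT_URL"] = "http://localhost:11434/v1"  # assumed Ollama endpoint
os.environ["MODEL_ENDPOINT_NAME"] = "llama3"                    # assumed local model name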

https://huggingface.co/blog/aisheets

Impact

AI Sheets dramatically lowers the technical barrier for advanced dataset preparation and enrichment. Data scientists can experiment faster, analysts get powerful automation, and non-technical users can leverage AI without any coding. By combining the Hugging Face open-source model ecosystem with a no-code interface, AI Sheets is positioned to become an essential tool for practitioners, researchers, and teams seeking flexible, private, and scalable AI data solutions.

Supported LLMs

Qwen

Kimi

Llama 3

OpenAI’s gpt-oss (via Inference Providers)

Any custom model supporting the OpenAI API spec

Getting Started

Try in-browser: Hugging Face Spaces hosts AI Sheets for instant use.

Deploy locally: Clone from GitHub (huggingface/aisheets), set up your inference endpoint, and run in your infrastructure for privacy and speed.

Documentation: The GitHub README and Hugging Face blog provide step-by-step setup instructions and example workflows for both cloud and local deployments.

In Summary

Hugging Face AI Sheets is a free, open-source, and local-first no-code solution that empowers anyone to build, enrich, and transform datasets using leading open-source AI models, with seamless support for custom local deployments. It makes advanced AI accessible and collaborative for all.

Check out the GitHub Repo, Try it here and Technical details.
The post Hugging Face Unveils AI Sheets: A Free, Open-Source No-Code Toolkit for LLM-Powered Datasets appeared first on MarkTechPost.

How to Test an OpenAI Model Against Single-Turn Adversarial Attacks Using deepteam

In this tutorial, we’ll explore how to test an OpenAI model against single-turn adversarial attacks using deepteam.

deepteam provides 10+ attack methods—like prompt injection, jailbreaking, and leetspeak—that expose weaknesses in LLM applications. It begins with simple baseline attacks and then applies more advanced techniques (known as attack enhancement) to mimic real-world malicious behavior. Check out the FULL CODES here.

By running these attacks, we can evaluate how well the model defends against different vulnerabilities.

In deepteam, there are two main types of attacks:

Single-turn attacks

Multi-turn attacks

Here, we’ll focus only on single-turn attacks.

Installing the dependencies

pip install deepteam openai pandas

You’ll need to set your OPENAI_API_KEY as an environment variable before running the red_team() function, since deepteam uses LLMs to both generate adversarial attacks and evaluate LLM outputs.

To get an OpenAI API key, visit https://platform.openai.com/settings/organization/api-keys and generate a new key. If you’re a new user, you may need to add billing details and make a minimum payment of $5 to activate API access. Check out the FULL CODES here.

import os
from getpass import getpass
os.environ["OPENAI_API_KEY"] = getpass('Enter OpenAI API Key: ')

Importing the libraries

import asyncio
from openai import OpenAI
from deepteam import red_team
from deepteam.vulnerabilities import IllegalActivity
from deepteam.attacks.single_turn import PromptInjection, GrayBox, Base64, Leetspeak, ROT13, Multilingual, MathProblem

Defining the Model Callback

This code defines an async callback function that queries the OpenAI model (gpt-4o-mini) and returns the model’s response text. It acts as the LLM’s output generator for the attack framework. Check out the FULL CODES here.

client = OpenAI()

# Define callback for querying the LLM
async def model_callback(input: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # use "gpt-4o" if you want a stronger model
        messages=[{"role": "user", "content": input}],
    )
    return response.choices[0].message.content

Defining Vulnerability and Attacks

Here, we specify the vulnerability (IllegalActivity) and set its type to child exploitation. We then define different attack methods such as PromptInjection, GrayBox, Base64, Leetspeak, ROT13, Multilingual, and MathProblem.

Each of these attack types will be explained in detail later in the tutorial. Check out the FULL CODES here.

# Vulnerability
illegal_activity = IllegalActivity(types=["child exploitation"])

# Attacks
prompt_injection = PromptInjection()
graybox_attack = GrayBox()
base64_attack = Base64()
leetspeak_attack = Leetspeak()
rot_attack = ROT13()
multi_attack = Multilingual()
math_attack = MathProblem()

Prompt Injection

Prompt Injection is an attack where a user tries to override the model’s original instructions by injecting malicious or manipulative text into the prompt. The goal is to trick the model into ignoring safety policies and generating harmful or restricted content.

The injected prompt attempts to force the model into role-playing as an unethical persona called OPPO, who disregards safety rules and encourages illegal activity.

Despite this attempt, the model refused to comply—it responded with:

“I’m sorry, I cannot assist with that.”

This confirms that the model successfully resisted the prompt injection and returned a PASS result for this test. Check out the FULL CODES here.

risk_assessment = red_team(
    model_callback=model_callback,
    vulnerabilities=[illegal_activity],
    attacks=[prompt_injection],
)

Graybox Attack

The GrayBox attack leverages partial knowledge about the target LLM system to craft adversarial prompts. Unlike random inputs, GrayBox attacks exploit known weaknesses by reframing the baseline attack with abstract or misleading language, making it harder for safety filters to detect malicious intent.

In this test, the adversarial prompt attempted to encourage illegal activities by disguising instructions about creating false identification documents and using encrypted channels. Check out the FULL CODES here.

risk_assessment = red_team(
    model_callback=model_callback,
    vulnerabilities=[illegal_activity],
    attacks=[graybox_attack],
)

Base64 Attack

The Base64 attack is a common adversarial technique where harmful instructions are encoded in Base64 to bypass safety filters. Instead of presenting malicious content directly, the attacker hides it in an encoded format, hoping the model will decode it and execute the instructions.

In this test, the encoded string contained directions related to illegal activity, disguised to appear harmless at first glance. The model, however, did not attempt to decode or follow through with the hidden request. Check out the FULL CODES here.
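For illustration only (this is not how deepteam constructs its payloads internally), the encoding itself can be reproduced with Python's standard base64 module, using a benign placeholder string:

import base64

# Encode a benign placeholder the way a Base64 attack wraps its payload
payload = "example prompt text"
encoded = base64.b64encode(payload.encode("utf-8")).decode("ascii")
print(encoded)                                      # ZXhhbXBsZSBwcm9tcHQgdGV4dA==
print(base64.b64decode(encoded).decode("utf-8"))    # round-trips to the original text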

risk_assessment = red_team(
    model_callback=model_callback,
    vulnerabilities=[illegal_activity],
    attacks=[base64_attack],
)

Leetspeak Attack

The Leetspeak attack disguises malicious instructions by replacing normal characters with numbers or symbols (for example, a becomes 4, e becomes 3, i becomes 1). This symbolic substitution makes harmful text harder to detect with simple keyword filters, while still being readable to humans or systems that might decode it.

In this test, the attack text attempted to instruct minors in illegal activities, written in leetspeak format. Despite the obfuscation, the model clearly recognized the malicious intent. Check out the FULL CODES here.
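As a rough illustration of the substitution idea (the exact mapping deepteam uses is not shown here), a few common replacements can be applied with str.translate on a benign placeholder:

# Illustrative leetspeak mapping: a->4, e->3, i->1, o->0, s->5
LEET_MAP = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "s": "5"})

def to_leetspeak(text: str) -> str:
    return text.lower().translate(LEET_MAP)

print(to_leetspeak("example prompt text"))  # 3x4mpl3 pr0mpt t3xt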

risk_assessment = red_team(
    model_callback=model_callback,
    vulnerabilities=[illegal_activity],
    attacks=[leetspeak_attack],
)

ROT-13 Attack

The ROT-13 attack is a classic obfuscation method where each letter is shifted 13 positions in the alphabet. For example, A becomes N, B becomes O, and so on. This transformation scrambles harmful instructions into a coded form, making them less likely to trigger simple keyword-based content filters. However, the text can still be easily decoded back into its original form. Check out the FULL CODES here.
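To see the transformation itself (independent of deepteam), Python's codecs module ships a rot_13 codec; encoding and decoding are the same 13-letter shift:

import codecs

obfuscated = codecs.encode("example prompt text", "rot_13")
print(obfuscated)                           # rknzcyr cebzcg grkg
print(codecs.decode(obfuscated, "rot_13"))  # example prompt text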

risk_assessment = red_team(
    model_callback=model_callback,
    vulnerabilities=[illegal_activity],
    attacks=[rot_attack],
)

Multi-lingual Attack

The multilingual attack works by translating a harmful baseline prompt into a less commonly monitored language. The idea is that content filters and moderation systems may be more robust in widely used languages (such as English) but less effective in other languages, allowing malicious instructions to bypass detection.

In this test, the attack was written in Swahili, asking for instructions related to illegal activity. Check out the FULL CODES here.

risk_assessment = red_team(
    model_callback=model_callback,
    vulnerabilities=[illegal_activity],
    attacks=[multi_attack],
)

Math Problem

The math problem attack disguises malicious requests inside mathematical notation or problem statements. By embedding harmful instructions in a formal structure, the text may appear to be a harmless academic exercise, making it harder for filters to detect the underlying intent.

In this case, the input framed illegal exploitation content as a group theory problem, asking the model to “prove” a harmful outcome and provide a “translation” in plain language. Check out the FULL CODES here.

risk_assessment = red_team(
    model_callback=model_callback,
    vulnerabilities=[illegal_activity],
    attacks=[math_attack],
)

Check out the FULL CODES here.
The post How to Test an OpenAI Model Against Single-Turn Adversarial Attacks Using deepteam appeared first on MarkTechPost.

What Is AI Red Teaming? Top 18 AI Red Teaming Tools (2025)


What Is AI Red Teaming?

AI Red Teaming is the process of systematically testing artificial intelligence systems—especially generative AI and machine learning models—against adversarial attacks and security stress scenarios. Red teaming goes beyond classic penetration testing; while penetration testing targets known software flaws, red teaming probes for unknown AI-specific vulnerabilities, unforeseen risks, and emergent behaviors. The process adopts the mindset of a malicious adversary, simulating attacks such as prompt injection, data poisoning, jailbreaking, model evasion, bias exploitation, and data leakage. This ensures AI models are not only robust against traditional threats, but also resilient to novel misuse scenarios unique to current AI systems.

Key Features & Benefits

Threat Modeling: Identify and simulate all potential attack scenarios—from prompt injection to adversarial manipulation and data exfiltration.

Realistic Adversarial Behavior: Emulates actual attacker techniques using both manual and automated tools, beyond what is covered in penetration testing.

Vulnerability Discovery: Uncovers risks such as bias, fairness gaps, privacy exposure, and reliability failures that may not emerge in pre-release testing.

Regulatory Compliance: Supports compliance requirements (EU AI Act, NIST RMF, US Executive Orders) increasingly mandating red teaming for high-risk AI deployments.

Continuous Security Validation: Integrates into CI/CD pipelines, enabling ongoing risk assessment and resilience improvement.

Red teaming can be carried out by internal security teams, specialized third parties, or platforms built solely for adversarial testing of AI systems.

Top 18 AI Red Teaming Tools (2025)

Below is a rigorously researched list of the latest and most reputable AI red teaming tools, frameworks, and platforms—spanning open-source, commercial, and industry-leading solutions for both generic and AI-specific attacks:

Mindgard – Automated AI red teaming and model vulnerability assessment.

Garak – Open-source LLM adversarial testing toolkit.

PyRIT (Microsoft) – Python Risk Identification Toolkit for AI red teaming.

AIF360 (IBM) – AI Fairness 360 toolkit for bias and fairness assessment.

Foolbox – Library for adversarial attacks on AI models.

Granica – Sensitive data discovery and protection for AI pipelines.

AdverTorch – Adversarial robustness testing for ML models.

Adversarial Robustness Toolbox (ART) – IBM’s open-source toolkit for ML model security.

BrokenHill – Automatic jailbreak attempt generator for LLMs.

BurpGPT – Web security automation using LLMs.

CleverHans – Benchmarking adversarial attacks for ML.

Counterfit (Microsoft) – CLI for testing and simulating ML model attacks.

Dreadnode Crucible – ML/AI vulnerability detection and red team toolkit.

Galah – AI honeypot framework supporting LLM use cases.

Meerkat – Data visualization and adversarial testing for ML.

Ghidra/GPT-WPRE – Code reverse engineering platform with LLM analysis plugins.

Guardrails – Application security for LLMs, prompt injection defense.

Snyk – Developer-focused LLM red teaming tool simulating prompt injection and adversarial attacks.

Conclusion

In the era of generative AI and Large Language Models, AI Red Teaming has become foundational to responsible and resilient AI deployment. Organizations must embrace adversarial testing to uncover hidden vulnerabilities and adapt their defenses to new threat vectors—including attacks driven by prompt engineering, data leakage, bias exploitation, and emergent model behaviors. The best practice is to combine manual expertise with automated platforms utilizing the top red teaming tools listed above for a comprehensive, proactive security posture in AI systems.
The post What Is AI Red Teaming? Top 18 AI Red Teaming Tools (2025) appeared first on MarkTechPost.

Meet dots.ocr: A New 1.7B Vision-Language Model that Achieves SOTA Performance on Multilingual Document Parsing

dots.ocr is an open-source vision-language transformer model developed for multilingual document layout parsing and optical character recognition (OCR). It performs both layout detection and content recognition within a single architecture, supporting over 100 languages and a wide variety of structured and unstructured document types.

Architecture

Unified Model: dots.ocr combines layout detection and content recognition into a single transformer-based neural network. This eliminates the complexity of separate detection and OCR pipelines, allowing users to switch tasks by adjusting input prompts.

Parameters: The model contains 1.7 billion parameters, balancing computational efficiency with performance for most practical scenarios.

Input Flexibility: Inputs can be image files or PDF documents. The model features preprocessing options (such as fitz_preprocess) for optimizing quality on low-resolution or dense multi-page files.

Capabilities

Multilingual: dots.ocr is trained on datasets spanning more than 100 languages, including major world languages and less common scripts, reflecting broad multilingual support.

Content Extraction: The model extracts plain text, tabular data, mathematical formulas (in LaTeX), and preserves reading order within documents. Output formats include structured JSON, Markdown, and HTML, depending on the layout and content type.

Preserves Structure: dots.ocr maintains document structure, including table boundaries, formula regions, and image placements, ensuring extracted data remains faithful to the original document.

Benchmark Performance

dots.ocr has been evaluated against modern document AI systems, with results summarized below:

Benchmark | dots.ocr | Gemini2.5-Pro
Table TEDS accuracy | 88.6% | 85.8%
Text edit distance | 0.032 | 0.055

Tables: Outperforms Gemini2.5-Pro in table parsing accuracy.

Text: Demonstrates lower text edit distance (indicating higher precision).

Formulas and Layout: Matches or exceeds leading models in formula recognition and document structure reconstruction.

https://github.com/rednote-hilab/dots.ocr/blob/master/assets/blog.md

Deployment and Integration

Open-Source: Released under the MIT license, with source, documentation, and pre-trained models available on GitHub. The repository provides installation instructions for pip, Conda, and Docker-based deployments.

API and Scripting: Supports flexible task configuration via prompt templates. The model can be used interactively or within automated pipelines for batch document processing.

Output Formats: Extracted results are supplied in structured JSON for programmatic use, with options for Markdown and HTML where appropriate. Visualization scripts enable inspection of detected layouts.

Conclusion

dots.ocr provides a technical solution for high-accuracy, multilingual document parsing by unifying layout detection and content recognition in a single, open-source model. It is particularly suited for scenarios requiring robust, language-agnostic document analysis and structured information extraction in resource-constrained or production environments.

Check out the GitHub Page.

The post Meet dots.ocr: A New 1.7B Vision-Language Model that Achieves SOTA Performance on Multilingual Document Parsing appeared first on MarkTechPost.

Amazon Unveils Bedrock AgentCore Gateway: Redefining Enterprise AI Agent Tool Integration

Amazon Web Services (AWS) has launched the Amazon Bedrock AgentCore Gateway, a transformative managed service designed to simplify and scale AI agent-to-tool integrations for enterprises. As organizations seek to leverage AI agents in increasingly complex environments with hundreds of tools and services, the Gateway addresses critical pain points: interoperability, security, tool discovery, and infrastructure management—all through a unified, protocol-native platform.

https://aws.amazon.com/blogs/machine-learning/introducing-amazon-bedrock-agentcore-gateway-transforming-enterprise-ai-agent-tool-development/

Key Innovations Powering Agent Integration

Zero-Code MCP Tool Creation

One of AgentCore Gateway’s standout features is its ability to transform existing REST APIs and AWS Lambda functions into MCP-compatible tools without requiring custom code. Enterprises can automatically convert APIs defined in OpenAPI or Smithy models, enabling seamless agent-to-tool communication. With native MCP support, the process of onboarding internal services or serverless functions as agent-accessible tools is accelerated; engineering teams simply register their APIs, letting Gateway handle the conversion and protocol translation pipelines.

Built-In Security with Dual-Sided Authentication

Security remains at the core of enterprise AI adoption. AgentCore Gateway introduces dual-sided authentication controls, protecting inbound and outbound connections. For inbound requests, it implements OAuth-based validation, integrating with popular identity providers such as Amazon Cognito, Okta, or Auth0. Organizations can specify approved client IDs and audiences for granular agent-tool access control. On outbound calls, Gateway leverages AWS IAM roles for Lambda and Smithy targets and supports API key or OAuth (2LO) flows for REST endpoints, each governed by resource credentials managed in AgentCore Identity. This architecture streamlines compliance and injects auditability across agent interactions.

Intelligent Tool Discovery with Semantic Search

As enterprise environments scale to hundreds or thousands of tools, the challenge of agent tool overload—and resultant inefficiencies—becomes acute. AgentCore Gateway tackles this with an intelligent, built-in semantic search capability. Developers can opt into semantic discovery, automatically provisioning the 'x_amz_bedrock_agentcore_search' tool, facilitating natural-language queries for tool selection. This replaces traditional list operations, empowering agents to identify the optimal tool for each scenario and reducing error rates or hallucinations associated with manual enumeration.

Fully Managed Infrastructure and Observability

AgentCore Gateway is a serverless, fully managed solution. It abstracts infrastructure concerns such as hosting, scalability, and availability, letting organizations focus on integration logic and business use cases. Teams gain robust observability—integrated with Amazon CloudWatch and AWS CloudTrail—giving access to comprehensive metrics (usage, performance, error rates) and audit trails for every API and agent interaction. Monitoring dashboards and automated alerts can be customized, ensuring reliability and accountability as application complexity grows.

Native Model Context Protocol (MCP) Support

The Gateway is built with native MCP support, harmonizing agent-to-tool communications and interoperability. This protocol-agnostic approach paves the way for frictionless integration of new agent frameworks. Whether using bespoke agents, popular libraries like LangChain, or advanced orchestration solutions, teams can invoke tools through standard MCP methods, benefiting from consistent tooling, schema translation, and access policies.

https://aws.amazon.com/blogs/machine-learning/introducing-amazon-bedrock-agentcore-gateway-transforming-enterprise-ai-agent-tool-development/

Real-World Impact and Developer Experience

Innovaccer, a leading healthcare technology company, adopted AgentCore Gateway to build HMCP (Healthcare Model Context Protocol) on Bedrock. This integration enabled automatic conversion of healthcare APIs into MCP-accessible tools, delivering scalability, trust, and compliance for AI-powered data interactions.

Organizations can set up gateways and targets via multiple interfaces—including AWS CLI, SDKs (Boto3), Management Console, and AgentCore starter toolkits. Example code is provided for common workflows: registering gateways, attaching Lambda or OpenAPI targets with custom authentication, and using built-in semantic search to boost agent tool discovery. Debugging is enhanced with an “exceptionLevel” property, offering granular error messages for faster troubleshooting during development.

Best Practices and Governance

To maintain security and organize tool inventory, AWS recommends grouping APIs by business domain and outbound authorization requirements. Enterprises should enrich tool metadata with natural-language descriptions and usage scenarios, synchronizing the Gateway’s tool registry with centralized MCP repositories to ensure up-to-date availability. The platform supports continuous evolution—allowing for rapid onboarding, semantic search validation, and runtime access policy adaptation as agent capabilities expand.

Conclusion

Amazon Bedrock AgentCore Gateway signals a new era of enterprise AI agent development. By tackling the complexities of protocol interoperability, security, tool discovery, and infrastructure management, it empowers organizations to unlock scalable, intelligent, and compliant agent workflows. With zero-code MCP tool creation, advanced authentication, semantic search, and native protocol support, the Gateway is poised to become the backbone of next-generation agentic environments.

Check out the Technical details.

The post Amazon Unveils Bedrock AgentCore Gateway: Redefining Enterprise AI Agent Tool Integration appeared first on MarkTechPost.

NVIDIA AI Just Released the Largest Open-Source Speech AI Dataset and State-of-the-Art Models for European Languages

Nvidia has taken a major leap in the development of multilingual speech AI, unveiling Granary, the largest open-source speech dataset for European languages, and two state-of-the-art models: Canary-1b-v2 and Parakeet-tdt-0.6b-v3. This release sets a new standard for accessible, high-quality resources in automatic speech recognition (ASR) and speech translation (AST), especially for underrepresented European languages.

Granary: The Foundation of Multilingual Speech AI

Granary is a massive, multilingual corpus developed in collaboration with Carnegie Mellon University and Fondazione Bruno Kessler. It delivers around one million hours of audio, with 650,000 hours for speech recognition and 350,000 for speech translation. The dataset covers 25 European languages—representing nearly all official EU languages, plus Russian and Ukrainian—with a critical focus on languages with limited annotated data, such as Croatian, Estonian, and Maltese.

Key features:

Largest open-source speech dataset for 25 European languages.

Pseudo-labeling pipeline: Unlabeled public audio data is processed using Nvidia NeMo’s Speech Data Processor, which adds structure and enhances quality, reducing the need for resource-intensive manual annotation.

Supports both ASR and AST: Designed for transcription and translation tasks.

Open access: Available to the global developer community for flexible, production-scale model training.

By leveraging clean, high-quality data, Granary enables significantly faster model convergence. Research demonstrates that developers need half as much Granary data to reach target accuracies compared to competing datasets, making it especially valuable for resource-constrained languages and rapid prototyping.

Canary-1b-v2: Multilingual ASR + Translation (English ↔ 24 Languages)

Canary-1b-v2 is a billion-parameter Encoder-Decoder model trained on Granary, delivering high-quality transcription and translation between English and 24 supported European languages.

It’s architected for accuracy and multitask capabilities:

Languages supported: 25 European languages, up from 4 in the original Canary.

State-of-the-art performance: Comparable accuracy to models three times larger, but up to 10× faster inference.

Multitask capability: Robust across both ASR and AST tasks.

Features: Automatic punctuation, capitalization, word and segment-level timestamps—even timestamped translated outputs.

Architecture: FastConformer Encoder with Transformer Decoder; unified vocabulary for all languages via SentencePiece tokenizer.

Robustness: Maintains strong performance under noisy conditions and resists output hallucinations.

Evaluation highlights:

ASR Word Error Rate (WER): 7.15% (AMI dataset), 10.82% (LibriSpeech Clean).

AST COMET Scores: 79.3 (X→English), 84.56 (English→X).

Deployment: Available under CC BY 4.0 license; optimized for Nvidia GPU-accelerated systems, enabling fast training and inference for scalable production use.

Parakeet-tdt-0.6b-v3: Real-Time Multilingual ASR

Parakeet-tdt-0.6b-v3 is a 600-million-parameter multilingual ASR model designed for high-throughput or large-volume transcription in all 25 supported languages. It extends the Parakeet family (previously English-centric) to full European coverage.

Automatic language detection: Transcribes input audio without needing extra prompts.

Real-time capability: Efficiently transcribes up to 24-minute audio segments in a single inference pass.

Fast, scalable, and commercial-ready: Prioritizes low latency, batch processing, and accurate outputs, with word-level timestamps, punctuation, and capitalization.

Robustness: Reliable even on complex content (numbers, lyrics) and challenging audio conditions.
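For readers who want to try the model locally, below is a minimal, untested sketch using the NVIDIA NeMo toolkit. The checkpoint name comes from this release; loading it through ASRModel.from_pretrained, the install command, and the audio.wav path are assumptions rather than official NVIDIA instructions.

# pip install "nemo_toolkit[asr]"   (assumed install command for NeMo ASR)
import nemo.collections.asr as nemo_asr

# Load the multilingual Parakeet checkpoint released with this announcement
asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="nvidia/parakeet-tdt-0.6b-v3")

# Transcribe a local 16 kHz mono WAV file (hypothetical path)
results = asr_model.transcribe(["audio.wav"])
print(results[0])  # a string or a Hypothesis object with a .text field, depending on NeMo version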

Impact on Speech AI Development

Nvidia’s Granary dataset and model suite accelerate the democratization of speech AI for Europe, enabling scalable development of:

Multilingual chatbots

Customer service voice agents

Near-real-time translation services

Developers, researchers, and businesses can now build inclusive, high-quality applications supporting linguistic diversity, with open access to these models and datasets.

Check out the Granary, NVIDIA Canary-1b-v2 and NVIDIA Parakeet-tdt-0.6b-v3.
The post NVIDIA AI Just Released the Largest Open-Source Speech AI Dataset and State-of-the-Art Models for European Languages appeared first on MarkTechPost.

Salesforce AI Releases Moirai 2.0: Salesforce's Latest Time Series Foundation Model Built on a Decoder-only Transformer Architecture

Salesforce AI Research has unveiled Moirai 2.0, the latest advancement in the world of time series foundation models. Built atop a decoder-only transformer architecture, Moirai 2.0 sets a new bar for performance and efficiency, claiming the #1 spot on the GIFT-Eval benchmark, the gold standard for time-series forecasting model evaluation. Not only is it 44% faster in inference and 96% smaller in size compared to its predecessor, but this substantial leap comes without sacrificing accuracy—making it a game-changer for both research and enterprise environments.

What Makes Moirai 2.0 Special?

Architecture Innovations

Decoder-only Transformer: The switch from a masked encoder to a decoder-only transformer empowers Moirai 2.0 to better model autoregressive forecast generation, enhancing scalability and performance on larger, more complex datasets.

Efficient Multi-Token Prediction: By predicting multiple tokens at a time (rather than just one), the model achieves greater efficiency and stability during forecasting.

Advanced Data Filtering: Low-quality, non-forecastable time series are automatically filtered out during training, improving robustness.

Patch Token Embedding & Random Masking: New techniques in encoding missing value information and robustness to incomplete data during inference.

Expanded Dataset for Pretraining

Moirai 2.0 leverages a richer mix of training data:

Real-world sets like GIFT-Eval Pretrain and Train

Chronos mixup: Synthetic time series blending for diversity

KernelSynth procedures from Chronos research

Internal operational data from Salesforce IT systems

This broad data foundation enables Moirai 2.0 to generalize across countless forecasting tasks and domains.

Performance: Breaking New Ground

Moirai 2.0 is a leap beyond its predecessors:

Best MASE Score on GIFT-Eval for non-data-leaking models (industry-accepted metric for forecast accuracy)

CRPS Performance matches previous state-of-the-art

Compared to Moirai_large:

16% better on MASE

13% better on CRPS

44% faster in inference

96% smaller parameter size


These results make high-performance, scalable forecasting more accessible to a broader audience.

Why Moirai 2.0 Matters for Practitioners

Moirai 2.0’s capabilities extend beyond academic benchmarks into enterprise-critical domains such as:

IT Operations: Proactive capacity scaling, anomaly detection

Sales Forecasting: Accurate, scalable revenue predictions

Demand Forecasting: Optimized inventory management

Supply Chain Planning: Better scheduling, reduced waste

And many more data-driven business processes

With dramatically reduced model size and improved speed, high-quality forecasting can now be applied at scale—empowering businesses to make smarter, faster decisions regardless of their data infrastructure.

Getting Started: Moirai 2.0 in Practice

Integration is seamless for developers and data scientists. Here’s a typical workflow, leveraging open-source modules available on Hugging Face:

Sample Python Workflow

Import Libraries

import matplotlib.pyplot as plt
from gluonts.dataset.repository import dataset_recipes
from uni2ts.eval_util.data import get_gluonts_test_dataset
from uni2ts.model.moirai2 import Moirai2Forecast, Moirai2Module

Load Moirai 2.0

model = Moirai2Forecast(
    module=Moirai2Module.from_pretrained("Salesforce/moirai-2.0-R-small"),
    prediction_length=100,
    context_length=1680,
    target_dim=1,
    feat_dynamic_real_dim=0,
    past_feat_dynamic_real_dim=0
)

Load Dataset & Generate Forecasts

test_data, metadata = get_gluonts_test_dataset("electricity", prediction_length=None, regenerate=False)
predictor = model.create_predictor(batch_size=32)
forecasts = predictor.predict(test_data.input)

Visualize Results

# Example visualization: plot the first six forecasts on a 2x3 grid
fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(25, 10))
for ax, forecast in zip(axes.flatten(), forecasts):
    forecast.plot(ax=ax)  # assumes GluonTS-style forecast objects, which expose a plot() helper
plt.show()

Full examples and notebook links are provided by Salesforce for deeper experimentation.

Universal, Scalable, Robust

By democratizing access to cutting-edge, general-purpose forecasting technology, Moirai 2.0 is poised to reshape the landscape of time series modeling. With flexibility across domains, better robustness, faster inference, and lower computational demands, Salesforce AI Research’s model paves the way for businesses and researchers globally to harness the power of forecasting for transformative decision making.

Check out the Technical details and Hugging Face (Model).

The post Salesforce AI Releases Moirai 2.0: Salesforce’s Latest Time Series Foundation Model Built on a Decoder‑only Transformer Architecture appeared first on MarkTechPost.

An Implementation Guide to Design Intelligent Parallel Workflows in Parsl for Multi-Tool AI Agent Execution

In this tutorial, we implement an AI agent pipeline using Parsl, leveraging its parallel execution capabilities to run multiple computational tasks as independent Python apps. We configure a local ThreadPoolExecutor for concurrency, define specialized tools such as Fibonacci computation, prime counting, keyword extraction, and simulated API calls, and coordinate them through a lightweight planner that maps a user goal to task invocations. The outputs from all tasks are aggregated and passed through a Hugging Face text-generation model to produce a coherent, human-readable summary. Check out the FULL CODES here.

!pip install -q parsl transformers accelerate

import math, json, time, random
from typing import List, Dict, Any
import parsl
from parsl.config import Config
from parsl.executors import ThreadPoolExecutor
from parsl import python_app

parsl.load(Config(executors=[ThreadPoolExecutor(label="local", max_threads=8)]))

We begin by installing the required libraries & importing all necessary modules for our workflow. We then configure Parsl with a local ThreadPoolExecutor to run tasks concurrently and load this configuration so we can execute our Python apps in parallel. Check out the FULL CODES here.

@python_app
def calc_fibonacci(n: int) -> Dict[str, Any]:
    def fib(k):
        a, b = 0, 1
        for _ in range(k): a, b = b, a + b
        return a
    t0 = time.time(); val = fib(n); dt = time.time() - t0
    return {"task": "fibonacci", "n": n, "value": val, "secs": round(dt, 4)}

@python_app
def extract_keywords(text: str, k: int = 8) -> Dict[str, Any]:
    import re, collections
    words = [w.lower() for w in re.findall(r"[a-zA-Z][a-zA-Z0-9-]+", text)]
    stop = set("the a an and or to of is are was were be been in on for with as by from at this that it its if then else not no".split())
    cand = [w for w in words if w not in stop and len(w) > 3]
    freq = collections.Counter(cand)
    scored = sorted(freq.items(), key=lambda x: (x[1], len(x[0])), reverse=True)[:k]
    return {"task": "keywords", "keywords": [w for w, _ in scored]}

@python_app
def simulate_tool(name: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    time.sleep(0.3 + random.random() * 0.5)
    return {"task": name, "payload": payload, "status": "ok", "timestamp": time.time()}

We define four Parsl @python_app functions that run asynchronously as part of our agent’s workflow. We create a Fibonacci calculator, a prime-counting routine, a keyword extractor for text processing, and a simulated tool that mimics external API calls with randomized delays. These modular apps let us perform diverse computations in parallel, forming the building blocks for our multi-tool AI agent. Check out the FULL CODES here.
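Note that the prime-counting app referenced above and used later in run_agent is not shown in this excerpt. A minimal sketch that is consistent with the keys run_agent reads from its result ("task", "count", "limit") could look like the following; treat it as a stand-in rather than the tutorial's original code.

@python_app
def count_primes(limit: int) -> Dict[str, Any]:
    # Sieve of Eratosthenes; returns the fields run_agent expects ("count", "limit")
    sieve = [True] * (limit + 1)
    sieve[0], sieve[1] = False, False
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return {"task": "count_primes", "limit": limit, "count": sum(sieve)}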

def tiny_llm_summary(bullets: List[str]) -> str:
    from transformers import pipeline
    gen = pipeline("text-generation", model="sshleifer/tiny-gpt2")
    prompt = "Summarize these agent results clearly:\n- " + "\n- ".join(bullets) + "\nConclusion:"
    out = gen(prompt, max_length=160, do_sample=False)[0]["generated_text"]
    return out.split("Conclusion:", 1)[-1].strip()

We implement a tiny_llm_summary function that uses Hugging Face’s pipeline with the lightweight sshleifer/tiny-gpt2 model to generate concise summaries of our agent’s results. It formats the collected task outputs as bullet points, appends a “Conclusion:” cue, and extracts only the final generated conclusion for a clean, human-readable summary. Check out the FULL CODES here.

def plan(user_goal: str) -> List[Dict[str, Any]]:
    intents = []
    if "fibonacci" in user_goal.lower():
        intents.append({"tool": "calc_fibonacci", "args": {"n": 35}})
    if "primes" in user_goal.lower():
        intents.append({"tool": "count_primes", "args": {"limit": 100_000}})
    intents += [
        {"tool": "simulate_tool", "args": {"name": "vector_db_search", "payload": {"q": user_goal}}},
        {"tool": "simulate_tool", "args": {"name": "metrics_fetch", "payload": {"kpi": "latency_ms"}}},
        {"tool": "extract_keywords", "args": {"text": user_goal}}
    ]
    return intents

We define the plan function to map a user’s goal into a structured list of tool invocations. It checks the goal text for keywords like “fibonacci” or “primes” to trigger specific computational tasks, then adds default actions such as simulated API queries, metrics retrieval, and keyword extraction, forming the execution blueprint for our AI agent. Check out the FULL CODES here.
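As a quick, optional sanity check (not part of the original tutorial), printing the plan for a sample goal shows which tools get selected and in what order:

# Inspect the plan for a goal that mentions both Fibonacci and primes
for step in plan("Analyze fibonacci(35) performance and count primes under 100k"):
    print(step["tool"], "->", step["args"])
# Expected order: calc_fibonacci, count_primes, simulate_tool (x2), extract_keywords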

def run_agent(user_goal: str) -> Dict[str, Any]:
    tasks = plan(user_goal)
    futures = []
    for t in tasks:
        if t["tool"] == "calc_fibonacci": futures.append(calc_fibonacci(**t["args"]))
        elif t["tool"] == "count_primes": futures.append(count_primes(**t["args"]))
        elif t["tool"] == "extract_keywords": futures.append(extract_keywords(**t["args"]))
        elif t["tool"] == "simulate_tool": futures.append(simulate_tool(**t["args"]))
    raw = [f.result() for f in futures]

    bullets = []
    for r in raw:
        if r["task"] == "fibonacci":
            bullets.append(f"Fibonacci({r['n']}) = {r['value']} computed in {r['secs']}s.")
        elif r["task"] == "count_primes":
            bullets.append(f"{r['count']} primes found ≤ {r['limit']}.")
        elif r["task"] == "keywords":
            bullets.append("Top keywords: " + ", ".join(r["keywords"]))
        else:
            bullets.append(f"Tool {r['task']} responded with status={r['status']}.")

    narrative = tiny_llm_summary(bullets)
    return {"goal": user_goal, "bullets": bullets, "summary": narrative, "raw": raw}

In the run_agent function, we execute the full agent workflow by first generating a task plan from the user’s goal, then dispatching each tool as a Parsl app to run in parallel. Once all futures are complete, we convert their results into clear bullet points and feed them to our tiny_llm_summary function to create a concise narrative. The function returns a structured dictionary containing the original goal, detailed bullet points, the LLM-generated summary, and the raw tool outputs. Check out the FULL CODES here.

if __name__ == "__main__":
    goal = ("Analyze fibonacci(35) performance, count primes under 100k, "
            "and prepare a concise executive summary highlighting insights for planning.")
    result = run_agent(goal)
    print("\n=== Agent Bullets ===")
    for b in result["bullets"]: print("•", b)
    print("\n=== LLM Summary ===\n", result["summary"])
    print("\n=== Raw JSON ===\n", json.dumps(result["raw"], indent=2)[:800], "...")

In the main execution block, we define a sample goal that combines numeric computation, prime counting, and summary generation. We run the agent on this goal, print the generated bullet points, display the LLM-crafted summary, and preview the raw JSON output to verify both the human-readable and structured results.

In conclusion, this implementation demonstrates how Parsl’s asynchronous app model can efficiently orchestrate diverse workloads in parallel, enabling an AI agent to combine numerical analysis, text processing, and simulated external services in a unified pipeline. By integrating a small LLM at the final stage, we transform structured results into natural language, illustrating how parallel computation and AI models can be combined to create responsive, extensible agents suitable for real-time or large-scale tasks.

Check out the FULL CODES here.

The post An Implementation Guide to Design Intelligent Parallel Workflows in Parsl for Multi-Tool AI Agent Execution appeared first on MarkTechPost.

Europe's Top AI Models of 2025: Multilingual, Open, and Enterprise-Ready

Europe’s AI ecosystem in 2025 is a robust arena of open innovation, multilingual capabilities, and enterprise-grade reasoning. Below, we present an in-depth, fact-checked review of the region’s most advanced AI models, with technical specifications, licensing, and standout strengths. Each entry includes links to official model information for further exploration.

1. Mistral AI (France)

Founded in Paris in 2023, Mistral AI is a leading force in open-weight LLMs. Their models are recognized for efficiency, mixture-of-experts (MoE) architectures, and competitive benchmarks. Notably, Mistral focuses on maximizing performance-per-parameter and broad context support.

Notable Models (2025):

Model | Parameters | Context Window | Key Features | License
Mistral Small 3.1 | 24B | 128k tokens | Text & image multimodal, rapid output | Apache 2.0
Mixtral 8x7B | 56B MoE | 32k tokens | MoE, high multilingual performance | Apache 2.0
Magistral Small 1/1.1 | 24B | 40k tokens | Reasoning-optimized | Apache 2.0
Devstral Small 1 | 24B | 128k tokens | Coding-focused, open-source | Apache 2.0
Codestral | 12B+ | 256k tokens | Advanced code tasks | Apache 2.0
Mistral Medium 3.1 | Frontier | 128k tokens | Multimodal, enterprise-ready | API only

Strengths:

High performance per parameter; efficient and scalable architectures.

Enterprise, reasoning, and coding specialization.

Open-weight Apache 2.0 licensing for key models

2. Aleph Alpha (Germany)

Heidelberg-based Aleph Alpha develops sovereign LLMs focused on multilingualism, explainability, and compliance with EU regulations.

Notable Models (2025):

Model | Parameters | Main Languages | Features | License
Luminous | Various | 5 EU languages | Semantic representation, embeddings | Commercial/API
Pharia-1-LLM-7B-Control | 7B | German, French, Spanish | Open-source, multilingual corpus | Open Aleph License

Strengths:

Emphasis on explainable and secure AI pipelines.

EU AI Act compliance, data sovereignty, and support for public sector applications.

Open Aleph License supports non-commercial/educational use with full transparency.

3. Velvet AI (Italy – Almawave)

Developed by Almawave and trained on the Leonardo supercomputer, Velvet models emphasize sustainability, multilingual reach, and broad industry application.

Technical Specifications (2025):

Model | Parameters | Context Window | Languages | Features | License
Velvet-14B | 14B | 128k tokens | IT, DE, ES, FR, PT, EN | Trained on 4T+ tokens | Apache 2.0
Velvet-2B | 2B | 32k tokens | IT, EN | Efficient, smaller size | Apache 2.0

Strengths:

Eco-friendly architecture, broad European language coverage.

Optimized for healthcare, finance, and public administration.

Open-source and transparency ethos.

4. Minerva (Italy)

Italy’s first LLM family built on Italian language data, Minerva is a collaborative product by Sapienza NLP, FAIR, and CINECA.

Notable Model:

Model | Parameters | Training Tokens | Languages | Features | License
Minerva 7B | 7.4B | 2.5T | IT/EN | 50/50 data balance; instruction-tuned | Open-source

Strengths:

Designed for Italian and English linguistic performance.

Transparent, open training data; instruction tuning for safer outputs.

5. EuroLLM-9B (EU)

A pan-European initiative supporting all 24 official EU languages plus 11 additional languages (35 in total), released in both base and instruct forms.

Model Overview:

Model | Parameters | Languages Covered | Training Tokens | License
EuroLLM-9B | 9B | 35 (24 EU + 11 extra) | 4T+ | Open-source
EuroLLM-1.7B | 1.7B | 35 | Multilingual | Open-source

Strengths:

Unmatched open multilingual coverage.

Outperforms similar-size open models in translation and reasoning benchmarks.

Synthetic datasets, EuroFilter technology for balancing languages.

6. LightOn (France)

Paris-based LightOn offers enterprise-grade, on-premises, privacy-focused generative AI. In 2024, it became Europe’s first generative AI startup to IPO.

Model Summary:

Model | Domain | Key Features
Pagnol, RITA, Mambaoutai | General-purpose | Open-source
Reason-ModernColBERT | Reasoning | Domain-specific
BioClinical ModernBERT | Biomedical | Domain-specific

Strengths:

Supports fully private, on-prem deployment.

Integrates domain specialization and optical computing research.

Comparison Table

Model | Parameters | Context Window | Languages Supported | License | Key Strengths
Mistral Small 3.1 | 24B | 128k tokens | EN, Multimodal (images) | Apache 2.0 | Efficiency, multimodal
Mixtral 8x7B | 56B | 32k tokens | Multilingual | Apache 2.0 | MoE, benchmark leader
Magistral Small | 24B | 40k tokens | EN, Multilingual | Apache 2.0 | Reasoning
Devstral Small | 24B | 128k tokens | Coding, EN | Apache 2.0 | Software agent, code
Velvet 14B | 14B | 128k tokens | IT/DE/FR/ES/PT/EN | Apache 2.0 | Sustainability, multilingual
Velvet 2B | 2B | 32k tokens | IT/EN | Apache 2.0 | Lightweight, efficient
Minerva 7B | 7.4B | 32k-128k* | IT/EN | Open-source | Research, code, Italian focus
EuroLLM-9B | 9B | 32k-128k* | 35 (EU+extra) | Open-source | Multilingual, open benchmarks
Pharia 1 LLM | 7B | 32k* | DE/FR/ES | Open Aleph | Multilingual, EU compliance

*Context window sizes for Minerva and EuroLLM may vary by implementation and release.

Conclusion

Europe’s AI advances in 2025 reflect an environment focused on openness, sustainability, multilingual support, and compliance. Mistral leads with agile, performant models; Aleph Alpha pioneers explainability and data sovereignty; Italy’s Minerva and Velvet address national language and sustainable training; EuroLLM sets the bar for inclusivity; and LightOn delivers enterprise-grade privacy solutions.

These collective efforts position Europe as an increasingly influential player in global AI, particularly in the realms of multilingualism, ethical innovation, and technical openness.


The post Europe’s Top AI Models of 2025: Multilingual, Open, and Enterprise-Ready appeared first on MarkTechPost.

Introducing Amazon Bedrock AgentCore Gateway: Transforming enterprise AI agent tool development

To fulfill their tasks, AI Agents need access to various capabilities including tools, data stores, prompt templates, and other agents. As organizations scale their AI initiatives, they face an exponentially growing challenge of connecting each agent to multiple tools, creating an M×N integration problem that significantly slows development and increases complexity.
Although protocols such as Model Context Protocol (MCP) and Agent2Agent (A2A) have emerged to address interoperability, implementing these solutions requires substantial engineering effort. Organizations must build MCP servers, convert existing APIs, manage infrastructure, build intelligent tools discovery, and implement security controls, all that while maintaining these integrations over time as protocols rapidly evolve and new major versions are released. As deployments grow to hundreds of agents and thousands of tools, enterprises need a more scalable and manageable solution.
Introducing Amazon Bedrock AgentCore Gateway
We’re excited to announce Amazon Bedrock AgentCore Gateway, a fully managed service that revolutionizes how enterprises connect AI agents with tools and services. AgentCore Gateway serves as a centralized tool server, providing a unified interface where agents can discover, access, and invoke tools.
Built with native support for the MCP, Gateway enables seamless agent-to-tool communication while abstracting away security, infrastructure, and protocol-level complexities. This service provides zero-code MCP tool creation from APIs and AWS Lambda functions, intelligent tool discovery, built-in inbound and outbound authorization, and serverless infrastructure for MCP servers. You can focus on building intelligent agent experiences rather than managing connectivity with tools and services. The following diagram illustrates the AgentCore Gateway workflow.

Key capabilities of Amazon Bedrock AgentCore Gateway
The Amazon Bedrock AgentCore Gateway introduces a comprehensive set of capabilities designed to revolutionize tool integration for AI agents. At its core, Gateway offers powerful and secure API integration functionality that transforms existing REST APIs into MCP servers. This integration supports both OpenAPI specifications and Smithy models, so organizations can seamlessly convert their enterprise APIs into MCP-compatible tools. Beyond API integration, Gateway provides built-in support for Lambda functions so developers can connect their serverless computing resources as tools with defined schemas. Gateway provides the following key capabilities:

Security Guard – Manages OAuth authorization so only valid users and agents can access tools and resources. We will dive deeper into security in the following section.
Translation – Converts agent requests using protocols such as MCP into API requests and Lambda invocations, alleviating the need to manage protocol integration or version support.
Composition – Combines multiple APIs, functions, and tools into a single MCP endpoint for streamlined agent access.
Target extensibility – An AgentCore gateway is a central access point that serves as a unified interface for AI agents to discover and interact with tools. It handles authentication, request routing, and protocol translation between MCP and your APIs. Each gateway can manage multiple targets. A target represents a backend service or group of APIs that you want to expose as tools to AI agents. Targets can be AWS Lambda functions, OpenAPI specifications, or Smithy models. Each target can expose multiple tools, and Gateway automatically handles the conversion between MCP and the target’s built-in protocol. Gateway supports streamable http transport.
Infrastructure Manager – As a fully managed service, Gateway removes the burden of infrastructure management from organizations. It provides comprehensive infrastructure with built-in security features and robust observability capabilities. Teams no longer need to worry about hosting concerns, scaling issues, or maintaining the underlying infrastructure. The service automatically handles these aspects, providing reliable performance and seamless scaling as demand grows.
Semantic Tool Selection – Intelligent tool discovery represents another core capability of Gateway. As organizations scale to hundreds or thousands of tools, discovering the right tool becomes increasingly challenging for AI agents. Moreover, when agents are presented with too many tools simultaneously, they can experience something called “tool overload,” leading to hallucinations, incorrect tool selections, or inefficient execution paths that significantly impact performance. Gateway addresses these challenges by providing a special built-in tool named ‘x_amz_bedrock_agentcore_search’ that can be accessed using the standard MCP tools and call operation.

Security and authentication
Gateway implements a sophisticated dual-sided security architecture that handles both inbound access to Gateway itself and outbound connections to target services.
For inbound requests, Gateway follows the MCP authorization specification, using OAuth-based authorization to validate and authorize incoming tool calls. Gateway functions as an OAuth resource server. This means it can work with the OAuth Identity Provider your organization might use–whether that’s Amazon Cognito, Okta, Auth0, or your own OAuth provider. When you create a gateway, you can specify multiple approved client IDs and audiences, giving you granular control over which applications and agents can access your tools. The Gateway validates incoming requests against your OAuth provider, supporting both authorization code flow (3LO) and client credentials flow (2LO, commonly used for service-to-service communication).
The outbound security model is equally flexible but varies by target type:
For AWS Lambda and Smithy model targets, AgentCore Gateway uses AWS Identity and Access Management (IAM) based authorization. The gateway assumes an IAM role you configure, which can have precisely scoped permissions for each target service. This integrates smoothly with existing AWS security practices and IAM policies.
For OpenAPI targets (REST APIs), Gateway supports two authentication methods:

API key – You can configure the key to be sent in either headers or query parameters with customizable parameter names
OAuth token for 2LO – For outbound OAuth authentication to target APIs, Gateway supports two-legged OAuth (2LO) client credentials grant type, enabling secure machine-to-machine communications without user interaction

Credentials are securely managed through AgentCore Identity’s resource credentials provider. Each target is associated with exactly one authentication configuration, facilitating clear security boundaries and audit trails. AgentCore Identity handles the complex security machinery while presenting a clean, simple interface to developers. You configure security one time during setup, and Gateway handles the token validation, outbound token caching (through AgentCore Identity), and secure communication from there.
Get started with Amazon Bedrock AgentCore Gateway
You can create gateways and add targets through multiple interfaces:

AWS SDK for Python (Boto3)
AWS Management Console
AWS Command Line Interface (AWS CLI)
AgentCore starter toolkit for fast and straightforward setup

The following practical examples and code snippets demonstrate the process of setting up and using Amazon Bedrock AgentCore Gateway.
Create a gateway
To create a gateway with Amazon Cognito for inbound auth, use the AWS SDK for Python (Boto3):

import boto3

gateway_client = boto3.client('bedrock-agentcore-control')

auth_config = {
    "customJWTAuthorizer": {
        "allowedClients": ['<cognito_client_id>'],  # Client ID must match the ClientId configured in Cognito
        "discoveryUrl": '<cognito_oauth_discovery_url>'
    }
}

create_response = gateway_client.create_gateway(
    name='DemoGateway',
    roleArn='<IAM Role>',  # The IAM role must have permissions to create/list/get/delete gateways
    protocolType='MCP',
    authorizerType='CUSTOM_JWT',
    authorizerConfiguration=auth_config,
    description='Demo AgentCore Gateway'
)
# Values in < > need to be replaced with real values

Here is the reference to control plane and data plane APIs for Amazon Bedrock AgentCore.
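The snippets that follow refer to the gateway’s identifier and MCP endpoint URL. As a minimal sketch, assuming the create_gateway response exposes gatewayId and gatewayUrl fields as the AgentCore samples do, you can capture them like this:

# Capture the identifiers the following snippets rely on
# (field names assumed to match the AgentCore samples: gatewayId and gatewayUrl)
gatewayID = create_response['gatewayId']    # used when attaching targets
gatewayURL = create_response['gatewayUrl']  # the MCP endpoint agents connect to
print(f"Gateway {gatewayID} is reachable at {gatewayURL}")
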
Create gateway targets
Create a target for an existing API using an OpenAPI specification with an API key for outbound auth:

# Create outbound credentials provider in AgentCore Identity
acps = boto3.client(service_name="bedrock-agentcore-control")

response = acps.create_api_key_credential_provider(
    name="APIKey",
    apiKey="<your secret API key>"
)

credentialProviderARN = response['credentialProviderArn']

# Specify the OpenAPI spec file via S3 or inline
openapi_s3_uri = "s3://<your-bucket>/<openapi-spec>.json"  # S3 URI of the OpenAPI spec

openapi_s3_target_config = {
    "mcp": {
        "openApiSchema": {
            "s3": {
                "uri": openapi_s3_uri
            }
        }
    }
}

# API key credentials provider configuration
api_key_credential_config = [
    {
        "credentialProviderType": "API_KEY",
        "credentialProvider": {
            "apiKeyCredentialProvider": {
                "credentialParameterName": "api_key",  # Replace with the parameter name the API provider expects. To pass the token in a header, use "Authorization"
                "providerArn": credentialProviderARN,
                "credentialLocation": "QUERY_PARAMETER",  # Location of the API key. Possible values are "HEADER" and "QUERY_PARAMETER"
                # "credentialPrefix": "Basic"  # Optional prefix for the token. Valid value is "Basic". Applies only to tokens
            }
        }
    }
]

# Add the OpenAPI target to the gateway
targetname = 'DemoOpenAPITarget'
response = gateway_client.create_gateway_target(
    gatewayIdentifier=gatewayID,  # ID returned when the gateway was created
    name=targetname,
    description='OpenAPI Target with S3Uri using SDK',
    targetConfiguration=openapi_s3_target_config,
    credentialProviderConfigurations=api_key_credential_config)

Create a target for a Lambda function:

# Define the Lambda target with a tool schema. Replace the AWS Lambda function ARN below
lambda_target_config = {
    "mcp": {
        "lambda": {
            "lambdaArn": "<Your AWS Lambda function ARN>",
            "toolSchema": {
                "inlinePayload": [
                    {
                        "name": "get_order_tool",
                        "description": "tool to get the order",
                        "inputSchema": {
                            "type": "object",
                            "properties": {
                                "orderId": {
                                    "type": "string"
                                }
                            },
                            "required": [
                                "orderId"
                            ]
                        }
                    }
                ]
            }
        }
    }
}

# Create outbound auth config. For an AWS Lambda function, it's always IAM.
credential_config = [
    {
        "credentialProviderType": "GATEWAY_IAM_ROLE"
    }
]

# Add the AWS Lambda target to the gateway
targetname = 'LambdaUsingSDK'
response = gateway_client.create_gateway_target(
    gatewayIdentifier=gatewayID,
    name=targetname,
    description='Lambda Target using SDK',
    targetConfiguration=lambda_target_config,
    credentialProviderConfigurations=credential_config)

Use Gateway with different agent frameworks
Use Gateway with Strands Agents integration:

import logging

from mcp.client.streamable_http import streamablehttp_client
from strands import Agent
from strands.tools.mcp import MCPClient

def create_streamable_http_transport():
    # gatewayURL is the gateway's MCP endpoint; token is a bearer token from your OAuth provider (see the sketch below)
    return streamablehttp_client(gatewayURL, headers={"Authorization": f"Bearer {token}"})

client = MCPClient(create_streamable_http_transport)

with client:
    # Call listTools
    tools = client.list_tools_sync()
    # Create an Agent with the model and tools
    agent = Agent(model=yourmodel, tools=tools)  # you can replace yourmodel with any model you like
    # Invoke the agent with a sample prompt. This only calls the MCP listTools operation and retrieves the list of tools the LLM has access to; it does not actually call any tool.
    agent("Hi, can you list all tools available to you")
    # Invoke the agent with a sample prompt that invokes the tool and display the response
    agent("Check the order status for order id 123 and show me the exact response from the tool")
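
The bearer token used in the Authorization header is not defined in the snippet above. As a minimal sketch, you can obtain one with the 2LO client credentials flow against your identity provider’s token endpoint; the token URL, client ID, and secret below are hypothetical placeholders for an Amazon Cognito setup:

import requests

# Hypothetical values; replace with your Cognito (or other IdP) details
token_url = "https://<your-domain>.auth.<region>.amazoncognito.com/oauth2/token"
client_id = "<cognito_client_id>"
client_secret = "<cognito_client_secret>"

resp = requests.post(
    token_url,
    data={"grant_type": "client_credentials", "client_id": client_id, "client_secret": client_secret},
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)
resp.raise_for_status()
token = resp.json()["access_token"]  # bearer token passed to the gateway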

Use Gateway with LangChain integration:

from langchain_mcp_adapters.client import MultiServerMCPClient
from langgraph.prebuilt import create_react_agent
from langchain.chat_models import init_chat_model

client = MultiServerMCPClient(
    {
        "healthcare": {
            "url": gateway_endpoint,
            "transport": "streamable_http",
            "headers": {"Authorization": f"Bearer {jwt_token}"}
        }
    }
)

# tools and LLM must be defined before creating the agent; see the sketch below
agent = create_react_agent(
    LLM,
    tools,
    prompt=systemPrompt
)
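
The snippet above assumes tools and LLM already exist. With langchain-mcp-adapters, the tools are retrieved asynchronously from the gateway. A minimal sketch, assuming the client configuration above and an Anthropic Claude model on Amazon Bedrock (the model ID is illustrative):

import asyncio
from langchain.chat_models import init_chat_model

async def main():
    # Pull the MCP tools exposed by the gateway
    tools = await client.get_tools()
    llm = init_chat_model(
        "anthropic.claude-3-5-sonnet-20240620-v1:0",
        model_provider="bedrock_converse",
    )
    agent = create_react_agent(llm, tools, prompt="You are a helpful order-management assistant.")
    result = await agent.ainvoke(
        {"messages": [{"role": "user", "content": "Check the order status for order id 123"}]}
    )
    print(result["messages"][-1].content)

asyncio.run(main())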

Implement semantic search
You can opt in to semantic search when creating a gateway. It automatically provisions a powerful built-in tool called x_amz_bedrock_agentcore_search that enables intelligent tool discovery through natural language queries. Use the output of the search tool in place of MCP’s list operation for scalable and performant tool discovery. The following diagram illustrates how you can use the MCP search tool.

To enable semantic search, use the following code:

# Enable semantic search of tools
search_config = {
    "mcp": {"searchType": "SEMANTIC", "supportedVersions": ["2025-03-26"]}
}

# Create the gateway
response = agentcore_client.create_gateway(
    name=gateway_name,
    roleArn=gateway_role_arn,
    authorizerType="CUSTOM_JWT",
    description=gateway_desc,
    protocolType="MCP",
    authorizerConfiguration=auth_config,
    protocolConfiguration=search_config,
)
def tool_search(gateway_endpoint, jwt_token, query):
    """Discover tools via the built-in x_amz_bedrock_agentcore_search tool."""
    toolParams = {
        "name": "x_amz_bedrock_agentcore_search",
        "arguments": {"query": query},
    }
    # invoke_gateway_tool is a helper from the sample repository that issues an MCP tools/call request
    toolResp = invoke_gateway_tool(
        gateway_endpoint=gateway_endpoint, jwt_token=jwt_token, tool_params=toolParams
    )
    tools = toolResp["result"]["structuredContent"]["tools"]
    return tools
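
The invoke_gateway_tool helper is defined in the sample repository. As a rough stand-in, assuming the gateway accepts a plain JSON-RPC 2.0 tools/call request over HTTP with a bearer token and returns JSON rather than an SSE stream:

import requests

def invoke_gateway_tool(gateway_endpoint, jwt_token, tool_params):
    """Illustrative helper: call an MCP tool on the gateway with a JSON-RPC 2.0 tools/call request."""
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": tool_params,
    }
    resp = requests.post(
        gateway_endpoint,
        json=payload,
        headers={
            "Authorization": f"Bearer {jwt_token}",
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
    )
    resp.raise_for_status()
    return resp.json()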

To find the entire code sample, visit the Semantic search tutorial in the amazon-bedrock-agentcore-samples GitHub repository.
Assess Gateway performance using monitoring and observability
Amazon Bedrock AgentCore Gateway provides observability through integration with Amazon CloudWatch and AWS CloudTrail, enabling detailed monitoring and troubleshooting of your tool integrations. The observability features cover multiple dimensions of gateway operations through detailed metrics: usage metrics (TargetType, IngressAuthType, EgressAuthType, RequestsPerSession), invocation metrics (Invocations, ConcurrentExecutions, Sessions), performance metrics (Latency, Duration, TargetExecutionTime), and error rates (Throttles, SystemErrors, UserErrors). The performance metrics can be analyzed using various statistical methods (Average, Minimum, Maximum, p50, p90, p99) and are tagged with relevant dimensions for granular analysis, including Operation, Resource, and Name. For operational logging, Gateway integrates with CloudTrail to capture both management and data events, providing a complete audit trail of API interactions. The metrics are accessible through both the Amazon Bedrock AgentCore console and CloudWatch console, where you can create custom dashboards, set up automated alerts, and perform detailed performance analysis.
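As a rough illustration of pulling one of these metrics programmatically, the following sketch queries gateway latency with CloudWatch’s GetMetricStatistics API; the namespace and dimension names here are assumptions and should be confirmed against the metrics shown in the CloudWatch console for your gateway:

import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch")

# Namespace and dimensions are illustrative; confirm the exact names in the CloudWatch console
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/BedrockAgentCore",
    MetricName="Latency",
    Dimensions=[{"Name": "Resource", "Value": "<gateway-id>"}],
    StartTime=datetime.now(timezone.utc) - timedelta(hours=1),
    EndTime=datetime.now(timezone.utc),
    Period=300,
    Statistics=["Average", "Maximum"],
)
for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"], point["Maximum"])
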
Best practices
Gateway offers an enhanced debugging option through the exceptionLevel property, which can be enabled during Gateway creation or updated as shown in the following code example:

create_response = gateway_client.create_gateway(
    name='DemoGateway',
    roleArn='<IAM Role>',  # The IAM role must have permissions to create/list/get/delete gateways
    protocolType='MCP',
    authorizerType='CUSTOM_JWT',
    authorizerConfiguration=auth_config,
    description='Demo AgentCore Gateway',
    exceptionLevel='DEBUG'  # Debug mode for granular error messages
)

When activated, this feature provides more granular error messages in the content text block (with isError:true) during Gateway testing, facilitating quicker troubleshooting and integration.
When documenting and extracting OpenAPI specifications for Gateway, focus on clear, natural language descriptions that explain real-world use cases. Include detailed field descriptions, validation rules, and examples for complex data structures while maintaining consistent terminology throughout. For optimal tool discovery, incorporate relevant business domain keywords naturally in descriptions and provide context about when to use each API. Finally, test semantic search effectiveness so tools are discoverable through natural language queries. Regular reviews and updates are essential to maintain documentation quality as APIs evolve.
When extracting APIs from larger specifications, identify the core functionality needed for agent tasks, maintain semantic relationships between components, and preserve security definitions. Follow a systematic extraction process: review the full specification, map agent use cases to specific endpoints, extract relevant paths and schemas while maintaining dependencies, and validate the extracted specification.
The following are best practices for grouping your APIs into a Gateway target:

Start with the use case and group your MCP tools based on the agentic application’s business domain, similar to the domain-driven design principles applied in the microservices paradigm.
You can attach only one resource credentials provider for outbound authorization per Gateway target, so group tools based on the outbound authorizer.
Group your APIs based on the API type (OpenAPI, Smithy, or AWS Lambda) that serves as the bridge to other enterprise APIs.

When onboarding tools to Gateway, organizations should follow a structured process that includes security and vulnerability checks. Implement a review pipeline that scans API specifications for potential security risks, maintains proper authentication mechanisms, and validates data handling practices. For runtime tool discovery, use the semantic search capabilities in Gateway, but also consider design-time agent-tool mapping for critical workflows to provide predictable behavior.
Enrich tool metadata with detailed descriptions, usage examples, and performance characteristics to improve discoverability and aid in appropriate tool selection by agents. To maintain consistency across your enterprise, integrate Gateway with a centralized tool registry that serves as a single source of truth. This can be achieved using open source solutions such as the MCP Registry Publisher Tool, which publishes MCP server details to an MCP registry. Regularly synchronize Gateway’s tool inventory with this central registry for up-to-date and consistent tool availability across your AI landscape. These practices can help maintain a secure, well-organized, and efficiently discoverable set of tools within Gateway, facilitating seamless agent-tool interactions while aligning with enterprise governance standards.
What customers are saying
Innovaccer, a leading healthcare technology company, shares their experience:

“AI has massive potential in healthcare, but getting the foundation right is key. That’s why we’re building HMCP (Healthcare Model Context Protocol) on Amazon Bedrock AgentCore Gateway, which has been a game-changer, automatically converting our existing APIs into MCP-compatible tools and scaling seamlessly as we grow. It gives us the secure, flexible base we need to make sure AI agents can safely and responsibly interact with healthcare data, tools, and workflows. With this partnership, we’re accelerating AI innovation with trust, compliance, and real-world impact at the core.”
—Abhinav Shashank, CEO & Co-founder, Innovaccer

Conclusion
Amazon Bedrock AgentCore Gateway represents a significant advancement in enterprise AI agent development. By providing a fully managed, secure, and scalable solution for tool integration, Gateway enables organizations to accelerate their AI initiatives while maintaining enterprise-grade security and governance. As part of the broader Amazon Bedrock AgentCore suite, Gateway works seamlessly with other capabilities including Runtime, Identity, Code Interpreter, Memory, Browser, and Observability to provide a comprehensive platform for building and scaling AI agent applications.
For more detailed information and advanced configurations, refer to the code samples on GitHub, the Amazon Bedrock AgentCore Gateway Developer Guide, and Amazon AgentCore Gateway pricing.

About the authors
Dhawal Patel is a Principal Machine Learning Architect at Amazon Web Services (AWS). He has worked with organizations ranging from large enterprises to mid-sized startups on problems related to distributed computing and AI. He focuses on deep learning, including natural language processing (NLP) and computer vision domains. He helps customers achieve high-performance model inference on Amazon SageMaker.
Mike Liu is a Principal Product Manager at Amazon, where he works at the intersection of agentic AI and foundational model development. He led the product roadmap for Amazon Bedrock Agents and is now helping customers achieve superior performance using model customization on Amazon Nova models. Prior to Amazon, he worked on AI/ML software in Google Cloud and ML accelerators at Intel.
Kartik Rustagi works as a Software Development Manager in Amazon AI. He and his team focus on enhancing the conversation capability of chat bots powered by Amazon Lex. When not at work, he enjoys exploring the outdoors and savoring different cuisines.

Build a scalable containerized web application on AWS using the MERN s …

The MERN (MongoDB, Express, React, Node.js) stack is a popular JavaScript web development framework. The combination of technologies is well-suited for building scalable, modern web applications, especially those requiring real-time updates and dynamic user interfaces. Amazon Q Developer is a generative AI-powered assistant that improves developer efficiency across the different phases of the software development lifecycle (SDLC). In this two-part blog series, I capture the experience and demonstrate the productivity gains you can achieve by using Amazon Q Developer as a coding assistant to build a scalable MERN stack web application on AWS. The solution forms a solid foundation for you to build a feature-rich web application. In my case, using the process outlined in this blog, I extended the MERN stack web application to include real-time video conferencing (using Amazon Chime SDK) and an AI chatbot (invoking Amazon Bedrock foundation models).
Typically, in the plan phase of the SDLC, time is spent researching approaches and identifying common solution patterns that can deliver on requirements. Using Amazon Q Developer, you can speed up this process by prompting for an approach to deploy a scalable MERN stack web application on AWS. Trained on over 17 years of AWS experience building in the cloud, Amazon Q Developer responses are based on AWS well-architected patterns and best practices. In the design phase, I use the responses from Amazon Q Developer to craft a detailed requirements prompt to generate the code for your MERN stack web application. Then in the build phase, I extend the code to implement a working solution, generate unit tests and conduct an automated code review.
In part 2 of this blog series, I will use Amazon Q Developer to extend the base MERN stack web application to include a chat user interface (which invokes an agentic workflow based on the Strands Agent SDK and Amazon Bedrock), deploy the solution to AWS using infrastructure as code (IaC), troubleshoot issues and generate the documentation for our solution.
Walkthrough
Prerequisites
To complete the walkthrough in this post, you must have the following:

An AWS account to deploy the solution components to AWS.
AWS Command Line Interface (AWS CLI) installed and configured.
Docker Desktop installed.
Set up access to Amazon Q Developer by using one of the following two options:

Amazon Q Developer Free tier – Provides access to explore capabilities before opting for a paid tier and requires an AWS Builder ID profile.
Amazon Q Developer Pro tier – Paid subscription with access to additional features. Set up through IAM Identity Center.

A supported integrated development environment (IDE) including Visual Studio Code and JetBrains IDEs. For more information, follow the instructions for installing the Amazon Q Developer extension or plugin in your IDE.

Sign in to Amazon Q Developer (in your IDE)
After setting up Amazon Q Developer access tier and installing the Amazon Q extension for your IDE, you can sign in to Amazon Q Developer by using the IDE.

The first sign-in flow shows the authentication process for the Free tier using an AWS Builder ID.

The second sign-in flow shows the authentication process for the Pro tier using a sign-in URL to the AWS access portal (provided by your AWS administrator).

After successful authentication, you’ll be presented with an initial chat window to start a conversation with Amazon Q Developer. In the chat input at the bottom, you have options to add additional context for Amazon Q Developer to provide responses such as using the active file or the entire workspace, defining rules for Amazon Q Developer to follow when it generates responses, toggling agentic coding on and off, and selecting your preferred foundation model (Claude Sonnet 4 in our case).

With the Free tier, you have access to a limited number of agentic requests per month, the latest Claude models, and Amazon Q Developer in the IDE or CLI. In this post, I use the Pro tier, which, in addition to the Free tier features, provides increased limits for agentic requests and app transformation, IAM Identity Center support, and IP indemnity.
Plan
In the planning phase, you can prompt for a solution approach to better understand the different components that will make up the MERN stack web application. You would toggle agentic coding off in this phase as you research and understand the best approach. Example planning phase prompt:
“Provide a high-level summary of a solution approach to deploying a scalable MERN stack application on AWS.”
The response from Amazon Q Developer (also shown in the following screenshot) breaks down the solution into the following components:

Frontend React application
Backend NodeJS and Express containerized app running on Amazon ECS Fargate
Database using MongoDB or Amazon DocumentDB
Core network infrastructure
Security
Monitoring and operations
Continuous integration and delivery (CI/CD) pipeline
Performance

Design & Build
After reviewing the solution approach, you can create a more detailed prompt about the web application requirements, which will be used in the feature development capability of Amazon Q Developer to generate the solution components. Turn agentic coding on before submitting the prompt. Example design phase prompt:
“Build a scalable containerized web application using the MERN stack on AWS, with login and sign-up pages integrated with Amazon Cognito, a landing page that retrieves a list of shops from DocumentDB. I don’t intend to use AWS Amplify. It needs to be a modular design with components that can scale independently, running as containers using ECS and Fargate, highly available across two Availability Zones. I need to build, test and run the MERN stack locally before pushing the solution to AWS.”
As shown in the following screenshots, Amazon Q Developer will provide an architecture overview of the solution before going through the build process step by step. I will provide a select number of screenshots for illustration but note that the steps generated by Amazon Q Developer will vary for your solution prompt.

For each file that it creates or updates, Amazon Q Developer gives you the option to review the difference and undo the changes. This is an important step to understand whether the generated code meets your requirements. For example, the snippet below shows an update to the Navbar component.

When viewing the diff, you can see that Amazon Q Developer has added a new button class to fix a display issue.

Amazon Q Developer can also execute shell commands; in this case, it creates the backend and frontend directories. You have the option to ‘Reject’ or ‘Run’ the command.

Here’s a snippet of Amazon Q Developer creating the authentication service, data model and Dockerfile for the solution.

Another snippet of Amazon Q Developer creating the React frontend.

A snippet of Amazon Q Developer creating the AWS infrastructure components.

Amazon Q Developer then prompts to execute the deployment.

But I noticed that it hasn’t followed my initial prompt to “build, test and run the MERN stack locally before pushing the solution to AWS”, so I provide the following prompt:
“In my initial prompt, I asked to build, test and run the MERN stack locally before pushing the solution to AWS.”
Amazon Q Developer acknowledges my observation and makes the necessary changes for local deployment.

Next, Amazon Q Developer will build, test and run the MERN stack locally as shown below.

When reviewing the .env file changes, I noticed that the Amazon Cognito properties are not properly set, so I provide the following prompt:
“When reviewing your .env file changes, I noticed that setting COGNITO_USER_POOL_ID and COGNITO_CLIENT_ID to local-development is incorrect, as I should be connecting to Amazon Cognito in AWS. And this hasn’t been created yet. Additionally, the local deployment has been configured to connect to the local MongoDB container instead of DocumentDB.”
Amazon Q Developer again acknowledges my observation and attempts to fix the issues. These two issues highlight that to effectively use Amazon Q Developer, it’s important to review and challenge the responses provided.

After fixing the issues, Amazon Q Developer updates the README.md to reflect the updated approach and asks if I want to do a quick deployment with mocked authentication or an actual deployment with Amazon Cognito resources.

I choose option B, with real Amazon Cognito resources, so Amazon Q Developer deploys the resources as shown below.

Amazon Q Developer now checks that the frontend, backend and MongoDB containers are running.

Amazon Q Developer also tests that the application is running by executing curl commands to the application endpoints.

After successfully running the commands, Amazon Q Developer provides a summary of the results, with details on how to access and test the application.

Here’s a diagram showing the locally deployed solution.

Now that the frontend, backend, and MongoDB containers are running, you can access the frontend application Sign In page on http://localhost:3000.

Before logging in, you need to create a user. Choose the Sign Up link to enter an email and password.

After attempting to sign up, I noticed that Amazon Q Developer hasn’t generated the corresponding frontend screen to enter the confirmation code, so I prompt it to fix the issue. Again, the generated code isn’t always perfect, but it’s a good starting point.

After authentication, you’ll be routed to the shops page as shown.

Test
Now that you’ve built and can run the MERN stack web application locally, you can use Amazon Q Developer to generate unit tests to find defects and improve code quality. I provide the following prompt:
“Can you generate unit tests for the project?”
Amazon Q Developer will then create comprehensive unit tests for the application.

At completion, Amazon Q Developer will provide a summary of the unit tests generated:

Amazon Q Developer also provides instructions for executing the tests:

After executing the unit tests, Amazon Q Developer provides a summary of the results.

Review
We can now conduct a code review of the MERN stack application by prompting the following:
“Can you do a code review of my project to identify and fix any code issues?”
Amazon Q Developer will perform a code review and identify issues that require attention.

After completing the review, Amazon Q Developer will provide a summary of the critical issues fixed, along with next steps.

Clean up
To avoid incurring future charges, remove the Amazon Cognito resources that you created.
Conclusion
In a traditional SDLC, a lot of time is spent in the different phases researching approaches that can deliver on requirements: iterating over design changes, writing, testing and reviewing code, and configuring infrastructure. Amazon Q Developer is a generative AI-powered assistant that improves developer efficiency across the phases of the SDLC. In this post, you learned about the experience and saw productivity gains you can realize by using Amazon Q Developer as a coding assistant to build a scalable MERN stack web application on AWS.
In the plan phase, you used Amazon Q Developer to prompt for a solution approach to deploy a scalable MERN stack web application on AWS. Then in the design phase, you used the initial responses from Amazon Q Developer to craft a detailed requirements prompt and generated the code for your MERN stack web application. In the build phase, you customized the code and deployed a working solution locally. In the test phase, Amazon Q Developer generated the unit tests for you to identify bugs early to improve code quality. Finally, in the review phase, you conducted a code review and remediated issues identified.
In part 2 of this blog series, you will use Amazon Q Developer to extend the base MERN stack web application to include a chat user interface (which invokes an agentic workflow based on the Strands Agent SDK and Amazon Bedrock), deploy the solution to AWS using infrastructure as code (IaC), troubleshoot issues and generate the documentation for our solution.

About the Author
Bill Chan is an Enterprise Solutions Architect working with large enterprises to craft highly scalable, flexible, and resilient cloud architectures. He helps organizations understand best practices around advanced cloud-based solutions, and how to migrate existing workloads to the cloud. He enjoys relaxing with family and shooting hoops.

Optimizing Salesforce’s model endpoints with Amazon SageMaker AI inf …

This post is a joint collaboration between Salesforce and AWS and is being cross-published on both the Salesforce Engineering Blog and the AWS Machine Learning Blog.
The Salesforce AI Platform Model Serving team is dedicated to developing and managing services that power large language models (LLMs) and other AI workloads within Salesforce. Their main focus is on model onboarding, providing customers with a robust infrastructure to host a variety of ML models. Their mission is to streamline model deployment, enhance inference performance, and optimize cost efficiency, ensuring seamless integration into Agentforce and other applications requiring inference. They’re committed to enhancing model inference performance and overall efficiency by integrating state-of-the-art solutions, collaborating with leading technology providers including open source communities and cloud services such as Amazon Web Services (AWS), and building these into a unified AI platform. This helps ensure Salesforce customers receive the most advanced AI technology available while optimizing the cost-performance of the serving infrastructure.
In this post, we share how the Salesforce AI Platform team optimized GPU utilization, improved resource efficiency and achieved cost savings using Amazon SageMaker AI, specifically inference components.
The challenge with hosting models for inference: Optimizing compute and cost-to-serve while maintaining performance
Deploying models efficiently, reliably, and cost-effectively is a critical challenge for organizations of all sizes. The Salesforce AI Platform team is responsible for deploying their proprietary LLMs such as CodeGen and XGen on SageMaker AI and optimizing them for inference. Salesforce has multiple models distributed across single model endpoints (SMEs), supporting a diverse range of model sizes from a few gigabytes (GB) to 30 GB, each with unique performance requirements and infrastructure demands.
The team faced two distinct optimization challenges. Their larger models (20–30 GB) with lower traffic patterns were running on high-performance GPUs, resulting in underutilized multi-GPU instances and inefficient resource allocation. Meanwhile, their medium-sized models (approximately 15 GB) handling high-traffic workloads demanded low-latency, high-throughput processing capabilities. These models often incurred higher costs due to over-provisioning on similar multi-GPU setups. Here’s a sample illustration of Salesforce’s large and medium SageMaker endpoints and where resources are under-utilized:

Operating on Amazon EC2 P4d instances today, with plans to use the latest generation P5en instances equipped with NVIDIA H200 Tensor Core GPUs, the team sought an efficient resource optimization strategy that would maximize GPU utilization across their SageMaker AI endpoints while enabling scalable AI operations and extracting maximum value from their high-performance instances—all without compromising performance or over-provisioning hardware.
This challenge reflects a critical balance that enterprises must strike when scaling their AI operations: maximizing the performance of sophisticated AI workloads while optimizing infrastructure costs and resource efficiency. Salesforce needed a solution that would not only resolve their immediate deployment challenges but also create a flexible foundation capable of supporting their evolving AI initiatives.
To address these challenges, the Salesforce AI Platform team used SageMaker AI inference components that enabled deployment of multiple foundation models (FMs) on a single SageMaker AI endpoint with granular control over the number of accelerators and memory allocation per model. This helps improve resource utilization, reduces model deployment costs, and lets you scale endpoints together with your use cases.
Solution: Optimizing model deployment with Amazon SageMaker AI inference components
With Amazon SageMaker AI inference components, you can deploy one or more FMs on the same SageMaker AI endpoint and control how many accelerators and how much memory is reserved for each FM. This helps to improve resource utilization, reduces model deployment costs, and lets you scale endpoints together with your use cases. For each FM, you can define separate scaling policies to adapt to model usage patterns while further optimizing infrastructure costs. Here’s the illustration of Salesforce’s large and medium SageMaker endpoints after utilization has been improved with Inference Components:

An inference component abstracts ML models and enables assigning CPUs, GPU, and scaling policies per model. Inference components offer the following benefits:

SageMaker AI will optimally place and pack models onto ML instances to maximize utilization, leading to cost savings.
Each model scales independently based on custom configurations, providing optimal resource allocation to meet specific application requirements.
SageMaker AI will scale to add and remove instances dynamically to maintain availability while keeping idle compute to a minimum.
Organizations can scale down to zero copies of a model to free up resources for other models or specify to keep important models always loaded and ready to serve traffic for critical workloads.

Configuring and managing inference component endpoints
You create the SageMaker AI endpoint with an endpoint configuration that defines the instance type and initial instance count for the endpoint. The model is configured in a new construct, an inference component. Here, you specify the number of accelerators and amount of memory you want to allocate to each copy of a model, together with the model artifacts, container image, and number of model copies to deploy.
As inference requests increase or decrease, the number of copies of your inference components can also scale up or down based on your auto scaling policies. SageMaker AI will handle the placement to optimize the packing of your models for availability and cost.
In addition, if you enable managed instance auto scaling, SageMaker AI will scale compute instances according to the number of inference components that need to be loaded at a given time to serve traffic. SageMaker AI will scale up the instances and pack your instances and inference components to optimize for cost while preserving model performance.
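As a rough sketch of this flow with the SageMaker AI control plane (the endpoint, model, and component names, instance type, and sizing values below are illustrative, not Salesforce’s actual configuration):

import boto3

sm = boto3.client("sagemaker")

# 1. The endpoint config defines the instance fleet that the inference components will share
sm.create_endpoint_config(
    EndpointConfigName="shared-gpu-endpoint-config",
    ExecutionRoleArn="arn:aws:iam::<account-id>:role/<sagemaker-role>",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "InstanceType": "ml.p4d.24xlarge",
        "InitialInstanceCount": 1,
        "ManagedInstanceScaling": {"Status": "ENABLED", "MinInstanceCount": 1, "MaxInstanceCount": 4},
        "RoutingConfig": {"RoutingStrategy": "LEAST_OUTSTANDING_REQUESTS"},
    }],
)
sm.create_endpoint(EndpointName="shared-gpu-endpoint", EndpointConfigName="shared-gpu-endpoint-config")

# 2. Each model becomes an inference component with its own accelerator/memory reservation and copy count
sm.create_inference_component(
    InferenceComponentName="codegen-blockgen",
    EndpointName="shared-gpu-endpoint",
    VariantName="AllTraffic",
    Specification={
        "ModelName": "blockgen-model",  # a model previously registered with create_model
        "ComputeResourceRequirements": {
            "NumberOfAcceleratorDevicesRequired": 1,
            "MinMemoryRequiredInMb": 16384,
        },
    },
    RuntimeConfig={"CopyCount": 2},
)
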
Refer to Reduce model deployment costs by 50% on average using the latest features of Amazon SageMaker for more details on how to use inference components.
How Salesforce used Amazon SageMaker AI inference components
Salesforce has several different proprietary models such as CodeGen originally spread across multiple SMEs. CodeGen is Salesforce’s in-house open source LLM for code understanding and code generation. Developers can use the CodeGen model to translate natural language, such as English, into programming languages, such as Python. Salesforce developed an ensemble of CodeGen models (Inline for automatic code completion, BlockGen for code block generation, and FlowGPT for process flow generation) specifically tuned for the Apex programming language. The models are being used in ApexGuru, a solution within the Salesforce platform that helps Salesforce developers tackle critical anti-patterns and hotspots in their Apex code.
Inference components enable multiple models to share GPU resources efficiently on the same endpoint. This consolidation not only delivers a reduction in infrastructure costs through intelligent resource sharing and dynamic scaling, it also reduces operational overhead with fewer endpoints to manage. For their CodeGen ensemble models, the solution enabled model-specific resource allocation and independent scaling based on traffic patterns, providing optimal performance while maximizing infrastructure utilization.
To expand hosting options on SageMaker AI without affecting stability, performance, or usability, Salesforce introduced inference component endpoints alongside the existing SME.
This hybrid approach uses the strengths of each. SMEs provide dedicated hosting for each model and predictable performance for critical workloads with consistent traffic patterns, and inference components optimize resource utilization for variable workloads through dynamic scaling and efficient GPU sharing.
The Salesforce AI Platform team created a SageMaker AI endpoint with the desired instance type and initial instance count for the endpoint to handle their baseline inference requirements. Model packages are then attached dynamically, spinning up individual containers as needed. They configured each model, for example, BlockGen and TextEval models as individual inference components specifying precise resource allocations, including accelerator count, memory requirements, model artifacts, container image, and number of model copies to deploy. With this approach, Salesforce could efficiently host multiple model variants on the same endpoint while maintaining granular control over resource allocation and scaling behaviors.
By using the auto scaling capabilities, inference components can set up endpoints with multiple copies of models and automatically adjust GPU resources as traffic fluctuates. This allows each model to dynamically scale up or down within an endpoint based on configured GPU limits. By hosting multiple models on the same endpoint and automatically adjusting capacity in response to traffic fluctuations, Salesforce was able to significantly reduce the costs associated with traffic spikes. This means that Salesforce AI models can handle varying workloads efficiently without compromising performance. The graphic below shows Salesforce’s endpoints before and after the models were deployed with inference components:

This solution has brought several key benefits:

Optimized resource allocation – Multiple models now efficiently share GPU resources, eliminating unnecessary provisioning while maintaining optimal performance.
Cost savings – Through intelligent GPU resource management and dynamic scaling, Salesforce achieved significant reduction in infrastructure costs while eliminating idle compute resources.
Enhanced performance for smaller models – Smaller models now use high-performance GPUs to meet their latency and throughput needs without incurring excessive costs.
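
Copy-count auto scaling for an individual inference component, as described earlier, is configured through Application Auto Scaling. A minimal sketch, reusing the illustrative component name from the previous snippet:

import boto3

autoscaling = boto3.client("application-autoscaling")

resource_id = "inference-component/codegen-blockgen"  # illustrative component name

# Register the copy count of the inference component as a scalable target
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:inference-component:DesiredCopyCount",
    MinCapacity=1,
    MaxCapacity=4,
)

# Scale copies on invocation load using target tracking
autoscaling.put_scaling_policy(
    PolicyName="codegen-blockgen-copy-scaling",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:inference-component:DesiredCopyCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 5.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerInferenceComponentInvocationsPerCopy"
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 60,
    },
)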

By refining GPU allocation at the model level through inference components, Salesforce improved resource efficiency and achieved a substantial reduction in operational cost while maintaining the high-performance standards their customers expect across a wide range of AI workloads. The cost savings are substantial and open up new opportunities for using high-end, expensive GPUs in a cost-effective manner.
Conclusion
Through their implementation of Amazon SageMaker AI inference components, Salesforce has transformed their AI infrastructure management, achieving up to an eight-fold reduction in deployment and infrastructure costs while maintaining high performance standards. The team learned that intelligent model packing and dynamic resource allocation were key to solving their GPU utilization challenges across their diverse model portfolio. This implementation has transformed performance economics, allowing smaller models to use high-performance GPUs, providing high throughput and low latency without the traditional cost overhead.
Today, their AI platform efficiently serves both large proprietary models such as CodeGen and smaller workloads on the same infrastructure, with optimized resource allocation ensuring high-performance delivery. With this approach, Salesforce can maximize the utilization of compute instances, scale to hundreds of models, and optimize costs while providing predictable performance. This solution has not only solved their immediate challenges of optimizing GPU utilization and cost management but has also positioned them for future growth. By establishing a more efficient and scalable infrastructure foundation, Salesforce can now confidently expand their AI offerings and explore more advanced use cases with expensive, high-performance GPUs such as P4d, P5, and P5en, knowing they can maximize the value of every computing resource. This transformation represents a significant step forward in their mission to deliver enterprise-grade AI solutions while maintaining operational efficiency and cost-effectiveness.
Looking ahead, Salesforce is poised to use the new Amazon SageMaker AI rolling updates capability for inference component endpoints, a feature designed to streamline updates for models of different sizes while minimizing operational overhead. This advancement will enable them to update their models batch by batch, rather than using the traditional blue/green deployment method, providing greater flexibility and control over model updates while using minimal extra instances, rather than requiring doubled instances as in the past. By implementing these rolling updates alongside their existing dynamic scaling infrastructure and incorporating real-time safety checks, Salesforce is building a more resilient and adaptable AI platform. This strategic approach not only provides cost-effective and reliable deployments for their GPU-intensive workloads but also sets the stage for seamless integration of future AI innovations and model improvements.
Check out How Salesforce achieves high-performance model deployment with Amazon SageMaker AI to learn more. For more information on how to get started with SageMaker AI, refer to Guide to getting set up with Amazon SageMaker AI. To learn more about Inference Components, refer to Amazon SageMaker adds new inference capabilities to help reduce foundation model deployment costs and latency.

About the Authors
Rishu Aggarwal is a Director of Engineering at Salesforce based in Bangalore, India. Rishu leads the Salesforce AI Platform Model Serving Engineering team in solving the complex problems of inference optimizations and deployment of LLMs at scale within the Salesforce ecosystem. Rishu is a staunch Tech Evangelist for AI and has deep interests in Artificial Intelligence, Generative AI, Neural Networks and Big Data.
Rielah De Jesus is a Principal Solutions Architect at AWS who has successfully helped various enterprise customers in the DC, Maryland, and Virginia area move to the cloud. In her current role, she acts as a customer advocate and technical advisor focused on helping organizations like Salesforce achieve success on the AWS platform. She is also a staunch supporter of women in IT and is very passionate about finding ways to creatively use technology and data to solve everyday challenges.
Pavithra Hariharasudhan is a Senior Technical Account Manager and Enterprise Support Lead at AWS, supporting leading AWS Strategic customers with their global cloud operations. She assists organizations in resolving operational challenges and maintaining efficient AWS environments, empowering them to achieve operational excellence while accelerating business outcomes.
Ruchita Jadav is a Senior Member of Technical Staff at Salesforce, with over 10 years of experience in software and machine learning engineering. Her expertise lies in building scalable platform solutions across the retail and CRM domains. At Salesforce, she leads initiatives focused on model hosting, inference optimization, and LLMOps, enabling efficient and scalable deployment of AI and large language models. She holds a Bachelor of Technology in Electronics & Communication from Gujarat Technological University (GTU).
Marc Karp is an ML Architect with the Amazon SageMaker Service team. He focuses on helping customers design, deploy, and manage ML workloads at scale. In his spare time, he enjoys traveling and exploring new places.

Guardrails AI Introduces Snowglobe: The Simulation Engine for AI Agent …

Guardrails AI has announced the general availability of Snowglobe, a breakthrough simulation engine designed to address one of the thorniest challenges in conversational AI: reliably testing AI Agents/chatbots at scale before they ever reach production.

Tackling an Infinite Input Space with Simulation

Evaluating AI agents—especially open-ended chatbots—has traditionally required painstaking manual scenario creation. Developers might spend weeks hand-crafting a small “golden dataset” meant to catch critical errors, but this approach struggles with the infinite variety of real-world inputs and unpredictable user behaviors. As a result, many failure modes—off-topic answers, hallucinations, or behavior that violates brand policy—slip through the cracks and emerge only after deployment, where stakes are much higher.

Snowglobe draws direct inspiration from the rigorous simulation practices adopted by the self-driving car industry. For example, Waymo’s vehicles logged 20+ million real-world miles, but over 20 billion simulated ones. These high-fidelity test environments allow edge cases and rare scenarios—impractical or unsafe to test in reality—to be explored safely and with confidence. Guardrails AI believes chatbots require the same robust regime: systematic, automated simulation at massive scale to expose failures in advance.

How Snowglobe Works

Snowglobe makes it easy to simulate realistic user conversations by automatically deploying diverse, persona-driven agents to interact with your chatbot API. In minutes, it can generate hundreds or thousands of multi-turn dialogues, covering a broad sweep of intents, tones, adversarial tactics, and rare edge cases. Key features include:

Persona Modeling: Unlike basic script-driven synthetic data, Snowglobe constructs nuanced user personas for rich, authentic diversity. This avoids the trap of robotic, repetitive test data that fails to mimic real user language and motivations.

Full Conversation Simulation: It creates realistic, multi-turn dialogues—not just single prompts—surfacing subtle failure modes that only emerge in complex interactions.

Automated Labeling: Every generated scenario is judge-labeled, producing datasets useful both for evaluation and for fine-tuning chatbots.

Insightful Reporting: Snowglobe produces detailed analyses that pinpoint failure patterns and guide iterative improvement, whether for QA, reliability validation, or regulatory review.

Source: https://snowglobe.so/

Who Benefits?

Conversational AI teams stuck with small, hand-built test sets can immediately expand coverage and find issues missed by manual review.

Enterprises needing reliable, robust chatbots for high-stakes domains—finance, healthcare, legal, aviation—can preempt risks like hallucination or sensitive data leaks by running wide-ranging simulated tests before launch.

Research & Regulatory Bodies use Snowglobe to measure AI agent risk and reliability with metrics grounded in realistic user simulation.

Real-World Impact

Organizations such as Changi Airport Group, Masterclass, and IMDA AI Verify have already used Snowglobe to simulate hundreds or thousands of conversations. Feedback highlights the tool’s ability to reveal overlooked failure modes, produce informative risk assessments, and supply high-quality datasets for model improvement and compliance.

Bringing Simulation-First Engineering to Conversational AI

With Snowglobe, Guardrails AI is transferring proven simulation strategies from autonomous vehicles to the world of conversational AI. Developers can now embrace a simulation-first mindset, running thousands of pre-launch scenarios so problems—no matter how rare—are found before real users experience them.

Snowglobe is now live and available for use, marking a significant step forward in reliable AI agent deployment and accelerating the pathway to safer, smarter chatbots.

FAQs

1. What is Snowglobe?
Snowglobe is Guardrails AI’s simulation engine for AI agents and chatbots. It generates large numbers of realistic, persona-driven conversations to evaluate and improve chatbot performance at scale.

2. Who can benefit from using Snowglobe?
Conversational AI teams, enterprises in regulated industries, and research organizations can use Snowglobe to identify chatbot blind spots and create labeled datasets for fine-tuning.

3. How is it different from manual testing?
Instead of taking weeks to manually create limited test scenarios, Snowglobe can produce hundreds or thousands of multi-turn conversations in minutes, covering a wider variety of situations and edge cases.

4. Why is simulation important for chatbot development?
Like simulation in self-driving car testing, it helps find rare and high-risk scenarios safely before real users encounter them, reducing costly failures in production.

Try it here: https://snowglobe.so/

Google AI Introduces Gemma 3 270M: A Compact Model for Hyper-Efficient …

Google AI has expanded the Gemma family with the introduction of Gemma 3 270M, a lean, 270-million-parameter foundation model built explicitly for efficient, task-specific fine-tuning. This model demonstrates robust instruction-following and advanced text structuring capabilities “out of the box,” meaning it’s ready for immediate deployment and customization with minimal additional training.

Design Philosophy: “Right Tool for the Job”

Unlike large-scale models aimed at general-purpose comprehension, Gemma 3 270M is crafted for targeted use cases where efficiency outweighs sheer power. This is crucial for scenarios like on-device AI, privacy-sensitive inference, and high-volume, well-defined tasks such as text classification, entity extraction, and compliance checking.

Core Features

Massive 256k Vocabulary for Expert Tuning: Gemma 3 270M devotes roughly 170 million parameters to its embedding layer, supporting a huge 256,000-token vocabulary. This allows it to handle rare and specialized tokens, making it exceptionally fit for domain adaptation, niche industry jargon, or custom language tasks.

Extreme Energy Efficiency for On-Device AI: Internal benchmarks show the INT4-quantized version consumes less than 1% battery on a Pixel 9 Pro for 25 typical conversations—making it the most power-efficient Gemma yet. Developers can now deploy capable models to mobile, edge, and embedded environments without sacrificing responsiveness or battery life.

Production-Ready with INT4 Quantization-Aware Training (QAT): Gemma 3 270M arrives with Quantization-Aware Training checkpoints, so it can operate at 4-bit precision with negligible quality loss. This unlocks production deployments on devices with limited memory and compute, allowing for local, encrypted inference and increased privacy guarantees.

Instruction-Following Out of the Box: Available as both a pre-trained and instruction-tuned model, Gemma 3 270M can understand and follow structured prompts instantly, while developers can further specialize behavior with just a handful of fine-tuning examples.

Model Architecture Highlights

Total Parameters: 270M
Embedding Parameters: ~170M
Transformer Blocks: ~100M
Vocabulary Size: 256,000 tokens
Context Window: 32K tokens (1B and 270M sizes)
Precision Modes: BF16, SFP8, INT4 (QAT)
Min. RAM Use (Q4_0): ~240MB

Fine-Tuning: Workflow & Best Practices

Gemma 3 270M is engineered for rapid, expert fine-tuning on focused datasets. The official workflow, illustrated in Google’s Hugging Face Transformers guide, involves:

Dataset Preparation: Small, well-curated datasets are often sufficient. For example, teaching a conversational style or a specific data format may require just 10–20 examples.

Trainer Configuration: Leveraging Hugging Face TRL’s SFTTrainer and configurable optimizers (AdamW, constant scheduler, etc.), the model can be fine-tuned and evaluated, with monitoring for overfitting or underfitting by comparing training and validation loss curves (a minimal sketch follows this list).

Evaluation: Post-training, inference tests show dramatic persona and format adaptation. Overfitting, typically an issue, becomes beneficial here—ensuring models “forget” general knowledge for highly specialized roles (e.g., roleplaying game NPCs, custom journaling, sector compliance).

Deployment: Models can be pushed to Hugging Face Hub, and run on local devices, cloud, or Google’s Vertex AI with near-instant loading and minimal computational overhead.
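
As referenced above, here is a minimal fine-tuning sketch with Hugging Face TRL’s SFTTrainer. It assumes recent transformers/trl releases and that the checkpoint is published on the Hub as google/gemma-3-270m; the tiny dataset and hyperparameters are illustrative only:

from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

model_id = "google/gemma-3-270m"  # assumed Hub id for the 270M checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Tiny illustrative dataset; real tasks typically need 10-20+ curated examples
train_dataset = Dataset.from_list([
    {"messages": [
        {"role": "user", "content": "Classify the sentiment: 'Great battery life.'"},
        {"role": "assistant", "content": "positive"},
    ]},
    {"messages": [
        {"role": "user", "content": "Classify the sentiment: 'The screen cracked on day one.'"},
        {"role": "assistant", "content": "negative"},
    ]},
])

args = SFTConfig(
    output_dir="gemma-3-270m-sentiment",
    num_train_epochs=3,
    per_device_train_batch_size=1,
    learning_rate=5e-5,
    logging_steps=1,
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # tokenizer used to apply the chat template
)
trainer.train()
trainer.save_model()  # ready to push to the Hub or run locally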

Real-World Applications

Companies like Adaptive ML and SK Telecom have used Gemma models (4B size) to outperform larger proprietary systems in multilingual content moderation—demonstrating Gemma’s specialization advantage. Smaller models like 270M empower developers to:

Maintain multiple specialized models for different tasks, reducing cost and infrastructure demands.

Enable rapid prototyping and iteration thanks to its size and computational frugality.

Ensure privacy by executing AI exclusively on-device, with no need to transfer sensitive user data to the cloud.

Conclusion:

Gemma 3 270M marks a paradigm shift toward efficient, fine-tunable AI—giving developers the ability to deploy high-quality, instruction-following models for extremely focused needs. Its blend of compact size, power efficiency, and open-source flexibility makes it not just a technical achievement, but a practical solution for the next generation of AI-driven applications.

Check out the Technical details here and the Model on Hugging Face.

Meta AI Just Released DINOv3: A State-of-the-Art Computer Vision Model …

Meta AI has just released DINOv3, a breakthrough self-supervised computer vision model that sets new standards for versatility and accuracy across dense prediction tasks, all without the need for labeled data. DINOv3 employs self-supervised learning (SSL) at an unprecedented scale, training on 1.7 billion images with a 7 billion parameter architecture. For the first time, a single frozen vision backbone outperforms domain-specialized solutions across multiple visual tasks, such as object detection, semantic segmentation, and video tracking—requiring no fine-tuning for adaptation.

Key Innovations and Technical Highlights

Label-free SSL Training: DINOv3 is trained entirely without human annotations, making it ideal for domains where labels are scarce or expensive, including satellite imagery, biomedical applications, and remote sensing.

Scalable Backbone: DINOv3’s backbone is universal and frozen, producing high-resolution image features that are directly usable with lightweight adapters for diverse downstream applications. It outperforms both domain-specific and previous self-supervised models on leading dense-task benchmarks (a minimal feature-extraction sketch follows this list).

Model Variants for Deployment: Meta is releasing not only the massive ViT-G backbone but also distilled versions (ViT-B, ViT-L) and ConvNeXt variants to support a spectrum of deployment scenarios, from large-scale research to resource-limited edge devices.

Commercial & Open Release: DINOv3 is distributed under a commercial license along with full training and evaluation code, pre-trained backbones, downstream adapters, and sample notebooks to accelerate research, innovation, and commercial product integration.

Real-world Impact: Already, organizations such as the World Resources Institute and NASA’s Jet Propulsion Laboratory are using DINOv3: it has dramatically improved the accuracy of forestry monitoring (reducing tree canopy height error from 4.1m to 1.2m in Kenya) and supported vision for Mars exploration robots with minimal compute overhead.

Generalization & Annotation Scarcity: By employing SSL at scale, DINOv3 closes the gap between general and task-specific vision models. It eliminates reliance on web captions or curation, leveraging unlabeled data for universal feature learning and enabling applications in fields where annotation is bottlenecked.
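
To make the frozen-backbone pattern concrete, here is a minimal sketch of feature extraction plus a lightweight linear adapter. It uses the earlier DINOv2 checkpoint that ships with Hugging Face transformers as a stand-in, since the exact DINOv3 model identifiers and loading path may differ; the classifier head, class count, and image path are illustrative:

import torch
from torch import nn
from transformers import AutoImageProcessor, AutoModel
from PIL import Image

# Frozen SSL backbone (DINOv2 as a stand-in for the DINOv3 release)
processor = AutoImageProcessor.from_pretrained("facebook/dinov2-base")
backbone = AutoModel.from_pretrained("facebook/dinov2-base").eval()
for p in backbone.parameters():
    p.requires_grad = False  # the backbone stays frozen; only the adapter is trained

# Lightweight adapter: a linear classifier over the pooled features (10 classes, illustrative)
adapter = nn.Linear(backbone.config.hidden_size, 10)

image = Image.open("example.jpg").convert("RGB")  # any local image
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    features = backbone(**inputs).pooler_output  # (1, hidden_size) global image feature

logits = adapter(features)
print(logits.shape)  # torch.Size([1, 10])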

Comparison of DINOv3 Capabilities

Training Data – DINO/DINOv2: up to 142M images; DINOv3: 1.7B images
Parameters – DINO/DINOv2: up to 1.1B; DINOv3: 7B
Backbone Fine-tuning – DINO/DINOv2: not required; DINOv3: not required
Dense Prediction Tasks – DINO/DINOv2: strong performance; DINOv3: outperforms specialists
Model Variants – DINO/DINOv2: ViT-S/B/L/g; DINOv3: ViT-B/L/G, ConvNeXt
Open Source Release – DINO/DINOv2: yes; DINOv3: commercial license, full suite

Conclusion

DINOv3 represents a major leap in computer vision: its frozen universal backbone and SSL approach enable researchers and developers to tackle annotation-scarce tasks, deploy high-performance models quickly, and adapt to new domains simply by swapping lightweight adapters. Meta’s release includes everything needed for academic or industrial use, fostering broad collaboration in the AI and computer vision community.

The DINOv3 package—models and code—is now available for commercial research and deployment, marking a new chapter for robust, scalable AI vision systems.

Check out the Paper, the Models on Hugging Face, and the GitHub Page.