Accelerating HPC and AI research in universities with Amazon SageMaker HyperPod

This post was written with Mohamed Hossam of Brightskies.
Research universities engaged in large-scale AI and high-performance computing (HPC) often face significant infrastructure challenges that impede innovation and delay research outcomes. Traditional on-premises HPC clusters come with long GPU procurement cycles, rigid scaling limits, and complex maintenance requirements. These obstacles restrict researchers’ ability to iterate quickly on AI workloads such as natural language processing (NLP), computer vision, and foundation model (FM) training. Amazon SageMaker HyperPod alleviates the undifferentiated heavy lifting involved in building AI models. It helps quickly scale model development tasks such as training, fine-tuning, or inference across a cluster of hundreds or thousands of AI accelerators (such as NVIDIA H100 and A100 GPUs), integrated with preconfigured HPC tools and automated scaling.
In this post, we demonstrate how a research university implemented SageMaker HyperPod to accelerate AI research by using dynamic SLURM partitions, fine-grained GPU resource management, budget-aware compute cost tracking, and multi-login node load balancing—all integrated seamlessly into the SageMaker HyperPod environment.
Solution overview
Amazon SageMaker HyperPod is designed to support large-scale machine learning operations for researchers and ML scientists. The service is fully managed by AWS, removing operational overhead while maintaining enterprise-grade security and performance.
The following architecture diagram illustrates how to access SageMaker HyperPod to submit jobs. End users can use AWS Site-to-Site VPN, AWS Client VPN, or AWS Direct Connect to securely access the SageMaker HyperPod cluster. These connections terminate on the Network Load Balancer that efficiently distributes SSH traffic to login nodes, which are the primary entry points for job submission and cluster interaction. At the core of the architecture is SageMaker HyperPod compute, a controller node that orchestrates cluster operations, and multiple compute nodes arranged in a grid configuration. This setup supports efficient distributed training workloads with high-speed interconnects between nodes, all contained within a private subnet for enhanced security.
The storage infrastructure is built around two main components: Amazon FSx for Lustre provides high-performance file system capabilities, and Amazon Simple Storage Service (Amazon S3) offers dedicated storage for datasets and checkpoints. This dual-storage approach delivers both fast data access for training workloads and secure persistence of valuable training artifacts.

The implementation consisted of several stages. In the following steps, we demonstrate how to deploy and configure the solution.
Prerequisites
Before deploying Amazon SageMaker HyperPod, make sure the following prerequisites are in place:

AWS configuration:

The AWS Command Line Interface (AWS CLI) configured with appropriate permissions
Cluster configuration files prepared: cluster-config.json and provisioning-parameters.json

Network setup:

A virtual private cloud (VPC) configured for cluster resources
Security groups with Elastic Fabric Adapter (EFA) communication enabled
An Amazon FSx for Lustre file system for shared, high-performance storage

An AWS Identity and Access Management (IAM) role with permissions for the following:

Amazon Elastic Compute Cloud (Amazon EC2) instance and Amazon SageMaker cluster management
FSx for Lustre and Amazon Simple Storage Service (Amazon S3) access
Amazon CloudWatch Logs and AWS Systems Manager integration
EFA network configuration

Launch the CloudFormation stack
We launched an AWS CloudFormation stack to provision the necessary infrastructure components, including a VPC and subnet, FSx for Lustre file system, S3 bucket for lifecycle scripts and training data, and IAM roles with scoped permissions for cluster operation. Refer to the Amazon SageMaker HyperPod workshop for CloudFormation templates and automation scripts.
Customize SLURM cluster configuration
To align compute resources with departmental research needs, we created SLURM partitions to reflect the organizational structure, for example, NLP, computer vision, and deep learning teams. We used the SLURM partition configuration to define slurm.conf with custom partitions. SLURM accounting was enabled by configuring slurmdbd and linking usage to departmental accounts and supervisors.
To support fractional GPU sharing and efficient utilization, we enabled Generic Resource (GRES) configuration. With GPU sharding, multiple users can access GPUs on the same node without contention. The GRES setup followed the guidelines from the Amazon SageMaker HyperPod workshop.
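The exact layout depends on the institution, but the following slurm.conf and gres.conf excerpts give a minimal sketch of such a setup; the partition names, node names, GPU type, and time limits are illustrative assumptions rather than the university's actual configuration:

# slurm.conf (excerpt): departmental partitions and accounting (names and limits are examples)
PartitionName=nlp Nodes=ip-10-1-0-[1-4]  MaxTime=24:00:00 State=UP
PartitionName=cv  Nodes=ip-10-1-0-[5-8]  MaxTime=24:00:00 State=UP
PartitionName=dl  Nodes=ip-10-1-0-[9-12] MaxTime=48:00:00 State=UP Default=YES
GresTypes=gpu
AccountingStorageType=accounting_storage/slurmdbd

# gres.conf (excerpt): expose each node's GPUs as schedulable generic resources
NodeName=ip-10-1-0-[1-12] Name=gpu Type=h100 File=/dev/nvidia[0-7]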
Provision and validate the cluster
We validated the cluster-config.json and provisioning-parameters.json files using the AWS CLI and a SageMaker HyperPod validation script:

$ curl -O https://raw.githubusercontent.com/aws-samples/awsome-distributed-training/main/1.architectures/5.sagemaker-hyperpod/validate-config.py

$ pip3 install boto3

$ python3 validate-config.py --cluster-config cluster-config.json --provisioning-parameters provisioning-parameters.json

Then we created the cluster:

$ aws sagemaker create-cluster \
    --cli-input-json file://cluster-config.json \
    --region us-west-2
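Cluster provisioning takes some time. You can track progress with the describe-cluster command; the cluster name below is a placeholder:

$ aws sagemaker describe-cluster \
    --cluster-name ml-cluster \
    --region us-west-2 \
    --query 'ClusterStatus'
# The cluster is ready for job submission when the status reaches InService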

Implement cost tracking and budget enforcement
To monitor usage and control costs, each SageMaker HyperPod resource (for example, Amazon EC2, FSx for Lustre, and others) was tagged with a unique ClusterName tag. AWS Budgets and AWS Cost Explorer reports were configured to track monthly spending per cluster. Additionally, alerts were set up to notify researchers if they approached their quota or budget thresholds.
This integration helped facilitate efficient utilization and predictable research spending.
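As an illustration, a monthly budget scoped to the ClusterName cost allocation tag can be created from the AWS CLI; the account ID, budget name, dollar amount, and tag value below are placeholders, not the university's actual configuration:

$ aws budgets create-budget \
    --account-id 111122223333 \
    --budget '{
        "BudgetName": "hyperpod-ml-cluster-monthly",
        "BudgetLimit": {"Amount": "5000", "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
        "CostFilters": {"TagKeyValue": ["user:ClusterName$ml-cluster"]}
    }'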
Enable load balancing for login nodes
As the number of concurrent users increased, the university adopted a multi-login node architecture. Two login nodes were deployed in EC2 Auto Scaling groups. A Network Load Balancer was configured with target groups to route SSH and Systems Manager traffic. Lastly, AWS Lambda functions enforced session limits per user using Run-As tags with Session Manager, a capability of Systems Manager.
For details about the full implementation, see Implementing login node load balancing in SageMaker HyperPod for enhanced multi-user experience.
Configure federated access and user mapping
To facilitate secure and seamless access for researchers, the institution integrated AWS IAM Identity Center with their on-premises Active Directory (AD) using AWS Directory Service. This allowed for unified control and administration of user identities and access privileges across SageMaker HyperPod accounts. The implementation consisted of the following key components:

Federated user integration – We mapped AD users to POSIX user names using Session Manager run-as tags, allowing fine-grained control over compute node access
Secure session management – We configured Systems Manager to make sure users access compute nodes using their own accounts, not the default ssm-user
Identity-based tagging – Federated user names were automatically mapped to user directories, workloads, and budgets through resource tags

For full step-by-step guidance, refer to the Amazon SageMaker HyperPod workshop.
This approach streamlined user provisioning and access control while maintaining strong alignment with institutional policies and compliance requirements.
Post-deployment optimizations
To help prevent unnecessary consumption of compute resources by idle sessions, the university configured SLURM with Pluggable Authentication Modules (PAM). This setup enforces automatic logout for users after their SLURM jobs are complete or canceled, supporting prompt availability of compute nodes for queued jobs.
The configuration improved job scheduling throughput by freeing idle nodes immediately and reduced administrative overhead in managing inactive sessions.
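One common way to implement this behavior (assumed here, because the exact module is not specified) is the pam_slurm_adopt module on the compute nodes; the following excerpts sketch that setup:

# /etc/pam.d/sshd on compute nodes: adopt SSH sessions into the user's running job
# so they are terminated automatically when the job completes or is canceled
account    required    pam_slurm_adopt.so

# slurm.conf: required so the job's extern step exists for adopted sessions to join
PrologFlags=Contain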
Additionally, QoS policies were configured to control resource consumption, limit job durations, and enforce fair GPU access across users and departments. For example:

MaxTRESPerUser – Makes sure GPU or CPU usage per user stays within defined limits
MaxWallDurationPerJob – Helps prevent excessively long jobs from monopolizing nodes
Priority weights – Align scheduling priority with research group or project (see the configuration sketch following this list)
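The following sacctmgr commands sketch how such a QoS might be defined and attached to a departmental account; the QoS name, account name, and limit values are illustrative, not the university's actual settings:

# Define a QoS with per-user GPU/CPU caps and a wall-time limit (example values)
$ sacctmgr add qos dept-normal
$ sacctmgr modify qos dept-normal set MaxTRESPerUser=gres/gpu=4,cpu=64 MaxWallDurationPerJob=24:00:00

# Attach the QoS to a departmental account and adjust its fair-share priority weight
$ sacctmgr modify account nlp set qos=dept-normal fairshare=100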

These enhancements facilitated an optimized, balanced HPC environment that aligns with the shared infrastructure model of academic research institutions.
Clean up
To delete the resources and avoid incurring ongoing charges, complete the following steps:

Delete the SageMaker HyperPod cluster:

$ aws sagemaker delete-cluster --cluster-name <name>

Delete the CloudFormation stack used for the SageMaker HyperPod infrastructure:

$ aws cloudformation delete-stack --stack-name <stack-name> --region <region>

This will automatically remove associated resources, such as the VPC and subnets, FSx for Lustre file system, S3 bucket, and IAM roles. If you created these resources outside of CloudFormation, you must delete them manually.
Conclusion
SageMaker HyperPod provides research universities with a powerful, fully managed HPC solution tailored for the unique demands of AI workloads. By automating infrastructure provisioning, scaling, and resource optimization, institutions can accelerate innovation while maintaining budget control and operational efficiency. Through customized SLURM configurations, GPU sharing using GRES, federated access, and robust login node balancing, this solution highlights the potential of SageMaker HyperPod to transform research computing, so researchers can focus on science, not infrastructure.
For more details on making the most of SageMaker HyperPod, check out the SageMaker HyperPod workshop and explore further blog posts about SageMaker HyperPod.

About the authors
Tasneem Fathima is Senior Solutions Architect at AWS. She supports Higher Education and Research customers in the United Arab Emirates to adopt cloud technologies, improve their time to science, and innovate on AWS.
Mohamed Hossam is a Senior HPC Cloud Solutions Architect at Brightskies, specializing in high-performance computing (HPC) and AI infrastructure on AWS. He supports universities and research institutions across the Gulf and Middle East in harnessing GPU clusters, accelerating AI adoption, and migrating HPC/AI/ML workloads to the AWS Cloud. In his free time, Mohamed enjoys playing video games.

Exploring the Real-Time Race Track with Amazon Nova

This post is co-written by Jake Friedman, President + Co-founder of Wildlife.
Amazon Nova is enhancing sports fan engagement through an immersive Formula 1 (F1)-inspired experience that turns traditional spectators into active participants. This post explores the Real-Time Race Track (RTRT), an interactive experience built using Amazon Nova in Amazon Bedrock, that lets fans design, customize, and share their own racing circuits. We highlight how generative AI capabilities come together to deliver strategic racing insights such as pit timing and tire choices, and interactive features like an AI voice assistant and a retro-style racing poster.
Evolving fan expectations and the technical barriers to real-time, multimodal engagement
Today’s sports audiences expect more than passive viewing—they want to participate, customize, and share. As fan expectations evolve, delivering engaging and interactive experiences has become essential to keeping audiences invested. Static digital content no longer holds attention; fans are drawn to immersive formats that make it possible to influence or co-create aspects of the event. For brands and rights holders, this shift presents both an opportunity and a challenge: how to deliver dynamic, meaningful engagement at scale. Delivering this level of interactivity comes with a unique set of technical challenges. It requires support for multiple modalities—text, speech, image, and data—working together in real time to create a seamless and immersive experience. Because fan-facing experiences are often offered for free, cost-efficiency becomes critical to sustain engagement at scale. And with users expecting instant responses, maintaining low-latency performance across interactions is essential to avoid disrupting the experience.
Creating immersive fan engagement with the RTRT using Amazon Nova
To foster an engaging and immersive experience, we developed the Real-Time Race Track, allowing F1 fans to design their own custom racing circuit using Amazon Nova. You can draw your track in different lengths and shapes while receiving real-time AI recommendations to modify your racing conditions. You can choose any location around the world for your race track and Amazon Nova Pro will use it to generate your track’s name and simulate realistic track conditions using that region’s weather and climate data. When your track is complete, Amazon Nova Pro analyzes the track to produce metrics like top speed and projected lap time, and offers two viable race strategies focused on tire management. You can also consult with Amazon Nova Sonic, a speech-to-speech model, for strategic track design recommendations. The experience culminates with Amazon Nova Canvas generating a retro-inspired racing poster of your custom track design that you can share or download. The following screenshots show some examples of the RTRT interface.

Amazon Nova models are cost-effective and deliver among the best price-performance in their respective class, helping enterprises create scalable fan experiences while managing costs effectively. With fast speech processing and high efficiency, Amazon Nova provides seamless, real-time, multimodal interactions that meet the demands of interactive fan engagement. Additionally, Amazon Nova comes with built-in controls to maintain the safe and responsible use of AI. Combining comprehensive capabilities, cost-effectiveness, low latency, and trusted reliability, Amazon Nova is the ideal solution for applications requiring real-time, dynamic engagement.
Prompts, inputs, and system design behind the RTRT experience
The RTRT uses the multimodal capabilities of Amazon Nova Pro to effectively lead users from a single line path drawing to a fully viable race track design, including strategic racing recommendations and a bold visual representation of their circuit in the style of a retro racing poster.
The following diagram gives an overview of the system architecture.

Prompt engineering plays a crucial role in delivering structured output that can flow seamlessly into the UI, which has been optimized for at-a-glance takeaways that use Amazon Nova Pro to quickly analyze multiple data inputs to accelerate users’ decision making. In the RTRT, this extends to the input images provided to Amazon Nova Pro for vision analysis. Each time the user adds new segments to their racing circuit, a version of the path is relayed to Amazon Nova Pro with visible coordinate markers that produce accurate path analysis (see the following screenshot) and corresponding output data, which can be visually represented back to users with color-coded track sectors.

This is paired with multiple system prompts to define the role of Amazon Nova Pro at each stage of the app, as well as to return responses that are ready to be consumed by the front end.
The following is a prompt example:

The system is designed to analyze the input image of a completed racetrack path outline.
You must always return valid JSON.
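To show how a system prompt like this is typically combined with the track image, the following boto3 sketch calls Amazon Nova Pro through the Amazon Bedrock Converse API; the model ID, file name, user prompt, and inference settings are illustrative assumptions, not the production implementation:

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Read the rendered track path (with visible coordinate markers) as raw image bytes
with open("track_path.png", "rb") as f:
    track_image = f.read()

response = bedrock_runtime.converse(
    modelId="amazon.nova-pro-v1:0",  # assumed model ID for Amazon Nova Pro
    system=[{"text": "The system is designed to analyze the input image of a completed "
                     "racetrack path outline. You must always return valid JSON."}],
    messages=[{
        "role": "user",
        "content": [
            {"image": {"format": "png", "source": {"bytes": track_image}}},
            {"text": "Analyze the track and return per-sector speed estimates as JSON."},
        ],
    }],
    inferenceConfig={"maxTokens": 1024, "temperature": 0.2},
)

print(response["output"]["message"]["content"][0]["text"])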

The prompts also use sets of examples to produce consistent results across a diverse range of possible track designs and locations:

Using the input data craft a track title for a fictional Formula 1 track.
Use the names of existing tracks from <example/> as a framework of how to format the
title.
The title must not infringe on any existing track names or copyrighted material.
The title should take into account the location of the track when choosing what
language certain components of the track title are in.

This is also a key stage in which to employ responsible use of AI, instructing the model not to generate content that might infringe on existing race tracks or other copyrighted material.

These considerations are essential when working with creative models like Amazon Nova Canvas. Race cars commonly feature liveries that contain a dozen or more sponsor logos. To avoid these concerns, and to provide the cleanest, most aesthetically appealing retro racing poster designs, Amazon Nova Canvas was given a range of conditioning images that facilitate vehicle accuracy and consistency. The images work in tandem with our prompt for a bold illustration style featuring cinematic angles.
The following is a prompt example:

Use a bold vector-style illustration approach with flat color fills, bold outlines,
stylized gradients. Maintain a vintage racing poster aesthetic with minimal texture.
Position the viewer to emphasize motion and speed.

The following images show the output.

Conclusion
The Real-Time Race Track showcases how generative AI can deliver personalized, interactive moments that resonate with modern sports audiences. Amazon Nova models power each layer of the experience, from speech and image generation to strategy and analysis, delivering rich, low-latency interactions at scale. This collaboration highlights how brands can use Amazon Nova to build tailored and engaging experiences.

About the authors
Raechel Frick is a Sr. Product Marketing Manager at AWS. With over 20 years of experience in the tech industry, she brings a customer-first approach and growth mindset to building integrated marketing programs.
Anuj Jauhari is a Sr. Product Marketing Manager at AWS, enabling customers to innovate and achieve business impact with generative AI solutions built on Amazon Nova models.
Jake Friedman is the President and Co-founder at Wildlife, where he leads a team launching interactive experiences and content campaigns for global brands. His work has been recognized with the Titanium Grand Prix at the Cannes Lions International Festival of Creativity for “boundary-busting, envy-inspiring work that marks a new direction for the industry and moves it forward.”

Google AI Releases EmbeddingGemma: A 308M Parameter On-Device Embedding Model with State-of-the-Art MTEB Results

EmbeddingGemma is Google’s new open text embedding model optimized for on-device AI, designed to balance efficiency with state-of-the-art retrieval performance.

How compact is EmbeddingGemma compared to other models?

At just 308 million parameters, EmbeddingGemma is lightweight enough to run on mobile devices and offline environments. Despite its size, it performs competitively with much larger embedding models. Inference latency is low (sub-15 ms for 256 tokens on EdgeTPU), making it suitable for real-time applications.

How well does it perform on multilingual benchmarks?

EmbeddingGemma was trained across 100+ languages and achieved the highest ranking on the Massive Text Embedding Benchmark (MTEB) among models under 500M parameters. Its performance rivals or exceeds embedding models nearly twice its size, particularly in cross-lingual retrieval and semantic search.

Source: https://developers.googleblog.com/en/introducing-embeddinggemma/

What is the underlying architecture?

EmbeddingGemma is built on a Gemma 3–based encoder backbone with mean pooling. Importantly, the architecture does not use the multimodal-specific bidirectional attention layers that Gemma 3 applies for image inputs. Instead, EmbeddingGemma employs a standard transformer encoder stack with full-sequence self-attention, which is typical for text embedding models.

This encoder produces 768-dimensional embeddings and supports sequences up to 2,048 tokens, making it well-suited for retrieval-augmented generation (RAG) and long-document search. The mean pooling step ensures fixed-length vector representations regardless of input size.

What makes its embeddings flexible?

EmbeddingGemma employs Matryoshka Representation Learning (MRL). This allows embeddings to be truncated from 768 dimensions down to 512, 256, or even 128 dimensions with minimal loss of quality. Developers can tune the trade-off between storage efficiency and retrieval precision without retraining.

Can it run entirely offline?

Yes. EmbeddingGemma was specifically designed for on-device, offline-first use cases. Since it shares a tokenizer with Gemma 3n, the same embeddings can directly power compact retrieval pipelines for local RAG systems, with privacy benefits from avoiding cloud inference.

What tools and frameworks support EmbeddingGemma?

It integrates seamlessly with:

Hugging Face (transformers, Sentence-Transformers, transformers.js)

LangChain and LlamaIndex for RAG pipelines

Weaviate and other vector databases

ONNX Runtime for optimized deployment across platforms

This ecosystem ensures developers can slot it directly into existing workflows.

How can it be implemented in practice?

(1) Load and Embed

from sentence_transformers import SentenceTransformer
model = SentenceTransformer("google/embeddinggemma-300m")
emb = model.encode(["example text to embed"])

(2) Adjust Embedding Size: Use full 768 dims for maximum accuracy or truncate to 512/256/128 dims for lower memory or faster retrieval.

(3) Integrate into RAG: Run similarity search locally (cosine similarity) and feed top results into Gemma 3n for generation. This enables a fully offline RAG pipeline.
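To illustrate steps (2) and (3), the sketch below truncates the embeddings to 256 MRL dimensions and ranks documents by cosine similarity; the document strings, query, and dimension choice are illustrative:

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("google/embeddinggemma-300m")

docs = ["EmbeddingGemma runs fully offline.", "BM25 is a sparse lexical retriever."]
query = "Which model works without a network connection?"

# Encode, then keep only the first 256 MRL dimensions to save memory
doc_vecs = model.encode(docs)[:, :256]
query_vec = model.encode([query])[:, :256]

# Cosine similarity = dot product of L2-normalized vectors
doc_vecs = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
query_vec = query_vec / np.linalg.norm(query_vec, axis=1, keepdims=True)
scores = (query_vec @ doc_vecs.T).ravel()

print(docs[int(scores.argmax())])  # the top result would feed a local generator such as Gemma 3n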

Why EmbeddingGemma?

Efficiency at scale – High multilingual retrieval accuracy in a compact footprint.

Flexibility – Adjustable embedding dimensions via MRL.

Privacy – End-to-end offline pipelines without external dependencies.

Accessibility – Open weights, permissive licensing, and strong ecosystem support.

EmbeddingGemma proves that smaller embedding models can achieve best-in-class retrieval performance while being light enough for offline deployment. It marks an important step toward efficient, privacy-conscious, and scalable on-device AI.

Check out the model and the technical details for more information.

Google DeepMind Finds a Fundamental Bug in RAG: Embedding Limits Break Retrieval at Scale

Retrieval-Augmented Generation (RAG) systems generally rely on dense embedding models that map queries and documents into fixed-dimensional vector spaces. While this approach has become the default for many AI applications, a recent research from Google DeepMind team explains a fundamental architectural limitation that cannot be solved by larger models or better training alone.

What Is the Theoretical Limit of Embedding Dimensions?

At the core of the issue is the representational capacity of fixed-size embeddings. An embedding of dimension d cannot represent all possible combinations of relevant documents once the database grows beyond a critical size. This follows from results in communication complexity and sign-rank theory.

For embeddings of size 512, retrieval breaks down around 500K documents.

For 1024 dimensions, the limit extends to about 4 million documents.

For 4096 dimensions, the theoretical ceiling is 250 million documents.

These values are best-case estimates derived under free embedding optimization, where vectors are directly optimized against test labels. Real-world language-constrained embeddings fail even earlier.

How Does the LIMIT Benchmark Expose This Problem?

To test this limitation empirically, the Google DeepMind team introduced LIMIT (Limitations of Embeddings in Information Retrieval), a benchmark dataset specifically designed to stress-test embedders. LIMIT has two configurations:

LIMIT full (50K documents): In this large-scale setup, even strong embedders collapse, with recall@100 often falling below 20%.

LIMIT small (46 documents): Despite the simplicity of this toy-sized setup, models still fail to solve the task. Performance varies widely but remains far from reliable:

Promptriever Llama3 8B: 54.3% recall@2 (4096d)

GritLM 7B: 38.4% recall@2 (4096d)

E5-Mistral 7B: 29.5% recall@2 (4096d)

Gemini Embed: 33.7% recall@2 (3072d)

Even with just 46 documents, no embedder reaches full recall, highlighting that the limitation is not dataset size alone but the single-vector embedding architecture itself.

In contrast, BM25, a classical sparse lexical model, does not suffer from this ceiling. Sparse models operate in effectively unbounded dimensional spaces, allowing them to capture combinations that dense embeddings cannot.

Why Does This Matter for RAG?

Current RAG implementations typically assume that embeddings can scale indefinitely with more data. The Google DeepMind research team shows that this assumption is incorrect: embedding size inherently constrains retrieval capacity. This affects:

Enterprise search engines handling millions of documents.

Agentic systems that rely on complex logical queries.

Instruction-following retrieval tasks, where queries define relevance dynamically.

Even advanced benchmarks like MTEB fail to capture these limitations because they test only a narrow slice of query-document combinations.

What Are the Alternatives to Single-Vector Embeddings?

The research team suggested that scalable retrieval will require moving beyond single-vector embeddings:

Cross-Encoders: Achieve perfect recall on LIMIT by directly scoring query-document pairs, but at the cost of high inference latency.

Multi-Vector Models (e.g., ColBERT): Offer more expressive retrieval by assigning multiple vectors per sequence, improving performance on LIMIT tasks.

Sparse Models (BM25, TF-IDF, neural sparse retrievers): Scale better in high-dimensional search but lack semantic generalization.

The key insight is that architectural innovation is required, not simply larger embedders.
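To make the architectural difference concrete, here is a toy numpy sketch contrasting single-vector scoring with ColBERT-style multi-vector MaxSim scoring; shapes and values are illustrative only:

import numpy as np

rng = np.random.default_rng(0)

# Single-vector: one d-dimensional embedding per query/document -> one dot product
q_single, d_single = rng.normal(size=64), rng.normal(size=64)
single_score = float(q_single @ d_single)

# Multi-vector (ColBERT-style): one embedding per token; each query token picks its
# best-matching document token, and the per-token maxima are summed (MaxSim)
q_tokens = rng.normal(size=(8, 64))    # 8 query token embeddings
d_tokens = rng.normal(size=(40, 64))   # 40 document token embeddings
maxsim_score = float((q_tokens @ d_tokens.T).max(axis=1).sum())

print(single_score, maxsim_score)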

What is the Key Takeaway?

The research team’s analysis shows that dense embeddings, despite their success, are bound by a mathematical limit: they cannot capture all possible relevance combinations once corpus sizes exceed limits tied to embedding dimensionality. The LIMIT benchmark demonstrates this failure concretely:

On LIMIT full (50K docs): recall@100 drops below 20%.

On LIMIT small (46 docs): even the best models max out at ~54% recall@2.

Classical techniques like BM25, or newer architectures such as multi-vector retrievers and cross-encoders, remain essential for building reliable retrieval engines at scale.

Check out the paper at https://arxiv.org/pdf/2508.21038.

What is OLMoASR and How Does It Compare to OpenAI’s Whisper in Speech Recognition?

The Allen Institute for AI (AI2) has released OLMoASR, a suite of open automatic speech recognition (ASR) models that rival closed-source systems such as OpenAI’s Whisper. Beyond just releasing model weights, AI2 has published training data identifiers, filtering steps, training recipes, and benchmark scripts—an unusually transparent move in the ASR space. This makes OLMoASR one of the most open and extensible platforms for speech recognition research.

Why Open Automatic Speech Recognition (ASR)?

Most speech recognition models available today—whether from OpenAI, Google, or Microsoft—are only accessible via APIs. While these services provide high performance, they operate as black boxes: the training datasets are opaque, the filtering methods are undocumented, and the evaluation protocols are not always aligned with research standards.

This lack of transparency poses challenges for reproducibility and scientific progress. Researchers cannot verify claims, test variations, or adapt models to new domains without re-building large datasets themselves. OLMoASR addresses this problem by opening the entire pipeline. The release is not just about enabling practical transcription—it’s about pushing ASR toward a more open, scientific foundation.

Model Architecture and Scaling

OLMoASR uses a transformer encoder–decoder architecture, the dominant paradigm in modern ASR.

The encoder ingests audio waveforms and produces hidden representations.

The decoder generates text tokens conditioned on the encoder’s outputs.

This design is similar to Whisper, but OLMoASR makes the implementation fully open.

The family of models covers six sizes, all trained on English:

tiny.en – 39M parameters, designed for lightweight inference

base.en – 74M parameters

small.en – 244M parameters

medium.en – 769M parameters

large.en-v1 – 1.5B parameters, trained on 440K hours

large.en-v2 – 1.5B parameters, trained on 680K hours

This range allows developers to trade off between inference cost and accuracy. Smaller models are suited for embedded devices or real-time transcription, while the larger models maximize accuracy for research or batch workloads.

Data: From Web Scraping to Curated Mixes

One of the core contributions of OLMoASR is the open release of training datasets, not just the models.

OLMoASR-Pool (~3M hours)

This massive collection contains weakly supervised speech paired with transcripts scraped from the web. It includes around 3 million hours of audio and 17 million text transcripts. Like Whisper’s original dataset, it is noisy, containing misaligned captions, duplicates, and transcription errors.

OLMoASR-Mix (~1M hours)

To address quality issues, AI2 applied rigorous filtering:

Alignment heuristics to ensure audio and transcripts match

Fuzzy deduplication to remove repeated or low-diversity examples

Cleaning rules to eliminate duplicate lines and mismatched text

The result is a high-quality, 1M-hour dataset that boosts zero-shot generalization—critical for real-world tasks where data may differ from training distributions.

This two-tiered data strategy mirrors practices in large-scale language model pretraining: use vast noisy corpora for scale, then refine with filtered subsets to improve quality.

Performance Benchmarks

AI2 benchmarked OLMoASR against Whisper across both short-form and long-form speech tasks, using datasets like LibriSpeech, TED-LIUM3, Switchboard, AMI, and VoxPopuli.

Medium Model (769M)

12.8% WER (word error rate) on short-form speech

11.0% WER on long-form speech

This nearly matches Whisper-medium.en, which achieves 12.4% and 10.5% respectively.

Large Models (1.5B)

large.en-v1 (440K hours): 13.0% WER short-form vs Whisper large-v1 at 12.2%

large.en-v2 (680K hours): 12.6% WER, closing the gap to less than 0.5%

Smaller Models

Even the tiny and base versions perform competitively:

tiny.en: ~20.5% WER short-form, ~15.6% WER long-form

base.en: ~16.6% WER short-form, ~12.9% WER long-form

This gives developers flexibility to choose models based on compute and latency requirements.

How to use?

Transcribing audio takes just a few lines of code:

import olmoasr

model = olmoasr.load_model("medium", inference=True)
result = model.transcribe("audio.mp3")
print(result)

The output includes both the transcription and time-aligned segments, making it useful for captioning, meeting transcription, or downstream NLP pipelines.
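As an illustration, the time-aligned segments can be turned into SRT captions; the sketch below assumes a Whisper-style result dictionary with a 'segments' list of 'start', 'end', and 'text' fields, which may not exactly match the OLMoASR output schema:

def to_srt(result):
    """Convert a Whisper-style transcription result into SRT caption text."""
    def ts(seconds):
        h, rem = divmod(int(seconds), 3600)
        m, s = divmod(rem, 60)
        ms = int((seconds - int(seconds)) * 1000)
        return f"{h:02}:{m:02}:{s:02},{ms:03}"

    lines = []
    for i, seg in enumerate(result["segments"], start=1):
        lines.append(f"{i}\n{ts(seg['start'])} --> {ts(seg['end'])}\n{seg['text'].strip()}\n")
    return "\n".join(lines)

print(to_srt(result))  # 'result' comes from model.transcribe(...) above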

Fine-Tuning and Domain Adaptation

Since AI2 provides full training code and recipes, OLMoASR can be fine-tuned for specialized domains:

Medical speech recognition – adapting models on datasets like MIMIC-III or proprietary hospital recordings

Legal transcription – training on courtroom audio or legal proceedings

Low-resource accents – fine-tuning on dialects not well covered in OLMoASR-Mix

This adaptability is critical: ASR performance often drops when models are used in specialized domains with domain-specific jargon. Open pipelines make domain adaptation straightforward.

Applications

OLMoASR opens up exciting opportunities across academic research and real-world AI development:

Educational Research: Researchers can explore the intricate relationships between model architecture, dataset quality, and filtering techniques to understand their effects on speech recognition performance.

Human-Computer Interaction: Developers gain the freedom to embed speech recognition capabilities directly into conversational AI systems, real-time meeting transcription platforms, and accessibility applications—all without dependency on proprietary APIs or external services.

Multimodal AI Development: When combined with large language models, OLMoASR enables the creation of advanced multimodal assistants that can seamlessly process spoken input and generate intelligent, contextually-aware responses.

Research Benchmarking: The open availability of both training data and evaluation metrics positions OLMoASR as a standardized reference point, allowing researchers to compare new approaches against a consistent, reproducible baseline in future ASR studies.

Conclusion

The release of OLMoASR shows that high-quality speech recognition can be developed and released in a way that prioritizes transparency and reproducibility. While the models are currently limited to English and still demand significant compute for training, they provide a solid foundation for adaptation and extension. This release sets a clear reference point for future work in open ASR and makes it easier for researchers and developers to study, benchmark, and apply speech recognition models in different domains.

Check out the model on Hugging Face, the GitHub page, and the technical details.

Build character consistent storyboards using Amazon Nova in Amazon Bedrock – Part 2

Although careful prompt crafting can yield good results, achieving professional-grade visual consistency often requires adapting the underlying model itself. Building on the prompt engineering and character development approach covered in Part 1 of this two-part series, we now push the consistency level for specific characters by fine-tuning an Amazon Nova Canvas foundation model (FM). Through fine-tuning techniques, creators can instruct the model to maintain precise control over character appearances, expressions, and stylistic elements across multiple scenes.
In this post, we take an animated short film, Picchu, produced by FuzzyPixel from Amazon Web Services (AWS), prepare training data by extracting key character frames, and fine-tune a character-consistent model for the main character Mayu and her mother, so we can quickly generate storyboard concepts for new sequels like the following images.

Solution overview
To implement an automated workflow, we propose the following comprehensive solution architecture that uses AWS services for an end-to-end implementation.

The workflow consists of the following steps:

The user uploads a video asset to an Amazon Simple Storage Service (Amazon S3) bucket.
Amazon Elastic Container Service (Amazon ECS) is triggered to process the video asset.
Amazon ECS downsamples the frames, selects those containing the character, and then center-crops them to produce the final character images.
Amazon ECS invokes an Amazon Nova model (Amazon Nova Pro) from Amazon Bedrock to create captions from the images.
Amazon ECS writes the image captions and metadata to the S3 bucket.
The user uses a notebook environment in Amazon SageMaker AI to invoke the model training job.
The user fine-tunes a custom Amazon Nova Canvas model by invoking Amazon Bedrock create_model_customization_job and create_model_provisioned_throughput API calls to create a custom model available for inference.

This workflow is structured in two distinct phases. The initial phase, in Steps 1–5, focuses on preparing the training data. In this post, we walk through an automated pipeline to extract images from an input video and then generate labeled training data. The second phase, in Steps 6–7, focuses on fine-tuning the Amazon Nova Canvas model and performing test inference using the custom-trained model. For these latter steps, we provide the preprocessed image data and comprehensive example code in the following GitHub repository to guide you through the process.
Prepare the training data
Let’s begin with the first phase of our workflow. In our example, we build an automated video object/character extraction pipeline to extract high-resolution images with accurate caption labels using the following steps.
Creative character extraction
We recommend first sampling video frames at fixed intervals (for example, 1 frame per second). Then, apply Amazon Rekognition label detection and face collection search to identify frames and characters of interest. Label detection can identify over 2,000 unique labels and locate their positions within frames, making it ideal for initial detection of general character categories or non-human characters. To distinguish between different characters, we then use the Amazon Rekognition feature to search faces in a collection. This feature identifies and tracks characters by matching their faces against a pre-populated face collection. If these two approaches aren’t precise enough, we can use Amazon Rekognition Custom Labels to train a custom model for detecting specific characters. The following diagram illustrates this workflow.

After detection, we center-crop each character with appropriate pixel padding and then run a deduplication algorithm using the Amazon Titan Multimodal Embeddings model to remove semantically similar images above a threshold value. Doing so helps us build a diverse dataset because redundant or nearly identical frames could lead to model overfitting (when a model learns the training data too precisely, including its noise and fluctuations, making it perform poorly on new, unseen data). We can calibrate the similarity threshold to fine-tune what we consider to be identical images, so we can better control the balance between dataset diversity and redundancy elimination.
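The deduplication step can be sketched with the Amazon Titan Multimodal Embeddings model in Amazon Bedrock; the model ID is the Titan image embedding model, but the 0.95 threshold, helper names, and overall structure are illustrative rather than the exact pipeline code:

import base64
import json

import boto3
import numpy as np

bedrock_runtime = boto3.client("bedrock-runtime")

def embed_image(path):
    """Return the Titan Multimodal embedding for a local image file."""
    with open(path, "rb") as f:
        body = {"inputImage": base64.b64encode(f.read()).decode("utf-8")}
    response = bedrock_runtime.invoke_model(
        modelId="amazon.titan-embed-image-v1",
        body=json.dumps(body),
    )
    return np.array(json.loads(response["body"].read())["embedding"])

def deduplicate(image_paths, threshold=0.95):
    """Keep a frame only if its cosine similarity to every kept frame is below the threshold."""
    kept, kept_vecs = [], []
    for path in image_paths:
        vec = embed_image(path)
        vec = vec / np.linalg.norm(vec)
        if all(float(vec @ other) < threshold for other in kept_vecs):
            kept.append(path)
            kept_vecs.append(vec)
    return kept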
Data labeling
We generate captions for each image using Amazon Nova Pro in Amazon Bedrock and then upload the image and label manifest file to an Amazon S3 location. This process focuses on two critical aspects of prompt engineering: character description to help the FM identify and name the characters based on their unique attributes, and varied description generation that avoids repetitive patterns in the caption (for example, “an animated character”). The following is an example prompt template used during our data labeling process:

system_prompt = """
You are an expert image description specialist who creates concise, natural alt
text that makes visual content accessible while maintaining clarity and focus.
Your task is to analyze the provided image and provide a creative description
(20-30 words) that emphasizes the Three main characters, capturing the essential
elements of their interaction while avoiding unnecessary details.
"""

prompt = """

1. Identify the main characters in the image: Character 1, Character 2, and
Character 3 at least one will be in the picture so provide at a minimum a
description with at least one character name.
– “Character 1” describe the first character, key traits, background, attributes.
– “Character 2” describe the second character, key traits, background, attributes.
– “Character 3” describe the third character, key traits, background, attributes.
2. Just state their name WITHOUT adding any standard characteristics.
3. Only capture visual element outside the standard characteristics
4. Capture the core interaction between them
5. Include only contextual details that are crucial for understanding the scene
6. Create a natural, flowing description using everyday language

Here are some examples

[Identify the main characters]
[Assessment of their primary interaction]
[Selection of crucial contextual elements]
[Crafting of concise, natural description]

{
    "alt_text": "[Concise, natural description focusing on the main characters]"
}

Note: Provide only the JSON object as the final response.
"""

The data labeling output is formatted as a JSONL file, where each line pairs an image reference Amazon S3 path with a caption generated by Amazon Nova Pro. This JSONL file is then uploaded to Amazon S3 for training. The following is an example of the file:

{"image_ref": "s3://media-ip-dataset/characters/blue_character_01.jpg", "alt_text": "This
animated character features a round face with large expressive eyes. The character
has a distinctive blue color scheme with a small tuft of hair on top. The design is
stylized with clean lines and a minimalist approach typical of modern animation."}
{"image_ref": "s3://media-ip-dataset/props/iconic_prop_series1.jpg", "alt_text": "This
object appears to be an iconic prop from the franchise. It has a metallic appearance
with distinctive engravings and a unique shape that fans would immediately recognize.
The lighting highlights its dimensional qualities and fine details that make it
instantly identifiable."}

Human verification
For enterprise use cases, we recommend incorporating a human-in-the-loop process to verify labeled data before proceeding with model training. This verification can be implemented using Amazon Augmented AI (Amazon A2I), a service that helps annotators verify both image and caption quality. For more details, refer to Get Started with Amazon Augmented AI.
Fine-tune Amazon Nova Canvas
Now that we have the training data, we can fine-tune the Amazon Nova Canvas model in Amazon Bedrock. Amazon Bedrock requires an AWS Identity and Access Management (IAM) service role to access the S3 bucket where you stored your model customization training data. For more details, see Model customization access and security. You can perform the fine-tuning task directly on the Amazon Bedrock console or use the Boto3 API. We explain both approaches in this post, and you can find the end-to-end code sample in picchu-finetuning.ipynb.
Create a fine-tuning job on the Amazon Bedrock console
Let’s start by creating an Amazon Nova Canvas fine-tuning job on the Amazon Bedrock console:

On the Amazon Bedrock console, in the navigation pane, choose Custom models under Foundation models.
Choose Customize model and then Create Fine-tuning job.

On the Create Fine-tuning job details page, choose the model you want to customize and enter a name for the fine-tuned model.
In the Job configuration section, enter a name for the job and optionally add tags to associate with it.
In the Input data section, enter the Amazon S3 location of the training dataset file.
In the Hyperparameters section, enter values for hyperparameters, as shown in the following screenshot.

In the Output data section, enter the Amazon S3 location where Amazon Bedrock should save the output of the job.
Choose Fine-tune model job to begin the fine-tuning process.

This hyperparameter combination yielded good results during our experimentation. In general, increasing the learning rate makes the model train more aggressively, which often presents an interesting trade-off: we might achieve character consistency more quickly, but it might impact overall image quality. We recommend a systematic approach to adjusting hyperparameters. Start with the suggested batch size and learning rate, and try increasing or decreasing the number of training steps first. If the model struggles to learn your dataset even after 20,000 steps (the maximum allowed in Amazon Bedrock), then we suggest either increasing the batch size or adjusting the learning rate upward. These adjustments, though subtle, can make a significant difference in the model’s performance. For more details about the hyperparameters, refer to Hyperparameters for Creative Content Generation models.
Create a fine-tuning job using the Python SDK
The following Python code snippet creates the same fine-tuning job using the create_model_customization_job API:

import boto3

bedrock = boto3.client('bedrock')
jobName = "picchu-canvas-v0"

# Set hyperparameters
hyperParameters = {
    "stepCount": "14000",
    "batchSize": "64",
    "learningRate": "0.000001",
}

# Create the fine-tuning job (roleArn and training_path are defined earlier in the notebook)
response_ft = bedrock.create_model_customization_job(
    jobName=jobName,
    customModelName=jobName,
    roleArn=roleArn,
    baseModelIdentifier="amazon.nova-canvas-v1:0",
    hyperParameters=hyperParameters,
    trainingDataConfig={"s3Uri": training_path},
    outputDataConfig={"s3Uri": f"s3://{bucket}/{prefix}"}
)

jobArn = response_ft.get('jobArn')
print(jobArn)

When the job is complete, you can retrieve the new customModelARN using the following code:

custom_model_arn = bedrock.list_model_customization_jobs(
    nameContains=jobName
)["modelCustomizationJobSummaries"][0]["customModelArn"]

Deploy the fine-tuned model
With the preceding hyperparameter configuration, this fine-tuning job might take up to 12 hours to complete. When it’s complete, you should see a new model in the custom models list. You can then create provisioned throughput to host the model. For more details on provisioned throughput and different commitment plans, see Increase model invocation capacity with Provisioned Throughput in Amazon Bedrock.
Deploy the model on the Amazon Bedrock console
To deploy the model from the Amazon Bedrock console, complete the following steps:

On the Amazon Bedrock console, choose Custom models under Foundation models in the navigation pane.
Select the new custom model and choose Purchase provisioned throughput.

In the Provisioned Throughput details section, enter a name for the provisioned throughput.
Under Select model, choose the custom model you just created.
Then specify the commitment term and model units.

After you purchase provisioned throughput, a new model Amazon Resource Name (ARN) is created. You can invoke this ARN when the provisioned throughput is in service.

Deploy the model using the Python SDK
The following Python code snippet creates provisioned throughput using the create_provisioned_model_throughput API:

custom_model_name = "picchu-canvas-v0"

# Create the provisioned throughput and retrieve the provisioned model ARN
provisioned_model_id = bedrock.create_provisioned_model_throughput(
    modelUnits=1,
    # create a name for your provisioned throughput model
    provisionedModelName=custom_model_name,
    modelId=custom_model_arn
)['provisionedModelArn']

Test the fine-tuned model
When the provisioned throughput is live, we can use the following code snippet to test the custom model and experiment with generating some new images for a sequel to Picchu:

import base64
import io
import json
import random

import boto3
from PIL import Image

# Bedrock runtime client; provisioned_model_id comes from the provisioned throughput step
bedrock_runtime = boto3.client('bedrock-runtime')

def decode_base64_image(img_b64):
    return Image.open(io.BytesIO(base64.b64decode(img_b64)))

def generate_image(prompt,
                   negative_prompt="text, ugly, blurry, distorted, low quality, pixelated, watermark, text, deformed",
                   num_of_images=3,
                   seed=1):
    """
    Generate an image using Amazon Nova Canvas.
    """

    image_gen_config = {
        "numberOfImages": num_of_images,
        "quality": "premium",
        "width": 1024,   # Maximum resolution 2048 x 2048
        "height": 1024,  # 1:1 ratio
        "cfgScale": 8.0,
        "seed": seed,
    }

    # Prepare the request body
    request_body = {
        "taskType": "TEXT_IMAGE",
        "textToImageParams": {
            "text": prompt,
            "negativeText": negative_prompt,  # List things to avoid
        },
        "imageGenerationConfig": image_gen_config
    }

    response = bedrock_runtime.invoke_model(
        modelId=provisioned_model_id,
        body=json.dumps(request_body)
    )

    # Parse the response
    response_body = json.loads(response['body'].read())

    if "images" in response_body:
        # Extract the images
        return [decode_base64_image(img) for img in response_body['images']]
    else:
        return

seed = random.randint(1, 858993459)
print(f"seed: {seed}")

images = generate_image(prompt=prompt, seed=seed)

Mayu’s face shows a mix of nervousness and determination. Mommy kneels beside her, gently holding her. A landscape is visible in the background.
A steep cliff face with a long wooden ladder extending downwards. Halfway down the ladder is Mayu with a determined expression on her face. Mayu’s small hands grip the sides of the ladder tightly as she carefully places her feet on each rung. The surrounding environment shows a rugged, mountainous landscape.
Mayu standing proudly at the entrance of a simple school building. Her face beams with a wide smile, expressing pride and accomplishment.

Clean up
To avoid incurring AWS charges after you are done testing, complete the cleanup steps in picchu-finetuning.ipynb and delete the following resources:

Amazon SageMaker Studio domain
Fine-tuned Amazon Nova model and provisioned throughput endpoint

Conclusion
In this post, we demonstrated how to elevate character and style consistency in storyboarding from Part 1 by fine-tuning Amazon Nova Canvas in Amazon Bedrock. Our comprehensive workflow combines automated video processing, intelligent character extraction using Amazon Rekognition, and precise model customization using Amazon Bedrock to create a solution that maintains visual fidelity and dramatically accelerates the storyboarding process. By fine-tuning the Amazon Nova Canvas model on specific characters and styles, we’ve achieved a level of consistency that surpasses standard prompt engineering, so creative teams can produce high-quality storyboards in hours rather than weeks. Start experimenting with Nova Canvas fine-tuning today, so you can also elevate your storytelling with better character and style consistency.

About the authors
Dr. Achin Jain is a Senior Applied Scientist at Amazon AGI, where he works on building multi-modal foundation models. He brings more than 10 years of combined industry and academic research experience. He has led the development of several modules for Amazon Nova Canvas and Amazon Titan Image Generator, including supervised fine-tuning (SFT), model customization, instant customization, and guidance with color palette.
James Wu is a Senior AI/ML Specialist Solutions Architect at AWS, helping customers design and build AI/ML solutions. James’s work covers a wide range of ML use cases, with a primary interest in computer vision, deep learning, and scaling ML across the enterprise. Prior to joining AWS, James was an architect, developer, and technology leader for over 10 years, including 6 years in engineering and 4 years in the marketing and advertising industries.
Randy Ridgley is a Principal Solutions Architect focused on real-time analytics and AI. With expertise in designing data lakes and pipelines, Randy helps organizations transform diverse data streams into actionable insights. He specializes in IoT solutions, analytics, and infrastructure-as-code implementations. As an open-source contributor and technical leader, Randy provides deep technical knowledge to deliver scalable data solutions across enterprise environments.

Build character consistent storyboards using Amazon Nova in Amazon Bedrock – Part 1

The art of storyboarding stands as the cornerstone of modern content creation, weaving its essential role through filmmaking, animation, advertising, and UX design. Though traditionally, creators have relied on hand-drawn sequential illustrations to map their narratives, today’s AI foundation models (FMs) are transforming this landscape. FMs like Amazon Nova Canvas and Amazon Nova Reel offer capabilities in transforming text and image inputs into professional-grade visuals and short clips that promise to revolutionize preproduction workflows.
This technological leap forward, however, presents its own set of challenges. Although these models excel at generating diverse concepts rapidly—a boon for creative exploration—maintaining consistent character designs and stylistic coherence across scenes remains a significant hurdle. Even subtle modifications to prompts or model configurations can yield dramatically different visual outputs, potentially disrupting narrative continuity and creating additional work for content creators.
To address these challenges, we’ve developed this two-part series exploring practical solutions for achieving visual consistency. In Part 1, we deep dive into prompt engineering and character development pipelines, sharing tested prompt patterns that deliver reliable, consistent results with Amazon Nova Canvas and Amazon Nova Reel. Part 2 explores techniques like fine-tuning Amazon Nova Canvas to achieve exceptional visual consistency and precise character control.

Consistent character design with Amazon Nova Canvas
The foundation of effective storyboarding begins with establishing well-defined character designs. Amazon Nova Canvas offers several powerful techniques to create and maintain character consistency throughout your visual narrative. To help you implement these techniques in your own projects, we’ve provided comprehensive code examples and resources in our GitHub repository. We encourage you to follow along as we walk through each step in detail. If you’re new to Amazon Nova Canvas, we recommend first reviewing Generating images with Amazon Nova to familiarize yourself with the basic concepts.
Basic text prompting
Amazon Nova Canvas transforms text descriptions into visual representations. Unlike large language models (LLMs), image generation models don’t interpret commands or engage in reasoning—they respond best to descriptive captions. Including specific details in your prompts, such as physical attributes, clothing, and styling elements, directly influences the generated output.
For example, “A 7-year-old Peruvian girl with dark hair in two low braids wearing a school uniform” provides clear visual elements for the model to generate an initial character concept, as shown in the following example image.

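To try a prompt like this yourself, the following boto3 sketch sends a basic TEXT_IMAGE request to Amazon Nova Canvas; the region, generation settings, and output file name are illustrative choices:

import base64
import json

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

request_body = {
    "taskType": "TEXT_IMAGE",
    "textToImageParams": {
        "text": "A 7-year-old Peruvian girl with dark hair in two low braids wearing a school uniform",
    },
    "imageGenerationConfig": {
        "numberOfImages": 1,
        "width": 1024,
        "height": 1024,
        "cfgScale": 6.5,
        "seed": 57,
    },
}

response = bedrock_runtime.invoke_model(
    modelId="amazon.nova-canvas-v1:0",
    body=json.dumps(request_body),
)

# The response contains a list of base64-encoded images
image_b64 = json.loads(response["body"].read())["images"][0]
with open("character_concept.png", "wb") as f:
    f.write(base64.b64decode(image_b64))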
Visual style implementation
Consistency in storyboarding requires both character features and unified visual style. Our approach separates style information into two key components in the prompt:

Style description – An opening phrase that defines the visual medium (for example, “A graphic novel style illustration of”)
Style details – A closing phrase that specifies artistic elements (for example, “Bold linework, dramatic shadows, flat color palettes”)

This structured technique enables exploration of various artistic styles, including graphic novels, sketches, and 3D illustrations, while maintaining character consistency throughout the storyboard. The following is an example prompt template and some style information you can experiment with:

{style_description} A 7 year old Peruvian girl with dark hair in two low braids wearing a
school uniform. {style_details}
styles = [
    {
        "name": "graphic-novel",
        "description": "A graphic novel style illustration of",
        "details": "Bold linework, dramatic shadows, and flat color palettes. Use "
                   "high contrast lighting and cinematic composition typical of comic book "
                   "panels. Include expressive line work to convey emotion and movement.",
    },
    {
        "name": "sketch",
        "description": "A simple black and white line sketch of",
        "details": "Rough, sketch-like lines create a storyboard aesthetic. High "
                   "contrast. No color",
    },
    {
        "name": "digital-illustration",
        "description": "A 3D digital drawing of",
        "details": "High contrast. Rounded character design. Smooth rendering. "
                   "Soft texture. Luminous lighting",
    },
]

Character variation through seed values
The seed parameter serves as a tool for generating character variations while adhering to the same prompt. By keeping the text description constant and varying only the seed value, creators can explore multiple interpretations of their character design without starting from scratch, as illustrated in the following example images.

Seed = 1
Seed = 20
Seed = 57
Seed = 139
Seed = 12222

Prompt adherence control with cfgScale
The cfgScale parameter is another tool for maintaining character consistency, controlling how strictly Amazon Nova Canvas follows your prompt. Operating on a scale from 1.1–10, lower values give the model more creative freedom and higher values enforce strict prompt adherence. The default value of 6.5 typically provides an optimal balance, but as demonstrated in the following images, finding the right setting is crucial. Too low a value can result in inconsistent character representations, whereas too high a value might overemphasize prompt elements at the cost of natural composition.

Seed = 57, cfgScale = 1.1
Seed = 57, cfgScale = 3.5
Seed = 57, cfgScale = 6.5
Seed = 57, cfgScale = 8.0
Seed = 57, cfgScale = 10
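To reproduce this kind of exploration, a simple parameter sweep that keeps the prompt fixed while varying only seed and cfgScale is enough. The sketch below builds on the generate_image and build_prompt helpers from the earlier examples; the output file names are arbitrary.

prompt = build_prompt(styles[0], character)  # graphic-novel style

# Character variations: same prompt and cfgScale, different seeds.
for seed in [1, 20, 57, 139, 12222]:
    generate_image(prompt, seed=seed, cfg_scale=6.5, out=f"seed_{seed}.png")

# Prompt adherence: same prompt and seed, different cfgScale values.
for cfg in [1.1, 3.5, 6.5, 8.0, 10]:
    generate_image(prompt, seed=57, cfg_scale=cfg, out=f"cfg_{cfg}.png")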

Scene integration with consistent parameters
Now we can put these techniques together to test character consistency across different narrative contexts, as shown in the following example images. We maintain consistent inputs for style, seed, and cfgScale, varying only the scene description, to make sure the character remains recognizable throughout the scene sequence.

Seed = 57, cfgScale = 6.5
Seed = 57, cfgScale = 6.5
Seed = 57, cfgScale = 6.5

A graphic novel style illustration of a 7 year old Peruvian girl with dark hair in two low braids wearing a school uniform riding a bike on a mountain pass. Bold linework, dramatic shadows, and flat color palettes. Use high contrast lighting and cinematic composition typical of comic book panels. Include expressive line work to convey emotion and movement.
A graphic novel style illustration of a 7 year old Peruvian girl with dark hair in two low braids wearing a school uniform walking on a path through tall grass in the Andes. Bold linework, dramatic shadows, and flat color palettes. Use high contrast lighting and cinematic composition typical of comic book panels. Include expressive line work to convey emotion and movement.
A graphic novel style illustration of a 7 year old Peruvian girl with dark hair in two low braids wearing a school uniform eating ice cream at the beach. Bold linework, dramatic shadows, and flat color palettes. Use high contrast lighting and cinematic composition typical of comic book panels. Include expressive line work to convey emotion and movement.

Storyboard development pipeline
Building upon the character consistency techniques we’ve discussed, we can now implement an end-to-end storyboard development pipeline that transforms written scene and character descriptions into visually coherent storyboards. This systematic approach uses our established parameters for style descriptions, seed values, and cfgScale values to provide character consistency while adapting to different narrative contexts. The following are some example scene and character descriptions:

"scenes": [
    {
        "description": "Mayu stands at the edge of a mountainous path, clutching a book. Her mother, Maya, kneels beside her, offering words of encouragement and handing her the book. Mayu looks nervous but determined as she prepares to start her journey."
    },
    {
        "description": "Mayu encounters a 'danger' sign with a drawing of a snake. She looks scared, but then remembers her mother's words. She takes a deep breath, looks at her book for reassurance, and then searches for a stick on the ground."
    },
    {
        "description": "Mayu bravely makes her way through tall grass, swinging her stick and making noise to scare off potential snakes. Her face shows a mix of fear and courage as she pushes forward on her journey."
    }
],
"characters": {
    "Mayu": "A 7-year-old Peruvian girl with dark hair in two low braids wearing a school uniform",
    "Maya": "An older Peruvian woman with long dark hair tied back in a bun, wearing traditional Peruvian clothing"
}

Our pipeline uses Amazon Nova Lite to first craft optimized image prompts incorporating our established best practices, which are then passed to Amazon Nova Canvas for image generation. By setting numberOfImages to produce multiple variations (typically three) while maintaining consistent seed and cfgScale values, we give creators multiple options that preserve character consistency. We used the following prompt for Amazon Nova Lite to generate optimized image prompts:

Describe an image that best represents the scene described. Here are some examples:
scene: Rosa is in the kitchen, rummaging through the pantry, looking for a snack. She
hears a strange noise coming from the back of the pantry and becomes startled.
imagery: A dimly lit pantry with shelves stocked with various food items, and Rosa
peering inside, her face expressing curiosity and a hint of fear.
scene: Rosa says goodbye to her mother, Maya. Maya offers her words of encouragement.
imagery: A wide shot of Rosa’s determined face, facing Maya and receiving a small wrapped
gift.
Only describe the imagery. Use no more than 60 words.
scene: {scene_description}
imagery:
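The following sketch shows one way to wire this pipeline together with the Amazon Bedrock Converse API: Amazon Nova Lite turns each scene description into an imagery caption using the prompt above, and the generate_image helper from earlier renders three candidate panels per scene. The orchestration shown here is our own illustration rather than the code from the repository, so adapt it to your needs.

# scenes and styles are the Python equivalents of the JSON shown earlier;
# bedrock and generate_image come from the first sketch in this post.
IMAGERY_PROMPT = (
    "Describe an image that best represents the scene described. "
    "Only describe the imagery. Use no more than 60 words.\n"
    "scene: {scene_description}\n"
    "imagery:"
)  # the full few-shot version of this prompt is shown above

def scene_to_imagery(scene_description):
    resp = bedrock.converse(
        modelId="amazon.nova-lite-v1:0",
        messages=[{
            "role": "user",
            "content": [{"text": IMAGERY_PROMPT.format(scene_description=scene_description)}],
        }],
    )
    return resp["output"]["message"]["content"][0]["text"].strip()

style = styles[0]  # graphic-novel
for i, scene in enumerate(scenes, start=1):
    imagery = scene_to_imagery(scene["description"])
    prompt = f"{style['description']} {imagery} {style['details']}"
    generate_image(prompt, seed=57, cfg_scale=6.5, n=3, out=f"scene_{i}.png")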

Our pipeline generated the following storyboard panels.

Mayu stands at the edge of a mountainous path, clutching a book. Her mother, Maya, kneels beside her, offering words of encouragement and handing her the book. Mayu looks nervous but determined as she prepares to start her journey.

Mayu encounters a ‘danger’ sign with a drawing of a snake. She looks scared, but then remembers her mother’s words. She takes a deep breath, looks at her book for reassurance, and then searches for a stick on the ground.

Mayu bravely makes her way through tall grass, swinging her stick and making noise to scare off potential snakes. Her face shows a mix of fear and courage as she pushes forward on her journey.

Although these techniques noticeably improve character consistency, they aren’t perfect. Upon closer inspection, you will notice that even images within the same scene show variations in character consistency. Using consistent seed values helps control these variations, and the techniques outlined in this post significantly improve consistency compared to basic prompt engineering. However, if your use case requires near-perfect character consistency, we recommend proceeding to Part 2, where we explore advanced fine-tuning techniques.
Video generation for animated storyboards
If you want to go beyond static scene images and transform your storyboard into short, animated video clips, you can use Amazon Nova Reel. We use Amazon Nova Lite to convert image prompts into video prompts, adding subtle motion and camera movements optimized for the Amazon Nova Reel model. These prompts, along with the original images, serve as creative constraints for Amazon Nova Reel to generate the final animated sequences. The following is an example prompt and its resulting animated scene in GIF format:

A sunlit forest path with a ‘Danger’ sign featuring a snake. A 7-year-old Peruvian girl
stands, visibly scared but resolute. Bold linework, dramatic shadows, and flat color
palettes. High contrast lighting and cinematic composition. Mist slowly drifting.
Camera dolly in.

Input Image
Output Video
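Amazon Nova Reel jobs run asynchronously, so the call pattern differs from the image examples: you start an invocation that writes the finished video to Amazon S3. The sketch below shows the general shape of that call using the panel above as the conditioning image; the duration, dimensions, file names, and S3 location are illustrative assumptions, so confirm the fields against the Nova Reel request schema.

import base64, json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

with open("scene_2.png", "rb") as f:  # the storyboard panel used as the input image
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

video_prompt = (
    "A sunlit forest path with a 'Danger' sign featuring a snake. A 7-year-old "
    "Peruvian girl stands, visibly scared but resolute. Bold linework, dramatic "
    "shadows, and flat color palettes. High contrast lighting and cinematic "
    "composition. Mist slowly drifting. Camera dolly in."
)

job = bedrock.start_async_invoke(
    modelId="amazon.nova-reel-v1:0",
    modelInput={
        "taskType": "TEXT_VIDEO",
        "textToVideoParams": {
            "text": video_prompt,
            "images": [{"format": "png", "source": {"bytes": image_b64}}],
        },
        "videoGenerationConfig": {
            "durationSeconds": 6,
            "fps": 24,
            "dimension": "1280x720",
            "seed": 57,
        },
    },
    outputDataConfig={"s3OutputDataConfig": {"s3Uri": "s3://your-bucket/storyboard-videos/"}},
)
print("Started video generation:", job["invocationArn"])
# Poll bedrock.get_async_invoke(invocationArn=...) until the job completes.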

Conclusion
In this first part of our series, we explored fundamental techniques for achieving character and style consistency using Amazon Nova Canvas, from structured prompt engineering to building an end-to-end storyboarding pipeline. We demonstrated how combining style descriptions, seed values, and careful cfgScale parameter control can significantly improve character consistency across different scenes. We also showed how integrating Amazon Nova Lite with Amazon Nova Reel can enhance the storyboarding workflow, enabling both optimized prompt generation and animated sequences.
Although these techniques provide a solid foundation for consistent storyboard generation, they aren’t perfect—subtle variations might still occur. We invite you to continue to Part 2, where we explore advanced model fine-tuning techniques that can help achieve near-perfect character consistency and visual fidelity.

About the authors
Alex Burkleaux is a Senior AI/ML Specialist Solution Architect at AWS. She helps customers use AI Services to build media solutions using Generative AI. Her industry experience includes over-the-top video, database management systems, and reliability engineering.
James Wu is a Senior AI/ML Specialist Solution Architect at AWS, helping customers design and build AI/ML solutions. James’s work covers a wide range of ML use cases, with a primary interest in computer vision, deep learning, and scaling ML across the enterprise. Prior to joining AWS, James was an architect, developer, and technology leader for over 10 years, including 6 years in engineering and 4 years in marketing & advertising industries.
Vladimir Budilov is a Principal Solutions Architect at AWS focusing on agentic & generative AI, and software architecture. He leads large-scale GenAI implementations, bridging cutting-edge AI capabilities with production-ready business solutions, while optimizing for cost and solution resilience.
Nora Shannon Johnson is a Solutions Architect at Amazon Music focused on discovery and growth through AI/ML. In the past, she supported AWS through the development of generative AI prototypes and tools for developers in financial services, health care, retail, and more. She has been an engineer and consultant in various industries including DevOps, fintech, industrial AI/ML, and edtech in the United States, Europe, and Latin America.
Ehsan Shokrgozar is a Senior Solutions Architect specializing in Media and Entertainment at AWS. He is passionate about helping M&E customers build more efficient workflows. He combines his previous experience as Technical Director and Pipeline Engineer at various Animation/VFX studios with his knowledge of building M&E workflows in the cloud to help customers achieve their business goals.

Google Brings Gemini CLI to GitHub Actions: Secure, Free, and Enterpri …

How do devs integrate coding capabilities directly into their GitHub repositories? Google has recently introduced Gemini CLI GitHub Actions, a new way for developers to integrate Gemini's AI coding capabilities directly into their GitHub repositories. Built on top of GitHub's workflow automation framework, this new release from Google turns Gemini from a terminal-only coding assistant into a collaborative teammate that participates in issue triage, pull request reviews, and repository maintenance.

But how is it different from Microsoft's GitHub Copilot? Unlike GitHub Copilot, which requires paid subscriptions for advanced functionality, Google's integration is available at no cost. This is especially helpful for open-source developers, small teams, and enterprises that want to embed AI into their workflows without additional licensing overhead.

From Terminal to Repository Integration

Google first released Gemini CLI earlier this year as a command-line interface that connected developers directly to the Gemini 2.5 Pro model. With a one-million-token context window, built-in tools, and open-source licensing, Gemini CLI was designed for local, developer-focused workflows.

The new GitHub Actions integration extends those capabilities to collaborative environments. Instead of operating only on a developer's machine, Gemini can now participate in repository-level automation, assisting teams during code reviews, issue management, and continuous integration, saving developers hours and speeding up code deployment.

Core Capabilities

Gemini CLI GitHub Actions comes with three key use cases:

Automated Issue Triage: New issues are automatically labeled, categorized, and prioritized. This reduces the time maintainers spend manually managing backlogs and helps teams focus on critical bugs or features.

AI-Powered Pull Request Reviews: Every new pull request can be reviewed by Gemini before it reaches human reviewers. The system checks code for style adherence, potential bugs, and correctness, letting maintainers focus on design-level concerns rather than surface-level errors and saving significant time and effort.

On-Demand Collaboration via Commands: Developers can interact with Gemini directly in GitHub comments. By mentioning @gemini-cli and issuing commands such as /review, /triage, or /write-tests, they can trigger specific actions. This makes Gemini act like a conversational collaborator inside the repository, much as developers interact with each other in Slack or Jira.

Setup and Configuration

Integrating Gemini CLI GitHub Actions is very straightforward. Developers need Gemini CLI version 0.1.18 or higher. Running the command /setup-github inside the CLI scaffolds the necessary workflow files under .github/workflows and ensures configuration settings are properly managed.

For authentication, Google provides two methods:

API Key Authentication: Developers can store a GEMINI_API_KEY in GitHub Secrets. This method is simple and sufficient for most individual and team projects.

Workload Identity Federation (WIF): For enterprise users, WIF provides a more secure option by replacing long-lived credentials with short-lived, federated tokens. This approach aligns with modern security best practices for CI/CD pipelines.

Gemini’s behavior can be further customized using a GEMINI.md file placed in the repository. This file can contain coding guidelines, documentation links, or project-specific rules. The AI model then uses this context to tailor its reviews and responses.

Security Model

Beyond these benefits, how secure is Gemini CLI GitHub Actions? The commands executed by the model run in isolated environments, with support for multiple sandboxing technologies: Docker, Podman, and macOS Seatbelt.

Additionally, since version 0.1.14 of Gemini CLI, all executions are logged for auditability. Any commands flagged as unusual or potentially unsafe require explicit developer confirmation before execution. For production environments, Google strongly recommends using WIF authentication to avoid risks associated with static API keys.

Example Workflow

The following minimal YAML configuration enables Gemini to automatically review pull requests. This workflow ensures that every new or updated pull request is analyzed by Gemini before merging, providing consistent automated review across the repository.

name: Gemini Pull Request Review
on:
  pull_request:
    types: [opened, synchronize]
jobs:
  gemini-review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: google-github-actions/run-gemini-cli@v0.1
        with:
          args: review --files .
        env:
          GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}

Summary

Gemini CLI GitHub Actions represents a significant step in Google’s effort to embed AI into collaborative software development. By combining free access, flexible configuration, and strong security practices, the release lowers the barrier for teams to experiment with AI-driven automation inside their repositories.
The post Google Brings Gemini CLI to GitHub Actions: Secure, Free, and Enterprise-Ready AI Integration appeared first on MarkTechPost.

AI and the Brain: How DINOv3 Models Reveal Insights into Human Visual …

Introduction

Understanding how the brain builds internal representations of the visual world is one of the most fascinating challenges in neuroscience. Over the past decade, deep learning has reshaped computer vision, producing neural networks that not only perform at human-level accuracy on recognition tasks but also seem to process information in ways that resemble our brains. This unexpected overlap raises an intriguing question: can studying AI models help us better understand how the brain itself learns to see?

Researchers at Meta AI and École Normale Supérieure set out to explore this question by focusing on DINOv3, a self-supervised vision transformer trained on billions of natural images. They compared DINOv3's internal activations with human brain responses to the same images, using two complementary neuroimaging techniques: functional MRI (fMRI) provided high-resolution spatial maps of cortical activity, while magnetoencephalography (MEG) captured the precise timing of brain responses. Together, these datasets offered a rich view of how the brain processes visual information.

https://arxiv.org/pdf/2508.18226

Technical Details

The research team explored three factors that might drive brain-model similarity: model size, the amount of training data, and the type of images used for training. To do this, they trained multiple versions of DINOv3, varying these factors independently.

Brain-Model Similarity

The research team found strong evidence of convergence while looking at how well DINOv3 matched brain responses. The model’s activations predicted fMRI signals in both early visual regions and higher-order cortical areas. Peak voxel correlations reached R = 0.45, and MEG results showed that alignment started as early as 70 milliseconds after image onset and lasted up to three seconds. Importantly, early DINOv3 layers aligned with regions like V1 and V2, while deeper layers matched activity in higher-order regions, including parts of the prefrontal cortex.
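For readers who want to experiment with this kind of analysis, brain-model similarity is commonly estimated with a regularized linear encoding model that predicts voxel responses from a layer's activations and scores the prediction on held-out images. The sketch below is our own minimal illustration of that idea using ridge regression, not the authors' code, and it assumes you already have activation and fMRI response matrices for the same image set.

import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

# X: one row per image, columns = activations from one DINOv3 layer
# Y: one row per image, columns = fMRI voxel responses to the same images
X = np.load("dinov3_layer_activations.npy")  # placeholder file names
Y = np.load("fmri_voxel_responses.npy")

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=0)

# Fit one multi-output ridge model mapping activations to all voxels at once.
encoder = RidgeCV(alphas=np.logspace(1, 5, 9)).fit(X_tr, Y_tr)
Y_hat = encoder.predict(X_te)

# Score each voxel by the Pearson correlation between predicted and measured
# responses on the held-out images.
Yc = Y_te - Y_te.mean(axis=0)
Pc = Y_hat - Y_hat.mean(axis=0)
r = (Yc * Pc).sum(axis=0) / (
    np.linalg.norm(Yc, axis=0) * np.linalg.norm(Pc, axis=0) + 1e-12
)
print("peak voxel correlation:", r.max())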

Training Trajectories

Tracking these similarities over the course of training revealed a developmental trajectory. Low-level visual alignments emerged very early, after only a small fraction of training, while higher-level alignments required billions of images. This mirrors the way the human brain develops, with sensory areas maturing earlier than associative cortices. The study showed that temporal alignment emerged fastest, spatial alignment more slowly, and encoding similarity in between, highlighting the layered nature of representational development.

Role of Model Factors

The role of model factors was equally telling. Larger models consistently achieved higher similarity scores, especially in higher-order cortical regions. Longer training improved alignment across the board, with high-level representations benefiting most from extended exposure. The type of images mattered as well: models trained on human-centric images produced the strongest alignment. Those trained on satellite or cellular images showed partial convergence in early visual regions but much weaker similarity in higher-level brain areas. This suggests that ecologically relevant data are crucial for capturing the full range of human-like representations.

Links to Cortical Properties

Interestingly, the timing of when DINOv3’s representations emerged also lined up with structural and functional properties of the cortex. Regions with greater developmental expansion, thicker cortex, or slower intrinsic timescales aligned later in training. Conversely, highly myelinated regions aligned earlier, reflecting their role in fast information processing. These correlations suggest that AI models can offer clues about the biological principles underlying cortical organization.

Nativism vs. Empiricism

The study highlights a balance between innate structure and learning. DINOv3’s architecture gives it a hierarchical processing pipeline, but full brain-like similarity only emerged with prolonged training on ecologically valid data. This interplay between architectural priors and experience echoes debates in cognitive science about nativism and empiricism.

Developmental Parallels

The parallels to human development are striking. Just as sensory cortices in the brain mature quickly and associative areas develop more slowly, DINOv3 aligned with sensory regions early in training and with prefrontal areas much later. This suggests that training trajectories in large-scale AI models may serve as computational analogues for the staged maturation of human brain functions.

Beyond the Visual Pathway

The results also extended beyond traditional visual pathways. DINOv3 showed alignment in prefrontal and multimodal regions, raising questions about whether such models capture higher-order features relevant for reasoning and decision-making. While this study focused only on DINOv3, it points toward exciting possibilities for using AI as a tool to test hypotheses about brain organization and development.

https://arxiv.org/pdf/2508.18226

Conclusion

In conclusion, this research shows that self-supervised vision models like DINOv3 are more than just powerful computer vision systems. They also approximate aspects of human visual processing, revealing how size, training, and data shape convergence between brains and machines. By studying how models learn to “see,” we gain valuable insights into how the human brain itself develops the ability to perceive and interpret the world.

Check out the paper for more details.
The post AI and the Brain: How DINOv3 Models Reveal Insights into Human Visual Processing appeared first on MarkTechPost.

Tencent Hunyuan Open-Sources Hunyuan-MT-7B and Hunyuan-MT-Chimera-7B: …

Introduction

Tencent’s Hunyuan team has released Hunyuan-MT-7B (a translation model) and Hunyuan-MT-Chimera-7B (an ensemble model). Both models are designed specifically for multilingual machine translation and were introduced in conjunction with Tencent’s participation in the WMT2025 General Machine Translation shared task, where Hunyuan-MT-7B ranked first in 30 out of 31 language pairs.

https://github.com/Tencent-Hunyuan/Hunyuan-MT/blob/main/Hunyuan_MT_Technical_Report.pdf

Model Overview

Hunyuan-MT-7B

A 7B parameter translation model.

Supports mutual translation across 33 languages, including Chinese ethnic minority languages such as Tibetan, Mongolian, Uyghur, and Kazakh.

Optimized for both high-resource and low-resource translation tasks, achieving state-of-the-art results among models of comparable size.
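For a quick local test of the translation model, a standard Hugging Face Transformers chat-style generation loop should be enough. The repository name and prompt format below are assumptions on our part, so check the official model card for the exact identifiers and recommended prompting.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tencent/Hunyuan-MT-7B"  # assumed Hugging Face repo name; verify on the model card
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto", trust_remote_code=True
)

# A simple translation-style instruction; the official prompt template may differ.
messages = [{"role": "user", "content": "Translate the following text into English:\n\n小红薯上有很多旅行攻略。"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))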

Hunyuan-MT-Chimera-7B

An integrated weak-to-strong fusion model.

Combines multiple translation outputs at inference time and produces a refined translation using reinforcement learning and aggregation techniques.

Represents the first open-source translation model of this type, improving translation quality beyond single-system outputs.

https://github.com/Tencent-Hunyuan/Hunyuan-MT/blob/main/Hunyuan_MT_Technical_Report.pdf

Training Framework

The models were trained using a five-stage framework designed for translation tasks:

General Pre-training

1.3 trillion tokens covering 112 languages and dialects.

Multilingual corpora assessed for knowledge value, authenticity, and writing style.

Diversity maintained through disciplinary, industry, and thematic tagging systems.

MT-Oriented Pre-training

Monolingual corpora from mC4 and OSCAR, filtered using fastText (language ID), minLSH (deduplication), and KenLM (perplexity filtering).

Parallel corpora from OPUS and ParaCrawl, filtered with CometKiwi.

Replay of general pre-training data (20%) to avoid catastrophic forgetting.
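To make this filtering stage concrete, the sketch below shows the general shape of such a pipeline using the same families of tools the report names: fastText for language identification, MinHash/LSH for deduplication, and KenLM for perplexity filtering. The model paths and thresholds are placeholders, not values from the technical report.

import fasttext
import kenlm
from datasketch import MinHash, MinHashLSH

lid = fasttext.load_model("lid.176.bin")       # pretrained fastText language-ID model
lm = kenlm.Model("target_lang.arpa")           # KenLM n-gram model for the target language
lsh = MinHashLSH(threshold=0.8, num_perm=128)  # index of documents already kept

def keep(doc_id, text, want_lang="en", max_perplexity=500.0):
    # 1. Language identification: drop documents not in the expected language.
    labels, probs = lid.predict(text.replace("\n", " "))
    if labels[0] != f"__label__{want_lang}" or probs[0] < 0.9:
        return False
    # 2. Perplexity filtering: drop noisy or low-quality text.
    if lm.perplexity(text) > max_perplexity:
        return False
    # 3. MinHash deduplication: drop near-duplicates of documents already kept.
    mh = MinHash(num_perm=128)
    for token in text.split():
        mh.update(token.encode("utf-8"))
    if lsh.query(mh):
        return False
    lsh.insert(doc_id, mh)
    return True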

Supervised Fine-Tuning (SFT)

Stage I: ~3M parallel pairs (Flores-200, WMT test sets, curated Mandarin–minority data, synthetic pairs, instruction-tuning data).

Stage II: ~268k high-quality pairs selected through automated scoring (CometKiwi, GEMBA) and manual verification.

Reinforcement Learning (RL)

Algorithm: GRPO.

Reward functions:

XCOMET-XXL and DeepSeek-V3-0324 scoring for quality.

Terminology-aware rewards (TAT-R1).

Repetition penalties to avoid degenerate outputs.

Weak-to-Strong RL

Multiple candidate outputs are generated and aggregated through reward-based output selection.

Applied in Hunyuan-MT-Chimera-7B, improving translation robustness and reducing repetitive errors.

Benchmark Results

Automatic Evaluation

WMT24pp (English⇔XX): Hunyuan-MT-7B achieved 0.8585 (XCOMET-XXL), surpassing larger models like Gemini-2.5-Pro (0.8250) and Claude-Sonnet-4 (0.8120).

FLORES-200 (33 languages, 1056 pairs): Hunyuan-MT-7B scored 0.8758 (XCOMET-XXL), outperforming open-source baselines including Qwen3-32B (0.7933).

Mandarin⇔Minority Languages: Scored 0.6082 (XCOMET-XXL), higher than Gemini-2.5-Pro (0.5811), showing significant improvements in low-resource settings.

Comparative Results

Outperforms Google Translate by 15–65% across evaluation categories.

Outperforms specialized translation models such as Tower-Plus-9B and Seed-X-PPO-7B despite having fewer parameters.

Chimera-7B adds ~2.3% improvement on FLORES-200, particularly in Chinese⇔Other and non-English⇔non-Chinese translations.

Human Evaluation

A custom evaluation set (covering social, medical, legal, and internet domains) compared Hunyuan-MT-7B with state-of-the-art models:

Hunyuan-MT-7B: Avg. 3.189

Gemini-2.5-Pro: Avg. 3.223

DeepSeek-V3: Avg. 3.219

Google Translate: Avg. 2.344

This shows that Hunyuan-MT-7B, despite being smaller at 7B parameters, approaches the quality of much larger proprietary models.

Case Studies

The report highlights several real-world cases:

Cultural References: Correctly translates “小红薯” as the platform “REDnote,” unlike Google Translate’s “sweet potatoes.”

Idioms: Interprets “You are killing me” as “你真要把我笑死了” (expressing amusement), avoiding literal misinterpretation.

Medical Terms: Translates “uric acid kidney stones” precisely, while baselines generate malformed outputs.

Minority Languages: For Kazakh and Tibetan, Hunyuan-MT-7B produces coherent translations, where baselines fail or output nonsensical text.

Chimera Enhancements: Adds improvements in gaming jargon, intensifiers, and sports terminology.

Conclusion

Tencent's release of Hunyuan-MT-7B and Hunyuan-MT-Chimera-7B establishes a new standard for open-source translation. By combining a carefully designed training framework with a specialized focus on low-resource and minority language translation, the models achieve quality on par with or exceeding larger closed-source systems. The launch of these two models provides the AI research community with accessible, high-performance tools for multilingual translation research and deployment.

Check out the Paper, GitHub Page, and Model on Hugging Face. All credit for this research goes to the researchers of this project.
The post Tencent Hunyuan Open-Sources Hunyuan-MT-7B and Hunyuan-MT-Chimera-7B: A State-of-the-Art Multilingual Translation Models appeared first on MarkTechPost.

Authenticate Amazon Q Business data accessors using a trusted token is …

Since its general availability in 2024, Amazon Q Business (Amazon Q) has enabled independent software vendors (ISVs) to enhance their software-as-a-service (SaaS) solutions through secure access to customers' enterprise data by becoming an Amazon Q Business data accessor. To learn more about data accessors, see the Amazon Q Business documentation. Data accessors now support trusted identity propagation. With trusted token issuer (TTI) authorization support, ISVs acting as data accessors can integrate with the Amazon Q index while maintaining enterprise-grade security standards for their SaaS solutions.
Prior to TTI support, data accessors needed to implement authorization code flow with AWS IAM Identity Center integration when accessing the Amazon Q index. With TTI support for data accessors, ISVs can now use their own OpenID Provider to authenticate enterprise users, alleviating the need for double authentication while maintaining security standards.
In this blog post, we show you how to implement TTI authorization for data accessors, compare authentication options, and provide step-by-step guidance for both ISVs and enterprises.
Prerequisites
Before you begin, make sure you have the following requirements:

An AWS account with administrator access
Access to Amazon Q Business
For ISVs:

An OpenID Connect (OIDC) compatible authorization server

For enterprises:

Amazon Q Business administrator access
Permission to create trusted token issuers

Solution Overview
This solution demonstrates how to implement TTI authentication for Amazon Q Business data accessors. The following diagram illustrates the overall flow across the different resources, from an ISV becoming a data accessor and a customer enabling that data accessor, to the ISV accessing the customer's Amazon Q index:

Understanding Trusted Token Issuer Authentication
Trusted Token Issuer represents an advanced identity integration capability for Amazon Q. At its core, TTI is a token exchange API that propagates identity information into IAM role sessions, so AWS services can make authorization decisions based on the actual end user's identity and group memberships and apply security controls in that authenticated user context. This simplifies the identity integration process while maintaining robust security standards, making it possible for organizations to confirm that access to Amazon Q respects user-level permissions and group memberships. The result is fine-grained access control and proper security governance within Amazon Q implementations.
Understanding Data Accessors
A data accessor is an ISV that has registered with AWS and is authorized to use their customers’ Amazon Q index for the ISV’s Large Language Model (LLM) solution. The process begins with ISV registration, where they provide configuration information including display name, business logo, and OpenID Connect (OIDC) configuration details for TTI support.
During ISV registration, providers must specify their tenantId configuration – a unique identifier for their application tenant. This identifier might be known by different names in various applications (such as Workspace ID in Slack or Domain ID in Asana) and is required for proper customer isolation in multi-tenant environments.
Amazon Q customers then add the ISV as a data accessor to their environment, granting access to their Amazon Q index based on specific permissions and data source selections. Once authorized, the ISV can query the customers’ index through API requests using their TTI authentication flow, creating a secure and controlled pathway for accessing customer data.
Implementing TTI Authentication for Amazon Q index Access
This section explains how to implement TTI authentication for accessing the Amazon Q index. The implementation involves initial setup by the customer and subsequent authentication flow implemented by data accessors for user access.
TTI provides capabilities that enable identity-enhanced IAM role sessions through Trusted Identity Propagation (TIP), allowing AWS services to make authorization decisions based on authenticated user identities and group memberships. Here’s how it works:
To enable data accessor access to a customer’s Amazon Q index through TTI, customers must perform an initial one-time setup by adding a data accessor on Amazon Q Business application. During setup, a TTI with the data accessor’s identity provider information is created in the customer’s AWS IAM Identity Center, allowing the data accessor’s identity provider to authenticate access to the customer’s Amazon Q index.

The process to set up an ISV data accessor with TTI authentication consists of the following steps:

The customer’s IT administrator accesses their Amazon Q Business application and creates a trusted token issuer with the ISV’s OAuth information. This returns a TrustedTokenIssuer (TTI) Amazon Resource Name (ARN).
The IT administrator creates an ISV data accessor with the TTI ARN received in Step 1.
Amazon Q Business confirms the provided TTI ARN with AWS IAM Identity Center and creates a data accessor application.
Upon successful creation of the ISV data accessor, the IT administrator receives data accessor details to share with the ISV.
The IT administrator provides these details to the ISV application.

Once the data accessor setup is complete in the customer’s Amazon Q environment, users can access the Amazon Q index through the ISV application by authenticating only against the data accessor’s identity provider.

The authentication flow proceeds as follows:

A user authenticates against the data accessor’s identity provider through the ISV application. The ISV application receives an ID token for that user, generated from the ISV’s identity provider with the same client ID registered on their data accessor.
The ISV application assumes the AWS Identity and Access Management (IAM) role created during the data accessor onboarding process by calling the AssumeRole API, then makes a CreateTokenWithIAM API request to the customer's AWS IAM Identity Center with the ID token. AWS IAM Identity Center validates the ID token with the ISV's identity provider and returns an IAM Identity Center token.
The ISV application calls the AssumeRole API again with the IAM Identity Center token's extracted identity context and the tenantId. The tenantId is a security control jointly established between the ISV and their customer, with the customer maintaining control over how it's used in their trust relationships. This combination facilitates secure access to the correct customer environment.
The ISV application calls the SearchRelevantContent API with the session credentials and receives relevant content from the customer’s Amazon Q index.
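The following boto3 sketch mirrors these steps from the ISV side. The role ARN, application identifiers, Regions, and the way the tenantId is supplied are placeholders for illustration; the data accessor documentation and the details your customer shares after onboarding define the exact values, so treat this as a starting point rather than a drop-in implementation.

import base64
import json

import boto3

# Placeholder values shared by the customer after creating the data accessor
ISV_ROLE_ARN = "arn:aws:iam::111122223333:role/isv-data-accessor-role"
IDC_APPLICATION_ARN = "arn:aws:sso::111122223333:application/ssoins-example/apl-example"
Q_APPLICATION_ID = "q-business-application-id"
RETRIEVER_ID = "q-business-retriever-id"
ID_TOKEN = "<ID token issued to the user by the ISV's OIDC provider>"

sts = boto3.client("sts")

# Step 2: assume the IAM role created during data accessor onboarding, then
# exchange the user's ID token for an IAM Identity Center token.
base = sts.assume_role(RoleArn=ISV_ROLE_ARN, RoleSessionName="data-accessor")["Credentials"]
oidc = boto3.client(
    "sso-oidc",
    region_name="us-east-1",  # customer's IAM Identity Center Region
    aws_access_key_id=base["AccessKeyId"],
    aws_secret_access_key=base["SecretAccessKey"],
    aws_session_token=base["SessionToken"],
)
idc_token = oidc.create_token_with_iam(
    clientId=IDC_APPLICATION_ARN,  # the data accessor application ARN
    grantType="urn:ietf:params:oauth:grant-type:jwt-bearer",
    assertion=ID_TOKEN,
)

# Step 3: extract the identity context claim from the returned token and assume
# the role again with that context propagated. (The JWT payload is decoded
# without signature verification here only because the token was just issued to
# us directly by IAM Identity Center.)
payload = idc_token["idToken"].split(".")[1]
payload += "=" * (-len(payload) % 4)
identity_context = json.loads(base64.urlsafe_b64decode(payload))["sts:identity_context"]

session = sts.assume_role(
    RoleArn=ISV_ROLE_ARN,
    RoleSessionName="data-accessor-user",
    ProvidedContexts=[{
        "ProviderArn": "arn:aws:iam::aws:contextProvider/IdentityCenter",
        "ContextAssertion": identity_context,
    }],
    # The tenantId is also supplied on this call; see the data accessor
    # documentation for the exact parameter your registration uses.
)["Credentials"]

# Step 4: call SearchRelevantContent with the identity-aware session credentials.
qbusiness = boto3.client(
    "qbusiness",
    region_name="us-east-1",
    aws_access_key_id=session["AccessKeyId"],
    aws_secret_access_key=session["SecretAccessKey"],
    aws_session_token=session["SessionToken"],
)
results = qbusiness.search_relevant_content(
    applicationId=Q_APPLICATION_ID,
    queryText="How do I configure single sign-on?",
    contentSource={"retriever": {"retrieverId": RETRIEVER_ID}},
)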

Choosing between TTI and Authorization Code
When implementing Amazon Q integration, ISVs need to consider two approaches, each with its own benefits and considerations:

Trusted Token Issuer

Advantages:
Single authentication on the ISV system
Enables backend-only access to the SearchRelevantContent API without user interaction

Considerations:
Some enterprises may prefer authentication flows that require explicit user consent for each session, providing additional control over API access timing and duration
Requires ISVs to host and maintain an OpenID Provider

Authorization Code

Advantages:
Enhanced security through mandatory user initiation for each session

Considerations:
Requires double authentication on the ISV system

TTI excels in providing a seamless user experience through single authentication on the ISV system and enables backend-only implementations for SearchRelevantContent API access without requiring direct user interaction. However, this approach requires ISVs to maintain their own OIDC authorization server, which may present implementation challenges for some organizations. Additionally, some enterprises might have concerns about ISVs having persistent ability to make API requests on behalf of their users without explicit per-session authorization.
Next Steps
For ISVs: Becoming a Data Accessor with TTI Authentication
Getting started with the Amazon Q data accessor registration process using TTI authentication is straightforward. If you already have an OIDC-compatible authorization server for your application's authentication, you're most of the way there.
To begin the registration process, you’ll need to provide the following information:

Display name and business logo that will be displayed on AWS Management Console
OIDC configuration details (OIDC ClientId and discovery endpoint URL)
TenantID configuration details that specify how your application identifies different customer environments

For details, see Information to be provided to the Amazon Q Business team.
For ISVs using Amazon Cognito as their OIDC authorization server, here’s how to retrieve the required OIDC configuration details:

To get the OIDC ClientId:

Navigate to the Amazon Cognito console.
Select your user pool.
Go to Applications > App clients.
The ClientId is listed under Client ID for your app client.

To get the discovery endpoint URL:

The URL follows this format: https://cognito-idp.{region}.amazonaws.com/{userPoolId}/.well-known/openid-configuration
Replace {region} with your AWS Region (for example, us-east-1).
Replace {userPoolId} with your Cognito user pool ID.
For example, if your user pool is in us-east-1 with ID 'us-east-1_abcd1234', your discovery endpoint URL would be: https://cognito-idp.us-east-1.amazonaws.com/us-east-1_abcd1234/.well-known/openid-configuration

Note: While this example uses Amazon Cognito, the process will vary depending on your OIDC provider. Common providers like Auth0, Okta, or custom implementations will have their own methods for accessing these configuration details.
Once registered, you can enhance your generative AI application with the powerful capabilities of Amazon Q, allowing your customers to access their enterprise knowledge base through your familiar interface. AWS provides comprehensive documentation and support to help you implement the authentication flow and API integration efficiently.
For Enterprises: Enabling TTI-authenticated Data Accessor
To enable a TTI-authenticated data accessor, your IT administrator needs to complete the following steps in the Amazon Q console:

Create a trusted token issuer using the ISV’s OAuth information
Set up the data accessor with the generated TTI ARN
Configure appropriate data source access permissions

This streamlined setup allows your users to access Amazon Q index through the ISV’s application using their existing ISV application credentials, alleviating the need for multiple logins while maintaining security controls over your enterprise data.
Both ISVs and enterprises benefit from AWS’s comprehensive documentation and support throughout the implementation process, facilitating a smooth and secure integration experience.
Clean up resources
To avoid unused resources, follow these steps if you no longer need the data accessor:

Delete the data accessor:

On the Amazon Q Business console, choose Data accessors in the navigation pane
Select your data accessor and choose Delete.

Delete the TTI:

On the IAM Identity Center console, choose Trusted Token Issuers in the navigation pane.
Select the associated issuer and choose Delete.

Conclusion
The introduction of Trusted Token Issuer (TTI) authentication for Amazon Q data accessors marks a significant advancement in how ISVs integrate with Amazon Q Business. By enabling data accessors to use their existing OIDC infrastructure, we’ve alleviated the need for double authentication while maintaining enterprise-grade security standards through TTI’s robust tenant isolation mechanisms and secure multi-tenant access controls, making sure each customer’s data remains protected within their dedicated environment. This streamlined approach not only enhances the end-user experience but also simplifies the integration process for ISVs building generative AI solutions.
In this post, we showed how to implement TTI authentication for Amazon Q data accessors. We covered the setup process for both ISVs and enterprises and demonstrated how TTI authentication simplifies the user experience while maintaining security standards.
To learn more about Amazon Q Business and data accessor integration, refer to Share your enterprise data with data accessors using Amazon Q index and Information to be provided to the Amazon Q Business team. You can also contact your AWS account team for personalized guidance. Visit the Amazon Q Business console to begin using these enhanced authentication capabilities today.

About the Authors
Takeshi Kobayashi is a Senior AI/ML Solutions Architect within the Amazon Q Business team, responsible for developing advanced AI/ML solutions for enterprise customers. With over 14 years of experience at Amazon in AWS, AI/ML, and technology, Takeshi is dedicated to leveraging generative AI and AWS services to build innovative solutions that address customer needs. Based in Seattle, WA, Takeshi is passionate about pushing the boundaries of artificial intelligence and machine learning technologies.
Siddhant Gupta is a Software Development Manager on the Amazon Q team based in Seattle, WA. He is driving innovation and development in cutting-edge AI-powered solutions.
Akhilesh Amara is a Software Development Engineer on the Amazon Q team based in Seattle, WA. He is contributing to the development and enhancement of intelligent and innovative AI tools.

Unlocking the future of professional services: How Proofpoint uses Ama …

This post was written with Stephen Coverdale and Alessandra Filice of Proofpoint.
At the forefront of cybersecurity innovation, Proofpoint has redefined its professional services by integrating Amazon Q Business, a fully managed, generative AI powered assistant that you can configure to answer questions, provide summaries, generate content, and complete tasks based on your enterprise data. This synergy has transformed how Proofpoint delivers value to its customers, optimizing service efficiency and driving useful insights. In this post, we explore how Amazon Q Business transformed Proofpoint’s professional services, detailing its deployment, functionality, and future roadmap.
We started this journey in January 2024 and launched production use within the services team in October 2024. Since then, active users have achieved a 40% productivity increase in administrative tasks, with Amazon Q Apps now saving us over 18,300 hours annually. The impact has been significant given that consultants typically spend about 12 hours per week on non-call administrative tasks.
The time savings are evident in several key areas:

Over 10,000 hours annually through apps that support customer data analysis and deliver insights and recommendations
3,000 hours per year saved in executive reporting generation, which will likely double when we deploy automated presentation creation with AI-powered hyper-personalization
1,000 hours annually on meeting summarizations
300 hours per year preparing renewal justifications—but the real benefit here is how quickly we can now turn around customized content at a scale we couldn’t achieve before

Beyond these time savings, we’ve seen benefits in upskilling our teams with better access to knowledge, delivering additional value to clients, improving our renewal processes, and gaining deeper customer understanding through Amazon Q Business. This productivity increase means our consultants can focus more time on strategic initiatives and direct customer engagement, ultimately delivering higher value to our customers.
A paradigm shift in cybersecurity service delivery
Proofpoint’s commitment to evolving our customer interactions into delightful experiences led us to adopt Amazon Q Business across our services and consulting teams. This integration has enabled:

Enhanced productivity – Consultants save significant time on repetitive tasks, reallocating focus to high-value client interactions
Deeper insights – AI-driven analytics provide a granular understanding of customer environments
Scalable solutions – Tailored applications (Amazon Q Apps) empower consultants to address customer needs effectively

Transformative use cases through Amazon Q Apps
Amazon Q Business has been instrumental in our deployment, and we’ve developed over 30 custom Amazon Q Apps, each addressing specific challenges within our service portfolio.
Some of the use cases are:
1. Follow-up email automation

Challenge – Consultants spent hours drafting follow-up emails post-meetings
Solution – Amazon Q Apps generates curated follow-up emails outlining discussion points and action items
Impact – Consistent customer tracking, reduced response time, and multilingual capabilities for global reach

2. Health check analysis

Challenge – Analyzing complex customer health assessments and understanding customer changes over time
Solution – Amazon Q Apps compares files, providing an overview of key changes between two health checks, and a generated summary to help support customer business reviews (CBRs) and progress updates
Impact – Improved communication and enhanced customer satisfaction

3. Renewal justifications

Challenge – Time-intensive preparation for renewal discussions
Solution – Tailored renewal justification points crafted to demonstrate the value we’re delivering
Impact – Scalable, targeted value articulation, fostering customer retention

4. Drafting custom responses

Challenge – Providing in-depth and specific responses for customer inquiries
Solution – Amazon Q Apps creates a personalized email draft using our best practices and documentation
Impact – Faster, more accurate communication

The following diagram shows the Proofpoint use cases for Amazon Q Business.

The following diagram shows the Proofpoint implementation. Proofpoint Chat UI is the front end that connects to Amazon Q Business, which connects to data sources in Amazon Simple Storage Service (Amazon S3), Amazon Redshift, Microsoft SharePoint, and Totango.

Data strategy: Laying the foundation to a successful deployment
Proofpoint’s successful integration of Amazon Q Business followed a comprehensive data strategy and a phased deployment approach. The journey involved crucial data preparation and documentation overhaul with key aspects noted below.
Quality documentation:

Conducted thorough review of existing documentation
Organized and added metadata to our documentation for improved accessibility
Established vetting process for new documents

Knowledge capture:

Developed processes to document tribal knowledge
Created strategies for ongoing knowledge enrichment
Established metadata tagging standards for improved searchability

We've primarily used Microsoft SharePoint document libraries to manage and support this process, and we're now replicating this model as we onboard additional teams. Sufficient testing to confirm that Amazon Q Business remains accurate is key to maintaining the high efficacy we've seen in the results.
Going forward, we’re also expanding our data strategy to capture more information and insights into our customer journey. We want to make more data sources available to Amazon Q Business to expand this project scope so it covers more work tasks and more teams.
Journey of our successful Amazon Q Business rollout
Through our AWS Enterprise Support relationship, Proofpoint received full support on this project from the AWS account team, who helped us evaluate the viability of the project in detail and brought in expert technical resources. They engaged fully to help our teams use service features and functionality and gain early access to new feature previews, which helped us optimize and align our development timelines with the service roadmaps.
We established a rigorous vetting process for new documents to maintain data quality and developed strategies to document institutional knowledge. This made sure our AI assistant was trained on the most accurate and up-to-date information. This process revealed the full benefits of Amazon Q Business.
The pilot and discovery phases were critical in shaping our AI strategy. We quickly identified the limitations of solely having the chat functionality and recognized the game-changing potential of Amazon Q Apps. To make sure we were addressing real needs, we conducted in-depth interviews with consultants to determine pain points so we could then invest in developing the Amazon Q Apps that would provide the most benefits and time savings. App development and refinement became a central focus of our efforts. We spent a significant amount of time prompt engineering our apps to provide consistent high-quality results that would provide practical value to our users and encourage them to adopt the apps as part of their processes. We also continued updating the weighting of our documents, using the metadata to enhance the output. This additional work upfront led to a successful deployment.
Lessons learned
Throughout our journey of integrating Amazon Q Business, we’ve gleaned valuable lessons that have shaped our approach to AI implementation within our services and consulting areas. One of the most compelling insights is the importance of a robust data strategy. We’ve learned that AI is only as smart as we make it, and the quality of data fed into the system directly impacts its performance. This realization led us to invest significant time in identifying avenues to make our AI smarter, with a focus on developing a clear data strategy across our services and consulting teams to make sure we realize the full benefits of AI. We also discovered that having AI thought leaders embedded within our services function is key to the success of AI implementation, to bring that necessary understanding of both the technology and our business processes.
Another lesson was that time investment is required to get the most out of Amazon Q Business. The customization and ongoing management are key to delivering optimal results. We found that creating custom apps is the most effective route to adoption. Amazon Q Business offers no-code simplicity, so business-oriented teams rather than programmers can create the apps. The prompt engineering required to provide high-quality and consistent results is a time-intensive process. This underscores the need for dedicated resources with expertise in AI, our business, and our processes.
Experimentation on agentic features
Amazon Q Business has taken a significant leap forward in enhancing workplace productivity with the introduction of an intelligent orchestration feature. This feature transforms how users interact with their enterprise data and applications by automatically directing queries to appropriate data sources and plugins. Instead of manually switching between different work applications, users now seamlessly interact with popular business tools such as Jira, Salesforce, ServiceNow, Smartsheet, and PagerDuty through a single conversational interface. The feature uses Retrieval Augmented Generation (RAG) data for enterprise-specific knowledge and works with both built-in and custom plugins, making it a powerful addition to the workplace technology landscape. We're experimenting with agentic integration with Totango and a few other custom plugins through the Orchestrator and are seeing good results.
Looking ahead
Looking ahead, Proofpoint has outlined an ambitious roadmap for expanding our Amazon Q Business deployment across our customer-facing teams. The key priorities of this roadmap include:

Expansion of data sources – Proofpoint will be working to incorporate more data sources, helping to unify our information across our teams and allowing for a more comprehensive view of our customers. This will include using the many Amazon Q Business data source connectors, such as Salesforce, Confluence, Amazon S3, and Smartsheet, and will expand the impact of our Amazon Q Apps.
Using Amazon Q Business actions – Building on our successful Amazon Q deployment, Proofpoint is set to enhance its tool integration strategy to further streamline operations and reduce administrative burden. We plan to take advantage of Amazon Q Business actions using the plugin capabilities so we can post data into our different customer success tools. With this integration approach, we can take note of more detailed customer insights. For example, we can capture project progress from a meeting transcript and store it in our customer success tool to identify sentiment concerns. We’ll be able to gather richer data about our customer engagements, which translates to providing even greater and more personalized service to our customers.
Automated workflows – Future enhancements will include expanded automation and integrations to further streamline our service delivery. By combining our enhanced data sources with automated actions, we can make sure our teams receive the right information and insights at the right time while reducing manual intervention.
Data strategy enhancement – We’ll continue to refine our data strategy across Proofpoint Premium Services, establishing best practices for documentation and implementing systems to record undocumented knowledge. This will include developing better ways to understand and document our customer journey through the integration of various tools and data sources.

Security and compliance
As a cybersecurity leader, Proofpoint makes sure that AI processes comply with strict security and privacy standards:

Secure integration – Amazon Q Apps seamlessly connects to our various data sources, safeguarding sensitive data
Continuous monitoring – Embedded feedback mechanisms and daily synchronization uphold quality control

Conclusion: Redefining cybersecurity services
Amazon Q Business exemplifies Proofpoint’s innovative approach to cybersecurity. With Amazon Q Business AI capabilities, we’re elevating our customer experience and scaling our service delivery.
As we refine and expand this program, our focus remains unwavering: delivering unmatched value and protection to our clients. Through Amazon Q Business, Proofpoint continues to set the benchmark in cybersecurity services, making sure organizations can navigate an increasingly complex threat landscape with confidence.
Learn more

More Amazon Q Business Blogs
Amazon Q main product page
Amazon Q details for IT pros and developers
Get started with Amazon Q

About the Authors
Stephen Coverdale is a Senior Manager, Professional Services at Proofpoint. In addition to managing a Professional Services team, he leads an AI integration team developing and driving a strategy to leverage the transformative capabilities of AI within Proofpoint’s services teams to enhance Proofpoint’s client engagements.
Alessandra Filice is a Senior AI Integration Specialist at Proofpoint, where she plays a lead role in implementing AI solutions across Proofpoint’s services teams. In this role, she specializes in developing and deploying AI capabilities to enhance service delivery and operational efficiency. Working closely with stakeholders across Proofpoint, she identifies opportunities for AI implementation, designs innovative solutions, and facilitates successful integration of AI technologies.
Ram Krishnan is a Senior Technical Account Manager at AWS. He serves as a key technical resource for independent software vendor (ISV) customers, providing help and guidance across their AWS needs including AI/ML focus — from adoption and migration to design, deployment, and optimizations across AWS services.
Abhishek Maligehalli Shivalingaiah is a Senior Generative AI Solutions Architect at AWS, specializing in Amazon Q Business. With a deep passion for using agentic AI frameworks to solve complex business challenges, he brings nearly a decade of expertise in developing data and AI solutions that deliver tangible value for enterprises. Beyond his professional endeavors, Abhishek is an artist who finds joy in creating portraits of family and friends, expressing his creativity through various artistic mediums.

Enhancing LLM accuracy with Coveo Passage Retrieval on Amazon Bedrock

This post is co-written with Keith Beaudoin and Nicolas Bordeleau from Coveo.
As generative AI transforms business operations, enterprises face a critical challenge: how can they help large language models (LLMs) provide accurate and trustworthy responses? Without reliable data foundations, these AI models can generate misleading or inaccurate responses, potentially reducing user trust and organizational credibility.
As an AWS Partner, Coveo addresses this challenge with its Passage Retrieval API. This solution enhances the reliability of LLM-powered applications by providing them with relevant, context-aware enterprise knowledge to inform generated responses. In Retrieval Augmented Generation (RAG) systems, the retrieval process is the most complex component. It requires extracting the most relevant, precise information from enterprise data sources. By integrating the Coveo AI-Relevance Platform with Amazon Bedrock Agents, organizations gain access to an industry-leading enterprise search service featuring a secured, unified hybrid index that respects enterprise permission models and offers robust connectivity. The Coveo AI-Relevance Platform uses machine learning (ML) and in-depth usage analytics to continuously optimize relevance. This enables Amazon Bedrock Agents to deliver grounded, contextually relevant responses tailored to complex enterprise content.
The Coveo AI-Relevance Platform is an industry-leading service that connects and unifies the content of cloud and on-premises repositories in a single index, making it fast and simple to find relevant content quickly. Its ML algorithms analyze user behavior, in-app context, as well as profile and permissions data to retrieve personalized search results and recommendations. It then aggregates and reports insights back to content and experience managers. By integrating seamlessly with enterprise systems (such as websites, knowledge bases, and CRM) and enforcing security permissions, Coveo helps users get the most pertinent information while maintaining data protection.
In this post, we show how to deploy Coveo’s Passage Retrieval API as an Amazon Bedrock Agents action group to enhance response accuracy, so Coveo users can use their current index to rapidly deploy new generative experiences across their organization.
Coveo’s Passage Retrieval API
The Coveo Passage Retrieval API enhances LLM applications by passing ranked text passages (or chunks) retrieved from the unified index, along with appropriate metadata such as source URLs for citations, so that generated answers are grounded in an organization’s proprietary knowledge. Built on Coveo’s unified hybrid index, the Passage Retrieval API applies a two-stage retrieval process to extract the most relevant passages from structured and unstructured content sources, providing accuracy, security, and real-time relevance. The following diagram illustrates these stages.

The process consists of the following key components:

Relevant passage extraction using a two-stage retrieval process – In the first stage, Coveo’s hybrid search system identifies the most relevant documents. In the second stage, it extracts the most relevant text passages from these documents, along with ranking scores, citation links, and other key metadata. This approach first narrows retrieval to documents that are both accessible and relevant, and then pinpoints the most relevant passages within them.
Enhanced search results with hybrid ranking – Combining semantic (vector) search and lexical (keyword) matching helps Coveo retrieve the right information in the right context.
ML relevancy – AI continuously learns from user interactions, tailoring retrieval to each user’s journey, behavior, and profile for context-aware responses.
Content connected across sources with a unified index – A centralized hybrid index connects structured and unstructured content across sources. This unified hybrid index provides better multi-source relevancy than a federated search approach by applying the ranking function across sources. Coveo also provides a robust library of pre-built connectors to maintain seamless integration with third-party services (such as Salesforce, SharePoint, and Google Drive), facilitating data freshness with automatic updates for real-time retrieval.
Analytics and insights for performance tracking – With events tracking through the Data Platform and Knowledge Hub, you can see exactly how your generated answers perform, where information is missing or underused, and which content needs tuning. With those insights, you can boost answer quality and drive measurable business impact.
Enterprise-grade security – Coveo preserves the native permission model of each connected content source by importing item-level permissions at crawl time through an early-binding approach. Resolving access rights before indexing helps prevent data leakage and improves search performance by filtering out items a user can’t see before a query is run.
Redefining enterprise-ready RAG – Coveo reimagines what RAG systems can achieve by going beyond basic vector search, offering a dynamic and intelligent retrieval pipeline designed for enterprise needs. At its core is a unified hybrid index that seamlessly connects structured, unstructured, and permission-sensitive data. This foundation, combined with real-time ML-driven tuning, helps validate that every response delivered to an LLM is relevant and securely grounded in the right context.

Through native access control enforcement, behavior-based relevance adjustment, and deep analytics into content usage and performance, Coveo empowers organizations to continuously refine their generative AI experiences. Backed by consistent recognition from leading analyst firms such as Gartner, Forrester, and IDC, Coveo delivers a reliable, secure, and scalable foundation for enterprise-grade generative AI.
Solution overview
This solution demonstrates how you can integrate Coveo’s Passage Retrieval API with Amazon Bedrock Agents to build LLM-powered assistants that deliver accurate, context-aware, and grounded responses. It applies broadly across use cases where agents need to access proprietary knowledge, whether to support customers, assist employees, or empower sales and service teams. This approach helps AI responses stay grounded in your most relevant and trusted information across structured and unstructured content.
Amazon Bedrock Agents acts as the intelligent backbone of this solution, interpreting natural language queries and orchestrating the retrieval process to deliver grounded, contextually relevant insights. For this use case, the agent is equipped to answer a user’s questions about Coveo’s service capabilities, API documentation, and integration guides. The agent fetches precise passages from enterprise content in response to user questions, enabling applications such as virtual assistants, support copilots, or internal knowledge bots. By using structured definitions and instructions, the agent understands when and how to trigger Coveo’s Passage Retrieval API, making sure that LLM-generated responses are backed by accurate and trusted content.
The action group defines the structured API operations that the Amazon Bedrock agent can invoke. Using OpenAPI specifications, it defines the interface between the agent and AWS Lambda functions. In this use case, fetching relevant passages based on the user’s search intent is the only available operation.
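To make this concrete, the following is a minimal sketch of what such an OpenAPI schema could look like, expressed as a Python dictionary for readability. The path, operationId, and parameter names are illustrative assumptions, not the exact schema shipped with the CloudFormation template.

    import json

    # Minimal illustrative OpenAPI 3.0 schema for a single passage-retrieval
    # operation. Path, operationId, and field names are assumptions.
    openapi_schema = {
        "openapi": "3.0.0",
        "info": {"title": "CoveoPRAPI", "version": "1.0.0"},
        "paths": {
            "/retrieve-passages": {
                "get": {
                    "operationId": "retrievePassages",
                    "description": "Fetch the most relevant passages for a user query",
                    "parameters": [
                        {
                            "name": "query",
                            "in": "query",
                            "required": True,
                            "schema": {"type": "string"},
                            "description": "The user's search intent",
                        }
                    ],
                    "responses": {
                        "200": {
                            "description": "Ranked passages with citation metadata",
                            "content": {"application/json": {"schema": {"type": "object"}}},
                        }
                    },
                }
            }
        },
    }

    print(json.dumps(openapi_schema, indent=2))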
The following diagram illustrates the solution architecture.

For this demo, the agent is set with the following instructions during creation:

You will act as an expert on Coveo documentation, platform, APIs, analytics, and integration guides.
        Use the CoveoPRAPI Action Group to retrieve relevant information on Coveo documentation, platform, APIs, analytics, and integration guides.
        Summarize the information retrieved from the Action Group response clearly and accurately.
        Output formatting guidelines:
        – Provide clear, direct answers without introductory phrases such as “As an expert,” “Sure,” or “Here is…”
        – When appropriate, organize content using:
          – Numbered or bulleted lists
        – Structured sections (e.g., features, steps, key points)
        Keep answers concise, informative, and free of conversational filler.

The Lambda function defined in the action group is essential for enabling the Amazon Bedrock agent to call the Coveo Passage Retrieval API. The Lambda function performs the following tasks (a simplified sketch follows this list):

Receives incoming requests from the Amazon Bedrock agent
Queries the Coveo Passage Retrieval API using the user’s input
Returns the relevant search results back to the Amazon Bedrock agent
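The exact implementation is provided by the CloudFormation stack; the following is a simplified Python sketch of how such a handler could look. The Amazon Bedrock Agents event and response shapes follow the documented action group contract, while the Coveo endpoint host format and request body fields (query, searchHub) are assumptions based on the configuration values used in this post, so confirm them against Coveo’s Passage Retrieval API documentation.

    import json
    import os
    import urllib.request

    # Configuration is assumed to come from environment variables set by the
    # CloudFormation template (variable names here are illustrative).
    COVEO_ORG_ID = os.environ["COVEO_ORG_ID"]
    COVEO_API_KEY = os.environ["COVEO_API_KEY"]
    COVEO_SEARCH_HUB = os.environ["COVEO_SEARCH_HUB"]

    # The API path appears in the agent trace later in this post; the host
    # format is an assumption about the Coveo organization endpoint.
    COVEO_URL = f"https://{COVEO_ORG_ID}.org.coveo.com/rest/search/v3/passages/retrieve"


    def retrieve_passages(query: str) -> list:
        """Call the Coveo Passage Retrieval API and return the ranked passages."""
        payload = json.dumps({
            "query": query,                 # the user's search intent
            "searchHub": COVEO_SEARCH_HUB,  # field name is an assumption
        }).encode("utf-8")
        request = urllib.request.Request(
            COVEO_URL,
            data=payload,
            headers={
                "Authorization": f"Bearer {COVEO_API_KEY}",
                "Content-Type": "application/json",
            },
            method="POST",
        )
        with urllib.request.urlopen(request) as response:
            body = json.loads(response.read())
        # The response body is assumed to contain a list of passage items.
        return body.get("items", [])


    def lambda_handler(event, context):
        # Pull the "query" parameter passed by the agent, per the OpenAPI schema.
        params = {p["name"]: p["value"] for p in event.get("parameters", [])}
        passages = retrieve_passages(params.get("query", ""))

        # Return the passages in the action group response format expected by
        # Amazon Bedrock Agents.
        return {
            "messageVersion": "1.0",
            "response": {
                "actionGroup": event["actionGroup"],
                "apiPath": event["apiPath"],
                "httpMethod": event["httpMethod"],
                "httpStatusCode": 200,
                "responseBody": {
                    "application/json": {"body": json.dumps({"passages": passages})}
                },
            },
        }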

Lastly, the Coveo AI-Relevance Platform provides indexed and structured search results through the Passage Retrieval API.
Prerequisites
Before you begin, you must have the following prerequisites:

An AWS account with AWS Identity and Access Management (IAM) permissions to deploy an AWS CloudFormation stack
Access to models in Amazon Bedrock
A Coveo index created and ready to use
The following Coveo information:

Coveo organization ID
Coveo search hub
Coveo API key

Deploy the solution with AWS CloudFormation
To deploy your agent, complete the following steps (a boto3 sketch that mirrors these console steps follows the list):

Launch the CloudFormation stack:
Enter a stack name and values for AgentModelID, AgentName, CoveoApiKey, CoveoOrgID, and CoveoSearchHub.
In the Capabilities section, select the acknowledge check box.
Choose Create stack.
Wait for the stack creation to complete.
Verify all resources are created on the stack details page.
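If you prefer to script the deployment, the following boto3 sketch mirrors the console steps. The template URL and parameter values are placeholders, and the Claude 3 Haiku model ID is only an example; substitute the template location behind the Launch Stack link and your own Coveo values.

    import boto3

    # Deploy the same stack programmatically. TemplateURL and parameter values
    # are placeholders; replace them with the template location used by the
    # Launch Stack link and your own Coveo configuration.
    cloudformation = boto3.client("cloudformation")

    stack_name = "coveo-bedrock-agent"
    cloudformation.create_stack(
        StackName=stack_name,
        TemplateURL="https://example-bucket.s3.amazonaws.com/coveo-agent-template.yaml",  # placeholder
        Parameters=[
            {"ParameterKey": "AgentModelID", "ParameterValue": "anthropic.claude-3-haiku-20240307-v1:0"},
            {"ParameterKey": "AgentName", "ParameterValue": "coveo-passage-retrieval-agent"},
            {"ParameterKey": "CoveoApiKey", "ParameterValue": "<your-coveo-api-key>"},
            {"ParameterKey": "CoveoOrgID", "ParameterValue": "<your-coveo-org-id>"},
            {"ParameterKey": "CoveoSearchHub", "ParameterValue": "<your-search-hub>"},
        ],
        # Equivalent to selecting the acknowledge check box in the console.
        Capabilities=["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM"],
    )

    # Wait for creation to complete, then verify the resources on the stack details page.
    cloudformation.get_waiter("stack_create_complete").wait(StackName=stack_name)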

Test the solution
To test the solution, complete the following steps:

On the Amazon Bedrock console, choose Agents in the navigation pane.
Choose the agent created by the CloudFormation stack.

Under Test, enter your question in the message box.

For this demo, Coveo technical documentation (from the website) was ingested into an existing Coveo search index. Let’s start with the query “What is the difference between Coveo Atomic and Headless?”
Here’s how the agent answered the question when Claude 3 Haiku v1 was used as the LLM.

Choose Show trace in the right pane and expand Trace step 1 to see the agent’s rationale.

The following screenshot demonstrates how Amazon Bedrock Agents processed and answered the question. First, it formed a rationale:

To understand the difference between Coveo Atomic and Headless, I will need to retrieve relevant information from the Coveo technical documentation using the CoveoPRAPI Action Group.

Then it invokes the CoveoPRAPI action group, which is specifically designed to retrieve relevant passages, through a Lambda function that makes an API call to /rest/search/v3/passages/retrieve.
This example illustrates the agent’s systematic approach to planning and executing the necessary actions through the CoveoPRAPI action group, retrieving relevant document chunks before formulating its final response.

The Lambda function code includes a debugging feature that logs each retrieved passage from the Passage Retrieval API. This logging mechanism iterates through the returned chunks, numbering them sequentially for quick reference. These logs are available in Amazon CloudWatch, so you can see exactly which passages were retrieved for each user query, and how they contributed to the final response. To visualize the logs, open the CloudWatch console, and on the Log groups page, locate the Lambda function name.
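The pattern itself is straightforward; a minimal sketch of the numbered-chunk logging described above might look like the following (the "text" field name is an assumption about the passage payload).

    import logging

    logger = logging.getLogger()
    logger.setLevel(logging.INFO)

    def log_retrieved_passages(passages: list) -> None:
        # Number each retrieved chunk sequentially so it is easy to reference
        # in the CloudWatch log stream for a given query.
        for index, passage in enumerate(passages, start=1):
            logger.info("Chunk %d: %s", index, passage.get("text", ""))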
The following screenshot shows the agent’s detailed logs in CloudWatch. In the logs, you can see the five most relevant chunks that the Coveo Passage Retrieval API returned to the LLM.

Clean up
To avoid ongoing charges, delete the resources deployed as part of the CloudFormation stack:

On the AWS CloudFormation console, choose Stacks in the navigation pane.
Choose the stack you created, then choose Delete.
Choose Delete stack when prompted.

For more information, refer to Deleting a stack on the AWS CloudFormation console.
Conclusion
By integrating Amazon Bedrock capabilities with Coveo’s AI-driven retrieval, organizations can develop AI applications that provide validated responses based on their enterprise content. This approach helps reduce inaccuracies while delivering real-time, secure responses.
We encourage you to explore pre-built examples in the GitHub repository to help you get started with Amazon Bedrock.
To learn more about the Coveo AI-Relevance Platform and how to implement the Passage Retrieval API in your Coveo organization, refer to Passage Retrieval (CPR) implementation overview.

About the authors
Yanick Houngbedji is a Solutions Architect for Independent Software Vendors (ISV) at Amazon Web Services (AWS), based in Montréal, Canada. He specializes in helping customers architect and implement highly scalable, performant, and secure cloud solutions on AWS. Before joining AWS, he spent over 8 years providing technical leadership in data engineering, big data analytics, business intelligence, and data science solutions.
Keith Beaudoin is a Senior Solution Architect at Coveo Labs. He specializes in designing and implementing intelligent search and AI-powered relevance solutions, with expertise in agentic solutions, cloud technologies, search architecture, and third-party integrations. Keith helps organizations harness the full potential of Coveo’s platform, optimizing digital transformation strategies for seamless and impactful search implementations that drive business value.
Nicolas Bordeleau is Head of Product Relations at Coveo. With over 19 years of experience in the search industry, including 11 years in product management, Nicolas has a keen understanding of enterprises and developers’ needs related to search and information retrieval. He has applied this knowledge to develop award-winning products that fulfill those needs in innovative ways.

Google AI Introduces Stax: A Practical AI Tool for Evaluating Large La …

Evaluating large language models (LLMs) is not straightforward. Unlike the deterministic programs that traditional software testing assumes, LLMs are probabilistic systems: they can generate different responses to identical prompts, which complicates testing for reproducibility and consistency. To address this challenge, Google AI has released Stax, an experimental developer tool that provides a structured way to assess and compare LLMs with custom and pre-built autoraters.

Stax is built for developers who want to understand how a model or a specific prompt performs for their use cases rather than relying solely on broad benchmarks or leaderboards.

Why Standard Evaluation Approaches Fall Short

Leaderboards and general-purpose benchmarks are useful for tracking model progress at a high level, but they don’t reflect domain-specific requirements. A model that does well on open-domain reasoning tasks may not handle specialized use cases such as compliance-oriented summarization, legal text analysis, or enterprise-specific question answering.

Stax addresses this by letting developers define the evaluation process in terms that matter to them. Instead of abstract global scores, developers can measure quality and reliability against their own criteria.

Key Capabilities of Stax

Quick Compare for Prompt Testing

The Quick Compare feature allows developers to test different prompts across models side by side. This makes it easier to see how variations in prompt design or model choice affect outputs, reducing time spent on trial-and-error.

Projects and Datasets for Larger Evaluations

When testing needs to go beyond individual prompts, Projects & Datasets provide a way to run evaluations at scale. Developers can create structured test sets and apply consistent evaluation criteria across many samples. This approach supports reproducibility and makes it easier to evaluate models under more realistic conditions.

Custom and Pre-Built Evaluators

At the center of Stax is the concept of autoraters. Developers can either build custom evaluators tailored to their use cases or use the pre-built evaluators provided. The built-in options cover common evaluation categories such as:

Fluency – grammatical correctness and readability.

Groundedness – factual consistency with reference material.

Safety – avoidance of harmful or unwanted content.

This flexibility helps align evaluations with real-world requirements rather than one-size-fits-all metrics.
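Stax is driven through its own interface, so the snippet below is not Stax code. It is a generic, hypothetical Python sketch of the autorater pattern the tool is built around: a custom evaluator applied consistently to every sample in a dataset, with scores aggregated at the end.

    from typing import Callable

    # Hypothetical dataset of (prompt, model response, reference) samples.
    dataset = [
        {"prompt": "Summarize the refund policy.",
         "response": "Refunds are issued within 30 days of purchase.",
         "reference": "Customers may request a refund within 30 days of purchase."},
        {"prompt": "Summarize the refund policy.",
         "response": "Refunds are issued within 90 days.",
         "reference": "Customers may request a refund within 30 days of purchase."},
    ]

    def groundedness_rater(response: str, reference: str) -> float:
        """Toy 'groundedness' autorater: fraction of response tokens found in the reference."""
        response_tokens = response.lower().split()
        reference_tokens = set(reference.lower().split())
        if not response_tokens:
            return 0.0
        return sum(t in reference_tokens for t in response_tokens) / len(response_tokens)

    def evaluate(samples: list, rater: Callable[[str, str], float]) -> float:
        """Apply one evaluator consistently across every sample and average the scores."""
        scores = [rater(s["response"], s["reference"]) for s in samples]
        return sum(scores) / len(scores)

    print(f"Mean groundedness score: {evaluate(dataset, groundedness_rater):.2f}")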

Analytics for Model Behavior Insights

The Analytics dashboard in Stax makes results easier to interpret. Developers can view performance trends, compare outputs across evaluators, and analyze how different models perform on the same dataset. The focus is on providing structured insights into model behavior rather than single-number scores.

Practical Use Cases

Prompt iteration – refining prompts to achieve more consistent results.

Model selection – comparing different LLMs before choosing one for production.

Domain-specific validation – testing outputs against industry or organizational requirements.

Ongoing monitoring – running evaluations as datasets and requirements evolve.

Summary

Stax provides a systematic way to evaluate generative models with criteria that reflect actual use cases. By combining quick comparisons, dataset-level evaluations, customizable evaluators, and clear analytics, it gives developers tools to move from ad-hoc testing toward structured evaluation.

For teams deploying LLMs in production environments, Stax offers a way to better understand how models behave under specific conditions and to track whether outputs meet the standards required for real applications.

Apple Released FastVLM: A Novel Hybrid Vision Encoder which is 85x Fas …

Table of contents: Introduction, Existing VLM Architectures, Apple’s FastVLM, Benchmark Comparisons, Conclusion

Introduction

Vision Language Models (VLMs) combine text inputs with visual understanding. However, image resolution is crucial to VLM performance when processing text- and chart-rich data, and increasing it creates significant challenges. First, pretrained vision encoders often struggle with high-resolution images because pretraining at those resolutions is inefficient, and running inference on high-resolution images increases computational cost and latency during visual token generation, whether through single high-resolution processing or multiple lower-resolution tile strategies. Second, high-resolution images produce more tokens, which increases the LLM prefilling time and therefore the time-to-first-token (TTFT), defined as the sum of the vision encoder latency and the LLM prefilling time.

Existing VLM Architectures

Large multimodal models such as Frozen and Florence used cross-attention to combine image and text embeddings within intermediate LLM layers. Auto-regressive architectures like LLaVA, mPLUG-Owl, MiniGPT-4, and Cambrian-1, which feed visual tokens directly into the LLM, have also proven effective. For efficient image encoding, CLIP-pretrained vision transformers remain widely adopted, with variants such as SigLIP, EVA-CLIP, InternViT, and DFNCLIP. Methods like LLaVA-PruMerge and Matryoshka-based token sampling attempt dynamic token pruning, while hierarchical backbones such as ConvNeXT and FastViT reduce token count through progressive downsampling. More recently, ConvLLaVA introduced a pure-convolutional vision encoder to encode images for a VLM.

Apple’s FastVLM

Researchers from Apple have proposed FastVLM, a model that achieves an optimized tradeoff between resolution, latency, and accuracy by analyzing how image quality, processing time, number of tokens, and LLM size affect each other. It uses FastViTHD, a hybrid vision encoder designed to output fewer tokens and reduce encoding time for high-resolution images. FastVLM achieves an optimal balance between visual token count and image resolution solely by scaling the input image. It shows a 3.2 times improvement in TTFT in the LLaVA-1.5 setup and achieves superior performance on key benchmarks using the same 0.5B LLM when compared to LLaVA-OneVision at maximum resolution, delivering 85 times faster TTFT while using a 3.4 times smaller vision encoder.

All FastVLM models are trained on a single node with 8 NVIDIA H100 80 GB GPUs; stage 1 VLM training is fast, taking around 30 minutes with a Qwen2-7B decoder. FastViTHD enhances the base FastViT architecture by introducing an additional stage with a downsampling layer, so that self-attention operates on tensors downsampled by a factor of 32 rather than 16, reducing image encoding latency while generating 4 times fewer tokens for the LLM decoder. The FastViTHD architecture contains five stages: the first three use RepMixer blocks for efficient processing, while the final two employ multi-headed self-attention blocks, balancing computational efficiency against high-resolution image understanding.
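As a quick sanity check of that token arithmetic (the 1,024-pixel input size here is just an illustrative assumption), moving from a 16x to a 32x downsampling factor halves each spatial dimension of the feature map and therefore reduces the visual token count by a factor of four:

    # Illustrative arithmetic only: assume a 1024 x 1024 input image.
    height, width = 1024, 1024

    tokens_at_16x = (height // 16) * (width // 16)  # 64 * 64 = 4096 tokens
    tokens_at_32x = (height // 32) * (width // 32)  # 32 * 32 = 1024 tokens

    print(tokens_at_16x // tokens_at_32x)  # 4 -> 4 times fewer tokens for the LLM decoder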

Benchmark Comparisons

When compared with ConvLLaVA using the same LLM and similar training data, FastVLM achieves 8.4% better performance on TextVQA and 12.5% improvement on DocVQA while operating 22% faster. The performance advantage increases at higher resolutions, where FastVLM maintains 2 times faster processing speeds than ConvLLaVA across various benchmarks. FastVLM matches or surpasses MM1 performance across diverse benchmarks by using intermediate pretraining with 15M samples for resolution scaling, while generating 5 times fewer visual tokens. Moreover, FastVLM not only outperforms Cambrian-1 but also runs 7.9 times faster. With scaled instruction tuning, it delivers better results while using 2.3 times fewer visual tokens.

Conclusion

In conclusion, researchers introduced FastVLM, an advancement in VLMs that uses the FastViTHD vision backbone for efficient high-resolution image encoding. The hybrid architecture, pretrained on reinforced image-text data, reduces visual token output with minimal accuracy loss compared to existing approaches. FastVLM achieves competitive performance across VLM benchmarks while delivering notable efficiency improvements in both TTFT and vision backbone parameter count. Rigorous benchmarking on M1 MacBook Pro hardware shows that FastVLM offers a state-of-the-art resolution-latency-accuracy trade-off superior to current methods.

Check out the Paper and Model on Hugging Face. All credit for this research goes to the researchers of this project.
