Building an Advanced Convolutional Neural Network with Attention for DNA Sequence Classification and Interpretability

In this tutorial, we take a hands-on approach to building an advanced convolutional neural network for DNA sequence classification. We focus on simulating real biological tasks, such as promoter prediction, splice site detection, and regulatory element identification. By combining one-hot encoding, multi-scale convolutional layers, and an attention mechanism, we design a model that not only learns complex motifs but also provides interpretability. As we progress, we generate synthetic data, train with robust callbacks, and visualize results to ensure we fully understand the strengths and limitations of our approach.

import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns
import random

np.random.seed(42)
tf.random.set_seed(42)
random.seed(42)

We begin by importing the libraries for deep learning, data handling, and visualization. We set random seeds to ensure reproducibility so that our experiments run consistently each time.

class DNASequenceClassifier:
    def __init__(self, sequence_length=200, num_classes=2):
        self.sequence_length = sequence_length
        self.num_classes = num_classes
        self.model = None
        self.history = None

    def one_hot_encode(self, sequences):
        # Map each nucleotide to one of four channels in a (length x 4) binary matrix
        mapping = {'A': 0, 'T': 1, 'G': 2, 'C': 3}
        encoded = np.zeros((len(sequences), self.sequence_length, 4))

        for i, seq in enumerate(sequences):
            for j, nucleotide in enumerate(seq[:self.sequence_length]):
                if nucleotide in mapping:
                    encoded[i, j, mapping[nucleotide]] = 1
        return encoded

    def attention_layer(self, inputs, name="attention"):
        # Score every position, normalize with softmax, and reweight the features
        attention_weights = layers.Dense(1, activation='tanh', name=f"{name}_weights")(inputs)
        attention_weights = layers.Flatten()(attention_weights)
        attention_weights = layers.Activation('softmax', name=f"{name}_softmax")(attention_weights)
        attention_weights = layers.RepeatVector(inputs.shape[-1])(attention_weights)
        attention_weights = layers.Permute([2, 1])(attention_weights)

        attended = layers.Multiply(name=f"{name}_multiply")([inputs, attention_weights])
        return layers.GlobalMaxPooling1D()(attended)

    def build_model(self):
        inputs = layers.Input(shape=(self.sequence_length, 4), name="dna_input")

        # Multi-scale convolutions capture motifs of different lengths
        conv_layers = []
        filter_sizes = [3, 7, 15, 25]

        for i, filter_size in enumerate(filter_sizes):
            conv = layers.Conv1D(
                filters=64,
                kernel_size=filter_size,
                activation='relu',
                padding='same',
                name=f"conv_{filter_size}"
            )(inputs)
            conv = layers.BatchNormalization(name=f"bn_conv_{filter_size}")(conv)
            conv = layers.Dropout(0.2, name=f"dropout_conv_{filter_size}")(conv)

            attended = self.attention_layer(conv, name=f"attention_{filter_size}")
            conv_layers.append(attended)

        if len(conv_layers) > 1:
            merged = layers.Concatenate(name="concat_multiscale")(conv_layers)
        else:
            merged = conv_layers[0]

        dense = layers.Dense(256, activation='relu', name="dense_1")(merged)
        dense = layers.BatchNormalization(name="bn_dense_1")(dense)
        dense = layers.Dropout(0.5, name="dropout_dense_1")(dense)

        dense = layers.Dense(128, activation='relu', name="dense_2")(dense)
        dense = layers.BatchNormalization(name="bn_dense_2")(dense)
        dense = layers.Dropout(0.3, name="dropout_dense_2")(dense)

        if self.num_classes == 2:
            outputs = layers.Dense(1, activation='sigmoid', name="output")(dense)
            loss = 'binary_crossentropy'
            metrics = ['accuracy', 'precision', 'recall']
        else:
            outputs = layers.Dense(self.num_classes, activation='softmax', name="output")(dense)
            loss = 'categorical_crossentropy'
            metrics = ['accuracy']

        self.model = keras.Model(inputs=inputs, outputs=outputs, name="DNA_CNN_Classifier")

        optimizer = keras.optimizers.Adam(
            learning_rate=0.001,
            beta_1=0.9,
            beta_2=0.999,
            epsilon=1e-7
        )

        self.model.compile(
            optimizer=optimizer,
            loss=loss,
            metrics=metrics
        )

        return self.model

    def generate_synthetic_data(self, n_samples=10000):
        sequences = []
        labels = []

        # Positive sequences carry known regulatory motifs; negatives are mostly random
        positive_motifs = ['TATAAA', 'CAAT', 'GGGCGG', 'TTGACA']
        negative_motifs = ['AAAAAAA', 'TTTTTTT', 'CCCCCCC', 'GGGGGGG']

        nucleotides = ['A', 'T', 'G', 'C']

        for i in range(n_samples):
            sequence = ''.join(random.choices(nucleotides, k=self.sequence_length))

            if i < n_samples // 2:
                motif = random.choice(positive_motifs)
                pos = random.randint(0, self.sequence_length - len(motif))
                sequence = sequence[:pos] + motif + sequence[pos + len(motif):]
                label = 1
            else:
                if random.random() < 0.3:
                    motif = random.choice(negative_motifs)
                    pos = random.randint(0, self.sequence_length - len(motif))
                    sequence = sequence[:pos] + motif + sequence[pos + len(motif):]
                label = 0

            sequences.append(sequence)
            labels.append(label)

        return sequences, np.array(labels)

    def train(self, X_train, y_train, X_val, y_val, epochs=50, batch_size=32):
        callbacks = [
            keras.callbacks.EarlyStopping(
                monitor='val_loss',
                patience=10,
                restore_best_weights=True
            ),
            keras.callbacks.ReduceLROnPlateau(
                monitor='val_loss',
                factor=0.5,
                patience=5,
                min_lr=1e-6
            )
        ]

        self.history = self.model.fit(
            X_train, y_train,
            validation_data=(X_val, y_val),
            epochs=epochs,
            batch_size=batch_size,
            callbacks=callbacks,
            verbose=1
        )

        return self.history

    def evaluate_and_visualize(self, X_test, y_test):
        y_pred_proba = self.model.predict(X_test)
        y_pred = (y_pred_proba > 0.5).astype(int).flatten()

        print("Classification Report:")
        print(classification_report(y_test, y_pred))

        fig, axes = plt.subplots(2, 2, figsize=(15, 10))

        axes[0, 0].plot(self.history.history['loss'], label='Training Loss')
        axes[0, 0].plot(self.history.history['val_loss'], label='Validation Loss')
        axes[0, 0].set_title('Training History - Loss')
        axes[0, 0].set_xlabel('Epoch')
        axes[0, 0].set_ylabel('Loss')
        axes[0, 0].legend()

        axes[0, 1].plot(self.history.history['accuracy'], label='Training Accuracy')
        axes[0, 1].plot(self.history.history['val_accuracy'], label='Validation Accuracy')
        axes[0, 1].set_title('Training History - Accuracy')
        axes[0, 1].set_xlabel('Epoch')
        axes[0, 1].set_ylabel('Accuracy')
        axes[0, 1].legend()

        cm = confusion_matrix(y_test, y_pred)
        sns.heatmap(cm, annot=True, fmt='d', ax=axes[1, 0], cmap='Blues')
        axes[1, 0].set_title('Confusion Matrix')
        axes[1, 0].set_ylabel('Actual')
        axes[1, 0].set_xlabel('Predicted')

        axes[1, 1].hist(y_pred_proba[y_test == 0], bins=50, alpha=0.7, label='Negative', density=True)
        axes[1, 1].hist(y_pred_proba[y_test == 1], bins=50, alpha=0.7, label='Positive', density=True)
        axes[1, 1].set_title('Prediction Score Distribution')
        axes[1, 1].set_xlabel('Prediction Score')
        axes[1, 1].set_ylabel('Density')
        axes[1, 1].legend()

        plt.tight_layout()
        plt.show()

        return y_pred, y_pred_proba

We define a DNASequenceClassifier that encodes sequences, learns multi-scale motifs with CNNs, and applies an attention mechanism for interpretability. We build and compile the model, generate synthetic motif-rich data, and then train with robust callbacks and visualize performance to evaluate classification quality.

def main():
    print("Advanced DNA Sequence Classification with CNN")
    print("=" * 50)

    classifier = DNASequenceClassifier(sequence_length=200, num_classes=2)

    print("Generating synthetic DNA sequences...")
    sequences, labels = classifier.generate_synthetic_data(n_samples=10000)

    print("Encoding DNA sequences...")
    X = classifier.one_hot_encode(sequences)

    # Stratified split into training, validation, and test sets
    X_train, X_test, y_train, y_test = train_test_split(
        X, labels, test_size=0.2, random_state=42, stratify=labels
    )
    X_train, X_val, y_train, y_val = train_test_split(
        X_train, y_train, test_size=0.2, random_state=42, stratify=y_train
    )

    print(f"Training set: {X_train.shape}")
    print(f"Validation set: {X_val.shape}")
    print(f"Test set: {X_test.shape}")

    print("Building CNN model...")
    model = classifier.build_model()
    model.summary()

    print("Training model...")
    classifier.train(X_train, y_train, X_val, y_val, epochs=30, batch_size=64)

    print("Evaluating model...")
    y_pred, y_pred_proba = classifier.evaluate_and_visualize(X_test, y_test)

    print("Training and evaluation complete!")

if __name__ == "__main__":
    main()

We wrap up the workflow in the main() function, where we generate synthetic DNA data, encode it, split it into training, validation, and test sets, then build, train, and evaluate our CNN model. We conclude by visualizing the performance and confirming that the classification pipeline runs successfully from start to finish.
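
Beyond the aggregate metrics, one way to sanity-check the interpretability claim is to probe the attention weights directly. The sketch below is an illustrative addition rather than part of the original tutorial: it builds an auxiliary Keras model that exposes the softmax output of one attention branch (the layer name attention_15_softmax follows from the naming scheme in the class above) and plots the weights over sequence positions, assuming classifier has been trained and X_test is one-hot encoded.

# Hedged sketch: inspect the attention softmax of the filter-size-15 branch
# for a single test sequence after training.
attention_probe = keras.Model(
    inputs=classifier.model.input,
    outputs=classifier.model.get_layer("attention_15_softmax").output,
)

weights = attention_probe.predict(X_test[:1])  # shape: (1, sequence_length)

plt.figure(figsize=(12, 3))
plt.plot(weights[0])
plt.title("Attention weights over sequence positions (filter size 15 branch)")
plt.xlabel("Position")
plt.ylabel("Attention weight")
plt.show()

Positions carrying planted motifs such as TATAAA would be expected to receive visibly higher weights if the attention mechanism is doing its job.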

In conclusion, we successfully demonstrate how a carefully designed CNN with attention can classify DNA sequences with high accuracy and interpretability. We see how synthetic biological motifs help validate the model’s capacity for pattern recognition, and how visualization techniques provide meaningful insights into training dynamics and predictions. Through this journey, we enhance our ability to integrate deep learning architectures with biological data, laying the groundwork for applying these methods to real-world genomics research.

The post Building an Advanced Convolutional Neural Network with Attention for DNA Sequence Classification and Interpretability appeared first on MarkTechPost.

OpenAI Introduces GPT-5-Codex: An Advanced Version of GPT-5 Further Optimized for Agentic Coding in Codex

OpenAI has just released GPT-5-Codex, a version of GPT-5 further optimized for “agentic coding” tasks within the Codex ecosystem. The goal: improve reliability, speed, and autonomous behavior so that Codex acts more like a teammate, not just a prompt-executor.

Codex is now available across the full developer workflow: CLI, IDE extensions, web, mobile, GitHub code reviews. It integrates well with cloud environments and developer tools.

https://openai.com/index/introducing-upgrades-to-codex/

Key Capabilities / Improvements

Agentic behavior: GPT-5-Codex can take on long, complex, multi-step tasks more autonomously. It balances “interactive” sessions (short feedback loops) with “independent execution” (long refactors, tests, etc.).

Steerability & style compliance: Less need for developers to micro-specify style and hygiene. The model better understands high-level instructions (“do this”, “follow cleanliness guidelines”) without being told every detail each time.

Code review improvements

Trained to catch critical bugs, not just surface or stylistic issues.

It examines the full context: codebase, dependencies, tests.

Can run code & tests to validate behavior.

Evaluated on pull requests and commits from popular open-source repositories. Feedback from practicing engineers confirms fewer “incorrect/unimportant” comments.

Performance & efficiency

For small requests, the model is “snappier”.

For big tasks, it “thinks more”—spends more compute/time reasoning, editing, iterating.

On internal testing: bottom-10% of user turns (by tokens) use ~93.7% fewer tokens than vanilla GPT-5. Top-10% use roughly twice as much reasoning/iteration.

Tooling & integration improvements

Codex CLI: better tracking of progress (to-do lists), ability to embed/share images (wireframes, screenshots), upgraded terminal UI, improved permission modes.

IDE Extension: works in VSCode, Cursor (and forks); maintains context of open files / selection; allows switching between cloud/local work seamlessly; preview local code changes directly.

Cloud environment enhancements:

Cached containers → median completion time for new tasks / follow-ups ↓ ~90%.

Automatic setup of environments (scanning for setup scripts, installing dependencies).

Configurable network access and ability to run pip installs etc. at runtime.

Visual & front-end context: The model now accepts image or screenshot inputs (e.g., UI designs or bugs) and can show visual output, such as screenshots of its work. Better human-preference performance in mobile web and front-end tasks.

Safety, trust, and deployment controls

Default sandboxed execution (network access disabled unless explicitly permitted).

Approval modes in tools: read-only vs auto access vs full access.

Support for reviewing agent work, terminal logs, test results.

Marked as “High capability” in Biological / Chemical domains; extra safeguards.

Use Cases & Scenarios

Large scale refactoring: changing architecture, propagating context (e.g. threading a variable through many modules) in multiple languages (Python, Go, OCaml) as demonstrated.

Feature additions with tests: generate new functionality and tests, fixing broken tests, handling test failures.

Continuous code reviews: PR review suggestions, catching regressions or security flaws earlier.

Front-end / UI design workflows: prototype or debug UI from specs/screenshots.

Hybrid workflows human + agent: human gives high-level instruction; Codex manages sub-tasks, dependencies, iteration.

https://openai.com/index/introducing-upgrades-to-codex/

Implications

For engineering teams: can shift more burden to Codex for repetitive / structurally heavy work (refactoring, test scaffolding), freeing human time for architectural decisions, design, etc.

For codebases: maintaining consistency in style, dependencies, test coverage could be easier since Codex consistently applies patterns.

For hiring and workflow: teams may need to adjust roles; reviewer focus may shift from “spotting minor errors” to oversight of agent suggestions.

Tool ecosystem: tighter IDE integrations mean workflows become more seamless; code reviews via bots may become more common & expected.

Risk management: organizations will need policy & audit controls for agentic code tasks, esp. for production-critical or high-security code.

Comparison: GPT-5 vs GPT-5-Codex

Autonomy on long tasks: GPT-5 (base) is less autonomous and more interactive/prompt-heavy; GPT-5-Codex sustains longer independent execution and iterative work.

Use in agentic coding environments: possible but not optimized with GPT-5 (base); GPT-5-Codex is purpose-built and tuned for Codex workflows.

Steerability & instruction compliance: GPT-5 (base) requires more detailed directions; GPT-5-Codex adheres better to high-level style and code-quality instructions.

Efficiency (token usage, latency): GPT-5 (base) uses more tokens and passes and is slower on big tasks; GPT-5-Codex is more efficient on small tasks and spends extra reasoning only when needed.

Conclusion

GPT-5-Codex represents a meaningful step forward in AI-assisted software engineering. By optimizing for long tasks, autonomous work, and integrating deeply into developer workflows (CLI, IDE, cloud, code review), it offers tangible improvements in speed, quality, and efficiency. But it does not eliminate the need for expert oversight; safe usage requires policies, review loops, and understanding of the system’s limitations.

The post OpenAI Introduces GPT-5-Codex: An Advanced Version of GPT-5 Further Optimized for Agentic Coding in Codex appeared first on MarkTechPost.

NVIDIA AI Open-Sources ViPE (Video Pose Engine): A Powerful and Versatile 3D Video Annotation Tool for Spatial AI

How do you create 3D datasets to train AI for robotics without expensive traditional approaches? A team of researchers from NVIDIA released “ViPE: Video Pose Engine for 3D Geometric Perception,” bringing a key improvement to Spatial AI. It addresses the central bottleneck that has constrained the field of 3D computer vision for years.

ViPE is a robust, versatile engine designed to process raw, unconstrained, “in-the-wild” video footage and automatically output the critical elements of 3D reality:

Camera Intrinsics (sensor calibration parameters)

Precise Camera Motion (pose)

Dense, Metric Depth Maps (real-world distances for every pixel)

To appreciate the magnitude of this breakthrough, we must first understand the difficulty of the problem it solves.

The challenge: Unlocking 3D Reality from 2D Video 

The ultimate goal of Spatial AI is to enable machines such as robots, autonomous vehicles, and AR glasses to perceive and interact with the world in 3D. We live in a 3D world, but the vast majority of our recorded data, from smartphone clips to cinematic footage, is trapped in 2D.

The Core Problem: How do we reliably and scalably reverse-engineer the 3D reality hidden inside these flat video streams?

Achieving this accurately from everyday video, which features shaky movements, dynamic objects, and unknown camera types, is notoriously difficult, yet it is the essential first step for virtually any advanced spatial application.

Problems with Existing Approaches

For decades, the field has been forced to choose between two powerful yet flawed paradigms.

1. The Precision Trap (Classical SLAM/SfM) 

Traditional methods like Simultaneous Localization and Mapping (SLAM) and Structure-from-Motion (SfM) rely on sophisticated geometric optimization. They are capable of pinpoint accuracy under ideal conditions.

The Fatal Flaw: Brittleness. These systems generally assume the world is static. Introduce a moving car, a textureless wall, or use an unknown camera, and the entire reconstruction can shatter. They are too delicate for the messy reality of everyday video.

2. The Scalability Wall (End-to-End Deep Learning) 

Recently, powerful deep learning models have emerged. By training on vast datasets, they learn robust “priors” about the world and are impressively resilient to noise and dynamism.

The Fatal Flaw: Intractability. These models are computationally hungry. Their memory requirements explode as video length increases, making the processing of long videos practically impossible. They simply do not scale.

This deadlock created a dilemma. The future of advanced AI demands massive datasets annotated with perfect 3D geometry, but the tools required to generate that data were either too brittle or too slow to deploy at scale.

Meet ViPE: NVIDIA’s Hybrid Breakthrough Shatters the Mold 

This is where ViPE changes the game. It is not merely an incremental improvement; it is a well-designed and well-integrated hybrid pipeline that successfully fuses the best of both worlds. It takes the efficient, mathematically rigorous optimization framework of classical SLAM and injects it with the powerful, learned intuition of modern deep neural networks.

This synergy allows ViPE to be accurate, robust, efficient, and versatile simultaneously. ViPE delivers a solution that scales without compromising on precision.

How it Works: Inside the ViPE Engine 

ViPE's architecture uses a keyframe-based Bundle Adjustment (BA) framework for efficiency.

Here are the Key Innovations:

Key Innovation 1: A Synergy of Powerful Constraints

ViPE achieves unprecedented accuracy by masterfully balancing three critical inputs:

Dense Flow (Learned Robustness): Uses a learned optical flow network for robust correspondences between frames, even in tough conditions.

Sparse Tracks (Classical Precision): Incorporates high-resolution, traditional feature tracking to capture fine-grained details, drastically improving localization accuracy.

Metric Depth Regularization (Real-World Scale): ViPE integrates priors from state-of-the-art monocular depth models to produce results in true, real-world metric scale.

Key Innovation 2: Mastering Dynamic, Real-World Scenes 

To handle the chaos of real-world video, ViPE employs advanced foundational segmentation tools, GroundingDINO and Segment Anything (SAM), to identify and mask out moving objects (e.g., people, cars). By intelligently ignoring these dynamic regions, ViPE ensures the camera motion is calculated based only on the static environment.

Key Innovation 3: Fast Speed & General Versatility 

ViPE operates at a remarkable 3-5 FPS on a single GPU, making it significantly faster than comparable methods. Furthermore, ViPE is universally applicable, supporting diverse camera models including standard, wide-angle/fisheye, and even 360° panoramic videos, automatically optimizing the intrinsics for each.

Key Innovation 4: High-Fidelity Depth Maps

The final output is enhanced by a sophisticated post-processing step. ViPE smoothly aligns high-detail depth maps with the geometrically consistent maps from its core process. The result is stunning: depth maps that are both high-fidelity and temporally stable.

The results are striking even in complex scenes, as the benchmark numbers below show.

Proven Performance

ViPE demonstrates superior performance, outperforming existing uncalibrated pose estimation baselines by a staggering:

18% on the TUM dataset (indoor dynamics)

50% on the KITTI dataset (outdoor driving)

Crucially, the evaluations confirm that ViPE provides accurate metric scale, while other approaches/engines often produce inconsistent, unusable scales.

The Real Innovation: A Data Explosion for Spatial AI

The most significant contribution of this work is not just the engine itself, but its deployment as a large-scale data annotation factory to fuel the future of AI. The lack of massive, diverse, geometrically annotated video data has been the primary bottleneck for training robust 3D models. ViPE solves this problem. How?

The research team used ViPE to create and release an unprecedented dataset totaling approximately 96 million annotated frames:

Dynpose-100K++: Nearly 100,000 real-world internet videos (15.7M frames) with high-quality poses and dense geometry.

Wild-SDG-1M: A massive collection of 1 million high-quality, AI-generated videos (78M frames).

Web360: A specialized dataset of annotated panoramic videos.

This massive release provides the necessary fuel for the next generation of 3D geometric foundation models and is already proving instrumental in training advanced world generation models like NVIDIA’s Gen3C and Cosmos.

By resolving the fundamental conflicts between accuracy, robustness, and scalability, ViPE provides the practical, efficient, and universal tool needed to unlock the 3D structure of almost any video. Its release is poised to dramatically accelerate innovation across the entire landscape of Spatial AI, robotics, and AR/VR.

NVIDIA AI has released the code; see the sources and links below.

Sources /links

https://research.nvidia.com/labs/toronto-ai/vipe/

https://github.com/nv-tlabs/vipe

Datasets:

https://huggingface.co/datasets/nvidia/vipe-dynpose-100kpp

https://huggingface.co/datasets/nvidia/vipe-wild-sdg-1m

https://huggingface.co/datasets/nvidia/vipe-web360

https://www.nvidia.com/en-us/ai/cosmos/

Thanks to the NVIDIA team for the thought leadership and resources behind this article. The NVIDIA team supported and sponsored this content.
The post NVIDIA AI Open-Sources ViPE (Video Pose Engine): A Powerful and Versatile 3D Video Annotation Tool for Spatial AI appeared first on MarkTechPost.

Schedule topology-aware workloads using Amazon SageMaker HyperPod task governance

Today, we are excited to announce a new capability of Amazon SageMaker HyperPod task governance to help you optimize training efficiency and network latency of your AI workloads. SageMaker HyperPod task governance streamlines resource allocation and facilitates efficient compute resource utilization across teams and projects on Amazon Elastic Kubernetes Service (Amazon EKS) clusters. Administrators can govern accelerated compute allocation and enforce task priority policies, improving resource utilization. This helps organizations focus on accelerating generative AI innovation and reducing time to market, rather than coordinating resource allocation and replanning tasks. Refer to Best practices for Amazon SageMaker HyperPod task governance for more information.
Generative AI workloads typically demand extensive network communication across Amazon Elastic Compute Cloud (Amazon EC2) instances, where network bandwidth impacts both workload runtime and processing latency. The network latency of these communications depends on the physical placement of instances within a data center’s hierarchical infrastructure. Data centers can be organized into nested organizational units such as network nodes and node sets, with multiple instances per network node and multiple network nodes per node set. For example, instances within the same organizational unit experience faster processing time compared to those across different units. This means fewer network hops between instances result in lower communication latency.
To optimize the placement of your generative AI workloads in your SageMaker HyperPod clusters by considering the physical and logical arrangement of resources, you can use EC2 network topology information during your job submissions. An EC2 instance’s topology is described by a set of nodes, with one node in each layer of the network. Refer to How Amazon EC2 instance topology works for details on how EC2 topology is arranged. Network topology labels offer the following key benefits:

Reduced latency by minimizing network hops and routing traffic to nearby instances
Improved training efficiency by optimizing workload placement across network resources

With topology-aware scheduling for SageMaker HyperPod task governance, you can use topology network labels to schedule your jobs with optimized network communication, thereby improving task efficiency and resource utilization for your AI workloads.
In this post, we introduce topology-aware scheduling with SageMaker HyperPod task governance by submitting jobs that represent hierarchical network information. We provide details about how to use SageMaker HyperPod task governance to optimize your job efficiency.
Solution overview
Data scientists interact with SageMaker HyperPod clusters. Data scientists are responsible for the training, fine-tuning, and deployment of models on accelerated compute instances. It’s important to make sure data scientists have the necessary capacity and permissions when interacting with clusters of GPUs.
To implement topology-aware scheduling, you first confirm the topology information for all nodes in your cluster, then run a script that tells you which instances are on the same network nodes, and finally schedule a topology-aware training task on your cluster. This workflow facilitates higher visibility and control over the placement of your training instances.
In this post, we walk through viewing node topology information and submitting topology-aware tasks to your cluster. For reference, NetworkNodes describes the network node set of an instance. In each network node set, three layers comprise the hierarchical view of the topology for each instance. Instances that are closest to each other will share the same layer 3 network node. If there are no common network nodes in the bottom layer (layer 3), then see if there is commonality at layer 2.
Prerequisites
To get started with topology-aware scheduling, you must have the following prerequisites:

An EKS cluster
A SageMaker HyperPod cluster with instances enabled for topology information
The SageMaker HyperPod task governance add-on installed (version 1.2.2 or later)
Kubectl installed
(Optional) The SageMaker HyperPod CLI installed

Get node topology information
Run the following command to show node labels in your cluster. This command provides network topology information for each instance.

kubectl get nodes -L topology.k8s.aws/network-node-layer-1
kubectl get nodes -L topology.k8s.aws/network-node-layer-2
kubectl get nodes -L topology.k8s.aws/network-node-layer-3

Instances with the same network node layer 3 are as close as possible, following the EC2 topology hierarchy. You should see a list of node labels that look like the following:

topology.k8s.aws/network-node-layer-3: nn-33333example

Run the following script to show the nodes in your cluster that are on the same layer 1, 2, and 3 network nodes:

git clone https://github.com/aws-samples/awsome-distributed-training.git
cd awsome-distributed-training/1.architectures/7.sagemaker-hyperpod-eks/task-governance
chmod +x visualize_topology.sh
bash visualize_topology.sh

The output of this script will print a flow chart that you can use in a flow diagram editor such as Mermaid.js.org to visualize the node topology of your cluster. The following figure is an example of the cluster topology for a seven-instance cluster.

Submit tasks
SageMaker HyperPod task governance offers two ways to submit tasks using topology awareness. In this section, we discuss these two options and a third alternative option to task governance.
Modify your Kubernetes manifest file
First, you can modify your existing Kubernetes manifest file to include one of two annotation options:

kueue.x-k8s.io/podset-required-topology – Use this option if you must have all pods scheduled on nodes on the same network node layer in order to begin the job
kueue.x-k8s.io/podset-preferred-topology – Use this option if you ideally want all pods scheduled on nodes in the same network node layer, but you have flexibility

The following code is an example of a sample job that uses the kueue.x-k8s.io/podset-required-topology setting to schedule pods that share the same layer 3 network node:

apiVersion: batch/v1
kind: Job
metadata:
  name: test-tas-job
  namespace: hyperpod-ns-team-a
  labels:
    kueue.x-k8s.io/queue-name: hyperpod-ns-team-a-localqueue
    kueue.x-k8s.io/priority-class: inference-priority
spec:
  parallelism: 10
  completions: 10
  suspend: true
  template:
    metadata:
      labels:
        kueue.x-k8s.io/queue-name: hyperpod-ns-team-a-localqueue
      annotations:
        kueue.x-k8s.io/podset-required-topology: "topology.k8s.aws/network-node-layer-3"
    spec:
      containers:
        - name: dummy-job
          image: public.ecr.aws/docker/library/alpine:latest
          command: ["sleep", "3600s"]
          resources:
            requests:
              cpu: "1"
      restartPolicy: Never

To verify which nodes your pods are running on, use the following command to view node IDs per pod:

kubectl get pods -n hyperpod-ns-team-a -o wide
Use the SageMaker HyperPod CLI
The second way to submit a job is through the SageMaker HyperPod CLI. Be sure to install the latest version (version pending) to use topology-aware scheduling. To use topology-aware scheduling with the SageMaker HyperPod CLI, you can include either the --preferred-topology parameter or the --required-topology parameter in your create job command.
The following code is an example command to start a topology-aware mnist training job using the SageMaker HyperPod CLI; replace XXXXXXXXXXXX with your AWS account ID:

hyp create hyp-pytorch-job \
  --job-name test-pytorch-job-cli \
  --image XXXXXXXXXXXX.dkr.ecr.us-west-2.amazonaws.com/ptjob:mnist \
  --pull-policy "Always" \
  --tasks-per-node 1 \
  --max-retry 1 \
  --preferred-topology topology.k8s.aws/network-node-layer-3

Clean up
If you deployed new resources while following this post, refer to the Clean Up section in the SageMaker HyperPod EKS workshop to make sure you don’t accrue unwanted charges.
Conclusion
During large language model (LLM) training, pod-to-pod communication distributes the model across multiple instances, requiring frequent data exchange between these instances. In this post, we discussed how SageMaker HyperPod task governance helps schedule workloads to improve job efficiency by optimizing throughput and latency. We also walked through how to schedule jobs using SageMaker HyperPod topology network information to optimize network communication latency for your AI tasks.
We encourage you to try out this solution and share your feedback in the comments section.

About the authors
Nisha Nadkarni is a Senior GenAI Specialist Solutions Architect at AWS, where she guides companies through best practices when deploying large scale distributed training and inference on AWS. Prior to her current role, she spent several years at AWS focused on helping emerging GenAI startups develop models from ideation to production.
Siamak Nariman is a Senior Product Manager at AWS. He is focused on AI/ML technology, ML model management, and ML governance to improve overall organizational efficiency and productivity. He has extensive experience automating processes and deploying various technologies.
Zican Li is a Senior Software Engineer at Amazon Web Services (AWS), where he leads software development for Task Governance on SageMaker HyperPod. In his role, he focuses on empowering customers with advanced AI capabilities while fostering an environment that maximizes engineering team efficiency and productivity.
Anoop Saha is a Sr GTM Specialist at Amazon Web Services (AWS) focusing on generative AI model training and inference. He partners with top frontier model builders, strategic customers, and AWS service teams to enable distributed training and inference at scale on AWS and lead joint GTM motions. Before AWS, Anoop held several leadership roles at startups and large corporations, primarily focusing on silicon and system architecture of AI infrastructure.

How msg enhanced HR workforce transformation with Amazon Bedrock and msg.ProfileMap

This post is co-written with Stefan Walter from msg.
With more than 10,000 experts in 34 countries, msg is both an independent software vendor and a system integrator operating in highly regulated industries, with over 40 years of domain-specific expertise. msg.ProfileMap is a software as a service (SaaS) solution for skill and competency management. It’s an AWS Partner qualified software available on AWS Marketplace, currently serving more than 7,500 users. HR and strategy departments use msg.ProfileMap for project staffing and workforce transformation initiatives. By offering a centralized view of skills and competencies, msg.ProfileMap helps organizations map their workforce’s capabilities, identify skill gaps, and implement targeted development strategies. This supports more effective project execution, better alignment of talent to roles, and long-term workforce planning.
In this post, we share how msg automated data harmonization for msg.ProfileMap, using Amazon Bedrock to power its large language model (LLM)-driven data enrichment workflows, resulting in higher accuracy in HR concept matching, reduced manual workload, and improved alignment with compliance requirements under the EU AI Act and GDPR.
The importance of AI-based data harmonization
HR departments face increasing pressure to operate as data-driven organizations, but are often constrained by the inconsistent, fragmented nature of their data. Critical HR documents are unstructured, and legacy systems use mismatched formats and data models. This not only impairs data quality but also leads to inefficiencies and decision-making blind spots. Accurate and harmonized HR data is foundational for key activities such as matching candidates to roles, identifying internal mobility opportunities, conducting skills gap analysis, and planning workforce development. msg identified that without automated, scalable methods to process and unify this data, organizations would continue to struggle with manual overhead and inconsistent results.
Solution overview
HR data is typically scattered across diverse sources and formats, ranging from relational databases to Excel files, Word documents, and PDFs. Additionally, entities such as personnel numbers or competencies have different unique identifiers as well as different text descriptions, although with the same semantics. msg addressed this challenge with a modular architecture, tailored for IT workforce scenarios. As illustrated in the following diagram, at the core of msg.ProfileMap is a robust text extraction layer, which transforms heterogeneous inputs into structured data. This is then passed to an AI-powered harmonization engine that provides consistency across data sources by avoiding duplication and aligning disparate concepts.

The harmonization process uses a hybrid retrieval approach that combines vector-based semantic similarity and string-based matching techniques. These methods align incoming data with existing entities in the system. Amazon Bedrock is used to semantically enrich data, improving cross-source compatibility and matching precision. Extracted and enriched data is indexed and stored using Amazon OpenSearch Service and Amazon DynamoDB, facilitating fast and accurate retrieval, as shown in the following diagram.

The framework is designed to be unsupervised and domain independent. Although it’s optimized for IT workforce use cases, it has demonstrated strong generalization capabilities in other domains as well.
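
As a rough illustration of the hybrid retrieval idea described above (not msg's actual implementation), the following sketch scores candidate concept matches by blending an embedding-based cosine similarity with a simple string-similarity ratio. The embedding function, the weights, and the example concept names are all assumptions introduced for illustration; in a real system the embeddings could come from an Amazon Bedrock embeddings model.

from difflib import SequenceMatcher
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def hybrid_score(text_a, text_b, embed, w_semantic=0.7, w_string=0.3):
    # Weighted blend of semantic similarity (embeddings) and surface similarity (strings)
    semantic = cosine(embed(text_a), embed(text_b))
    surface = SequenceMatcher(None, text_a.lower(), text_b.lower()).ratio()
    return w_semantic * semantic + w_string * surface

# Hypothetical usage, with `embed` provided elsewhere:
# score = hybrid_score("Kubernetes administration", "K8s cluster operations", embed)
# if score > 0.8: propose merging the two concepts for human review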
msg.ProfileMap is a cloud-based application that uses several AWS services, notably Amazon Neptune, Amazon DynamoDB, and Amazon Bedrock. The following diagram illustrates the full solution architecture.

Results and technical validation
msg evaluated the effectiveness of the data harmonization framework through internal testing on IT workforce concepts and external benchmarking in the Bio-ML Track of the Ontology Alignment Evaluation Initiative (OAEI), an international and EU-funded research initiative that evaluates ontology matching technologies since 2004.
During internal testing, the system processed 2,248 concepts across multiple suggestion types. High-probability merge recommendations reached 95.5% accuracy, covering nearly 60% of all inputs. This helped msg reduce manual validation workload by over 70%, significantly improving time-to-value for HR teams.
During OAEI 2024, msg.ProfileMap ranked at the top of the 2024 Bio-ML benchmark, outperforming other systems across multiple biomedical datasets. On NCIT-DOID, it achieved a 0.918 F1 score, with Hits@1 exceeding 92%, validating the engine’s generalizability beyond the HR domain. Additional details are available in the official test results.
Why Amazon Bedrock
msg relies on LLMs to semantically enrich data in near real time. These workloads require low-latency inference, flexible scaling, and operational simplicity. Amazon Bedrock met these needs by providing a fully managed, serverless interface to leading foundation models—without the need to manage infrastructure or deploy custom machine learning stacks.
Unlike hosting models on Amazon Elastic Compute Cloud (Amazon EC2) or Amazon SageMaker, Amazon Bedrock abstracts away provisioning, versioning, scaling, and model selection. Its consumption-based pricing aligns directly with msg’s SaaS delivery model—resources are used (and billed) only when needed. This simplified integration reduced overhead and helped msg scale elastically as customer demand grew.
Amazon Bedrock also helped msg meet compliance goals under the EU AI Act and GDPR by enabling tightly scoped, auditable interactions with model APIs—critical for HR use cases that handle sensitive workforce data.
Conclusion
msg's successful integration of Amazon Bedrock into msg.ProfileMap demonstrates that large-scale AI adoption doesn't require complex infrastructure or specialized model training. By combining modular design, ontology-based harmonization, and the fully managed LLM capabilities of Amazon Bedrock, msg delivered an AI-powered workforce intelligence platform that is accurate, scalable, and compliant. This solution improved concept match precision and achieved top marks in international AI benchmarks, demonstrating what's possible when generative AI is paired with the right cloud-based service. With Amazon Bedrock, msg has built a platform that's ready for today's HR challenges—and tomorrow's.
msg.ProfileMap is available as a SaaS offering on AWS Marketplace. If you are interested in knowing more, you can reach out to msg.hcm.backoffice@msg.group.
The content and opinions in this blog post are those of the third-party author and AWS is not responsible for the content or accuracy of this post.

About the authors
Stefan Walter is Senior Vice President of AI SaaS Solutions at msg. With over 25 years of experience in IT software development, architecture, and consulting, Stefan Walter leads with a vision for scalable SaaS innovation and operational excellence. As a BU lead at msg, Stefan has spearheaded transformative initiatives that bridge business strategy with technology execution, especially in complex, multi-entity environments.
Gianluca Vegetti is a Senior Enterprise Architect in the AWS Partner Organization, aligned to Strategic Partnership Collaboration and Governance (SPCG) engagements. In his role, he supports the definition and execution of Strategic Collaboration Agreements with selected AWS partners.
Yuriy Bezsonov is a Senior Partner Solution Architect at AWS. With over 25 years in the tech, Yuriy has progressed from a software developer to an engineering manager and Solutions Architect. Now, as a Senior Solutions Architect at AWS, he assists partners and customers in developing cloud solutions, focusing on container technologies, Kubernetes, Java, application modernization, SaaS, developer experience, and GenAI. Yuriy holds AWS and Kubernetes certifications, and he is a recipient of the AWS Golden Jacket and the CNCF Kubestronaut Blue Jacket.

Top 5 No-Code Tools for AI Engineers/Developers

In today’s AI-driven world, no-code tools are transforming how people create and deploy intelligent applications. They empower anyone—regardless of coding expertise—to build solutions quickly and efficiently. From developing enterprise-grade RAG systems to designing multi-agent workflows or fine-tuning hundreds of LLMs, these platforms dramatically reduce development time and effort. In this article, we’ll explore five powerful no-code tools that make building AI solutions faster and more accessible than ever.

Sim AI

Sim AI is an open-source platform for visually building and deploying AI agent workflows—no coding required. Using its drag-and-drop canvas, you can connect AI models, APIs, databases, and business tools to create:

AI Assistants & Chatbots: Agents that search the web, access calendars, send emails, and interact with business apps.

Business Process Automation: Streamline tasks such as data entry, report creation, customer support, and content generation.

Data Processing & Analysis: Extract insights, analyze datasets, create reports, and sync data across systems.

API Integration Workflows: Orchestrate complex logic, unify services, and manage event-driven automation.

Key features:

Visual canvas with “smart blocks” (AI, API, logic, output).

Multiple triggers (chat, REST API, webhooks, schedulers, Slack/GitHub events).

Real-time team collaboration with permissions control.

80+ built-in integrations (AI models, communication tools, productivity apps, dev platforms, search services, and databases).

MCP support for custom integrations.

Deployment options:

Cloud-hosted (managed infrastructure with scaling & monitoring).

Self-hosted (via Docker, with local model support for data privacy).

RAGFlow

RAGFlow is a powerful retrieval-augmented generation (RAG) engine that helps you build grounded, citation-rich AI assistants on top of your own datasets. It runs on x86 CPUs or NVIDIA GPUs (with optional ARM builds) and provides full or slim Docker images for quick deployment. After spinning up a local server, you can connect an LLM—via API or local runtimes like Ollama—to handle chat, embedding, or image-to-text tasks. RAGFlow supports most popular language models and allows you to set defaults or customize models for each assistant.

Key capabilities include:

Knowledge base management: Upload and parse files (PDF, Word, CSV, images, slides, and more) into datasets, select an embedding model, and organize content for efficient retrieval.

Chunk editing & optimization: Inspect parsed chunks, add keywords, or manually adjust content to improve search accuracy.

AI chat assistants: Create chats linked to one or multiple knowledge bases, configure fallback responses, and fine-tune prompts or model settings.

Explainability & testing: Use built-in tools to validate retrieval quality, monitor performance, and view real-time citations.

Integration & extensibility: Leverage HTTP and Python APIs for app integration, with an optional sandbox for safe code execution inside chats.

Transformer Lab

Transformer Lab is a free, open-source workspace for Large Language Models (LLMs) and Diffusion models, designed to run on your local machine—whether that’s a GPU, TPU, or Apple M-series Mac—or in the cloud. It enables you to download, chat with, and evaluate LLMs, generate images using Diffusion models, and compute embeddings, all from one flexible environment.

Key capabilities include:

Model management: Download and interact with LLMs, or generate images using state-of-the-art Diffusion models.

Data preparation & training: Create datasets, fine-tune, or train models, including support for RLHF and preference tuning.

Retrieval-augmented generation (RAG): Use your own documents to power intelligent, grounded conversations.

Embeddings & evaluation: Calculate embeddings and assess model performance across different inference engines.

Extensibility & community: Build plugins, contribute to the core application, and collaborate via the active Discord community.

Llama Factory

LLaMA-Factory is a powerful no-code platform for training and fine-tuning open-source Large Language Models (LLMs) and Vision-Language Models (VLMs). It supports over 100 models, multimodal fine-tuning, advanced optimization algorithms, and scalable resource configurations. Designed for researchers and practitioners, it offers extensive tools for pre-training, supervised fine-tuning, reward modeling, and reinforcement learning methods like PPO and DPO—along with easy experiment tracking and faster inference.

Key highlights include:

Broad model support: Works with LLaMA, Mistral, Qwen, DeepSeek, Gemma, ChatGLM, Phi, Yi, Mixtral-MoE, and many more.

Training methods: Supports continuous pre-training, multimodal SFT, reward modeling, PPO, DPO, KTO, ORPO, and more.

Scalable tuning options: Full-tuning, freeze-tuning, LoRA, QLoRA (2–8 bit), OFT, DoRA, and other resource-efficient techniques.

Advanced algorithms & optimizations: Includes GaLore, BAdam, APOLLO, Muon, FlashAttention-2, RoPE scaling, NEFTune, rsLoRA, and others.

Tasks & modalities: Handles dialogue, tool use, image/video/audio understanding, visual grounding, and more.

Monitoring & inference: Integrates with LlamaBoard, TensorBoard, Wandb, MLflow, and SwanLab, plus offers fast inference via OpenAI-style APIs, Gradio UI, or CLI with vLLM/SGLang workers.

Flexible infrastructure: Compatible with PyTorch, Hugging Face Transformers, Deepspeed, BitsAndBytes, and supports both CPU/GPU setups with memory-efficient quantization.

AutoAgent

AutoAgent is a fully automated, self-developing framework that lets you create and deploy LLM-powered agents using natural language alone. Designed to simplify complex workflows, it enables you to build, customize, and run intelligent tools and assistants without writing a single line of code.

Key features include:

High performance: Achieves top-tier results on the GAIA benchmark, rivaling advanced deep research agents.

Effortless agent & workflow creation: Build tools, agents, and workflows through simple natural language prompts—no coding required.

Agentic-RAG with native vector database: Comes with a self-managing vector database, offering superior retrieval compared to traditional solutions like LangChain.

Broad LLM compatibility: Integrates seamlessly with leading models such as OpenAI, Anthropic, DeepSeek, vLLM, Grok, Hugging Face, and more.

Flexible interaction modes: Supports both function-calling and ReAct-style reasoning for versatile use cases.

Lightweight & extensible: A dynamic personal AI assistant that’s easy to customize and extend while remaining resource-efficient.

The post Top 5 No-Code Tools for AI Engineers/Developers appeared first on MarkTechPost.

Software Frameworks Optimized for GPUs in AI: CUDA, ROCm, Triton, TensorRT—Compiler Paths and Performance Implications

Table of contents

What actually determines performance on modern GPUs
CUDA: nvcc/ptxas, cuDNN, CUTLASS, and CUDA Graphs
ROCm: HIP/Clang toolchain, rocBLAS/MIOpen, and the 6.x series
Triton: a DSL and compiler for custom kernels
TensorRT (and TensorRT-LLM): builder-time graph optimization for inference
Practical guidance: choosing and tuning the stack

Deep-learning throughput hinges on how effectively a compiler stack maps tensor programs to GPU execution: thread/block schedules, memory movement, and instruction selection (e.g., Tensor Core MMA pipelines). In this article we focus on four dominant stacks—CUDA, ROCm, Triton, and TensorRT—from the compiler’s perspective and explain which optimizations move the needle in practice.

What actually determines performance on modern GPUs

Across vendors, the same levers recur:

Operator scheduling & fusion: reduce kernel launches and round-trips to HBM; expose longer producer→consumer chains for register/shared-memory reuse. TensorRT and cuDNN “runtime fusion engines” exemplify this for attention and conv blocks.

Tiling & data layout: match tile shapes to Tensor Core/WGMMA/WMMA native fragment sizes; avoid shared-memory bank conflicts and partition camping. CUTLASS documents warp-level GEMM tiling for both Tensor Cores and CUDA cores.

Precision & quantization: FP16/BF16/FP8 for training/inference; INT8/INT4 (calibrated or QAT) for inference. TensorRT automates calibration and kernel selection under these precisions.

Graph capture & runtime specialization: graph execution to amortize launch overheads; dynamic fusion of common subgraphs (e.g., attention). cuDNN 9 added graph support for attention fusion engines.

Autotuning: search tile sizes, unroll factors, and pipelining depths per arch/SKU. Triton and CUTLASS expose explicit autotune hooks; TensorRT performs builder-time tactic selection.

With that lens, here’s how each stack implements the above.

CUDA: nvcc/ptxas, cuDNN, CUTLASS, and CUDA Graphs

Compiler path. CUDA code compiles through nvcc into PTX, then ptxas lowers PTX to SASS (arch-specific machine code). Controlling optimization requires feeding flags to both host and device phases; for kernels the key is -Xptxas. Developers often miss that -O3 alone affects only host code.

Kernel generation & libraries.

CUTLASS provides parametric templates for GEMM/conv, implementing warp-level tiling, Tensor Core MMA pipelines, and smem iterators designed for conflict-free access—canonical references for writing peak kernels, including Hopper’s WGMMA path.

cuDNN 9 introduced runtime fusion engines (notably for attention blocks), native CUDA Graph integration for those engines, and updates for new compute capabilities—materially reducing dispatch overheads and improving memory locality in Transformer workloads.

Performance implications.

Moving from unfused PyTorch ops to cuDNN attention fusion typically cuts kernel launches and global memory traffic; combined with CUDA Graphs, it reduces CPU bottlenecks in short-sequence inference.

On Hopper/Blackwell, aligning tile shapes to WGMMA/Tensor Core native sizes is decisive; CUTLASS tutorials quantify how mis-sized tiles waste tensor-core throughput.

When CUDA is the right tool. You need maximum control over instruction selection, occupancy, and smem choreography; or you’re extending kernels beyond library coverage while staying on NVIDIA GPUs.

ROCm: HIP/Clang toolchain, rocBLAS/MIOpen, and the 6.x series

Compiler path. ROCm uses Clang/LLVM to compile HIP (CUDA-like) into GCN/RDNA ISA. The 6.x series has focused on perf and framework coverage; release notes track component-level optimizations and HW/OS support.

Libraries and kernels.

rocBLAS and MIOpen implement GEMM/conv primitives with arch-aware tiling and algorithm selection similar in spirit to cuBLAS/cuDNN. The consolidated changelog highlights iterative perf work across these libraries.

Recent ROCm workstream includes better Triton enablement on AMD GPUs, enabling Python-level kernel authoring while still lowering through LLVM to AMD backends.

Performance implications.

On AMD GPUs, matching LDS (shared memory) bank widths and vectorized global loads to matrix tile shapes is as pivotal as smem bank alignment on NVIDIA. Compiler-assisted fusion in frameworks (e.g., attention) plus library autotuning in rocBLAS/MIOpen typically closes a large fraction of the gap to handwritten kernels, contingent on architecture/driver. Release documentation indicates continuous tuner improvements in 6.0–6.4.x.

When ROCm is the right tool. You need native support and optimization on AMD accelerators, with HIP portability from existing CUDA-style kernels and a clear LLVM toolchain.

Triton: a DSL and compiler for custom kernels

Compiler path. Triton is a Python-embedded DSL that lowers via LLVM; it handles vectorization, memory coalescing, and register allocation while giving explicit control over block sizes and program IDs. Build docs show the LLVM dependency and custom builds; NVIDIA’s developer materials discuss Triton’s tuning for newer architectures (e.g., Blackwell) with FP16/FP8 GEMM improvements.

Optimizations.

Autotuning over tile sizes, num_warps, and pipelining stages; static masking for boundary conditions without scalar fallbacks; shared-memory staging and software pipelining to overlap global loads with compute.

Triton’s design aims to automate the error-prone parts of CUDA-level optimization while leaving block-level tiling choices to the author; the original announcement outlines that separation of concerns.
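
To make that separation of concerns concrete, here is a minimal, hedged sketch of a Triton kernel: an autotuned vector-add in which the author only chooses candidate block sizes and the masking pattern, while the compiler handles coalescing and register allocation. The specific tuning configurations are placeholders, not recommendations.

import torch
import triton
import triton.language as tl

@triton.autotune(
    configs=[
        triton.Config({"BLOCK": 256}, num_warps=2),
        triton.Config({"BLOCK": 1024}, num_warps=4),
    ],
    key=["n_elements"],  # re-tune when the problem size changes
)
@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n_elements          # static masking instead of scalar fallbacks
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n_elements = x.numel()
    # Grid size depends on the BLOCK chosen by the autotuner for this shape
    grid = lambda meta: (triton.cdiv(n_elements, meta["BLOCK"]),)
    add_kernel[grid](x, y, out, n_elements)
    return out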

Performance implications.

Triton shines when you need a fused, shape-specialized kernel outside library coverage (e.g., bespoke attention variants, normalization-activation-matmul chains). On modern NVIDIA parts, vendor collabs report architecture-specific improvements in the Triton backend, reducing the penalty versus CUTLASS-style kernels for common GEMMs.

When Triton is the right tool. You want near-CUDA performance for custom fused ops without writing SASS/WMMA, and you value Python-first iteration with autotuning.

TensorRT (and TensorRT-LLM): builder-time graph optimization for inference

Compiler path. TensorRT ingests ONNX or framework graphs and emits a hardware-specific engine. During the build, it performs layer/tensor fusion, precision calibration (INT8, FP8/FP16), and kernel tactic selection; best-practice docs describe these builder phases. TensorRT-LLM extends this with LLM-specific runtime optimizations.
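
To make the builder flow tangible, the sketch below shows one way to build an FP16 engine from an ONNX file with the TensorRT Python API. File names are placeholders and the exact API surface varies across TensorRT versions, so treat this as an outline rather than a drop-in recipe.

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:          # placeholder ONNX path
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)        # allow reduced-precision tactics

# The builder performs fusion, precision selection, and per-arch tactic search here.
engine_bytes = builder.build_serialized_network(network, config)
with open("model.plan", "wb") as f:
    f.write(engine_bytes)

The resulting plan file is architecture-specific, which is exactly why it avoids generic kernels at runtime.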

Optimizations.

Graph-level: constant folding, concat-slice canonicalization, conv-bias-activation fusion, attention fusion.

Precision: post-training calibration (entropy/percentile/mse) and per-tensor quantization, plus smooth-quant/QAT workflows in TensorRT-LLM.

Runtime: paged-KV cache, in-flight batching, and scheduling for multi-stream/multi-GPU deployments (TensorRT-LLM docs).

Performance implications.

The largest wins typically come from: end-to-end INT8 (or FP8 on Hopper/Blackwell where supported), removing framework overhead via a single engine, and aggressive attention fusion. TensorRT’s builder produces per-arch engine plans to avoid generic kernels at runtime.

When TensorRT is the right tool. Production inference on NVIDIA GPUs where you can pre-compile an optimized engine and benefit from quantization and large-graph fusion.

Practical guidance: choosing and tuning the stack

Training vs. inference.

Training/experimental kernels → CUDA + CUTLASS (NVIDIA) or ROCm + rocBLAS/MIOpen (AMD); Triton for custom fused ops.

Production inference on NVIDIA → TensorRT/TensorRT-LLM for global graph-level gains.

Exploit architecture-native instructions.

On NVIDIA Hopper/Blackwell, ensure tiles map to WGMMA/WMMA sizes; CUTLASS materials show how warp-level GEMM and smem iterators should be structured.

On AMD, align LDS usage and vector widths to CU datapaths; leverage ROCm 6.x autotuners and Triton-on-ROCm for shape-specialized ops.

Fuse first, then quantize.

Kernel/graph fusion reduces memory traffic; quantization reduces bandwidth and increases math density. TensorRT’s builder-time fusions plus INT8/FP8 often deliver multiplicative gains.

Use graph execution for short sequences.

CUDA Graphs integrated with cuDNN attention fusions amortize launch overheads in autoregressive inference.
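
A hedged sketch of that pattern using PyTorch's CUDA Graphs API (one possible route; cuDNN's native graph integration is another): capture a short forward pass once, then replay it to skip per-kernel launch overhead. The model and shapes are placeholders, and static input/output buffers are required for replay.

import torch

model = torch.nn.Linear(1024, 1024).cuda().eval()   # placeholder model
static_input = torch.randn(8, 1024, device="cuda")

# Warm up on a side stream so capture sees steady-state kernels and allocations.
side = torch.cuda.Stream()
side.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(side), torch.no_grad():
    for _ in range(3):
        _ = model(static_input)
torch.cuda.current_stream().wait_stream(side)

graph = torch.cuda.CUDAGraph()
with torch.cuda.graph(graph), torch.no_grad():
    static_output = model(static_input)

# Replay: refill the captured input buffer in place, then relaunch the whole graph at once.
static_input.copy_(torch.randn(8, 1024, device="cuda"))
graph.replay()
result = static_output.clone()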

Treat compiler flags as first-class.

For CUDA, remember device-side flags: for example, -Xptxas -O3,-v (and -Xptxas -O0 when diagnosing). Host-only -O3 isn't sufficient.

References:

https://developer.nvidia.com/blog/introducing-cudnn-9/

https://rocmdocs.amd.com/en/latest/relnotes/relnotes.html

https://rocmdocs.amd.com/en/latest/develop/performance/tuning-guides/triton.html

https://github.com/NVIDIA/cutlass

https://docs.nvidia.com/deeplearning/cudnn/latest/index.html

https://docs.nvidia.com/deeplearning/tensorrt/archives/index.html

https://github.com/ROCm/ROCm/releases

https://triton-lang.org/main/getting-started/installation.html

https://github.com/NVIDIA/cutlass/tree/main/examples

https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html

Getting Started with CUDA Graphs

https://rocmdocs.amd.com/en/latest/release/changelog.html

https://triton-lang.org/main/getting-started/tutorials/index.html

https://github.com/NVIDIA/cutlass/blob/main/media/docs/warplevel-gemm.md

https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#compiler-options

https://nvidia.github.io/TensorRT-LLM/

https://developer.nvidia.com/blog/nvidia-triton-on-blackwell-gpus

The post Software Frameworks Optimized for GPUs in AI: CUDA, ROCm, Triton, TensorRT—Compiler Paths and Performance Implications appeared first on MarkTechPost.

UT Austin and ServiceNow Research Team Releases AU-Harness: An Open-So …

Voice AI is becoming one of the most important frontiers in multimodal AI. From intelligent assistants to interactive agents, the ability to understand and reason over audio is reshaping how machines engage with humans. Yet while models have grown rapidly in capability, the tools for evaluating them have not kept pace. Existing benchmarks remain fragmented, slow, and narrowly focused, often making it difficult to compare models or test them in realistic, multi-turn settings.

To address this gap, a team from UT Austin and ServiceNow Research has released AU-Harness, a new open-source toolkit built to evaluate Large Audio Language Models (LALMs) at scale. AU-Harness is designed to be fast, standardized, and extensible, enabling researchers to test models across a wide range of tasks—from speech recognition to complex audio reasoning—within a single unified framework.

Why do we need a new audio evaluation framework?

Current audio benchmarks have focused mainly on applications like speech-to-text or emotion recognition. Frameworks such as AudioBench, VoiceBench, and DynamicSUPERB-2.0 broadened coverage, but they still leave critical gaps.

Three issues stand out. The first is throughput bottlenecks: many toolkits don't take advantage of batching or parallelism, making large-scale evaluations painfully slow. The second is prompting inconsistency, which makes results hard to compare across models. The third is restricted task scope: key areas like diarization (who spoke when) and spoken reasoning (following instructions delivered in audio) are often missing.

These gaps limit the progress of LALMs, especially as they evolve into multimodal agents that must handle long, context-heavy, and multi-turn interactions.

https://arxiv.org/pdf/2509.08031

How does AU-Harness improve efficiency?

The research team designed AU-Harness with a focus on speed. By integrating with the vLLM inference engine, it introduces a token-based request scheduler that manages concurrent evaluations across multiple nodes. It also shards datasets so that workloads are distributed proportionally across compute resources.

This design allows near-linear scaling of evaluations and keeps hardware fully utilized. In practice, AU-Harness delivers 127% higher throughput and reduces the real-time factor (RTF) by nearly 60% compared to existing toolkits. For researchers, evaluations that once took days can now complete in hours.

Can evaluations be customized?

Flexibility is another core feature of AU-Harness. Each model in an evaluation run can have its own hyperparameters, such as temperature or max token settings, without breaking standardization. Configurations allow for dataset filtering (e.g., by accent, audio length, or noise profile), enabling targeted diagnostics.

Perhaps most importantly, AU-Harness supports multi-turn dialogue evaluation. Earlier toolkits were limited to single-turn tasks, but modern voice agents operate in extended conversations. With AU-Harness, researchers can benchmark dialogue continuity, contextual reasoning, and adaptability across multi-step exchanges.

What tasks does AU-Harness cover?

AU-Harness dramatically expands task coverage, supporting 50+ datasets, 380+ subsets, and 21 tasks across six categories:

Speech Recognition: from simple ASR to long-form and code-switching speech.

Paralinguistics: emotion, accent, gender, and speaker recognition.

Audio Understanding: scene and music comprehension.

Spoken Language Understanding: question answering, translation, and dialogue summarization.

Spoken Language Reasoning: speech-to-coding, function calling, and multi-step instruction following.

Safety & Security: robustness evaluation and spoofing detection.

Two innovations stand out:

LLM-Adaptive Diarization, which evaluates diarization through prompting rather than specialized neural models.

Spoken Language Reasoning, which tests models’ ability to process and reason about spoken instructions, rather than just transcribe them.

https://arxiv.org/pdf/2509.08031

What do the benchmarks reveal about today’s models?

When applied to leading systems like GPT-4o, Qwen2.5-Omni, and Voxtral-Mini-3B, AU-Harness highlights both strengths and weaknesses.

Models excel at ASR and question answering, showing strong accuracy in speech recognition and spoken QA tasks. But they lag in temporal reasoning tasks, such as diarization, and in complex instruction-following, particularly when instructions are given in audio form.

A key finding is the instruction modality gap: when identical tasks are presented as spoken instructions instead of text, performance drops by as much as 9.5 points. This suggests that while models are adept at processing text-based reasoning, adapting those skills to the audio modality remains an open challenge.

https://arxiv.org/pdf/2509.08031

Summary

AU-Harness marks an important step toward standardized and scalable evaluation of audio language models. By combining efficiency, reproducibility, and broad task coverage—including diarization and spoken reasoning—it addresses the long-standing gaps in benchmarking voice-enabled AI. Its open-source release and public leaderboard invite the community to collaborate, compare, and push the boundaries of what voice-first AI systems can achieve.

Check out the Paper, Project and GitHub Page. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter.
The post UT Austin and ServiceNow Research Team Releases AU-Harness: An Open-Source Toolkit for Holistic Evaluation of Audio LLMs appeared first on MarkTechPost.

How to Build a Robust Advanced Neural AI Agent with Stable Training, A …

In this tutorial, we explore the design and implementation of an Advanced Neural Agent that combines classical neural network techniques with modern stability improvements. We build the network using Xavier initialization for balanced gradient flow and add stable activations like leaky ReLU, sigmoid, and tanh with clipping to avoid overflow. To stabilize training, we apply gradient clipping, momentum-inspired updates, and weight decay. The training loop includes mini-batches, early stopping, adaptive learning rates, and resets on instability, making the model robust for complex datasets. We also normalize targets, compute MSE, MAE, and R², and extend the agent with experience replay and exploratory decision-making, turning it into a flexible system for regression, classification-to-regression, and RL-style tasks. Check out the FULL CODES here.

Copy CodeCopiedUse a different Browserimport numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification, make_regression
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings(‘ignore’)

We start by importing essential libraries like NumPy, Matplotlib, and scikit-learn, which we use for data generation, preprocessing, and splitting. We also suppress warnings to keep our workflow clean and focused. Check out the FULL CODES here.

Copy CodeCopiedUse a different Browserclass AdvancedNeuralAgent:
def __init__(self, input_size, hidden_layers=[64, 32], output_size=1, learning_rate=0.001):
“””Advanced AI Agent with stable training and decision making capabilities”””
self.lr = learning_rate
self.initial_lr = learning_rate
self.layers = []
self.memory = []
self.performance_history = []
self.epsilon = 1e-8

layer_sizes = [input_size] + hidden_layers + [output_size]
for i in range(len(layer_sizes) – 1):
fan_in, fan_out = layer_sizes[i], layer_sizes[i+1]
limit = np.sqrt(6.0 / (fan_in + fan_out))

layer = {
‘weights’: np.random.uniform(-limit, limit, (layer_sizes[i], layer_sizes[i+1])),
‘bias’: np.zeros((1, layer_sizes[i+1])),
‘momentum_w’: np.zeros((layer_sizes[i], layer_sizes[i+1])),
‘momentum_b’: np.zeros((1, layer_sizes[i+1]))
}
self.layers.append(layer)

def activation(self, x, func=’relu’):
“””Stable activation functions with clipping”””
x = np.clip(x, -50, 50)

if func == ‘relu’:
return np.maximum(0, x)
elif func == ‘sigmoid’:
return 1 / (1 + np.exp(-x))
elif func == ‘tanh’:
return np.tanh(x)
elif func == ‘leaky_relu’:
return np.where(x > 0, x, x * 0.01)
elif func == ‘linear’:
return x

def activation_derivative(self, x, func=’relu’):
“””Stable derivatives”””
x = np.clip(x, -50, 50)

if func == ‘relu’:
return (x > 0).astype(float)
elif func == ‘sigmoid’:
s = self.activation(x, ‘sigmoid’)
return s * (1 – s)
elif func == ‘tanh’:
return 1 – np.tanh(x)**2
elif func == ‘leaky_relu’:
return np.where(x > 0, 1, 0.01)
elif func == ‘linear’:
return np.ones_like(x)

def forward(self, X):
“””Forward pass with gradient clipping”””
self.activations = [X]
self.z_values = []

current_input = X
for i, layer in enumerate(self.layers):
z = np.dot(current_input, layer[‘weights’]) + layer[‘bias’]
z = np.clip(z, -50, 50)
self.z_values.append(z)

if i < len(self.layers) – 1:
a = self.activation(z, ‘leaky_relu’)
else:
a = self.activation(z, ‘linear’)

self.activations.append(a)
current_input = a

return current_input

def clip_gradients(self, gradients, max_norm=1.0):
“””Gradient clipping to prevent explosion”””
grad_norm = np.linalg.norm(gradients)
if grad_norm > max_norm:
gradients = gradients * (max_norm / (grad_norm + self.epsilon))
return gradients

def backward(self, X, y, output):
“””Stable backpropagation with gradient clipping”””
m = X.shape[0]

dz = (output – y.reshape(-1, 1)) / m
dz = np.clip(dz, -10, 10)

for i in reversed(range(len(self.layers))):
layer = self.layers[i]

dw = np.dot(self.activations[i].T, dz)
db = np.sum(dz, axis=0, keepdims=True)

dw = self.clip_gradients(dw, max_norm=1.0)
db = self.clip_gradients(db, max_norm=1.0)

momentum = 0.9
layer[‘momentum_w’] = momentum * layer[‘momentum_w’] + (1 – momentum) * dw
layer[‘momentum_b’] = momentum * layer[‘momentum_b’] + (1 – momentum) * db

weight_decay = 0.0001
layer[‘weights’] -= self.lr * (layer[‘momentum_w’] + weight_decay * layer[‘weights’])
layer[‘bias’] -= self.lr * layer[‘momentum_b’]

if i > 0:
activation_func = 'leaky_relu'
dz = np.dot(dz, layer[‘weights’].T) * self.activation_derivative(
self.z_values[i-1], activation_func)
dz = np.clip(dz, -10, 10)

def adapt_learning_rate(self, epoch, performance_history):
“””Adaptive learning rate with performance-based adjustment”””
if epoch > 10:
recent_performance = performance_history[-10:]
if len(recent_performance) >= 5:
if recent_performance[-1] >= recent_performance[-5]:
self.lr = max(self.lr * 0.95, self.initial_lr * 0.01)
elif recent_performance[-1] < recent_performance[-5] * 0.98:
self.lr = min(self.lr * 1.02, self.initial_lr * 2)

def calculate_loss(self, y_true, y_pred):
“””Stable loss calculation”””
y_true = y_true.reshape(-1, 1)
y_pred = np.clip(y_pred, -1e6, 1e6)

mse = np.mean((y_true – y_pred) ** 2)
mae = np.mean(np.abs(y_true – y_pred))

if not np.isfinite(mse):
mse = 1e6
if not np.isfinite(mae):
mae = 1e6

return mse, mae

def store_experience(self, state, action, reward, next_state):
“””Experience replay for RL aspects”””
experience = {
‘state’: state,
‘action’: action,
‘reward’: reward,
‘next_state’: next_state,
‘timestamp’: len(self.memory)
}
self.memory.append(experience)

if len(self.memory) > 1000:
self.memory.pop(0)

def make_decision(self, X, exploration_rate=0.1):
“””Stable decision making”””
prediction = self.forward(X)

if np.random.random() < exploration_rate:
noise_scale = np.std(prediction) * 0.1 if np.std(prediction) > 0 else 0.1
noise = np.random.normal(0, noise_scale, prediction.shape)
prediction += noise

return np.clip(prediction, -1e6, 1e6)

def reset_if_unstable(self):
“””Reset network if training becomes unstable”””
print(” Resetting network due to instability…”)
for i, layer in enumerate(self.layers):
fan_in, fan_out = layer[‘weights’].shape
limit = np.sqrt(6.0 / (fan_in + fan_out))
layer[‘weights’] = np.random.uniform(-limit, limit, (fan_in, fan_out))
layer[‘bias’] = np.zeros((1, fan_out))
layer[‘momentum_w’] = np.zeros((fan_in, fan_out))
layer[‘momentum_b’] = np.zeros((1, fan_out))
self.lr = self.initial_lr

def train(self, X, y, epochs=500, batch_size=32, validation_split=0.2, verbose=True):
“””Robust training with stability checks”””
y_mean, y_std = np.mean(y), np.std(y)
y_normalized = (y – y_mean) / (y_std + self.epsilon)

X_trn, X_val, y_trn, y_val = train_test_split(
X, y_normalized, test_size=validation_split, random_state=42)

best_val_loss = float(‘inf’)
patience = 30
patience_counter = 0

train_losses, val_losses = [], []
reset_count = 0

for epoch in range(epochs):
if epoch > 0 and (not np.isfinite(train_losses[-1]) or train_losses[-1] > 1e6):
if reset_count < 2:
self.reset_if_unstable()
reset_count += 1
continue
else:
print(” Training unstable, stopping…”)
break

indices = np.random.permutation(len(X_trn))
X_train_shuffled = X_trn[indices]
y_train_shuffled = y_trn[indices]

epoch_loss = 0
batches = 0
for i in range(0, len(X_trn), batch_size):
batch_X = X_train_shuffled[i:i+batch_size]
batch_y = y_train_shuffled[i:i+batch_size]

if len(batch_X) == 0:
continue

output = self.forward(batch_X)
self.backward(batch_X, batch_y, output)

loss, _ = self.calculate_loss(batch_y, output)
epoch_loss += loss
batches += 1

avg_train_loss = epoch_loss / max(batches, 1)

val_output = self.forward(X_val)
val_loss, val_mae = self.calculate_loss(y_val, val_output)

train_losses.append(avg_train_loss)
val_losses.append(val_loss)
self.performance_history.append(val_loss)

if val_loss < best_val_loss:
best_val_loss = val_loss
patience_counter = 0
else:
patience_counter += 1

if patience_counter >= patience:
if verbose:
print(f” Early stopping at epoch {epoch}”)
break

if epoch > 0:
self.adapt_learning_rate(epoch, self.performance_history)

if verbose and (epoch % 50 == 0 or epoch < 10):
print(f”Epoch {epoch:3d}: Train Loss = {avg_train_loss:.4f}, ”
f”Val Loss = {val_loss:.4f}, LR = {self.lr:.6f}”)

self.y_mean, self.y_std = y_mean, y_std
return train_losses, val_losses

def predict(self, X):
“””Make predictions with denormalization”””
normalized_pred = self.forward(X)
if hasattr(self, ‘y_mean’) and hasattr(self, ‘y_std’):
return normalized_pred * self.y_std + self.y_mean
return normalized_pred

def evaluate_performance(self, X, y):
“””Comprehensive performance evaluation”””
predictions = self.predict(X)
mse, mae = self.calculate_loss(y, predictions)

y_mean = np.mean(y)
ss_tot = np.sum((y – y_mean) ** 2)
ss_res = np.sum((y.reshape(-1, 1) – predictions) ** 2)
r2 = 1 – (ss_res / (ss_tot + self.epsilon))

return {
‘mse’: float(mse) if np.isfinite(mse) else float(‘inf’),
‘mae’: float(mae) if np.isfinite(mae) else float(‘inf’),
‘r2’: float(r2) if np.isfinite(r2) else -float(‘inf’),
‘predictions’: predictions.flatten()
}

def visualize_training(self, train_losses, val_losses):
“””Visualize training progress”””
plt.figure(figsize=(15, 5))

plt.subplot(1, 3, 1)
plt.plot(train_losses, label=’Training Loss’, alpha=0.8)
plt.plot(val_losses, label=’Validation Loss’, alpha=0.8)
plt.title(‘Training Progress’)
plt.xlabel(‘Epoch’)
plt.ylabel(‘Loss’)
plt.legend()
plt.grid(True, alpha=0.3)
plt.yscale(‘log’)

plt.subplot(1, 3, 2)
if len(self.performance_history) > 0:
plt.plot(self.performance_history)
plt.title(‘Performance History’)
plt.xlabel(‘Epoch’)
plt.ylabel(‘Validation Loss’)
plt.grid(True, alpha=0.3)
plt.yscale(‘log’)

plt.subplot(1, 3, 3)
if hasattr(self, ‘lr_history’):
plt.plot(self.lr_history)
plt.title(‘Learning Rate Schedule’)
plt.xlabel(‘Epoch’)
plt.ylabel(‘Learning Rate’)
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

We implement an AdvancedNeuralAgent that we initialize with Xavier limits, leaky-ReLU activations, and momentum buffers to stabilize gradients and speed convergence. We train with mini-batches, gradient clipping, L2 weight decay, adaptive learning rates, early stopping, and automatic resets, and we track MSE/MAE/R² with normalization for reliable metrics. We also add experience replay and exploratory decisions for agent-like behavior, and we expose plotting utilities to visualize losses, validation history, and the LR schedule. Check out the FULL CODES here.

Copy CodeCopiedUse a different Browserclass AIAgentDemo:
“””Demo class for testing the AI Agent with various scenarios”””

def __init__(self):
self.agents = {}
self.results = {}

def generate_datasets(self):
“””Generate multiple test datasets”””
datasets = {}

X1, y1 = make_regression(n_samples=600, n_features=5, n_informative=4,
noise=0.1, random_state=42)
datasets[‘simple’] = (X1, y1, “Simple Regression”)

X2, y2 = make_regression(n_samples=800, n_features=10, n_informative=8,
noise=0.2, random_state=123)
datasets[‘complex’] = (X2, y2, “Complex Regression”)

X3, y3 = make_classification(n_samples=700, n_features=8, n_informative=6,
n_classes=2, random_state=456)
y3 = y3.astype(float) + np.random.normal(0, 0.1, len(y3))
datasets[‘classification’] = (X3, y3, “Classification-to-Regression”)

return datasets

def test_agent_configuration(self, config_name, X, y, **agent_params):
“””Test agent with specific configuration”””
print(f”n Testing {config_name}…”)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

default_params = {
‘input_size’: X_scaled.shape[1],
‘hidden_layers’: [32, 16],
‘output_size’: 1,
‘learning_rate’: 0.005
}
default_params.update(agent_params)

agent = AdvancedNeuralAgent(**default_params)

try:
train_losses, val_losses = agent.train(
X_scaled, y, epochs=150, batch_size=32, verbose=False)

X_trn, X_test, y_trn, y_test = train_test_split(
X_scaled, y, test_size=0.2, random_state=42)

performance = agent.evaluate_performance(X_test, y_test)

self.agents[config_name] = agent
self.results[config_name] = {
‘performance’: performance,
‘train_losses’: train_losses,
‘val_losses’: val_losses,
‘data_shape’: X_scaled.shape
}

print(f” {config_name}: R²={performance[‘r2’]:.3f}, MSE={performance[‘mse’]:.3f}”)
return True

except Exception as e:
print(f” {config_name} failed: {str(e)[:50]}…”)
return False

def run_comprehensive_demo(self):
“””Run comprehensive testing of the AI agent”””
print(” COMPREHENSIVE AI AGENT DEMO”)
print(“=” * 60)

datasets = self.generate_datasets()

configs = {
‘lightweight’: {‘hidden_layers’: [16, 8], ‘learning_rate’: 0.01},
‘standard’: {‘hidden_layers’: [32, 16], ‘learning_rate’: 0.005},
‘deep’: {‘hidden_layers’: [64, 32, 16], ‘learning_rate’: 0.003},
‘wide’: {‘hidden_layers’: [128, 64], ‘learning_rate’: 0.002}
}

success_count = 0
total_tests = len(datasets) * len(configs)

for dataset_name, (X, y, desc) in datasets.items():
print(f”n Dataset: {desc} – Shape: {X.shape}”)
print(f”Target range: [{np.min(y):.2f}, {np.max(y):.2f}]”)

for config_name, config_params in configs.items():
test_name = f”{dataset_name}_{config_name}”
if self.test_agent_configuration(test_name, X, y, **config_params):
success_count += 1

print(f”n OVERALL RESULTS: {success_count}/{total_tests} tests successful”)

if self.results:
self.show_best_performers()
self.demonstrate_agent_intelligence()

def show_best_performers(self):
“””Show top performing configurations”””
print(f”n TOP PERFORMERS:”)

sorted_results = sorted(self.results.items(),
key=lambda x: x[1][‘performance’][‘r2’],
reverse=True)

for i, (name, result) in enumerate(sorted_results[:5]):
perf = result[‘performance’]
print(f”{i+1}. {name}: R²={perf[‘r2’]:.3f}, MSE={perf[‘mse’]:.3f}, MAE={perf[‘mae’]:.3f}”)

def demonstrate_agent_intelligence(self):
“””Demonstrate advanced AI capabilities”””
if not self.agents:
return

print(f”n INTELLIGENCE DEMONSTRATION:”)

best_name = max(self.results.keys(),
key=lambda x: self.results[x][‘performance’][‘r2’])
best_agent = self.agents[best_name]

print(f”Using best agent: {best_name}”)

print(f” Memory capacity: {len(best_agent.memory)} experiences”)

dummy_input = np.random.randn(3, best_agent.layers[0][‘weights’].shape[0])
conservative_decisions = best_agent.make_decision(dummy_input, exploration_rate=0.0)
exploratory_decisions = best_agent.make_decision(dummy_input, exploration_rate=0.3)

print(f” Decision making:”)
print(f” Conservative: {conservative_decisions.flatten()[:3]}”)
print(f” Exploratory: {exploratory_decisions.flatten()[:3]}”)

if len(best_agent.performance_history) > 10:
initial_perf = np.mean(best_agent.performance_history[:5])
final_perf = np.mean(best_agent.performance_history[-5:])
improvement = ((initial_perf – final_perf) / initial_perf) * 100
print(f” Learning improvement: {improvement:.1f}%”)

total_params = sum(layer[‘weights’].size + layer[‘bias’].size
for layer in best_agent.layers)
print(f” Network complexity: {total_params} parameters”)

return best_agent

We orchestrate a comprehensive demo where we generate multiple datasets, sweep agent configurations, and train/evaluate each setup with standardized metrics (R², MSE, MAE). We log results, rank top performers, and then showcase “intelligence” by probing memory, exploration vs. exploitation decisions, learning improvement, and total parameter count. Check out the FULL CODES here.

Copy CodeCopiedUse a different Browserdef run_quick_demo():
“””Quick demo for immediate testing”””
print(” QUICK AI AGENT DEMO”)
print(“=” * 40)

X, y = make_regression(n_samples=500, n_features=6, noise=0.15, random_state=42)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print(f”Dataset: {X_scaled.shape[0]} samples, {X_scaled.shape[1]} features”)

agent = AdvancedNeuralAgent(
input_size=X_scaled.shape[1],
hidden_layers=[24, 12],
output_size=1,
learning_rate=0.008
)

print(“Training agent…”)
train_losses, val_losses = agent.train(X_scaled, y, epochs=100, verbose=False)

X_trn, X_test, y_trn, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
performance = agent.evaluate_performance(X_test, y_test)

print(f”n RESULTS:”)
print(f”R² Score: {performance[‘r2’]:.3f}”)
print(f”MSE: {performance[‘mse’]:.3f}”)
print(f”MAE: {performance[‘mae’]:.3f}”)

agent.visualize_training(train_losses, val_losses)

return agent

We add a quick demo utility that trains the agent on a simple regression dataset with six features, using a lightweight two-layer configuration. We normalize the data, train for 100 epochs, evaluate on a test split, and display R², MSE, and MAE before plotting training vs. validation loss curves for immediate feedback. Check out the FULL CODES here.

Copy CodeCopiedUse a different Browserif __name__ == “__main__”:
print(“Choose demo type:”)
print(“1. Quick Demo (fast)”)
print(“2. Comprehensive Demo (detailed)”)

demo = AIAgentDemo()
best_agent = demo.run_comprehensive_demo()

We define the main entry point so the script can be run directly. We print the demo options, initialize AIAgentDemo, and run the comprehensive demo, which trains multiple configurations across datasets, evaluates performance, and highlights the best agent.

In conclusion, we demonstrate how stability-aware engineering choices, ranging from weight decay regularization to dynamic learning rate scaling based on validation loss history, play a critical role in achieving consistent performance across diverse datasets. The agent is not just a static predictor; it actively adapts by storing past experiences, injecting controlled exploration into its decisions, and resetting its parameters when instability thresholds are reached. We further validate the design through comprehensive demos across lightweight, standard, deep, and wide configurations, benchmarking performance on simple, complex, and classification-derived regression datasets. The results highlight measurable improvements in R², MSE, and MAE, while visualization tools provide insight into learning dynamics and convergence behavior.

Check out the FULL CODES here. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter.
The post How to Build a Robust Advanced Neural AI Agent with Stable Training, Adaptive Learning, and Intelligent Decision-Making? appeared first on MarkTechPost.

Top 12 Robotics AI Blogs/News Websites 2025

Robotics and artificial intelligence are converging at an unprecedented pace, driving breakthroughs in automation, perception, and human-machine collaboration. Staying current with these advancements requires following specialized sources that deliver technical depth, research updates, and industry insights. The following list highlights 12 of the most authoritative robotics and AI-focused blogs and websites to track in 2025.

IEEE Spectrum – Robotics

IEEE Spectrum’s robotics section remains one of the most respected sources for deep technical reporting on autonomy, robot design, locomotion, and control. It combines industry analysis with lab-level insights.

MarkTechPost

MarkTechPost regularly covers robotics research within the broader AI and machine learning ecosystem. It highlights cutting-edge work in robot learning, perception, simulation, and multi-agent systems.

Robohub

Robohub is a community-driven platform with contributions from robotics researchers, engineers, and practitioners worldwide. It includes interviews, technical discussions, and updates from research labs.

The Robot Report

This news platform blends robotics industry news with technical reporting. It tracks startup activity, industrial automation, and advanced robot designs across sectors.

Academic & Research Lab Blogs

Blogs from labs such as MIT CSAIL, CMU Robotics Institute, and Berkeley Artificial Intelligence Research (BAIR) often post about their latest robotics research, datasets, and open-source releases.

Specialist AI-Robotics Hybrids

AI-focused platforms like DeepMind’s blog and Meta AI Research blog frequently publish robotics-related research at the intersection of deep learning, simulation, and embodied AI.

Robotics Industries Association (RIA) – Robotics.org

The RIA offers updates on robotics standards, system integration, and industrial automation with strong technical context.

Phys.org – Robotics Section

Phys.org aggregates global robotics research news, covering new algorithms, robotic platforms, and mechanical innovations across academia and industry.

ZDNet – Robotics

ZDNet’s robotics coverage focuses on automation in enterprise settings, offering insight into emerging robotic platforms and their technical deployment.

Singularity Hub – Robots

Singularity Hub explores robotics research along with long-term societal implications. Articles often bridge lab breakthroughs with discussions on AI ethics and human-robot coexistence.

IEEE Robotics & Automation Society

The IEEE RAS blog and conference sites (e.g., IROS, RSS) share technical papers, tutorials, and summaries, making them essential for academic and applied robotics communities.

Towards Data Science – Robotics/AI Articles

Practitioners publish robotics-AI tutorials, implementations, and control algorithm discussions here, bridging applied ML with robotics systems.

Conclusion

As robotics continues to evolve across industrial, academic, and consumer domains, these platforms provide essential perspectives on research progress, engineering practices, and real-world deployment. Whether the focus is on control systems, embodied AI, or collaborative robots, these resources remain critical for understanding the trajectory of robotics and its integration with AI in 2025 and beyond.
The post Top 12 Robotics AI Blogs/News Websites 2025 appeared first on MarkTechPost.

Google AI Releases VaultGemma: The Largest and Most Capable Open Model …

Google AI Research and DeepMind have released VaultGemma 1B, the largest open-weight large language model trained entirely with differential privacy (DP). This development is a major step toward building AI models that are both powerful and privacy-preserving.

Why Do We Need Differential Privacy in LLMs?

Large language models trained on vast web-scale datasets are prone to memorization attacks, where sensitive or personally identifiable information can be extracted from the model. Studies have shown that verbatim training data can resurface, especially in open-weight releases.

Differential Privacy offers a mathematical guarantee that prevents any single training example from significantly influencing the model. Unlike approaches that apply DP only during fine-tuning, VaultGemma enforces full private pretraining, ensuring that privacy protection begins at the foundational level.

https://services.google.com/fh/files/blogs/vaultgemma_tech_report.pdf

What Is the Architecture of VaultGemma?

VaultGemma is architecturally similar to earlier Gemma models, but optimized for private training.

Model size: 1B parameters, 26 layers.

Transformer type: Decoder-only.

Activations: GeGLU with feedforward dimension of 13,824.

Attention: Multi-Query Attention (MQA) with global span of 1024 tokens.

Normalization: RMSNorm in pre-norm configuration.

Tokenizer: SentencePiece with a 256K vocabulary.

A notable change is the reduction of sequence length to 1024 tokens, which lowers compute costs and enables larger batch sizes under DP constraints.

What Data Was Used for Training?

VaultGemma was trained on the same 13 trillion-token dataset as Gemma 2, composed primarily of English text from web documents, code, and scientific articles.

The dataset underwent several filtering stages to:

Remove unsafe or sensitive content.

Reduce personal information exposure.

Prevent evaluation data contamination.

This ensures both safety and fairness in benchmarking.

How Was Differential Privacy Applied?

VaultGemma used DP-SGD (Differentially Private Stochastic Gradient Descent) with gradient clipping and Gaussian noise addition. Implementation was built on JAX Privacy and introduced optimizations for scalability:

Vectorized per-example clipping for parallel efficiency.

Gradient accumulation to simulate large batches.

Truncated Poisson Subsampling integrated into the data loader for efficient on-the-fly sampling.

The model achieved a formal DP guarantee of (ε ≤ 2.0, δ ≤ 1.1e−10) at the sequence level (1024 tokens).
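To show the mechanics behind DP-SGD at toy scale, here is a conceptual NumPy sketch, not the JAX Privacy implementation used for VaultGemma: each example's gradient is clipped, the batch is averaged, and Gaussian noise is added before the update. The noise multiplier of 0.614 is taken from the training configuration reported below; the other values are placeholders.

# Conceptual NumPy sketch of one DP-SGD step; not the JAX Privacy code used for VaultGemma.
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.01, clip_norm=1.0, noise_multiplier=0.614):
    # 1) Clip each example's gradient so no single example exceeds clip_norm in influence.
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    # 2) Average over the (logical) batch.
    mean_grad = np.mean(clipped, axis=0)
    # 3) Add Gaussian noise scaled to the clipping norm and batch size.
    sigma = noise_multiplier * clip_norm / len(per_example_grads)
    noisy_grad = mean_grad + np.random.normal(0.0, sigma, size=mean_grad.shape)
    # 4) Standard SGD update on the privatized gradient.
    return params - lr * noisy_grad

params = np.zeros(8)
grads = [np.random.randn(8) for _ in range(32)]   # stand-in per-example gradients
params = dp_sgd_step(params, grads)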

How Do Scaling Laws Work for Private Training?

Training large models under DP constraints requires new scaling strategies. The VaultGemma team developed DP-specific scaling laws with three innovations:

Optimal learning rate modeling using quadratic fits across training runs.

Parametric extrapolation of loss values to reduce reliance on intermediate checkpoints.

Semi-parametric fits to generalize across model size, training steps, and noise-batch ratios.

This methodology enabled precise prediction of achievable loss and efficient resource use on the TPUv6e training cluster.

What Were the Training Configurations?

VaultGemma was trained on 2048 TPUv6e chips using GSPMD partitioning and MegaScale XLA compilation.

Batch size: ~518K tokens.

Training iterations: 100,000.

Noise multiplier: 0.614.

The achieved loss was within 1% of predictions from the DP scaling law, validating the approach.

How Does VaultGemma Perform Compared to Non-Private Models?

On academic benchmarks, VaultGemma trails its non-private counterparts but shows strong utility:

ARC-C: 26.45 vs. 38.31 (Gemma-3 1B).

PIQA: 68.0 vs. 70.51 (GPT-2 1.5B).

TriviaQA (5-shot): 11.24 vs. 39.75 (Gemma-3 1B).

These results suggest that DP-trained models are currently comparable to non-private models from about five years ago. Importantly, memorization tests confirmed that no training data leakage was detectable in VaultGemma, unlike in non-private Gemma models.

https://services.google.com/fh/files/blogs/vaultgemma_tech_report.pdf

Summary

In summary, VaultGemma 1B proves that large-scale language models can be trained with rigorous differential privacy guarantees without making them impractical to use. While a utility gap remains compared to non-private counterparts, the release of both the model and its training methodology provides the community with a strong foundation for advancing private AI. This work signals a shift toward building models that are not only capable but also inherently safe, transparent, and privacy-preserving.

Check out the Paper, Model on Hugging Face and Technical Details. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter.
The post Google AI Releases VaultGemma: The Largest and Most Capable Open Model (1B-parameters) Trained from Scratch with Differential Privacy appeared first on MarkTechPost.

Deepdub Introduces Lightning 2.5: A Real-Time AI Voice Model With 2.8x …

Deepdub, an Israeli Voice AI startup, has introduced Lightning 2.5, a real-time foundational voice model designed to power scalable, production-grade voice applications. The new release delivers substantial improvements in performance and efficiency, positioning it for use in live interactive systems such as contact centers, AI agents, and real-time dubbing.

Performance and Efficiency

Lightning 2.5 achieves 2.8× higher throughput compared to previous versions, alongside a 5× efficiency gain in terms of computational resource utilization. Delivering latency as low as 200 milliseconds—roughly half a second faster than typical industry benchmarks—Lightning enables true real-time performance across use cases like live conversational AI, on-the-fly voiceovers, and event-driven AI pipelines.

The model is optimized for NVIDIA GPU-accelerated environments, ensuring deployment at scale without compromising quality. By leveraging parallelized inference pipelines, Deepdub has positioned Lightning 2.5 as a high-performance solution for latency-sensitive scenarios.

Real-Time Applications

Lightning 2.5 enters a landscape where voice is core to the user experience. Deployment applications include:

Customer support platforms that require seamless multilingual conversations.

AI agents and virtual assistants delivering natural, real-time interactions.

Media localization through instant dubbing across multiple languages.

Gaming and entertainment voice chat requiring expressive and natural speech output.

In a press release, the Deepdub team emphasized that Lightning maintains voice fidelity, natural prosody, and emotional nuance while scaling across multiple languages, a challenge for most real-time TTS (text-to-speech) systems.

Summary

Lightning 2.5 underscores Deepdub’s push to make real-time, high-quality multilingual voice generation practical at scale. With notable gains in throughput and efficiency, the model positions the company to compete in enterprise voice AI, though its ultimate impact will depend on adoption, integration ease, and how it measures up against rival systems in real-world deployments.
The post Deepdub Introduces Lightning 2.5: A Real-Time AI Voice Model With 2.8x Throughput Gains for Scalable AI Agents and Enterprise AI appeared first on MarkTechPost.

BentoML Released llm-optimizer: An Open-Source AI Tool for Benchmarkin …

BentoML has recently released llm-optimizer, an open-source framework designed to streamline the benchmarking and performance tuning of self-hosted large language models (LLMs). The tool addresses a common challenge in LLM deployment: finding optimal configurations for latency, throughput, and cost without relying on manual trial-and-error.

Why is tuning LLM performance difficult?

Tuning LLM inference is a balancing act across many moving parts—batch size, framework choice (vLLM, SGLang, etc.), tensor parallelism, sequence lengths, and how well the hardware is utilized. Each of these factors can shift performance in different ways, which makes finding the right combination for speed, efficiency, and cost far from straightforward. Most teams still rely on repetitive trial-and-error testing, a process that is slow, inconsistent, and often inconclusive. For self-hosted deployments, the cost of getting it wrong is high: poorly tuned configurations can quickly translate into higher latency and wasted GPU resources.

How is llm-optimizer different?

llm-optimizer provides a structured way to explore the LLM performance landscape. It eliminates repetitive guesswork by enabling systematic benchmarking and automated search across possible configurations.

Core capabilities include:

Running standardized tests across inference frameworks such as vLLM and SGLang.

Applying constraint-driven tuning, e.g., surfacing only configurations where time-to-first-token is below 200ms.

Automating parameter sweeps to identify optimal settings.

Visualizing tradeoffs with dashboards for latency, throughput, and GPU utilization.

The framework is open-source and available on GitHub.

How can devs explore results without running benchmarks locally?

Alongside the optimizer, BentoML released the LLM Performance Explorer, a browser-based interface powered by llm-optimizer. It provides pre-computed benchmark data for popular open-source models and lets users:

Compare frameworks and configurations side by side.

Filter by latency, throughput, or resource thresholds.

Browse tradeoffs interactively without provisioning hardware.

How does llm-optimizer impact LLM deployment practices?

As the use of LLMs grows, getting the most out of deployments comes down to how well inference parameters are tuned. llm-optimizer lowers the complexity of this process, giving smaller teams access to optimization techniques that once required large-scale infrastructure and deep expertise.

By providing standardized benchmarks and reproducible results, the framework adds much-needed transparency to the LLM space. It makes comparisons across models and frameworks more consistent, closing a long-standing gap in the community.

Ultimately, BentoML’s llm-optimizer brings a constraint-driven, benchmark-focused method to self-hosted LLM optimization, replacing ad-hoc trial and error with a systematic and repeatable workflow.

Check out the GitHub Page. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter.
The post BentoML Released llm-optimizer: An Open-Source AI Tool for Benchmarking and Optimizing LLM Inference appeared first on MarkTechPost.

How to Build a Multilingual OCR AI Agent in Python with EasyOCR and Op …

In this tutorial, we build an Advanced OCR AI Agent in Google Colab using EasyOCR, OpenCV, and Pillow, running fully offline with GPU acceleration. The agent includes a preprocessing pipeline with contrast enhancement (CLAHE), denoising, sharpening, and adaptive thresholding to improve recognition accuracy. Beyond basic OCR, we filter results by confidence, generate text statistics, and perform pattern detection (emails, URLs, dates, phone numbers) along with simple language hints. The design also supports batch processing, visualization with bounding boxes, and structured exports for flexible usage. Check out the FULL CODES here.

Copy CodeCopiedUse a different Browser!pip install easyocr opencv-python pillow matplotlib

import easyocr
import cv2
import numpy as np
from PIL import Image, ImageEnhance, ImageFilter
import matplotlib.pyplot as plt
import os
import json
from typing import List, Dict, Tuple, Optional
import re
from google.colab import files
import io

We start by installing the required libraries, EasyOCR, OpenCV, Pillow, and Matplotlib, to set up our environment. We then import all necessary modules so we can handle image preprocessing, OCR, visualization, and file operations seamlessly. Check out the FULL CODES here.

Copy CodeCopiedUse a different Browserclass AdvancedOCRAgent:
“””
Advanced OCR AI Agent with preprocessing, multi-language support,
and intelligent text extraction capabilities.
“””

def __init__(self, languages: List[str] = [‘en’], gpu: bool = True):
“””Initialize OCR agent with specified languages.”””
print(” Initializing Advanced OCR Agent…”)
self.languages = languages
self.reader = easyocr.Reader(languages, gpu=gpu)
self.confidence_threshold = 0.5
print(f” OCR Agent ready! Languages: {languages}”)

def upload_image(self) -> Optional[str]:
“””Upload image file through Colab interface.”””
print(” Upload your image file:”)
uploaded = files.upload()
if uploaded:
filename = list(uploaded.keys())[0]
print(f” Uploaded: {filename}”)
return filename
return None

def preprocess_image(self, image: np.ndarray, enhance: bool = True) -> np.ndarray:
“””Advanced image preprocessing for better OCR accuracy.”””
if len(image.shape) == 3:
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
else:
gray = image.copy()

if enhance:
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8,8))
gray = clahe.apply(gray)

gray = cv2.fastNlMeansDenoising(gray)

kernel = np.array([[-1,-1,-1], [-1,9,-1], [-1,-1,-1]])
gray = cv2.filter2D(gray, -1, kernel)

binary = cv2.adaptiveThreshold(
gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2
)

return binary

def extract_text(self, image_path: str, preprocess: bool = True) -> Dict:
“””Extract text from image with advanced processing.”””
print(f” Processing image: {image_path}”)

image = cv2.imread(image_path)
if image is None:
raise ValueError(f”Could not load image: {image_path}”)

if preprocess:
processed_image = self.preprocess_image(image)
else:
processed_image = image

results = self.reader.readtext(processed_image)

extracted_data = {
‘raw_results’: results,
‘filtered_results’: [],
‘full_text’: ”,
‘confidence_stats’: {},
‘word_count’: 0,
‘line_count’: 0
}

high_confidence_text = []
confidences = []

for (bbox, text, confidence) in results:
if confidence >= self.confidence_threshold:
extracted_data[‘filtered_results’].append({
‘text’: text,
‘confidence’: confidence,
‘bbox’: bbox
})
high_confidence_text.append(text)
confidences.append(confidence)

extracted_data[‘full_text’] = ‘ ‘.join(high_confidence_text)
extracted_data[‘word_count’] = len(extracted_data[‘full_text’].split())
extracted_data[‘line_count’] = len(high_confidence_text)

if confidences:
extracted_data[‘confidence_stats’] = {
‘mean’: np.mean(confidences),
‘min’: np.min(confidences),
‘max’: np.max(confidences),
‘std’: np.std(confidences)
}

return extracted_data

def visualize_results(self, image_path: str, results: Dict, show_bbox: bool = True):
“””Visualize OCR results with bounding boxes.”””
image = cv2.imread(image_path)
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

plt.figure(figsize=(15, 10))

if show_bbox:
plt.subplot(2, 2, 1)
img_with_boxes = image_rgb.copy()

for item in results[‘filtered_results’]:
bbox = np.array(item[‘bbox’]).astype(int)
cv2.polylines(img_with_boxes, [bbox], True, (255, 0, 0), 2)

x, y = bbox[0]
cv2.putText(img_with_boxes, f”{item[‘confidence’]:.2f}”,
(x, y-10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 1)

plt.imshow(img_with_boxes)
plt.title(“OCR Results with Bounding Boxes”)
plt.axis(‘off’)

plt.subplot(2, 2, 2)
processed = self.preprocess_image(image)
plt.imshow(processed, cmap=’gray’)
plt.title(“Preprocessed Image”)
plt.axis(‘off’)

plt.subplot(2, 2, 3)
confidences = [item[‘confidence’] for item in results[‘filtered_results’]]
if confidences:
plt.hist(confidences, bins=20, alpha=0.7, color=’blue’)
plt.xlabel(‘Confidence Score’)
plt.ylabel(‘Frequency’)
plt.title(‘Confidence Score Distribution’)
plt.axvline(self.confidence_threshold, color=’red’, linestyle=’–‘,
label=f’Threshold: {self.confidence_threshold}’)
plt.legend()

plt.subplot(2, 2, 4)
stats = results[‘confidence_stats’]
if stats:
labels = [‘Mean’, ‘Min’, ‘Max’]
values = [stats[‘mean’], stats[‘min’], stats[‘max’]]
plt.bar(labels, values, color=[‘green’, ‘red’, ‘blue’])
plt.ylabel(‘Confidence Score’)
plt.title(‘Confidence Statistics’)
plt.ylim(0, 1)

plt.tight_layout()
plt.show()

def smart_text_analysis(self, text: str) -> Dict:
“””Perform intelligent analysis of extracted text.”””
analysis = {
‘language_detection’: ‘unknown’,
‘text_type’: ‘unknown’,
‘key_info’: {},
‘patterns’: []
}

email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
phone_pattern = r'(\+\d{1,3}[-.\s]?)?(\(?\d{3}\)?)?[-.\s]?\d{3}[-.\s]?\d{4}'
url_pattern = r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'
date_pattern = r'\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b'

patterns = {
’emails’: re.findall(email_pattern, text, re.IGNORECASE),
‘phones’: re.findall(phone_pattern, text),
‘urls’: re.findall(url_pattern, text, re.IGNORECASE),
‘dates’: re.findall(date_pattern, text)
}

analysis[‘patterns’] = {k: v for k, v in patterns.items() if v}

if any(patterns.values()):
if patterns.get(’emails’) or patterns.get(‘phones’):
analysis[‘text_type’] = ‘contact_info’
elif patterns.get(‘urls’):
analysis[‘text_type’] = ‘web_content’
elif patterns.get(‘dates’):
analysis[‘text_type’] = ‘document_with_dates’

if re.search(r'[а-яё]’, text.lower()):
analysis[‘language_detection’] = ‘russian’
elif re.search(r'[àáâãäåæçèéêëìíîïñòóôõöøùúûüý]’, text.lower()):
analysis[‘language_detection’] = ‘romance_language’
elif re.search(r'[一-龯]’, text):
analysis[‘language_detection’] = ‘chinese’
elif re.search(r'[ぁ-んァ-ン]', text):
analysis[‘language_detection’] = ‘japanese’
elif re.search(r'[a-zA-Z]’, text):
analysis[‘language_detection’] = ‘latin_based’

return analysis

def process_batch(self, image_folder: str) -> List[Dict]:
“””Process multiple images in batch.”””
results = []
supported_formats = (‘.png’, ‘.jpg’, ‘.jpeg’, ‘.bmp’, ‘.tiff’)

for filename in os.listdir(image_folder):
if filename.lower().endswith(supported_formats):
image_path = os.path.join(image_folder, filename)
try:
result = self.extract_text(image_path)
result[‘filename’] = filename
results.append(result)
print(f” Processed: {filename}”)
except Exception as e:
print(f” Error processing {filename}: {str(e)}”)

return results

def export_results(self, results: Dict, format: str = ‘json’) -> str:
“””Export results in specified format.”””
if format.lower() == ‘json’:
output = json.dumps(results, indent=2, ensure_ascii=False)
filename = ‘ocr_results.json’
elif format.lower() == ‘txt’:
output = results[‘full_text’]
filename = ‘extracted_text.txt’
else:
raise ValueError(“Supported formats: ‘json’, ‘txt'”)

with open(filename, ‘w’, encoding=’utf-8′) as f:
f.write(output)

print(f” Results exported to: {filename}”)
return filename

We define an AdvancedOCRAgent that we initialize with multilingual EasyOCR and a GPU, and we set a confidence threshold to control output quality. We preprocess images (CLAHE, denoise, sharpen, adaptive threshold), extract text, visualize bounding boxes and confidence, run smart pattern/language analysis, support batch folders, and export results as JSON or TXT. Check out the FULL CODES here.
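If your images are already on disk and you want to skip the Colab upload flow, a minimal usage sketch of the class above could look like the following; it is not part of the original notebook, and "sample.png", "images/", and the language list are placeholders.

# Hypothetical usage of the AdvancedOCRAgent defined above; paths are placeholders.
ocr = AdvancedOCRAgent(languages=['en'], gpu=True)

results = ocr.extract_text("sample.png", preprocess=True)
print(f"Words: {results['word_count']}, mean confidence: "
      f"{results['confidence_stats'].get('mean', 0):.2f}")

analysis = ocr.smart_text_analysis(results['full_text'])
print("Detected type:", analysis['text_type'], "| patterns:", list(analysis['patterns']))

ocr.visualize_results("sample.png", results)
ocr.export_results(results, format='txt')

# Batch mode: process every supported image in a folder.
batch_results = ocr.process_batch("images/")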

Copy CodeCopiedUse a different Browserdef demo_ocr_agent():
“””Demonstrate the OCR agent capabilities.”””
print(” Advanced OCR AI Agent Demo”)
print(“=” * 50)

ocr = AdvancedOCRAgent(languages=[‘en’], gpu=True)

image_path = ocr.upload_image()
if image_path:
try:
results = ocr.extract_text(image_path, preprocess=True)

print(“n OCR Results:”)
print(f”Words detected: {results[‘word_count’]}”)
print(f”Lines detected: {results[‘line_count’]}”)
print(f”Average confidence: {results[‘confidence_stats’].get(‘mean’, 0):.2f}”)

print(“n Extracted Text:”)
print(“-” * 30)
print(results[‘full_text’])
print(“-” * 30)

analysis = ocr.smart_text_analysis(results[‘full_text’])
print(f”n Smart Analysis:”)
print(f”Detected text type: {analysis[‘text_type’]}”)
print(f”Language hints: {analysis[‘language_detection’]}”)
if analysis[‘patterns’]:
print(f”Found patterns: {list(analysis[‘patterns’].keys())}”)

ocr.visualize_results(image_path, results)

ocr.export_results(results, ‘json’)

except Exception as e:
print(f” Error: {str(e)}”)
else:
print(“No image uploaded. Please try again.”)

if __name__ == “__main__”:
demo_ocr_agent()

We create a demo function that walks us through the full OCR workflow: we initialize the agent with English and GPU support, upload an image, preprocess it, and extract text with confidence stats. We then display the results, run smart text analysis to detect patterns and language hints, visualize bounding boxes and scores, and finally export everything into a JSON file.

In conclusion, we create a robust OCR pipeline that combines preprocessing, recognition, and analysis in a single Colab workflow. We enhance EasyOCR outputs using OpenCV techniques, visualize results for interpretability, and add confidence metrics for reliability. The agent is modular, allowing both single-image and batch processing, with results exported in JSON or text formats. This shows that open-source tools can deliver production-grade OCR without external APIs, while leaving room for domain-specific extensions like invoice or document parsing.

Check out the FULL CODES here. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter.
The post How to Build a Multilingual OCR AI Agent in Python with EasyOCR and OpenCV appeared first on MarkTechPost.

Automate advanced agentic RAG pipeline with Amazon SageMaker AI

Retrieval Augmented Generation (RAG) is a fundamental approach for building advanced generative AI applications that connect large language models (LLMs) to enterprise knowledge. However, crafting a reliable RAG pipeline is rarely a one-shot process. Teams often need to test dozens of configurations (varying chunking strategies, embedding models, retrieval techniques, and prompt designs) before arriving at a solution that works for their use case. Furthermore, management of high-performing RAG pipeline involves complex deployment, with teams often using manual RAG pipeline management, leading to inconsistent results, time-consuming troubleshooting, and difficulty in reproducing successful configurations. Teams struggle with scattered documentation of parameter choices, limited visibility into component performance, and the inability to systematically compare different approaches. Additionally, the lack of automation creates bottlenecks in scaling the RAG solutions, increases operational overhead, and makes it challenging to maintain quality across multiple deployments and environments from development to production.
In this post, we walk through how to streamline your RAG development lifecycle from experimentation to automation, operationalizing your RAG solution for production deployments with Amazon SageMaker AI so your team can experiment efficiently, collaborate effectively, and drive continuous improvement. By combining experimentation and automation with SageMaker AI, you can verify that the entire pipeline is versioned, tested, and promoted as a cohesive unit. This approach provides traceability, reproducibility, and risk mitigation as the RAG system advances from development to production, supporting reliable operation in real-world scenarios.
Solution overview
By streamlining both experimentation and operational workflows, teams can use SageMaker AI to rapidly prototype, deploy, and monitor RAG applications at scale. Its integration with SageMaker managed MLflow provides a unified platform for tracking experiments, logging configurations, and comparing results, supporting reproducibility and robust governance throughout the pipeline lifecycle. Automation also minimizes manual intervention, reduces errors, and streamlines the process of promoting the finalized RAG pipeline from the experimentation phase directly into production. With this approach, every stage from data ingestion to output generation operates efficiently and securely, while making it straightforward to transition validated solutions from development to production deployment.
For automation, Amazon SageMaker Pipelines orchestrates end-to-end RAG workflows from data preparation and vector embedding generation to model inference and evaluation all with repeatable and version-controlled code. Integrating continuous integration and delivery (CI/CD) practices further enhances reproducibility and governance, enabling automated promotion of validated RAG pipelines from development to staging or production environments. Promoting an entire RAG pipeline (not just an individual subsystem of the RAG solution like a chunking layer or orchestration layer) to higher environments is essential because data, configurations, and infrastructure can vary significantly across staging and production. In production, you often work with live, sensitive, or much larger datasets, and the way data is chunked, embedded, retrieved, and generated can impact system performance and output quality in ways that are not always apparent in lower environments. Each stage of the pipeline (chunking, embedding, retrieval, and generation) must be thoroughly evaluated with production-like data for accuracy, relevance, and robustness. Metrics at every stage (such as chunk quality, retrieval relevance, answer correctness, and LLM evaluation scores) must be monitored and validated before the pipeline is trusted to serve real users.
The following diagram illustrates the architecture of a scalable RAG pipeline built on SageMaker AI, with MLflow experiment tracking seamlessly integrated at every stage and the RAG pipeline automated using SageMaker Pipelines. SageMaker managed MLflow provides a unified platform for centralized RAG experiment tracking across all pipeline stages. Every MLflow execution run whether for RAG chunking, ingestion, retrieval, or evaluation sends execution logs, parameters, metrics, and artifacts to SageMaker managed MLflow. The architecture uses SageMaker Pipelines to orchestrate the entire RAG workflow through versioned, repeatable automation. These RAG pipelines manage dependencies between critical stages, from data ingestion and chunking to embedding generation, retrieval, and final text generation, supporting consistent execution across environments. Integrated with CI/CD practices, SageMaker Pipelines enable seamless promotion of validated RAG configurations from development to staging and production environments while maintaining infrastructure as code (IaC) traceability.

For the operational workflow, the solution follows a structured lifecycle: During experimentation, data scientists iterate on pipeline components within Amazon SageMaker Studio notebooks while SageMaker managed MLflow captures parameters, metrics, and artifacts at every stage. Validated workflows are then codified into SageMaker Pipelines and versioned in Git. The automated promotion phase uses CI/CD to trigger pipeline execution in target environments, rigorously validating stage-specific metrics (chunk quality, retrieval relevance, answer correctness) against production data before deployment. The other core components include:

Amazon SageMaker JumpStart for accessing the latest LLM models and hosting them on SageMaker endpoints for model inference with the embedding model huggingface-textembedding-all-MiniLM-L6-v2 and text generation model deepseek-llm-r1-distill-qwen-7b.
Amazon OpenSearch Service as a vector database to store document embeddings, with the OpenSearch index configured for k-nearest neighbors (k-NN) search (a minimal index-mapping sketch follows this list).
The Amazon Bedrock model anthropic.claude-3-haiku-20240307-v1:0 as an LLM-as-a-judge component for all the MLflow LLM evaluation metrics.
A SageMaker Studio notebook for a development environment to experiment and automate the RAG pipelines with SageMaker managed MLflow and SageMaker Pipelines.
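For context, a minimal k-NN index mapping might look like the following sketch using the opensearch-py client. The host, credentials, and index name are placeholders (Amazon OpenSearch Service deployments typically use SigV4 or fine-grained access control instead of basic auth), and the dimension of 384 assumes the all-MiniLM-L6-v2 embedding model listed above.

# Minimal sketch with opensearch-py; host, auth, and index name are placeholders.
from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=[{"host": "my-domain.us-east-1.es.amazonaws.com", "port": 443}],
    http_auth=("user", "password"),   # placeholder credentials
    use_ssl=True,
)

index_body = {
    "settings": {"index": {"knn": True}},          # enable k-NN search on this index
    "mappings": {
        "properties": {
            "text": {"type": "text"},              # raw chunk text
            "embedding": {                         # 384-dim vector from all-MiniLM-L6-v2
                "type": "knn_vector",
                "dimension": 384,
            },
        }
    },
}
client.indices.create(index="rag-documents", body=index_body)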

You can find the code for this agentic RAG solution in the GitHub repository. In the following sections, we use snippets from the repository to illustrate how the RAG pipeline evolves from experimentation to automation.
Prerequisites
You must have the following prerequisites:

An AWS account with billing enabled.
A SageMaker AI domain. For more information, see Use quick setup for Amazon SageMaker AI.
Access to a running SageMaker managed MLflow tracking server in SageMaker Studio. For more information, see the instructions for setting up a new MLflow tracking server.
Access to SageMaker JumpStart to host LLM embedding and text generation models.
Access to the Amazon Bedrock foundation models (FMs) for RAG evaluation tasks. For more details, see Subscribe to a model.

SageMaker MLflow RAG experiment
SageMaker managed MLflow provides a powerful framework for organizing RAG experiments, so teams can manage complex, multi-stage processes with clarity and precision. The following diagram illustrates the RAG experiment stages with SageMaker managed MLflow experiment tracking at every stage. This centralized tracking offers the following benefits:

Reproducibility: Every experiment is fully documented, so teams can replay and compare runs at any time
Collaboration: Shared experiment tracking fosters knowledge sharing and accelerates troubleshooting
Actionable insights: Visual dashboards and comparative analytics help teams identify the impact of pipeline changes and drive continuous improvement

The following diagram illustrates the solution workflow.

Each RAG experiment in MLflow is structured as a top-level run under a specific experiment name. Within this top-level run, nested runs are created for each major pipeline stage, such as data preparation, data chunking, data ingestion, RAG retrieval, and RAG evaluation. This hierarchical approach allows for granular tracking of parameters, metrics, and artifacts at every step, while maintaining a clear lineage from raw data to final evaluation results.
The following screenshot shows an example of the experiment details in MLflow.

The RAG pipeline defines the following steps:

Data preparation: Logs dataset version, preprocessing steps, and initial statistics
Data chunking: Records chunking strategy, chunk size, overlap, and resulting chunk counts
Data ingestion: Tracks embedding model, vector database details, and document ingestion metrics
RAG retrieval: Captures retrieval model, context size, and retrieval performance metrics
RAG evaluation: Logs evaluation metrics (such as answer similarity, correctness, and relevance) and sample results

This visualization provides a clear, end-to-end view of the RAG pipeline’s execution, so you can trace the impact of changes at any stage and achieve full reproducibility. The architecture supports scaling to multiple experiments, each representing a distinct configuration or hypothesis (for example, different chunking strategies, embedding models, or retrieval parameters). MLflow’s experiment UI visualizes these experiments side by side, enabling straightforward comparison and analysis across runs. This structure is especially valuable in enterprise settings, where dozens or even hundreds of experiments might be conducted to optimize RAG performance.
We use MLflow experimentation throughout the RAG pipeline to log metrics and parameters, and the different experiment runs are initialized as shown in the following code snippet:

import mlflow

with mlflow.start_run() as run:
    main_run_id = run.info.run_id
    print("mlflow_run", main_run_id)
    # Each pipeline stage is tracked as a nested run under the top-level experiment run
    with mlflow.start_run(run_name="DataPreparation", nested=True):
        …

RAG pipeline experimentation
The key components of the RAG workflow are ingestion, chunking, retrieval, and evaluation, which we explain in this section. At each stage, the MLflow dashboard makes it straightforward to visualize and analyze the logged parameters and metrics, supporting data-driven refinement of the RAG pipeline.

Data ingestion and preparation
In the RAG workflow, rigorous data preparation is foundational to downstream performance and reliability. Tracking detailed metrics on data quality, such as the total number of question-answer pairs, the count of unique questions, average context length, and initial evaluation predictions, provides essential visibility into the dataset’s structure and suitability for RAG tasks. These metrics help validate that the dataset is comprehensive, diverse, and contextually rich, which directly impacts the relevance and accuracy of the RAG system’s responses. Additionally, logging critical RAG parameters like the data source, detected personally identifiable information (PII) types, and data lineage information is vital for maintaining compliance, reproducibility, and trust in enterprise environments. Capturing this metadata in SageMaker managed MLflow supports robust experiment tracking, auditability, efficient comparison, and root cause analysis across multiple data preparation runs, as visualized in the MLflow dashboard. This disciplined approach to data preparation lays the groundwork for effective experimentation, governance, and continuous improvement throughout the RAG pipeline. The following screenshot shows an example of the experiment run details in MLflow.

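As a hedged illustration of this stage, the following sketch logs the dataset statistics and lineage parameters described above inside the nested DataPreparation run, assuming the top-level run from the earlier snippet is active; the variable names (for example, qa_pairs), the S3 location, and the PII values are assumptions, not the repository’s exact code:

import mlflow

with mlflow.start_run(run_name="DataPreparation", nested=True):
    # Dataset-quality metrics described above (metric names are illustrative)
    mlflow.log_metrics({
        "total_qa_pairs": len(qa_pairs),
        "unique_questions": len({pair["question"] for pair in qa_pairs}),
        "avg_context_length": sum(len(pair["context"]) for pair in qa_pairs) / len(qa_pairs),
    })
    # Lineage and compliance parameters (values are placeholders)
    mlflow.log_params({
        "data_source": "s3://my-bucket/rag-dataset/v1/",
        "detected_pii_types": "EMAIL,PHONE",
    })
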
Data chunking
After data preparation, the next step is to split documents into manageable chunks for efficient embedding and retrieval. This process is pivotal, because the quality and granularity of chunks directly affect the relevance and completeness of answers returned by the RAG system. The RAG workflow in this post supports experimentation and RAG pipeline automation with both fixed-size and recursive chunking strategies for comparison and validation. However, this RAG solution can be extended to many other chunking techniques.

FixedSizeChunker divides text into uniform chunks with configurable overlap
RecursiveChunker splits text along logical boundaries such as paragraphs or sentences

Tracking detailed chunking metrics such as total_source_contexts_entries, total_contexts_chunked, and total_unique_chunks_final is crucial for understanding how much of the source data is represented, how effectively it is segmented, and whether the chunking approach is yielding the desired coverage and uniqueness. These metrics help diagnose issues like excessive duplication or under-segmentation, which can impact retrieval accuracy and model performance.
Additionally, logging parameters such as chunking_strategy_type (for example, FixedSizeChunker), chunking_strategy_chunk_size (for example, 500 characters), and chunking_strategy_chunk_overlap provides transparency and reproducibility for each experiment. Capturing these details in SageMaker managed MLflow helps teams systematically compare the impact of different chunking configurations, optimize for efficiency and contextual relevance, and maintain a clear audit trail of how chunking decisions evolve over time. The MLflow dashboard makes it straightforward to visualize and analyze these parameters and metrics, supporting data-driven refinement of the chunking stage within the RAG pipeline. The following screenshot shows an example of the experiment run details in MLflow.

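To make the chunking stage concrete, here is a minimal sketch of a fixed-size chunker with overlap, along with the chunking parameters and metrics being logged to a nested MLflow run; the class body, the source_contexts variable, and the exact metric names are illustrative rather than the repository’s implementation:

import mlflow

class FixedSizeChunker:
    """Illustrative chunker: uniform character windows with a configurable overlap."""
    def __init__(self, chunk_size=500, chunk_overlap=50):
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap

    def chunk(self, text):
        step = self.chunk_size - self.chunk_overlap
        return [text[i:i + self.chunk_size] for i in range(0, len(text), step)]

with mlflow.start_run(run_name="DataChunking", nested=True):
    chunker = FixedSizeChunker(chunk_size=500, chunk_overlap=50)
    chunks = [c for context in source_contexts for c in chunker.chunk(context)]  # source_contexts is assumed
    mlflow.log_params({
        "chunking_strategy_type": "FixedSizeChunker",
        "chunking_strategy_chunk_size": 500,
        "chunking_strategy_chunk_overlap": 50,
    })
    mlflow.log_metrics({
        "total_source_contexts_entries": len(source_contexts),
        "total_contexts_chunked": len(chunks),
        "total_unique_chunks_final": len(set(chunks)),
    })
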
After the documents are chunked, the next step is to convert these chunks into vector embeddings using a SageMaker embedding endpoint, after which the embeddings are ingested into a vector database such as OpenSearch Service for fast semantic search. This ingestion phase is crucial because the quality, completeness, and traceability of what enters the vector store directly determine the effectiveness and reliability of downstream retrieval and generation stages.
Tracking ingestion metrics such as the number of documents and chunks ingested provides visibility into pipeline throughput and helps identify bottlenecks or data loss early in the process. Logging detailed parameters, including the embedding model ID, endpoint used, and vector database index, is essential for reproducibility and auditability. This metadata helps teams trace exactly which model and infrastructure were used for each ingestion run, supporting root cause analysis and compliance, especially when working with evolving datasets or sensitive information.
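A minimal sketch of this ingestion step follows, reusing the OpenSearch client from the earlier index sketch. The endpoint name and the request and response payload shapes for the JumpStart embedding model are assumptions; adjust them to your deployed model’s contract:

import json
import boto3
import mlflow

smr = boto3.client("sagemaker-runtime")

def embed(text, endpoint_name="huggingface-textembedding-all-MiniLM-L6-v2-endpoint"):
    # The payload and response format are assumptions for the JumpStart embedding container
    response = smr.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps({"text_inputs": [text]}),
    )
    return json.loads(response["Body"].read())["embedding"][0]

with mlflow.start_run(run_name="DataIngestion", nested=True):
    for i, chunk in enumerate(chunks):  # chunks come from the chunking stage
        client.index(index="rag-chunks", id=i, body={"chunk_text": chunk, "embedding": embed(chunk)})
    mlflow.log_params({
        "embedding_model_id": "huggingface-textembedding-all-MiniLM-L6-v2",
        "vector_index": "rag-chunks",
    })
    mlflow.log_metric("chunks_ingested", len(chunks))
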
Retrieval and generation
For a given query, we generate an embedding and retrieve the top-k relevant chunks from OpenSearch Service. For answer generation, we use a SageMaker LLM endpoint. The retrieved context and the query are combined into a prompt, and the LLM generates an answer. Finally, we orchestrate retrieval and generation using LangGraph, enabling stateful workflows and advanced tracing:

graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_with_context = graph_builder.compile()

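To make the graph nodes concrete, the following hedged sketch shows what the retrieve and generate functions behind this sequence might look like, reusing the embed helper, OpenSearch client, and SageMaker runtime client from the earlier sketches. The state schema, k-NN query shape, prompt template, and text generation payload are assumptions, not the repository’s exact code:

import json
from typing import List, TypedDict

class State(TypedDict):
    question: str
    context: List[str]
    answer: str

def retrieve(state: State):
    # Top-k k-NN query against the OpenSearch index (k=5 is illustrative)
    query = {"size": 5, "query": {"knn": {"embedding": {"vector": embed(state["question"]), "k": 5}}}}
    hits = client.search(index="rag-chunks", body=query)["hits"]["hits"]
    return {"context": [hit["_source"]["chunk_text"] for hit in hits]}

def generate(state: State):
    context_text = "\n".join(state["context"])
    prompt = f"Answer the question using only this context:\n{context_text}\n\nQuestion: {state['question']}"
    # The request/response format depends on the deployed text generation model; this is an assumption
    response = smr.invoke_endpoint(
        EndpointName="deepseek-llm-r1-distill-qwen-7b-endpoint",
        ContentType="application/json",
        Body=json.dumps({"inputs": prompt}),
    )
    return {"answer": json.loads(response["Body"].read())["generated_text"]}
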
With the generative AI agent defined using the LangGraph framework, the agentic layers are evaluated for each iteration of RAG development, verifying the efficacy of the RAG solution for agentic applications. Each retrieval and generation run is logged to SageMaker managed MLflow, capturing the prompt, generated response, and key metrics and parameters such as retrieval performance, top-k values, and the specific model endpoints used. Tracking these details in MLflow is essential for evaluating the effectiveness of the retrieval stage, making sure the returned documents are relevant and that the generated answers are accurate and complete. It is equally important to track the performance of the vector database during retrieval, including metrics like query latency, throughput, and scalability. Monitoring these system-level metrics alongside retrieval relevance and accuracy makes sure the RAG pipeline delivers correct and relevant answers and meets production requirements for responsiveness and scalability. The following screenshot shows an example of the LangGraph RAG retrieval tracing in MLflow.

RAG evaluation
Evaluation is conducted on a curated test set, and results are logged to MLflow for quick comparison and analysis. This helps teams identify the best-performing configurations and iterate toward production-grade solutions. With MLflow, you can evaluate the RAG solution with heuristic metrics, content similarity metrics, and LLM-as-a-judge metrics. In this post, we evaluate the RAG pipeline using advanced LLM-as-a-judge MLflow metrics (answer similarity, correctness, relevance, faithfulness):

metrics_genai_only = [answer_correctness_aws, answer_similarity_aws, answer_relevance_aws, answer_faithfulness_aws]

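One way to construct the *_aws metrics referenced above is with MLflow’s built-in GenAI metric constructors pointed at the Amazon Bedrock Claude 3 Haiku judge, as sketched below; the judge-model URI format depends on your MLflow version and deployment configuration, so treat it as an assumption:

from mlflow.metrics.genai import answer_correctness, answer_relevance, answer_similarity, faithfulness

# Judge model URI is an assumption; configure it to match your MLflow-to-Bedrock integration
judge_model = "bedrock:/anthropic.claude-3-haiku-20240307-v1:0"

answer_correctness_aws = answer_correctness(model=judge_model)
answer_similarity_aws = answer_similarity(model=judge_model)
answer_relevance_aws = answer_relevance(model=judge_model)
answer_faithfulness_aws = faithfulness(model=judge_model)
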
The following screenshot shows the RAG evaluation stage experiment run details in MLflow.

You can use MLflow to log all metrics and parameters, enabling quick comparison of different experiment runs. See the following code for reference:

with mlflow.start_run(run_id=main_run_id) as run:
    with mlflow.start_run(run_name="RAGEvaluation", nested=True):
        results = mlflow.evaluate(
            …         # Other parameters
            extra_metrics=metrics_genai_only,
            evaluator_config={
                … # Config parameters
            },
        )

By using MLflow’s evaluation capabilities (such as mlflow.evaluate()), teams can systematically assess retrieval quality, identify potential gaps or misalignments in chunking or embedding strategies, and compare the performance of different retrieval and generation configurations. MLflow’s flexibility allows for seamless integration with external evaluation libraries such as RAGAS for comprehensive RAG pipeline assessment. RAGAS is an open source library that provides tools specifically for the evaluation of LLM applications and generative AI agents. RAGAS includes the ragas.evaluate() method to run evaluations for LLM agents, with a choice of LLM models (evaluators) for scoring the evaluation and an extensive list of default metrics. To incorporate RAGAS metrics into your MLflow experiments, refer to the following GitHub repository.
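For reference, a minimal RAGAS sketch is shown below; the column names follow RAGAS conventions (question, answer, contexts, ground_truth), the input lists are assumed to be collected from the retrieval and generation stage, and logging the aggregated scores back to MLflow is an illustrative pattern rather than code from the repository:

import mlflow
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

# Assumed evaluation records collected from the RAG retrieval and generation stage
eval_dataset = Dataset.from_dict({
    "question": questions,
    "answer": generated_answers,
    "contexts": retrieved_contexts,   # list of lists of retrieved chunks
    "ground_truth": reference_answers,
})

ragas_results = evaluate(eval_dataset, metrics=[faithfulness, answer_relevancy])
scores = ragas_results.to_pandas()

with mlflow.start_run(run_name="RAGASEvaluation", nested=True):
    mlflow.log_metric("ragas_faithfulness", scores["faithfulness"].mean())
    mlflow.log_metric("ragas_answer_relevancy", scores["answer_relevancy"].mean())
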
Comparing experiments
In the MLflow UI, you can compare runs side by side. For example, comparing FixedSizeChunker and RecursiveChunker as shown in the following screenshot reveals differences in metrics such as answer_similarity (a difference of 1 point), providing actionable insights for pipeline optimization.

Automation with Amazon SageMaker Pipelines
After systematically experimenting with and optimizing each component of the RAG workflow through SageMaker managed MLflow, the next step is transforming these validated configurations into production-ready automated pipelines. Although MLflow experiments help identify the optimal combination of chunking strategies, embedding models, and retrieval parameters, manually reproducing these configurations across environments can be error-prone and inefficient.
To produce the automated RAG pipeline, we use SageMaker Pipelines, which helps teams codify their experimentally validated RAG workflows into automated, repeatable pipelines that maintain consistency from development through production. By converting the successful MLflow experiments into pipeline definitions, teams can make sure the exact same chunking, embedding, retrieval, and evaluation steps that performed well in testing are reliably reproduced in production environments.
SageMaker Pipelines offers serverless workflow orchestration for converting experimental notebook code into a production-grade pipeline, versioning and tracking pipeline configurations alongside MLflow experiments, and automating the end-to-end RAG workflow. The automated SageMaker Pipelines-based RAG workflow offers dependency management, comprehensive custom testing and validation before production deployment, and CI/CD integration for automated pipeline promotion.
With SageMaker Pipelines, you can automate your entire RAG workflow, from data preparation to evaluation, as reusable, parameterized pipeline definitions. This provides the following benefits:

Reproducibility – Pipeline definitions capture all dependencies, configurations, and execution logic in version-controlled code
Parameterization – Key RAG parameters (chunk sizes, model endpoints, retrieval settings) can be quickly modified between runs
Monitoring – Pipeline executions provide detailed logs and metrics for each step
Governance – Built-in lineage tracking supports full auditability of data and model artifacts
Customization – Serverless workflow orchestration is customizable to your unique enterprise landscape, with scalable infrastructure and flexibility with instances optimized for CPU, GPU, or memory-intensive tasks, memory configuration, and concurrency optimization

To implement a RAG workflow in SageMaker Pipelines, each major component of the RAG process (data preparation, chunking, ingestion, retrieval and generation, and evaluation) is encapsulated in a SageMaker Processing job. These jobs are then orchestrated as steps within a pipeline, with data flowing between them, as shown in the following screenshot. This structure allows for modular development, quick debugging, and the ability to reuse components across different pipeline configurations.

The key RAG configurations are exposed as pipeline parameters, enabling flexible experimentation with minimal code changes. For example, the following code snippet showcases the modifiable parameters for RAG configurations, which can be used as pipeline configurations:

processor = PyTorchProcessor(
    …
    arguments=[
        "--experiment-name", experiment_name,
        "--mlflow-tracking-uri", mlflow_tracking_uri,
        "--embedding-endpoint-name", embedding_endpoint_name,
        "--text-endpoint-name", text_endpoint_name,
        "--domain-name", domain_name,
        "--index-name", index_name,
        "--chunking-strategy", chunking_strategy,
        "--chunk-size", chunk_size,
        "--chunk-overlap", chunk_overlap,
        "--context-retrieval-size", context_retrieval_size,
        "--embedding-model-id", embedding_model_id,
        "--text-model-id", text_model_id,
        "--output-data-path", "/opt/ml/processing/output",
        "--role-arn", role
    ],
)

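The following sketch shows how such configurations can be surfaced as SageMaker pipeline parameters and wired into a pipeline definition; the parameter names, defaults, and the step_args variable (assumed to come from processor.run(...) with the arguments shown above) are illustrative:

from sagemaker.workflow.parameters import ParameterInteger, ParameterString
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import ProcessingStep

# Illustrative pipeline parameters mirroring the processor arguments above
chunking_strategy = ParameterString(name="ChunkingStrategy", default_value="FixedSizeChunker")
chunk_size = ParameterInteger(name="ChunkSize", default_value=500)
chunk_overlap = ParameterInteger(name="ChunkOverlap", default_value=50)
context_retrieval_size = ParameterInteger(name="ContextRetrievalSize", default_value=5)

chunking_step = ProcessingStep(
    name="DataChunking",
    step_args=step_args,  # assumed: returned by processor.run(...) with the arguments shown above
)

pipeline = Pipeline(
    name="rag-pipeline",
    parameters=[chunking_strategy, chunk_size, chunk_overlap, context_retrieval_size],
    steps=[chunking_step],  # ingestion, retrieval, and evaluation steps would follow
)
pipeline.upsert(role_arn=role)  # role is assumed to be defined in the notebook
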
In this post, we provide two agentic RAG pipeline automation approaches to building the SageMaker pipeline, each with its own benefits: single-step SageMaker pipelines and multi-step pipelines.
The single-step pipeline approach is designed for simplicity, running the entire RAG workflow as one unified process. This setup is ideal for straightforward or less complex use cases, because it minimizes pipeline management overhead. With fewer steps, the pipeline can start quickly, benefitting from reduced execution times and streamlined development. This makes it a practical option when rapid iteration and ease of use are the primary concerns.
The multi-step pipeline approach is preferred for enterprise scenarios where flexibility and modularity are essential. By breaking down the RAG process into distinct, manageable stages, organizations gain the ability to customize, swap, or extend individual components as needs evolve. This design enables plug-and-play adaptability, making it straightforward to reuse or reconfigure pipeline steps for various workflows. Additionally, the multi-step format allows for granular monitoring and troubleshooting at each stage, providing detailed insights into performance and facilitating robust enterprise management. For enterprises seeking maximum flexibility and the ability to tailor automation to unique requirements, the multi-step pipeline approach is the superior choice.
CI/CD for an agentic RAG pipeline
Now we integrate the SageMaker RAG pipeline with CI/CD. CI/CD is important for making a RAG solution enterprise-ready because it provides faster, more reliable, and scalable delivery of AI-powered workflows. Specifically for enterprises, CI/CD pipelines automate the integration, testing, deployment, and monitoring of changes in the RAG system, which brings several key benefits, such as faster and more reliable updates, version control and traceability, consistency across environments, modularity and flexibility for customization, enhanced collaboration and monitoring, risk mitigation, and cost savings. This aligns with general CI/CD benefits in software and AI systems, emphasizing automation, quality assurance, collaboration, and continuous feedback essential to enterprise AI readiness.
When your SageMaker RAG pipeline definition is in place, you can implement robust CI/CD practices by integrating your development workflow and toolsets already enabled at your enterprise. This setup makes it possible to automate code promotion, pipeline deployment, and model experimentation through simple Git triggers, so changes are versioned, tested, and systematically promoted across environments. For demonstration, in this post we show the CI/CD integration using GitHub Actions as the CI/CD orchestrator. Each code change, such as refining chunking strategies or updating pipeline steps, triggers an end-to-end automation workflow, as shown in the following screenshot. You can use the same CI/CD pattern with your choice of CI/CD tool instead of GitHub Actions, if needed.

Each GitHub Actions CI/CD execution automatically triggers the SageMaker pipeline (shown in the following screenshot), allowing for seamless scaling of serverless compute infrastructure.

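Under the hood, the GitHub Actions job can start the pipeline with a short Python step such as the sketch below; the pipeline name and parameter overrides are placeholders meant to illustrate the call:

import boto3

sm = boto3.client("sagemaker")

# Start a pipeline execution with parameter overrides (names and values are illustrative)
sm.start_pipeline_execution(
    PipelineName="rag-pipeline",
    PipelineParameters=[
        {"Name": "ChunkingStrategy", "Value": "RecursiveChunker"},
        {"Name": "ChunkSize", "Value": "500"},
    ],
)
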
Throughout this cycle, SageMaker managed MLflow records every executed pipeline (shown in the following screenshot), so you can seamlessly review results, compare performance across different pipeline runs, and manage the RAG lifecycle.

After an optimal RAG pipeline configuration is determined, the new desired configuration (Git version tracking captured in MLflow as shown in the following screenshot) can be promoted to higher stages or environments directly through an automated workflow, minimizing manual intervention and reducing risk.

Clean up
To avoid unnecessary costs, delete resources such as the SageMaker managed MLflow tracking server, SageMaker pipelines, and SageMaker endpoints when your RAG experimentation is complete. You can use the SageMaker Studio console to delete resources that are no longer needed, or call the appropriate AWS API actions.
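The following sketch shows one way to remove these resources with boto3; the pipeline, endpoint, and tracking server names are placeholders, so replace them with the names from your deployment:

import boto3

sm = boto3.client("sagemaker")

# Delete the SageMaker pipeline and the model endpoints created for the RAG workflow
sm.delete_pipeline(PipelineName="rag-pipeline")
for endpoint_name in [
    "huggingface-textembedding-all-MiniLM-L6-v2-endpoint",
    "deepseek-llm-r1-distill-qwen-7b-endpoint",
]:
    sm.delete_endpoint(EndpointName=endpoint_name)
    sm.delete_endpoint_config(EndpointConfigName=endpoint_name)  # assumes the config shares the endpoint name

# Delete the SageMaker managed MLflow tracking server (name is a placeholder)
sm.delete_mlflow_tracking_server(TrackingServerName="my-mlflow-tracking-server")
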
Conclusion
By integrating SageMaker AI, SageMaker managed MLflow, and Amazon OpenSearch Service, you can build, evaluate, and deploy RAG pipelines at scale. This approach provides the following benefits:

Automated and reproducible workflows with SageMaker Pipelines and MLflow, minimizing manual steps and reducing the risk of human error
Advanced experiment tracking and comparison for different chunking strategies, embedding models, and LLMs, so every configuration is logged, analyzed, and reproducible
Actionable insights from both traditional and LLM-based evaluation metrics, helping teams make data-driven improvements at every stage
Seamless deployment to production environments, with automated promotion of validated pipelines and robust governance throughout the workflow

Automating your RAG pipeline with SageMaker Pipelines brings additional benefits: it enables consistent, version-controlled deployments across environments, supports collaboration through modular, parameterized workflows, and supports full traceability and auditability of data, models, and results. With built-in CI/CD capabilities, you can confidently promote your entire RAG solution from experimentation to production, knowing that each stage meets quality and compliance standards.
Now it’s your turn to operationalize RAG workflows and accelerate your AI initiatives. Explore SageMaker Pipelines and managed MLflow using the solution from the GitHub repository to unlock scalable, automated, and enterprise-grade RAG solutions.

About the authors
Sandeep Raveesh is a GenAI Specialist Solutions Architect at AWS. He works with customers through their AIOps journey across model training, generative AI applications like agents, and scaling generative AI use cases. He also focuses on Go-To-Market strategies, helping AWS build and align products to solve industry challenges in the generative AI space. You can find Sandeep on LinkedIn.
Blake Shin is an Associate Specialist Solutions Architect at AWS who enjoys learning about and working with new AI/ML technologies. In his free time, Blake enjoys exploring the city and playing music.