Uncategorized Archives - Page 27 of 269

An Implementation on Building Advanced Multi-Endpoint Machine Learning …

Posted on October 25, 2025 by i-genie

In this tutorial, we explore LitServe, a lightweight and powerful serving framework that allows us to deploy machine learning models as APIs with minimal effort. We build and test multiple endpoints that demonstrate real-world functionalities such as text generation, batching, streaming, multi-task processing, and caching, all running locally without relying on external APIs. By the end, we clearly understand how to design scalable and flexible ML serving pipelines that are both efficient and easy to extend for production-level applications. Check out the FULL CODES here.

Copy CodeCopiedUse a different Browser!pip install litserve torch transformers -q

import litserve as ls
import torch
from transformers import pipeline
import time
from typing import List

We begin by setting up our environment on Google Colab and installing all required dependencies, including LitServe, PyTorch, and Transformers. We then import the essential libraries and modules that will allow us to define, serve, and test our APIs efficiently. Check out the FULL CODES here.

Copy CodeCopiedUse a different Browserclass TextGeneratorAPI(ls.LitAPI):
def setup(self, device):
self.model = pipeline(“text-generation”, model=”distilgpt2″, device=0 if device == “cuda” and torch.cuda.is_available() else -1)
self.device = device
def decode_request(self, request):
return request[“prompt”]
def predict(self, prompt):
result = self.model(prompt, max_length=100, num_return_sequences=1, temperature=0.8, do_sample=True)
return result[0][‘generated_text’]
def encode_response(self, output):
return {“generated_text”: output, “model”: “distilgpt2”}

class BatchedSentimentAPI(ls.LitAPI):
def setup(self, device):
self.model = pipeline(“sentiment-analysis”, model=”distilbert-base-uncased-finetuned-sst-2-english”, device=0 if device == “cuda” and torch.cuda.is_available() else -1)
def decode_request(self, request):
return request[“text”]
def batch(self, inputs: List[str]) -> List[str]:
return inputs
def predict(self, batch: List[str]):
results = self.model(batch)
return results
def unbatch(self, output):
return output
def encode_response(self, output):
return {“label”: output[“label”], “score”: float(output[“score”]), “batched”: True}

Here, we create two LitServe APIs, one for text generation using a local DistilGPT2 model and another for batched sentiment analysis. We define how each API decodes incoming requests, performs inference, and returns structured responses, demonstrating how easy it is to build scalable, reusable model-serving endpoints. Check out the FULL CODES here.

Copy CodeCopiedUse a different Browserclass StreamingTextAPI(ls.LitAPI):
def setup(self, device):
self.model = pipeline(“text-generation”, model=”distilgpt2″, device=0 if device == “cuda” and torch.cuda.is_available() else -1)
def decode_request(self, request):
return request[“prompt”]
def predict(self, prompt):
words = [“Once”, “upon”, “a”, “time”, “in”, “a”, “digital”, “world”]
for word in words:
time.sleep(0.1)
yield word + ” ”
def encode_response(self, output):
for token in output:
yield {“token”: token}

In this section, we design a streaming text-generation API that emits tokens as they are generated. We simulate real-time streaming by yielding words one at a time, demonstrating how LitServe can handle continuous token generation efficiently. Check out the FULL CODES here.

Copy CodeCopiedUse a different Browserclass MultiTaskAPI(ls.LitAPI):
def setup(self, device):
self.sentiment = pipeline(“sentiment-analysis”, device=-1)
self.summarizer = pipeline(“summarization”, model=”sshleifer/distilbart-cnn-6-6″, device=-1)
self.device = device
def decode_request(self, request):
return {“task”: request.get(“task”, “sentiment”), “text”: request[“text”]}
def predict(self, inputs):
task = inputs[“task”]
text = inputs[“text”]
if task == “sentiment”:
result = self.sentiment(text)[0]
return {“task”: “sentiment”, “result”: result}
elif task == “summarize”:
if len(text.split()) < 30:
return {“task”: “summarize”, “result”: {“summary_text”: text}}
result = self.summarizer(text, max_length=50, min_length=10)[0]
return {“task”: “summarize”, “result”: result}
else:
return {“task”: “unknown”, “error”: “Unsupported task”}
def encode_response(self, output):
return output

We now develop a multi-task API that handles both sentiment analysis and summarization via a single endpoint. This snippet demonstrates how we can manage multiple model pipelines through a unified interface, dynamically routing each request to the appropriate pipeline based on the specified task. Check out the FULL CODES here.

Copy CodeCopiedUse a different Browserclass CachedAPI(ls.LitAPI):
def setup(self, device):
self.model = pipeline(“sentiment-analysis”, device=-1)
self.cache = {}
self.hits = 0
self.misses = 0
def decode_request(self, request):
return request[“text”]
def predict(self, text):
if text in self.cache:
self.hits += 1
return self.cache[text], True
self.misses += 1
result = self.model(text)[0]
self.cache[text] = result
return result, False
def encode_response(self, output):
result, from_cache = output
return {“label”: result[“label”], “score”: float(result[“score”]), “from_cache”: from_cache, “cache_stats”: {“hits”: self.hits, “misses”: self.misses}}

We implement an API that uses caching to store previous inference results, reducing redundant computation for repeated requests. We track cache hits and misses in real time, illustrating how simple caching mechanisms can drastically improve performance in repeated inference scenarios. Check out the FULL CODES here.

Copy CodeCopiedUse a different Browserdef test_apis_locally():
print(“=” * 70)
print(“Testing APIs Locally (No Server)”)
print(“=” * 70)

api1 = TextGeneratorAPI(); api1.setup(“cpu”)
decoded = api1.decode_request({“prompt”: “Artificial intelligence will”})
result = api1.predict(decoded)
encoded = api1.encode_response(result)
print(f”✓ Result: {encoded[‘generated_text’][:100]}…”)

api2 = BatchedSentimentAPI(); api2.setup(“cpu”)
texts = [“I love Python!”, “This is terrible.”, “Neutral statement.”]
decoded_batch = [api2.decode_request({“text”: t}) for t in texts]
batched = api2.batch(decoded_batch)
results = api2.predict(batched)
unbatched = api2.unbatch(results)
for i, r in enumerate(unbatched):
encoded = api2.encode_response(r)
print(f”✓ ‘{texts[i]}’ -> {encoded[‘label’]} ({encoded[‘score’]:.2f})”)

api3 = MultiTaskAPI(); api3.setup(“cpu”)
decoded = api3.decode_request({“task”: “sentiment”, “text”: “Amazing tutorial!”})
result = api3.predict(decoded)
print(f”✓ Sentiment: {result[‘result’]}”)

api4 = CachedAPI(); api4.setup(“cpu”)
test_text = “LitServe is awesome!”
for i in range(3):
decoded = api4.decode_request({“text”: test_text})
result = api4.predict(decoded)
encoded = api4.encode_response(result)
print(f”✓ Request {i+1}: {encoded[‘label’]} (cached: {encoded[‘from_cache’]})”)

print(“=” * 70)
print(” All tests completed successfully!”)
print(“=” * 70)

test_apis_locally()

We test all our APIs locally to verify their correctness and performance without starting an external server. We sequentially evaluate text generation, batched sentiment analysis, multi-tasking, and caching, ensuring each component of our LitServe setup runs smoothly and efficiently.

In conclusion, we create and run diverse APIs that showcase the framework’s versatility. We experiment with text generation, sentiment analysis, multi-tasking, and caching to experience LitServe’s seaMLess integration with Hugging Face pipelines. As we complete the tutorial, we realize how LitServe simplifies model deployment workflows, enabling us to serve intelligent ML systems in just a few lines of Python code while maintaining flexibility, performance, and simplicity.

Check out the FULL CODES here. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter. Wait! are you on telegram? now you can join us on telegram as well.
The post An Implementation on Building Advanced Multi-Endpoint Machine Learning APIs with LitServe: Batching, Streaming, Caching, and Local Inference appeared first on MarkTechPost.

Salesforce AI Research Introduces WALT (Web Agents that Learn Tools): …

Posted on October 25, 2025 by i-genie

A team of Salesforce AI researchers introduced WALT (Web Agents that Learn Tools), a framework that reverse-engineers latent website functionality into reusable invocable tools. It reframes browser automation around callable tools rather than long chains of clicks. Agents then call operations such as search, filter, sort, post_comment, and create_listing. This reduces dependence on large language model step by step reasoning and increases determinism during execution.

https://arxiv.org/pdf/2510.01524

What WALT builds?

Web agents often fail when layouts shift or when tasks require long sequences. WALT targets this failure mode by mining site functionality offline, then exposing it as tools that encapsulate navigation, selection, extraction, and optional agentic steps. Tools carry contracts in the form of schemas and examples. At runtime, an agent composes a short program with a few tool calls to complete a task. The design goal is higher success with fewer steps and less reliance on free form reasoning.

Pipeline in two phases

The pipeline has discovery and construction with validation. In discovery, WALT explores a website and proposes tool candidates that map to common goals such as discovery, content management, and communication. In construction and validation, WALT converts traces to deterministic scripts, stabilizes selectors, attempts URL promotion when possible, induces an input schema, and registers a tool only after end to end checks pass. This shifts as much work as possible into stable URL and form operations and leaves agentic grounding for the cases that truly require it.

https://arxiv.org/pdf/2510.01524

Results on VisualWebArena and WebArena

On VisualWebArena, WALT reports an average success rate of 52.9 percent with per split results of 64.1 percent on Classifieds, 53.4 percent on Shopping, and 39.0 percent on Reddit. The table lists baselines such as SGV at 50.2 percent and ExaCT at 33.7 percent. Human performance is 88.7 percent on average.

On WebArena, WALT reaches 50.1 percent average across GitLab, Map, Shopping, CMS, Reddit, and Multi. The table shows WALT ahead of prior methods with a nine point margin over the best skill induction baseline. Human performance is 78.2 percent.

https://arxiv.org/pdf/2510.01524

Efficiency and ablations

Tools reduce action count by a factor near 1.4 on average relative to a matched agent without tools. On the Classifieds split, ablations show consistent gains when tools are used across different agent backbones. WALT with GPT 5 mini records 7 percent higher success and 27 percent fewer steps, while a human demonstration strategy yields 66.0 percent success. The fully autonomous WALT reaches 64.1 percent with 5 percent fewer steps than the human demonstration case. Multimodal DOM parsing adds 2.6 percent absolute improvement. External verification adds 3.3 percent while increasing checks. Across components, WALT records 21.3 percent fewer steps than baseline policies.

https://arxiv.org/pdf/2510.01524

Design choices that enforce determinism

WALT prefers URL level operations when the site exposes query parameters or routes for search and filtering. When pages require dynamic grounding, the tool script inserts bounded agentic steps such as content extraction or wait for page load. Selector stabilization and schema validation reduce drift when sites change. The method keeps the fraction of agentic operations low in discovered tool sets and biases toward deterministic actions like navigation, input, and click.

Key Takeaways

Approach: WALT discovers and validates website-native functions, then exposes them as callable tools with input schemas, selector stabilization, and URL promotion, reducing brittle step sequences to deterministic operations.

Results — VisualWebArena: Average success rate 52.9%, with 64.1% on Classifieds, 53.4% on Shopping, and 39.0% on Reddit, outperforming several baselines reported in the paper.

Results — WebArena: Average success rate 50.1% across GitLab, Map, Shopping, CMS, Reddit, and Multi, showing consistent gains over skill-induction and search-based baselines.

Efficiency and Ablations: Toolization cuts steps by about 1.4x, with 21.3% fewer actions on average. Multimodal DOM parsing adds +2.6% absolute success, and external verification adds +3.3%.

Editorial Comments

WALT is a useful pivot from step sequence agents to functionality grounded tools. The framework reverse engineers latent website functionality into reusable invocable tools across discovery, content management, and communication. By promoting UI traces to deterministic tools with schema validation and URL operations, WALT lifts web agent success to 52.9 percent on VisualWebArena and 50.1 percent on WebArena, while cutting actions by about 21.3 percent. The release ships a CLI, walt discover, walt agent, and MCP serving for integration.

Check out the Paper and GitHub Page. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter. Wait! are you on telegram? now you can join us on telegram as well.
The post Salesforce AI Research Introduces WALT (Web Agents that Learn Tools): Enabling LLM agents to Automatically Discover Reusable Tools from Any Website appeared first on MarkTechPost.

Responsible AI design in healthcare and life sciences

Posted on October 25, 2025 by i-genie

Generative AI has emerged as a transformative technology in healthcare, driving digital transformation in essential areas such as patient engagement and care management. It has shown potential to revolutionize how clinicians provide improved care through automated systems with diagnostic support tools that provide timely, personalized suggestions, ultimately leading to better health outcomes. For example, a study reported in BMC Medical Education that medical students who received large language model (LLM)-generated feedback during simulated patient interactions significantly improved their clinical decision-making compared to those who did not.
At the center of most generative AI systems are LLMs capable of generating remarkably natural conversations, enabling healthcare customers to build products across billing, diagnosis, treatment, and research that can perform tasks and operate independently with human oversight. However, the utility of generative AI requires an understanding of the potential risks and impacts on healthcare service delivery, which necessitates the need for careful planning, definition, and execution of a system-level approach to building safe and responsible generative AI-infused applications.
In this post, we focus on the design phase of building healthcare generative AI applications, including defining system-level policies that determine the inputs and outputs. These policies can be thought of as guidelines that, when followed, help build a responsible AI system.
Designing responsibly
LLMs can transform healthcare by reducing the cost and time required for considerations such as quality and reliability. As shown in the following diagram, responsible AI considerations can be successfully integrated into an LLM-powered healthcare application by considering quality, reliability, trust, and fairness for everyone. The goal is to promote and encourage certain responsible AI functionalities of AI systems. Examples include the following:

Each component’s input and output is aligned with clinical priorities to maintain alignment and promote controllability
Safeguards, such as guardrails, are implemented to enhance the safety and reliability of your AI system
Comprehensive AI red-teaming and evaluations are applied to the entire end-to-end system to assess safety and privacy-impacting inputs and outputs

Conceptual architecture
The following diagram shows a conceptual architecture of a generative AI application with an LLM. The inputs (directly from an end-user) are mediated through input guardrails. After the input has been accepted, the LLM can process the user’s request using internal data sources. The output of the LLM is again mediated through guardrails and can be shared with end-users.

Establish governance mechanisms
When building generative AI applications in healthcare, it’s essential to consider the various risks at the individual model or system level, as well as at the application or implementation level. The risks associated with generative AI can differ from or even amplify existing AI risks. Two of the most important risks are confabulation and bias:

Confabulation — The model generates confident but erroneous outputs, sometimes referred to as hallucinations. This could mislead patients or clinicians.
Bias — This refers to the risk of exacerbating historical societal biases among different subgroups, which can result from non-representative training data.

To mitigate these risks, consider establishing content policies that clearly define the types of content your applications should avoid generating. These policies should also guide how to fine-tune models and which appropriate guardrails to implement. It is crucial that the policies and guidelines are tailored and specific to the intended use case. For instance, a generative AI application designed for clinical documentation should have a policy that prohibits it from diagnosing diseases or offering personalized treatment plans.
Additionally, defining clear and detailed policies that are specific to your use case is fundamental to building responsibly. This approach fosters trust and helps developers and healthcare organizations carefully consider the risks, benefits, limitations, and societal implications associated with each LLM in a particular application.
The following are some example policies you might consider using for your healthcare-specific applications. The first table summarizes the roles and responsibilities for human-AI configurations.

Action ID
Suggested Action
Generative AI Risks

GV-3.2-001
Policies are in place to bolster oversight of generative AI systems with independent evaluations or assessments of generative AI models or systems where the type and robustness of evaluations are proportional to the identified risks.
CBRN Information or Capabilities; Harmful Bias and Homogenization

GV-3.2-002
Consider adjustment of organizational roles and components across lifecycle stages of large or complex generative AI systems, including: test and evaluation, validation, and red-teaming of generative AI systems; generative AI content moderation; generative AI system development and engineering; increased accessibility of generative AI tools, interfaces, and systems; and incident response and containment.
Human-AI Configuration; Information Security; Harmful Bias and Homogenization

GV-3.2-003
Define acceptable use policies for generative AI interfaces, modalities, and human-AI configurations (for example, for AI assistants and decision-making tasks), including criteria for the kinds of queries generative AI applications should refuse to respond to.
Human-AI Configuration

GV-3.2-004
Establish policies for user feedback mechanisms for generative AI systems that include thorough instructions and any mechanisms for recourse.
Human-AI Configuration

GV-3.2-005
Engage in threat modeling to anticipate potential risks from generative AI systems.
CBRN Information or Capabilities; Information Security

The following table summarizes policies for risk management in AI system design.

Action ID
Suggested Action
Generative AI Risks

GV-4.1-001
Establish policies and procedures that address continual improvement processes for generative AI risk measurement. Address general risks associated with a lack of explainability and transparency in generative AI systems by using ample documentation and techniques such as application of gradient-based attributions, occlusion or term reduction, counterfactual prompts and prompt engineering, and analysis of embeddings. Assess and update risk measurement approaches at regular cadences.
Confabulation

GV-4.1-002
Establish policies, procedures, and processes detailing risk measurement in context of use with standardized measurement protocols and structured public feedback exercises such as AI red-teaming or independent external evaluations.
CBRN Information and Capability; Value Chain and Component Integration

Transparency artifacts
Promoting transparency and accountability throughout the AI lifecycle can foster trust, facilitate debugging and monitoring, and enable audits. This involves documenting data sources, design decisions, and limitations through tools like model cards and offering clear communication about experimental features. Incorporating user feedback mechanisms further supports continuous improvement and fosters greater confidence in AI-driven healthcare solutions.
AI developers and DevOps engineers should be transparent about the evidence and reasons behind all outputs by providing clear documentation of the underlying data sources and design decisions so that end-users can make informed decisions about the use of the system. Transparency enables the tracking of potential problems and facilitates the evaluation of AI systems by both internal and external teams. Transparency artifacts guide AI researchers and developers on the responsible use of the model, promote trust, and help end-users make informed decisions about the use of the system.
The following are some implementation suggestions:

When building AI features with experimental models or services, it’s essential to highlight the possibility of unexpected model behavior so healthcare professionals can accurately assess whether to use the AI system.
Consider publishing artifacts such as Amazon SageMaker model cards or AWS system cards. Also, at AWS we provide detailed information about our AI systems through AWS AI Service Cards, which list intended use cases and limitations, responsible AI design choices, and deployment and performance optimization best practices for some of our AI services. AWS also recommends establishing transparency policies and processes for documenting the origin and history of training data while balancing the proprietary nature of training approaches. Consider creating a hybrid document that combines elements of both model cards and service cards, because your application likely uses foundation models (FMs) but provides a specific service.
Offer a feedback user mechanism. Gathering regular and scheduled feedback from healthcare professionals can help developers make necessary refinements to improve system performance. Also consider establishing policies to help developers allow for user feedback mechanisms for AI systems. These should include thorough instructions and consider establishing policies for any mechanisms for recourse.

Security by design
When developing AI systems, consider security best practices at each layer of the application. Generative AI systems might be vulnerable to adversarial attacks suck as prompt injection, which exploits the vulnerability of LLMs by manipulating their inputs or prompt. These types of attacks can result in data leakage, unauthorized access, or other security breaches. To address these concerns, it can be helpful to perform a risk assessment and implement guardrails for both the input and output layers of the application. As a general rule, your operating model should be designed to perform the following actions:

Safeguard patient privacy and data security by implementing personally identifiable information (PII) detection, configuring guardrails that check for prompt attacks
Continually assess the benefits and risks of all generative AI features and tools and regularly monitor their performance through Amazon CloudWatch or other alerts
Thoroughly evaluate all AI-based tools for quality, safety, and equity before deploying

Developer resources
The following resources are useful when architecting and building generative AI applications:

Amazon Bedrock Guardrails helps you implement safeguards for your generative AI applications based on your use cases and responsible AI policies. You can create multiple guardrails tailored to different use cases and apply them across multiple FMs, providing a consistent user experience and standardizing safety and privacy controls across your generative AI applications.
The AWS responsible AI whitepaper serves as an invaluable resource for healthcare professionals and other developers that are developing AI applications in critical care environments where errors could have life-threatening consequences.
AWS AI Service Cards explains the use cases for which the service is intended, how machine learning (ML) is used by the service, and key considerations in the responsible design and use of the service.

Conclusion
Generative AI has the potential to improve nearly every aspect of healthcare by enhancing care quality, patient experience, clinical safety, and administrative safety through responsible implementation. When designing, developing, or operating an AI application, try to systematically consider potential limitations by establishing a governance and evaluation framework grounded by the need to maintain the safety, privacy, and trust that your users expect.
For more information about responsible AI, refer to the following resources:

NIST Trustworthy and Responsible AI
OWASP Top 10 for Large Language Model applications

About the authors
Tonny Ouma is an Applied AI Specialist at AWS, specializing in generative AI and machine learning. As part of the Applied AI team, Tonny helps internal teams and AWS customers incorporate leading-edge AI systems into their products. In his spare time, Tonny enjoys riding sports bikes, golfing, and entertaining family and friends with his mixology skills.
Simon Handley, PhD, is a Senior AI/ML Solutions Architect in the Global Healthcare and Life Sciences team at Amazon Web Services. He has more than 25 years’ experience in biotechnology and machine learning and is passionate about helping customers solve their machine learning and life sciences challenges. In his spare time, he enjoys horseback riding and playing ice hockey.

Beyond pilots: A proven framework for scaling AI to production

Posted on October 25, 2025 by i-genie

The era of perpetual AI pilots is over. This year, 65% of AWS Generative AI Innovation Center customer projects moved from concept to production—some launching in just 45 days, as AWS VP Swami Sivasubramanian shared on LinkedIn. These results come from insights gained across more than one thousand customer implementations.
The Generative AI Innovation Center pairs organizations across industries with AWS scientists, strategists, and engineers to implement practical AI solutions that drive measurable outcomes. These initiatives transform diverse sectors worldwide. For example, through a cross-functional AWS collaboration, we supported the National Football League (NFL) to create a generative AI-powered solution that obtains statistical game insights within 30 seconds. This helps their media and production teams locate video content six times faster. Similarly, we helped Druva’s DruAI system streamline customer support and data protection through natural language processing, reducing investigation time from hours to minutes.
These achievements reflect a broader pattern of success, driven by a powerful methodology: The Five V’s Framework for AI Implementation.

This framework takes projects from initial testing to full deployment by focusing on concrete business outcomes and operational excellence. It’s grounded in two of Amazon’s Leadership Principles, Customer Obsession and Deliver Results. By starting with what customers actually need and working backwards, we’ve helped companies across industries modernize their operations and better serve their customers.
The Five V’s Framework: A foundation for success
Every successful AI deployment begins with groundwork. In our experience, projects thrive when organizations first identify specific challenges they need to solve, align key stakeholders around these goals, and establish clear accountability for results. The Five V’s Framework helps guide organizations through a structured process:

Value: Target high-impact opportunities aligned with your strategic priorities
Visualize: Define clear success metrics that link directly to business outcomes
Validate: Test solutions against real-world requirements and constraints
Verify: Create a scalable path to production that delivers sustainable results
Venture: Secure the resources and support needed for long-term success

Value: The critical first step
The Value phase emphasizes working backwards from your most pressing business challenges. By starting with existing pain points and collaborating across technical and business teams, organizations can develop solutions that deliver meaningful return on investment (ROI). This focused approach helps direct resources where they’ll have the greatest impact.
Visualize: Defining success through measurement
The next step requires translating the potential benefits—cost reduction, revenue growth, risk mitigation, improved customer experience, and competitive advantage—into clear, measurable performance indicators. A comprehensive measurement framework starts with baseline metrics using historical data where available. These metrics should address both technical aspects like accuracy and response time, as well as business outcomes such as productivity gains and customer satisfaction.
The Visualize phase examines data availability and quality to support proper measurement while working with stakeholders to define success criteria that align with strategic objectives. This dual focus helps organizations track not just the performance of the AI solution, but its actual impact on business goals.
Validate: Where ambition meets reality
The Validate phase focuses on testing solutions against real-world conditions and constraints. Our approach integrates strategic vision with implementation expertise from day one. As Sri Elaprolu, Director of the Generative AI Innovation Center, explains: “Effective validation creates alignment between vision and execution. We unite diverse perspectives—from scientists to business leaders—so that solutions deliver both technical excellence and measurable business impact.”
This process involves systematic integration testing, stress testing for expected loads, verifying compliance requirements, and gathering end-user feedback. Security specialists shape the core architecture. Industry subject matter experts define the operational processes and decision logic that guide prompt design and model refinement. Change management strategies are integrated early to ensure alignment and adoption.
The Generative AI Innovation Center partnered with SparkXGlobal, an AI-driven marketing-technology company, to validate their new solution through comprehensive testing. Their platform, Xnurta, provides business analytics and reporting for Amazon merchants, demonstrating impressive results: report processing time dropped from 6-8 hours to just 8 minutes while maintaining 95% accuracy. This successful validation established a foundation for SparkXGlobal’s continued innovation and enhanced AI capabilities.
Working with the Generative AI Innovation Center, the U.S. Environmental Protection Agency (EPA) created an intelligent document processing solution powered by Anthropic models on Amazon Bedrock. This solution helped EPA scientists accelerate chemical risk assessments and pesticide reviews through transparent, verifiable, and human-controlled AI practices. The impact has been substantial: document processing time decreased by 85%, evaluation costs dropped by 99%, and more than 10,000 regulatory applications have advanced faster to protect public health.
Verify: The path to production
Moving from pilot to production requires more than proof of concept—it demands scalable solutions that integrate with existing systems and deliver consistent value. While demos can seem compelling, verification reveals the true complexity of enterprise-wide deployment. This critical stage maps the journey from prototype to production, establishing a foundation for sustainable success.
Building production-ready AI solutions brings together several key elements. Robust governance structures must facilitate responsible AI deployment and oversight, managing risk and compliance in an evolving regulatory landscape. Change management prepares teams and processes for new ways of working, driving organization-wide adoption. Operational readiness assessments evaluate existing workflows, integration points, and team capabilities to facilitate smooth implementation.
Architectural decisions in the verification phase balance scale, reliability, and operability, with security and compliance woven into the solution’s fabric. This often involves practical trade-offs based on real-world constraints. A simpler solution aligned to existing team capabilities may prove more valuable than a complex one requiring specialized expertise. Similarly, meeting strict latency requirements might necessitate choosing a streamlined model over a more sophisticated one, as model selection requires a balance of performance, accuracy, and computational costs based on the use case.
Generative AI Innovation Center Principal Data Scientist, Isaac Privitera, captures this philosophy: “When building a generative AI solution, we focus primarily on three things: measurable business impact, production readiness from day one, and sustained operational excellence. This trinity drives solutions that thrive in real-world conditions.”
Effective verification demands both technical expertise and practical wisdom from real-world deployments. It requires proving not just that a solution works in principle, but that it can operate at scale within existing systems and team capabilities. By systematically addressing these factors, we help make sure deployments deliver sustainable, long-term value.
Venture: Securing long-term success
Long-term success in AI also requires mindful resource planning across people, processes, and funding. The Venture phase maps the full journey from implementation through sustained organizational adoption.
Financial viability starts with understanding the total cost of ownership, from initial development through deployment, integration, training, and ongoing operations. Promising projects can stall mid-implementation due to insufficient resource planning. Success requires strategic budget allocation across all phases, with clear ROI milestones and the flexibility to scale.
Successful ventures demand organizational commitment through executive sponsorship, stakeholder alignment, and dedicated teams for ongoing optimization and maintenance. Organizations must also account for both direct and indirect costs—from infrastructure and development, to team training, process adaptation, and change management. A blend of sound financial planning and flexible resource strategies allows teams to accelerate and adjust as opportunities and challenges arise.
From there, the solution must integrate seamlessly into daily operations with clear ownership and widespread adoption. This transforms AI from a project into a core organizational capability.
Adopting the Five V’s Framework in your enterprise
The Five V’s Framework shifts AI focus from technical capabilities to business results, replacing ‘What can AI do?’ with ‘What do we need AI to do?’. Successful implementation requires both an innovative culture and access to specialized expertise.

AWS resources to support your journey
AWS offers a variety of resources to help you scale your AI to production.
Expert guidance
The AWS Partnership Network (APN) offers multiple pathways to access specialized expertise, while AWS Professional Services brings proven methodologies from its own successful AI implementations. Certified partners, including Generative AI Partner Innovation Alliance members who receive direct enablement training from the Generative AI Innovation Center team, extend this expertise across industries. AWS Generative AI Competency Partners bring use case-specific success, while specialized partners focus on model customization and evaluation.
Self-service learning
For teams building internal capabilities, AWS provides technical blogs with implementation guides based on real-world experience, GitHub repositories with production-ready code, and AWS Workshop Studio for hands-on learning that bridges theory and practice.
Balancing learning and innovation
Even with the right framework and resources, not every AI project will reach production. These initiatives still provide valuable lessons that strengthen your overall program. Organizations can build lasting AI capabilities through three key principles:

Embracing a portfolio approach: Treat AI initiatives as an investment portfolio where diversification drives risk management and value creation. Balance quick wins (delivering value within months), strategic initiatives (driving longer-term transformation), and moonshot projects (potentially revolutionizing your business).
Creating a culture of safe experimentation: Organizations thrive with AI when teams can innovate boldly. In rapidly evolving fields, the cost of inaction often exceeds the risk of calculated experiments.
Learning from “productive failures”: Capture insights systematically across projects. Technical challenges reveal capability gaps, data issues expose information needs, and organizational readiness concerns illuminate broader transformation requirements – all shaping future initiatives.

The path forward
The next 12-18 months present a pivotal opportunity for organizations to harness generative AI and agentic AI to solve previously intractable problems, establish competitive advantages, and explore entirely new frontiers of business possibility. Those who successfully move from pilot to production will help define what’s possible within their industries and beyond.
Are you ready to move your AI initiatives into production?

Learn more about the AWS Generative AI Innovation Center and contact your AWS Account Manager to be connected to our expert guidance and support.
Join our AWS Builder community to connect with others on a similar AI journey.

About the authors
Sri Elaprolu serves as Director of the AWS Generative AI Innovation Center, where he leverages nearly three decades of technology leadership experience to drive artificial intelligence and machine learning innovation. In this role, he leads a global team of machine learning scientists and engineers who develop and deploy advanced generative and agentic AI solutions for enterprise and government organizations facing complex business challenges. Throughout his nearly 13-year tenure at AWS, Sri has held progressively senior positions, including leadership of ML science teams that partnered with high-profile organizations such as the NFL, Cerner, and NASA. These collaborations enabled AWS customers to harness AI and ML technologies for transformative business and operational outcomes. Prior to joining AWS, he spent 14 years at Northrop Grumman, where he successfully managed product development and software engineering teams. Sri holds a Master’s degree in Engineering Science and an MBA with a concentration in general management, providing him with both the technical depth and business acumen essential for his current leadership role.
Dr. Diego Socolinsky is currently the North America Head of the Generative AI Innovation Center at Amazon Web Services (AWS). With over 25 years of experience at the intersection of technology, machine learning, and computer vision, he has built a career driving innovation from cutting-edge research to production-ready solutions. Dr. Socolinsky holds a Ph.D. in Mathematics from The Johns Hopkins University and has been a pioneer in various fields including thermal imaging biometrics, augmented/mixed reality, and generative AI initiatives. His technical expertise spans from optimizing low-level embedded systems to architecting complex real-time deep learning solutions, with particular focus on generative AI platforms, large-scale unstructured data classification, and advanced computer vision applications. He is known for his ability to bridge the gap between technical innovation and strategic business objectives, consistently delivering transformative technology that solves complex real-world problems.
Sabine Khan is a Strategic Initiatives Leader with the AWS Generative AI Innovation Center, where she implements delivery and strategy initiatives focused on scaling enterprise-grade Generative AI solutions. She specializes in production-ready AI systems and drives agentic AI projects from concept to deployment. With over twenty years of experience in software delivery and a strong focus on AI/ML during her tenure at AWS, she has established a track record of successful enterprise implementations. Prior to AWS, she led digital transformation initiatives and held product development and software engineering leadership roles in Houston’s energy sector. Sabine holds a Master’s degree in GeoScience and an MBA.
Andrea Jimenez is a dual master’s candidate at the Massachusetts Institute of Technology, pursuing an M.S. in Computer Science from the School of Engineering and an MBA from the Sloan School of Management. As a GenAI Lead Graduate Fellow at the MIT GenAI Innovation Center, she researches agentic AI systems and the economic implications of generative AI technologies, while leveraging her background in artificial intelligence, product development, and startup innovation to lead teams at the intersection of technology and business strategy. Her work focuses on advancing human-AI collaboration and translating cutting-edge research into scalable, high-impact solutions. Prior to AWS and MIT, she led product and engineering teams in the tech industry and founded and sold a startup that helped early-stage companies build and launch SaaS products.
Randi Larson connects AI innovation with executive strategy for the AWS Generative AI Innovation Center, shaping how organizations understand and translate technical breakthroughs into business value. She combines strategic storytelling with data-driven insight through global keynotes, Amazon’s first tech-for-good podcast, and conversations with industry and Amazon leaders on AI transformation. Before Amazon, Randi refined her analytical precision as a Bloomberg journalist and advisor to economic institutions, think tanks, and family offices on technology initiatives. Randi holds an MBA from Duke University’s Fuqua School of Business and a B.S. in Journalism and Spanish from Boston University.

Google AI Introduces FLAME Approach: A One-Step Active Learning that S …

Posted on October 24, 2025 by i-genie

Open vocabulary object detectors answer text queries with boxes. In remote sensing, zero shot performance drops because classes are fine grained and visual context is unusual. Google Research team proposess FLAME, a one step active learning strategy that rides on a strong open vocabulary detector and adds a tiny refiner that you can train in near real time on a CPU. The base model generates high recall proposals, the refiner filters false positives with a few targeted labels, and you avoid full model fine tuning. It reports state of the art accuracy on DOTA and DIOR with 30 shots, and minute scale adaptation per label on a CPU.

https://arxiv.org/pdf/2510.17670v1

Problem framing

Open vocabulary detectors such as OWL ViT v2 are trained on web scale image text pairs. They generalize well on natural images, yet they struggle when categories are subtle, for example chimney versus storage tank, or when the imaging geometry is different, for example nadir aerial tiles with rotated objects and small scales. Precision falls because the text embedding and the visual embedding overlap for look alike categories. A practical system needs the breadth of open vocabulary models, and the precision of a local specialist, without hours of GPU fine tuning or thousands of new labels.

Method and design in concise

FLAME is a cascaded pipeline. Step one, run a zero shot open vocabulary detector to produce many candidate boxes for a text query, for example “chimney.” Step two, represent each candidate with visual features and its similarity to the text. Step three, retrieve marginal samples that sit near the decision boundary by doing a low dimensional projection with PCA, then a density estimate, then select the uncertain band. Step four, cluster this band and pick one item per cluster for diversity. Step five, have a user label about 30 crops as positive or negative. Step six, optionally rebalance with SMOTE or SVM SMOTE if the labels are skewed. Step seven, train a small classifier, for example an RBF SVM or a two layer MLP, to accept or reject the original proposals. The base detector stays frozen, so you keep recall and generalization, and the refiner learns the exact semantics the user meant.

https://arxiv.org/pdf/2510.17670v1

Datasets, base models, and setup

Evaluation uses two standard remote sensing detection benchmarks. DOTA has oriented boxes over 15 categories in high resolution aerial images. DIOR has 23,463 images and 192,472 instances over 20 categories. The comparison includes a zero shot OWL ViT v2 baseline, a zero shot RS OWL ViT v2 that is fine tuned on RS WebLI, and several few shot baselines. RS OWL ViT v2 improves zero shot mean AP to 31.827 percent on DOTA and 29.387 percent on DIOR, which becomes the starting point for FLAME.

https://arxiv.org/pdf/2510.17670v1

Understanding the Results

On 30 shot adaptation, FLAME cascaded on RS OWL ViT v2 reaches 53.96 percent AP on DOTA and 53.21 percent AP on DIOR, which is the top accuracy among the listed methods. The comparison includes SIoU, a prototype based method with DINOv2, and a few shot method proposed by the research team. These numbers appear in Table 1. The research team also reports the per class breakdown in Table 2. On DIOR, the chimney class improves from 0.11 in zero shot to 0.94 after FLAME, which illustrates how the refiner removes look alike false positives from the open vocabulary proposals.

https://arxiv.org/pdf/2510.17670v1

Key Takeaways

FLAME is a one step active learning cascade over OWL ViT v2, it retrieves marginal samples using density estimation, enforces diversity with clustering, collects about 30 labels, and trains a lightweight refiner such as an RBF SVM or a small MLP, with no base model fine tuning.

With 30 shots, FLAME on RS OWL ViT v2 reaches 53.96% AP on DOTA and 53.21% AP on DIOR, exceeding prior few shot baselines including SIoU and a prototype method with DINOv2.

On DIOR, the chimney class improves from 0.11 in zero shot to 0.94 after FLAME, which shows strong filtering of look alike false positives.

Adaptation runs in about 1 minute for each label on a standard CPU, which supports near real time, user in the loop specialization.

Zero shot OWL ViT v2 starts at 13.774% AP on DOTA and 14.982% on DIOR, RS OWL ViT v2 raises zero shot AP to 31.827% and 29.387% respectively, and FLAME then delivers the large precision gains on top.

Editorial Comments

FLAME is a one step active learning cascade that layers a tiny refiner on top of OWL ViT v2, selecting marginal detections, collecting about 30 labels, and training a small classifier without touching the base model. On DOTA and DIOR, FLAME with RS OWL ViT v2 reports 53.96 percent AP and 53.21 percent AP, establishing a strong few shot baseline. On DIOR chimney, average precision rises from 0.11 to 0.94 after refinement, illustrating false positive suppression. Adaptation runs in about 1 minute per label on a CPU, enabling interactive specialization. OWLv2 and RS WebLI provide the foundation for zero shot proposals. Overall, FLAME demonstrates a practical path to open vocabulary detection specialization in remote sensing by pairing RS OWL ViT v2 proposals with a minute scale CPU refiner that lifts DOTA to 53.96 percent AP and DIOR to 53.21 percent AP.

Check out the Paper here. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter. Wait! are you on telegram? now you can join us on telegram as well.
The post Google AI Introduces FLAME Approach: A One-Step Active Learning that Selects the Most Informative Samples for Training and Makes a Model Specialization Super Fast appeared first on MarkTechPost.

UltraCUA: A Foundation Computer-Use Agents Model that Bridges the Gap …

Posted on October 24, 2025 by i-genie

Computer-use agents have been limited to primitives. They click, they type, they scroll. Long action chains amplify grounding errors and waste steps. Apple Researchers introduce UltraCUA, a foundation model that builds an hybrid action space that lets an agent interleave low level GUI actions with high level programmatic tool calls. The model chooses the cheaper and more reliable move at each step. The approach improves success and reduces steps on OSWorld, and transfers to WindowsAgentArena without Windows specific training.