HtFLlib: A Unified Benchmarking Library for Evaluating Heterogeneous Federated Learning Methods Across Modalities

AI institutions develop heterogeneous models for their specific tasks but face data scarcity during training. Traditional Federated Learning (FL) supports only homogeneous model collaboration, requiring identical architectures across all clients. In practice, however, clients design model architectures for their own requirements, and sharing effort-intensive locally trained models exposes intellectual property, which reduces participants' willingness to collaborate. Heterogeneous Federated Learning (HtFL) addresses these limitations, but the literature lacks a unified benchmark for evaluating HtFL methods across diverse domains and evaluation aspects.

Background and Categories of HtFL Methods

Existing FL benchmarks focus on data heterogeneity with homogeneous client models but neglect real scenarios that involve model heterogeneity. Representative HtFL methods fall into three main categories that address these limitations. Partial parameter sharing methods, such as LG-FedAvg, FedGen, and FedGH, maintain heterogeneous feature extractors while assuming homogeneous classifier heads for knowledge transfer. Mutual distillation methods, such as FML, FedKD, and FedMRL, train and share small auxiliary models through distillation techniques. Prototype sharing methods transfer lightweight class-wise prototypes as global knowledge, collecting local prototypes from clients and aggregating them on the server to guide local training. However, it remains unclear whether existing HtFL methods perform consistently across diverse scenarios.
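
To make the prototype-sharing idea concrete, the following minimal sketch (an illustration of the general pattern, not code from any of the cited methods or from HtFLlib) shows how class-wise prototypes could be computed on a client and averaged on a server; the feature extractor, dimensions, and simple-mean aggregation are assumptions.

import torch

def compute_local_prototypes(features: torch.Tensor, labels: torch.Tensor, num_classes: int):
    """Average the feature vectors of each class to get one prototype per class."""
    protos = {}
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            protos[c] = features[mask].mean(dim=0)  # class-wise mean embedding
    return protos

def aggregate_global_prototypes(client_protos: list):
    """Server side: average prototypes of the same class across clients."""
    global_protos = {}
    for protos in client_protos:
        for c, p in protos.items():
            global_protos.setdefault(c, []).append(p)
    return {c: torch.stack(ps).mean(dim=0) for c, ps in global_protos.items()}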

Introducing HtFLlib: A Unified Benchmark

Researchers from Shanghai Jiao Tong University, Beihang University, Chongqing University, Tongji University, The Hong Kong Polytechnic University, and Queen's University Belfast have proposed the first Heterogeneous Federated Learning Library (HtFLlib), an easy-to-use and extensible framework for integrating multiple datasets and model heterogeneity scenarios. The library integrates:

12 datasets across various domains, modalities, and data heterogeneity scenarios.

40 model architectures, ranging from small to large, across three modalities.

A modularized and easy-to-extend HtFL codebase with implementations of 10 representative HtFL methods.

Systematic evaluations covering accuracy, convergence, computation costs, and communication costs.

Datasets and Modalities in HtFLlib

HtFLlib covers detailed data heterogeneity scenarios divided into three settings: Label Skew (with Pathological and Dirichlet subsettings), Feature Shift, and Real-World. It integrates 12 datasets, including CIFAR-10, CIFAR-100, Flowers102, Tiny-ImageNet, KVASIR, COVIDx, DomainNet, Camelyon17, AG News, Shakespeare, HAR, and PAMAP2. These datasets vary significantly in domain, data volume, and number of classes, demonstrating HtFLlib's comprehensive and versatile nature. The main focus is on image data, especially the label skew setting, since image tasks are the most commonly used across fields, but the HtFL methods are evaluated on image, text, and sensor signal tasks to expose their respective strengths and weaknesses.
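
As an illustration of the Dirichlet label-skew subsetting mentioned above, the sketch below partitions a labeled dataset across clients by sampling per-class client proportions from a Dirichlet distribution; the concentration parameter, client count, and exact splitting rule are assumptions, not HtFLlib's defaults.

import numpy as np

def dirichlet_label_skew(labels: np.ndarray, num_clients: int = 10, alpha: float = 0.1, seed: int = 0):
    """Split sample indices across clients with Dirichlet-distributed class proportions."""
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        rng.shuffle(idx)
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client_id, part in enumerate(np.split(idx, cuts)):
            client_indices[client_id].extend(part.tolist())
    return client_indices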

Performance Analysis: Image and Text Modalities

For image data, most HtFL methods show decreased accuracy as model heterogeneity increases. FedMRL shows particular strength through its combination of auxiliary global and local models. When heterogeneous classifiers are introduced, making partial parameter sharing methods inapplicable, FedTGP maintains superiority across diverse settings thanks to its adaptive prototype refinement ability. Medical dataset experiments with black-box pre-trained heterogeneous models demonstrate that HtFL improves model quality over the pre-trained baselines and achieves greater improvements than auxiliary-model-based methods such as FML. For text data, FedMRL's advantages in label skew settings diminish in real-world settings, while FedProto and FedTGP perform relatively poorly compared to image tasks.

Conclusion

In conclusion, researchers introduced HtFLlib, a framework that addresses the critical gap in HtFL benchmarking by providing unified evaluation standards across diverse domains and scenarios. HtFLlib’s modular design and extensible architecture provide a detailed benchmark for both research and practical applications in HtFL. Moreover, its ability to support heterogeneous models in collaborative learning opens the way for future research into utilizing complex pre-trained large models, black-box systems, and varied architectures across different tasks and modalities.

Check out the Paper and GitHub Page.

Why Small Language Models (SLMs) Are Poised to Redefine Agentic AI: Efficiency, Cost, and Practical Deployment

The Shift in Agentic AI System Needs

LLMs are widely admired for their human-like capabilities and conversational skills. However, with the rapid growth of agentic AI systems, LLMs are increasingly being utilized for repetitive, specialized tasks. This shift is gaining momentum—over half of major IT companies now use AI agents, with significant funding and projected market growth. These agents rely on LLMs for decision-making, planning, and task execution, typically through centralized cloud APIs. Massive investments in LLM infrastructure reflect confidence that this model will remain foundational to AI’s future. 

SLMs: Efficiency, Suitability, and the Case Against Over-Reliance on LLMs

Researchers from NVIDIA and Georgia Tech argue that small language models (SLMs) are not only powerful enough for many agent tasks but also more efficient and cost-effective than large models. They believe SLMs are better suited to the repetitive and simple nature of most agentic operations, while large models remain essential for more general, conversational needs, and they propose using a mix of models depending on task complexity. Challenging the current reliance on LLMs in agentic systems, they offer a framework for transitioning from LLMs to SLMs and invite open discussion to encourage more resource-conscious AI deployment.

Why SLMs are Sufficient for Agentic Operations

The researchers argue that SLMs are not only capable of handling most tasks within AI agents but are also more practical and cost-effective than LLMs. They define SLMs as models that can run efficiently on consumer devices, highlighting their strengths—lower latency, reduced energy consumption, and easier customization. Since many agent tasks are repetitive and focused, SLMs are often sufficient and even preferable. The paper suggests a shift toward modular agentic systems using SLMs by default and LLMs only when necessary, promoting a more sustainable, flexible, and inclusive approach to building intelligent systems. 

Arguments for LLM Dominance

Some argue that LLMs will always outperform SLMs in general language tasks due to superior scaling and semantic abilities. Others claim centralized LLM inference is more cost-efficient because of economies of scale. There is also a belief that LLMs dominate simply because they had an early start and drew the majority of the industry's attention. However, the study counters that SLMs are highly adaptable, cheaper to run, and can handle well-defined subtasks in agent systems effectively. Still, broader adoption of SLMs faces hurdles, including existing infrastructure investments, evaluation bias toward LLM benchmarks, and lower public awareness.

Framework for Transitioning from LLMs to SLMs

To smoothly shift from LLMs to smaller, specialized ones (SLMs) in agent-based systems, the process starts by securely collecting usage data while ensuring privacy. Next, the data is cleaned and filtered to remove sensitive details. Using clustering, common tasks are grouped to identify where SLMs can take over. Based on task needs, suitable SLMs are chosen and fine-tuned with tailored datasets, often utilizing efficient techniques such as LoRA. In some cases, LLM outputs guide SLM training. This isn’t a one-time process—models should be regularly updated and refined to stay aligned with evolving user interactions and tasks. 
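
A minimal sketch of the task-clustering step described above, assuming agent call logs have already been collected and anonymized; the embedding model name and cluster count are placeholders, not part of the proposed framework.

from sklearn.cluster import KMeans
from sentence_transformers import SentenceTransformer

def cluster_agent_requests(prompts: list, n_clusters: int = 8):
    """Group logged agent prompts into recurring task types that may suit an SLM."""
    embedder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model
    embeddings = embedder.encode(prompts)
    kmeans = KMeans(n_clusters=n_clusters, random_state=0)
    cluster_ids = kmeans.fit_predict(embeddings)
    # Clusters dominated by near-duplicate, narrow prompts are candidates for SLM fine-tuning.
    return cluster_ids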

Conclusion: Toward Sustainable and Resource-Efficient Agentic AI

In conclusion, the researchers believe that shifting from LLMs to SLMs could significantly improve the efficiency and sustainability of agentic AI systems, especially for tasks that are repetitive and narrowly focused. They argue that SLMs are often powerful enough, more cost-effective, and better suited for such roles compared to general-purpose LLMs. In cases requiring broader conversational abilities, using a mix of models is recommended. To encourage progress and open dialogue, they invite feedback and contributions to their stance, committing to share responses publicly. The goal is to inspire more thoughtful and resource-efficient use of AI technologies in the future.

Check out the Paper.

How to Build an Advanced BrightData Web Scraper with Google Gemini for AI-Powered Data Extraction

In this tutorial, we walk you through building an enhanced web scraping tool that leverages BrightData's powerful proxy network alongside Google's Gemini API for intelligent data extraction. You'll see how to structure your Python project, install and import the necessary libraries, and encapsulate scraping logic within a clean, reusable BrightDataScraper class. Whether you're targeting Amazon product pages, bestseller listings, or LinkedIn profiles, the scraper's modular methods demonstrate how to configure scraping parameters, handle errors gracefully, and return structured JSON results. An optional ReAct-style AI agent integration also shows you how to combine LLM-driven reasoning with real-time scraping, empowering you to pose natural language queries for on-the-fly data analysis.

!pip install langchain-brightdata langchain-google-genai langgraph langchain-core google-generativeai

We install all of the key libraries needed for the tutorial in one step: langchain-brightdata for BrightData web scraping, langchain-google-genai and google-generativeai for Google Gemini integration, langgraph for agent orchestration, and langchain-core for the core LangChain framework.

import os
import json
from typing import Dict, Any, Optional
from langchain_brightdata import BrightDataWebScraperAPI
from langchain_google_genai import ChatGoogleGenerativeAI
from langgraph.prebuilt import create_react_agent

These imports prepare your environment and core functionality: os and json handle system operations and data serialization, while typing provides structured type hints. You then bring in BrightDataWebScraperAPI for BrightData scraping, ChatGoogleGenerativeAI to interface with Google's Gemini LLM, and create_react_agent to orchestrate these components in a ReAct-style agent.

class BrightDataScraper:
    """Enhanced web scraper using BrightData API"""

    def __init__(self, api_key: str, google_api_key: Optional[str] = None):
        """Initialize scraper with API keys"""
        self.api_key = api_key
        self.scraper = BrightDataWebScraperAPI(bright_data_api_key=api_key)

        if google_api_key:
            self.llm = ChatGoogleGenerativeAI(
                model="gemini-2.0-flash",
                google_api_key=google_api_key
            )
            self.agent = create_react_agent(self.llm, [self.scraper])

    def scrape_amazon_product(self, url: str, zipcode: str = "10001") -> Dict[str, Any]:
        """Scrape Amazon product data"""
        try:
            results = self.scraper.invoke({
                "url": url,
                "dataset_type": "amazon_product",
                "zipcode": zipcode
            })
            return {"success": True, "data": results}
        except Exception as e:
            return {"success": False, "error": str(e)}

    def scrape_amazon_bestsellers(self, region: str = "in") -> Dict[str, Any]:
        """Scrape Amazon bestsellers"""
        try:
            url = f"https://www.amazon.{region}/gp/bestsellers/"
            results = self.scraper.invoke({
                "url": url,
                "dataset_type": "amazon_product"
            })
            return {"success": True, "data": results}
        except Exception as e:
            return {"success": False, "error": str(e)}

    def scrape_linkedin_profile(self, url: str) -> Dict[str, Any]:
        """Scrape LinkedIn profile data"""
        try:
            results = self.scraper.invoke({
                "url": url,
                "dataset_type": "linkedin_person_profile"
            })
            return {"success": True, "data": results}
        except Exception as e:
            return {"success": False, "error": str(e)}

    def run_agent_query(self, query: str) -> None:
        """Run AI agent with natural language query"""
        if not hasattr(self, 'agent'):
            print("Error: Google API key required for agent functionality")
            return

        try:
            for step in self.agent.stream(
                {"messages": query},
                stream_mode="values"
            ):
                step["messages"][-1].pretty_print()
        except Exception as e:
            print(f"Agent error: {e}")

    def print_results(self, results: Dict[str, Any], title: str = "Results") -> None:
        """Pretty print results"""
        print(f"\n{'=' * 50}")
        print(f"{title}")
        print(f"{'=' * 50}")

        if results["success"]:
            print(json.dumps(results["data"], indent=2, ensure_ascii=False))
        else:
            print(f"Error: {results['error']}")
        print()

The BrightDataScraper class encapsulates all BrightData web-scraping logic and optional Gemini-powered intelligence under a single, reusable interface. Its methods let you easily fetch Amazon product details, bestseller lists, and LinkedIn profiles while taking care of API calls, error handling, and JSON formatting, and they can even stream natural-language agent queries when a Google API key is provided. A convenient print_results helper ensures your output is always cleanly formatted for inspection.

def main():
    """Main execution function"""
    BRIGHT_DATA_API_KEY = "Use Your Own API Key"
    GOOGLE_API_KEY = "Use Your Own API Key"

    scraper = BrightDataScraper(BRIGHT_DATA_API_KEY, GOOGLE_API_KEY)

    print("Scraping Amazon India Bestsellers...")
    bestsellers = scraper.scrape_amazon_bestsellers("in")
    scraper.print_results(bestsellers, "Amazon India Bestsellers")

    print("Scraping Amazon Product...")
    product_url = "https://www.amazon.com/dp/B08L5TNJHG"
    product_data = scraper.scrape_amazon_product(product_url, "10001")
    scraper.print_results(product_data, "Amazon Product Data")

    print("Scraping LinkedIn Profile...")
    linkedin_url = "https://www.linkedin.com/in/satyanadella/"
    linkedin_data = scraper.scrape_linkedin_profile(linkedin_url)
    scraper.print_results(linkedin_data, "LinkedIn Profile Data")

    print("Running AI Agent Query...")
    agent_query = """
    Scrape Amazon product data for https://www.amazon.com/dp/B0D2Q9397Y?th=1
    in New York (zipcode 10001) and summarize the key product details.
    """
    scraper.run_agent_query(agent_query)

The main() function ties everything together by setting your BrightData and Google API keys, instantiating the BrightDataScraper, and then demonstrating each feature: it scrapes Amazon India’s bestsellers, fetches details for a specific product, retrieves a LinkedIn profile, and finally runs a natural-language agent query, printing neatly formatted results after each step.

if __name__ == "__main__":
    print("Installing required packages...")
    os.system("pip install -q langchain-brightdata langchain-google-genai langgraph")

    os.environ["BRIGHT_DATA_API_KEY"] = "Use Your Own API Key"

    main()

Finally, this entry-point block ensures that, when run as a standalone script, the required scraping libraries are quietly installed, and the BrightData API key is set in the environment. Then the main function is executed to initiate all scraping and agent workflows.

In conclusion, by the end of this tutorial, you’ll have a ready-to-use Python script that automates tedious data collection tasks, abstracts away low-level API details, and optionally taps into generative AI for advanced query handling. You can extend this foundation by adding support for other dataset types, integrating additional LLMs, or deploying the scraper as part of a larger data pipeline or web service. With these building blocks in place, you’re now equipped to gather, analyze, and present web data more efficiently, whether for market research, competitive intelligence, or custom AI-driven applications.

Check out the Notebook.

Meeting summarization and action item extraction with Amazon Nova

Meetings play a crucial role in decision-making, project coordination, and collaboration, and remote meetings are common across many organizations. However, capturing and structuring key takeaways from these conversations is often inefficient and inconsistent. Manually summarizing meetings or extracting action items requires significant effort and is prone to omissions or misinterpretations.
Large language models (LLMs) offer a more robust solution by transforming unstructured meeting transcripts into structured summaries and action items. This capability is especially useful for project management, customer support and sales calls, legal and compliance, and enterprise knowledge management.
In this post, we present a benchmark of different understanding models from the Amazon Nova family available on Amazon Bedrock, to provide insights on how you can choose the best model for a meeting summarization task.
LLMs to generate meeting insights
Modern LLMs are highly effective for summarization and action item extraction due to their ability to understand context, infer topic relationships, and generate structured outputs. In these use cases, prompt engineering provides a more efficient and scalable approach compared to traditional model fine-tuning or customization. Rather than modifying the underlying model architecture or training on large labeled datasets, prompt engineering uses carefully crafted input queries to guide the model’s behavior, directly influencing the output format and content. This method allows for rapid, domain-specific customization without the need for resource-intensive retraining processes. For tasks such as meeting summarization and action item extraction, prompt engineering enables precise control over the generated outputs, making sure they meet specific business requirements. It allows for the flexible adjustment of prompts to suit evolving use cases, making it an ideal solution for dynamic environments where model behaviors need to be quickly reoriented without the overhead of model fine-tuning.
Amazon Nova models and Amazon Bedrock
Amazon Nova models, unveiled at AWS re:Invent in December 2024, are built to deliver frontier intelligence at industry-leading price performance. They’re among the fastest and most cost-effective models in their respective intelligence tiers, and are optimized to power enterprise generative AI applications in a reliable, secure, and cost-effective manner.
The understanding model family has four tiers of models: Nova Micro (text-only, ultra-efficient for edge use), Nova Lite (multimodal, balanced for versatility), Nova Pro (multimodal, balance of speed and intelligence, ideal for most enterprise needs) and Nova Premier (multimodal, the most capable Nova model for complex tasks and teacher for model distillation). Amazon Nova models can be used for a variety of tasks, from summarization to structured text generation. With Amazon Bedrock Model Distillation, customers can also bring the intelligence of Nova Premier to a faster and more cost-effective model such as Nova Pro or Nova Lite for their use case or domain. This can be achieved through the Amazon Bedrock console and APIs such as the Converse API and Invoke API.
Solution overview
This post demonstrates how to use Amazon Nova understanding models, available through Amazon Bedrock, for automated insight extraction using prompt engineering. We focus on two key outputs:

Meeting summarization – A high-level abstractive summary that distills key discussion points, decisions made, and critical updates from the meeting transcript
Action items – A structured list of actionable tasks derived from the meeting conversation that apply to the entire team or project

The following diagram illustrates the solution workflow.

Prerequisites
To follow along with this post, familiarity with calling LLMs using Amazon Bedrock is expected. For detailed steps on using Amazon Bedrock for text summarization tasks, refer to Build an AI text summarizer app with Amazon Bedrock. For additional information about calling LLMs, refer to the Invoke API and Using the Converse API reference documentation.
Solution components
We developed the two core features of the solution—meeting summarization and action item extraction—by using popular models available through Amazon Bedrock. In the following sections, we look at the prompts that were used for these key tasks.
For the meeting summarization task, we used a persona assignment, prompting the LLM to generate a summary in <summary> tags to reduce redundant opening and closing sentences, and a one-shot approach by giving the LLM one example to make sure the LLM consistently follows the right format for summary generation. As part of the system prompt, we give clear and concise rules emphasizing the correct tone, style, length, and faithfulness towards the provided transcript.
For the action item extraction task, we gave specific instructions on generating action items in the prompts and used chain-of-thought to improve the quality of the generated action items. In the assistant message, the prefix <action_items> tag is provided as a prefilling to nudge the model generation in the right direction and to avoid redundant opening and closing sentences.
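As an illustration of these prompting patterns (a sketch with placeholder prompt text and model ID, not the exact prompts used in this benchmark), the following Amazon Bedrock Converse API call assigns a persona and rules in the system prompt and prefills the assistant turn with an <action_items> tag so the model starts generating the list directly.

import boto3

bedrock = boto3.client("bedrock-runtime")
transcript = "..."  # full meeting transcript text

system = [{"text": (
    "You are a meeting assistant. Think through the discussion step by step, "
    "then list team-level action items inside <action_items> tags. "
    "Stay faithful to the transcript and keep each item short."
)}]

messages = [
    {"role": "user", "content": [{"text": f"Meeting transcript:\n{transcript}"}]},
    # Prefilled assistant turn nudges the model to start directly inside the tag
    {"role": "assistant", "content": [{"text": "<action_items>"}]},
]

response = bedrock.converse(
    modelId="us.amazon.nova-pro-v1:0",  # placeholder; any Amazon Nova understanding model ID
    system=system,
    messages=messages,
    inferenceConfig={"maxTokens": 1024, "temperature": 0.2},
)
print(response["output"]["message"]["content"][0]["text"])
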
Different model families respond to the same prompts differently, and it’s important to follow the prompting guide defined for the particular model. For more information on best practices for Amazon Nova prompting, refer to Prompting best practices for Amazon Nova understanding models.
Dataset
To evaluate the solution, we used samples from the public QMSum dataset. The QMSum dataset is a benchmark for meeting summarization, featuring English-language transcripts from academic, business, and governance discussions with manually annotated summaries. It evaluates LLMs on generating structured, coherent summaries from complex, multi-speaker conversations, making it a valuable resource for abstractive summarization and discourse understanding. For testing, we used 30 randomly sampled meetings from the QMSum dataset. Each meeting contained 2–5 topic-wise transcripts, with approximately 8,600 tokens per transcript on average.
Evaluation framework
Achieving high-quality outputs from LLMs in meeting summarization and action item extraction can be a challenging task. Traditional evaluation metrics such as ROUGE, BLEU, and METEOR focus on surface-level similarity between generated text and reference summaries, but they often fail to capture nuances such as factual correctness, coherence, and actionability. Human evaluation is the gold standard but is expensive, time-consuming, and not scalable. To address these challenges, you can use LLM-as-a-judge, where another LLM is used to systematically assess the quality of generated outputs based on well-defined criteria. This approach offers a scalable and cost-effective way to automate evaluation while maintaining high accuracy. In this example, we used Anthropic’s Claude 3.5 Sonnet v1 as the judge model because we found it to be most aligned with human judgment. We used the LLM judge to score the generated responses on three main metrics: faithfulness, summarization, and question answering (QA).
The faithfulness score measures the faithfulness of a generated summary by measuring the portion of the parsed statements in a summary that are supported by given context (for example, a meeting transcript) with respect to the total number of statements.
The summarization score is the combination of the QA score and the conciseness score with equal weight (0.5 each). The QA score measures the coverage of a generated summary of a meeting transcript: it first generates a list of question-and-answer pairs from the meeting transcript and then measures the portion of those questions that are answered correctly when the summary, rather than the transcript, is used as context. The QA score is complementary to the faithfulness score because the faithfulness score doesn't measure the coverage of a generated summary. We used the QA score only for measuring the quality of generated summaries, because action items aren't supposed to cover all aspects of a meeting transcript. The conciseness score measures the ratio of the length of a generated summary to the length of the total meeting transcript.
We used a modified version of the faithfulness score and the summarization score that had much lower latency than the original implementation.
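A simplified sketch of how the composite score described above could be computed, assuming the QA score is already available from an LLM judge; the equal 0.5 weighting follows the description, but mapping the length ratio to a higher-is-better conciseness value is an assumption of this sketch.

def conciseness_score(summary: str, transcript: str) -> float:
    # Shorter summaries relative to the transcript score higher (assumed formulation).
    ratio = len(summary) / max(len(transcript), 1)
    return max(0.0, 1.0 - ratio)

def summarization_score(qa_score: float, summary: str, transcript: str) -> float:
    # Equal-weight combination of coverage (QA) and conciseness, per the post.
    return 0.5 * qa_score + 0.5 * conciseness_score(summary, transcript)
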
Results
Our evaluation of Amazon Nova models across meeting summarization and action item extraction tasks revealed clear performance-latency patterns. For summarization, Nova Premier achieved the highest faithfulness score (1.0) with a processing time of 5.34s, while Nova Pro delivered 0.94 faithfulness in 2.9s. The smaller Nova Lite and Nova Micro models provided faithfulness scores of 0.86 and 0.83 respectively, with faster processing times of 2.13s and 1.52s. In action item extraction, Nova Premier again led in faithfulness (0.83) with 4.94s processing time, followed by Nova Pro (0.8 faithfulness, 2.03s). Interestingly, Nova Micro (0.7 faithfulness, 1.43s) outperformed Nova Lite (0.63 faithfulness, 1.53s) in this particular task despite its smaller size. These measurements provide valuable insights into the performance-speed characteristics across the Amazon Nova model family for text-processing applications. The following graphs show these results. The following screenshot shows a sample output for our summarization task, including the LLM-generated meeting summary and a list of action items.

Conclusion
In this post, we showed how you can use prompting to generate meeting insights such as meeting summaries and action items using Amazon Nova models available through Amazon Bedrock. For large-scale AI-driven meeting summarization, optimizing latency, cost, and accuracy is essential. The Amazon Nova family of understanding models (Nova Micro, Nova Lite, Nova Pro, and Nova Premier) offers a practical alternative to high-end models, significantly improving inference speed while reducing operational costs. These factors make Amazon Nova an attractive choice for enterprises handling large volumes of meeting data at scale.
For more information on Amazon Bedrock and the latest Amazon Nova models, refer to the Amazon Bedrock User Guide and Amazon Nova User Guide, respectively. The AWS Generative AI Innovation Center has a group of AWS science and strategy experts with comprehensive expertise spanning the generative AI journey, helping customers prioritize use cases, build a roadmap, and move solutions into production. Check out the Generative AI Innovation Center for our latest work and customer success stories.

About the Authors
Baishali Chaudhury is an Applied Scientist at the Generative AI Innovation Center at AWS, where she focuses on advancing Generative AI solutions for real-world applications. She has a strong background in computer vision, machine learning, and AI for healthcare. Baishali holds a PhD in Computer Science from University of South Florida and PostDoc from Moffitt Cancer Centre.
Sungmin Hong is a Senior Applied Scientist at Amazon Generative AI Innovation Center where he helps expedite the variety of use cases of AWS customers. Before joining Amazon, Sungmin was a postdoctoral research fellow at Harvard Medical School. He holds Ph.D. in Computer Science from New York University. Outside of work, he prides himself on keeping his indoor plants alive for 3+ years.
Mengdie (Flora) Wang is a Data Scientist at AWS Generative AI Innovation Center, where she works with customers to architect and implement scalable Generative AI solutions that address their unique business challenges. She specializes in model customization techniques and agent-based AI systems, helping organizations harness the full potential of generative AI technology. Prior to AWS, Flora earned her Master’s degree in Computer Science from the University of Minnesota, where she developed her expertise in machine learning and artificial intelligence.
Anila Joshi has more than a decade of experience building AI solutions. As a AWSI Geo Leader at AWS Generative AI Innovation Center, Anila pioneers innovative applications of AI that push the boundaries of possibility and accelerate the adoption of AWS services with customers by helping customers ideate, identify, and implement secure generative AI solutions.

Building a custom text-to-SQL agent using Amazon Bedrock and Converse API

Developing robust text-to-SQL capabilities is a critical challenge in natural language processing (NLP) and database management, and the difficulty grows with complex queries and intricate database structures. In this post, we introduce a straightforward but powerful text-to-SQL solution, with accompanying code, built as a custom agent implementation using Amazon Bedrock and the Converse API.
The ability to translate natural language queries into SQL statements is a game-changer for businesses and organizations because users can interact with databases in a more intuitive and accessible manner. However, the complexity of database schemas, relationships between tables, and the nuances of natural language can often lead to inaccurate or incomplete SQL queries. This not only compromises the integrity of the data but also hinders the overall user experience. Through a straightforward yet powerful architecture, the agent can understand your query, develop a plan of execution, create SQL statements, self-correct if there is a SQL error, and learn from its executions to improve in the future. Over time, the agent develops a cohesive understanding of what to do and what not to do to efficiently answer user queries.
Solution overview
The solution is composed of an AWS Lambda function that contains the logic of the agent that communicates with Amazon DynamoDB for long-term memory retention, calls Anthropic’s Claude Sonnet in Amazon Bedrock through Converse API, uses AWS Secrets Manager to retrieve database connection details and credentials, and Amazon Relational Database Service (Amazon RDS) that contains an example Postgres database called HR Database. The Lambda function is connected to a virtual private cloud (VPC) and communicates with DynamoDB, Amazon Bedrock, and Secrets Manager through AWS PrivateLink VPC endpoints so that the Lambda can communicate with the RDS database while keeping traffic private through AWS networking.
In the demo, you can interact with the agent through the Lambda function. You can provide it a natural language query, such as “How many employees are there in each department in each region?” or “What is the employee mix by gender in each region”. The following is the solution architecture.

A custom agent built using the Converse API
The Converse API is provided by Amazon Bedrock so you can create conversational applications, and it enables powerful features such as tool use. Tool use is the ability of a large language model (LLM) to choose from a list of tools, such as running SQL queries against a database, and to decide which tool to use depending on the context of the conversation. Using the Converse API also means you can maintain a series of messages between User and Assistant roles to carry out a chat with an LLM such as Anthropic's Claude 3.5 Sonnet. In this post, a custom agent called ConverseSQLAgent was created specifically for long-running agent executions and for following a plan of execution.
The Agent loop: Agent planning, self-correction, and long-term learning
The agent contains several key features: planning and carry-over, execution and tool use, SQLAlchemy and self-correction, reflection and long-term learning using memory.
Planning and carry-over
The first step that the agent takes is to create a plan of execution to perform the text-to-SQL task. It first thinks through what the user is asking and develops a plan on how it will fulfill the request of the user. This behavior is controlled using a system prompt, which defines how the agent should behave. After the agent thinks through what it should do, it outputs the plan.
One of the challenges with long-running agent execution is that sometimes the agent will forget the plan that it was supposed to execute as the context becomes longer and longer as it conducts its steps. One of the primary ways to deal with this is by “carrying over” the initial plan by injecting it back into a section in the system prompt. The system prompt is part of every converse API call, and it improves the ability of the agent to follow its plan. Because the agent may revise its plan as it progresses through the execution, the plan in the system prompt is updated as new plans emerge. Refer to the following figure on how the carry over works.

Execution and tool use
After the plan has been created, the agent will execute its plan one step at a time. It might decide to call on one or more tools it has access to. With Converse API, you can pass in a toolConfig that contains the toolSpec for each tool it has access to. The toolSpec defines what the tool is, a description of the tool, and the parameters that the tool requires. When the LLM decides to use a tool, it outputs a tool use block as part of its response. The application, in this case the Lambda code, needs to identify that tool use block, execute the corresponding tool, append the tool result response to the message list, and call the Converse API again. As shown at (a) in the following figure, you can add tools for the LLM to choose from by adding in a toolConfig along with toolSpecs. Part (b) shows that in the implementation of ConverseSQLAgent, tool groups contain a collection of tools, and each tool contains the toolSpec and the callable function. The tool groups are added to the agent, which in turn adds it to the Converse API call. Tool group instructions are additional instructions on how to use the tool group that get injected into the system prompt. Although you can add descriptions to each individual tool, having tool group–wide instructions enable more effective usage of the group.
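
As a rough illustration of the tool-use loop described above (a sketch of the Converse API pattern, not the ConverseSQLAgent implementation), the following registers a toolSpec, detects a returned toolUse block, executes the tool, and feeds the toolResult back into the conversation; the tool name, schema, model ID, and run_sql helper are assumptions.

import boto3

bedrock = boto3.client("bedrock-runtime")

tool_config = {
    "tools": [{
        "toolSpec": {
            "name": "invoke_sql_query",  # hypothetical tool name
            "description": "Run a SQL query against the HR database and return the rows.",
            "inputSchema": {"json": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            }},
        }
    }]
}

messages = [{"role": "user", "content": [{"text": "How many employees are there in each department?"}]}]

while True:
    response = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # placeholder model ID
        messages=messages,
        toolConfig=tool_config,
    )
    output_message = response["output"]["message"]
    messages.append(output_message)
    if response["stopReason"] != "tool_use":
        break  # the model produced a final answer
    for block in output_message["content"]:
        if "toolUse" in block:
            tool_use = block["toolUse"]
            result_rows = run_sql(tool_use["input"]["query"])  # hypothetical helper
            messages.append({
                "role": "user",
                "content": [{"toolResult": {
                    "toolUseId": tool_use["toolUseId"],
                    "content": [{"json": {"rows": result_rows}}],
                }}],
            })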

SQLAlchemy and self-correction
The SQL tool group (these tools are part of the demo code provided), as shown in the preceding figure, is implemented using SQLAlchemy, which is a Python SQL toolkit you can use to interface with different databases without having to worry about database-specific SQL syntax. You can connect to Postgres, MySQL, and more without having to change your code every time.
In this post, there is an InvokeSQLQuery tool that allows the agent to execute arbitrary SQL statements. Although almost all database-specific tasks, such as looking up schemas and tables, can be accomplished through InvokeSQLQuery, it's better to provide SQLAlchemy implementations for specific tasks, such as GetDatabaseSchemas, which gets every schema in the database, greatly reducing the time it takes for the agent to generate the correct query. Think of it as giving the agent a shortcut to the information it needs. The agent can make errors when querying the database through the InvokeSQLQuery tool. In that case, the InvokeSQLQuery tool responds with the error it encountered, and the agent can perform self-correction to fix the query. This flow is shown in the following diagram, and a sketch of such a tool follows.
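
The sketch below shows one way an InvokeSQLQuery-style tool could be implemented with SQLAlchemy, returning the database error text to the agent so it can self-correct; the connection string and return format are assumptions, not the demo code itself.

from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://user:password@host:5432/hr")  # placeholder connection string

def invoke_sql_query(query: str) -> dict:
    """Run an arbitrary SQL statement; surface errors so the agent can retry with a fix."""
    try:
        with engine.connect() as conn:
            rows = [dict(row._mapping) for row in conn.execute(text(query))]
        return {"success": True, "rows": rows}
    except Exception as exc:
        # The error message goes back into the conversation for self-correction.
        return {"success": False, "error": str(exc)}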

Reflection and long-term learning using memory
Although self-correction is an important feature of the agent, the agent must also be able to learn from its mistakes to avoid repeating them in the future. Otherwise, the agent will keep making the same mistakes, greatly reducing its effectiveness and efficiency. The agent maintains a hierarchical memory structure, as shown in the following figure, and decides for itself how to structure its memory.

The agent can reflect on its execution, learn best practices and error avoidance, and save it into long-term memory. Long-term memory is implemented through a hierarchical memory structure with Amazon DynamoDB. The agent maintains a main memory that has pointers to other memories it has. Each memory is represented as a record in a DynamoDB table. As the agent learns through its execution and encounters errors, it can update its main memory and create new memories by maintaining an index of memories in the main memory. It can then tap onto this memory in the future to avoid errors and even improve the efficiency of queries by caching facts.
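
A minimal sketch of the hierarchical memory idea, assuming a single DynamoDB table keyed by memory_id with a main record that indexes the others; the table name and attribute layout are illustrative, not the solution's actual schema.

import boto3

table = boto3.resource("dynamodb").Table("agent-memory")  # placeholder table name

def save_memory(memory_id: str, content: str) -> None:
    """Write a memory record and register it in the main memory's index."""
    table.put_item(Item={"memory_id": memory_id, "content": content})
    main = table.get_item(Key={"memory_id": "main"}).get("Item") or {"memory_id": "main", "index": []}
    main.setdefault("index", [])
    if memory_id not in main["index"]:
        main["index"].append(memory_id)
    table.put_item(Item=main)

def load_memories() -> list:
    """Read the main index, then fetch each referenced memory to inject into the prompt."""
    main = table.get_item(Key={"memory_id": "main"}).get("Item") or {}
    return [table.get_item(Key={"memory_id": m})["Item"]["content"] for m in main.get("index", [])]
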
Prerequisites
Before you get started, make sure you have the following prerequisites:

An AWS account with an AWS Identity and Access Management (IAM) user with permissions to deploy the CloudFormation template
The AWS Command Line Interface (AWS CLI) installed and configured for use
Python 3.11 or later
Amazon Bedrock model access to Anthropic’s Claude 3.5 Sonnet

Deploy the solution
The full code and instructions are available in GitHub in the Readme file.

Clone the code to your working environment:

git clone https://github.com/aws-samples/aws-field-samples.git

Move to ConverseSqlAgent folder
Follow the steps in the Readme file in the GitHub repo

Cleanup
To dispose of the stack afterwards, invoke the following command:
cdk destroy
Conclusion
The development of robust text-to-SQL capabilities is a critical challenge in natural language processing and database management. Although current approaches have made progress, there remains room for improvement, particularly with complex queries and database structures. The introduction of the ConverseSQLAgent, a custom agent implementation using Amazon Bedrock and Converse API, presents a promising solution to this problem. The agent’s architecture, featuring planning and carry-over, execution and tool use, self-correction through SQLAlchemy, and reflection-based long-term learning, demonstrates its ability to understand natural language queries, develop and execute SQL plans, and continually improve its capabilities. As businesses seek more intuitive ways to access and manage data, solutions such as the ConverseSQLAgent hold the potential to bridge the gap between natural language and structured database queries, unlocking new levels of productivity and data-driven decision-making. To dive deeper and learn more about generative AI, check out these additional resources:

Amazon Bedrock
Amazon Bedrock Knowledge Bases
Generative AI use cases
Amazon Bedrock Agents
Carry out a conversation with the Converse API operations

About the authors
Pavan Kumar is a Solutions Architect at Amazon Web Services (AWS), helping customers design robust, scalable solutions on the cloud across multiple industries. With a background in enterprise architecture and software development, Pavan has contributed to creating solutions to handle API security, API management, microservices, and geospatial information system use cases for his customers. He is passionate about learning new technologies and solving, automating, and simplifying customer problems using these solutions.
Abdullah Siddiqui is a Partner Sales Solutions Architect at Amazon Web Services (AWS) based out of Toronto. He helps AWS Partners and customers build solutions using AWS services and specializes in resilience and migrations. In his spare time, he enjoys spending time with his family and traveling.
Parag Srivastava is a Solutions Architect at Amazon Web Services (AWS), helping enterprise customers with successful cloud adoption and migration. During his professional career, he has been extensively involved in complex digital transformation projects. He is also passionate about building innovative solutions around geospatial aspects of addresses.

Accelerate threat modeling with generative AI

In this post, we explore how generative AI can revolutionize threat modeling practices by automating vulnerability identification, generating comprehensive attack scenarios, and providing contextual mitigation strategies. Unlike previous automation attempts that struggled with the creative and contextual aspects of threat analysis, generative AI overcomes these limitations through its ability to understand complex system relationships, reason about novel attack vectors, and adapt to unique architectural patterns. Where traditional automation tools relied on rigid rule sets and predefined templates, AI models can now interpret nuanced system designs, infer security implications across components, and generate threat scenarios that human analysts might overlook, making effective automated threat modeling a practical reality.
Threat modeling and why it matters
Threat modeling is a structured approach to identifying, quantifying, and addressing security risks associated with an application or system. It involves analyzing the architecture from an attacker’s perspective to discover potential vulnerabilities, determine their impact, and implement appropriate mitigations. Effective threat modeling examines data flows, trust boundaries, and potential attack vectors to create a comprehensive security strategy tailored to the specific system.
In a shift-left approach to security, threat modeling serves as a critical early intervention. By implementing threat modeling during the design phase—before a single line of code is written—organizations can identify and address potential vulnerabilities at their inception point. The following diagram illustrates this workflow.

This proactive strategy significantly reduces the accumulation of security debt and transforms security from a bottleneck into an enabler of innovation. When security considerations are integrated from the beginning, teams can implement appropriate controls throughout the development lifecycle, resulting in more resilient systems built from the ground up.
Despite these clear benefits, threat modeling remains underutilized in the software development industry. This limited adoption stems from several significant challenges inherent to traditional threat modeling approaches:

Time requirements – The process takes 1–8 days to complete, with multiple iterations needed for full coverage. This conflicts with tight development timelines in modern software environments.
Inconsistent assessment – Threat modeling suffers from subjectivity. Security experts often vary in their threat identification and risk level assignments, creating inconsistencies across projects and teams.
Scaling limitations – Manual threat modeling can’t effectively address modern system complexity. The growth of microservices, cloud deployments, and system dependencies outpaces security teams’ capacity to identify vulnerabilities.

How generative AI can help
Generative AI has revolutionized threat modeling by automating traditionally complex analytical tasks that required human judgment, reasoning, and expertise. Generative AI brings powerful capabilities to threat modeling, combining natural language processing with visual analysis to simultaneously evaluate system architectures, diagrams, and documentation. Drawing from extensive security databases like MITRE ATT&CK and OWASP, these models can quickly identify potential vulnerabilities across complex systems. This dual capability of processing both text and visuals while referencing comprehensive security frameworks enables faster, more thorough threat assessments than traditional manual methods.
Our solution, Threat Designer, uses enterprise-grade foundation models (FMs) available in Amazon Bedrock to transform threat modeling. Using the advanced multimodal capabilities of Anthropic's Claude 3.7 Sonnet, we create comprehensive threat assessments at scale. You can also use other available models from the model catalog or use your own fine-tuned model, giving you maximum flexibility to use pre-trained expertise or custom-tailored capabilities specific to your security domain and organizational requirements. This adaptability makes sure your threat modeling solution delivers precise insights aligned with your unique security posture.
Solution overview
Threat Designer is a user-friendly web application that makes advanced threat modeling accessible to development and security teams. Threat Designer uses large language models (LLMs) to streamline the threat modeling process and identify vulnerabilities with minimal human effort.
Key features include:

Architecture diagram analysis – Users can submit system architecture diagrams, which the application processes using multimodal AI capabilities to understand system components and relationships
Interactive threat catalog – The system generates a comprehensive catalog of potential threats that users can explore, filter, and refine through an intuitive interface
Iterative refinement – With the replay functionality, teams can rerun the threat modeling process with design improvements or modifications, and see how changes impact the system’s security posture
Standardized exports – Results can be exported in PDF or DOCX formats, facilitating integration with existing security documentation and compliance processes
Serverless architecture – The solution runs on a cloud-based serverless infrastructure, alleviating the need for dedicated servers and providing automatic scaling based on demand

The following diagram illustrates the Threat Designer architecture.

The solution is built on a serverless stack, using AWS managed services for automatic scaling, high availability, and cost-efficiency. The solution is composed of the following core components:

Frontend – AWS Amplify hosts a ReactJS application built with the Cloudscape design system, providing the UI
Authentication – Amazon Cognito manages the user pool, handling authentication flows and securing access to application resources
API layer – Amazon API Gateway serves as the communication hub, providing proxy integration between frontend and backend services with request routing and authorization
Data storage – We use the following services for storage:

Two Amazon DynamoDB tables:

The agent execution state table maintains processing state
The threat catalog table stores identified threats and vulnerabilities

An Amazon Simple Storage Service (Amazon S3) architecture bucket stores system diagrams and artifacts

Generative AI – Amazon Bedrock provides the FM for threat modeling, analyzing architecture diagrams and identifying potential vulnerabilities
Backend service – An AWS Lambda function contains the REST interface business logic, built using Powertools for AWS Lambda (Python)
Agent service – Hosted on a Lambda function, the agent service works asynchronously to manage threat analysis workflows, processing diagrams and maintaining execution state in DynamoDB

Agent service workflow
The agent service is built on LangGraph by LangChain, with which we can orchestrate complex workflows through a graph-based structure. This approach incorporates two key design patterns:

Separation of concerns – The threat modeling process is decomposed into discrete, specialized steps that can be executed independently and iteratively. Each node in the graph represents a specific function, such as image processing, asset identification, data flow analysis, or threat enumeration.
Structured output – Each component in the workflow produces standardized, well-defined outputs that serve as inputs to subsequent steps, providing consistency and facilitating downstream integrations for consistent representation.

The agent workflow follows a directed graph where processing begins at the Start node and proceeds through several specialized stages, as illustrated in the following diagram.

The workflow includes the following nodes:

Image processing – The Image processing node processes the architecture diagram image and converts it into the appropriate format for the LLM to consume
Assets – This information, along with textual descriptions, feeds into the Assets node, which identifies and catalogs system components
Flows – The workflow then progresses to the Flows node, mapping data movements and trust boundaries between components
Threats – Lastly, the Threats node uses this information to identify potential vulnerabilities and attack vectors

A critical innovation in our agent architecture is the adaptive iteration mechanism implemented through conditional edges in the graph. This feature addresses one of the fundamental challenges in LLM-based threat modeling: controlling the comprehensiveness and depth of the analysis.
The conditional edge after the Threats node enables two powerful operational modes:

User-controlled iteration – In this mode, the user specifies the number of iterations the agent should perform. With each pass through the loop, the agent enriches the threat catalog by analyzing edge cases that might have been overlooked in previous iterations. This approach gives security professionals direct control over the thoroughness of the analysis.
Autonomous gap analysis – In fully agentic mode, a specialized gap analysis component evaluates the current threat catalog. This component identifies potential blind spots or underdeveloped areas in the threat model and triggers additional iterations until it determines the threat catalog is sufficiently comprehensive. The agent essentially performs its own quality assurance, continuously refining its output until it meets predefined completeness criteria.
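
A condensed sketch of how such a workflow could be wired up with LangGraph (an illustration of the pattern described above, not Threat Designer's actual code); the node implementations, state fields, and iteration condition are assumptions.

from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class ThreatState(TypedDict):
    diagram: bytes
    assets: list
    flows: list
    threats: list
    iterations: int
    max_iterations: int

# Stub node functions standing in for the real image, asset, flow, and threat logic.
def process_image(state: ThreatState): return {"diagram": state["diagram"]}
def identify_assets(state: ThreatState): return {"assets": ["web app", "database"]}
def map_flows(state: ThreatState): return {"flows": [("web app", "database")]}
def enumerate_threats(state: ThreatState):
    return {"threats": state["threats"] + ["example threat"], "iterations": state["iterations"] + 1}

def needs_more_analysis(state: ThreatState) -> str:
    # Conditional edge: loop back to enrich the catalog until the iteration budget is spent.
    return "threats" if state["iterations"] < state["max_iterations"] else END

graph = StateGraph(ThreatState)
graph.add_node("image_processing", process_image)
graph.add_node("assets", identify_assets)
graph.add_node("flows", map_flows)
graph.add_node("threats", enumerate_threats)

graph.add_edge(START, "image_processing")
graph.add_edge("image_processing", "assets")
graph.add_edge("assets", "flows")
graph.add_edge("flows", "threats")
graph.add_conditional_edges("threats", needs_more_analysis, {"threats": "threats", END: END})

app = graph.compile()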

Prerequisites
Before you deploy Threat Designer, make sure you have the required prerequisites in place. For more information, refer to the GitHub repo.
Get started with Threat Designer
To start using Threat Designer, follow the step-by-step deployment instructions from the project’s README available in GitHub. After you deploy the solution, you’re ready to create your first threat model. Log in and complete the following steps:

Choose Submit threat model to initiate a new threat model.
Complete the submission form with your system details:

Required fields: Provide a title and architecture diagram image.
Recommended fields: Provide a solution description and assumptions (these significantly improve the quality of the threat model).

Configure analysis parameters:

Choose your iteration mode:

Auto (default): The agent intelligently determines when the threat catalog is comprehensive.
Manual: Specify up to 15 iterations for more control.

Configure your reasoning boost to specify how much time the model spends on analysis (available when using Anthropic's Claude 3.7 Sonnet).

Choose Start threat modeling to launch the analysis.

You can monitor progress through the intuitive interface, which displays each execution step in real time. The complete analysis typically takes 5–15 minutes, depending on system complexity and selected parameters.

When the analysis is complete, you will have access to a comprehensive threat model that you can explore, refine, and export.

Clean up
To avoid incurring future charges, delete the solution by running the ./destroy.sh script. Refer to the README for more details.
Conclusion
In this post, we demonstrated how generative AI transforms threat modeling from an exclusive, expert-driven process into an accessible security practice for all development teams. By using FMs through our Threat Designer solution, we’ve democratized sophisticated security analysis, enabling organizations to identify vulnerabilities earlier and more consistently. This AI-powered approach removes the traditional barriers of time, expertise, and scalability, making shift-left security a practical reality rather than just an aspiration—ultimately building more resilient systems without sacrificing development velocity.
Deploy Threat Designer following the README instructions, upload your architecture diagram, and quickly receive AI-generated security insights. This streamlined approach helps you integrate proactive security measures into your development process without compromising speed or innovation—making comprehensive threat modeling accessible to teams of different sizes.

About the Authors
Edvin Hallvaxhiu is a senior security architect at Amazon Web Services, specialized in cybersecurity and automation. He helps customers design secure, compliant cloud solutions.
Sindi Cali is a consultant with AWS Professional Services. She supports customers in building data-driven applications in AWS.
Aditi Gupta is a Senior Global Engagement Manager at AWS ProServe. She specializes in delivering impactful Big Data and AI/ML solutions that enable AWS customers to maximize their business value through data utilization.
Rahul Shaurya is a Principal Data Architect at Amazon Web Services. He helps and works closely with customers building data platforms and analytical applications on AWS.

From Fine-Tuning to Prompt Engineering: Theory and Practice for Efficient Transformer Adaptation

The Challenge of Fine-Tuning Large Transformer Models

Self-attention enables transformer models to capture long-range dependencies in text, which is crucial for comprehending complex language patterns. These models work efficiently with massive datasets and achieve remarkable performance without needing task-specific structures. As a result, they are widely applied across industries, including software development, education, and content generation.

A key limitation in applying these powerful models is the reliance on supervised fine-tuning. Adapting a base transformer to a specific task typically involves retraining the model with labeled data, which demands significant computational resources, sometimes amounting to thousands of GPU hours. This presents a major barrier for organizations that lack access to such hardware or seek quicker adaptation times. Consequently, there is a pressing need for methods that can elicit task-specific capabilities from pre-trained transformers without modifying their parameters.

Inference-Time Prompting as an Alternative to Fine-Tuning

To address this issue, researchers have explored inference-time techniques that guide the model’s behavior using example-based inputs, bypassing the need for parameter updates. Among these methods, in-context learning has emerged as a practical approach where a model receives a sequence of input-output pairs to generate predictions for new inputs. Unlike traditional training, these techniques operate during inference, enabling the base model to exhibit desired behaviors solely based on context. Despite their promise, there has been limited formal proof to confirm that such techniques can consistently match fine-tuned performance.

Theoretical Framework: Approximating Fine-Tuned Models via In-Context Learning

Researchers from Patched Codes, Inc. introduced a method grounded in the Turing completeness of transformers, demonstrating that a base model can approximate the behavior of a fine-tuned model using in-context learning, provided sufficient computational resources and access to the original training dataset. Their theoretical framework offers a quantifiable approach to understanding how dataset size, context length, and task complexity influence the quality of the approximation. The analysis specifically examines two task types—text generation and linear classification—and establishes bounds on dataset requirements to achieve fine-tuned-like outputs with a defined error margin.

Prompt Design and Theoretical Guarantees

The method involves designing a prompt structure that concatenates a dataset of labeled examples with a target query. The model processes this sequence, drawing patterns from the examples to generate a response. For instance, a prompt could include input-output pairs like sentiment-labeled reviews, followed by a new review whose sentiment must be predicted. The researchers constructed this process as a simulation of a Turing machine, where self-attention mimics the tape state and feed-forward layers act as transition rules. They also formalized conditions under which the total variation distance between the base and fine-tuned output distributions remains within an acceptable error ε. The paper provides a construction for this inference technique and quantifies its theoretical performance.
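
As a concrete, hypothetical instance of the prompt construction described above, the following assembles sentiment-labeled examples and a target query into a single in-context prompt; the examples and formatting are illustrative, not taken from the paper.

def build_icl_prompt(examples: list, query: str) -> str:
    """Concatenate labeled input-output pairs, then append the unlabeled query."""
    lines = [f"Review: {text}\nSentiment: {label}\n" for text, label in examples]
    lines.append(f"Review: {query}\nSentiment:")
    return "\n".join(lines)

examples = [
    ("The battery lasts all day and the screen is gorgeous.", "positive"),
    ("It stopped working after a week and support never replied.", "negative"),
]
prompt = build_icl_prompt(examples, "Setup was painless and it feels well built.")
print(prompt)  # the base model completes the final 'Sentiment:' line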

Quantitative Results: Dataset Size and Task Complexity

The researchers provided performance guarantees based on dataset size and task type. For text generation tasks involving a vocabulary of size V, the dataset must be of size O(mV/ε² · log(1/δ)) to ensure the base model approximates the fine-tuned model within an error ε across m contexts. When the output length is fixed at l, a smaller dataset of size O((l · log V)/ε² · log(1/δ)) suffices. For linear classification tasks where the input has dimension d, the required dataset size becomes O(d/ε), or with context constraints, O(1/ε² · log(1/δ)). These results are robust under idealized assumptions but are also adapted to practical constraints like finite context length and partial dataset availability using techniques such as retrieval-augmented generation.

Implications: Towards Efficient and Scalable NLP Models

This research presents a detailed and well-structured argument demonstrating that inference-time prompting can closely match the capabilities of supervised fine-tuning, provided sufficient contextual data is supplied. It successfully identifies a path toward more resource-efficient deployment of large language models, presenting both a theoretical justification and practical techniques. The study demonstrates that leveraging a model’s latent capabilities through structured prompts is not just viable but scalable and highly effective for specific NLP tasks.

Check out the Paper. All credit for this research goes to the researchers of this project.
The post From Fine-Tuning to Prompt Engineering: Theory and Practice for Efficient Transformer Adaptation appeared first on MarkTechPost.

Building High-Performance Financial Analytics Pipelines with Polars: L …

In this tutorial, we delve into building an advanced data analytics pipeline using Polars, a lightning-fast DataFrame library designed for optimal performance and scalability. Our goal is to demonstrate how we can utilize Polars’ lazy evaluation, complex expressions, window functions, and SQL interface to process large-scale financial datasets efficiently. We begin by generating a synthetic financial time series dataset and move step-by-step through an end-to-end pipeline, from feature engineering and rolling statistics to multi-dimensional analysis and ranking. Throughout, we demonstrate how Polars empowers us to write expressive and performant data transformations, all while maintaining low memory usage and ensuring fast execution.

import numpy as np
from datetime import datetime, timedelta
import io

try:
    import polars as pl
except ImportError:
    import subprocess
    subprocess.run(["pip", "install", "polars"], check=True)
    import polars as pl

print("Advanced Polars Analytics Pipeline")
print("=" * 50)

We begin by importing the essential libraries, including Polars for high-performance DataFrame operations and NumPy for generating synthetic data. To ensure compatibility, we add a fallback installation step for Polars in case it isn’t already installed. With the setup ready, we signal the start of our advanced analytics pipeline.

np.random.seed(42)
n_records = 100000
dates = [datetime(2020, 1, 1) + timedelta(days=i//100) for i in range(n_records)]
tickers = np.random.choice(['AAPL', 'GOOGL', 'MSFT', 'TSLA', 'AMZN'], n_records)

# Create complex synthetic dataset
data = {
    'timestamp': dates,
    'ticker': tickers,
    'price': np.random.lognormal(4, 0.3, n_records),
    'volume': np.random.exponential(1000000, n_records).astype(int),
    'bid_ask_spread': np.random.exponential(0.01, n_records),
    'market_cap': np.random.lognormal(25, 1, n_records),
    'sector': np.random.choice(['Tech', 'Finance', 'Healthcare', 'Energy'], n_records)
}

print(f"Generated {n_records:,} synthetic financial records")

We generate a rich, synthetic financial dataset with 100,000 records using NumPy, simulating daily stock data for major tickers such as AAPL and TSLA. Each entry includes key market features such as price, volume, bid-ask spread, market cap, and sector. This provides a realistic foundation for demonstrating advanced Polars analytics on a time-series dataset.

lf = pl.LazyFrame(data)

result = (
    lf
    .with_columns([
        pl.col('timestamp').dt.year().alias('year'),
        pl.col('timestamp').dt.month().alias('month'),
        pl.col('timestamp').dt.weekday().alias('weekday'),
        pl.col('timestamp').dt.quarter().alias('quarter')
    ])

    .with_columns([
        pl.col('price').rolling_mean(20).over('ticker').alias('sma_20'),
        pl.col('price').rolling_std(20).over('ticker').alias('volatility_20'),

        pl.col('price').ewm_mean(span=12).over('ticker').alias('ema_12'),

        pl.col('price').diff().alias('price_diff'),

        (pl.col('volume') * pl.col('price')).alias('dollar_volume')
    ])

    .with_columns([
        pl.col('price_diff').clip(0, None).rolling_mean(14).over('ticker').alias('rsi_up'),
        pl.col('price_diff').abs().rolling_mean(14).over('ticker').alias('rsi_down'),

        (pl.col('price') - pl.col('sma_20')).alias('bb_position')
    ])

    .with_columns([
        (100 - (100 / (1 + pl.col('rsi_up') / pl.col('rsi_down')))).alias('rsi')
    ])

    .filter(
        (pl.col('price') > 10) &
        (pl.col('volume') > 100000) &
        (pl.col('sma_20').is_not_null())
    )

    .group_by(['ticker', 'year', 'quarter'])
    .agg([
        pl.col('price').mean().alias('avg_price'),
        pl.col('price').std().alias('price_volatility'),
        pl.col('price').min().alias('min_price'),
        pl.col('price').max().alias('max_price'),
        pl.col('price').quantile(0.5).alias('median_price'),

        pl.col('volume').sum().alias('total_volume'),
        pl.col('dollar_volume').sum().alias('total_dollar_volume'),

        pl.col('rsi').filter(pl.col('rsi').is_not_null()).mean().alias('avg_rsi'),
        pl.col('volatility_20').mean().alias('avg_volatility'),
        pl.col('bb_position').std().alias('bollinger_deviation'),

        pl.len().alias('trading_days'),
        pl.col('sector').n_unique().alias('sectors_count'),

        (pl.col('price') > pl.col('sma_20')).mean().alias('above_sma_ratio'),

        ((pl.col('price').max() - pl.col('price').min()) / pl.col('price').min())
            .alias('price_range_pct')
    ])

    .with_columns([
        pl.col('total_dollar_volume').rank(method='ordinal', descending=True).alias('volume_rank'),
        pl.col('price_volatility').rank(method='ordinal', descending=True).alias('volatility_rank')
    ])

    .filter(pl.col('trading_days') >= 10)
    .sort(['ticker', 'year', 'quarter'])
)

We load our synthetic dataset into a Polars LazyFrame to enable deferred execution, allowing us to chain complex transformations efficiently. From there, we enrich the data with time-based features and apply advanced technical indicators, such as moving averages, RSI, and Bollinger bands, using window and rolling functions. We then perform grouped aggregations by ticker, year, and quarter to extract key financial statistics and indicators. Finally, we rank the results based on volume and volatility, filter out under-traded segments, and sort the data for intuitive exploration, all while leveraging Polars’ powerful lazy evaluation engine to its full advantage.

df = result.collect()
print(f"\nAnalysis Results: {df.height:,} aggregated records")
print("\nTop 10 High-Volume Quarters:")
print(df.sort('total_dollar_volume', descending=True).head(10).to_pandas())

print("\nAdvanced Analytics:")

pivot_analysis = (
    df.group_by('ticker')
    .agg([
        pl.col('avg_price').mean().alias('overall_avg_price'),
        pl.col('price_volatility').mean().alias('overall_volatility'),
        pl.col('total_dollar_volume').sum().alias('lifetime_volume'),
        pl.col('above_sma_ratio').mean().alias('momentum_score'),
        pl.col('price_range_pct').mean().alias('avg_range_pct')
    ])
    .with_columns([
        (pl.col('overall_avg_price') / pl.col('overall_volatility')).alias('risk_adj_score'),

        (pl.col('momentum_score') * 0.4 +
         pl.col('avg_range_pct') * 0.3 +
         (pl.col('lifetime_volume') / pl.col('lifetime_volume').max()) * 0.3)
        .alias('composite_score')
    ])
    .sort('composite_score', descending=True)
)

print("\nTicker Performance Ranking:")
print(pivot_analysis.to_pandas())

Once our lazy pipeline is complete, we collect the results into a DataFrame and immediately review the top 10 quarters based on total dollar volume. This helps us identify periods of intense trading activity. We then take our analysis a step further by grouping the data by ticker to compute higher-level insights, such as lifetime trading volume, average price volatility, and a custom composite score. This multi-dimensional summary helps us compare stocks not just by raw volume, but also by momentum and risk-adjusted performance, unlocking deeper insights into overall ticker behavior.

print("\nSQL Interface Demo:")
pl.Config.set_tbl_rows(5)

sql_result = pl.sql("""
    SELECT
        ticker,
        AVG(avg_price) as mean_price,
        STDDEV(price_volatility) as volatility_consistency,
        SUM(total_dollar_volume) as total_volume,
        COUNT(*) as quarters_tracked
    FROM df
    WHERE year >= 2021
    GROUP BY ticker
    ORDER BY total_volume DESC
""", eager=True)

print(sql_result)

print(f"\nPerformance Metrics:")
print(f" • Lazy evaluation optimizations applied")
print(f" • {n_records:,} records processed efficiently")
print(f" • Memory-efficient columnar operations")
print(f" • Zero-copy operations where possible")

print(f"\nExport Options:")
print(" • Parquet (high compression): df.write_parquet('data.parquet')")
print(" • Delta Lake: df.write_delta('delta_table')")
print(" • JSON streaming: df.write_ndjson('data.jsonl')")
print(" • Apache Arrow: df.to_arrow()")

print("\nAdvanced Polars pipeline completed successfully!")
print(" Demonstrated: Lazy evaluation, complex expressions, window functions,")
print(" SQL interface, advanced aggregations, and high-performance analytics")

We wrap up the pipeline by showcasing Polars’ elegant SQL interface, running an aggregate query to analyze post-2021 ticker performance with familiar SQL syntax. This hybrid capability enables us to blend expressive Polars transformations with declarative SQL queries seamlessly. To highlight its efficiency, we print key performance metrics, emphasizing lazy evaluation, memory efficiency, and zero-copy execution. Finally, we demonstrate how easily we can export results in various formats, such as Parquet, Arrow, and JSONL, making this pipeline both powerful and production-ready. With that, we complete a full-circle, high-performance analytics workflow using Polars.

In conclusion, we’ve seen firsthand how Polars’ lazy API can optimize complex analytics workflows that would otherwise be sluggish in traditional tools. We’ve developed a comprehensive financial analysis pipeline, spanning from raw data ingestion to rolling indicators, grouped aggregations, and advanced scoring, all executed with blazing speed. Not only that, but we also tapped into Polars’ powerful SQL interface to run familiar queries seamlessly over our DataFrames. This dual ability to write both functional-style expressions and SQL makes Polars an incredibly flexible tool for any data scientist.

Check out the Paper. All credit for this research goes to the researchers of this project.
The post Building High-Performance Financial Analytics Pipelines with Polars: Lazy Evaluation, Advanced Expressions, and SQL Integration appeared first on MarkTechPost.

How to Use python-A2A to Create and Connect Financial Agents with Goog …

Python A2A is an implementation of Google’s Agent-to-Agent (A2A) protocol, which enables AI agents to communicate with each other using a shared, standardized format—eliminating the need for custom integration between services.

In this tutorial, we’ll use the decorator-based approach provided by the python-a2a library. With simple @agent and @skill decorators, you can define your agent’s identity and behavior, while the library takes care of protocol handling and message flow.

This method is perfect for quickly building useful, task-focused agents without worrying about low-level communication logic.

Installing the dependencies

To get started, you’ll need to install the python-a2a library, which provides a clean abstraction to build and run agents that follow the A2A protocol.

Open your terminal and run:

pip install python-a2a

Creating the Agents

For this tutorial, we will be creating two agents: one that calculates the EMI (equated monthly installment) on a loan based on principal, interest rate, and duration, and another that adjusts an amount for inflation over a period of years.

EMI Agent (emi_agent.py)

from python_a2a import A2AServer, skill, agent, run_server, TaskStatus, TaskState
import re

@agent(
    name="EMI Calculator Agent",
    description="Calculates EMI for a given principal, interest rate, and loan duration",
    version="1.0.0"
)
class EMIAgent(A2AServer):

    @skill(
        name="Calculate EMI",
        description="Calculates EMI given principal, annual interest rate, and duration in months",
        tags=["emi", "loan", "interest"]
    )
    def calculate_emi(self, principal: float, annual_rate: float, months: int) -> str:
        monthly_rate = annual_rate / (12 * 100)
        emi = (principal * monthly_rate * ((1 + monthly_rate) ** months)) / (((1 + monthly_rate) ** months) - 1)
        return f"The EMI for a loan of ₹{principal:.0f} at {annual_rate:.2f}% interest for {months} months is ₹{emi:.2f}"

    def handle_task(self, task):
        input_text = task.message["content"]["text"]

        # Extract values from natural language
        principal_match = re.search(r"₹?(\d{4,10})", input_text)
        rate_match = re.search(r"(\d+(\.\d+)?)\s*%", input_text)
        months_match = re.search(r"(\d+)\s*(months|month)", input_text, re.IGNORECASE)

        try:
            principal = float(principal_match.group(1)) if principal_match else 100000
            rate = float(rate_match.group(1)) if rate_match else 10.0
            months = int(months_match.group(1)) if months_match else 12

            print(f"Inputs → Principal: {principal}, Rate: {rate}, Months: {months}")
            emi_text = self.calculate_emi(principal, rate, months)

        except Exception as e:
            emi_text = f"Sorry, I couldn't parse your input. Error: {e}"

        task.artifacts = [{
            "parts": [{"type": "text", "text": emi_text}]
        }]
        task.status = TaskStatus(state=TaskState.COMPLETED)

        return task


# Run the server
if __name__ == "__main__":
    agent = EMIAgent()
    run_server(agent, port=4737)

This EMI Calculator Agent is built using the python-a2a library and follows the decorator-based approach. At the top, we use the @agent decorator to define the agent’s name, description, and version. This registers the agent so that it can communicate using the A2A protocol.

Inside the class, we define a single skill using the @skill decorator. This skill, called calculate_emi, performs the actual EMI calculation using the standard formula. The formula takes in three parameters: the loan principal, the annual interest rate, and the loan duration in months. We convert the annual rate into a monthly rate and use it to compute the monthly EMI.

The handle_task method is the core of the agent. It receives the user’s input message, extracts relevant numbers using simple regular expressions, and passes them to the calculate_emi method. 

Finally, at the bottom of the file, we launch the agent using the run_server() function on port 4737, making it ready to receive A2A protocol messages. This design keeps the agent simple, modular, and easy to extend with more skills in the future.

Inflation Agent (inflation_agent.py)

from python_a2a import A2AServer, skill, agent, run_server, TaskStatus, TaskState
import re

@agent(
    name="Inflation Adjusted Amount Agent",
    description="Calculates the future value adjusted for inflation",
    version="1.0.0"
)
class InflationAgent(A2AServer):

    @skill(
        name="Inflation Adjustment",
        description="Adjusts an amount for inflation over time",
        tags=["inflation", "adjustment", "future value"]
    )
    def handle_input(self, text: str) -> str:
        try:
            # Extract amount
            amount_match = re.search(r"₹?(\d{3,10})", text)
            amount = float(amount_match.group(1)) if amount_match else None

            # Extract rate (e.g. 6%, 7.5 percent)
            rate_match = re.search(r"(\d+(\.\d+)?)\s*(%|percent)", text, re.IGNORECASE)
            rate = float(rate_match.group(1)) if rate_match else None

            # Extract years (e.g. 5 years)
            years_match = re.search(r"(\d+)\s*(years|year)", text, re.IGNORECASE)
            years = int(years_match.group(1)) if years_match else None

            if amount is not None and rate is not None and years is not None:
                adjusted = amount * ((1 + rate / 100) ** years)
                return f"₹{amount:.2f} adjusted for {rate:.2f}% inflation over {years} years is ₹{adjusted:.2f}"

            return (
                "Please provide amount, inflation rate (e.g. 6%) and duration (e.g. 5 years).\n"
                "Example: 'What is ₹10000 worth after 5 years at 6% inflation?'"
            )
        except Exception as e:
            return f"Sorry, I couldn't compute that. Error: {e}"

    def handle_task(self, task):
        text = task.message["content"]["text"]
        result = self.handle_input(text)

        task.artifacts = [{
            "parts": [{"type": "text", "text": result}]
        }]
        task.status = TaskStatus(state=TaskState.COMPLETED)
        return task


if __name__ == "__main__":
    agent = InflationAgent()
    run_server(agent, port=4747)

This agent helps calculate how much a given amount would be worth in the future after adjusting for inflation. It uses the same decorator-based structure provided by the python-a2a library. The @agent decorator defines the metadata for this agent, and the @skill decorator registers the main logic under the name “Inflation Adjustment.”

The handle_input method is where the main processing happens. It extracts the amount, inflation rate, and number of years from the user’s input using simple regular expressions. If all three values are present, it uses the standard future value formula to calculate the inflation-adjusted amount:

Adjusted Value = amount × (1 + rate/100) ^ years.

If any value is missing, the agent returns a helpful prompt telling the user what to provide, including an example. The handle_task function connects everything by taking the user’s message, passing it to the skill function, and returning the formatted result back to the user.

Finally, the agent is launched using run_server() on port 4747, making it ready to handle A2A queries.

Creating the Agent Network

First, run both agents in two separate terminals:

python emi_agent.py

python inflation_agent.py

Each of these agents exposes a REST API endpoint (e.g. http://localhost:4737 for EMI, http://localhost:4747 for Inflation) using the A2A protocol. They listen for incoming tasks (like “calculate EMI for ₹2,00,000…”) and respond with text answers.
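Before wiring the agents into a network, you can optionally query one of them directly as a sanity check. The snippet below assumes that A2AClient exposes the same ask() helper that is used via network.get_agent() later in this tutorial; treat it as an illustrative check rather than documented usage.

# Illustrative check only: assumes A2AClient exposes an ask() helper,
# mirroring how network.get_agent(...).ask(...) is used later in this tutorial.
from python_a2a import A2AClient

emi_client = A2AClient("http://localhost:4737")
print(emi_client.ask("Calculate EMI for ₹250000 at 8% interest over 24 months."))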

Now, we will add these two agents to our network

from python_a2a import AgentNetwork, A2AClient, AIAgentRouter

# Create an agent network
network = AgentNetwork(name="Economics Calculator")

# Add agents to the network
network.add("EMI", "http://localhost:4737")
network.add("Inflation", "http://localhost:4747")

Next we will create a router to intelligently direct queries to the best agent. This is a core utility of the A2A protocol—it defines a standard task format so agents can be queried uniformly, and routers can make intelligent routing decisions using LLMs.

router = AIAgentRouter(
    llm_client=A2AClient("http://localhost:5000/openai"),  # LLM for making routing decisions
    agent_network=network
)

Lastly, we will query the agents

query = "Calculate EMI for ₹200000 at 5% interest over 18 months."
agent_name, confidence = router.route_query(query)
print(f"Routing to {agent_name} with {confidence:.2f} confidence")

# Get the selected agent and ask the question
agent = network.get_agent(agent_name)
response = agent.ask(query)
print(f"Response: {response}")

query = "What is ₹1500000 worth if inflation is 9% for 10 years?"
agent_name, confidence = router.route_query(query)
print(f"Routing to {agent_name} with {confidence:.2f} confidence")

# Get the selected agent and ask the question
agent = network.get_agent(agent_name)
response = agent.ask(query)
print(f"Response: {response}")

Check out the Notebooks: inflation_agent.py, network.ipynb, and emi_agent.py. All credit for this research goes to the researchers of this project.
The post How to Use python-A2A to Create and Connect Financial Agents with Google’s Agent-to-Agent (A2A) Protocol appeared first on MarkTechPost.

How Anomalo solves unstructured data quality issues to deliver trusted …

This post is co-written with Vicky Andonova and Jonathan Karon from Anomalo.
Generative AI has rapidly evolved from a novelty to a powerful driver of innovation. From summarizing complex legal documents to powering advanced chat-based assistants, AI capabilities are expanding at an increasing pace. While large language models (LLMs) continue to push new boundaries, quality data remains the deciding factor in achieving real-world impact.
A year ago, it seemed that the primary differentiator in generative AI applications would be who could afford to build or use the biggest model. But with recent breakthroughs in base model training costs (such as DeepSeek-R1) and continual price-performance improvements, powerful models are becoming a commodity. Success in generative AI is becoming less about building the right model and more about finding the right use case. As a result, the competitive edge is shifting toward data access and data quality.
In this environment, enterprises are poised to excel. They have a hidden goldmine of decades of unstructured text—everything from call transcripts and scanned reports to support tickets and social media logs. The challenge is how to use that data. Transforming unstructured files, maintaining compliance, and mitigating data quality issues all become critical hurdles when an organization moves from AI pilots to production deployments.
In this post, we explore how you can use Anomalo with Amazon Web Services (AWS) AI and machine learning (AI/ML) to profile, validate, and cleanse unstructured data collections to transform your data lake into a trusted source for production-ready AI initiatives, as shown in the following figure.

The challenge: Analyzing unstructured enterprise documents at scale
Despite the widespread adoption of AI, many enterprise AI projects fail due to poor data quality and inadequate controls. Gartner predicts that 30% of generative AI projects will be abandoned in 2025. Even the most data-driven organizations have focused primarily on using structured data, leaving unstructured content underutilized and unmonitored in data lakes or file systems. Yet, over 80% of enterprise data is unstructured (according to MIT Sloan School research), spanning everything from legal contracts and financial filings to social media posts.
For chief information officers (CIOs), chief technical officers (CTOs), and chief information security officers (CISOs), unstructured data represents both risk and opportunity. Before you can use unstructured content in generative AI applications, you must address the following critical hurdles:

Extraction – Optical character recognition (OCR), parsing, and metadata generation can be unreliable if not automated and validated. In addition, if extraction is inconsistent or incomplete, it can result in malformed data.
Compliance and security – Handling personally identifiable information (PII) or proprietary intellectual property (IP) demands rigorous governance, especially with the EU AI Act, Colorado AI Act, General Data Protection Regulation (GDPR), California Consumer Privacy Act (CCPA), and similar regulations. Sensitive information can be difficult to identify in unstructured text, leading to inadvertent mishandling of that information.
Data quality – Incomplete, deprecated, duplicative, off-topic, or poorly written data can pollute your generative AI models and Retrieval Augmented Generation (RAG) context, yielding hallucinated, out-of-date, inappropriate, or misleading outputs. Making sure that your data is high-quality helps mitigate these risks.
Scalability and cost – Training or fine-tuning models on noisy data increases compute costs by unnecessarily growing the training dataset (training compute costs tend to grow linearly with dataset size), and processing and storing low-quality data in a vector database for RAG wastes processing and storage capacity.

In short, generative AI initiatives often falter—not because the underlying model is insufficient, but because the existing data pipeline isn’t designed to process unstructured data and still meet high-volume, high-quality ingestion and compliance requirements. Many companies are in the early stages of addressing these hurdles and are facing these problems in their existing processes:

Manual and time-consuming – The analysis of vast collections of unstructured documents relies on manual review by employees, creating time-consuming processes that delay projects.
Error-prone – Human review is susceptible to mistakes and inconsistencies, leading to inadvertent exclusion of critical data and inclusion of incorrect data.
Resource-intensive – The manual document review process requires significant staff time that could be better spent on higher-value business activities. Budgets can’t support the level of staffing needed to vet enterprise document collections.

Although existing document analysis processes provide valuable insights, they aren’t efficient or accurate enough to meet modern business needs for timely decision-making. Organizations need a solution that can process large volumes of unstructured data and help maintain compliance with regulations while protecting sensitive information.
The solution: An enterprise-grade approach to unstructured data quality
Anomalo uses a highly secure, scalable stack provided by AWS that you can use to detect, isolate, and address data quality problems in unstructured data–in minutes instead of weeks. This helps your data teams deliver high-value AI applications faster and with less risk. The architecture of Anomalo’s solution is shown in the following figure.

Automated ingestion and metadata extraction – Anomalo automates OCR and text parsing for PDF files, PowerPoint presentations, and Word documents stored in Amazon Simple Storage Service (Amazon S3) using auto scaling Amazon Elastic Compute Cloud (Amazon EC2) instances, Amazon Elastic Kubernetes Service (Amazon EKS), and Amazon Elastic Container Registry (Amazon ECR).
Continuous data observability – Anomalo inspects each batch of extracted data, detecting anomalies such as truncated text, empty fields, and duplicates before the data reaches your models. In the process, it monitors the health of your unstructured pipeline, flagging surges in faulty documents or unusual data drift (for example, new file formats, an unexpected number of additions or deletions, or changes in document size). With this information reviewed and reported by Anomalo, your engineers can spend less time manually combing through logs and more time optimizing AI features, while CISOs gain visibility into data-related risks.
Governance and compliance – Built-in issue detection and policy enforcement help mask or remove PII and abusive language. If a batch of scanned documents includes personal addresses or proprietary designs, it can be flagged for legal or security review—minimizing regulatory and reputational risk. You can use Anomalo to define custom issues and metadata to be extracted from documents to solve a broad range of governance and business needs.
Scalable AI on AWS – Anomalo uses Amazon Bedrock to give enterprises a choice of flexible, scalable LLMs for analyzing document quality. Anomalo’s modern architecture can be deployed as software as a service (SaaS) or through an Amazon Virtual Private Cloud (Amazon VPC) connection to meet your security and operational needs.
Trustworthy data for AI business applications – The validated data layer provided by Anomalo and AWS Glue helps make sure that only clean, approved content flows into your application.
Supports your generative AI architecture – Whether you use fine-tuning or continued pre-training on an LLM to create a subject matter expert, store content in a vector database for RAG, or experiment with other generative AI architectures, by making sure that your data is clean and validated, you improve application output, preserve brand trust, and mitigate business risks.

Impact
Using Anomalo and AWS AI/ML services for unstructured data provides these benefits:

Reduced operational burden – Anomalo’s off-the-shelf rules and evaluation engine save months of development time and ongoing maintenance, freeing time for designing new features instead of developing data quality rules.
Optimized costs – Training LLMs and ML models on low-quality data wastes precious GPU capacity, while vectorizing and storing that data for RAG increases overall operational costs, and both degrade application performance. Early data filtering cuts these hidden expenses.
Faster time to insights – Anomalo automatically classifies and labels unstructured text, giving data scientists rich data to spin up new generative prototypes or dashboards without time-consuming labeling prework.
Strengthened compliance and security – Identifying PII and adhering to data retention rules is built into the pipeline, supporting security policies and reducing the preparation needed for external audits.
Create durable value – The generative AI landscape continues to rapidly evolve. Although LLM and application architecture investments may depreciate quickly, trustworthy and curated data is a sure bet that won’t be wasted.

Conclusion
Generative AI has the potential to deliver massive value–Gartner estimates 15–20% revenue increase, 15% cost savings, and 22% productivity improvement. To achieve these results, your applications must be built on a foundation of trusted, complete, and timely data. By delivering a user-friendly, enterprise-scale solution for structured and unstructured data quality monitoring, Anomalo helps you deliver more AI projects to production faster while meeting both your user and governance requirements.
Interested in learning more? Check out Anomalo’s unstructured data quality solution and request a demo or contact us for an in-depth discussion on how to begin or scale your generative AI journey.

About the authors
Vicky Andonova is the GM of Generative AI at Anomalo, the company reinventing enterprise data quality. As a founding team member, Vicky has spent the past six years pioneering Anomalo’s machine learning initiatives, transforming advanced AI models into actionable insights that empower enterprises to trust their data. Currently, she leads a team that not only brings innovative generative AI products to market but is also building a first-in-class data quality monitoring solution specifically designed for unstructured data. Previously, at Instacart, Vicky built the company’s experimentation platform and led company-wide initiatives to grocery delivery quality. She holds a BE from Columbia University.
Jonathan Karon leads Partner Innovation at Anomalo. He works closely with companies across the data ecosystem to integrate data quality monitoring in key tools and workflows, helping enterprises achieve high-functioning data practices and leverage novel technologies faster. Prior to Anomalo, Jonathan created Mobile App Observability, Data Intelligence, and DevSecOps products at New Relic, and was Head of Product at a generative AI sales and customer success startup. He holds a BA in Cognitive Science from Hampshire College and has worked with AI and data exploration technology throughout his career.
Mahesh Biradar is a Senior Solutions Architect at AWS with a history in the IT and services industry. He helps SMBs in the US meet their business goals with cloud technology. He holds a Bachelor of Engineering from VJTI and is based in New York City (US)
Emad Tawfik is a seasoned Senior Solutions Architect at Amazon Web Services, boasting more than a decade of experience. His specialization lies in the realm of Storage and Cloud solutions, where he excels in crafting cost-effective and scalable architectures for customers.

An innovative financial services leader finds the right AI solution: R …

This post is cowritten with Renyu Chen and Dev Tagare from Robinhood.
Robinhood has been a pioneer and disruptor in the once staid world of online brokerages. Founded in 2013, the company transformed an industry better known for gatekeeping into an open platform accessible to all. Robinhood pioneered commission-free trading, and harnessed the power of technology and intuitive design to create a seamless and engaging experience for modern investors. To this day, the company continues to disrupt the financial services industry by launching groundbreaking product innovations on AWS.
Such innovations have made Robinhood one of the fastest growing brokerages in history, with more than 25 million customers worldwide and a global reputation as an innovator and technology leader. Fueled by its mission of “democratizing finance for all,” the company’s focus on accessibility, particularly for first-time investors, has kept Robinhood as one of the top finance apps on the Apple App Store for more than a decade and earned Robinhood accolades such as an award from Fast Company magazine as one of World’s 50 Most Innovative Companies. This annual ranking highlights companies that are reshaping industries and culture through innovation.
Robinhood’s Chief Executive Officer, Vlad Tenev, explains why this focus is important to Robinhood:

“Our belief is, the more we lower the barriers to entry, the more we level the playing field and allow people to invest their money at a younger age, the better off our economy will be and the better off society will be.”

Built to operate in the cloud, Robinhood uses AWS to power its online business, deliver and update its mobile trading app, securely store information and data, and perform business analytics. Robinhood recently used AI to improve customer experience and expand accessibility. For example, in 2025, the company will launch Robinhood Cortex, an AI investment tool that is designed to provide real-time insights to help users better navigate markets, identify potential opportunities, and stay up to date on the latest market-moving news. Cortex is an exciting step forward, providing premium investment and market digests of a kind that has historically been reserved for institutional investors and wealthy individuals.
As Robinhood customers are able to do more on the platform, the company is working with AWS to explore new generative AI solutions such as Amazon Nova, a family of foundation models (FMs) that make generative AI development faster and more efficient, with exceptional price performance. These new solutions will help the company accommodate rapid expansion of customer requirements.
In this post, we share how Robinhood delivers democratized finance and real-time market insights using generative AI and Amazon Nova.
An AI/ML journey built on customer obsession
Robinhood, like all financial services firms, operates in a highly regulated environment. Historically, the industry was seen as slow-moving and wary of new technologies. Robinhood’s founders put technology at the forefront by initially building a no-frills, no-fee app that, by design, would make investing accessible to everyone, not just the very wealthy. As Robinhood grew, it attracted a wider variety of customers who need the speed, reliability, security, and low cost the platform offers, but who also want a richer set of services for different and novel use cases.
Robinhood listens closely to these active traders. As Renyu Chen, staff machine learning (ML) engineer at Robinhood, explains,

“We wanted to create a seamless journey for AI/ML applications to go from experimentation to Robinhood scale. We looked to the AWS team to help meet the AI/ML needs of our developers while providing advanced ML tooling to serve our most sophisticated ‘active trader’ customers. This would also require a plug-and-play approach that could adopt the latest generative AI technologies from open source, model providers, and home-grown platform tooling.”

Robinhood explored various generative AI solutions during 2023, concluding that the best way to get to Robinhood scale was with Amazon Bedrock, a fully managed service that helps users build generative AI models. Amazon Bedrock offers an extensive selection of FMs from various providers, and allows a high level of customization and security through a single API.
According to Robinhood’s Renyu Chen,

“For us, the security of our customers’ data comes first. Nothing is more important. With Amazon Bedrock, data stays under our control. When we query a model, the input and output never leave our virtual private cloud. When we fine-tune a foundation model, it is based on a private copy of that model. This means our customers’ data is not shared with model providers, and is not used to improve the base models.”

To meet the needs of Robinhood’s ever-growing base of power users, Robinhood is exploring Amazon Nova, estimating that the price per token using Amazon Nova can be up to 80% lower than other models they have tested, which would make it cost-effective to power new high-demand use cases such as a fraud investigation assistant, enhanced document processing, and AI-created content generation.
In addition, AWS generative AI solutions working through Amazon Nova can power new agentic workflows for Robinhood, in which autonomous AI agents can independently make decisions, adapt to changing situations, and execute actions.

“Robinhood offers its customers simplicity, speed, security, and cost savings. Working developer-to-developer with the Robinhood team and building together, we can design generative AI solutions that meet Robinhood’s priorities and customer-focused goals. For example, Amazon Nova models can be easily customized with Amazon Bedrock Model Distillation, which ‘distills’ knowledge from a larger, more capable ‘teacher’ model to a smaller, faster, and cost-efficient ‘student’ model. This solution can help Robinhood use models such as DeepSeek to explore exciting new use cases quickly, securely, and at a 75% lower cost than equivalent offerings from competitors.”
– Dushan Tharmal, Principal Product Manager, Amazon Artificial General Intelligence (AGI).

Amazon Nova: More services, greater value for Robinhood and its customers
Working with AWS on its ambitious AI journey, Robinhood is able to rapidly scale new services for customers without needing the costly structures, staff, and infrastructure found at traditional brokerages. With support from AWS, Robinhood is able to offer a richer customer experience while remaining true to its mission of simplicity, clarity, low cost, speed, security, and reliability.

“We see that Amazon Nova can be a great match for our mission. Amazon Nova offers the lowest latency responses at very low cost, and is accurate and lightning-fast across a wide range of interactive and high-volume Robinhood applications. And, consistent with Robinhood’s commitment to simplicity and low cost for its customers, using Amazon Nova models through Amazon Bedrock makes these large-scale tasks significantly easier, cheaper, and more cost-effective.”
– Dev Tagare, Robinhood’s head of AI.

Learn more about Amazon Nova and how it can deliver frontier intelligence and industry leading price-performance for your organization.

About the authors
Renyu Chen is a Staff AI Engineer at Robinhood Markets
Dev Tagare is the Head of AI at Robinhood Markets
Uchenna Egbe is a GenAI Solutions Architect at AWS FSI.
Trevor Spires is a GenAI Solutions Architect at AWS FinTech.

Build conversational interfaces for structured data using Amazon Bedro …

Organizations manage extensive structured data in databases and data warehouses. Large language models (LLMs) have transformed natural language processing (NLP), yet converting conversational queries into structured data analysis remains complex. Data analysts must translate business questions into SQL queries, creating workflow bottlenecks.
Amazon Bedrock Knowledge Bases enables direct natural language interactions with structured data sources. The system interprets database schemas and context, converting natural language questions into accurate queries while maintaining data reliability standards. You can chat with your structured data by setting up structured data ingestion from AWS Glue Data Catalog tables and Amazon Redshift clusters in a few steps, using the power of Amazon Bedrock Knowledge Bases structured data retrieval.
This post provides instructions to configure a structured data retrieval solution, with practical code examples and templates. It covers implementation samples and additional considerations, empowering you to quickly build and scale your conversational data interfaces. Through clear examples and proven methodologies, organizations can transform their data access capabilities and accelerate decision-making processes.
Solution overview
The solution demonstrates how to build a conversational application using Amazon Bedrock Knowledge Bases structured data retrieval. Developers often face challenges integrating structured data into generative AI applications. This includes difficulties training LLMs to convert natural language queries to SQL queries based on complex database schemas, as well as making sure appropriate data governance and security controls are in place. Amazon Bedrock Knowledge Bases alleviates these complexities by providing a managed natural language to SQL (NL2SQL) module. Amazon Bedrock Knowledge Bases offers an end-to-end managed workflow for you to build custom generative AI applications that can access and incorporate contextual information from a variety of structured and unstructured data sources. Using advanced NLP, Amazon Bedrock Knowledge Bases can transform natural language queries into SQL queries, so you can retrieve data directly from the source without the need to move or preprocess the data.
This solution includes Amazon Bedrock Knowledge Bases, Amazon Redshift, AWS Glue, and Amazon Simple Storage Service (Amazon S3). The solution architecture consists of two parts: a data ingestion pipeline, and a structured data retrieval application using Amazon Bedrock Knowledge Bases.
Amazon Bedrock Knowledge Bases structured data retrieval supports Amazon Redshift as the query engine and multiple data ingestion options. The data ingestion pipeline is a one-time setup. In this post, we discuss a common data ingestion use case using Amazon S3, AWS Glue, and Amazon Redshift.
You can configure Amazon Bedrock Knowledge Bases structured data retrieval to retrieve data from AWS Glue databases and S3 datasets. This setup uses automatic mounting of the Data Catalog in Amazon Redshift. With this ingestion option, you can seamlessly integrate existing S3 datasets and Data Catalog tables into your Retrieval Augmented Generation (RAG) applications with the access permissions configured through Lake Formation. The following diagram illustrates this pipeline.

The following screenshot shows the configuration options on the Amazon Bedrock console.

After the data ingestion is configured and the knowledge bases data source sync job is complete, users can ask natural language questions, and Amazon Bedrock Knowledge Bases will generate the SQL, execute the SQL against the query engine, and process it through the LLM to provide a user-friendly response. The following diagram illustrates a sample architecture of the structured data retrieval workflow.

The data retrieval workflow consists of the following steps:

In a RAG application, the user can ask a natural language data analytics question through the chat interface, such as “What is the sales revenue for the Month of February 2025?”
The natural language query is sent to Amazon Bedrock Knowledge Bases for data retrieval and processing.
Amazon Bedrock Knowledge Bases generates a SQL query based on the underlying data schema configured during the knowledge base creation.
The SQL query is executed against the query engine (Amazon Redshift) to retrieve data from a structured data store (AWS Glue tables). The query can include multiple joins and aggregation.
The generated SQL response is sent to an LLM along with additional context to generate a response in natural language.
The response is sent back to the user. The user can ask follow-up questions based on the retrieved response, such as “What is the product that generated highest revenue in this period?”

Amazon Bedrock Knowledge Bases structured data retrieval supports three different APIs to meet your data retrieval requirements:

Retrieval and response generation – The retrieval and response generation API, similar to the solution workflow we’ve discussed, generates a SQL query, retrieves data through the query engine, and processes it through the LLM to generate a natural language response
Retrieval only – The retrieval only API generates a SQL query, retrieves data through the query engine, and returns the data without processing it through an LLM
Generate SQL queries – The generate SQL query API returns the raw SQL query that was generated by Amazon Bedrock Knowledge Bases, which can be used for review and further processing by applications
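As a rough illustration of the first option, the retrieval and response generation API can be called with the AWS SDK for Python (Boto3) along these lines; the knowledge base ID, model ARN, and question below are placeholders to replace with your own values.

# Hedged sketch: calling the retrieval and response generation API with Boto3.
# The knowledge base ID, model ARN, and question are placeholders.
import boto3

client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = client.retrieve_and_generate(
    input={"text": "What is the sales revenue for the month of February 2025?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "YOUR_KB_ID",  # placeholder
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
        },
    },
)
print(response["output"]["text"])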

The following screenshot shows the configuration options on the Amazon Bedrock console.

Code resources and templates
The solution uses the following notebooks:

Data ingestion notebook – Structured-rag-s3-glue-ingestion includes the step-by-step guide to ingest an open dataset to Amazon S3, configure AWS Glue tables using crawlers, and set up the Amazon Redshift Serverless query engine.
Structured data retrieval notebook – Structured-rag-s3-glue-retrieval walks through the implementation steps and provides sample code for configuring Amazon Bedrock Knowledge Bases structured data retrieval using Amazon S3, AWS Glue, and the Amazon Redshift query engine.

For more details, refer to the GitHub repo.
Prerequisites
To implement the solution provided in this post, you must have an AWS account. Additionally, access to the required foundation models must be enabled in Amazon Bedrock.
Set up the data ingestion pipeline
To set up the data ingestion pipeline, we load the sample dataset in an S3 bucket and configure AWS Glue as data storage and a Redshift Serverless workgroup as the query engine. Complete the following steps in data ingestion notebook:

For data ingestion, download the following sample ecommerce dataset, convert it to a pandas data frame, and upload it to an S3 bucket using Amazon SageMaker Data Wrangler.
Create an AWS Glue database and table using an AWS Glue crawler by crawling the source S3 bucket with the dataset. You can update this step to crawl your own S3 bucket or use your existing Data Catalog tables as storage metadata.
Use the data ingestion notebook to create a Redshift Serverless namespace and workgroup in the default VPC. If you plan to use your own Redshift Serverless workgroup or Amazon Redshift provisioned cluster, you can skip this step.

Set up the structured data retrieval solution
In this section, we detail the steps to set up the structured data retrieval component of the solution.
Amazon Bedrock Knowledge Bases supports multiple data access patterns, including AWS Identity and Access Management (IAM), AWS Secrets Manager, and database users. For this post, we demonstrate the setup option with IAM access. You can use IAM access with the Redshift Serverless workgroup configured as part of the ingestion workflow or an existing Redshift Serverless or provisioned cluster to complete these steps.
Complete the following steps in structured data retrieval notebook:

Create an execution role with the necessary policies for accessing data from Amazon Redshift, AWS Glue, and the S3 bucket.
Invoke the CreateKnowledgeBase API to create the knowledge base with the execution role and knowledge base configurations. In the knowledge base configuration, the AWS Glue database and tables are used as storage metadata with Amazon Redshift as the query engine.
After you create the knowledge base, you must complete additional steps to make sure the IAM execution role has the necessary permissions to execute the query in Amazon Redshift and retrieve data from AWS Glue. The notebook includes the necessary instructions to create and grant database access to the execution role, and grant AWS Lake Formation permissions.
The ingestion job will sync the data store schema metadata about AWS Glue database and tables with the NL2SQL module. This schema metadata will be used while generating the SQL query during structured data retrieval.
After the knowledge base sync job is complete, you can use the three data retrieval APIs – retrieve and generate response, retrieval only, and generate SQL query – to query and validate the structured data retrieval solution.

For more details, refer to Create a knowledge base by connecting to a structured data store.
Clean up
We have included cleanup instructions in both the data ingestion and structured data retrieval notebooks to clean up resources after the end-to-end solution is implemented and validated.
Conclusion
Amazon Bedrock Knowledge Bases simplifies data analysis by converting natural language questions into SQL queries, eliminating the need for specialized database expertise. The service integrates with Amazon Redshift, AWS Glue, and Amazon S3, allowing business analysts, data scientists, and operations teams to query data directly using conversation-like questions. It maintains data security through built-in governance controls and access permissions. Customers can deploy this managed service to enable users to analyze data using natural language questions, while maintaining data integrity and security standards.
To learn more, refer to Build a knowledge base by connecting to a structured data store and Amazon Bedrock Knowledge Bases now supports structured data retrieval.

About the authors
George Belsian is a Senior Cloud Application Architect at Amazon Web Services, helping organizations navigate the complexities of cloud adoption, AI integration, and data-driven innovation. By transforming legacy systems into cloud-based platforms and incorporating AI/ML capabilities, he helps businesses create new opportunities for growth, optimize their processes, and deliver scalable solutions.
Sandeep Singh is a Senior Generative AI Data Scientist at Amazon Web Services, helping businesses innovate with generative AI. He specializes in generative AI, machine learning, and system design. He has successfully delivered state-of-the-art AI/ML-powered solutions to solve complex business problems for diverse industries, optimizing efficiency and scalability.
Mani Khanuja is a Principal Generative AI Specialist SA and author of the book Applied Machine Learning and High-Performance Computing on AWS. She leads machine learning projects in various domains such as computer vision, natural language processing, and generative AI. She speaks at internal and external conferences such AWS re:Invent, Women in Manufacturing West, YouTube webinars, and GHC 23. In her free time, she likes to go for long runs along the beach.
Gopikrishnan Anilkumar is a Principal Technical Product Manager in AWS Agentic AI organization. He has over 10 years of product management experience across a variety of domains and is passionate about AI/ML.

Innovate business logic by implementing return of control in Amazon Be …

In the context of distributed systems and microservices architecture, orchestrating communication between diverse components presents significant challenges. However, with the launch of Amazon Bedrock Agents, the landscape is evolving, offering a simplified approach to agent creation and seamless integration of the return of control capability. In this post, we explore how Amazon Bedrock Agents revolutionizes agent creation and demonstrates the efficacy of the return of control capability in orchestrating complex interactions between multiple systems.
Amazon Bedrock Agents simplifies the creation, deployment, and management of agents in distributed systems. By using the power of AWS Lambda and AWS Step Functions, Amazon Bedrock Agents abstracts away the complexities of agent implementation, which means developers can focus on building robust and scalable applications without worrying about infrastructure management.
You can use agents in Amazon Bedrock in various scenarios where you need to handle the return of control to the user or the system. Use cases include conversational assistants, task automation, decision support systems, interactive tutorials and walkthroughs, and virtual assistants. In these use cases, the key aspect of the agents is their ability to handle the return of control to the user or the system. This allows for a more natural and responsive interaction, where the user feels in control of the process while still benefiting from the agent’s guidance and automation capabilities.
Solution overview
In this post, we demonstrate an automated personalized investment portfolio solution using Amazon Bedrock Agents. The solution calls a third-party API to fetch a user's current investment portfolio. The holdings are then analyzed using foundation models (FMs) available on Amazon Bedrock to produce recommendations in line with the inputs provided by the end user, showcasing a return of control capability integrated with Amazon Bedrock Agents.
This solution uses a combination of synchronous data retrieval and generative AI to provide tailored investment recommendations that align with users’ specific financial goals and risk tolerance. By incorporating machine learning (ML) and simulation techniques, the system can generate personalized portfolios and assess their potential performance, making sure the recommended solutions are optimized for individual needs.
With Amazon Bedrock Agents, the capability to return control to the application invoking the agent can handle external functions and business logic at the application level instead of using a Lambda function. This way, an application can manage external interactions and return the response while the agent continues its orchestration. This is illustrated in the following diagram.

The option to return control is particularly useful in two main scenarios:

Calling an API from an existing application rather than building a new Lambda function with the required authentication and networking configurations
Handling tasks that might run longer than 15 minutes and can’t be accommodated through a Lambda function, instead requiring containers, virtual servers, or workflow orchestration tools such as AWS Step Functions
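Before walking through the sample application, the following is a rough, hedged sketch of the generic return of control loop at the application level: invoke the agent, detect the returnControl event in the response stream, run your own business logic, and pass the result back through sessionState on a follow-up invoke_agent call. The agent IDs, action group, and function names are placeholders, and the exact field names should be verified against the current Amazon Bedrock Agents runtime documentation.

# Hedged sketch of application-level return of control handling with Boto3.
# Agent IDs, action group, and function names are placeholders.
import boto3, uuid

client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")
session_id = str(uuid.uuid4())

response = client.invoke_agent(
    agentId="YOUR_AGENT_ID",
    agentAliasId="YOUR_AGENT_ALIAS_ID",
    sessionId=session_id,
    inputText="Recommend a portfolio for a medium risk tolerance.",
)

for event in response["completion"]:
    if "returnControl" in event:
        roc = event["returnControl"]
        # Run your own business logic here (for example, call the third-party portfolio API).
        portfolio = '{"stocks": 60, "bonds": 30, "cash": 10}'  # placeholder result
        # Hand the result back so the agent can continue its orchestration.
        followup = client.invoke_agent(
            agentId="YOUR_AGENT_ID",
            agentAliasId="YOUR_AGENT_ALIAS_ID",
            sessionId=session_id,
            sessionState={
                "invocationId": roc["invocationId"],
                "returnControlInvocationResults": [{
                    "functionResult": {
                        "actionGroup": "PortfolioActionGroup",   # placeholder
                        "function": "fetch_custom_portfolio",    # placeholder
                        "responseBody": {"TEXT": {"body": portfolio}},
                    }
                }],
            },
        )
        for followup_event in followup["completion"]:
            if "chunk" in followup_event:
                print(followup_event["chunk"]["bytes"].decode())
    elif "chunk" in event:
        print(event["chunk"]["bytes"].decode())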

The following sample code uses Amazon Bedrock Agents and handles return of control in application code. With this feature, you can manage return of control in your backend services and simplify application integrations. To demonstrate this, we have the following four code snippets: external-bedrock-agent-api.py, streamlit-app-portfolio-recommender.py, Portfolio-Recommender-CFN-Template.yaml, and requirements.txt, along with detailed steps to replicate the scenario.
The external-bedrock-agent-api code implements a portfolio recommendation system using Amazon Bedrock Agents and Flask. Here’s a high-level overview of the functions used:

fetch_user_data: Processes user profile information such as risk tolerance or investment goals
generate_portfolios: Creates sample investment portfolios with different risk levels
fetch_custom_portfolio: Combines user data and portfolio generation
send_custom_portfolio_as_email: Sends portfolio recommendations by email using an Amazon Simple Email Service (Amazon SES) verified email identity
/sns-handler endpoint: This API endpoint receives POST requests with user investment preferences, processes the message containing user preference details, invokes the Amazon Bedrock agent to generate recommendations, and handles email communication of the recommendations
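As a rough sketch only (not the actual external-bedrock-agent-api.py shipped with this post), an endpoint with this shape might look like the following; the payload keys, port, and function bodies are placeholders.

# Illustrative Flask sketch of an /sns-handler-style endpoint.
# Payload keys, port, and function bodies are placeholders, not the post's implementation.
from flask import Flask, request, jsonify

app = Flask(__name__)

def fetch_custom_portfolio(preferences: dict) -> str:
    # Placeholder: combine user data, invoke the Bedrock agent, and build a recommendation.
    return f"Sample portfolio for risk tolerance: {preferences.get('risk_tolerance', 'medium')}"

@app.route("/sns-handler", methods=["POST"])
def sns_handler():
    preferences = request.get_json(force=True)
    recommendation = fetch_custom_portfolio(preferences)
    if preferences.get("send_email"):
        pass  # Placeholder: send the recommendation via Amazon SES.
    return jsonify({"recommendation": recommendation})

if __name__ == "__main__":
    app.run(port=5000)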

The streamlit-app-portfolio-recommender code is a Streamlit web application for investment portfolio recommendations. The code sets up the webpage with a title and configuration. The app collects several pieces of information through form elements:

Email address – Text input
Financial goal – Dropdown with options for retirement, wealth accumulation, and passive income
Risk tolerance – Dropdown with options for low, medium, and high
Investment horizon – Dropdown with options for short-term and long-term
Environmental, social, and governance (ESG) preference – Checkbox for ESG preferences
Email preference – Checkbox for receiving recommendations by email

The system operates through a portfolio generation function that sends POST requests to a local API endpoint. This function transforms the user’s preferences into JSON data and returns either the API response or an error message to the user.
Results are displayed after the user clicks the Submit button, which triggers the custom_portfolio function with their specific inputs. The system then displays the portfolio recommendation in a text area for successful executions and immediately alerts the user with an error message if any issues occur during the process.
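The following is a minimal, hypothetical sketch of how such a form and POST request can be wired up in Streamlit; the widget labels, payload fields, and local endpoint URL are assumptions rather than the exact application code.

import requests
import streamlit as st

st.title("Portfolio Recommender")  # hypothetical title

with st.form("preferences"):
    email = st.text_input("Email address")
    goal = st.selectbox("Financial goal", ["Retirement", "Wealth accumulation", "Passive income"])
    risk = st.selectbox("Risk tolerance", ["Low", "Medium", "High"])
    horizon = st.selectbox("Investment horizon", ["Short-term", "Long-term"])
    esg = st.checkbox("ESG preference")
    by_email = st.checkbox("Send recommendations by email")
    submitted = st.form_submit_button("Submit")

if submitted:
    payload = {"email": email, "goal": goal, "risk": risk,
               "horizon": horizon, "esg": esg, "email_preference": by_email}
    try:
        # Assumed local API endpoint; see the Flask sketch earlier in this post.
        resp = requests.post("http://localhost:5001/sns-handler", json=payload, timeout=120)
        st.text_area("Portfolio recommendation", resp.json().get("recommendation", ""))
    except requests.RequestException as err:
        st.error(f"Request failed: {err}")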
Solution walkthrough
Follow these steps to set up the environment and test the application in the US East (N. Virginia) us-east-1 Region.
To enable Anthropic’s Claude model on Amazon Bedrock in your AWS account:

On the Amazon Bedrock console, in the left navigation pane under Amazon Bedrock configurations, select Model access
Select Claude 3 Sonnet, as shown in the following screenshot

To create the Amazon Bedrock agents, related action groups, Amazon SageMaker AI domain, sample user profile, and JupyterLab space, follow these steps:

Launch the AWS CloudFormation template at Portfolio-Recommender-CloudFormation-Template.yml
Give a name to the stack
Provide an email address for the EmailIdentityParameter

Select the checkbox to acknowledge that the template contains AWS Identity and Access Management (IAM) resources, as shown in the following screenshot

Monitor AWS CloudFormation until it completes the resource creation process. You can verify the successful deployment by checking the Outputs tab in the stack details, which displays the AgentId and AgentAliasId values, as shown in the following screenshot.

You will receive an email address verification request from AWS for the US East (N. Virginia) Region. Select the link in the email to verify.
After creating your CloudFormation resources, follow these steps to access Amazon SageMaker Studio:

On the Amazon SageMaker AI console, under Admin configurations in the left navigation pane, select Domains
Select the bedrock-return-of-control-demo domain created by the CloudFormation template, as shown in the following screenshot

Select the User profiles tab
To open the SageMaker Studio environment, under User profiles, next to the sagemakeruser profile on the right, select Launch. From the dropdown menu, choose Studio, as shown in the following screenshot

You should now see the SageMaker Studio home page. This environment is where you will run the Python scripts that set up your application.
To access the JupyterLab environment for this lab, follow these steps:

On the SageMaker Studio console, in the left navigation pane under Applications, select JupyterLab
You’ll find the bedrock-agent-space application that has been preprovisioned for this lab. Its Status should be Stopped. On the right side under Action, choose Run
Within 30–40 seconds, the JupyterLab application status will change from Starting to Running

When it’s running, under Action, choose Open, as shown in the following screenshot

Three required files are copied under the /home/sagemaker-user/scripts directory: two Python files (external-bedrock-agent-api and streamlit-app-portfolio-recommender) and one requirements.txt file, as shown in the following screenshot. The JupyterLab application environment is under the default directory.

In the File menu, select New. In the dropdown menu, select Terminal to open a new terminal window, as shown in the following screenshot.
In the terminal, go to the scripts directory that contains the required files and enter:

pip install -r requirements.txt

Enter the following command on the terminal:

python3 external-bedrock-agent-api.py

Open a new terminal and go to the /home/sagemaker-user/scripts directory and enter:

streamlit run streamlit-app-portfolio-recommender.py

From the command output in the terminal, note the port number (8501), and note the Studio URL from the browser. The URL will be in the format: https://{domainid}.studio.{region}.sagemaker.aws/jupyterlab/default/lab/tree/scripts
To access the Streamlit app, modify the Studio URL by replacing everything after default/ (that is, lab/tree/scripts) with proxy/[PORT NUMBER]/. The modified Streamlit UI URL will look like this: https://{domainid}.studio.{region}.sagemaker.aws/jupyterlab/default/proxy/8501/
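If you prefer to construct the Streamlit URL programmatically, the following is a small sketch with a hypothetical domain ID:

# Hypothetical Studio URL; derive the Streamlit proxy URL from it and the port.
studio_url = "https://d-example.studio.us-east-1.sagemaker.aws/jupyterlab/default/lab/tree/scripts"
port = 8501
streamlit_url = studio_url.split("default/")[0] + f"default/proxy/{port}/"
print(streamlit_url)  # https://d-example.studio.us-east-1.sagemaker.aws/jupyterlab/default/proxy/8501/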
Select the appropriate inputs for generating your custom portfolio recommendation, and provide the same email address that was verified earlier in this walkthrough. Choose whether you prefer to receive email notifications or inline recommendations through the application interface by checking the corresponding box, then choose Submit.

The sample output and email response are shown in the following demo screenshot.

Cleanup
When you’re done, delete resources you no longer need to avoid ongoing costs. Follow these steps:

Go to the SageMaker AI JupyterLab environment and stop the Amazon SageMaker Studio application or running instance
Delete the created resources by deleting the CloudFormation stack

The following screenshot demonstrates how to view and stop running instances in the SageMaker AI JupyterLab environment. For more information, refer to Delete a stack from the CloudFormation console.

Amazon Bedrock Agents return of control considerations
When implementing return of control, consider the following:

Return of control performance considerations – When implementing return of control, developers should focus on optimizing action execution times and response handling. Each action should be designed to complete within reasonable timeframes to maintain conversation flow. Consider implementing caching mechanisms for frequently accessed data and facilitating efficient state management between return of control cycles. The application should be designed to handle concurrent user sessions effectively while maintaining responsiveness.
Return of control limitations – Actions must be defined with clear input and output schemas. Each action should be atomic and focused on a specific task to maintain simplicity and reliability. Consider payload sizes for requests and responses because there might be size limitations. Actions execute sequentially, and the system needs to maintain conversation context throughout the interaction cycle.
Security recommendations – Security implementation requires proper authentication and authorization mechanisms for all actions, following the principle of least privilege when defining permissions. Input parameters must be validated before processing, with comprehensive error handling in place. Rate limiting and request validation should be implemented to prevent abuse, and sensitive data handling must comply with security requirements and include proper logging mechanisms for audit trails. Additionally, implement input filtering to prevent prompt injection attacks, configure response filters to protect sensitive information, and set up content scanning for both input and output. Deploy regex-based response filtering to help prevent personally identifiable information (PII) exposure (see the sketch after this list) and establish content moderation filters to block inappropriate content.
Monitoring and observability – Implement comprehensive logging for all action executions and responses. Monitor key metrics such as action execution times, success rates, and error rates. Set up alerts for abnormal patterns or failures. Use Amazon CloudWatch for monitoring system health and performance. Consider implementing tracing to track request flow through different components of your system. Regular review of metrics and logs helps identify potential issues and optimization opportunities.
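As an illustration of the regex-based response filtering mentioned above, the following is a minimal sketch; the patterns are examples only, and a production filter needs broader coverage, testing, and complementary controls.

import re

# Illustrative PII redaction: replace common patterns before returning a response.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b"),
}

def redact_pii(text: str) -> str:
    # Apply each pattern in turn and replace matches with a labeled placeholder.
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED {label.upper()}]", text)
    return text

print(redact_pii("Contact me at jane@example.com or 555-123-4567."))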

Conclusion
In this post, we’ve demonstrated how Amazon Bedrock Agents simplifies agent creation and streamlines the orchestration of complex interactions between microservices using the return of control capability. By abstracting away infrastructure management and providing seamless integration with your application, Amazon Bedrock Agents empowers developers to build resilient and scalable applications with ease. As organizations embrace microservices architecture and distributed systems, tools such as Amazon Bedrock Agents play a pivotal role in accelerating innovation and driving digital transformation.
Resources
For the most current and specific information, refer to:

Amazon Bedrock documentation
AWS Well-Architected Framework best practices
AWS Security best practices
AWS observability best practices

About the Authors
Vishwanatha Handadi is a Sr. Solutions Architect within the Global Financial Services vertical at Amazon Web Services (AWS). He has been with AWS for over 2 years and has over 22 years of experience in the IT industry, primarily in data and analytics. At AWS, he drives customers through their cloud transformation journeys by converting complex challenges into actionable roadmaps for both technical and business audiences. He is based out of Bangalore, India.
Mohammed Asadulla Baig is a Sr. Technical Account Manager with Amazon Web Services (AWS) Enterprise Support. Asad helps customers architect scalable, resilient, and secure solutions. With a keen eye for innovation and a passion for delivering customer success, Asad has established himself as a thought leader in the industry, helping enterprises navigate their cloud transformation journeys with confidence and ease.

OThink-R1: A Dual-Mode Reasoning Framework to Cut Redundant Computation in LLMs

The Inefficiency of Static Chain-of-Thought Reasoning in LRMs

Recent large reasoning models (LRMs) achieve top performance by using detailed chain-of-thought (CoT) reasoning to solve complex tasks. However, many of the simple tasks they handle could be solved by smaller models with fewer tokens, making such elaborate reasoning unnecessary. This echoes human thinking, where we use fast, intuitive responses for easy problems and slower, analytical thinking for complex ones. While LRMs mimic slow, logical reasoning, they generate significantly longer outputs, thereby increasing computational cost. Current methods for reducing reasoning steps lack flexibility, limiting models to a single fixed reasoning style. There is a growing need for adaptive reasoning that adjusts effort according to task difficulty. 

Limitations of Existing Training-Based and Training-Free Approaches

Recent research on improving reasoning efficiency in LRMs can be categorized into two main areas: training-based and training-free methods. Training strategies often use reinforcement learning or fine-tuning to limit token usage or adjust reasoning depth, but they tend to follow fixed patterns without flexibility. Training-free approaches utilize prompt engineering or pattern detection to shorten outputs during inference; however, they also lack adaptability. More recent work focuses on variable-length reasoning, where models adjust reasoning depth based on task complexity. Others study “overthinking,” where models over-reason unnecessarily. However, few methods enable dynamic switching between quick and thorough reasoning—something this paper addresses directly. 

Introducing OThink-R1: Dynamic Fast/Slow Reasoning Framework

Researchers from Zhejiang University and OPPO have developed OThink-R1, a new approach that enables LRMs to switch between fast and slow thinking smartly, much like humans do. By analyzing reasoning patterns, they identified which steps are essential and which are redundant. With help from another model acting as a judge, they trained LRMs to adapt their reasoning style based on task complexity. Their method reduces unnecessary reasoning by over 23% without losing accuracy. Using a loss function and fine-tuned datasets, OThink-R1 outperforms previous models in both efficiency and performance on various math and question-answering tasks. 

System Architecture: Reasoning Pruning and Dual-Reference Optimization

The OThink-R1 framework helps LRMs dynamically switch between fast and slow thinking. First, it identifies when LRMs include unnecessary reasoning, like overexplaining or double-checking, versus when detailed steps are truly essential. Using this, it builds a curated training dataset by pruning redundant reasoning and retaining valuable logic. Then, during fine-tuning, a special loss function balances both reasoning styles. This dual-reference loss compares the model’s outputs with both fast and slow thinking variants, encouraging flexibility. As a result, OThink-R1 can adaptively choose the most efficient reasoning path for each problem while preserving accuracy and logical depth. 
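The exact loss formulation is given in the paper; purely as a conceptual illustration, a dual-reference KL term that pulls the fine-tuned model toward both a fast-thinking and a slow-thinking reference could be sketched as follows (the weighting parameter alpha and the toy shapes are assumptions for this sketch, not the authors’ code).

import torch
import torch.nn.functional as F

def dual_reference_kl_loss(logits, fast_ref_logits, slow_ref_logits, alpha=0.5):
    # Conceptual sketch: keep the fine-tuned policy close to both a
    # fast-thinking and a slow-thinking reference distribution.
    log_p = F.log_softmax(logits, dim=-1)
    kl_fast = F.kl_div(log_p, F.softmax(fast_ref_logits, dim=-1), reduction="batchmean")
    kl_slow = F.kl_div(log_p, F.softmax(slow_ref_logits, dim=-1), reduction="batchmean")
    return alpha * kl_fast + (1.0 - alpha) * kl_slow

# Toy example: a batch of 2 positions over a 10-token vocabulary.
logits = torch.randn(2, 10)
loss = dual_reference_kl_loss(logits, torch.randn(2, 10), torch.randn(2, 10))
print(loss.item())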

Empirical Evaluation and Comparative Performance

The OThink-R1 model was tested on simpler QA and math tasks to evaluate its ability to switch between fast and slow reasoning. Using datasets like OpenBookQA, CommonsenseQA, ASDIV, and GSM8K, the model demonstrated strong performance, generating fewer tokens while maintaining or improving accuracy. Compared to baselines such as NoThinking and DualFormer, OThink-R1 demonstrated a better balance between efficiency and effectiveness. Ablation studies confirmed the importance of pruning, KL constraints, and LLM-Judge in achieving optimal results. A case study illustrated that unnecessary reasoning can lead to overthinking and reduced accuracy, highlighting OThink-R1’s strength in adaptive reasoning. 

Conclusion: Towards Scalable and Efficient Hybrid Reasoning Systems

In conclusion, OThink-R1 is a large reasoning model that adaptively switches between fast and slow thinking modes to improve both efficiency and performance. It addresses the issue of unnecessarily complex reasoning in large models by analyzing and classifying reasoning steps as either essential or redundant. By pruning the redundant ones while maintaining logical accuracy, OThink-R1 reduces unnecessary computation. It also introduces a dual-reference KL-divergence loss to strengthen hybrid reasoning. Tested on math and QA tasks, it cuts down reasoning redundancy by 23% without sacrificing accuracy, showing promise for building more adaptive, scalable, and efficient AI reasoning systems in the future. 

Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.

Building AI-Powered Applications Using the Plan → Files → Code Workflow in TinyDev

In this tutorial, we introduce the TinyDev class implementation, a minimal yet powerful AI code generation tool that utilizes the Gemini API to transform simple app ideas into comprehensive, structured applications. Designed to run effortlessly in a notebook environment, TinyDev follows a clean three-phase workflow—Plan → Files → Code—to ensure consistency, functionality, and modular design. Whether building a web interface, a Python backend, or a utility script, TinyDev allows users to describe their project in natural language and receive ready-to-run code files, automatically generated and saved in an organized directory. This makes it an ideal starting point for rapid prototyping or learning how AI can assist in development tasks.

import google.generativeai as genai
import os
import json
import re
from pathlib import Path
from typing import List, Dict

We begin by importing essential libraries required for the TinyDev code generator. google.generativeai is used to interact with the Gemini API, while standard libraries like os, json, and re support file handling and text processing. Path and type hints from typing ensure clean file operations and better code readability.

class TinyDev:
    """
    TinyDev: A lightweight AI code generator inspired by smol-dev
    Uses Gemini API to generate complete applications from simple prompts
    Follows the proven three-phase workflow: Plan → Files → Code
    """

    def __init__(self, api_key: str, model: str = "gemini-1.5-flash"):
        genai.configure(api_key=api_key)
        self.model = genai.GenerativeModel(model)
        self.generation_config = {
            'temperature': 0.1,
            'top_p': 0.8,
            'max_output_tokens': 8192,
        }

    def plan(self, prompt: str) -> str:
        """
        Phase 1: Generate project plan and shared dependencies
        Creates the foundation for consistent code generation
        """
        planning_prompt = f"""As an AI developer, you're building a tool that automatically generates code tailored to the user's needs.

the program you are writing is based on the following description:
{prompt}

the files we write will be generated by a python script. the goal is for us to all work together to write a program that will write the code for the user.

since we are working together, we need to understand what our shared dependencies are. this includes:
- import statements we all need to use
- variable names that are shared between files
- functions that are called from one file to another
- any other shared state

this is the most critical part of the process, if we don't get this right, the generated code will not work properly.

please output a markdown file called shared_dependencies.md that lists all of the shared dependencies.

the dependencies should be organized as:
1. shared variables (globals, constants)
2. shared functions (function signatures)
3. shared classes (class names and key methods)
4. shared imports (modules to import)
5. shared DOM element ids (if web project)
6. shared file paths/names

be EXHAUSTIVE in your analysis. every file must be able to import or reference these shared items."""

        response = self.model.generate_content(
            planning_prompt,
            generation_config=self.generation_config
        )
        return response.text

    def specify_file_paths(self, prompt: str, shared_deps: str) -> List[str]:
        """
        Phase 2: Determine what files need to be created
        """
        files_prompt = f"""As an AI developer, you're building a tool that automatically generates code tailored to the user's needs.

the program:
{prompt}

the shared dependencies:
{shared_deps}

Based on the program description and shared dependencies, return a JSON array of the filenames that should be written.

Only return the JSON array, nothing else. The JSON should be an array of strings representing file paths.

For example, for a simple web app you might return:
["index.html", "styles.css", "script.js"]

For a Python project you might return:
["main.py", "utils.py", "config.py", "requirements.txt"]

JSON array:"""

        response = self.model.generate_content(
            files_prompt,
            generation_config=self.generation_config
        )

        try:
            json_match = re.search(r'\[.*?\]', response.text, re.DOTALL)
            if json_match:
                files = json.loads(json_match.group())
                return [f for f in files if isinstance(f, str)]
            else:
                lines = [line.strip() for line in response.text.split('\n') if line.strip()]
                files = []
                for line in lines:
                    if '.' in line and not line.startswith('#'):
                        file = re.sub(r'[^\w\-_./]', '', line)
                        if file:
                            files.append(file)
                return files[:10]
        except Exception as e:
            print(f"Error parsing files: {e}")
            return ["main.py", "README.md"]

    def generate_code_sync(self, prompt: str, shared_deps: str, filename: str) -> str:
        """
        Phase 3: Generate code for individual files
        """
        code_prompt = f"""As an AI developer, you're building a tool that automatically generates code tailored to the user's needs.

the program:
{prompt}

the shared dependencies:
{shared_deps}

Please write the file {filename}.

Remember that your job is to write the code for {filename} ONLY. Do not write any other files.

the code should be fully functional. meaning:
- all imports should be correct
- all variable references should be correct
- all function calls should be correct
- the code should be syntactically correct
- the code should be logically correct

Make sure to implement every part of the functionality described in the program description.

DO NOT include ``` code fences in your response. Return only the raw code.

Here is the code for {filename}:"""

        response = self.model.generate_content(
            code_prompt,
            generation_config=self.generation_config
        )

        code = response.text
        code = re.sub(r'^```[\w]*\n', '', code, flags=re.MULTILINE)
        code = re.sub(r'\n```$', '', code, flags=re.MULTILINE)

        return code.strip()

    def create_app(self, prompt: str, output_dir: str = "/content/generated_app") -> Dict:
        """
        Main workflow: Transform a simple prompt into a complete application
        """
        print(f"TinyDev workflow starting...")
        print(f"Prompt: {prompt}")

        print("\nStep 1: Planning shared dependencies...")
        shared_deps = self.plan(prompt)
        print("Dependencies planned")

        print("\nStep 2: Determining file structure...")
        file_paths = self.specify_file_paths(prompt, shared_deps)
        print(f"Files to generate: {file_paths}")

        Path(output_dir).mkdir(parents=True, exist_ok=True)

        print(f"\nStep 3: Generating {len(file_paths)} files...")
        results = {
            'prompt': prompt,
            'shared_deps': shared_deps,
            'files': {},
            'output_dir': output_dir
        }

        with open(Path(output_dir) / "shared_dependencies.md", 'w') as f:
            f.write(shared_deps)

        for filename in file_paths:
            print(f"  Generating {filename}...")
            try:
                code = self.generate_code_sync(prompt, shared_deps, filename)

                file_path = Path(output_dir) / filename
                file_path.parent.mkdir(parents=True, exist_ok=True)

                with open(file_path, 'w', encoding='utf-8') as f:
                    f.write(code)

                results['files'][filename] = code
                print(f"  {filename} created ({len(code)} chars)")

            except Exception as e:
                print(f"  Error generating {filename}: {e}")
                results['files'][filename] = f"# Error: {e}"

        readme = f"""# Generated by TinyDev (Gemini-Powered)

## Original Prompt
{prompt}

## Generated Files
{chr(10).join(f'- {f}' for f in file_paths)}

## About TinyDev
TinyDev is inspired by smol-ai/developer but uses free Gemini API.
It follows the proven three-phase workflow: Plan → Files → Code

## Usage
Check individual files for specific usage instructions.

Generated on: {os.popen('date').read().strip()}
"""

        with open(Path(output_dir) / "README.md", 'w') as f:
            f.write(readme)

        print(f"\nComplete! Generated {len(results['files'])} files in {output_dir}")
        return results

The TinyDev class encapsulates the full logic of an AI-powered code generator using the Gemini API. It implements a structured three-phase workflow: first, it analyzes the user prompt to generate shared dependencies (plan); next, it identifies which files are needed for the application (specify_file_paths); and finally, it generates functional code for each file individually (generate_code_sync). The create_app method brings everything together by orchestrating the full app generation pipeline and saving the results, including code files and a detailed README, into a specified output directory, offering a complete, ready-to-use application scaffold from a single prompt.

def demo_tinydev():
    """Demo the TinyDev code generator"""

    api_key = "YOUR_GEMINI_API_KEY_HERE"  # Use your API key here

    if api_key == "YOUR_GEMINI_API_KEY_HERE":
        print("Please set your Gemini API key!")
        print("Get one free at: https://makersuite.google.com/app/apikey")
        return None

    tiny_dev = TinyDev(api_key)

    demo_prompts = [
        "a simple HTML/JS/CSS tic tac toe game",
        "a Python web scraper that gets the latest news from multiple sources",
        "a responsive landing page for a local coffee shop with contact form",
        "a Flask REST API for managing a todo list",
        "a JavaScript calculator with a modern UI"
    ]

    print("TinyDev - AI Code Generator")
    print("=" * 50)
    print("Inspired by smol-ai/developer, powered by Gemini API")
    print(f"Available demo projects:")

    for i, prompt in enumerate(demo_prompts, 1):
        print(f"{i}. {prompt}")

    demo_prompt = demo_prompts[0]
    print(f"\nRunning demo: {demo_prompt}")

    try:
        results = tiny_dev.create_app(demo_prompt)

        print(f"\nResults Summary:")
        print(f"  Prompt: {results['prompt']}")
        print(f"  Output: {results['output_dir']}")
        print(f"  Files: {len(results['files'])}")

        print(f"\nGenerated Files:")
        for filename in results['files'].keys():
            print(f"  - {filename}")

        if results['files']:
            preview_file = list(results['files'].keys())[0]
            preview_code = results['files'][preview_file]
            print(f"\nPreview of {preview_file}:")
            print("-" * 40)
            print(preview_code[:400] + "..." if len(preview_code) > 400 else preview_code)
            print("-" * 40)

        print(f"\nThis uses the same proven workflow as smol-ai/developer!")
        print(f"Check {results['output_dir']} for all generated files")

        return results

    except Exception as e:
        print(f"Demo failed: {e}")
        return None

The demo_tinydev() function showcases TinyDev’s capabilities by running a predefined demo using one of several sample prompts, such as generating a Tic Tac Toe game or a Python news scraper. It initializes the TinyDev class with a Gemini API key, selects the first prompt from a list of project ideas, and guides the user through the full code generation pipeline, including planning shared dependencies, defining file structure, and generating code. After execution, it summarizes the output, previews a sample file, and points to the directory where the complete app has been saved.

def interactive_tinydev():
    """Interactive version where you can try your own prompts"""
    api_key = input("Enter your Gemini API key: ").strip()

    if not api_key:
        print("API key required!")
        return

    tiny_dev = TinyDev(api_key)

    print("\nInteractive TinyDev Mode")
    print("Type your app ideas and watch them come to life!")

    while True:
        prompt = input("\nDescribe your app (or 'quit'): ").strip()

        if prompt.lower() in ['quit', 'exit', 'q']:
            print("Goodbye!")
            break

        if prompt:
            try:
                results = tiny_dev.create_app(prompt, f"/content/app_{hash(prompt) % 10000}")
                print(f"Success! Check {results['output_dir']}")
            except Exception as e:
                print(f"Error: {e}")

print("TinyDev - AI Code Generator Ready!")
print("Inspired by smol-ai/developer, powered by free Gemini API")
print("\nTo run demo: demo_tinydev()")
print("To try interactive mode: interactive_tinydev()")

The interactive_tinydev() function allows users to generate applications from their custom prompts in real time. After entering a valid Gemini API key, users can describe any app idea, and TinyDev will automatically generate the complete project, including its code, structure, and supporting files. The process continues in a loop until the user types ‘quit’. This interactive mode enables hands-on experimentation and rapid prototyping from natural language descriptions.

demo_tinydev()

Finally, calling demo_tinydev() runs a predefined demonstration of TinyDev using a sample app prompt. It walks through the full workflow, planning, file structure creation, and code generation, to showcase how the tool automatically builds a complete application from a simple idea.

In conclusion, the TinyDev class demonstrates the potential of using AI to automate application scaffolding with remarkable accuracy and efficiency. By breaking down the code generation process into intuitive phases, it ensures that outputs are logically sound, well-structured, and aligned with the user’s intent. Whether you’re exploring new app ideas or seeking to accelerate development, TinyDev provides a lightweight and user-friendly solution powered by the Gemini models. It’s a practical tool for developers looking to integrate AI into their workflow without unnecessary complexity or overhead.

Check out the Notebook here. All credit for this research goes to the researchers of this project.