Meet ReSearch: A Novel AI Framework that Trains LLMs to Reason with Search via Reinforcement Learning without Using Any Supervised Data on Reasoning Steps

Large language models (LLMs) have demonstrated significant progress across various tasks, particularly in reasoning capabilities. However, effectively integrating reasoning processes with external search operations remains challenging, especially for multi-hop questions requiring intricate reasoning chains and multiple retrieval steps. Current methods primarily depend on manually designed prompts or heuristics, posing limitations in scalability and flexibility. Additionally, generating supervised data for multi-step reasoning scenarios is often prohibitively expensive and practically infeasible.

Researchers from Baichuan Inc., Tongji University, The University of Edinburgh, and Zhejiang University introduce ReSearch, a novel AI framework designed to train LLMs to integrate reasoning with search via reinforcement learning, notably without relying on supervised reasoning steps. The core methodology of ReSearch incorporates search operations directly into the reasoning chain. Utilizing Group Relative Policy Optimization (GRPO), a reinforcement learning technique, ReSearch guides LLMs to autonomously identify optimal moments and strategies for performing search operations, which subsequently influence ongoing reasoning. This approach enables models to progressively refine their reasoning and naturally facilitates advanced capabilities such as reflection and self-correction.

From a technical perspective, ReSearch employs structured output formats by embedding specific tags—such as <think>, <search>, <result>, and <answer>—within the reasoning chain. These tags facilitate clear communication between the model and the external retrieval environment, systematically organizing generated outputs. During training, ReSearch intentionally excludes retrieval results from loss computations to prevent model bias. Reward signals guiding the reinforcement learning process are based on straightforward criteria: accuracy assessment through F1 scores and adherence to the predefined structured output format. This design encourages the autonomous development of sophisticated reasoning patterns, circumventing the need for manually annotated reasoning datasets.
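To make the format concrete, a single rollout under this scheme might look schematically like the following (an illustrative sketch based on the tag names described above, not an excerpt from the paper):

<think> The question is multi-hop, so I need an intermediate fact before I can answer. </think>
<search> query for the intermediate fact </search>
<result> passages returned by the retrieval environment (excluded from the training loss) </result>
<think> The retrieved passages resolve the first hop; now I can search for the final fact. </think>
<search> query for the final fact </search>
<result> passages returned by the retrieval environment </result>
<answer> final answer, scored against the reference with F1 </answer>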

Experimental evaluation confirms the robustness of ReSearch. When assessed on multi-hop question-answering benchmarks, including HotpotQA, 2WikiMultiHopQA, MuSiQue, and Bamboogle, ReSearch consistently outperformed baseline methods. Specifically, ReSearch-Qwen-32B-Instruct achieved improvements ranging between 8.9% and 22.4% in performance compared to established baselines. Notably, these advancements were achieved despite the model being trained exclusively on a single dataset, underscoring its strong generalization capabilities. Further analyses demonstrated that models gradually increased their reliance on iterative search operations throughout training, indicative of enhanced reasoning proficiency. A detailed case study illustrated the model’s capacity to identify suboptimal search queries, reflect on its reasoning steps, and implement corrective actions autonomously.

In summary, ReSearch presents a significant methodological advancement in training LLMs to seamlessly integrate reasoning with external search mechanisms via reinforcement learning. By eliminating dependency on supervised reasoning data, this framework effectively addresses critical scalability and adaptability issues inherent in multi-hop reasoning scenarios. Its capability for self-reflection and correction enhances its practical applicability in complex, realistic contexts. Future research directions may further extend this reinforcement learning-based framework to broader applications and incorporate additional external knowledge resources.


Introducing AWS MCP Servers for code assistants (Part 1)

We’re excited to announce the open source release of AWS MCP Servers for code assistants — a suite of specialized Model Context Protocol (MCP) servers that bring Amazon Web Services (AWS) best practices directly to your development workflow. Our specialized AWS MCP servers combine deep AWS knowledge with agentic AI capabilities to accelerate development across key areas. Each AWS MCP Server focuses on a specific domain of AWS best practices, working together to provide comprehensive guidance throughout your development journey.
This post is the first in a series covering AWS MCP Servers. In this post, we walk through how these specialized MCP servers can dramatically reduce your development time while incorporating security controls, cost optimizations, and AWS Well-Architected best practices into your code. Whether you’re an experienced AWS developer or just getting started with cloud development, you’ll discover how to use AI-powered coding assistants to tackle common challenges such as complex service configurations, infrastructure as code (IaC) implementation, and knowledge base integration. By the end of this post, you’ll understand how to start using AWS MCP Servers to transform your development workflow and deliver better solutions, faster.
If you want to get started right away, skip ahead to the section “From concept to working code in minutes.”
AI is transforming how we build software, creating opportunities to dramatically accelerate development while improving code quality and consistency. Today’s AI assistants can understand complex requirements, generate production-ready code, and help developers navigate technical challenges in real time. This AI-driven approach is particularly valuable in cloud development, where developers need to orchestrate multiple services while maintaining security, scalability, and cost-efficiency.
Developers need code assistants that understand the nuances of AWS services and best practices. Specialized AI agents can address these needs by:

Providing contextual guidance on AWS service selection and configuration
Helping ensure compliance with security best practices and regulatory requirements
Promoting efficient resource utilization and cost-effective solutions
Automating repetitive implementation tasks with AWS-specific patterns

This approach means developers can focus on innovation while AI assistants handle the undifferentiated heavy lifting of coding. Whether you’re using Amazon Q, Amazon Bedrock, or other AI tools in your workflow, AWS MCP Servers complement and enhance these capabilities with deep AWS-specific knowledge to help you build better solutions faster.
Model Context Protocol (MCP) is a standardized open protocol that enables seamless interaction between large language models (LLMs), data sources, and tools. This protocol allows AI assistants to use specialized tooling and to access domain-specific knowledge by extending the model’s capabilities beyond its built-in knowledge—all while keeping sensitive data local. Through MCP, general-purpose LLMs can now seamlessly access relevant knowledge beyond initial training data and be effectively steered towards desired outputs by incorporating specific context and best practices.
Accelerate building on AWS
What if your AI assistant could instantly access deep AWS knowledge, understanding every AWS service, best practice, and architectural pattern? With MCP, we can transform general-purpose LLMs into AWS specialists by connecting them to specialized knowledge servers. This opens up exciting new possibilities for accelerating cloud development while maintaining security and following best practices.
Build on AWS in a fraction of the time, with best practices automatically applied from the first line of code. Skip hours of documentation research and immediately access ready-to-use patterns for complex services such as Amazon Bedrock Knowledge Bases. Our MCP Servers will help you write well-architected code from the start, implement AWS services correctly the first time, and deploy solutions that are secure, observable, and cost-optimized by design. Transform how you build on AWS today.

Enforce AWS best practices automatically – Write well-architected code from the start with built-in security controls, proper observability, and optimized resource configurations
Cut research time dramatically – Stop spending hours reading documentation. Our MCP Servers provide contextually relevant guidance for implementing AWS services correctly, addressing common pitfalls automatically
Access ready-to-use patterns instantly – Use pre-built AWS CDK constructs, Amazon Bedrock Agents schema generators, and Amazon Bedrock Knowledge Bases integration templates that follow AWS best practices from the start
Optimize cost proactively – Prevent over-provisioning as you design your solution by getting cost-optimization recommendations and generating a comprehensive cost report to analyze your AWS spending before deployment

To turn this vision into reality and make AWS development faster, more secure, and more efficient, we’ve created AWS MCP Servers: a suite of specialized MCP servers that bring AWS best practices directly to your development workflow. Each server combines deep AWS knowledge with AI capabilities, focuses on a specific domain of AWS best practices, and works with the others to provide comprehensive guidance throughout your development journey.
Overview of domain-specific MCP Servers for AWS development
Our specialized MCP Servers are designed to cover distinct aspects of AWS development, each bringing deep knowledge to specific domains while working in concert to deliver comprehensive solutions:

Core – The foundation server that provides AI processing pipeline capabilities and serves as a central coordinator. It helps provide clear plans for building AWS solutions and can federate to other MCP servers as needed.
AWS Cloud Development Kit (AWS CDK) – Delivers AWS CDK knowledge with tools for implementing best practices, security configurations with cdk-nag, Powertools for AWS Lambda integration, and specialized constructs for generative AI services. It makes sure infrastructure as code (IaC) follows AWS Well-Architected principles from the start.
Amazon Bedrock Knowledge Bases – Enables seamless access to Amazon Bedrock Knowledge Bases so developers can query enterprise knowledge with natural language, filter results by data source, and use reranking for improved relevance.
Amazon Nova Canvas – Provides image generation capabilities using Amazon Nova Canvas through Amazon Bedrock, enabling the creation of visuals from text prompts and color palettes—perfect for mockups, diagrams, and UI design concepts.
Cost – Analyzes AWS service costs and generates comprehensive cost reports, helping developers understand the financial implications of their architectural decisions and optimize for cost-efficiency.

Prerequisites
To complete the solution, you need to have the following prerequisites in place:

uv package manager
Python, installed using uv python install 3.13 (see the example commands after this list)
AWS credentials with appropriate permissions
An MCP-compatible LLM client (such as Anthropic’s Claude for Desktop, Cline, Amazon Q CLI, or Cursor)
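If you don’t already have uv, the following commands are one way to set it up. The install script URL reflects uv’s documented installer for macOS and Linux and may differ for your platform, so check the uv documentation if in doubt:

# Install the uv package manager (macOS/Linux install script; Windows has a documented PowerShell installer)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install Python 3.13 with uv, as listed in the prerequisites
uv python install 3.13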

From concept to working code in minutes
You can download the AWS MCP Servers from GitHub or install them from PyPI. Here’s how to get started using your favorite code assistant with MCP support.
To configure the MCP servers, add the following to your MCP client’s MCP settings file:

{
  "mcpServers": {
    "awslabs.core-mcp-server": {
      "command": "uvx",
      "args": [
        "awslabs.core-mcp-server@latest"
      ],
      "env": {
        "FASTMCP_LOG_LEVEL": "ERROR",
        "MCP_SETTINGS_PATH": "path to your mcp server settings"
      },
      "autoApprove": [],
      "disabled": false
    },
    "awslabs.bedrock-kb-retrieval-mcp-server": {
      "command": "uvx",
      "args": ["awslabs.bedrock-kb-retrieval-mcp-server@latest"],
      "env": {
        "AWS_PROFILE": "your-aws-profile",
        "AWS_REGION": "us-east-1"
      }
    },
    "awslabs.cdk-mcp-server": {
      "command": "uvx",
      "args": ["awslabs.cdk-mcp-server@latest"],
      "env": {
        "FASTMCP_LOG_LEVEL": "ERROR"
      }
    },
    "awslabs.cost-analysis-mcp-server": {
      "command": "uvx",
      "args": ["awslabs.cost-analysis-mcp-server@latest"],
      "env": {
        "AWS_PROFILE": "your-aws-profile",
        "FASTMCP_LOG_LEVEL": "ERROR"
      }
    },
    "awslabs.nova-canvas-mcp-server": {
      "command": "uvx",
      "args": ["awslabs.nova-canvas-mcp-server@latest"],
      "env": {
        "AWS_PROFILE": "your-aws-profile",
        "AWS_REGION": "us-east-1"
      }
    }
  }
}

AWS MCP Servers in action
Here’s how AWS MCP servers transform the development experience:
Developer: “I need to build an AI-powered chatbot using Amazon Bedrock that can answer questions from our company’s knowledge base. I also want to add a tool for the chatbot to call our internal API.”
Core: “I’ll help you build an Amazon Bedrock Knowledge Bases chatbot with API integration. Let’s create an architecture that uses Amazon Bedrock Agents with a custom action group to call your internal API.”
Core generates a comprehensive architecture diagram showing the knowledge base integration, Amazon Bedrock Agents configuration with action groups, API connectivity, and data flow between components.
AWS CDK: “Here’s the infrastructure code for your chatbot with the Amazon Bedrock Agents action group. I’ve included proper IAM roles, security controls, and Lambda Powertools for observability.”

// CDK code with Well-Architected best practices
const knowledgeBase = new BedrockKnowledgeBase(this, 'CompanyKB', {
  embeddingModel: BedrockFoundationModel.TITAN_EMBED_TEXT_V1,
  vectorStore: new OpenSearchServerlessVectorStore(this, 'VectorStore', {
    encryption: OpenSearchEncryption.KMS,
    ebs: OpenSearchEbsOptions.provisioned(100, OpenSearchVolumeType.GP3)
  })
});

// Action group for API integration
const actionGroupSchema = new BedrockAgentSchema(this, 'ApiActionSchema', {
  schemaPath: 'schemas/api-action-schema.json',
  // Schema will be auto-generated from your Lambda code
  generateFromLambda: {
    lambdaPath: 'lambda/api-actions.py',
    outputPath: 'schemas/api-action-schema.json'
  }
});

const apiActionGroup = new BedrockAgentActionGroup(this, 'ApiActionGroup', {
  actionGroupName: 'InternalApiActions',
  description: 'Actions to interact with internal API',
  actionGroupExecutor: new BedrockAgentLambdaExecutor(apiActionsLambda),
  actionGroupSchema: actionGroupSchema
});

The CDK MCP Server generates complete AWS CDK code to deploy the entire solution. It automatically runs cdk-nag to identify potential security issues and provides remediation steps for each finding, making sure that the infrastructure follows AWS Well-Architected best practices.
Amazon Bedrock Knowledge Bases retrieval: “I’ve configured the optimal settings for your knowledge base queries, including proper reranking for improved relevance.”
Amazon Bedrock Knowledge Bases MCP Server demonstrates how to structure queries to the knowledge base for maximum relevance, provides sample code for filtering by data source, and shows how to integrate the knowledge base responses with the chatbot interface.
Amazon Nova Canvas: “To enhance your chatbot’s capabilities, I’ve created visualizations that can be generated on demand when users request data explanations.”
Amazon Nova Canvas MCP server generates sample images showing how Amazon Nova Canvas can create charts, diagrams, and visual explanations based on knowledge base content, making complex information more accessible to users.
Cost Analysis: “Based on your expected usage patterns, here’s the estimated monthly cost breakdown and optimization recommendations.”
The Cost Analysis MCP Server generates a detailed cost analysis report showing projected expenses for each AWS service, identifies cost optimization opportunities such as reserved capacity for Amazon Bedrock, and provides specific recommendations to reduce costs without impacting performance.
With AWS MCP Servers, what would typically take days of research and implementation is completed in minutes, with better quality, security, and cost-efficiency than manual development in that same time.
Best practices for MCP-assisted development
To maximize the benefits of MCP-assisted development while maintaining security and code quality, developers should follow these essential guidelines:

Always review generated code for security implications before deployment
Use MCP Servers as accelerators, not replacements for developer judgment and expertise
Keep MCP Servers updated with the latest AWS security best practices
Follow the principle of least privilege when configuring AWS credentials
Run security scanning tools on generated infrastructure code

Coming up in the series
This post introduced the foundations of AWS MCP Servers and how they accelerate AWS development through specialized, AWS-specific MCP Servers. In upcoming posts, we’ll dive deeper into:

Detailed walkthroughs of each MCP server’s capabilities
Advanced patterns for integrating AWS MCP Servers into your development workflow
Real-world case studies showing AWS MCP Servers’ impact on development velocity
How to extend AWS MCP Servers with your own custom MCP servers

Stay tuned to learn how AWS MCP Servers can transform your specific AWS development scenarios and help you build better solutions faster. Visit our GitHub repository or the PyPI packages to explore example implementations and get started today.

About the Authors
Jimin Kim is a Prototyping Architect on the AWS Prototyping and Cloud Engineering (PACE) team, based in Los Angeles. With specialties in Generative AI and SaaS, she loves helping her customers succeed in their business. Outside of work, she cherishes moments with her wife and three adorable calico cats.
Pranjali Bhandari is part of the Prototyping and Cloud Engineering (PACE) team at AWS, based in the San Francisco Bay Area. She specializes in Generative AI, distributed systems, and cloud computing. Outside of work, she loves exploring diverse hiking trails, biking, and enjoying quality family time with her husband and son.
Laith Al-Saadoon is a Principal Prototyping Architect on the Prototyping and Cloud Engineering (PACE) team. He builds prototypes and solutions using generative AI, machine learning, data analytics, IoT & edge computing, and full-stack development to solve real-world customer challenges. In his personal time, Laith enjoys the outdoors–fishing, photography, drone flights, and hiking.
Paul Vincent is a Principal Prototyping Architect on the AWS Prototyping and Cloud Engineering (PACE) team. He works with AWS customers to bring their innovative ideas to life. Outside of work, he loves playing drums and piano, talking with others through Ham radio, all things home automation, and movie nights with the family.
Justin Lewis leads the Emerging Technology Accelerator at AWS. Justin and his team help customers build with emerging technologies like generative AI by providing open source software examples to inspire their own innovation. He lives in the San Francisco Bay Area with his wife and son.
Anita Lewis is a Technical Program Manager on the AWS Emerging Technology Accelerator team, based in Denver, CO. She specializes in helping customers accelerate their innovation journey with generative AI and emerging technologies. Outside of work, she enjoys competitive pickleball matches, perfecting her golf game, and discovering new travel destinations.

Harness the power of MCP servers with Amazon Bedrock Agents

AI agents extend large language models (LLMs) by interacting with external systems, executing complex workflows, and maintaining contextual awareness across operations. Amazon Bedrock Agents enables this functionality by orchestrating foundation models (FMs) with data sources, applications, and user inputs to complete goal-oriented tasks through API integration and knowledge base augmentation. However, in the past, connecting these agents to diverse enterprise systems has created development bottlenecks, with each integration requiring custom code and ongoing maintenance—a standardization challenge that slows the delivery of contextual AI assistance across an organization’s digital ecosystem. This is a problem that you can solve by using Model Context Protocol (MCP), which provides a standardized way for LLMs to connect to data sources and tools.
Today, MCP is providing agents standard access to an expanding list of accessible tools that you can use to accomplish a variety of tasks. In time, MCP can promote better discoverability of agents and tools through marketplaces, enabling agents to share context and have common workspaces for better interaction, and scale agent interoperability across the industry.
In this post, we show you how to build an Amazon Bedrock agent that uses MCP to access data sources to quickly build generative AI applications. Using Amazon Bedrock Agents, your agent can be assembled on the fly with MCP-based tools as in this example:

InlineAgent(
    foundation_model="us.anthropic.claude-3-5-sonnet-20241022-v2:0",
    instruction="You are a friendly assistant for resolving user queries",
    agent_name="SampleAgent",
    action_groups=[
        ActionGroup(
            name="SampleActionGroup",
            mcp_clients=[mcp_client_1, mcp_client_2],
        )
    ],
).invoke(input_text="Convert 11am from NYC time to London time")

We showcase an example of building an agent to understand your Amazon Web Services (AWS) spend by connecting to AWS Cost Explorer, Amazon CloudWatch, and Perplexity AI through MCP. You can use the code referenced in this post to connect your agents to other MCP servers to address challenges for your business. We envision a world where agents have access to an ever-growing list of MCP servers that they can use for accomplishing a wide variety of tasks.
Model Context Protocol
Developed by Anthropic as an open protocol, MCP provides a standardized way to connect AI models to virtually any data source or tool. Using a client-server architecture, MCP enables developers to expose their data through lightweight MCP servers while building AI applications as MCP clients that connect to these servers. Through this architecture, MCP enables users to build more powerful, context-aware AI agents that can seamlessly access the information and tools they need. Whether you’re connecting to external systems or internal data stores or tools, you can now use MCP to interface with all of them in the same way. The client-server architecture of MCP enables your agent to access new capabilities as the MCP server updates without requiring any changes to the application code.
MCP architecture
MCP uses a client-server architecture that contains the following components and is shown in the following figure:

Host: An MCP host is a program or AI tool that requires access to data through the MCP protocol, such as Claude Desktop, an integrated development environment (IDE), or any other AI application.
Client: Protocol clients that maintain one-to-one connections with servers.
Server: Lightweight programs that expose specific capabilities through the standardized Model Context Protocol.
Local data sources: Your databases, local data sources, and services that MCP servers can securely access.
Remote services: External systems available over the internet through APIs that MCP servers can connect to.

Let’s walk through how to set up Amazon Bedrock agents that take advantage of MCP servers.
Using MCP with Amazon Bedrock agents
In this post, we provide a step-by-step guide for how to connect your favorite MCP servers with Amazon Bedrock agents as Action Groups that an agent can use to accomplish tasks provided by the user. The AgentInlineSDK provides a straightforward way to create inline agents, containing a built-in MCP client implementation that provides you with direct access to tools delivered by an MCP server.
As part of creating an agent, the developer creates an MCP client specific to each MCP server that requires agent communication. When invoked, the agent determines which tools are needed for the user’s task; if MCP server tools are required, it uses the corresponding MCP client to request tool execution from that server. The user code doesn’t need to be aware of the MCP protocol because that’s handled by the MCP client provided by the InlineAgent code repository.
To orchestrate this workflow, you take advantage of the return control capability of Amazon Bedrock Agents. The following diagram illustrates the end-to-end flow of an agent handling a request that uses two tools. In the first flow, a Lambda-based action is taken, and in the second, the agent uses an MCP server.

Use case: transform how you manage your AWS spend across different AWS services including Amazon Bedrock
To show how an Amazon Bedrock agent can use MCP servers, let’s walk through a sample use case. Imagine asking questions like “Help me understand my Bedrock spend over the last few weeks” or “What were my EC2 costs last month across regions and instance types?” and getting a human-readable analysis of the data instead of raw numbers on a dashboard. The system interprets your intent and delivers precisely what you need—whether that’s detailed breakdowns, trend analyses, visualizations, or cost-saving recommendations. This is useful because what you’re interested in is insights rather than data. You can accomplish this using two MCP servers: a custom-built MCP server for retrieving the AWS spend data and an open source MCP server from Perplexity AI to interpret the data. You add these two MCP servers as action groups in an inline Amazon Bedrock agent. This gives you an AI agent that can transform the way you manage your AWS spend. All the code for this post is available in the GitHub repository.
Let’s walk through how this agent is created using inline agents. You can use inline agents to define and configure Amazon Bedrock agents dynamically at runtime. They provide greater flexibility and control over agent capabilities, enabling users to specify FMs, instructions, action groups, guardrails, and knowledge bases as needed without relying on pre-configured control plane settings. It’s worth noting that you can also orchestrate this behavior without inline agents by using RETURN_CONTROL with the InvokeAgent API.
MCP components in Amazon Bedrock Agents

Host: This is the Amazon Bedrock inline agent. This agent adds MCP clients as action groups that can be invoked through RETURN_CONTROL when the user asks an AWS spend-related question.
Client: You create two clients that establish one-to-one connections with their respective servers: a cost explorer client with specific cost server parameters and a Perplexity AI client with Perplexity server parameters.
Servers: You create two MCP servers that each run locally on your machine and communicate to your application over standard input/output (alternatively, you could also configure the client to talk to remote MCP servers).

An MCP server that retrieves the AWS spend data from Cost Explorer and Amazon CloudWatch Logs (for Amazon Bedrock model invocation log data).
A Perplexity AI MCP server to interpret the AWS spend data.

Data sources: The MCP servers talk to remote data sources such as Cost Explorer API, CloudWatch Logs and the Perplexity AI search API.

Prerequisites
You need the following prerequisites to get started implementing the solution in this post:

An AWS account
Familiarity with FMs and Amazon Bedrock
Install AWS Command Line Interface (AWS CLI) and set up credentials
Python 3.11 or later
AWS Cloud Development Kit (AWS CDK) CLI
Enable model access for Anthropic’s Claude 3.5 Sonnet v2
Your AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, which you set as environment variables for the servers
Docker installed and running on your computer (the two MCP servers run as Docker containers)

The MCP servers run locally on your computer and need to access AWS services and the Perplexity API. You can read more about AWS credentials in Manage access keys for IAM users. Make sure that your credentials include AWS Identity and Access Management (IAM) read access to Cost Explorer and CloudWatch. You can do this by using the AWSBillingReadOnlyAccess and CloudWatchReadOnlyAccess managed IAM policies. You can get the Perplexity API key from the Perplexity Sonar API page.
Steps to run
With the prerequisites in place, you’re ready to implement the solution.

Navigate to the InlineAgent GitHub repository.
Follow the setup steps.
Navigate to the cost_explorer_agent folder. This folder contains the code for this post.

cd examples/mcp/cost_explorer_agent

Create a .env file in the cost_explorer_agent directory, using the provided example file as a template.

AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_REGION=
BEDROCK_LOG_GROUP_NAME=
PERPLEXITY_API_KEY=

Build the aws-cost-explorer-mcp server:

git clone https://github.com/aarora79/aws-cost-explorer-mcp-server.git
cd aws-cost-explorer-mcp-server/
docker build -t aws-cost-explorer-mcp .

You’re now ready to create an agent that can invoke these MCP servers to provide insights into your AWS spend. You can do this by running the python main.py command. The output will look like the following example. The agent connects with the two MCP servers and accesses their respective tools. Next, the agent lays out a plan to use specific tools sequentially, uses code interpreter to generate a chart showing cost distribution, and uses Cost Explorer and Perplexity API to provide information about Amazon Bedrock and the AWS account’s cost expenditure.

Connected to server with tools: ['get_bedrock_daily_usage_stats', 'get_bedrock_hourly_usage_stats', 'get_ec2_spend_last_day', 'get_detailed_breakdown_by_day']
Connected to server with tools: ['perplexity_ask']

Thought: Let me break this down into parts:
1. First, I’ll ask about Amazon Bedrock using Perplexity
2. Then I’ll get the cost breakdown for the last 7 days
3. Finally, I’ll create a visualization of the spending data

Tool use: perplexity_ask with these inputs: messages[[{role=user, content=What is Amazon Bedrock? Provide a concise explanation.}] (array)]

Tool use: get_detailed_breakdown_by_day with these inputs: days[7 (integer)]

Thought: Let me analyze the data and create a visualization of the top services by cost. I’ll use Python to process the data and create a bar chart.

Code interpreter:
Generated code
```
import pandas as pd
import matplotlib.pyplot as plt

```

Code interpreter output:
aws_costs.png (image/png)
Directory output created successfully

Thought: Now I can provide a comprehensive answer about Amazon Bedrock and the cost analysis. Amazon Bedrock is a fully managed service that provides access to foundation models (FMs) from leading AI companies through a single API. Based on the cost analysis of the last 7 days, your top spending services were:

1. Amazon Neptune: $24
2. Amazon SageMaker: $567
3. EC2 (Compute + Other): $454
4. Amazon OpenSearch Service: $34
5. Amazon Bedrock: $1235
6. Amazon Q: $178

Implementation details
Now that you understand the output produced by an agent, let’s lift the curtain and review some of the important pieces of code that produce the output.

Creating MCP clients: config.py defines the two MCP clients that talk to your two MCP servers.

Server parameters are defined for the cost explorer and Perplexity clients. The solution uses StdioServerParameters, which configures how the client should communicate over standard input/output (stdio) streams. This contains the parameters required by the server to access the required data through APIs.

# Cost server parameters
cost_server_params = StdioServerParameters(
    command="/usr/local/bin/docker",
    args=[
        "run",
        "-i",
        "--rm",
        "-e",
        "AWS_ACCESS_KEY_ID",
        "-e",
        "AWS_SECRET_ACCESS_KEY",
        "-e",
        "AWS_REGION",
        "-e",
        "BEDROCK_LOG_GROUP_NAME",
        "-e",
        "stdio",
        "aws-cost-explorer-mcp:latest",
    ],
    env={
        "AWS_ACCESS_KEY_ID": AWS_ACCESS_KEY_ID,
        "AWS_SECRET_ACCESS_KEY": AWS_SECRET_ACCESS_KEY,
        "AWS_REGION": AWS_REGION,
        "BEDROCK_LOG_GROUP_NAME": BEDROCK_LOG_GROUP_NAME,
    },
)

# Perplexity server parameters
perplexity_server_params = StdioServerParameters(
    command="/usr/local/bin/docker",
    args=["run", "-i", "--rm", "-e", "PERPLEXITY_API_KEY", "mcp/perplexity-ask"],
    env={"PERPLEXITY_API_KEY": PERPLEXITY_API_KEY},
)

In main.py, the MCP server parameters are imported and used to create your two MCP clients.

cost_explorer_mcp_client = await MCPClient.create(server_params=cost_server_params)
perplexity_mcp_client = await MCPClient.create(server_params=perplexity_server_params)

Configure agent action group: main.py creates the action group that combines the MCP clients into a single interface that the agent can access. This enables the agent to ask your application to invoke either of these MCP servers as needed through return of control.

# Create action group with both MCP clients
cost_action_group = ActionGroup(
    name="CostActionGroup",
    mcp_clients=[cost_explorer_mcp_client, perplexity_mcp_client]
)

Inline agent creation: The inline agent can be created with the following specifications:

Foundation model: Configure your choice of FM to power your agent. This can be any model provided on Amazon Bedrock. This example uses Anthropic’s Claude 3.5 Sonnet model.
Agent instruction: Provide instructions to your agent that contain the guidance and steps for orchestrating responses to user queries. These instructions anchor the agent’s approach to handling various types of queries.
Agent name: Name of your agent.
Action groups: Define the action groups that your agent can access. These can include single or multiple action groups, with each group having access to multiple MCP clients or AWS Lambda functions. As an option, you can configure your agent to use Code Interpreter to generate, run, and test code for your application.

# Create and invoke the inline agent
await InlineAgent(
    foundation_model="us.anthropic.claude-3-5-sonnet-20241022-v2:0",
    instruction="""You are a friendly assistant that is responsible for resolving user queries.

    You have access to search, cost tool and code interpreter.

    """,
    agent_name="cost_agent",
    action_groups=[
        cost_action_group,
        {
            "name": "CodeInterpreter",
            "builtin_tools": {
                "parentActionGroupSignature": "AMAZON.CodeInterpreter"
            },
        },
    ],
).invoke(
    input_text="<user-query-here>"
)

You can use this example to build an inline agent on Amazon Bedrock that establishes connections with different MCP servers and groups their clients into a single action group for the agent to access.
Conclusion
The Anthropic MCP protocol offers a standardized way of connecting FMs to data sources, and now you can use this capability with Amazon Bedrock Agents. In this post, you saw an example of combining the power of Amazon Bedrock and MCP to build an application that offers a new perspective on understanding and managing your AWS spend.
Organizations can now offer their teams natural, conversational access to complex financial data while enhancing responses with contextual intelligence from sources like Perplexity. As AI continues to evolve, the ability to securely connect models to your organization’s critical systems will become increasingly valuable. Whether you’re looking to transform customer service, streamline operations, or gain deeper business insights, the Amazon Bedrock and MCP integration provides a flexible foundation for your next AI innovation. You can dive deeper on this MCP integration by exploring our code samples.
Here are some examples of what you can build by connecting your Amazon Bedrock Agents to MCP servers:

A multi-data source agent that retrieves data from different data sources such as Amazon Bedrock Knowledge Bases, SQLite, or even your local filesystem.
A developer productivity assistant agent that integrates with Slack and GitHub MCP servers.
A machine learning experiment tracking agent that integrates with the Opik MCP server from Comet ML for managing, visualizing, and tracking machine learning experiments directly within development environments.

What business challenges will you tackle with these powerful new capabilities?

About the authors
Mark Roy is a Principal Machine Learning Architect for AWS, helping customers design and build generative AI solutions. His focus since early 2023 has been leading solution architecture efforts for the launch of Amazon Bedrock, the flagship generative AI offering from AWS for builders. Mark’s work covers a wide range of use cases, with a primary interest in generative AI, agents, and scaling ML across the enterprise. He has helped companies in insurance, financial services, media and entertainment, healthcare, utilities, and manufacturing. Prior to joining AWS, Mark was an architect, developer, and technology leader for over 25 years, including 19 years in financial services. Mark holds six AWS certifications, including the ML Specialty Certification.
Eashan Kaushik is a Specialist Solutions Architect AI/ML at Amazon Web Services. He is driven by creating cutting-edge generative AI solutions while prioritizing a customer-centric approach to his work. Before this role, he obtained an MS in Computer Science from NYU Tandon School of Engineering. Outside of work, he enjoys sports, lifting, and running marathons.
Madhur Prashant is an AI and ML Solutions Architect at Amazon Web Services. He is passionate about the intersection of human thinking and generative AI. His interests lie in generative AI, specifically building solutions that are helpful and harmless, and most of all optimal for customers. Outside of work, he loves doing yoga, hiking, spending time with his twin, and playing the guitar.
Amit Arora is an AI and ML Specialist Architect at Amazon Web Services, helping enterprise customers use cloud-based machine learning services to rapidly scale their innovations. He is also an adjunct lecturer in the MS data science and analytics program at Georgetown University in Washington, D.C.
Andy Palmer is a Director of Technology for AWS Strategic Accounts. His teams provide Specialist Solutions Architecture skills across a number of speciality domain areas, including AIML, generative AI, data and analytics, security, network, and open source software. Andy and his team have been at the forefront of guiding our most advanced customers through their generative AI journeys and helping to find ways to apply these new tools to both existing problem spaces and net new innovations and product experiences.

Generate compliant content with Amazon Bedrock and ConstitutionalChain

Generative AI has emerged as a powerful tool for content creation, offering key benefits that can significantly enhance the efficiency and effectiveness of content production processes such as creating marketing materials, generating images, and moderating content. Constitutional AI and LangGraph‘s reflection mechanisms represent two complementary approaches to ensuring AI systems behave ethically: Anthropic embeds principles during training, whereas LangGraph applies them at inference time through reflection and self-correction mechanisms. By using LangGraph’s Constitutional AI, content creators can streamline their workflow while maintaining high standards of user-defined compliance and ethical integrity. This method not only reduces the need for extensive human oversight but also enhances the transparency and accountability of the AI content generation process.
In this post, we explore practical strategies for using Constitutional AI to produce compliant content efficiently and effectively, using Amazon Bedrock and LangGraph to build a ConstitutionalChain for rapid content creation in highly regulated industries like finance and healthcare. Although AI offers significant productivity benefits, maintaining compliance with strict regulations is crucial, and manual validation of AI-generated content for regulatory adherence can be time-consuming and challenging. We also provide an overview of how Insagic, a Publicis Groupe company, integrated this concept into their existing healthcare marketing workflow using Amazon Bedrock. Insagic is a next-generation insights and advisory business that combines data, design, and dialogues to deliver actionable insights and transformational intelligence for healthcare marketers. It uses expertise from data scientists, behavior scientists, and strategists to drive better outcomes in the healthcare industry.
Understanding Constitutional AI
Constitutional AI is designed to align large language models (LLMs) with human values and ethical considerations. It works by integrating a set of predefined rules, principles, and constraints into the LLM’s core architecture and training process. This approach makes sure that the LLM operates within specified ethical and legal parameters, much like how a constitution governs a nation’s laws and actions.
The key benefits of Constitutional AI for content creation include:

Ethical alignment – Content generated using Constitutional AI is inherently aligned with predefined ethical standards
Legal compliance – The LLM is designed to operate within legal frameworks, reducing the risk of producing non-compliant content
Transparency – The principles guiding the LLM’s decision-making process are clearly defined and can be inspected
Reduced human oversight – By embedding ethical guidelines into the LLM, the need for extensive human review is significantly reduced

Let’s explore how you can harness the power of Constitutional AI to generate compliant content for your organization.
Solution overview
For this solution, we use Amazon Bedrock Knowledge Bases to store a repository of healthcare documents. We employ a Retrieval Augmented Generation (RAG) approach, first retrieving relevant context and then synthesizing an answer from it, to generate articles based on the repository. We then use the open source orchestration framework LangGraph and ConstitutionalChain to generate, critique, and review prompts in an Amazon SageMaker notebook and develop an agentic workflow to generate compliant content. The following diagram illustrates this architecture.

This implementation demonstrates a sophisticated agentic workflow that not only generates responses based on a knowledge base but also employs a reflection technique to examine its outputs through ethical principles, allowing it to refine and improve its outputs. We upload a sample set of mental health documents to Amazon Bedrock Knowledge Bases and use those documents to write an article on mental health using a RAG-based approach. Later, we define a constitutional principle with a custom Diversity, Equity, and Inclusion (DEI) principle, specifying how to critique and revise responses for inclusivity.
Prerequisites
To deploy the solution, you need the following prerequisites:

An AWS account
Appropriate AWS Identity and Access Management (IAM) permissions to access an Amazon Simple Storage Service (Amazon S3) bucket, create Amazon Bedrock knowledge bases, and create a SageMaker notebook instance

Create an Amazon Bedrock knowledge base
To demonstrate this capability, we download a mental health article from the following GitHub repo and store it in Amazon S3. We then use Amazon Bedrock Knowledge Bases to index the articles. By default, Amazon Bedrock uses Amazon OpenSearch Serverless as a vector database. For full instructions to create an Amazon Bedrock knowledge base with Amazon S3 as the data source, see Create a knowledge base in Amazon Bedrock Knowledge Bases.

On the Amazon Bedrock console, create a new knowledge base.
Provide a name for your knowledge base and create a new IAM service role.
Choose Amazon S3 as the data source and provide the S3 bucket storing the mental health article.
Choose Amazon Titan Text Embeddings v2 as the embeddings model and OpenSearch Serverless as the vector store.
Choose Create Knowledge Base.

Import statements and set up an Amazon Bedrock client
Follow the instructions provided in the README file in the GitHub repo. Clone the GitHub repo to make a local copy. We recommend running this code in a SageMaker JupyterLab environment. The following code imports the necessary libraries, including Boto3 for AWS services, LangChain components, and Streamlit. It sets up an Amazon Bedrock client and configures Anthropic’s Claude 3 Haiku model with specific parameters.

import boto3
from langchain_aws import ChatBedrock

bedrock_runtime = boto3.client(service_name="bedrock-runtime", region_name="us-east-1")
llm = ChatBedrock(client=bedrock_runtime, model_id="anthropic.claude-3-haiku-20240307-v1:0")
.....

Define Constitutional AI components
Next, we define a Critique class to structure the output of the critique process. Then we create prompt templates for critique and revision. Lastly, we set up chains using LangChain for generating responses, critiques, and revisions.

# LangChain Constitutional chain migration to LangGraph

# Typical imports for this snippet
from typing import Annotated, List
from typing_extensions import TypedDict
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

class Critique(TypedDict):
    """Generate a critique, if needed."""

    critique_needed: Annotated[bool, ..., "Whether or not a critique is needed."]
    critique: Annotated[str, ..., "If needed, the critique."]

critique_prompt = ChatPromptTemplate.from_template(
    "Critique this response according to the critique request. "

)

revision_prompt = ChatPromptTemplate.from_template(
    "Revise this response according to the critique and revision request.\n\n"
    ....
)

chain = llm | StrOutputParser()
critique_chain = critique_prompt | llm.with_structured_output(Critique)
revision_chain = revision_prompt | llm | StrOutputParser()

Define a State class and refer to the Amazon Bedrock Knowledge Bases retriever
We define a LangGraph State class to manage the conversation state, including the query, principles, responses, and critiques:

# LangGraph State

class State(TypedDict):
    query: str
    constitutional_principles: List[ConstitutionalPrinciple]

Next, we set up an Amazon Bedrock Knowledge Bases retriever to extract the relevant information. We refer to the Amazon Bedrock knowledge base we created earlier to create an article based on mental health documents. Make sure to update the knowledge base ID in the following code with the knowledge base you created in previous steps:

# -----------------------------------------------------------------
# Amazon Bedrock Knowledge Base retriever

from langchain_aws.retrievers import AmazonKnowledgeBasesRetriever

retriever = AmazonKnowledgeBasesRetriever(
    knowledge_base_id="W3NMIJXLUE",  # Change this to your knowledge base ID

)

Create LangGraph nodes and a LangGraph graph along with constitutional principles
The next section of code integrates graph-based workflow orchestration, ethical principles, and a user-friendly interface to create a sophisticated Constitutional AI model. The following diagram illustrates the workflow.

It uses a StateGraph to manage the flow between RAG and critique/revision nodes, incorporating a custom DEI principle to guide the LLM’s responses. The system is presented through a Streamlit application, which provides an interactive chat interface where users can input queries and view the LLM’s initial responses, critiques, and revised answers. The application also features a sidebar displaying a graph visualization of the workflow and a description of the applied ethical principle. This comprehensive approach makes sure that the LLM’s outputs are not only knowledge-based but also ethically aligned by using customizable constitutional principles that guide a reflection flow (critique and revise), all while maintaining a user-friendly experience with features like chat history management and a clear chat option.
Streamlit application
The Streamlit application component of this code creates an interactive and user-friendly interface for the Constitutional AI model. It sets up a side pane that displays a visualization of the LLM’s workflow graph and provides a description of the DEI principle being applied. The main interface features a chat section where users can input their queries and view the LLM’s responses.

# ------------------------------------------------------------------------
# Streamlit App

# Clear chat history function
def clear_screen():
    st.session_state.messages = [{"role": "assistant", "content": "How may I assist you today?"}]

with st.sidebar:
    st.subheader('Constitutional AI Demo')
    .....
    ConstitutionalPrinciple(
        name="DEI Principle",
        critique_request="Analyze the content for any lack of diversity, equity, or inclusion. Identify specific instances where the text could be more inclusive or representative of diverse perspectives.",
        revision_request="Rewrite the content by incorporating critiques to be more diverse, equitable, and inclusive. Ensure representation of various perspectives and use inclusive language throughout."
    )
    """)
    st.button('Clear Screen', on_click=clear_screen)

# Store LLM generated responses
if "messages" not in st.session_state.keys():
    st.session_state.messages = [{"role": "assistant", "content": "How may I assist you today?"}]

# Chat input - user prompt
if prompt := st.chat_input():
    ....

    with st.spinner("Generating..."):
        ....
        with st.chat_message("assistant"):
            st.markdown("**[initial response]**")
            ....
            st.session_state.messages.append({"role": "assistant", "content": "[revised response] " + generation['response']})

The application maintains a chat history, displaying both user inputs and LLM responses, including the initial response, any critiques generated, and the final revised response. Each step of the LLM’s process is clearly labeled and presented to the user. The interface also includes a Clear Screen button to reset the chat history. When processing a query, the application shows a loading spinner and displays the runtime, providing transparency into the LLM’s operation. This comprehensive UI design allows users to interact with the LLM while observing how constitutional principles are applied to refine the LLM’s outputs.
Test the solution using the Streamlit UI
In the Streamlit application, when a user inputs a query, the application initiates the process by creating and compiling the graph defined earlier. It then streams the execution of this graph, which includes the RAG and critique/revise steps. During this process, the application displays real-time updates for each node’s execution, showing the user what’s happening behind the scenes. The system measures the total runtime, providing transparency about the processing duration. When it’s complete, the application presents the results in a structured manner within the chat interface. It displays the initial LLM-generated response, followed by any critiques made based on the constitutional principles, and finally shows the revised response that incorporates these ethical considerations. This step-by-step presentation allows users to see how the LLM’s response evolves through the constitutional AI process, from initial generation to ethical refinement. As mentioned in the GitHub README file, to run the Streamlit application, use the following commands:

pip install -r requirements.txt
streamlit run main.py

For details on using a Jupyter proxy to access the Streamlit application, refer to Build Streamlit apps in Amazon SageMaker Studio.
Modify the Studio URL, replacing lab? with proxy/8501/.

How Insagic uses Constitutional AI to generate compliant content
Insagic uses real-world medical data to help brands understand people as patients and patients as people, enabling them to deliver actionable insights in the healthcare marketing space. Although generating deep insights in the health space can yield profound dividends, it must be done with consideration for compliance and the personal nature of health data. By defining federal guidelines as constitutional principles, Insagic makes sure that the content delivered by generative AI complies with federal guidelines for healthcare marketing.
Clean up
When you have finished experimenting with this solution, clean up your resources to prevent AWS charges from being incurred:

Empty the S3 buckets.
Delete the SageMaker notebook instance.
Delete the Amazon Bedrock knowledge base.

Conclusion
This post demonstrated how to implement a sophisticated generative AI solution using Amazon Bedrock and LangGraph to generate compliant content. You can also integrate this workflow to generate responses based on a knowledge base and apply ethical principles to critique and revise its outputs, all within an interactive web interface. Insagic is looking at more ways to incorporate this into existing workflows by defining custom principles to achieve compliance goals.
You can expand this concept further by incorporating Amazon Bedrock Guardrails. Amazon Bedrock Guardrails and LangGraph Constitutional AI can create a comprehensive safety system by operating at different levels. Amazon Bedrock provides API-level content filtering and safety boundaries, and LangGraph implements constitutional principles in reasoning workflows. Together, they enable multi-layered protection through I/O filtering, topic restrictions, ethical constraints, and logical validation steps in AI applications.
Try out the solution for your own use case, and leave your feedback in the comments.

About the authors
Sriharsh Adari is a Senior Solutions Architect at Amazon Web Services (AWS), where he helps customers work backwards from business outcomes to develop innovative solutions on AWS. Over the years, he has helped multiple customers on data platform transformations across industry verticals. His core area of expertise include Technology Strategy, Data Analytics, and Data Science. In his spare time, he enjoys playing sports, binge-watching TV shows, and playing Tabla.
David Min is a Senior Partner Sales Solutions Architect at Amazon Web Services (AWS) specializing in Generative AI, where he helps customers transform their businesses through innovative AI solutions. Throughout his career, David has helped numerous organizations across industries bridge the gap between cutting-edge AI technology and practical business applications, focusing on executive engagement and successful solution adoption.
Stephen Garth is a Data Scientist at Insagic, where he develops advanced machine learning solutions, including LLM-powered automation tools and deep clustering models for actionable consumer insights. With a strong background spanning software engineering, healthcare data science, and computational research, he is passionate about bringing his expertise in AI-driven analytics and large-scale data processing to drive solutions.
Chris Cocking specializes in scalable enterprise application design using multiple programming languages. With nearly 20 years of experience, he excels in LAMP and IIS environments, SEO strategies, and most recently designing agentic systems. Outside of work, Chris is an avid bassist and music lover, which helps fuel his creativity and problem-solving skills.

How to Use Git and Git Bash Locally: A Comprehensive Guide

Table of contents: Introduction · Installation (Windows, macOS, Linux, Verifying Installation) · Git Bash Basics (Navigation Commands, File Operations, Keyboard Shortcuts) · Git Configuration (Additional Configurations) · Basic Git Workflow (Initializing a Repository, Checking Status, Staging Files, Committing Changes) · Branching and Merging (Working with Branches, Merging Branches, Handling Merge Conflicts, Deleting Branches) · Remote Repositories (Adding a Remote Repository) · Advanced Git Commands (Stashing Changes, Reverting Changes, Interactive Rebase) · Troubleshooting (Common Issues and Solutions) · Git Best Practices · .gitignore Example · Conclusion

Introduction

Git is a distributed version control system that helps you track changes in your code, collaborate with others, and maintain a history of your project. Git Bash is a terminal application for Windows that provides a Unix-like command-line experience for using Git.

This guide will walk you through setting up Git, using Git Bash, and mastering essential Git commands for local development.

Installation

Windows

Download Git for Windows from git-scm.com

Run the installer with default options (or customize as needed)

Git Bash will be installed automatically as part of the package

macOS

Install Git using Homebrew: brew install git

Alternatively, download from git-scm.com

Linux

For Debian/Ubuntu: sudo apt-get install git

For Fedora: sudo dnf install git

For other distributions, use the appropriate package manager

Verifying Installation

Open Git Bash (Windows) or Terminal (macOS/Linux) and type:
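git --version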

This should display the installed Git version.

Git Bash Basics

Git Bash provides a Unix-like shell experience on Windows. Here are some essential commands:

Navigation Commands

pwd – Print working directory

ls – List files and directories

cd [directory] – Change directory

mkdir [directory] – Create a new directory

rm [file] – Remove a file

rm -r [directory] – Remove a directory and its contents

File Operations

touch [filename] – Create an empty file

cat [filename] – Display file contents

nano [filename] or vim [filename] – Edit files in the terminal

Keyboard Shortcuts

Ctrl + C – Terminate the current command

Ctrl + L – Clear the screen

Tab – Auto-complete commands or filenames

Up/Down arrows – Navigate through command history

Git Configuration

Before using Git, configure your identity:
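# Replace the name and email with your own details
git config --global user.name "Your Name"
git config --global user.email "your.email@example.com"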

Additional Configurations

Set your default editor:
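git config --global core.editor "nano"   # "nano" is just an example; use "vim", "code --wait", etc.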

Enable colorful output:
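git config --global color.ui auto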

View all configurations:
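git config --list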

Basic Git Workflow

Initializing a Repository

Navigate to your project folder and initialize a Git repository:
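cd /path/to/your/project   # replace with your project folder
git init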

Checking Status

See which files are tracked, modified, or staged:
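git status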

Staging Files

Add files to the staging area:
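git add filename.txt   # stage a specific file
git add .              # stage all changes in the current directory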

Committing Changes

Save staged changes to the repository:
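git commit -m "Describe your change here"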

Or open an editor to write a more detailed commit message:
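git commit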

Viewing Commit History
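View the history of commits in the current branch:

git log
git log --oneline   # condensed view, one line per commit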

Branching and Merging

Working with Branches

Create a new branch:
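git branch feature-branch   # "feature-branch" is a placeholder name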

Switch to a branch:
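git checkout feature-branch
git switch feature-branch     # newer alternative to checkout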

Create and switch to a new branch in one command:
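git checkout -b feature-branch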

List all branches:
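git branch      # local branches
git branch -a   # local and remote branches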

Merging Branches

Merge changes from another branch into your current branch:
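git merge feature-branch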

Handling Merge Conflicts

When Git can’t automatically merge changes, you’ll need to resolve conflicts:

Git will mark the conflicted files

Open the files and look for conflict markers (<<<<<<<, =======, >>>>>>>)

Edit the files to resolve conflicts

Add the resolved files: git add <filename>

Complete the merge: git commit

Deleting Branches

Delete a branch after merging:
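git branch -d feature-branch   # use -D to force-delete an unmerged branch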

Remote Repositories

Adding a Remote Repository
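Link your local repository to a remote (the URL below is a placeholder):

git remote add origin https://github.com/username/repository.git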

Viewing Remote Repositories
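List the remotes configured for your repository:

git remote -v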

Pushing to a Remote Repository
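Upload your local commits to the remote branch (replace main with your branch name):

git push -u origin main   # -u sets the upstream on the first push
git push                  # subsequent pushes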

Pulling from a Remote Repository
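Fetch and merge changes from the remote branch:

git pull origin main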

Cloning a Repository
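Copy an existing remote repository to your machine (placeholder URL):

git clone https://github.com/username/repository.git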

Advanced Git Commands

Stashing Changes

Temporarily store modified files to work on something else:
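git stash        # save uncommitted changes
git stash list   # view stashed change sets
git stash pop    # reapply the most recent stash and remove it from the list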

Reverting Changes

Undo commits:
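git revert <commit-hash>   # creates a new commit that undoes the given commit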

Reset to a previous state (use with caution):
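git reset --hard <commit-hash>   # discards all commits and local changes after that commit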

Viewing and Comparing Changes
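Inspect what has changed before or after staging:

git diff                 # unstaged changes
git diff --staged        # staged changes
git show <commit-hash>   # changes introduced by a specific commit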

Interactive Rebase

Rewrite, squash, or reorder commits:
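git rebase -i HEAD~3   # interactively edit the last 3 commits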

Troubleshooting

Common Issues and Solutions

Problem: “fatal: not a git repository”

Solution: Make sure you’re in the correct directory or initialize a repository with git init

Problem: Unable to push to remote repository

Solution:

Check if you have the correct permissions

Pull latest changes first: git pull origin main

Check if remote URL is correct: git remote -v

Problem: Merge conflicts

Solution: Resolve conflicts manually, then git add the resolved files and git commit

Problem: Accidental commit

Solution: Use git reset --soft HEAD~1 to undo the last commit while keeping changes

Git Best Practices

Commit frequently with clear, descriptive commit messages

Create branches for new features or bug fixes

Pull before pushing to minimize conflicts

Write meaningful commit messages that explain why changes were made

Use .gitignore to exclude unnecessary files (build artifacts, dependencies, etc.)

Review changes before committing with git diff and git status

Keep commits focused on a single logical change

Use tags for marking releases or important milestones

Back up your repositories regularly

Document your Git workflow for team collaboration

.gitignore Example

Create a .gitignore file in your repository root:
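# Generic example; adjust the patterns for your project
node_modules/
dist/
build/
.env
*.log
.DS_Store
.vscode/
.idea/
__pycache__/
*.pyc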

Customize this file according to your project’s specific needs.

Conclusion

Git and Git Bash provide powerful tools for version control and collaborative development. In this guide, we covered installation across platforms, essential Git Bash commands, repository initialization, the core add-commit workflow, branching strategies, remote repository management, and advanced operations like stashing and rebasing. We also addressed common troubleshooting scenarios and best practices to maintain a clean workflow. With these fundamentals, you’re now equipped to track changes, collaborate effectively, and maintain a structured history of your projects.
The post How to Use Git and Git Bash Locally: A Comprehensive Guide appeared first on MarkTechPost.

How to Build a Prototype X-ray Judgment Tool (Open Source Medical Infe …

In this tutorial, we demonstrate how to build a prototype X-ray judgment tool using open-source libraries in Google Colab. By leveraging the power of TorchXRayVision for loading pre-trained DenseNet models and Gradio for creating an interactive user interface, we show how to process and classify chest X-ray images with minimal setup. This notebook guides you through image preprocessing, model inference, and result interpretation, all designed to run seamlessly on Colab without requiring external API keys or logins. Please note that this demo is intended for educational purposes only and should not be used as a substitute for professional clinical diagnosis.

!pip install torchxrayvision gradio

First, we install the torchxrayvision library for X-ray analysis and Gradio to create an interactive interface.

import torch
import torchxrayvision as xrv
import torchvision.transforms as transforms
import gradio as gr

We import PyTorch for deep learning operations, TorchXRayVision for X‑ray analysis, torchvision’s transforms for image preprocessing, and Gradio for building an interactive UI.

model = xrv.models.DenseNet(weights="densenet121-res224-all")
model.eval()

Then, we load a pre-trained DenseNet model using the “densenet121-res224-all” weights and set it to evaluation mode for inference.

try:
    pathology_labels = model.meta["labels"]
    print("Retrieved pathology labels from model.meta.")
except Exception as e:
    print("Could not retrieve labels from model.meta. Using fallback labels.")
    pathology_labels = [
        "Atelectasis", "Cardiomegaly", "Consolidation", "Edema",
        "Emphysema", "Fibrosis", "Hernia", "Infiltration", "Mass",
        "Nodule", "Pleural Effusion", "Pneumonia", "Pneumothorax", "No Finding"
    ]

Now, we attempt to retrieve pathology labels from the model’s metadata and fall back to a predefined list if the retrieval fails.

def classify_xray(image):
    try:
        transform = transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.Grayscale(num_output_channels=1),
            transforms.ToTensor()
        ])
        input_tensor = transform(image).unsqueeze(0)  # add batch dimension

        with torch.no_grad():
            preds = model(input_tensor)

        pathology_scores = preds[0].detach().numpy()
        results = {}
        for idx, label in enumerate(pathology_labels):
            results[label] = float(pathology_scores[idx])

        sorted_results = sorted(results.items(), key=lambda x: x[1], reverse=True)
        top_label, top_score = sorted_results[0]

        judgement = (
            f"Prediction: {top_label} (score: {top_score:.2f})\n\n"
            f"Full Scores:\n{results}"
        )
        return judgement
    except Exception as e:
        return f"Error during inference: {str(e)}"

Here, with this function, we preprocess an input X-ray image, run inference using the pre-trained model, extract pathology scores, and return a formatted summary of the top prediction and all scores while handling errors gracefully.

iface = gr.Interface(
    fn=classify_xray,
    inputs=gr.Image(type="pil"),
    outputs="text",
    title="X-ray Judgement Tool (Prototype)",
    description=(
        "Upload a chest X-ray image to receive a classification judgement. "
        "This demo is for educational purposes only and is not intended for clinical use."
    )
)

iface.launch()

Finally, we build and launch a Gradio interface that lets users upload a chest X-ray image. The classify_xray function processes the image to output a diagnostic judgment.

Gradio Interface for the tool

Through this tutorial, we’ve explored the development of an interactive X-ray judgment tool that integrates advanced deep learning techniques with a user-friendly interface. Despite the inherent limitations, such as the model not being fine-tuned for clinical diagnostics, this prototype serves as a valuable starting point for experimenting with medical imaging applications. We encourage you to build upon this foundation, considering the importance of rigorous validation and adherence to medical standards for real-world use.

Here is the Colab Notebook.

The post How to Build a Prototype X-ray Judgment Tool (Open Source Medical Inference System) Using TorchXRayVision, Gradio, and PyTorch appeared first on MarkTechPost.

This AI Paper Introduces Diversified DPO and ORPO: Post-Training Metho …

Creative writing is a domain that thrives on diversity and imagination. Unlike fact-based or task-specific writing, where a single correct output may exist, creative writing involves numerous valid responses to a prompt. Stories, poems, and narratives can branch in countless directions, each with its own stylistic flavor and meaning. This inherent open-endedness makes creative writing a prime challenge for AI systems, which need to maintain narrative coherence while producing novel and distinct outputs.

The core issue lies in how large language models are refined after their initial training. Post-training methods often emphasize quality improvements by aligning responses with user preferences or maximizing reward scores. However, these adjustments inadvertently cause the models to produce responses that are too similar across prompts. In creative settings, this leads to a noticeable drop in output diversity. A lack of variation limits the expressive power of the model, resulting in uniform storylines or similar sentence constructions even when prompts are vastly different.

Earlier solutions attempted to address this by tweaking decoding methods or prompt strategies. Researchers used sampling temperature adjustment, top-k or top-p filtering, or iterative prompting to introduce randomness. Some explored methods, such as beam search modifications or self-critiquing, to encourage alternative responses. While these helped diversify outputs, they often came with a cost: sacrificing overall response quality, increasing generation time, or introducing inconsistencies in tone and grammar. More crucially, they did not adapt the model’s core training process to learn from diverse samples.

Researchers from Midjourney and New York University proposed a novel adjustment during the post-training phase. They introduced “Diversified DPO” and “Diversified ORPO”—enhanced versions of two popular preference-based optimization techniques. Their innovation was incorporating a deviation score, quantifying how much a training example differs from others responding to the same prompt. Rare and diverse responses are given more importance during learning by using this score to weight training losses. The researchers specifically implemented these strategies on large models like Meta’s Llama-3.1-8B and Mistral-7B using parameter-efficient fine-tuning via LoRA.

In this approach, deviation acts as a learning signal. For every training pair of a better and worse response to a prompt, the deviation of the better response is computed using both semantic and stylistic embeddings. These embeddings measure not only content differences but also stylistic uniqueness between responses. The resulting score then influences how much that training pair contributes to the model’s weight updates. This method increases the likelihood that the model generates distinct yet high-quality outputs. The training used over 400,000 prompt-response pairs with Reddit upvotes as quality signals and introduced mixing methods to effectively balance semantic and style deviations.
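To make the weighting idea concrete, the following is a minimal sketch of how a deviation score could enter the objective, assuming a standard DPO formulation; it is an illustrative reading of the description above, not the paper’s exact loss. For a prompt x with preferred response y_w and rejected response y_l, standard DPO minimizes

\[ \mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\Big[\log \sigma\Big(\beta \log \tfrac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \tfrac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\Big)\Big], \]

and a deviation-weighted variant would scale each pair’s contribution by the deviation D(y_w) of the preferred response (computed from its semantic and stylistic embeddings), so rarer responses influence the update more:

\[ \mathcal{L}_{\mathrm{DDPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\Big[D(y_w)\,\log \sigma\Big(\beta \log \tfrac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \tfrac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\Big)\Big]. \]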

Quantitative results demonstrated the success of the proposed method. The best-performing model, Llama-3.1-8B with Diversified DPO using semantic and style deviation (DDPO-both), achieved nearly the same reward score as GPT-4o while significantly outperforming it in diversity. Specifically, the model had semantic diversity approaching that of the human-crafted reference dataset and style diversity slightly below it. In head-to-head human evaluations, 68% of reviewers preferred DDPO-both’s outputs over GPT-4o’s for quality, and 100% chose them as more diverse. Compared to the baseline DPO, DDPO-both still came out ahead, selected 50% of the time for quality and 62% for diversity. When fewer responses per prompt were available during training, slight drops in reward scores were mitigated using a minimum deviation threshold or sampling higher-quality responses.

This research highlighted a compelling solution to the diversity-quality trade-off in AI-generated creative writing. By emphasizing deviation in training, the researchers enabled models to value uniqueness without compromising coherence. The outcome is a model that delivers richer and more varied storytelling, marking a meaningful step forward in creative AI development.

Check out the Paper. All credit for this research goes to the researchers of this project.
The post This AI Paper Introduces Diversified DPO and ORPO: Post-Training Methods to Boost Output Diversity in Creative Writing with LLMs appeared first on MarkTechPost.

Build agentic systems with CrewAI and Amazon Bedrock

This post is co-authored with Joao Moura and Tony Kipkemboi from CrewAI.
The enterprise AI landscape is undergoing a seismic shift as agentic systems transition from experimental tools to mission-critical business assets. In 2025, AI agents are expected to become integral to business operations, with Deloitte predicting that 25% of enterprises using generative AI will deploy AI agents, growing to 50% by 2027. The global AI agent space is projected to surge from $5.1 billion in 2024 to $47.1 billion by 2030, reflecting the transformative potential of these technologies.
In this post, we explore how CrewAI’s open source agentic framework, combined with Amazon Bedrock, enables the creation of sophisticated multi-agent systems that can transform how businesses operate. Through practical examples and implementation details, we demonstrate how to build, deploy, and orchestrate AI agents that can tackle complex tasks with minimal human oversight. Although “agents” is the buzzword of 2025, it’s important to understand what an AI agent is and where deploying an agentic system could yield benefits.
Agentic design
An AI agent is an autonomous, intelligent system that uses large language models (LLMs) and other AI capabilities to perform complex tasks with minimal human oversight. Unlike traditional software, which follows pre-defined rules, AI agents can operate independently, learn from their environment, adapt to changing conditions, and make contextual decisions. They are designed with modular components, such as reasoning engines, memory, cognitive skills, and tools, that enable them to execute sophisticated workflows. Traditional SaaS solutions are designed for horizontal scalability and general applicability, which makes them suitable for managing repetitive tasks across diverse sectors, but they often lack domain-specific intelligence and the flexibility to address unique challenges in dynamic environments. Agentic systems, on the other hand, are designed to bridge this gap by combining the flexibility of context-aware systems with domain knowledge. Consider a software development use case: AI agents can generate, evaluate, and improve code, shifting software engineers’ focus from routine coding to more complex design challenges. For example, for the CrewAI git repository, pull requests are evaluated by a set of CrewAI agents who review code based on code documentation, consistency of implementation, and security considerations. Another use case can be seen in supply chain management, where traditional inventory systems might track stock levels, but lack the capability to anticipate supply chain disruptions or optimize procurement based on industry insights. In contrast, an agentic system can use real-time data (such as weather or geopolitical risks) to proactively reroute supply chains and reallocate resources. The following illustration describes the components of an agentic AI system:

Overview of CrewAI
CrewAI is an enterprise suite that includes a Python-based open source framework. It simplifies the creation and management of AI automations using either AI flows, multi-agent systems, or a combination of both, enabling agents to work together seamlessly, tackling complex tasks through collaborative intelligence. The following figure illustrates the capability of CrewAI’s enterprise offering:

CrewAI’s design centers around the ability to build AI automation through flows and crews of AI agents. It excels at the relationship between agents and tasks, where each agent has a defined role, goal, and backstory, and can access specific tools to accomplish their objectives. This framework allows for autonomous inter-agent delegation, where agents can delegate tasks and inquire among themselves, enhancing problem-solving efficiency. Adoption of such frameworks is fueled by the increasing demand for intelligent automation and personalized customer experiences across sectors like healthcare, finance, and retail.
CrewAI’s agents are not only automating routine tasks, but also creating new roles that require advanced skills. CrewAI’s emphasis on team collaboration, through its modular design and simplicity principles, aims to transcend traditional automation, achieving a higher level of decision simplification, creativity enhancement, and addressing complex challenges.
CrewAI key concepts
CrewAI’s architecture is built on a modular framework comprising several key components that facilitate collaboration, delegation, and adaptive decision-making in multi-agent environments. Let’s explore each component in detail to understand how they enable multi-agent interactions.
At a high level, CrewAI creates two main ways to create agentic automations: flows and crews.
Flows
CrewAI Flows provide a structured, event-driven framework to orchestrate complex, multi-step AI automations seamlessly. Flows empower users to define sophisticated workflows that combine regular code, single LLM calls, and potentially multiple crews, through conditional logic, loops, and real-time state management. This flexibility allows businesses to build dynamic, intelligent automation pipelines that adapt to changing conditions and evolving business needs. The following figure illustrates the difference between Crews and Flows:

When integrated with Amazon Bedrock, CrewAI Flows unlock even greater potential. Amazon Bedrock provides a robust foundation by enabling access to powerful foundation models (FMs).
For example, in a customer support scenario, a CrewAI Flow orchestrated through Amazon Bedrock could automatically route customer queries to specialized AI agent crews. These crews collaboratively diagnose customer issues, interact with backend systems for data retrieval, generate personalized responses, and dynamically escalate complex problems to human agents only when necessary.
Similarly, in financial services, a CrewAI Flow could monitor industry conditions, triggering agent-based analysis to proactively manage investment portfolios based on industry volatility and investor preferences.
Together, CrewAI Flows and Amazon Bedrock create a powerful synergy, enabling enterprises to implement adaptive, intelligent automation that addresses real-world complexities efficiently and at scale.
Crews
Crews in CrewAI are composed of several key components, which we discuss in this section.
Agents
Agents in CrewAI serve as autonomous entities designed to perform specific roles within a multi-agent system. These agents are equipped with various capabilities, including reasoning, memory, and the ability to interact dynamically with their environment. Each agent is defined by four main elements:

Role – Determines the agent’s function and responsibilities within the system
Backstory – Provides contextual information that guides the agent’s decision-making processes
Goals – Specifies the objectives the agent aims to accomplish
Tools – Extends the capabilities of agents to access more information and take actions

Agents in CrewAI are designed to work collaboratively, making autonomous decisions, delegating tasks, and using tools to execute complex workflows efficiently. They can communicate with each other, use external resources, and refine their strategies based on observed outcomes.
Tasks
Tasks in CrewAI are the fundamental building blocks that define specific actions an agent needs to perform to achieve its objectives. Tasks can be structured as standalone assignments or interdependent workflows that require multiple agents to collaborate. Each task includes key parameters, such as:

Description – Clearly defines what the task entails
Agent assignment – Specifies which agent is responsible for executing the task

Tools
Tools in CrewAI provide agents with extended capabilities, enabling them to perform actions beyond their intrinsic reasoning abilities. These tools allow agents to interact with APIs, access databases, execute scripts, analyze data, and even communicate with other external systems. CrewAI supports a modular tool integration system where tools can be defined and assigned to specific agents, providing efficient and context-aware decision-making.
Process
The process layer in CrewAI governs how agents interact, coordinate, and delegate tasks. It makes sure that multi-agent workflows operate seamlessly by managing task execution, communication, and synchronization among agents.
More details on CrewAI concepts can be found in the CrewAI documentation.
CrewAI enterprise suite
For businesses looking for tailored AI agent solutions, CrewAI provides an enterprise offering that includes dedicated support, advanced customization, and integration with enterprise-grade systems like Amazon Bedrock. This enables organizations to deploy AI agents at scale while maintaining security and compliance requirements.
Enterprise customers get access to comprehensive monitoring tools that provide deep visibility into agent operations. This includes detailed logging of agent interactions, performance metrics, and system health indicators. The monitoring dashboard enables teams to track agent behavior, identify bottlenecks, and optimize multi-agent workflows in real time.
Real-world enterprise impact
CrewAI customers are already seeing significant returns by adopting agentic workflows in production. In this section, we provide a few real customer examples.
Legacy code modernization
A large enterprise customer needed to modernize their legacy ABAP and APEX code base, a typically time-consuming process requiring extensive manual effort for code updates and testing.
Multiple CrewAI agents work in parallel to:

Analyze existing code base components
Generate modernized code in real time
Execute tests in production environment
Provide immediate feedback for iterations

The customer achieved approximately 70% improvement in code generation speed while maintaining quality through automated testing and feedback loops. The solution was containerized using Docker for consistent deployment and scalability. The following diagram illustrates the solution architecture.

Back office automation at global CPG company
A leading CPG company automated their back-office operations by connecting their existing applications and data stores to CrewAI agents that:

Research industry conditions
Analyze pricing data
Summarize findings
Execute decisions

The implementation resulted in a 75% reduction in processing time by automating the entire workflow from data analysis to action execution. The following diagram illustrates the solution architecture.

Get started with CrewAI and Amazon Bedrock
Amazon Bedrock integration with CrewAI enables the creation of production-grade AI agents powered by state-of-the-art language models.
The following is a code snippet on how to set up this integration:

from crewai import Agent, Crew, Process, Task, LLM
from crewai_tools import SerperDevTool, ScrapeWebsiteTool
import os

# Configure Bedrock LLM
llm = LLM(
    model="bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0",
    aws_access_key_id=os.getenv('AWS_ACCESS_KEY_ID'),
    aws_secret_access_key=os.getenv('AWS_SECRET_ACCESS_KEY'),
    aws_region_name=os.getenv('AWS_REGION_NAME')
)

# Create an agent with Bedrock as the LLM provider
security_analyst = Agent(
    config=agents_config['security_analyst'],
    tools=[SerperDevTool(), ScrapeWebsiteTool()],
    llm=llm
)

Check out the CrewAI LLM documentation for detailed instructions on how to configure LLMs with your AI agents.
Amazon Bedrock provides several key advantages for CrewAI applications:

Access to state-of-the-art language models such as Anthropic’s Claude and Amazon Nova – These models provide the cognitive capabilities that power agent decision-making. The models enable agents to understand complex instructions, generate human-like responses, and make nuanced decisions based on context.
Enterprise-grade security and compliance features – This is crucial for organizations that need to maintain strict control over their data and enforce compliance with various regulations.
Scalability and reliability backed by AWS infrastructure – This means your agent systems can handle increasing workloads while maintaining consistent performance.

Amazon Bedrock Agents and Amazon Bedrock Knowledge Bases as native CrewAI Tools
Amazon Bedrock Agents offers you the ability to build and configure autonomous agents in a fully managed and serverless manner on Amazon Bedrock. You don’t have to provision capacity, manage infrastructure, or write custom code. Amazon Bedrock manages prompt engineering, memory, monitoring, encryption, user permissions, and API invocation. BedrockInvokeAgentTool enables CrewAI agents to invoke Amazon Bedrock agents and use their capabilities within your workflows.
With Amazon Bedrock Knowledge Bases, you can securely connect FMs and agents to your company data to deliver more relevant, accurate, and customized responses. BedrockKBRetrieverTool enables CrewAI agents to retrieve information from Amazon Bedrock Knowledge Bases using natural language queries.
The following code shows an example for Amazon Bedrock Agents integration:

from crewai import Agent, Task, Crew

from crewai_tools.aws.bedrock.agents.invoke_agent_tool import BedrockInvokeAgentTool

# Initialize the Bedrock Agents Tool

agent_tool = BedrockInvokeAgentTool(
    agent_id="your-agent-id",
    agent_alias_id="your-agent-alias-id"
)

# Create a CrewAI agent that uses the Bedrock Agents Tool

aws_expert = Agent(
    role='AWS Service Expert',
    goal='Help users understand AWS services and quotas',
    backstory='I am an expert in AWS services and can provide detailed information about them.',
    tools=[agent_tool],
    verbose=True
)

The following code shows an example for Amazon Bedrock Knowledge Bases integration:

# Create and configure the BedrockKB tool
kb_tool = BedrockKBRetrieverTool(
    knowledge_base_id="your-kb-id",
    number_of_results=5
)

# Create a CrewAI agent that uses the Bedrock Knowledge Bases tool
researcher = Agent(
    role='Knowledge Base Researcher',
    goal='Find information about company policies',
    backstory='I am a researcher specialized in retrieving and analyzing company documentation.',
    tools=[kb_tool],
    verbose=True
)

Operational excellence through monitoring, tracing, and observability with CrewAI on AWS
As with any software application, achieving operational excellence is crucial when deploying agentic applications in production environments. These applications are complex systems comprising both deterministic and probabilistic components that interact either sequentially or in parallel. Therefore, comprehensive monitoring, traceability, and observability are essential factors for achieving operational excellence. This includes three key dimensions:

Application-level observability – Provides smooth operation of the entire system, including the agent orchestration framework CrewAI and potentially additional application components (such as a frontend)
Model-level observability – Provides reliable model performance (including metrics like accuracy, latency, throughput, and more)
Agent-level observability – Maintains efficient operations within single-agent or multi-agent systems

When running agent-based applications with CrewAI and Amazon Bedrock on AWS, you gain access to a comprehensive set of built-in capabilities across these dimensions:

Application-level logs – Amazon CloudWatch automatically collects application-level logs and metrics from your application code running on your chosen AWS compute platform, such as AWS Lambda, Amazon Elastic Container Service (Amazon ECS), or Amazon Elastic Compute Cloud (Amazon EC2). The CrewAI framework provides application-level logging, configured at a minimal level by default. For more detailed insights, verbose logging can be enabled at the agent or crew level by setting verbose=True during initialization.
Model-level invocation logs – Furthermore, CloudWatch automatically collects model-level invocation logs and metrics from Amazon Bedrock. This includes essential performance metrics.
Agent-level observability – CrewAI seamlessly integrates with popular third-party monitoring and observability frameworks such as AgentOps, Arize, MLFlow, LangFuse, and others. These frameworks enable comprehensive tracing, debugging, monitoring, and optimization of the agent system’s performance.

Solution overview
Each AWS service has its own configuration nuances, and missing just one detail can lead to serious vulnerabilities. Traditional security assessments often demand multiple experts, coordinated schedules, and countless manual checks. With CrewAI Agents, you can streamline the entire process, automatically mapping your resources, analyzing configurations, and generating clear, prioritized remediation steps.
The following diagram illustrates the solution architecture.

Our use case demo implements a specialized team of three agents, each with distinct responsibilities that mirror roles you might find in a professional security consulting firm:

Infrastructure mapper – Acts as our system architect, methodically documenting AWS resources and their configurations. Like an experienced cloud architect, it creates a detailed inventory that serves as the foundation for our security analysis.
Security analyst – Serves as our cybersecurity expert, examining the infrastructure map for potential vulnerabilities and researching current best practices. It brings deep knowledge of security threats and mitigation strategies.
Report writer – Functions as our technical documentation specialist, synthesizing complex findings into clear, actionable recommendations. It makes sure that technical insights are communicated effectively to both technical and non-technical stakeholders.

Implement the solution
In this section, we walk through the implementation of a security assessment multi-agent system. The code for this example is located on GitHub. Note that not all code artifacts of the solution are explicitly covered in this post.
Step 1: Configure the Amazon Bedrock LLM
We’ve saved our environment variables in an .env file in our root directory before we pass them to the LLM class:

from crewai import Agent, Crew, Process, Task, LLM
from crewai.project import CrewBase, agent, crew, task

from aws_infrastructure_security_audit_and_reporting.tools.aws_infrastructure_scanner_tool import AWSInfrastructureScannerTool
from crewai_tools import SerperDevTool, ScrapeWebsiteTool
import os

@CrewBase
class AwsInfrastructureSecurityAuditAndReportingCrew():
    """AwsInfrastructureSecurityAuditAndReporting crew"""

    def __init__(self) -> None:
        self.llm = LLM(
            model=os.getenv('MODEL'),
            aws_access_key_id=os.getenv('AWS_ACCESS_KEY_ID'),
            aws_secret_access_key=os.getenv('AWS_SECRET_ACCESS_KEY'),
            aws_region_name=os.getenv('AWS_REGION_NAME')
        )

Step 2: Define agents
These agents are already defined in the agents.yaml file, and we’re importing them into each agent function in the crew.py file:


# Configure AI Agents

@agent
def infrastructure_mapper(self) -> Agent:
    return Agent(
        config=self.agents_config['infrastructure_mapper'],
        tools=[AWSInfrastructureScannerTool()],
        llm=self.llm
    )

@agent
def security_analyst(self) -> Agent:
    return Agent(
        config=self.agents_config['security_analyst'],
        tools=[SerperDevTool(), ScrapeWebsiteTool()],
        llm=self.llm
    )

@agent
def report_writer(self) -> Agent:
    return Agent(
        config=self.agents_config['report_writer'],
        llm=self.llm
    )

Step 3: Define tasks for the agents
Similar to our agents in the preceding code, we import tasks.yaml into our crew.py file:


# Configure Tasks for the agents

@task
def map_aws_infrastructure_task(self) -> Task:
    return Task(
        config=self.tasks_config['map_aws_infrastructure_task']
    )

@task
def exploratory_security_analysis_task(self) -> Task:
    return Task(
        config=self.tasks_config['exploratory_security_analysis_task']
    )

@task
def generate_report_task(self) -> Task:
    return Task(
        config=self.tasks_config['generate_report_task']
    )

Step 4: Create the AWS infrastructure scanner tool
This tool enables our agents to interact with AWS services and retrieve information they need to perform their analysis:

class AWSInfrastructureScannerTool(BaseTool):
    name: str = "AWS Infrastructure Scanner"
    description: str = (
        "A tool for scanning and mapping AWS infrastructure components and their configurations. "
        "Can retrieve detailed information about EC2 instances, S3 buckets, IAM configurations, "
        "RDS instances, VPC settings, and security groups. Use this tool to gather information "
        "about specific AWS services or get a complete infrastructure overview."
    )
    args_schema: Type[BaseModel] = AWSInfrastructureScannerInput

    def _run(self, service: str, region: str) -> str:
        try:
            if service.lower() == 'all':
                return json.dumps(self._scan_all_services(region), indent=2, cls=DateTimeEncoder)
            return json.dumps(self._scan_service(service.lower(), region), indent=2, cls=DateTimeEncoder)
        except Exception as e:
            return f"Error scanning AWS infrastructure: {str(e)}"

    def _scan_all_services(self, region: str) -> Dict:
        return {
            'ec2': self._scan_service('ec2', region),
            's3': self._scan_service('s3', region),
            'iam': self._scan_service('iam', region),
            'rds': self._scan_service('rds', region),
            'vpc': self._scan_service('vpc', region)
        }

    # More services can be added here

Step 5: Assemble the security audit crew
Bring the components together in a coordinated crew to execute on the tasks:

@crew
def crew(self) -> Crew:
    """Creates the AwsInfrastructureSecurityAuditAndReporting crew"""
    return Crew(
        agents=self.agents,  # Automatically created by the @agent decorator
        tasks=self.tasks,  # Automatically created by the @task decorator
        process=Process.sequential,
        verbose=True,
    )

Step 6: Run the crew
In our main.py file, we import our crew and pass in inputs to the crew to run:

def run():
    """
    Run the crew.
    """
    inputs = {}
    AwsInfrastructureSecurityAuditAndReportingCrew().crew().kickoff(inputs=inputs)

The final report will look something like the following code:

```markdown
### Executive Summary

In response to an urgent need for robust security within AWS infrastructure, this assessment identified several critical areas requiring immediate attention across EC2 Instances, S3 Buckets, and IAM Configurations. Our analysis revealed two high-priority issues that pose significant risks to the organization’s security posture.

### Risk Assessment Matrix

| Security Component | Risk Description | Impact | Likelihood | Priority |
|--------------------|------------------|--------|------------|----------|
| S3 Buckets | Unintended public access | High | High | Critical |
| EC2 Instances | SSRF through Metadata | High | Medium | High |
| IAM Configurations | Permission sprawl | Medium | High | Medium |

### Prioritized Remediation Roadmap

1. **Immediate (0-30 days):**
   - Enforce IMDSv2 on all EC2 instances
   - Conduct S3 bucket permission audit and rectify public access issues
   - Adjust security group rules to eliminate broad access

2. **Short Term (30-60 days):**
   - Conduct IAM policy audit to eliminate unused permissions
   - Restrict RDS access to known IP ranges
```

This implementation shows how CrewAI agents can work together to perform complex security assessments that would typically require multiple security professionals. The system is both scalable and customizable, allowing for adaptation to specific security requirements and compliance standards.
Conclusion
In this post, we demonstrated how to use CrewAI and Amazon Bedrock to build a sophisticated, automated security assessment system for AWS infrastructure. We explored how multiple AI agents can work together seamlessly to perform complex security audits, from infrastructure mapping to vulnerability analysis and report generation. Through our example implementation, we showcased how CrewAI’s framework enables the creation of specialized agents, each bringing unique capabilities to the security assessment process. By integrating with powerful language models using Amazon Bedrock, we created a system that can autonomously identify security risks, research solutions, and generate actionable recommendations.
The practical example we shared illustrates just one of many possible applications of CrewAI with Amazon Bedrock. The combination of CrewAI’s agent orchestration capabilities and advanced language models in Amazon Bedrock opens up numerous possibilities for building intelligent, autonomous systems that can tackle complex business challenges.
We encourage you to explore our code on GitHub and start building your own multi-agent systems using CrewAI and Amazon Bedrock. Whether you’re focused on security assessments, process automation, or other use cases, this powerful combination provides the tools you need to create sophisticated AI solutions that can scale with your needs.

About the Authors
Tony Kipkemboi is a Senior Developer Advocate and Partnerships Lead at CrewAI, where he empowers developers to build AI agents that drive business efficiency. A US Army veteran, Tony brings a diverse background in healthcare, data engineering, and AI. With a passion for innovation, he has spoken at events like PyCon US and contributes to the tech community through open source projects, tutorials, and thought leadership in AI agent development. Tony holds a Bachelor of Science in Health Sciences and is pursuing a Master’s in Computer Information Technology at the University of Pennsylvania.
João (Joe) Moura is the Founder and CEO of CrewAI, the leading agent orchestration platform powering multi-agent automations at scale. With deep expertise in generative AI and enterprise solutions, João partners with global leaders like AWS, NVIDIA, IBM, and Meta AI to drive innovative AI strategies. Under his leadership, CrewAI has rapidly become essential infrastructure for top-tier companies and developers worldwide and is used by most of the Fortune 500 in the US.
Karan Singh is a Generative AI Specialist at AWS, where he works with top-tier third-party foundation model and agentic framework providers to develop and execute joint go-to-market strategies, enabling customers to effectively deploy and scale solutions to solve enterprise generative AI challenges. Karan holds a Bachelor of Science in Electrical Engineering from Manipal University, a Master of Science in Electrical Engineering from Northwestern University, and an MBA from the Haas School of Business at University of California, Berkeley.
Aris Tsakpinis is a Specialist Solutions Architect for Generative AI focusing on open source models on Amazon Bedrock and the broader generative AI open source ecosystem. Alongside his professional role, he is pursuing a PhD in Machine Learning Engineering at the University of Regensburg, where his research focuses on applied natural language processing in scientific domains.

Meet Hostinger Horizons: A No-Code AI Tool that Lets You Create, Edit, …

​In the evolving landscape of web development, the emergence of no-code platforms has significantly broadened access to application creation. Among these, Hostinger Horizons stands out as an AI-powered tool designed to facilitate the building, editing, and publishing of custom web applications without necessitating any coding expertise. By integrating essential services such as hosting, domain registration, and email functionalities, Hostinger Horizons offers a comprehensive solution for individuals and businesses seeking to establish a digital presence.​

Technical Overview

Hostinger Horizons utilizes advanced artificial intelligence and natural language processing to interpret user inputs and generate functional web applications. The platform features a user-friendly chat interface where users can describe their envisioned application in everyday language. For example, a prompt like “Create a personal finance tracker that allows users to log expenses and view spending reports” enables the AI to construct an application aligned with these specifications. ​

Notable Technical Features:

Real-Time Editing and Live Preview: Users can make modifications to their applications and observe changes instantaneously, promoting an iterative development process. ​

Multilingual Support: The platform accommodates over 80 languages, allowing users worldwide to develop applications in their native tongues. ​

Image and Voice Input: Beyond text prompts, users can upload images or utilize voice commands to guide the AI in building the application, enhancing accessibility and flexibility. ​

Sandbox Environment: Hostinger Horizons provides a sandbox environment where users can test their applications without affecting the live version, ensuring a smooth deployment process. ​

Integrated Deployment: Once the application meets the user’s satisfaction, it can be deployed directly through the platform. Hostinger Horizons manages all backend processes, including hosting and domain setup, streamlining the launch process. ​

Business Considerations

Hostinger Horizons is tailored to a diverse audience, encompassing entrepreneurs, small businesses, and individual creators. By removing the necessity for coding skills, the platform lowers the barrier to web application development, enabling rapid transformation of ideas into functional applications.​

Advantages for Businesses:

Cost-Effective Development: Traditional web application development often involves significant expenses related to hiring developers. Hostinger Horizons offers a more economical alternative, making it particularly advantageous for startups and small enterprises. ​

Rapid Prototyping: The platform facilitates swift development and deployment of applications, allowing businesses to test concepts and iterate based on user feedback without substantial time investments.​

Integrated Services: With built-in hosting, domain registration, and email services, businesses can manage all aspects of their web presence from a single platform, simplifying operations and reducing the need for multiple service providers. ​

Scalability: Hostinger Horizons’ cloud-based infrastructure ensures that applications can scale seamlessly as the business grows, accommodating increasing traffic and user engagement.​

Pricing Structure:

Hostinger Horizons offers several pricing plans to accommodate different needs:​

Starter Plan: Priced at $19.99 per month, it includes 100 messages, hosting (one month free), unlimited bandwidth, up to 50 web apps, and free email services. ​

Hobbyist Plan: At $49.99 per month, this plan offers 250 messages along with the features included in the Starter Plan.​

Hustler Plan: For $99.99 per month, users receive 500 messages and the standard features.​

Pro Plan: The most comprehensive plan at $199.99 per month provides 1,000 messages and all included features.

Hostinger also offers a free trial of 5 messages when you click the “Start for free” button.

Tutorial: Creating a Web Application with Hostinger Horizons

Developing a web application with Hostinger Horizons involves a straightforward process. Here’s a step-by-step guide:

Step 1: Sign Up and Access Hostinger Horizons

Visit the Hostinger Horizons page and select a plan that aligns with your requirements.​

After purchasing, log in to your Hostinger account and navigate to the hPanel dashboard.​

Go to “Websites” → “Website List” and click on “Add Website.” Choose “Hostinger Horizons” from the options to access the platform. ​

Step 2: Define Your Application Idea

In the chat interface, describe the application you wish to create. For example: “Create a web application for a Sudoku game. The web application should be mobile-friendly. There should be 3 levels of games. Level 1: Easy mode. Level 2: Medium difficulty. Level 3: Difficult Mode.”

The AI will process your input and generate a basic version of the application based on your description.​

Step 3: Customize the Application

Layout and Design: Use the real-time editor to adjust the layout, color scheme, and overall design to match your preferences.​

Functionality: Add or modify features by providing additional prompts. For instance, you can request the inclusion of a budgeting feature or integration with external APIs for real-time data.​

Content: Upload images, input text content, and configure any necessary settings to personalize the application.​

Step 4: Test the Application

Utilize the sandbox environment to test the application’s functionality. Ensure all features operate as intended and make any necessary adjustments based on your testing.​

Step 5: Deploy the Application

Once satisfied, click the “Publish” button to deploy your application.​

Demo

Thanks to the Hostinger team for the thought leadership and resources for this article. The Hostinger team supported us in creating this content.
The post Meet Hostinger Horizons: A No-Code AI Tool that Lets You Create, Edit, and Publish Custom Web Apps Without Writing a Single Line of Code appeared first on MarkTechPost.

Understanding AI Agent Memory: Building Blocks for Intelligent Systems

AI agent memory comprises multiple layers, each serving a distinct role in shaping the agent’s behavior and decision-making. Dividing memory into distinct types makes it easier to understand and design AI systems that are both contextually aware and responsive. Let’s explore the four key types of memory commonly used in AI agents: Episodic, Semantic, Procedural, and Short-Term (or Working) Memory, along with the interplay between long-term and short-term storage.


1. Episodic Memory: Recalling Past Interactions

Episodic memory in AI refers to the storage of past interactions and the specific actions taken by the agent. Like human memory, episodic memory records the events or “episodes” an agent experiences during its operation. This type of memory is crucial because it enables the agent to reference previous conversations, decisions, and outcomes to inform future actions. For example, when a user interacts with a customer support bot, the bot might store the conversation history in an episodic memory log, allowing it to maintain context over multiple exchanges. This contextual awareness is especially important in multi-turn dialogues where understanding previous interactions can dramatically improve the quality of responses.

In practical applications, episodic memory is often implemented using persistent storage systems like vector databases. These systems can store semantic representations of interactions, enabling rapid retrieval based on similarity searches. This means that when an AI agent needs to refer back to an earlier conversation, it can quickly identify and pull relevant segments of past interactions, thereby enhancing the continuity and personalization of the experience.

2. Semantic Memory: External Knowledge and Self-awareness

Semantic memory in AI encompasses the agent’s repository of factual, external information and internal knowledge. Unlike episodic memory, which is tied to specific interactions, semantic memory holds generalized knowledge that the agent can use to understand and interpret the world. This may include language rules, domain-specific information, or self-awareness of the agent’s capabilities and limitations.

One common semantic memory use is in Retrieval-Augmented Generation (RAG) applications, where the agent leverages a vast data store to answer questions accurately. For instance, if an AI agent is tasked with providing technical support for a software product, its semantic memory might contain user manuals, troubleshooting guides, and FAQs. Semantic memory also includes grounding context that helps the agent filter and prioritize relevant data from a broader corpus of information available on the internet.

Integrating semantic memory ensures that an AI agent responds based on immediate context and draws on a broad spectrum of external knowledge. This creates a more robust, informed system that can handle diverse queries with accuracy and nuance.

3. Procedural Memory: The Blueprint of Operations

Procedural memory is the backbone of an AI system’s operational aspects. It includes systemic information such as the structure of the system prompt, the tools available to the agent, and the guardrails that ensure safe and appropriate interactions. In essence, procedural memory defines “how” the agent functions rather than “what” it knows.

This type of memory is typically managed through well-organized registries, such as Git repositories for code, prompt registries for conversational contexts, and tool registries that enumerate the available functions and APIs. An AI agent can execute tasks more reliably and predictably by having a clear blueprint of its operational procedures. The explicit definition of protocols and guidelines also ensures that the agent behaves in a controlled manner, thereby minimizing risks such as unintended outputs or safety violations.

Procedural memory supports consistency in performance and facilitates easier updates and maintenance. As new tools become available or system requirements evolve, the procedural memory can be updated in a centralized manner, ensuring that the agent adapts seamlessly to changes without compromising its core functionality.

4. Short-Term (Working) Memory: Integrating Information for Action

In many AI systems, the information drawn from long-term memory is consolidated into short-term or working memory. This is the temporary context that the agent actively uses to process current tasks. Short-term memory is a compilation of the episodic, semantic, and procedural memories that have been retrieved and localized for immediate use.

When an agent is presented with a new task or query, it assembles relevant information from its long-term stores. This might include a snippet of a previous conversation (episodic memory), pertinent factual data (semantic memory), and operational guidelines (procedural memory). The combined information forms the prompt fed into the underlying language model, allowing the AI to generate coherent, context-aware responses.

This process of compiling short-term memory is critical for tasks that require nuanced decision-making and planning. It allows the AI agent to “remember” the conversation history and tailor responses accordingly. The agility provided by short-term memory is a significant factor in creating interactions that feel natural and human-like. Also, the separation between long-term and short-term memory ensures that while the system has a vast knowledge repository, only the most pertinent information is actively engaged during interaction, optimizing performance and accuracy.

The Synergy of Long-Term and Short-Term Memory

To fully appreciate the architecture of AI agent memory, it is important to understand the dynamic interplay between long-term memory and short-term (working) memory. Long-term memory, consisting of episodic, semantic, and procedural types, is the deep storage that informs the AI about its history, external facts, and internal operational frameworks. On the other hand, short-term memory is a fluid, working subset that the agent uses to navigate current tasks. The agent can adapt to new contexts without losing the richness of stored experiences and knowledge by periodically retrieving and synthesizing data from long-term memory. This dynamic balance ensures that AI systems are well-informed, responsive, and contextually aware.

In conclusion, the multifaceted approach to memory in AI agents underscores the complexity and sophistication required to build systems that can interact intelligently with the world. Episodic memory allows for the personalization of interactions, semantic memory enriches responses with factual depth, and procedural memory guarantees operational reliability. Meanwhile, integrating these long-term memories into short-term working memory enables the AI to act swiftly and contextually in real-time scenarios. As AI advances, refining these memory systems will be pivotal in creating smart agents capable of nuanced, context-aware decision-making. The layered memory approach is a cornerstone of intelligent agent design, ensuring these systems remain robust, adaptive, and ready to tackle the challenges of an ever-evolving digital landscape.

Sources:

https://www.deeplearning.ai/short-courses/long-term-agentic-memory-with-langgraph/ 

https://arxiv.org/html/2502.12110v1 

https://arxiv.org/pdf/2309.02427

The post Understanding AI Agent Memory: Building Blocks for Intelligent Systems appeared first on MarkTechPost.

PilotANN: A Hybrid CPU-GPU System For Graph-based ANNS

Approximate Nearest Neighbor Search (ANNS) is a fundamental vector search technique that efficiently identifies similar items in high-dimensional vector spaces. Traditionally, ANNS has served as the backbone for retrieval engines and recommendation systems; however, it struggles to keep pace with modern Transformer architectures that employ higher-dimensional embeddings and larger datasets. Unlike deep learning systems that can be horizontally scaled due to their stateless nature, ANNS remains centralized, creating a severe single-machine throughput bottleneck. Empirical testing with 100-million scale datasets reveals that even state-of-the-art CPU implementations of the Hierarchical Navigable Small World (HNSW) algorithm can’t maintain adequate performance as vector dimensions increase.

Previous research on large-scale ANNS has explored two optimization paths: index structure improvements and hardware acceleration. The Inverted MultiIndex (IMI) enhanced space partitioning through multi-codebook quantization, while PQFastScan improved performance with SIMD and cache-aware optimizations. DiskANN and SPANN introduced disk-based indexing for billion-scale datasets, addressing memory hierarchy challenges through different approaches. SONG and CAGRA achieved impressive speedups through GPU parallelization but remain constrained by GPU memory capacity. BANG handled billion-scale datasets via hybrid CPU-GPU processing but lacked critical CPU baseline comparisons. These methods frequently sacrifice compatibility, accuracy or require specialized hardware.

Researchers from the Chinese University of Hong Kong, Centre for Perceptual and Interactive Intelligence, and Theory Lab of Huawei Technologies have proposed PilotANN, a hybrid CPU-GPU system designed to overcome the limitations of existing ANNS implementations. PilotANN addresses the challenge: CPU-only implementations struggle with computational demands, while GPU-only solutions are constrained by limited memory capacity. It solves this issue by utilizing both the abundant RAM of CPUs and the parallel processing capabilities of GPUs. Moreover, it employs a three-stage graph traversal process, GPU-accelerated subgraph traversal using dimensionally-reduced vectors, CPU refinement, and precise search with complete vectors.

PilotANN fundamentally reimagines the vector search process through a “staged data ready processing” paradigm. It minimizes data movement across processing stages rather than adhering to traditional “move data for computation” models. It also consists of three stages: GPU piloting with subgraph and dimensionally-reduced vectors, residual refinement using subgraph with full vectors, and final traversal employing full graph and complete vectors. The design shows cost-effectiveness with only a single commodity GPU while scaling effectively across vector dimensions and graph complexity. Data transfer overhead is minimized to just the initial query vector movement to GPU and a small candidate set returning to CPU after GPU piloting.

Experimental results show PilotANN’s performance advantages across diverse large-scale datasets. PilotANN achieves a 3.9 times throughput speedup on the 96-dimensional DEEP dataset compared to the HNSW-CPU baseline, with even more impressive gains of 5.1-5.4 times on higher-dimensional datasets. PilotANN delivers significant speedups even on the notoriously challenging T2I dataset despite no specific optimizations for this benchmark. Moreover, it shows remarkable cost-effectiveness despite utilizing more expensive hardware. While the GPU-based platform costs 2.81 USD/hour compared to the CPU-only solution at 1.69 USD/hour, PilotANN achieves 2.3 times cost-effectiveness for DEEP and 3.0-3.2 times for T2I, WIKI, and LAION datasets when measuring throughput per dollar.

In conclusion, researchers introduced PilotANN, an advancement in graph-based ANNS that effectively utilizes CPU and GPU resources for emerging workloads. It delivers substantial gains over existing CPU-only approaches through the intelligent decomposition of top-k search into a multi-stage CPU-GPU pipeline and the implementation of efficient entry selection. It democratizes high-performance nearest neighbor search by achieving competitive results with a single commodity GPU, making advanced search capabilities accessible to researchers and organizations with limited computing resources. Unlike alternative solutions requiring expensive high-end GPUs, PilotANN enables efficient ANNS deployment on common hardware configurations while maintaining search accuracy.

Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 85k+ ML SubReddit.
The post PilotANN: A Hybrid CPU-GPU System For Graph-based ANNS appeared first on MarkTechPost.

Tencent AI Researchers Introduce Hunyuan-T1: A Mamba-Powered Ultra-Large Language Model Redefining Deep Reasoning, Contextual Efficiency, and Human-Centric Reinforcement Learning

Large language models struggle to process and reason over lengthy, complex texts without losing essential context. Traditional models often suffer from context loss, inefficient handling of long-range dependencies, and difficulties aligning with human preferences, affecting the accuracy and efficiency of their responses. Tencent’s Hunyuan-T1 directly tackles these challenges by integrating a novel Mamba-powered architecture with advanced reinforcement learning and curriculum strategies, ensuring robust context capture and enhanced reasoning capabilities.

Hunyuan-T1 is presented as the first ultra-large model powered by the Mamba architecture, in a design that fuses a hybrid Transformer with Mixture-of-Experts (MoE) technologies. Built on the TurboS fast-thinking base, Hunyuan-T1 is specifically engineered to optimize the processing of long textual sequences while minimizing computational overhead. This allows the model to effectively capture extended context and manage long-distance dependencies, crucial for tasks that demand deep, coherent reasoning.

A key highlight of Hunyuan-T1 is its heavy reliance on RL during the post-training phase. Tencent dedicated 96.7% of the post-training compute to this approach, enabling the model to refine its reasoning abilities iteratively. Techniques such as data replay, periodic policy resetting, and self-rewarding feedback loops help improve output quality, ensuring the model’s responses are detailed, efficient, and closely aligned with human expectations.

To further boost reasoning proficiency, Tencent employed a curriculum learning strategy. This approach gradually increases the difficulty of training data while simultaneously expanding the model’s context length. As a result, Hunyuan-T1 learns to use tokens more efficiently, adapting from solving basic mathematical problems to tackling complex scientific and logical challenges.

Efficiency is another cornerstone of Hunyuan-T1’s design. The TurboS base’s ability to capture long-text information prevents context loss, a common issue in many language models, and doubles the decoding speed compared to similar systems, meaning users benefit from faster responses without compromising quality.

The model has achieved impressive scores on multiple benchmarks: 87.2 on MMLU-PRO, which tests various subjects including humanities, social sciences, and STEM fields; 69.3 on GPQA-diamond, a challenging evaluation featuring doctoral-level scientific problems; 64.9 on LiveCodeBench for coding tasks; and a remarkable 96.2 on the MATH-500 benchmark for mathematical reasoning. These results underscore Hunyuan-T1’s versatility and ability to handle high-stakes, professional-grade tasks across various fields. Beyond quantitative metrics, Hunyuan-T1 is designed to deliver outputs with human-like understanding and creativity. During its RL phase, the model underwent a comprehensive alignment process that combined self-rewarding feedback with external reward models. This dual approach ensures its responses are accurate and exhibit rich details and natural flow.

In conclusion, Tencent’s Hunyuan-T1 combines an ultra-large-scale, Mamba-powered architecture with state-of-the-art reinforcement learning and curriculum strategies. Hunyuan-T1 delivers high performance, enhanced reasoning, and exceptional efficiency.

Check out the Details, Hugging Face and GitHub Page. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 85k+ ML SubReddit.

The post Tencent AI Researchers Introduce Hunyuan-T1: A Mamba-Powered Ultra-Large Language Model Redefining Deep Reasoning, Contextual Efficiency, and Human-Centric Reinforcement Learning appeared first on MarkTechPost.

Advancing Medical Reasoning with Reinforcement Learning from Verifiable Rewards (RLVR): Insights from MED-RLVR

Reinforcement Learning from Verifiable Rewards (RLVR) has recently emerged as a promising method for enhancing reasoning abilities in language models without direct supervision. This approach has shown notable success in mathematics and coding, where reasoning naturally aligns with structured problem-solving. While studies have demonstrated that RLVR alone can lead to self-evolved reasoning, research has largely been limited to these technical fields. Efforts to extend RLVR have explored synthetic datasets, such as those involving sequential tasks and object counting, indicating potential but also highlighting the challenges of adapting this method to different domains.

Expanding RLVR to broader areas remains an open challenge, particularly in tasks like multiple-choice question answering (MCQA), which provides structured, verifiable labels across diverse subjects, including medicine. However, unlike math and coding, which involve complex reasoning with an open-ended answer space, MCQA tasks typically have predefined answer choices, making it uncertain whether RLVR’s benefits translate effectively. This limitation is especially relevant in medical reasoning tasks, where models must navigate intricate clinical knowledge to produce accurate responses, an area that has proven difficult for existing AI systems.

Researchers from Microsoft Research investigate whether medical reasoning can emerge through RLVR. They introduce MED-RLVR, leveraging medical MCQA data to assess RLVR’s effectiveness in the medical domain. Their findings show that RLVR extends beyond math and coding, achieving performance comparable to supervised fine-tuning (SFT) in in-distribution tasks while significantly improving out-of-distribution generalization by eight percentage points. Analyzing training dynamics, they observe that reasoning capabilities emerge in a 3B-parameter base model without explicit supervision, highlighting RLVR’s potential for advancing reasoning in knowledge-intensive fields like medicine.

RL optimizes decision-making by training an agent to maximize rewards through interactions with an environment. It has been effectively applied to language models to align outputs with human preferences and, more recently, to elicit reasoning without explicit supervision. This study employs Proximal Policy Optimization (PPO) to train a policy model, incorporating a clipped objective function to stabilize training. Using a rule-based reward function, MED-RLVR assigns rewards based on output correctness and format validity. Without additional supervision, the model demonstrates emergent medical reasoning, similar to mathematical reasoning in prior RLVR studies, highlighting RLVR’s potential beyond structured domains.
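Since the reward is described as rule-based, combining answer correctness with format validity, a minimal sketch of what such a function could look like follows; the tags, choice letters, and reward values are illustrative assumptions rather than the authors’ exact implementation:

import re

def mcqa_reward(completion: str, gold_choice: str) -> float:
    """Toy rule-based reward: format validity plus answer correctness.

    Assumes the model is asked to reason inside <think>...</think> and emit its
    final choice inside <answer>...</answer>; tags and weights are illustrative.
    """
    format_ok = bool(re.search(r"<think>.*?</think>\s*<answer>.*?</answer>", completion, re.DOTALL))
    if not format_ok:
        return -1.0  # penalize malformed outputs

    answer = re.search(r"<answer>\s*([A-Z])\s*</answer>", completion, re.DOTALL)
    if answer and answer.group(1).upper() == gold_choice.upper():
        return 1.0   # correct choice in the correct format
    return 0.0       # well-formatted but wrong (or unparsable) answer

# Example: a well-formatted, correct completion receives the full reward.
sample = "<think>Beta-blockers reduce myocardial oxygen demand.</think><answer>C</answer>"
print(mcqa_reward(sample, gold_choice="C"))  # 1.0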

The MedQA-USMLE dataset, which includes multiple-choice medical exam questions, is used to train MED-RLVR. Unlike the standard four-option version, this dataset presents a greater challenge by offering more answer choices. Training is based on the Qwen2.5-3B model using OpenRLHF for reinforcement learning. Compared to SFT, MED-RLVR demonstrates superior generalization, particularly on the MMLU-Pro-Health dataset. Analysis reveals six stages of reasoning evolution, including format failures, verbose outputs, reward hacking, and reintegrated reasoning. Unlike math or coding tasks, no self-validation behaviors (“aha moments”) were observed, suggesting potential improvements through penalizing short reasoning chains or fine-tuning with longer chains of thought (CoTs).

In conclusion, the study focuses on MCQA in medicine, which provides a controlled setting for evaluation but does not fully capture the complexity of real-world tasks such as open-text answering, report generation, or medical dialogues. Additionally, the unimodal approach limits the model’s ability to integrate multimodal data, which is crucial for diagnostic applications; future work should address these limitations. MED-RLVR, based on reinforcement learning with verifiable rewards, matches SFT on in-distribution tasks and improves out-of-distribution generalization. While medical reasoning emerges without explicit supervision, challenges like reward hacking persist, highlighting the need for further exploration of complex reasoning and multimodal integration.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 85k+ ML SubReddit.

The post Advancing Medical Reasoning with Reinforcement Learning from Verifiable Rewards (RLVR): Insights from MED-RLVR appeared first on MarkTechPost.

NVIDIA AI Researchers Introduce FFN Fusion: A Novel Optimization Technique that Demonstrates How Sequential Computation in Large Language Models (LLMs) can be Effectively Parallelized

Large language models (LLMs) have become vital across domains, enabling high-performance applications such as natural language generation, scientific research, and conversational agents. Underneath these advancements lies the transformer architecture, where alternating layers of attention mechanisms and feed-forward networks (FFNs) sequentially process tokenized input. However, with an increase in size and complexity, the computational burden required for inference grows substantially, creating an efficiency bottleneck. Efficient inference is now a critical concern, with many research groups focusing on strategies that can reduce latency, increase throughput, and cut computational costs while maintaining or improving model performance.

At the center of this efficiency problem lies the inherently sequential structure of transformers. Each layer’s output feeds into the next, demanding strict order and synchronization, which is especially problematic at scale. As model sizes expand, the cost of sequential computation and communication across GPUs grows, leading to reduced efficiency and increased deployment cost. This challenge is amplified in scenarios requiring fast, multi-token generation, such as real-time AI assistants. Reducing this sequential load while maintaining model capabilities presents a key technical hurdle. Unlocking new parallelization strategies that preserve accuracy yet significantly reduce computation depth is essential to broadening the accessibility and scalability of LLMs.

Several techniques have emerged to improve efficiency. Quantization reduces the precision of numerical representations to minimize memory and computation needs, though it often risks accuracy losses, especially at low bit-widths. Pruning eliminates redundant parameters and simplifies models, but it can harm accuracy if not applied carefully. Mixture-of-Experts (MoE) models activate only a subset of parameters per input, making them highly efficient for specific workloads; still, they can underperform at intermediate batch sizes due to low hardware utilization. While valuable, these strategies have trade-offs that limit their universal applicability. Consequently, the field seeks methods that offer broad efficiency improvements with fewer compromises, especially for dense architectures that are simpler to train, deploy, and maintain.

Researchers at NVIDIA introduced a new architectural optimization technique named FFN Fusion, which addresses the sequential bottleneck in transformers by identifying FFN sequences that can be executed in parallel. This approach emerged from the observation that when attention layers are removed using a Puzzle tool, models often retain long sequences of consecutive FFNs. These sequences show minimal interdependency and, therefore, can be processed simultaneously. By analyzing the structure of LLMs such as Llama-3.1-405B-Instruct, researchers created a new model called Ultra-253B-Base by pruning and restructuring the base model through FFN Fusion. This method results in a significantly more efficient model that maintains competitive performance.

FFN Fusion fuses multiple consecutive FFN layers into a single, wider FFN. This process is grounded in mathematical equivalence: by concatenating the weights of several FFNs, one can produce a single module that behaves like the sum of the original layers but can be computed in parallel. For instance, if three FFNs are stacked sequentially, each dependent on the output of the previous one, their fusion removes these dependencies by ensuring all three operate on the same input and their outputs are aggregated. The theoretical foundation for this method shows that the fused FFN maintains the same representational capacity. Researchers performed dependency analysis using cosine distance between FFN outputs to identify regions with low interdependence. These regions were deemed optimal for fusion, as minimal change in token direction between layers indicated the feasibility of parallel processing.
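The weight-concatenation equivalence described above can be illustrated with a small NumPy sketch (a simplified illustration, not the paper’s code): for two-layer FFNs without biases that all read the same input, concatenating their hidden weights yields one wider FFN whose output equals the sum of the individual outputs:

import numpy as np

rng = np.random.default_rng(0)
d_model, d_hidden, n_ffn = 16, 64, 3

# Three independent two-layer FFNs: y_i = relu(x @ W1_i) @ W2_i
W1 = [rng.standard_normal((d_model, d_hidden)) for _ in range(n_ffn)]
W2 = [rng.standard_normal((d_hidden, d_model)) for _ in range(n_ffn)]

def ffn(x, w1, w2):
    return np.maximum(x @ w1, 0.0) @ w2

x = rng.standard_normal((4, d_model))  # a batch of token representations

# Low-dependency FFNs approximately all see the same input, so their
# parallelized version is the sum of their outputs on x.
parallel_sum = sum(ffn(x, w1, w2) for w1, w2 in zip(W1, W2))

# Fused FFN: concatenate hidden dimensions into one wider FFN computed in a single pass.
W1_fused = np.concatenate(W1, axis=1)   # (d_model, n_ffn * d_hidden)
W2_fused = np.concatenate(W2, axis=0)   # (n_ffn * d_hidden, d_model)
fused_out = ffn(x, W1_fused, W2_fused)

print(np.allclose(parallel_sum, fused_out))  # True: identical representational result

In a real transformer the fused output is added back to the residual stream once, which is why fusion is applied only where the dependency analysis indicates the consecutive layers effectively operate on nearly the same input.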

Applying FFN Fusion to the Llama-405B model resulted in Ultra-253B-Base, which delivered notable gains in speed and resource efficiency. Specifically, the new model achieved a 1.71x improvement in inference latency and a 35x reduction in per-token computational cost at a batch size of 32. This efficiency did not come at the expense of capability. Ultra-253B-Base scored 85.17% on MMLU, 72.25% on MMLU-Pro, 84.92% on Arena Hard, 86.58% on HumanEval, and 9.19 on MT-Bench. These results often matched or exceeded those of the original 405B-parameter model, even though Ultra-253B-Base contains only 253 billion parameters. Memory usage also improved, with a 2x reduction in kv-cache requirements. The training process involved distilling 54 billion tokens at an 8k context window, followed by staged fine-tuning at 16k, 32k, and 128k contexts. These steps ensured the fused model maintained high accuracy while benefiting from reduced size.

This research demonstrates how thoughtful architectural redesign can unlock significant efficiency gains. Researchers showed that FFN layers in transformer architectures are often more independent than previously assumed. Their method of quantifying inter-layer dependency and transforming model structures allowed for broader application across models of various sizes. The technique was also validated on a 70B-parameter model, proving generalizability. Further experiments indicated that while FFN layers can often be fused with minimal impact, full block parallelization, including attention, introduces more performance degradation due to stronger interdependencies.

Several Key Takeaways from the Research on FFN Fusion:

The FFN Fusion technique reduces sequential computation in transformers by parallelizing low-dependency FFN layers.  

Fusion is achieved by replacing sequences of FFNs with a single wider FFN using concatenated weights.  

Ultra-253B-Base, derived from Llama-3.1-405B, achieves 1.71x faster inference and 35x lower per-token cost.  

Benchmark results include: 85.17% (MMLU), 72.25% (MMLU-Pro), 86.58% (HumanEval), 84.92% (Arena Hard), and 9.19 (MT-Bench).  

Memory usage is cut by half due to kv-cache optimization.  

FFN Fusion is more effective at larger model scales and works well with techniques like pruning and quantization.  

Full transformer block parallelization shows potential but requires further research due to stronger interdependencies.  

A systematic method using cosine distance helps identify which FFN sequences are safe to fuse.  

The technique is validated across different model sizes, including 49B, 70B, and 253B.  

This approach lays the foundation for more parallel-friendly and hardware-efficient LLM designs.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 85k+ ML SubReddit.

The post NVIDIA AI Researchers Introduce FFN Fusion: A Novel Optimization Technique that Demonstrates How Sequential Computation in Large Language Models (LLMs) can be Effectively Parallelized appeared first on MarkTechPost.

Amazon Bedrock Guardrails image content filters provide industry-leadi …

Amazon Bedrock Guardrails announces the general availability of image content filters, enabling you to moderate both image and text content in your generative AI applications. Previously limited to text-only filtering, this enhancement now provides comprehensive content moderation across both modalities. This new capability removes the heavy lifting of building your own image safeguards or spending cycles on manual content moderation that can be error-prone and tedious.
Tero Hottinen, VP, Head of Strategic Partnerships at KONE, envisions the following use case:

“In its ongoing evaluation, KONE recognizes the potential of Amazon Bedrock Guardrails as a key component in protecting generative AI applications, particularly for relevance and contextual grounding checks, as well as the multimodal safeguards. The company envisions integrating product design diagrams and manuals into its applications, with Amazon Bedrock Guardrails playing a crucial role in enabling more accurate diagnosis and analysis of multimodal content.”

Amazon Bedrock Guardrails provides configurable safeguards to help customers block harmful or unwanted inputs and outputs for their generative AI applications. Customers can create custom Guardrails tailored to their specific use cases by implementing different policies to detect and filter harmful or unwanted content from both input prompts and model responses. Furthermore, customers can use Guardrails to detect model hallucinations and help make responses grounded and accurate. Through its standalone ApplyGuardrail API, Guardrails enables customers to apply consistent policies across any foundation model, including those hosted on Amazon Bedrock, self-hosted models, and third-party models. Bedrock Guardrails supports seamless integration with Bedrock Agents and Bedrock Knowledge Bases, enabling developers to enforce safeguards across various workflows, such as Retrieval Augmented Generation (RAG) systems and agentic applications.
Amazon Bedrock Guardrails offers six distinct policies: content filters to detect and filter harmful material across several categories, including hate, insults, sexual content, violence, and misconduct, and to prevent prompt attacks; topic filters to restrict specific subjects; sensitive information filters to block personally identifiable information (PII); word filters to block specific terms; contextual grounding checks to detect hallucinations and analyze response relevance; and Automated Reasoning checks (currently in gated preview) to identify, correct, and explain factual claims. With the new image content moderation capability, these safeguards now extend to both text and images, helping customers block up to 88% of harmful multimodal content. You can independently configure moderation for either image or text content (or both) with adjustable thresholds from low to high, helping you to build generative AI applications that align with your organization’s responsible AI policies.
This new capability is generally available in US East (N. Virginia), US West (Oregon), Europe (Frankfurt), and Asia Pacific (Tokyo) AWS Regions.
In this post, we discuss how to get started with image content filters in Amazon Bedrock Guardrails.
Solution overview
To get started, create a guardrail on the AWS Management Console and configure the content filters for either text or image data or both. You can also use AWS SDKs to integrate this capability into your applications.
Create a guardrail
To create a guardrail, complete the following steps:

On the Amazon Bedrock console, under Safeguards in the navigation pane, choose Guardrails.
Choose Create guardrail.
In the Configure content filters section, under Harmful categories and Prompt attacks, you can use the existing content filters to detect and block image data in addition to text data.
After you’ve selected and configured the content filters you want to use, you can save the guardrail and start using it to help you block harmful or unwanted inputs and outputs for your generative AI applications.
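If you prefer to set this up programmatically, the following is a hedged boto3 sketch of creating a guardrail with content filters via the Amazon Bedrock control-plane client. The inputModalities and outputModalities fields are an assumption based on the image support described above and should be verified against the current API reference; the filter types, strengths, and messages are illustrative:

import boto3

# Use a Region where the image content filters are available (for example, us-east-1).
bedrock = boto3.client("bedrock", region_name="us-east-1")

# Sketch only: the modality field names below are assumptions to be verified
# against the latest Amazon Bedrock API reference.
response = bedrock.create_guardrail(
    name="multimodal-content-guardrail",
    description="Blocks harmful text and image content",
    contentPolicyConfig={
        "filtersConfig": [
            {
                "type": "VIOLENCE",
                "inputStrength": "HIGH",
                "outputStrength": "HIGH",
                "inputModalities": ["TEXT", "IMAGE"],   # assumed field name
                "outputModalities": ["TEXT", "IMAGE"],  # assumed field name
            },
            {
                "type": "HATE",
                "inputStrength": "HIGH",
                "outputStrength": "HIGH",
                "inputModalities": ["TEXT", "IMAGE"],
                "outputModalities": ["TEXT", "IMAGE"],
            },
        ]
    },
    blockedInputMessaging="Sorry, this request cannot be processed.",
    blockedOutputsMessaging="Sorry, this response was blocked.",
)
print(response["guardrailId"], response["version"])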

Test a guardrail with text generation
To test the new guardrail on the Amazon Bedrock console, select the guardrail and choose Test. You have two options: test the guardrail by choosing and invoking a model, or test the guardrail without invoking a model by using the independent Amazon Bedrock Guardrails ApplyGuardrail API.
With the ApplyGuardrail API, you can validate content at any point in your application flow before processing or serving results to the user. You can also use the API to evaluate inputs and outputs for self-managed (custom) or third-party FMs, regardless of the underlying infrastructure. For example, you could use the API to evaluate a Meta Llama 3.2 model hosted on Amazon SageMaker or a Mistral NeMo model running on your laptop.
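For illustration, here is a minimal boto3 sketch of validating a text prompt together with an image via the ApplyGuardrail API; the content-block structure reflects the bedrock-runtime apply_guardrail request shape as I understand it and should be checked against the current API reference, and the guardrail ID and image path are placeholders:

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

with open("input-image.jpg", "rb") as f:   # placeholder image path
    image_bytes = f.read()

# Sketch only: validates an input prompt (text + image) against an existing guardrail
# without invoking a foundation model.
response = bedrock_runtime.apply_guardrail(
    guardrailIdentifier="<<guardrail_id>>",  # replace with your guardrail ID
    guardrailVersion="1",
    source="INPUT",
    content=[
        {"text": {"text": "Describe what is happening in this image."}},
        {"image": {"format": "jpeg", "source": {"bytes": image_bytes}}},
    ],
)
print(response["action"])        # e.g., GUARDRAIL_INTERVENED or NONE
print(response["assessments"])   # per-policy assessment details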
Test a guardrail by choosing and invoking a model
Select a model that supports image inputs or outputs, for example, Anthropic’s Claude 3.5 Sonnet. Verify that the prompt and response filters are enabled for image content. Then, provide a prompt, upload an image file, and choose Run.

In this example, Amazon Bedrock Guardrails intervened. Choose View trace for more details.
The guardrail trace provides a record of how safety measures were applied during an interaction. It shows whether Amazon Bedrock Guardrails intervened or not and what assessments were made on both input (prompt) and output (model response). In this example, the content filters blocked the input prompt because they detected violence in the image with medium confidence.

Test a guardrail without invoking a model
On the Amazon Bedrock console, choose Use ApplyGuardrail API, the independent API to test the guardrail without invoking a model. Choose whether you want to validate an input prompt or an example of a model-generated output. Then, repeat the steps from the previous section: verify that the prompt and response filters are enabled for image content, provide the content to validate, and choose Run.

For this example, we reused the same image and input prompt, and Amazon Bedrock Guardrails intervened again. Choose View trace again for more details.

Test a guardrail with image generation
Now, let’s test the Amazon Bedrock Guardrails multimodal toxicity detection with an image generation use case. We generate an image using the Stability model on Amazon Bedrock with the InvokeModel API and the guardrail:

import base64
import json
import os
import random
import string

import boto3
import botocore

region = "us-east-1"  # use a Region where the image content filters are available

guardrailIdentifier = "<<guardrail_id>>"  # replace with your guardrail ID
guardrailVersion = "1"

model_id = "stability.sd3-5-large-v1:0"
output_images_folder = "images/output"

body = json.dumps(
    {
        "prompt": "A Gun",  # for image generation ("A gun" should get blocked by violence)
        "output_format": "jpeg"
    }
)

bedrock_runtime = boto3.client("bedrock-runtime", region_name=region)
try:
    print("Making a call to InvokeModel API for model: {}".format(model_id))
    response = bedrock_runtime.invoke_model(
        body=body,
        modelId=model_id,
        trace="ENABLED",
        guardrailIdentifier=guardrailIdentifier,
        guardrailVersion=guardrailVersion
    )
    response_body = json.loads(response.get("body").read())
    print("Received response from InvokeModel API (Request Id: {})".format(response["ResponseMetadata"]["RequestId"]))
    if "images" in response_body and len(response_body["images"]) > 0:
        os.makedirs(output_images_folder, exist_ok=True)
        images = response_body["images"]
        for image in images:
            image_id = "".join(random.choices(string.ascii_lowercase + string.digits, k=6))
            image_file = os.path.join(output_images_folder, "generated-image-{}.jpg".format(image_id))
            print("Saving generated image {} at {}".format(image_id, image_file))
            with open(image_file, "wb") as image_file_descriptor:
                image_file_descriptor.write(base64.b64decode(image.encode("utf-8")))
    else:
        print("No images generated from model")
        # When the guardrail blocks the request, inspect the guardrail trace returned in the response body.
        guardrail_trace = response_body["amazon-bedrock-trace"]["guardrail"]
        guardrail_trace["modelOutput"] = ["<REDACTED>"]
        print(guardrail_trace["outputs"])
        print("\nGuardrail Trace: {}".format(json.dumps(guardrail_trace, indent=2)))
except botocore.exceptions.ClientError as err:
    print("Failed while calling InvokeModel API with RequestId = {}".format(err.response["ResponseMetadata"]["RequestId"]))
    raise err

You can access the complete example from the GitHub repo.
Conclusion
In this post, we explored how Amazon Bedrock Guardrails’ new image content filters provide comprehensive multimodal content moderation capabilities. By extending beyond text-only filtering, this solution now helps customers block up to 88% of harmful or unwanted multimodal content across configurable categories including hate, insults, sexual content, violence, misconduct, and prompt attack detection. Guardrails can help organizations across healthcare, manufacturing, financial services, media, and education enhance brand safety without the burden of building custom safeguards or conducting error-prone manual evaluations.
To learn more, see Stop harmful content in models using Amazon Bedrock Guardrails.

About the Authors
Satveer Khurpa is a Sr. WW Specialist Solutions Architect, Amazon Bedrock at Amazon Web Services, specializing in Amazon Bedrock security. In this role, he uses his expertise in cloud-based architectures to develop innovative generative AI solutions for clients across diverse industries. Satveer’s deep understanding of generative AI technologies and security principles allows him to design scalable, secure, and responsible applications that unlock new business opportunities and drive tangible value while maintaining robust security postures.
Shyam Srinivasan is on the Amazon Bedrock Guardrails product team. He cares about making the world a better place through technology and loves being part of this journey. In his spare time, Shyam likes to run long distances, travel around the world, and experience new cultures with family and friends.
Antonio Rodriguez is a Principal Generative AI Specialist Solutions Architect at AWS. He helps companies of all sizes solve their challenges, embrace innovation, and create new business opportunities with Amazon Bedrock. Apart from work, he loves to spend time with his family and play sports with his friends.
Dr. Andrew Kane is an AWS Principal WW Tech Lead (AI Language Services) based out of London. He focuses on the AWS Language and Vision AI services, helping our customers architect multiple AI services into a single use case-driven solution. Before joining AWS at the beginning of 2015, Andrew spent two decades working in the fields of signal processing, financial payments systems, weapons tracking, and editorial and publishing systems. He is a keen karate enthusiast (just one belt away from Black Belt) and is also an avid home-brewer, using automated brewing hardware and other IoT sensors.