Meet VideoRAG: A Retrieval-Augmented Generation (RAG) Framework Leveraging Video Content for Enhanced Query Responses

Video-based technologies have become essential tools for information retrieval and understanding complex concepts. Videos combine visual, temporal, and contextual data, providing a multimodal representation that surpasses static images and text. With the increasing popularity of video-sharing platforms and the vast repository of educational and informational videos available online, leveraging videos as knowledge sources offers unprecedented opportunities to answer queries that require detailed context, spatial understanding, and process demonstration.

Retrieval-augmented generation systems, which combine retrieval and response generation, often neglect the full potential of video data. These systems typically rely on textual information or occasionally include static images to support query responses. However, they fail to capture the richness of videos, which include visual dynamics and multimodal cues essential for complex tasks. Conventional methods either predefine query-relevant videos without retrieval or convert videos into textual formats, losing critical information like visual context and temporal dynamics. This inadequacy hinders providing precise and informative answers for real-world, multimodal queries.

Current methodologies have explored textual or image-based retrieval but have not fully utilized video data. In traditional RAG systems, video content is represented as subtitles or captions, focusing solely on textual aspects or reduced to preselected frames for targeted analysis. Both approaches limit the multimodal richness of videos. Moreover, the absence of techniques to dynamically retrieve and incorporate query-relevant videos further restricts the effectiveness of these systems. The lack of comprehensive video integration leaves an untapped opportunity to enhance the retrieval-augmented generation paradigm.

Research teams from KAIST and DeepAuto.ai proposed a novel framework called VideoRAG to address the challenges associated with using video data in retrieval-augmented generation systems. VideoRAG dynamically retrieves query-relevant videos from a large corpus and incorporates visual and textual information into the generation process. It leverages the capabilities of advanced Large Video Language Models (LVLMs) for seamless integration of multimodal data. The approach represents a significant improvement over previous methods by ensuring the retrieved videos are contextually aligned with user queries and maintaining the temporal richness of the video content.

The proposed methodology involves two main stages: retrieval and generation. During retrieval, VideoRAG identifies videos whose visual and textual features are similar to the query. For videos without subtitles, it applies automatic speech recognition to generate auxiliary textual data, ensuring that every retrieved video contributes meaningfully to response generation. The retrieved videos are then fed into the generation module, where multimodal inputs such as frames, subtitles, and the query text are integrated and processed holistically by the LVLM, enabling it to produce long, rich, accurate, and contextually apt responses. By combining visual and textual elements, VideoRAG can represent the intricacies of complex processes and interactions that static modalities cannot capture.
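As a rough conceptual sketch of this two-stage flow (not the authors' implementation), retrieval can be viewed as a similarity search over fused video embeddings, after which the top videos' frames and subtitles (or ASR transcripts) are packed into a single multimodal prompt. The embedding dictionaries and helper names below are hypothetical placeholders.

import numpy as np

def cosine_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve_videos(query_emb, video_embs, top_k=2):
    # video_embs: {video_id: fused visual+textual embedding (np.ndarray)}
    scored = sorted(video_embs.items(),
                    key=lambda kv: cosine_sim(query_emb, kv[1]),
                    reverse=True)
    return [vid for vid, _ in scored[:top_k]]

def build_lvlm_inputs(query, video_ids, frames, subtitles):
    # Combine sampled frames and subtitles (or ASR transcripts) from the
    # retrieved videos with the query into one multimodal prompt for the LVLM.
    return {
        "frames": [f for vid in video_ids for f in frames[vid]],
        "text": "\n".join(subtitles[vid] for vid in video_ids) + f"\n\nQuestion: {query}",
    }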

VideoRAG was evaluated extensively on datasets such as WikiHowQA and HowTo100M, which span a broad spectrum of queries and video content. The approach delivered higher response quality across metrics including ROUGE-L, BLEU-4, and BERTScore. VideoRAG reached a ROUGE-L score of 0.254, whereas the best text-based RAG baseline reported 0.228. The pattern held for BLEU-4, which measures n-gram overlap: 0.054 for VideoRAG versus 0.044 for the text-based baseline. The framework variant that used both video frames and transcripts improved performance further, achieving a BERTScore of 0.881 compared to 0.870 for the baseline methods. These results highlight the importance of multimodal integration in improving response accuracy and underscore the transformative potential of VideoRAG.
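For reference, these metrics are commonly computed with the Hugging Face evaluate library; the snippet below is a generic illustration with made-up prediction and reference strings, not the paper's evaluation code.

# pip install evaluate rouge_score sacrebleu bert_score
import evaluate

predictions = ["Fold the paper in half, then crease along the diagonal."]
references = ["Fold the sheet in half and crease it along the diagonal."]

rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")        # computes BLEU with up to 4-grams by default
bertscore = evaluate.load("bertscore")

print(rouge.compute(predictions=predictions, references=references)["rougeL"])
print(bleu.compute(predictions=predictions, references=references)["bleu"])
print(bertscore.compute(predictions=predictions, references=references, lang="en")["f1"])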

The authors showed that VideoRAG’s ability to combine visual and textual elements dynamically leads to more contextually rich and precise responses. Compared to traditional RAG systems that rely solely on textual or static image data, VideoRAG excels in scenarios requiring detailed spatial and temporal understanding. Including auxiliary text generation for videos without subtitles further ensures consistent performance across diverse datasets. By enabling retrieval and generation based on a video corpus, the framework addresses the limitations of existing methods and sets a benchmark for future multimodal retrieval-augmented systems.

In a nutshell, VideoRAG represents a big step forward in retrieval-augmented generation systems because it leverages video content to enhance response quality. This model combines state-of-the-art retrieval techniques with the power of LVLMs to deliver context-rich, accurate answers. Methodologically, it successfully addresses the deficiencies of the current systems, thereby providing a solid framework for incorporating video data into knowledge generation pipelines. With its superior performance over various metrics and datasets, VideoRAG establishes itself as a novel approach for including videos in retrieval-augmented generation systems.

Check out the Paper. All credit for this research goes to the researchers of this project.


Implement RAG while meeting data residency requirements using AWS hybrid and edge services

With the general availability of Amazon Bedrock Agents, you can rapidly develop generative AI applications to run multi-step tasks across a myriad of enterprise systems and data sources. However, some geographies and regulated industries bound by data protection and privacy regulations have sought to combine generative AI services in the cloud with regulated data on premises. In this post, we show how to extend Amazon Bedrock Agents to hybrid and edge services such as AWS Outposts and AWS Local Zones to build distributed Retrieval Augmented Generation (RAG) applications with on-premises data for improved model outcomes. With Outposts, we also cover a reference pattern for a fully local RAG application that requires both the foundation model (FM) and data sources to reside on premises.
Solution overview
Organizations processing or storing sensitive information such as personally identifiable information (PII) have asked for the AWS Global Infrastructure to address these data locality requirements, including mechanisms to make sure that data is stored and processed in compliance with local laws and regulations. Through AWS hybrid and edge services such as Local Zones and Outposts, you can benefit from the scalability and flexibility of the AWS Cloud with the low latency and local processing capabilities of an on-premises (or localized) infrastructure. This hybrid approach allows organizations to run applications and process data closer to the source, reducing latency, improving responsiveness for time-sensitive workloads, and adhering to data regulations.
Although architecting for data residency with an Outposts rack and Local Zone has been broadly discussed, generative AI and FMs introduce an additional set of architectural considerations. As generative AI models become increasingly powerful and ubiquitous, customers have asked us how they might consider deploying models closer to the devices, sensors, and end users generating and consuming data. Moreover, interest in small language models (SLMs) that enable resource-constrained devices to perform complex functions—such as natural language processing and predictive automation—is growing. To learn more about opportunities for customers to use SLMs, see Opportunities for telecoms with small language models: Insights from AWS and Meta on our AWS Industries blog.
Beyond SLMs, the interest in generative AI at the edge has been driven by two primary factors:

Latency – Running these computationally intensive models on an edge infrastructure can significantly reduce latency and improve real-time responsiveness, which is critical for many time-sensitive applications like virtual assistants, augmented reality, and autonomous systems.
Privacy and security – Processing sensitive data at the edge, rather than sending it to the cloud, can enhance privacy and security by minimizing data exposure. This is particularly useful in healthcare, financial services, and legal sectors.

In this post, we cover two primary architectural patterns: fully local RAG and hybrid RAG.
Fully local RAG
For the deployment of a large language model (LLM) in a RAG use case on an Outposts rack, the LLM will be self-hosted on a G4dn instance and knowledge bases will be created on the Outpost rack, using either Amazon Elastic Block Store (Amazon EBS) or Amazon S3 on Outposts. The documents uploaded to the knowledge base on the rack might be private and sensitive documents, so they won’t be transferred to the AWS Region and will remain completely local on the Outpost rack. You can use a local vector database either hosted on Amazon Elastic Compute Cloud (Amazon EC2) or using Amazon Relational Database Service (Amazon RDS) for PostgreSQL on the Outpost rack with the pgvector extension to store embeddings. See the following figure for an example.
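As a minimal sketch of the pgvector option (the connection details, table layout, and three-dimensional toy embeddings are illustrative assumptions, not a prescribed schema):

import psycopg2

# Connect to the PostgreSQL instance (Amazon RDS or self-managed on EC2) on the Outpost.
conn = psycopg2.connect(host="<db-endpoint-on-outpost>", dbname="kb", user="postgres", password="<password>")
cur = conn.cursor()

# One-time setup: enable pgvector and create a table for document chunks.
cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute("CREATE TABLE IF NOT EXISTS chunks (id serial PRIMARY KEY, content text, embedding vector(3));")

# Store a chunk with its embedding (a real deployment would use the embedding model's full dimension).
cur.execute("INSERT INTO chunks (content, embedding) VALUES (%s, %s::vector);",
            ("Outposts extends AWS infrastructure to on-premises locations.", "[0.12, 0.05, 0.33]"))

# Retrieve the chunks nearest to a query embedding using cosine distance.
cur.execute("SELECT content FROM chunks ORDER BY embedding <=> %s::vector LIMIT 3;", ("[0.10, 0.07, 0.30]",))
print(cur.fetchall())
conn.commit()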

Hybrid RAG
Certain customers are required by data protection or privacy regulations to keep their data within specific state boundaries. To align with these requirements and still use such data for generative AI, customers with hybrid and edge environments need to host their FMs in both a Region and at the edge. This setup enables you to use data for generative purposes and remain compliant with security regulations. To orchestrate the behavior of such a distributed system, you need an orchestrator that can understand the nuances of your prompt and direct it to the right FM running in a compliant environment. Amazon Bedrock Agents makes this kind of distributed system possible in hybrid environments.
Amazon Bedrock Agents enables you to build and configure autonomous agents in your application. Agents orchestrate interactions between FMs, data sources, software applications, and user conversations. The orchestration includes the ability to invoke AWS Lambda functions to invoke other FMs, opening the ability to run self-managed FMs at the edge. With this mechanism, you can build distributed RAG applications for highly regulated industries subject to data residency requirements. In the hybrid deployment scenario, in response to a customer prompt, Amazon Bedrock can perform some actions in a specified Region and defer other actions to a self-hosted FM in a Local Zone. The following example illustrates the hybrid RAG high-level architecture.

In the following sections, we dive deep into both solutions and their implementation.
Fully local RAG: Solution deep dive
To start, you need to configure your virtual private cloud (VPC) with an edge subnet on the Outpost rack. To create an edge subnet on the Outpost, you need to find the Outpost Amazon Resource Name (ARN) on which you want to create the subnet, as well as the Availability Zone of the Outpost. After you create the internet gateway, route tables, and subnet associations, launch a series of EC2 instances on the Outpost rack to run your RAG application, including the following components.

Vector store – To support RAG, deploy an open-source vector database, such as ChromaDB or Faiss, on an EC2 instance (C5 family) on AWS Outposts. This vector database will store the vector representations of your documents, serving as a key component of your local knowledge base. Your selected embedding model will be used to convert text (both documents and queries) into these vector representations, enabling efficient storage and retrieval. The actual knowledge base consists of the original text documents and their corresponding vector representations stored in the vector database. To query this knowledge base and generate a response based on the retrieved results, you can use LangChain to chain the related documents retrieved by the vector search to the prompt fed to your LLM. This approach allows for retrieval and integration of relevant information into the LLM’s generation process, enhancing its responses with local, domain-specific knowledge.
Chatbot application – On a second EC2 instance (C5 family), deploy the following two components: a backend service responsible for ingesting prompts and proxying the requests back to the LLM running on the Outpost, and a simple React application that allows users to prompt a local generative AI chatbot with questions.
LLM or SLM – On a third EC2 instance (G4 family), deploy an LLM or SLM to conduct edge inferencing via popular frameworks such as Ollama. Additionally, you can use ModelBuilder in the SageMaker Python SDK to deploy the model to a local endpoint, such as an EC2 instance running at the edge.

Optionally, your underlying proprietary data sources can be stored on Amazon Simple Storage Service (Amazon S3) on Outposts or using Amazon S3-compatible solutions running on Amazon EC2 instances with EBS volumes.
The components intercommunicate through the traffic flow illustrated in the following figure.

The workflow consists of the following steps:

Using the frontend application, the user uploads documents that will serve as the knowledge base and are stored in Amazon EBS on the Outpost rack. These documents are chunked by the application and are sent to the embedding model.
The embedding model, which is hosted on the same EC2 instance as the local LLM API inference server, converts the text chunks into vector representations.
The generated embeddings are sent to the vector database and stored, completing the knowledge base creation.
Through the frontend application, the user prompts the chatbot interface with a question.
The prompt is forwarded to the local LLM API inference server instance, where the prompt is tokenized and is converted into a vector representation using the local embedding model.
The question’s vector representation is sent to the vector database where a similarity search is performed to get matching data sources from the knowledge base.
After the local LLM has the query and the relevant context from the knowledge base, it processes the prompt, generates a response, and sends it back to the chatbot application.
The chatbot application presents the LLM response to the user through its interface.
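A condensed sketch of steps 5 through 7 might look like the following, assuming ChromaDB as the vector store and an Ollama server fronting the local LLM; the endpoint address, model name, and collection name are placeholders, and Chroma's built-in embedder stands in here for the separately hosted embedding model.

import json, urllib.request
import chromadb

client = chromadb.PersistentClient(path="/data/kb")          # EBS-backed storage on the Outpost
collection = client.get_or_create_collection("documents")

def answer(question: str) -> str:
    # Steps 5-6: embed the question and run a similarity search against the knowledge base.
    hits = collection.query(query_texts=[question], n_results=3)
    context = "\n".join(hits["documents"][0])

    # Step 7: send the question plus retrieved context to the local LLM served by Ollama.
    payload = {"model": "llama3", "prompt": f"Context:\n{context}\n\nQuestion: {question}", "stream": False}
    req = urllib.request.Request("http://<local-llm-ip>:11434/api/generate",
                                 data=json.dumps(payload).encode("utf-8"),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]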

To learn more about the fully local RAG application or get hands-on with the sample application, see Module 2 of our public AWS Workshop: Hands-on with Generative AI on AWS Hybrid & Edge Services.
Hybrid RAG: Solution deep dive
To start, you need to configure a VPC with an edge subnet, either corresponding to an Outpost rack or Local Zone depending on the use case. After you create the internet gateway, route tables, and subnet associations, launch an EC2 instance on the Outpost rack (or Local Zone) to run your hybrid RAG application. On the EC2 instance itself, you can reuse the same components as the fully local RAG: a vector store, backend API server, embedding model and a local LLM.
In this architecture, we rely heavily on managed services such as Lambda and Amazon Bedrock because only select FMs and knowledge bases corresponding to the heavily regulated data, rather than the orchestrator itself, are required to live at the edge. To do so, we will extend the existing Amazon Bedrock Agents workflows to the edge using a sample FM-powered customer service bot.
In this example, the customer service bot acts as a shoe retailer assistant that supports shoe purchases by offering options in a human-like conversation. We also assume that the knowledge base surrounding the practice of shoemaking is proprietary and, therefore, resides at the edge. As a result, questions surrounding shoemaking will be addressed by the knowledge base and local FM running at the edge.
To make sure that the user prompt is effectively proxied to the right FM, we rely on Amazon Bedrock Agents action groups. An action group defines actions that the agent can perform, such as place_order or check_inventory. In our example, we could define an additional action within an existing action group called hybrid_rag or learn_shoemaking that specifically addresses prompts that can only be addressed by the AWS hybrid and edge locations.
As part of the agent’s InvokeAgent API, an agent interprets the prompt (such as “How is leather used for shoemaking?”) with an FM and generates a logic for the next step it should take, including a prediction for the most prudent action in an action group. In this example, we want the prompt, “Hello, I would like recommendations to purchase some shoes.” to be directed to the /check_inventory action group, whereas the prompt, “How is leather used for shoemaking?” could be directed to the /hybrid_rag action group.
The following diagram illustrates this orchestration, which is implemented by the orchestration phase of the Amazon Bedrock agent.

To create the additional edge-specific action group, the new OpenAPI schema must reflect the new action, hybrid_rag with a detailed description, structure, and parameters that define the action in the action group as an API operation specifically focused on a data domain only available in a specific edge location.
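For illustration, the new path entry might look roughly like the following, shown here as a Python dictionary for brevity; the exact path, parameter names, and descriptions are assumptions rather than the workshop's actual schema.

hybrid_rag_path = {
    "/hybrid_rag": {
        "post": {
            "operationId": "hybrid_rag",
            "description": (
                "Answers questions about shoemaking using the proprietary knowledge "
                "base and FM hosted at the AWS hybrid/edge location."
            ),
            "parameters": [{
                "name": "prompt",
                "in": "query",
                "required": True,
                "schema": {"type": "string"},
                "description": "The user's shoemaking-related question.",
            }],
            "responses": {
                "200": {
                    "description": "Answer generated by the edge-hosted FM.",
                    "content": {"application/json": {"schema": {"type": "string"}}},
                }
            },
        }
    }
}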
After you define an action group using the OpenAPI specification, you can define a Lambda function to program the business logic for an action group. This Lambda handler (see the following code) might include supporting functions (such as queryEdgeModel) for the individual business logic corresponding to each action group.

import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

cursor = None  # lazily initialized data-store handle, reused across warm invocations

# load_data, return_customer_info, place_shoe_order, return_shoe_inventory, and
# queryEdgeModel are supporting functions defined elsewhere in the handler module.

def lambda_handler(event, context):
    responses = []
    global cursor
    if cursor is None:
        cursor = load_data()
    id = ''
    api_path = event['apiPath']
    logger.info('API Path')
    logger.info(api_path)

    if api_path == '/customer/{CustomerName}':
        parameters = event['parameters']
        for parameter in parameters:
            if parameter["name"] == "CustomerName":
                cName = parameter["value"]
        body = return_customer_info(cName)
    elif api_path == '/place_order':
        parameters = event['parameters']
        for parameter in parameters:
            if parameter["name"] == "ShoeID":
                id = parameter["value"]
            if parameter["name"] == "CustomerID":
                cid = parameter["value"]
        body = place_shoe_order(id, cid)
    elif api_path == '/check_inventory':
        body = return_shoe_inventory()
    elif api_path == "/hybrid_rag":
        # Edge-specific action: proxy the prompt to the self-hosted FM.
        prompt = event['parameters'][0]["value"]
        body = queryEdgeModel(prompt)
        response_body = {"application/json": {"body": str(body)}}
        response_code = 200
    else:
        body = {"{} is not a valid api, try another one.".format(api_path)}

    response_body = {
        'application/json': {
            'body': json.dumps(body)
        }
    }
    # (Abridged: the handler would then wrap response_body in the standard Bedrock Agents
    # action group response format and return it.)

However, in the action group corresponding to the edge LLM (as seen in the code below), the business logic won’t include Region-based FM invocations, such as using Amazon Bedrock APIs. Instead, the customer-managed endpoint will be invoked, for example using the private IP address of the EC2 instance hosting the edge FM in a Local Zone or Outpost. This way, AWS native services such as Lambda and Amazon Bedrock can orchestrate complicated hybrid and edge RAG workflows.

def queryEdgeModel(prompt):
    import urllib.request, urllib.parse
    # Composing a payload for API
    payload = {'text': prompt}
    data = json.dumps(payload).encode('utf-8')
    headers = {'Content-type': 'application/json'}

    # Sending a POST request to the edge server
    req = urllib.request.Request(url="http://<your-private-ip-address>:5000/", data=data, headers=headers, method='POST')
    with urllib.request.urlopen(req) as response:
        response_text = response.read().decode('utf-8')
        return response_text

After the solution is fully deployed, you can visit the chat playground feature on the Amazon Bedrock Agents console and ask the question, "How are the rubber heels of shoes made?" Even though most of the prompts will be exclusively focused on retail customer service operations for ordering shoes, the native orchestration support in Amazon Bedrock Agents seamlessly directs the prompt to your edge FM for the shoemaking question.
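You can exercise the same flow programmatically with the InvokeAgent API; the following boto3 sketch uses placeholder agent and alias IDs, and the Region is illustrative.

import uuid
import boto3

client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = client.invoke_agent(
    agentId="<agent-id>",
    agentAliasId="<agent-alias-id>",
    sessionId=str(uuid.uuid4()),
    inputText="How are the rubber heels of shoes made?",
)

# The completion is returned as an event stream of chunks.
answer = "".join(
    event["chunk"]["bytes"].decode("utf-8")
    for event in response["completion"]
    if "chunk" in event
)
print(answer)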

To learn more about this hybrid RAG application or get hands-on with the cross-environment application, refer to Module 1 of our public AWS Workshop: Hands-on with Generative AI on AWS Hybrid & Edge Services.
Conclusion
In this post, we demonstrated how to extend Amazon Bedrock Agents to AWS hybrid and edge services, such as Local Zones or Outposts, to build distributed RAG applications in highly regulated industries subject to data residency requirements. Moreover, for 100% local deployments to align with the most stringent data residency requirements, we presented architectures converging the knowledge base, compute, and LLM within the Outposts hardware itself.
To get started with both architectures, visit AWS Workshops. To get started with our newly released workshop, see Hands-on with Generative AI on AWS Hybrid & Edge Services. Additionally, check out other AWS hybrid cloud solutions or reach out to your local AWS account team to learn how to get started with Local Zones or Outposts.

About the Authors
Robert Belson is a Developer Advocate in the AWS Worldwide Telecom Business Unit, specializing in AWS edge computing. He focuses on working with the developer community and large enterprise customers to solve their business challenges using automation, hybrid networking, and the edge cloud.
Aditya Lolla is a Sr. Hybrid Edge Specialist Solutions Architect at Amazon Web Services. He assists customers across the world with their migration and modernization journey from on-premises environments to the cloud and also builds hybrid architectures on AWS edge infrastructure. Aditya’s areas of interest include private networks, public and private cloud platforms, multi-access edge computing, hybrid and multicloud strategies, and computer vision applications.

Unlocking complex problem-solving with multi-agent collaboration on Amazon Bedrock

Large language model (LLM) based AI agents that have been specialized for specific tasks have demonstrated great problem-solving capabilities. By combining the reasoning power of multiple intelligent specialized agents, multi-agent collaboration has emerged as a powerful approach to tackle more intricate, multistep workflows.
The concept of multi-agent systems isn’t entirely new—it has its roots in distributed artificial intelligence research dating back to the 1980s. However, with recent advancements in LLMs, the capabilities of specialized agents have significantly expanded in areas such as reasoning, decision-making, understanding, and generation through language and other modalities. For instance, a single attraction research agent can perform web searches and list potential destinations based on user preferences. By creating a network of specialized agents, we can combine the strengths of multiple specialist agents to solve increasingly complex problems, such as creating and optimizing an entire travel plan by considering weather forecasts in nearby cities, traffic conditions, flight and hotel availability, restaurant reviews, attraction ratings, and more.
The research team at AWS has worked extensively on building and evaluating the multi-agent collaboration (MAC) framework so customers can orchestrate multiple AI agents on Amazon Bedrock Agents. In this post, we explore the concept of multi-agent collaboration (MAC) and its benefits, as well as the key components of our MAC framework. We also go deeper into our evaluation methodology and present insights from our studies. More technical details can be found in our technical report.
Benefits of multi-agent systems
Multi-agent collaboration offers several key advantages over single-agent approaches, primarily stemming from distributed problem-solving and specialization.
Distributed problem-solving refers to the ability to break down complex tasks into smaller subtasks that can be handled by specialized agents. By breaking down tasks, each agent can focus on a specific aspect of the problem, leading to more efficient and effective problem-solving. For example, a travel planning problem can be decomposed into subtasks such as checking weather forecasts, finding available hotels, and selecting the best routes.
The distributed aspect also contributes to the extensibility and robustness of the system. As the scope of a problem increases, we can simply add more agents to extend the capability of the system rather than try to optimize a monolithic agent packed with instructions and tools. On robustness, the system can be more resilient to failures because multiple agents can compensate for and even potentially correct errors produced by a single agent.
Specialization allows each agent to focus on a specific area within the problem domain. For example, in a network of agents working on software development, a coordinator agent can manage overall planning, a programming agent can generate correct code and test cases, and a code review agent can provide constructive feedback on the generated code. Each agent can be designed and customized to excel at a specific task.
For developers building agents, this means the workload of designing and implementing an agentic system can be organically distributed, leading to faster development cycles and better quality. Within enterprises, often development teams have distributed expertise that is ideal for developing specialist agents. Such specialist agents can be further reused by other teams across the entire organization.
In contrast, developing a single agent to perform all subtasks would require the agent to plan the problem-solving strategy at a high level while also keeping track of low-level details. For example, in the case of travel planning, the agent would need to maintain a high-level plan for checking weather forecasts, searching for hotel rooms and attractions, while simultaneously reasoning about the correct usage of a set of hotel-searching APIs. This single-agent approach can easily lead to confusion for LLMs because long-context reasoning becomes challenging when different types of information are mixed. Later in this post, we provide evaluation data points to illustrate the benefits of multi-agent collaboration.
A hierarchical multi-agent collaboration framework

The MAC framework for Amazon Bedrock Agents starts from a hierarchical approach and will expand to other mechanisms in the future. The framework consists of several key components designed to optimize performance and efficiency.

Here’s an explanation of each of the components of the multi-agent team:

Supervisor agent – This is an agent that coordinates a network of specialized agents. It’s responsible for organizing the overall workflow, breaking down tasks, and assigning subtasks to specialist agents. In our framework, a supervisor agent can assign and delegate tasks; however, the responsibility for solving the problem isn’t transferred.
Specialist agents – These are agents with specific expertise, designed to handle particular aspects of a given problem.
Inter-agent communication – Communication is the key component of multi-agent collaboration, allowing agents to exchange information and coordinate their actions. We use a standardized communication protocol that allows the supervisor agents to send and receive messages to and from the specialist agents.
Payload referencing – This mechanism enables efficient sharing of large content blocks (like code snippets or detailed travel itineraries) between agents, significantly reducing communication overhead. Instead of repeatedly transmitting large pieces of data, agents can reference previously shared payloads using unique identifiers. This feature is particularly valuable in domains such as software development.
Routing mode – For simpler tasks, this mode allows direct routing to specialist agents, bypassing the full orchestration process to improve efficiency for latency-sensitive applications.

The following figure shows inter-agent communication in an interactive application. The user first initiates a request to the supervisor agent. After coordinating with the subagents, the supervisor agent returns a response to the user.
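The following is a minimal, framework-agnostic sketch of these components: a supervisor delegating subtasks to specialists and sharing large content through payload references. It is purely illustrative and not the Amazon Bedrock Agents implementation.

import uuid

class SpecialistAgent:
    def __init__(self, name, skills, handler):
        self.name, self.skills, self.handler = name, skills, handler

    def handle(self, message, payloads):
        # Resolve payload references (large content blocks shared by identifier).
        resolved = {k: payloads[v] for k, v in message.get("refs", {}).items()}
        return self.handler(message["task"], resolved)

class SupervisorAgent:
    def __init__(self, specialists):
        self.specialists = specialists
        self.payloads = {}            # shared store for large content blocks

    def share_payload(self, content):
        pid = str(uuid.uuid4())
        self.payloads[pid] = content  # store once, pass around only the identifier
        return pid

    def delegate(self, user_request, subtasks):
        results = []
        for subtask in subtasks:      # task decomposition is assumed to happen upstream
            agent = next(a for a in self.specialists if subtask["skill"] in a.skills)
            results.append(agent.handle(subtask, self.payloads))
        # The supervisor keeps responsibility for composing the final answer.
        return f"Response to '{user_request}': " + "; ".join(map(str, results))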

Evaluation of multi-agent collaboration: A comprehensive approach
Evaluating the effectiveness and efficiency of multi-agent systems presents unique challenges due to several complexities:

Users can follow up and provide additional instructions to the supervisor agent.
For many problems, there are multiple ways to resolve them.
The success of a task often requires an agentic system to correctly perform multiple subtasks.

Conventional evaluation methods based on matching ground-truth actions or states often fall short in providing intuitive results and insights. To address this, we developed a comprehensive framework that calculates success rates based on automatic judgments of human-annotated assertions. We refer to this approach as “assertion-based benchmarking.” Here’s how it works:

Scenario creation – We create a diverse set of scenarios across different domains, each with specific goals that an agent must achieve to obtain success.
Assertions – For each scenario, we manually annotate a set of assertions that must be true for the task to be considered successful. These assertions cover both user-observable outcomes and system-level behaviors.
Agent and user simulation – We simulate the behavior of the agent in a sandbox environment, where the agent is asked to solve the problems described in the scenarios. Whenever user interaction is required, we use an independent LLM-based user simulator to provide feedback.
Automated evaluation – We use an LLM to automatically judge whether each assertion is true based on the conversation transcript.
Human evaluation – Instead of using LLMs, we ask humans to directly judge the success based on simulated trajectories.

Here is an example of a scenario and corresponding assertions for assertion-based benchmarking:

Goals:

User needs the weather conditions expected in Las Vegas for tomorrow, January 5, 2025.
User needs to search for a direct flight from Denver International Airport to McCarran International Airport, Las Vegas, departing tomorrow morning, January 5, 2025.

Assertions:

User is informed about the weather forecast for Las Vegas tomorrow, January 5, 2025.
User is informed about the available direct flight options for a trip from Denver International Airport to McCarran International Airport in Las Vegas for tomorrow, January 5, 2025.
get_tomorrow_weather_by_city is triggered to find information on the weather conditions expected in Las Vegas tomorrow, January 5, 2025.
search_flights is triggered to search for a direct flight from Denver International Airport to McCarran International Airport departing tomorrow, January 5, 2025.

For better user simulation, we also include additional contextual information as part of the scenario. A multi-agent collaboration trajectory is judged as successful only when all assertions are met.
Key metrics
Our evaluation framework focuses on evaluating a high-level success rate across multiple tasks to provide a holistic view of system performance:
Goal success rate (GSR) – This is our primary measure of success, indicating the percentage of scenarios where all assertions were evaluated as true. The overall GSR is aggregated into a single number for each problem domain.
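As a minimal illustration of this aggregation (the per-assertion judgments below are made up):

def goal_success_rate(scenarios):
    # A scenario succeeds only if every one of its assertions was judged true.
    successes = sum(1 for assertions in scenarios if all(assertions))
    return successes / len(scenarios)

# Example: three scenarios with per-assertion judgments from the LLM judge.
travel_planning = [
    [True, True, True, True],   # all assertions met -> success
    [True, False, True, True],  # one assertion failed -> failure
    [True, True, True],
]
print(f"Overall GSR: {goal_success_rate(travel_planning):.0%}")  # 67%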
Evaluation results
The following table shows the evaluation results of multi-agent collaboration on Amazon Bedrock Agents across three enterprise domains (travel planning, mortgage financing, and software development):

Evaluation method       Dataset                  Overall GSR
Automatic evaluation    Travel planning          87%
Automatic evaluation    Mortgage financing       90%
Automatic evaluation    Software development     77%
Human evaluation        Travel planning          93%
Human evaluation        Mortgage financing       97%
Human evaluation        Software development     73%

All experiments are conducted in a setting where the supervisor agents are driven by Anthropic’s Claude 3.5 Sonnet models.
Comparing to single-agent systems
We also conducted an apples-to-apples comparison with the single-agent approach under equivalent settings. The MAC approach achieved a 90% success rate across all three domains. In contrast, the single-agent approach scored 60%, 80%, and 53% in the travel planning, mortgage financing, and software development datasets, respectively, which are significantly lower than the multi-agent approach. Upon analysis, we found that when presented with many tools, a single agent tended to hallucinate tool calls and failed to reject some out-of-scope requests. These results highlight the effectiveness of our multi-agent system in handling complex, real-world tasks across diverse domains.
To understand the reliability of the automatic judgments, we conducted a human evaluation on the same scenarios to investigate the correlation between the model and human judgments and found high correlation on end-to-end GSR.
Comparison with other frameworks
To understand how our MAC framework stacks up against existing solutions, we conducted a comparative analysis with a widely adopted open source framework (OSF) under equivalent conditions, with Anthropic’s Claude 3.5 Sonnet driving the supervisor agent and Anthropic’s Claude 3.0 Sonnet driving the specialist agents. The results are summarized in the following figure:

These results demonstrate a significant performance advantage for our MAC framework across all the tested domains.
Best practices for building multi-agent systems
The design of multi-agent teams can significantly impact the quality and efficiency of problem-solving across tasks. Among the many lessons we learned, we found it crucial to carefully design team hierarchies and agent roles.
Design multi-agent hierarchies based on performance targets – It’s important to design the hierarchy of a multi-agent team by considering the priorities of different targets in a use case, such as success rate, latency, and robustness. For example, if the use case involves building a latency-sensitive customer-facing application, it might not be ideal to include too many layers of agents in the hierarchy because routing requests through multiple tertiary agents can add unnecessary delays. Similarly, to optimize latency, it’s better to avoid agents with overlapping functionalities, which can introduce inefficiencies and slow down decision-making.
Define agent roles clearly – Each agent must have a well-defined area of expertise. On Amazon Bedrock Agents, this can be achieved through collaborator instructions when configuring multi-agent collaboration. These instructions should be written in a clear and concise manner to minimize ambiguity. Moreover, there should be no confusion in the collaborator instructions across multiple agents because this can lead to inefficiencies and errors in communication.
The following is a clear, detailed instruction:

Trigger this agent for 1) searching for hotels in a given location, 2) checking availability of one or multiple hotels, 3) checking amenities of hotels, 4) asking for price quote of one or multiple hotels, and 5) answering questions of check-in/check-out time and cancellation policy of specific hotels.

The following instruction is too brief, making it unclear and ambiguous:

Trigger this agent for helping with accommodation.

The second, unclear, example can lead to confusion and lower collaboration efficiency when multiple specialist agents are involved. Because the instruction doesn’t explicitly define the capabilities of the hotel specialist agent, the supervisor agent may overcommunicate, even when the user query is out of scope.
Conclusion
Multi-agent systems represent a powerful paradigm for tackling complex real-world problems. By using the collective capabilities of multiple specialized agents, we demonstrate that these systems can achieve impressive results across a wide range of domains, outperforming single-agent approaches.
Multi-agent collaboration provides a framework for developers to combine the reasoning power of numerous AI agents powered by LLMs. As we continue to push the boundaries of what is possible, we can expect even more innovative and complex applications, such as networks of agents working together to create software or generate financial analysis reports. On the research front, it’s important to explore how different collaboration patterns, including cooperative and competitive interactions, will emerge and be applied to real-world scenarios.
Additional references

Amazon Bedrock Agents
Sample codes for using multi-agent collaboration on Amazon Bedrock Agents

About the author
Raphael Shu is a Senior Applied Scientist at Amazon Bedrock. He received his PhD from the University of Tokyo in 2020, earning a Dean’s Award. His research primarily focuses on Natural Language Generation, Conversational AI, and AI Agents, with publications in conferences such as ICLR, ACL, EMNLP, and AAAI. His work on the attention mechanism and latent variable models received an Outstanding Paper Award at ACL 2017 and the Best Paper Award for JNLP in 2018 and 2019. At AWS, he led the Dialog2API project, which enables large language models to interact with the external environment through dialogue. In 2023, he led a team aiming to develop the agentic capability for Amazon Titan. Since 2024, Raphael has worked on multi-agent collaboration with LLM-based agents.
Nilaksh Das is an Applied Scientist at AWS, where he works with the Bedrock Agents team to develop scalable, interactive and modular AI systems. His contributions at AWS have spanned multiple initiatives, including the development of foundational models for semantic speech understanding, integration of function calling capabilities for conversational LLMs and the implementation of communication protocols for multi-agent collaboration. Nilaksh completed his PhD in AI Security at Georgia Tech in 2022, where he was also conferred the Outstanding Dissertation Award.
Michelle Yuan is an Applied Scientist on Amazon Bedrock Agents. Her work focuses on scaling customer needs through Generative and Agentic AI services. She has industry experience, multiple first-author publications in top ML/NLP conferences, and strong foundation in mathematics and algorithms. She obtained her Ph.D. in Computer Science at University of Maryland before joining Amazon in 2022.
Monica Sunkara is a Senior Applied Scientist at AWS, where she works on Amazon Bedrock Agents. With over 10 years of industry experience, including 6.5 years at AWS, Monica has contributed to various AI and ML initiatives such as Alexa Speech Recognition, Amazon Transcribe, and Amazon Lex ASR. Her work spans speech recognition, natural language processing, and large language models. Recently, she worked on adding function calling capabilities to Amazon Titan text models. Monica holds a degree from Cornell University, where she conducted research on object localization under the supervision of Prof. Andrew Gordon Wilson before joining Amazon in 2018.
Dr. Yi Zhang is a Principal Applied Scientist at AWS, Bedrock. With 25 years of combined industrial and academic research experience, Yi’s research focuses on syntactic and semantic understanding of natural language in dialogues, and their application in the development of conversational and interactive systems with speech and text/chat. He has been technically leading the development of modeling solutions behind AWS services such as Bedrock Agents, AWS Lex, HealthScribe, etc.

Apple Researchers Introduce Instruction-Following Pruning (IFPruning): A Dynamic AI Approach to Efficient and Scalable LLM Optimization

Large language models (LLMs) have become crucial tools for applications in natural language processing, computational mathematics, and programming. Such models often require large-scale computational resources for training and efficient inference. To reduce this cost, many researchers have devised ways to optimize these models.

A key challenge in LLM optimization is that traditional pruning methods are static: they remove unnecessary parameters according to a prespecified mask and cannot adapt when an application calls for a particular skill, such as coding or solving mathematical problems. These methods lack flexibility, and performance is usually not maintained across several tasks while reducing computational cost.

Historically, techniques such as static structured pruning and mixture-of-experts (MoE) architectures have been used to counter the computational inefficiencies of LLMs. Structured pruning removes components like channels or attention heads from specific layers. Although these methods are hardware-friendly, they require full retraining to avoid a loss of model accuracy. MoE models, in turn, activate only parts of the model during inference but incur large overheads from frequent parameter reloading.

Apple AI and UC Santa Barbara researchers have introduced a new technique called Instruction-Following Pruning (IFPruning), which dynamically adapts LLMs to the needs of a particular task. IFPruning uses a sparsity predictor that generates input-dependent pruning masks, selecting only the most relevant parameters for a given task. Unlike traditional methods, this dynamic approach focuses on feed-forward neural network (FFN) layers, allowing the model to adapt to diverse tasks while reducing computational demands efficiently.
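The following PyTorch sketch conveys the general idea of an input-dependent FFN mask. It is a simplified illustration under assumed dimensions and a hard top-k mask, not the architecture or training procedure described in the paper.

import torch
import torch.nn as nn

class DynamicallyPrunedFFN(nn.Module):
    def __init__(self, d_model=512, d_hidden=2048, keep_ratio=0.33):
        super().__init__()
        self.up = nn.Linear(d_model, d_hidden)
        self.down = nn.Linear(d_hidden, d_model)
        self.keep = int(d_hidden * keep_ratio)
        # Sparsity predictor: maps a summary of the input (here, its mean over tokens)
        # to a relevance score per FFN hidden channel.
        self.predictor = nn.Sequential(nn.Linear(d_model, 256), nn.ReLU(), nn.Linear(256, d_hidden))

    def forward(self, x):                                   # x: (batch, seq, d_model)
        scores = self.predictor(x.mean(dim=1))              # (batch, d_hidden)
        topk = scores.topk(self.keep, dim=-1).indices
        mask = torch.zeros_like(scores).scatter_(1, topk, 1.0)  # input-dependent channel mask
        h = torch.relu(self.up(x)) * mask.unsqueeze(1)       # zero out pruned FFN channels
        return self.down(h)

# Example: the active channels change with the input prompt's representation.
ffn = DynamicallyPrunedFFN()
out = ffn(torch.randn(2, 16, 512))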

The researchers propose a two-stage training process for IFPruning. First, dense models are continually pre-trained on large-scale data while jointly optimizing the sparsity predictor and the LLM, producing a strong starting point for subsequent fine-tuning. In the second stage, training is performed on supervised fine-tuning datasets with highly varied task prompts and multiple examples. Masking remains dynamic because the sparsity predictor generates masks online, pruning unnecessary weights without affecting model performance. This eliminates the need for parameter reloading, a limitation observed in prior dynamic methods.

The performance of IFPruning was rigorously evaluated across multiple benchmarks. For instance, pruning a 9B-parameter model to 3B improved coding task accuracy by 8% compared to a dense 3B model, closely rivaling the unpruned 9B model. On mathematical datasets like GSM8K and MATH, the dynamic pruning approach yielded a 5% increase in accuracy. It also exhibited consistent gains of around 4-6 percentage points on instruction-following evaluations, including IFEval and AlpacaEval. Even on multi-task benchmarks like MMLU, IFPruning showed robust results, displaying versatility across domains.

These results also underscore the scalability of IFPruning: models of varying sizes, namely 6B, 9B, and 12B parameters, were tested, and all showed meaningful performance improvements after pruning. Scaling the source model from 6B to 12B dense parameters improved both efficiency and task-specific accuracy under the same conditions. Thanks to its dynamic sparsity mechanism, IFPruning also outperformed traditional structured pruning methods such as pruning plus distillation.

The introduction of IFPruning marks a significant advancement in optimizing LLMs, providing a method that dynamically balances efficiency and performance. The approach addresses the limitations of static pruning and MoE architectures, setting a new standard for resource-efficient language models. With its ability to adapt to varied inputs without sacrificing accuracy, IFPruning presents a promising solution for deploying LLMs on resource-constrained devices.

This research points toward further developments in model pruning, including optimizing other components such as attention heads and hidden layers. Although the presented methodology tackles many computational challenges, further research into server-side applications and multi-task pruning could broaden its applicability. As a dynamic and efficient framework, IFPruning opens up possibilities for more adaptive and accessible large-scale language models.

Check out the Paper. All credit for this research goes to the researchers of this project.


What is Artificial Intelligence (AI)?

Artificial Intelligence (AI) has made significant strides in various fields, including healthcare, finance, and education. However, its adoption is not without challenges. Concerns about data privacy, biases in algorithms, and potential job displacement have raised valid questions about its societal impact. Additionally, the “black box” nature of many AI systems makes it difficult to understand their decision-making processes, leading to issues of trust and accountability. Addressing these concerns is essential to ensure AI is used responsibly and equitably.

Understanding Artificial Intelligence

AI refers to the simulation of human intelligence in machines that are designed to think, learn, and adapt. It enables systems to perform tasks such as reasoning, problem-solving, and understanding natural language—tasks traditionally requiring human intervention.

AI can be broadly classified into three types:

Artificial Narrow Intelligence (ANI): Focused on specific tasks, such as recommendation systems, virtual assistants, and facial recognition.

Artificial General Intelligence (AGI): A theoretical concept of AI that matches human intelligence and versatility.

Artificial Superintelligence (ASI): A speculative future stage where AI surpasses human intelligence, raising both potential benefits and risks.

Source: https://metaphorltd.com/types-of-artificial-intelligence-ani-vs-agi-vs-asi/

AI encompasses several subfields, including:

Machine Learning (ML): Algorithms that learn from data to improve their performance over time.

Natural Language Processing (NLP): Techniques for processing and understanding human language.

Computer Vision: Systems that analyze and interpret visual data.

Robotics: Machines capable of performing complex tasks autonomously.

Source: https://www.ibm.com/think/topics/artificial-intelligence

Technical Details and Benefits

AI systems rely on computational models inspired by neural networks in the human brain. Techniques like supervised, unsupervised, and reinforcement learning allow machines to analyze large datasets, recognize patterns, and make decisions.

Key Benefits of AI:

Increased Efficiency: Automation of repetitive tasks frees up time for more strategic work.

Better Decision-Making: Data-driven insights enhance planning and outcomes.

Improved Customer Experience: Personalized services and chatbots provide more engaging interactions.

Advancements in Healthcare: AI aids in early diagnosis and customized treatments.

Economic Opportunities: AI drives innovation and fosters new industries.

Insights

AI’s transformative potential is evident in numerous applications:

Healthcare: Tools like IBM Watson support physicians in diagnosing diseases, with studies projecting AI could save the healthcare industry billions annually by improving efficiency and outcomes.

Finance: AI systems detect fraudulent transactions in real-time, as seen in Mastercard’s fraud detection platform.

Retail: Amazon’s recommendation engine, powered by AI, contributes significantly to its revenue by enhancing the shopping experience.

Ethical considerations are central to AI’s continued growth. Efforts by organizations like Google and IBM focus on transparency, fairness, and accountability. For example, Google’s AI principles emphasize the importance of mitigating biases in AI systems. Learn more about Google’s AI practices.

Conclusion

Artificial Intelligence represents a significant technological shift, influencing how we live and work. Its potential is vast, but so are its challenges. Addressing ethical concerns, improving transparency, and fostering collaboration between technologists and policymakers will be crucial for harnessing AI’s benefits responsibly.


InfiGUIAgent: A Novel Multimodal Generalist GUI Agent with Native Reasoning and Reflection

Developing Graphical User Interface (GUI) agents faces two key challenges that hinder their effectiveness. First, existing agents lack robust reasoning capabilities, relying primarily on single-step operations and failing to incorporate reflective learning mechanisms, which often leads to repeated errors in complex, multi-step tasks. Second, most current systems rely heavily on textual annotations of GUI data, such as accessibility trees. This causes information loss and computational inefficiency, and it also introduces inconsistencies across platforms and reduces flexibility in real deployment scenarios.

Modern methods for GUI automation use multimodal large language models together with vision encoders to understand and interact with GUI environments. Efforts such as ILuvUI, CogAgent, and Ferret-UI-anyres have advanced the field by enhancing GUI understanding, utilizing high-resolution vision encoders, and employing resolution-agnostic techniques. However, these methods exhibit notable drawbacks, including high computational costs, limited use of visual data in favor of textual representations, and inadequate reasoning capabilities. These constraints considerably limit their ability to perform real-time tasks and execute complex action sequences, and the lack of a robust mechanism for hierarchical and reflective reasoning severely restricts their ability to adapt dynamically and correct errors during operation.

Researchers from Zhejiang University, Dalian University of Technology, Reallm Labs, ByteDance Inc., and The Hong Kong Polytechnic University introduce InfiGUIAgent, a novel multimodal graphical user interface agent that addresses these limitations. The methodology builds native reasoning capabilities into the agent through a dual-phase supervised fine-tuning framework, making it adaptable and effective. The first phase focuses on developing base capabilities using diverse datasets that improve understanding of graphical user interfaces, grounding, and task adaptability. The datasets used, such as Screen2Words, GUIEnv, and RICO SCA, cover tasks such as semantic interpretation, user interaction modeling, and question-answering-based learning, equipping the agent with comprehensive functional knowledge.

In the next phase, advanced reasoning capabilities are incorporated through synthesized trajectory information, supporting hierarchical and expectation-reflection reasoning processes. The hierarchical reasoning framework uses a bifurcated architecture: a strategic component focused on task decomposition and a tactical component focused on accurate action selection. Expectation-reflection reasoning allows the agent to adjust and self-correct by comparing what was expected with what actually happened, improving performance in diverse and dynamic contexts. This two-stage framework enables the system to natively handle multi-step tasks without textual augmentations, allowing for higher robustness and computational efficiency.
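Conceptually, the interplay of the two layers can be pictured with the following toy loop; the llm_plan, llm_act, and gui interfaces are hypothetical placeholders rather than InfiGUIAgent's actual components.

def run_gui_task(goal, llm_plan, llm_act, gui, max_retries=3):
    """Toy sketch of hierarchical plus expectation-reflection reasoning (illustrative only)."""
    for subtask in llm_plan(goal):                              # strategic layer: task decomposition
        for _ in range(max_retries):
            action, expected = llm_act(subtask, gui.screenshot())  # tactical layer: pick an action
            observed = gui.execute(action)                      # e.g. tap, type, or scroll
            if expected in observed:                            # expectation met -> move on
                break
            # Reflection: feed the mismatch back so the next attempt can self-correct.
            subtask = f"{subtask}\nPrevious attempt expected '{expected}' but saw '{observed}'."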

InfiGUIAgent was implemented by fine-tuning Qwen2-VL-2B using ZeRO for efficient resource management across GPUs. A reference-augmented annotation format was used to standardize and improve the quality of the dataset so that GUI elements could be precisely spatially referenced. The curated datasets strengthen GUI comprehension, grounding, and QA capabilities for tasks such as semantic interpretation and interaction modeling. Synthesized trajectory data, annotated to resemble real-world GUI interactions, was then used for reasoning training to ensure full task coverage. The modular action space design lets the agent respond dynamically across multiple platforms, giving it greater flexibility and applicability.

InfiGUIAgent performed exceptionally well in benchmark tests, surpassing state-of-the-art models in both accuracy and adaptability. It achieved 76.3% accuracy on the ScreenSpot benchmark, showing stronger GUI grounding across mobile, desktop, and web platforms. In dynamic environments such as AndroidWorld, the agent achieved a success rate of 0.09, higher than similar models with even larger parameter counts. The results confirm that the system can proficiently carry out complex, multi-step tasks with precision and adaptability, underlining the effectiveness of its hierarchical and reflective reasoning models.

InfiGUIAgent represents a breakthrough in GUI automation, addressing the key limitations in reasoning and adaptability from which existing tools suffer. Its state-of-the-art performance is achieved without any textual augmentations, by integrating hierarchical task decomposition and reflective learning into a multimodal framework. The benchmark results open the door to next-generation GUI agents that can be seamlessly embedded in real applications for efficient and robust task execution.

Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.


How BQA streamlines education quality reporting using Amazon Bedrock

Given the value of data today, organizations across various industries are working with vast amounts of data across multiple formats. Manually reviewing and processing this information can be a challenging and time-consuming task, with a margin for potential errors. This is where intelligent document processing (IDP), coupled with the power of generative AI, emerges as a game-changing solution.
Enhancing the capabilities of IDP is the integration of generative AI, which harnesses large language models (LLMs) and generative techniques to understand and generate human-like text. This integration allows organizations to not only extract data from documents, but to also interpret, summarize, and generate insights from the extracted information, enabling more intelligent and automated document processing workflows.
The Education and Training Quality Authority (BQA) plays a critical role in improving the quality of education and training services in the Kingdom of Bahrain. BQA reviews the performance of all education and training institutions, including schools, universities, and vocational institutes, thereby promoting the professional advancement of the nation’s human capital.
BQA oversees a comprehensive quality assurance process, which includes setting performance standards and conducting objective reviews of education and training institutions. The process involves the collection and analysis of extensive documentation, including self-evaluation reports (SERs), supporting evidence, and various media formats from the institutions being reviewed.
The collaboration between BQA and AWS was facilitated through the Cloud Innovation Center (CIC) program, a joint initiative by AWS, Tamkeen, and leading universities in Bahrain, including Bahrain Polytechnic and University of Bahrain. The CIC program aims to foster innovation within the public sector by providing a collaborative environment where government entities can work closely with AWS consultants and university students to develop cutting-edge solutions using the latest cloud technologies.
As part of the CIC program, BQA has built a proof of concept solution, harnessing the power of AWS services and generative AI capabilities. The primary purpose of this proof of concept was to test and validate the proposed technologies, demonstrating their viability and potential for streamlining BQA’s reporting and data management processes.
In this post, we explore how BQA used the power of Amazon Bedrock, Amazon SageMaker JumpStart, and other AWS services to streamline the overall reporting workflow.
The challenge: Streamlining self-assessment reporting
BQA has traditionally provided education and training institutions with a template for the SER as part of the review process. Institutions are required to submit a review portfolio containing the completed SER and supporting material as evidence, which sometimes did not adhere fully to the established reporting standards.
The existing process had some challenges:

Inaccurate or incomplete submissions – Institutions might provide incomplete or inaccurate information in the submitted reports and supporting evidence, leading to gaps in the data required for a comprehensive review.
Missing or insufficient supporting evidence – The supporting material provided as evidence by institutions frequently did not substantiate the claims made in their reports, which challenged the evaluation process.
Time-consuming and resource-intensive – The process required dedicating significant time and resources to review the submissions manually and follow up with institutions to request additional information if needed to rectify the submissions, resulting in slowing down the overall review process.

These challenges highlighted the need for a more streamlined and efficient approach to the submission and review process.
Solution overview
The proposed solution uses Amazon Bedrock and the Amazon Titan Text Express model to enable IDP functionalities. The architecture seamlessly integrates multiple AWS services with Amazon Bedrock, allowing for efficient data extraction and comparison.
Amazon Bedrock is a fully managed service that provides access to high-performing foundation models (FMs) from leading AI startups and Amazon through a unified API. It offers a wide range of FMs, allowing you to choose the model that best suits your specific use case.
The following diagram illustrates the solution architecture.

The solution consists of the following steps:

Relevant documents are uploaded and stored in an Amazon Simple Storage Service (Amazon S3) bucket.
An event notification is sent to an Amazon Simple Queue Service (Amazon SQS) queue so that each file is lined up for further processing. Amazon SQS serves as a buffer, enabling the different components to send and receive messages in a reliable manner without being directly coupled, enhancing scalability and fault tolerance of the system.
The text extraction AWS Lambda function is invoked by the SQS queue, processing each queued file and using Amazon Textract to extract text from the documents (a minimal sketch of this step is shown after the list).
The extracted text data is placed into another SQS queue for the next processing step.
The text summarization Lambda function is invoked by this new queue containing the extracted text. This function sends a request to SageMaker JumpStart, where a Meta Llama text generation model is deployed to summarize the content based on the provided prompt.
In parallel, the InvokeSageMaker Lambda function is invoked to perform comparisons and assessments. It compares the extracted text against the BQA standards that the model was trained on, evaluating the text for compliance, quality, and other relevant metrics.
The summarized data and assessment results are stored in an Amazon DynamoDB table.
Upon request, the InvokeBedrock Lambda function invokes Amazon Bedrock to generate generative AI summaries and comments. The function constructs a detailed prompt designed to guide the Amazon Titan Express model in evaluating the university’s submission.
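
To make the extraction step concrete, the following Python sketch shows what the text extraction Lambda function might look like. The queue URL environment variable, message shapes, and error handling are assumptions for illustration; the actual implementation in the BQA solution may differ.

import json
import os
import boto3

textract = boto3.client("textract")
sqs = boto3.client("sqs")

# Hypothetical environment variable pointing at the next queue in the pipeline.
NEXT_QUEUE_URL = os.environ["EXTRACTED_TEXT_QUEUE_URL"]

def handler(event, context):
    """Triggered by the SQS queue that receives the S3 event notifications."""
    for record in event["Records"]:
        s3_event = json.loads(record["body"])
        for s3_record in s3_event.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            key = s3_record["s3"]["object"]["key"]

            # Extract text from the uploaded document with Amazon Textract.
            # Multi-page PDFs would require the asynchronous Textract APIs instead.
            result = textract.detect_document_text(
                Document={"S3Object": {"Bucket": bucket, "Name": key}}
            )
            text = "\n".join(
                block["Text"]
                for block in result["Blocks"]
                if block["BlockType"] == "LINE"
            )

            # Hand the extracted text to the next queue for summarization.
            sqs.send_message(
                QueueUrl=NEXT_QUEUE_URL,
                MessageBody=json.dumps({"bucket": bucket, "key": key, "text": text}),
            )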

Prompt engineering using Amazon Bedrock
To take advantage of the power of Amazon Bedrock and make sure the generated output adhered to the desired structure and formatting requirements, a carefully crafted prompt was developed according to the following guidelines:

Evidence submission – Present the evidence submitted by the institution under the relevant indicator, providing the model with the necessary context for evaluation
Evaluation criteria – Outline the specific criteria the evidence should be assessed against
Evaluation instructions – Instruct the model as follows:

Indicate N/A if the evidence is irrelevant to the indicator
Evaluate the university’s self-assessment based on the criteria
Assign a score from 1–5 for each comment, citing evidence directly from the content

Response format – Specify the response as bullet points, focusing on relevant analysis and evidence, with a word limit of 100 words

To use this prompt template, you can create a custom Lambda function within your project. The function should handle the retrieval of the required data, such as the indicator name, the university’s submitted evidence, and the rubric criteria. Within the function, include the prompt template and dynamically populate the placeholders (${indicatorName}, ${JSON.stringify(allContent)}, and ${JSON.stringify(c.comment)}) with the retrieved data.
The Amazon Titan Text Express model will then generate the evaluation response based on the provided prompt instructions, adhering to the specified format and guidelines. You can process and analyze the model’s response within your function, extracting the compliance score, relevant analysis, and evidence.
The following is an example prompt template:

for (const c of comments) {
  const prompt = `
Below is the evidence submitted by the university under the indicator "${indicatorName}":
${JSON.stringify(allContent)}

Analyze and evaluate the university's evidence based on the provided rubric criteria:
${JSON.stringify(c.comment)}

- If the evidence does not relate to the indicator, indicate that it is not applicable (N/A) without any additional commentary.

Choose one of the compliance scores below based on the evidence submitted:
1. Non-compliant: The comment does not meet the criteria or standards.
2. Compliant with recommendation: The comment meets the criteria but includes a suggestion or recommendation for improvement.
3. Compliant: The comment meets the criteria or standards.

AT THE END OF THE RESPONSE THERE SHOULD BE A SCORE: [SCORE: COMPLIANT OR NON-COMPLIANT OR COMPLIANT WITH RECOMMENDATION]
Write your response in concise bullet points, focusing strictly on relevant analysis and evidence.
**LIMIT YOUR RESPONSE TO 100 WORDS ONLY.**
`;

  logger.info(`Prompt for comment ${c.commentId}: ${prompt}`);

  // Request body for the Amazon Titan Text Express model.
  const body = JSON.stringify({
    inputText: prompt,
    textGenerationConfig: {
      maxTokenCount: 4096,
      stopSequences: [],
      temperature: 0,
      topP: 0.1,
    },
  });

  // The body is then passed to Amazon Bedrock (for example, through the
  // InvokeModel API) and the generated evaluation is parsed downstream;
  // comments, indicatorName, allContent, and logger are defined elsewhere
  // in the Lambda function.
}

The following screenshot shows an example of the Amazon Bedrock generated response.

Results
The implementation of Amazon Bedrock delivered transformative benefits to institutions. By automating and streamlining the collection and analysis of extensive documentation, including SERs, supporting evidence, and various media formats, institutions can achieve greater accuracy and consistency in their reporting processes and better readiness for review. This not only reduces the time and cost associated with manual data processing, but also improves compliance with quality expectations, thereby enhancing the credibility and quality of their institutions.
For BQA, the implementation helped achieve one of its strategic objectives: streamlining its reporting processes and delivering significant improvements across a range of critical metrics, substantially enhancing the overall efficiency and effectiveness of its operations.
Key success metrics anticipated include:

Faster turnaround times for generating 70% accurate and standards-compliant self-evaluation reports, leading to improved overall efficiency.
Reduced risk of errors or non-compliance in the reporting process, enforcing adherence to established guidelines.
Ability to summarize lengthy submissions into concise bullet points, allowing BQA reviewers to quickly analyze and comprehend the most pertinent information, reducing evidence analysis time by 30%.
More accurate compliance feedback functionality, empowering reviewers to effectively evaluate submissions against established standards and guidelines, while achieving 30% reduced operational costs through process optimizations.
Enhanced transparency and communication through seamless interactions, enabling users to request additional documents or clarifications with ease.
Real-time feedback, allowing institutions to make necessary adjustments promptly. This is particularly useful to maintain submission accuracy and completeness.
Enhanced decision-making by providing insights on the data. This helps universities identify areas for improvement and make data-driven decisions to enhance their processes and operations.

The following screenshot shows an example of generating new evaluations using Amazon Bedrock.

Conclusion
This post outlined the implementation of Amazon Bedrock at the Education and Training Quality Authority (BQA), demonstrating the transformative potential of generative AI in revolutionizing the quality assurance processes in the education and training sectors. For those interested in exploring the technical details further, the full code for this implementation is available in the following GitHub repo. If you are interested in conducting a similar proof of concept with us, submit your challenge idea to the Bahrain Polytechnic or University of Bahrain CIC website.

About the Author
Maram AlSaegh is a Cloud Infrastructure Architect at Amazon Web Services (AWS), where she supports AWS customers in accelerating their journey to cloud. Currently, she is focused on developing innovative solutions that leverage generative AI and machine learning (ML) for public sector entities.

Boosting team innovation, productivity, and knowledge sharing with Ama …

Amazon Q Business can increase productivity across diverse teams, including developers, architects, site reliability engineers (SREs), and product managers. Amazon Q Business as a web experience makes AWS best practices readily accessible, providing cloud-centered recommendations quickly and making it straightforward to access AWS service functions, limits, and implementations. These elements are brought together in a web integration that serves various job roles and personas exactly when they need it.
As enterprises continue to grow their applications, environments, and infrastructure, it has become difficult to keep pace with technology trends, best practices, and programming standards. Enterprises provide their developers, engineers, and architects with a range of knowledge bases and documents, such as usage guides, wikis, and tools. But these resources tend to become siloed over time and inaccessible across teams, resulting in reduced knowledge, duplication of work, and reduced productivity.
MuleSoft from Salesforce provides the Anypoint Platform, which gives IT the tools to automate everything. This includes integrating data and systems, automating workflows and processes, and creating digital experiences, all on a single, user-friendly platform.
This post shows how MuleSoft introduced a generative AI-powered assistant using Amazon Q Business to enhance their internal Cloud Central dashboard. This individualized portal shows assets owned, costs and usage, and well-architected recommendations to over 100 engineers. For more on MuleSoft’s journey to cloud computing, refer to Why a Cloud Operating Model?
Developers, engineers, FinOps, and architects can get the right answer at the right time when they’re ready to troubleshoot, address an issue, have an inquiry, or want to understand AWS best practices and cloud-centered deployments.
This post covers how to integrate Amazon Q Business into your enterprise setup.
Solution overview
The Amazon Q Business web experience provides seamless access to information, step-by-step instructions, troubleshooting, and prescriptive guidance so teams can deploy well-architected applications or cloud-centered infrastructure. Team members can chat directly or upload documents and receive summarization, analysis, or answers to a calculation. Amazon Q Business uses supported connectors such as Confluence, Amazon Relational Database Service (Amazon RDS), and web crawlers. The following diagram shows the reference architecture for various personas, including developers, support engineers, DevOps, and FinOps to connect with internal databases and the web using Amazon Q Business.

In this reference architecture, you can see how various user personas, spanning across teams and business units, use the Amazon Q Business web experience as an access point for information, step-by-step instructions, troubleshooting, or prescriptive guidance for deploying a well-architected application or cloud-centered infrastructure. The web experience allows team members to chat directly with an AI assistant or upload documents and receive summarization, analysis, or answers to a calculation.
Use cases for Amazon Q Business
Small, medium, and large enterprises, depending on their mode of operation, type of business, and level of investment in IT, will have varying approaches and policies on providing access to information. Amazon Q Business is part of the AWS suite of generative AI services and provides a web-based utility to set up, manage, and interact with Amazon Q. It can answer questions, provide summaries, generate content, and complete tasks using the data and expertise found in your enterprise systems. You can connect internal and external datasets without compromising security to seamlessly incorporate your specific standard operating procedures, guidelines, playbooks, and reference links. With Amazon Q, MuleSoft’s engineering teams were able to address their AWS specific inquiries (such as support ticket escalation, operational guidance, and AWS Well-Architected best practices) at scale.
The Amazon Q Business web experience allows business users across various job titles and functions to interact with Amazon Q through the web browser. With the web experience, teams can access the same information and receive similar recommendations based on their prompt or inquiry, level of experience, and knowledge, ranging from beginner to advanced.
The following demos are examples of what the Amazon Q Business web experience looks like. Amazon Q Business securely connects to over 40 commonly used business tools, such as wikis, intranets, Atlassian, Gmail, Microsoft Exchange, Salesforce, ServiceNow, Slack, and Amazon Simple Storage Service (Amazon S3). Point Amazon Q Business at your enterprise data, and it will search your data, summarize it logically, analyze trends, and engage in dialogue with end users about the data. This helps users access their data no matter where it resides in their organization.
Amazon Q Business supports prompting for prescriptive guidance. For example, when asked about optimizing Amazon Elastic Block Store (Amazon EBS) volumes, it provided detailed migration steps from gp2 to gp3. This is a well-known use case raised by several MuleSoft teams.
Through the web experience, you can effortlessly perform document uploads and prompts for summary, calculation, or recommendations based on your document. You have the flexibility to upload .pdf, .xls, .xlsx, or .csv files directly into the chat interface. You can also assume a persona such as FinOps or DevOps and get personalized recommendations or responses.

MuleSoft engineers used the Amazon Q Business web summarization feature to better understand Split Cost Allocation Data (SCAD) for Amazon Elastic Kubernetes Service (Amazon EKS). They uploaded the SCAD PDF documents to Amazon Q and got straightforward summaries. This helped them understand their customer’s use of MuleSoft Anypoint platform running on Amazon EKS.

Amazon Q helped analyze IPv4 costs by processing an uploaded Excel file. As the video shows, it calculated expenses for elastic IPs and outbound data transfers, supporting a proposed network estimate.

Amazon Q Business also demonstrated its ability to provide tailored advice by responding to a specific user scenario. As the video shows, a user took on the role of a FinOps professional and asked Amazon Q to recommend AWS tools for cost optimization. Amazon Q then offered personalized suggestions based on this FinOps persona perspective.

Prerequisites
To get started with your Amazon Q Business web experience, you need the following prerequisites:

An AWS account that will contain your AWS resources
AWS IAM Identity Center configured for an Amazon Q Business application
An Amazon Q Business subscription (Amazon Q Business Lite or Amazon Q Business Pro) and index (Starter or Enterprise) configured for an Amazon Q Business application

Create an Amazon Q Business web experience
Complete the following steps to create your web experience:

Create a sample Amazon Q Business application.
Configure the Amazon Q Business application.
Configure the Amazon Q Business data source connectors.
Enhance the Amazon Q Business application.
Customize the Amazon Q web experience.

The web experience can be used by a variety of business users or personas to yield accurate and repeatable recommendations for level 100, 200, and 300 inquiries. Amazon Q supports a variety of data sources and data connectors to personalize your user experience. You can also further enrich your dataset with knowledge bases within Amazon Q. With Amazon Q Business set up with your own datasets and sources, teams and business units within your enterprise can index from the same information on common topics such as cost optimization, modernization, and operational excellence while maintaining their own unique area of expertise, responsibility, and job function.
Clean up
After trying the Amazon Q Business web experience, remember to remove any resources you created to avoid unnecessary charges. Complete the following steps:

Delete the web experience:

On the Amazon Q Business console, navigate to the Web experiences section within your application.
Select the web experience you want to remove.
On the Actions menu, choose Delete.
Confirm the deletion by following the prompts.

If you granted specific users access to the web experience, revoke their permissions. This might involve updating AWS Identity and Access Management (IAM) policies or removing users from specific groups in IAM Identity Center.
If you set up any custom configurations for the web experience, such as specific data source filters or custom prompts, make sure to remove these.
If you integrated the web experience with other tools or services, remove those integrations.
Check for and delete any Amazon CloudWatch alarms or logs specifically set up for monitoring this web experience.

After deletion, review your AWS billing to make sure that charges related to the web experience have stopped.
Deleting a web experience is irreversible. Make sure you have any necessary backups or exports of important data before proceeding with the deletion. Also, keep in mind that deleting a web experience doesn’t automatically delete the entire Amazon Q Business application or its associated data sources. If you want to remove everything, follow the Amazon Q Business application clean-up procedure for the entire application.
Conclusion
Amazon Q Business web experience is your gateway to a powerful generative AI assistant. Want to take it further? Integrate Amazon Q with Slack for an even more interactive experience.
Every organization has unique needs when it comes to AI. That’s where Amazon Q shines. It adapts to your business needs, user applications, and end-user personas. The best part? You don’t need to do the heavy lifting. No complex infrastructure setup. No need for teams of data scientists. Amazon Q connects to your data and makes sense of it with just a click. It’s AI power made simple, giving you the intelligence you need without the hassle.
To learn more about the power of a generative AI assistant in your workplace, see Amazon Q Business.

About the Authors
Rueben Jimenez is an AWS Sr Solutions Architect who designs and implements complex data analytics, machine learning, generative AI, and cloud infrastructure solutions.
Sona Rajamani is a Sr. Manager Solutions Architect at AWS.  She lives in the San Francisco Bay Area and helps customers architect and optimize applications on AWS. In her spare time, she enjoys traveling and hiking.
Erick Joaquin is a Sr Customer Solutions Manager for Strategic Accounts at AWS. As a member of the account team, he is focused on evolving his customers’ maturity in the cloud to achieve operational efficiency at scale.

R3GAN: A Simplified and Stable Baseline for Generative Adversarial Net …

GANs are often criticized for being difficult to train, with their architectures relying heavily on empirical tricks. Despite their ability to generate high-quality images in a single forward pass, the original minimax objective is challenging to optimize, leading to instability and risks of mode collapse. While alternative objectives have been introduced, issues with fragile losses persist, hindering progress. Popular GAN models like StyleGAN incorporate tricks such as gradient-penalized losses and minibatch standard deviation to address instability and diversity but lack theoretical backing. Compared to diffusion models, GANs use outdated backbones, limiting their scalability and effectiveness.

Researchers from Brown University and Cornell University challenge the notion that GANs require numerous tricks for effective training. They introduce a modern GAN baseline by proposing a regularized relativistic GAN loss, which addresses mode dropping and convergence issues without relying on ad-hoc solutions. This loss, augmented with zero-centered gradient penalties, ensures training stability and local convergence guarantees. By simplifying and modernizing StyleGAN2 and incorporating advanced elements such as ResNet design, grouped convolutions, and updated initialization, they develop a minimalist GAN, R3GAN, which surpasses StyleGAN2 and rivals state-of-the-art GANs and diffusion models across multiple datasets, achieving better performance with fewer architectural complexities.

In designing GAN objectives, balancing stability and diversity is critical. Traditional GANs often face challenges like mode collapse due to their reliance on a single decision boundary to separate real and fake data. Relativistic pairing GANs (RpGANs) address this by evaluating fake samples relative to real ones, promoting better mode coverage. However, RpGANs alone struggle with convergence, particularly with sharp data distributions. Adding zero-centered gradient penalties, R1 (on real data) and R2 (on fake data), ensures stable and convergent training. Experiments on StackedMNIST show that RpGAN with R1 and R2 achieves full mode coverage, outperforming conventional GANs and mitigating gradient explosions.
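
As a concrete illustration of the objective described above, the following PyTorch sketch implements a relativistic pairing discriminator loss with zero-centered R1 and R2 gradient penalties; the coefficients and details are illustrative, not the paper's reference implementation:

import torch
import torch.nn.functional as F

def rpgan_d_loss(D, real, fake, gamma=10.0):
    """Relativistic pairing GAN discriminator loss with zero-centered
    R1 (on real data) and R2 (on fake data) gradient penalties.
    A minimal sketch; gamma is an assumed penalty weight."""
    real = real.detach().requires_grad_(True)
    fake = fake.detach().requires_grad_(True)

    d_real, d_fake = D(real), D(fake)

    # Relativistic pairing loss: score each real sample relative to a fake one.
    loss = F.softplus(-(d_real - d_fake)).mean()

    # Zero-centered gradient penalties keep training stable and convergent.
    grad_real = torch.autograd.grad(d_real.sum(), real, create_graph=True)[0]
    grad_fake = torch.autograd.grad(d_fake.sum(), fake, create_graph=True)[0]
    r1 = grad_real.flatten(1).pow(2).sum(dim=1).mean()
    r2 = grad_fake.flatten(1).pow(2).sum(dim=1).mean()

    return loss + 0.5 * gamma * (r1 + r2)

def rpgan_g_loss(D, real, fake):
    """Generator counterpart of the relativistic pairing loss."""
    return F.softplus(-(D(fake) - D(real.detach()))).mean()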

R3GAN builds a simplified yet advanced baseline for GANs by addressing optimization challenges using RpGAN with R1 and R2 losses. Starting with StyleGAN2, the model progressively strips non-essential components, such as style-based generation techniques and regularization tricks, to create a minimalist backbone. Modernization steps include adopting ResNet-inspired architectures, bilinear resampling, and leaky ReLU activations while avoiding normalization layers and momentum-based optimizers. Further enhancements involve grouped convolutions, inverted bottlenecks, and fix-up initialization to stabilize training without normalization. These updates result in a more efficient and powerful architecture, achieving competitive FID scores with roughly 25M trainable parameters for both the generator and discriminator.

The experiments showcase Config E’s advancements in GAN performance. On FFHQ-256, Config E achieves an FID of 7.05, outperforming StyleGAN2 and other configurations through architectural improvements like inverted bottlenecks and grouped convolutions. On StackedMNIST, Config E achieves perfect mode recovery with the lowest KL divergence (0.029). On CIFAR-10, FFHQ-64, and ImageNet datasets, Config E consistently surpasses prior GANs and rivals diffusion models, achieving lower FID with fewer parameters and faster inference (single evaluation). Despite slightly lower recall than some diffusion models, Config E demonstrates superior sample diversity compared to other GANs, highlighting its efficiency and effectiveness without relying on pre-trained features.

In conclusion, the study presents R3GAN, a simplified and stable GAN model for image generation that uses a regularized relativistic loss (RpGAN+R1+R2) with proven convergence properties. By focusing on essential components, R3GAN eliminates many ad-hoc techniques commonly used in GANs, enabling a streamlined architecture that achieves competitive FID scores on datasets like Stacked-MNIST, FFHQ, CIFAR-10, and ImageNet. While not optimized for downstream tasks like image editing or controllable synthesis, it provides a robust baseline for future research. Limitations include the lack of scalability evaluation on higher-resolution or text-to-image tasks and ethical concerns regarding the potential misuse of generative models.

Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.

The post R3GAN: A Simplified and Stable Baseline for Generative Adversarial Networks GANs appeared first on MarkTechPost.

This AI Paper Introduces Toto: Autoregressive Video Models for Unified …

Autoregressive pre-training has proved to be revolutionary in machine learning, especially concerning sequential data processing. Predictive modeling of the following sequence elements has been highly effective in natural language processing and, increasingly, has been explored within computer vision domains. Video modeling remains comparatively underexplored, offering opportunities to extend into action recognition, object tracking, and robotics applications. These developments are due to growing datasets and innovation in transformer architectures that treat visual inputs as structured tokens suitable for autoregressive training.

Modeling videos has unique challenges due to their temporal dynamics and redundancy. Unlike text with a clear sequence, video frames usually contain redundant information, making it difficult to tokenize and learn proper representations. Proper video modeling should be able to overcome this redundancy while capturing spatiotemporal relationships in frames. Most frameworks have focused on image-based representations, leaving the optimization of video architectures open. The task requires new methods to balance efficiency and performance, particularly when video forecasting and robotic manipulation are at play.

Visual representation learning via convolutional networks and masked autoencoders has been effective for image tasks. Such approaches typically fall short for video applications because they cannot fully capture temporal dependencies. Tokenization methods such as dVAE and VQGAN convert visual information into discrete tokens. These have proven effective, but scaling them becomes challenging with mixed datasets of images and videos, and patch-based tokenization does not generalize efficiently across diverse video tasks.

A research team from Meta FAIR and UC Berkeley has introduced the Toto family of autoregressive video models. Their approach addresses the limitations of traditional methods by treating videos as sequences of discrete visual tokens and applying causal transformer architectures to predict subsequent tokens. The researchers developed models that can combine image and video training by pre-training on a unified dataset containing more than one trillion tokens from images and videos. This unified approach lets the team take advantage of the strengths of autoregressive pretraining in both domains.

The Toto models use dVAE tokenization with an 8k-token vocabulary to process images and video frames. Each frame is resized and tokenized separately, resulting in sequences of 256 tokens. These tokens are then processed by a causal transformer that uses RMSNorm and RoPE embeddings to improve model performance. Training was performed on the ImageNet and HowTo100M datasets, tokenizing at a resolution of 128×128 pixels. The researchers also optimized the models for downstream tasks by replacing average pooling with attention pooling to obtain better representations.
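
The following PyTorch sketch illustrates the underlying training objective: each frame contributes 256 discrete tokens, frames are concatenated into one long sequence, and a causal transformer is trained to predict the next token. It is a simplified stand-in with hypothetical dimensions (a vanilla transformer encoder with a causal mask rather than the RMSNorm/RoPE architecture described above):

import torch
import torch.nn as nn

VOCAB = 8192            # dVAE codebook size (8k tokens)
TOKENS_PER_FRAME = 256  # each resized frame is tokenized into 256 tokens

class TinyCausalVideoLM(nn.Module):
    """Minimal next-token predictor over flattened frame tokens (illustrative only)."""
    def __init__(self, d_model=512, n_heads=8, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, VOCAB)

    def forward(self, tokens):                 # tokens: (B, n_frames, 256) int64
        x = tokens.flatten(1)                  # (B, n_frames * 256)
        L = x.size(1)
        # Causal mask: True positions may not be attended to (no peeking ahead).
        mask = torch.triu(torch.ones(L, L, dtype=torch.bool, device=x.device), diagonal=1)
        h = self.blocks(self.embed(x), mask=mask)
        return self.head(h)                    # (B, L, VOCAB) logits

def next_token_loss(model, tokens):
    logits = model(tokens)[:, :-1]             # predict token t+1 from the prefix
    targets = tokens.flatten(1)[:, 1:]
    return nn.functional.cross_entropy(
        logits.reshape(-1, VOCAB), targets.reshape(-1)
    )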

The models show good performance across the benchmarks. For ImageNet classification, the largest Toto model achieved a top-1 accuracy of 75.3%, outperforming other generative models like MAE and iGPT. In the Kinetics-400 action recognition task, the models achieve a top-1 accuracy of 74.4%, demonstrating their capability to understand complex temporal dynamics. On the DAVIS dataset for semi-supervised video tracking, the models obtain J&F scores of up to 62.4, improving over previous state-of-the-art results established by DINO and MAE. Moreover, on robotics tasks such as object manipulation, Toto models learn faster and are more sample efficient; for example, the Toto-base model achieves 63% accuracy on a real-world cube-picking task with a Franka robot. Overall, these results demonstrate the versatility and scalability of the proposed models across diverse applications.

The work marks a significant advance in video modeling by addressing redundancy and the challenges of tokenization. The researchers showed “through unified training on both images and videos, that this form of autoregressive pretraining is generally effective across a range of tasks.” The architecture and tokenization strategies provide a baseline for further research on dense prediction and recognition. This is a meaningful step toward unlocking the full potential of video modeling in real-world applications.

Check out the Paper and Project Page. All credit for this research goes to the researchers of this project.

The post This AI Paper Introduces Toto: Autoregressive Video Models for Unified Image and Video Pre-Training Across Diverse Tasks appeared first on MarkTechPost.

What are Small Language Models (SLMs)?

Large language models (LLMs) like GPT-4, PaLM, Bard, and Copilot have made a huge impact in natural language processing (NLP). They can generate text, solve problems, and carry out conversations with remarkable accuracy. However, they also come with significant challenges. These models require vast computational resources, making them expensive to train and deploy. This excludes smaller businesses and individual developers from fully benefiting. Additionally, their energy consumption raises environmental concerns. The dependency on advanced infrastructure further limits their accessibility, creating a gap between well-funded organizations and others trying to innovate.

What are Small Language Models (SLMs)?

Small Language Models (SLMs) are a more practical and efficient alternative to LLMs. These models are smaller in size, with millions to a few billion parameters, compared to the hundreds of billions found in larger models. SLMs focus on specific tasks, providing a balance between performance and resource consumption. Their design makes them accessible and cost-effective, offering organizations an opportunity to harness NLP without the heavy demands of LLMs. You can explore more details in IBM’s analysis.

Technical Details and Benefits

SLMs use techniques like model compression, knowledge distillation, and transfer learning to achieve their efficiency. Model compression involves reducing the size of a model by removing less critical components, while knowledge distillation allows smaller models (students) to learn from larger ones (teachers), capturing essential knowledge in a compact form. Transfer learning further enables SLMs to fine-tune pre-trained models for specific tasks, cutting down on resource and data requirements.
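
As a concrete illustration of knowledge distillation, the sketch below (PyTorch, with hypothetical temperature and weighting values) trains a small student to match a larger teacher's softened output distribution while still fitting the ground-truth labels:

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend of soft-target KL loss (teacher -> student) and hard-label cross-entropy.
    T and alpha are illustrative hyperparameters."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                      # rescale so gradients stay comparable across T
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Usage sketch: the teacher runs in eval mode without gradients,
# while the student is optimized as usual.
# with torch.no_grad():
#     teacher_logits = teacher(batch)
# loss = distillation_loss(student(batch), teacher_logits, labels)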

Why Consider SLMs?

Cost Efficiency: Lower computational needs mean reduced operational costs, making SLMs ideal for smaller budgets.

Energy Savings: By consuming less energy, SLMs align with the push for environmentally friendly AI.

Accessibility: They make advanced NLP capabilities available to smaller organizations and individuals.

Focus: Tailored for specific tasks, SLMs often outperform larger models in specialized use cases.

Examples of SLMs

Llama 3 8B (Meta)

Qwen2: 0.5B, 1.5B, and 7B (Alibaba)

Gemma 2 9B (Google)

Gemma 2B and 7B (Google)

Mistral 7B (Mistral AI)

Gemini Nano 1.8B and 3.25B (Google)

OpenELM 270M, 450M, 1B, and 3B (Apple)

Phi-4 (Microsoft)

and many more.

Results, Data, and Insights

SLMs have demonstrated their value across a range of applications. In customer service, for instance, platforms powered by SLMs, such as those from Aisera, are delivering faster, cost-effective responses. According to a DataCamp article, SLMs achieve up to 90% of the performance of LLMs in tasks such as text classification and sentiment analysis while using half the resources.

In healthcare, SLMs fine-tuned on medical datasets have been particularly effective in identifying conditions from patient records. A Medium article by Nagesh Mashette highlights their ability to streamline document summarization in industries like law and finance, cutting down processing times significantly.

SLMs also excel in cybersecurity. According to Splunk’s case studies, they’ve been used for log analysis, providing real-time insights with minimal latency.

Conclusion

Small Language Models are proving to be an efficient and accessible alternative to their larger counterparts. They address many challenges posed by LLMs by being resource-efficient, environmentally sustainable, and task-focused. Techniques like model compression and transfer learning ensure that these smaller models retain their effectiveness across a range of applications, from customer support to healthcare and cybersecurity. As Zapier’s blog suggests, the future of AI may well lie in optimizing smaller models rather than always aiming for bigger ones. SLMs show that innovation doesn’t have to come with massive infrastructure—it can come from doing more with less.


The post What are Small Language Models (SLMs)? appeared first on MarkTechPost.

What are Large Language Models (LLMs)?

Understanding and processing human language has always been a difficult challenge in artificial intelligence. Early AI systems often struggled to handle tasks like translating languages, generating meaningful text, or answering questions accurately. These systems relied on rigid rules or basic statistical methods that couldn’t capture the nuances of context, grammar, or cultural meaning. As a result, their outputs often missed the mark, either being irrelevant or outright wrong. Moreover, scaling these systems required considerable manual effort, making them inefficient as data volumes grew. The need for more adaptable and intelligent solutions eventually led to the development of Large Language Models (LLMs).

Understanding Large Language Models (LLMs)

Large Language Models are advanced AI systems designed to process, understand, and generate human language. Built on deep learning architectures—specifically Transformers—they are trained on enormous datasets to tackle a wide variety of language-related tasks. By pre-training on text from diverse sources like books, websites, and articles, LLMs gain a deep understanding of grammar, syntax, semantics, and even general world knowledge.

Some well-known examples include OpenAI’s GPT (Generative Pre-trained Transformer) and Google’s BERT (Bidirectional Encoder Representations from Transformers). These models excel at tasks such as language translation, content generation, sentiment analysis, and even programming assistance. They achieve this by leveraging self-supervised learning, which allows them to analyze context, infer meaning, and produce relevant and coherent outputs.

Image source: https://www.nvidia.com/en-us/glossary/large-language-models/

Technical Details and Benefits

The technical foundation of LLMs lies in the Transformer architecture, introduced in the influential paper “Attention Is All You Need.” This design uses self-attention mechanisms to allow the model to focus on different parts of an input sequence simultaneously. Unlike traditional recurrent neural networks (RNNs) that process sequences step-by-step, Transformers analyze entire sequences at once, making them faster and better at capturing complex relationships across long text.
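
At the heart of this design is scaled dot-product self-attention, sketched below in PyTorch for a single head (a simplified illustration that omits multi-head projections and masking):

import math
import torch

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.
    x: (batch, seq_len, d_model); w_q/w_k/w_v: (d_model, d_k) projection matrices."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))  # pairwise token affinities
    weights = torch.softmax(scores, dim=-1)                   # each token attends to every token
    return weights @ v                                        # context-weighted values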

Training LLMs is computationally intensive, often requiring thousands of GPUs or TPUs working over weeks or months. The datasets used can reach terabytes in size, encompassing a wide range of topics and languages. Some key advantages of LLMs include:

Scalability: They perform better as more data and computational power are applied.

Versatility: LLMs can handle many tasks without needing extensive customization.

Contextual Understanding: By considering the context of inputs, they provide relevant and coherent responses.

Transfer Learning: Once pre-trained, these models can be fine-tuned for specific tasks, saving time and resources.

Types of Large Language Models

Large Language Models can be categorized based on their architecture, training objectives, and use cases. Here are some common types:

Autoregressive Models: These models, such as GPT, predict the next word in a sequence based on the previous words. They are particularly effective for generating coherent and contextually relevant text.

Autoencoding Models: Models like BERT focus on understanding and encoding the input text by predicting masked words within a sentence. This bidirectional approach allows them to capture the context from both sides of a word.

Sequence-to-Sequence Models: These models are designed for tasks that require transforming one sequence into another, such as machine translation. T5 (Text-to-Text Transfer Transformer) is a prominent example.

Multimodal Models: Some LLMs, such as DALL-E and CLIP, extend beyond text and are trained to understand and generate multiple types of data, including images and text. These models enable tasks like generating images from text descriptions.

Domain-Specific Models: These are tailored to specific industries or tasks. For example, BioBERT is fine-tuned for biomedical text analysis, while FinBERT is optimized for financial data.

Each type of model is designed with a specific focus, enabling it to excel in particular applications. For example, autoregressive models are excellent for creative writing, while autoencoding models are better suited for comprehension tasks.

Results, Data Insights, and Additional Details

LLMs have shown remarkable capabilities across various domains. For example, OpenAI’s GPT-4 has performed well in standardized exams, demonstrated creativity in content generation, and even assisted with debugging code. According to IBM, LLM-powered chatbots are improving customer support by resolving queries with greater efficiency.

In healthcare, LLMs help analyze medical literature and support diagnostic decisions. A report by NVIDIA highlights how these models assist in drug discovery by analyzing vast datasets to identify promising compounds. Similarly, in e-commerce, LLMs enhance personalized recommendations and generate engaging product descriptions.

The rapid development of LLMs is evident in their scale. GPT-3, for instance, has 175 billion parameters, while Google’s PaLM boasts 540 billion. However, this rapid scaling also brings challenges, including high computational costs, concerns about bias in outputs, and the potential for misuse.

Conclusion

Large Language Models represent a significant step forward in artificial intelligence, addressing longstanding challenges in language understanding and generation. Their ability to learn from vast datasets and adapt to diverse tasks makes them an essential tool across industries. That said, as these models evolve, addressing their ethical, environmental, and societal implications will be crucial. By developing and using LLMs responsibly, we can unlock their full potential to create meaningful advancements in technology.


The post What are Large Language Models (LLMs)? appeared first on MarkTechPost.

SepLLM: A Practical AI Approach to Efficient Sparse Attention in Large …

Large Language Models (LLMs) have shown remarkable capabilities across diverse natural language processing tasks, from generating text to contextual reasoning. However, their efficiency is often hampered by the quadratic complexity of the self-attention mechanism. This challenge becomes particularly pronounced with longer input sequences, where computational and memory demands grow significantly. Traditional methods that modify self-attention often render them incompatible with pre-trained models, while others focus on optimizing key-value (KV) caches, which can lead to inconsistencies between training and inference. These challenges have driven researchers to seek more efficient ways to enhance LLM performance while minimizing resource demands.

Researchers from Huawei Noah’s Ark Lab, The University of Hong Kong, KAUST, and Max Planck Institute for Intelligent Systems, Tübingen, have proposed SepLLM, a sparse attention mechanism that simplifies attention computation. SepLLM focuses on three token types: Initial Tokens, Neighboring Tokens, and Separator Tokens. Notably, separator tokens, such as commas and periods, often receive disproportionately high attention weights in LLMs. SepLLM leverages these tokens to condense segment information, reducing computational overhead while retaining essential context.

Designed to integrate seamlessly with existing models, SepLLM supports training from scratch, fine-tuning, and streaming applications. Its sparse attention mechanism prioritizes essential tokens, paving the way for efficient long-context processing.

Technical Overview and Advantages of SepLLM

1. Sparse Attention Mechanism SepLLM retains only three types of tokens:

Initial Tokens: The first tokens in a sequence, often key to understanding context.

Neighboring Tokens: Tokens near the current token, ensuring local coherence.

Separator Tokens: High-frequency tokens like commas and periods that encapsulate segment-level information.

By focusing on these tokens, SepLLM reduces the number of computations required, enhancing efficiency without compromising model performance.
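
One way to express this token selection is as a boolean attention mask, as in the PyTorch sketch below: each query position may attend only to the first few tokens, its own local window of neighboring tokens, and earlier separator tokens. This illustrates the masking idea under assumed parameters; it is not the released SepLLM implementation:

import torch

def sepllm_style_mask(token_ids, sep_ids, n_initial=4, n_neighbor=64):
    """Boolean mask (True = may attend) keeping initial, neighboring, and
    separator tokens, combined with causality. Illustrative only.
    token_ids: 1-D LongTensor of length L; sep_ids: 1-D LongTensor of separator ids."""
    L = token_ids.size(0)
    pos = torch.arange(L)
    causal = pos[None, :] <= pos[:, None]                        # no attending to future tokens

    keep_initial = pos[None, :] < n_initial                      # first few tokens of the sequence
    keep_neighbor = (pos[:, None] - pos[None, :]) < n_neighbor   # local window around each query
    is_sep = torch.isin(token_ids, sep_ids)                      # e.g. ids of ',', '.', '\n'
    keep_separator = is_sep[None, :].expand(L, L)

    return causal & (keep_initial | keep_neighbor | keep_separator)

# Example: sep_ids would hold the tokenizer ids of separators such as commas and
# periods; the resulting mask can be passed to an attention implementation that
# accepts boolean masks.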

2. Enhanced Long-Text Processing SepLLM processes sequences exceeding four million tokens, surpassing traditional length limitations. This capability is particularly valuable for tasks like document summarization and long conversations, where maintaining context is crucial.

3. Improved Inference and Memory Efficiency SepLLM’s separator-based compression mechanism accelerates inference and reduces memory usage. For instance, on the GSM8K-CoT benchmark, SepLLM reduced KV cache usage by 50%. It also demonstrated a 28% reduction in computational costs and a 26% decrease in training time compared to standard models using the Llama-3-8B architecture.

4. Versatile Deployment SepLLM is adaptable to various deployment scenarios, offering support for:

Integration with pre-trained models.

Training from scratch for specialized applications.

Fine-tuning and streaming for dynamic real-time use cases.

Experimental Results and Insights

The effectiveness of SepLLM has been validated through rigorous testing:

Training-Free Setting: Using the Llama-3-8B-Instruct model, SepLLM was tested on GSM8K-CoT and MMLU benchmarks. It matched the performance of full-attention models while reducing KV cache usage to 47%, demonstrating its ability to retain crucial context and reasoning with fewer resources.

Training from Scratch: When applied to the Pythia-160M-deduped model, SepLLM achieved faster convergence and improved task accuracy. Increasing neighboring tokens (n=128) further enhanced perplexity and downstream performance.

Post-Training: SepLLM adapted efficiently to pre-trained Pythia-1.4B-deduped models through fine-tuning, aligning with its sparse attention design. A tailored cosine learning rate scheduler ensured consistent loss reduction.

Streaming Applications: SepLLM excelled in streaming scenarios involving infinite-length inputs, such as multi-turn dialogues. On the PG19 dataset, it achieved lower perplexity and faster inference times compared to StreamingLLM, with reduced memory usage.

Conclusion

SepLLM addresses critical challenges in LLM scalability and efficiency by focusing on Initial Tokens, Neighboring Tokens, and Separator Tokens. Its sparse attention mechanism strikes a balance between computational demands and performance, making it an attractive solution for modern NLP tasks. With its ability to handle long contexts, reduce overhead, and integrate seamlessly with existing models, SepLLM provides a practical approach for advancing LLM technology.

As the need for processing extensive contexts grows, solutions like SepLLM will be pivotal in shaping the future of NLP. By optimizing computational resources while maintaining strong performance, SepLLM exemplifies a thoughtful and efficient design for next-generation language models.

Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.

The post SepLLM: A Practical AI Approach to Efficient Sparse Attention in Large Language Models appeared first on MarkTechPost.

ToolHop: A Novel Dataset Designed to Evaluate LLMs in Multi-Hop Tool U …

Multi-hop queries have always given LLM agents a hard time, since answering them requires multiple reasoning steps and information from different sources. They are crucial for analyzing a model’s comprehension, reasoning, and function-calling capabilities. At a time when new large models appear almost daily with claims of unparalleled capabilities, multi-hop tool use offers a realistic assessment: the model is given a complex query that it must decompose into atomic parts and solve iteratively by invoking the appropriate tools. Multi-hop tool evaluation has therefore emerged as pivotal for advancing models toward generalized intelligence.

Existing work in this field falls short of offering a reliable evaluation method. Methods proposed so far have relied on tool-driven data construction, where queries are simulated for a given collection of tools. This approach fails to ensure interdependence among the collected tools and therefore cannot genuinely assess multi-hop reasoning. Additionally, the absence of verifiable answers introduces model bias and evaluation errors. This article discusses recent research that presents a reliable method for honestly assessing the multi-hop capabilities of a large language model.

Fudan University and ByteDance researchers presented ToolHop, a dataset designed explicitly for multi-hop tool evaluation, with 995 rigorously designed user queries and 3,912 associated tools. ToolHop aims to solve the aforementioned problems through diverse queries, locally executable tools, meaningful interdependencies, detailed feedback, and verifiable answers. The authors propose a novel query-driven data construction approach that can expand a single multi-hop query into a comprehensive multi-hop tool-use test case.

The proposed novel scheme comprises three key stages: tool creation, document refinement, and code generation.

Tool Creation: A preliminary set of tool documents is created from the user-provided multi-hop query. The query is resolved into atomic parts, each handled individually, so that the documents remain interdependent and relevant. In this way, each document captures the essence of the query while remaining structured enough to generate similar queries, ensuring modularity and cohesion.

Document Refinement: The prepared tool documents undergo comprehensive filtering to support the evaluation of models in complex multi-hop scenarios. New features such as result filtering and customizable formats are introduced to expand functionality while preserving the original intent. In parallel, the number of parameters is increased and their types are optimized.

Code Generation: At this stage, locally executable functions are generated from the prepared tool documents. Through these functions, tools can be invoked externally, enabling seamless multi-turn interactions between the model and the tools.
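
To give a flavor of what the generated artifacts enable, the sketch below shows a toy locally executable tool and a simplified multi-turn tool-use loop in Python. The function names, the dispatch loop, and the assumed model.next_action interface are hypothetical and are not taken from the ToolHop release:

# Hypothetical example of locally executable tools and a multi-turn
# tool-use loop; names and signatures are illustrative only.

def get_birth_year(person: str) -> int:
    """Toy tool: look up a person's birth year from a local table."""
    table = {"Ada Lovelace": 1815, "Alan Turing": 1912}
    return table[person]

def years_between(year_a: int, year_b: int) -> int:
    """Toy tool: absolute difference between two years."""
    return abs(year_a - year_b)

TOOLS = {"get_birth_year": get_birth_year, "years_between": years_between}

def run_episode(model):
    """Iteratively ask the model for the next tool call until it answers.
    model.next_action(history) is assumed to return either
    {"tool": name, "args": {...}} or {"answer": ...}."""
    history = []
    while True:
        action = model.next_action(history)
        if "answer" in action:
            return action["answer"]
        result = TOOLS[action["tool"]](**action["args"])
        history.append({"call": action, "result": result})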

The research team implemented the approach with queries drawn from the MoreHopQA dataset. To ensure reliable evaluation with ToolHop, a rigorous five-dimensional analysis was performed. ToolHop was then evaluated on fourteen LLMs from five families, including open- and closed-source models. The evaluation protocol was designed to ensure answer correctness and to minimize invocation errors. The authors observed that using tools increased the models’ performance by up to 12% on average, and by up to 23% for GPT models. Even with this increase, the best-performing model reached only 49.04% answer correctness. Moreover, despite using tools in response to multi-hop queries, models hallucinated around 10% of the time.

Conclusion: 

This paper presents a comprehensive dataset for solving multi-hop queries using specially designed queries and tools. The main finding from the experiments is that while tools significantly enhance LLMs’ ability to solve complex multi-hop queries, their multi-hop tool-use capabilities still leave considerable room for improvement.

Check out the Paper. All credit for this research goes to the researchers of this project.

The post ToolHop: A Novel Dataset Designed to Evaluate LLMs in Multi-Hop Tool Use Scenarios appeared first on MarkTechPost.

Microsoft AI Introduces rStar-Math: A Self-Evolved System 2 Deep Think …

Mathematical problem-solving has long been a benchmark for artificial intelligence (AI). Solving math problems accurately requires not only computational precision but also deep reasoning—an area where even advanced language models (LLMs) have traditionally faced challenges. Many existing models rely on what psychologists term “System 1 thinking,” which is fast but often prone to errors. This approach generates solutions in a single inference, bypassing the iterative reasoning process essential for tackling complex problems. Furthermore, training high-quality models relies on curated datasets, which are particularly scarce for competition-level math problems. Open-source methods frequently fail to exceed the capabilities of their “teacher” models, leading to limited progress. Consequently, the development of efficient AI systems capable of addressing these challenges has remained elusive.

Microsoft introduces rStar-Math, a self-evolvable System 2-style reasoning framework designed to enhance mathematical problem-solving in small language models (SLMs). With a compact model size of just 7 billion parameters, rStar-Math demonstrates performance that rivals and occasionally surpasses OpenAI’s o1 model on challenging math competition benchmarks. This system leverages Monte Carlo Tree Search (MCTS) and self-evolution strategies to strengthen the reasoning capabilities of SLMs.

Unlike traditional methods that depend on distillation from larger models, rStar-Math enables small models to independently generate high-quality training data through a step-by-step reasoning process. The framework employs a code-augmented chain-of-thought (CoT) data synthesis, a process preference model (PPM), and iterative self-evolution techniques. These advancements allow rStar-Math to achieve notable accuracy across benchmarks, including the MATH dataset and the American Invitational Mathematics Examination (AIME), where it ranks among the top 20% of high school students.

Technical Innovations and Benefits

rStar-Math’s success is underpinned by three core innovations:

Code-Augmented CoT Data Synthesis:

The system uses MCTS rollouts to generate step-by-step verified reasoning trajectories. This method ensures that intermediate steps are validated through Python code execution, filtering out errors and improving overall data quality.

Process Preference Model (PPM):

Unlike conventional reward models, PPM employs pairwise ranking to optimize reasoning steps. This approach avoids noisy annotations and offers fine-grained feedback for step-level optimization, resulting in more reliable intermediate evaluations.

Self-Evolution Recipe:

Through four iterative rounds of self-evolution, rStar-Math progressively refines its policy model and PPM. Starting with a dataset of 747,000 math problems, the system generates millions of high-quality solutions, tackling increasingly challenging problems and enhancing reasoning capabilities with each iteration.

These innovations make rStar-Math a robust tool for both academic and competition-level math challenges. Additionally, by enabling smaller models to self-generate data, it reduces reliance on large, resource-intensive models, broadening access to advanced AI capabilities.
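
The pairwise ranking idea behind the PPM can be written as a standard Bradley-Terry-style preference loss, sketched below in PyTorch; this is a generic formulation for illustration, not the paper's exact objective:

import torch
import torch.nn.functional as F

def pairwise_preference_loss(score_preferred, score_rejected):
    """Train a step-level reward model so preferred reasoning steps score
    higher than rejected ones, without relying on absolute score labels."""
    return -F.logsigmoid(score_preferred - score_rejected).mean()

# Usage sketch: for a pair of candidate intermediate steps at the same node,
# a PPM-style model scores both and is optimized to rank the verified
# (preferred) step above the unverified (rejected) one.
# loss = pairwise_preference_loss(ppm(good_step), ppm(bad_step))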

Results and Insights

rStar-Math has redefined benchmarks for small models in math reasoning. On the MATH dataset, it achieves 90.0% accuracy, a significant improvement over the previous 58.8% accuracy of Qwen2.5-Math-7B. Similarly, its performance on Phi3-mini-3.8B improves from 41.4% to 86.4%, representing a notable advancement over OpenAI’s o1-preview model.

In the AIME competition, rStar-Math solves 53.3% of problems, placing it among the top 20% of high school participants. Beyond competitions, the system excels across benchmarks such as Olympiad-level math, college-level problems, and the Gaokao exam, outperforming even larger open-source models. These results highlight its ability to generalize across diverse mathematical challenges.

Key findings from the study include:

Step-by-Step Reasoning Improves Reliability: Verified reasoning trajectories reduce errors in intermediate steps, enhancing overall model performance.

Emergence of Self-Reflection: rStar-Math exhibits the ability to self-correct flawed reasoning paths during problem-solving.

Importance of Reward Models: The PPM’s step-level evaluations play a critical role in achieving high accuracy, emphasizing the value of dense feedback signals in System 2 reasoning.

Conclusion

Microsoft’s rStar-Math highlights the potential of small language models in addressing complex mathematical reasoning tasks. By combining code-augmented synthesis, innovative reward modeling, and iterative self-evolution, the framework achieves remarkable accuracy and reliability. With 90.0% accuracy on the MATH dataset and strong performance in AIME competitions, rStar-Math demonstrates that smaller, efficient models can achieve competitive results.

This advancement not only pushes the boundaries of AI capabilities but also makes sophisticated reasoning models more accessible. As rStar-Math evolves, its potential applications could expand beyond mathematics into areas like scientific research and software development, paving the way for versatile, efficient AI systems to address real-world challenges.

Check out the Paper. All credit for this research goes to the researchers of this project.

The post Microsoft AI Introduces rStar-Math: A Self-Evolved System 2 Deep Thinking Approach that Significantly Boosts the Math Reasoning Capabilities of Small LLMs appeared first on MarkTechPost.