How Should We Maximize the Planning Ability of LLMs While Reducing the …

Artificial Intelligence is rapidly gaining popularity, and for good reason. With the introduction of Large Language Models (LLMs) like GPT, BERT, and LLaMA, almost every industry, including healthcare, finance, e-commerce, and media, is using these models for tasks such as Natural Language Understanding (NLU), Natural Language Generation (NLG), question answering, programming, and information retrieval. ChatGPT, which has been in the headlines ever since its release, is built on the transformer-based GPT-3.5 and GPT-4 models.

These AI systems that imitate humans depend heavily on agents that can exhibit human-like problem-solving abilities. There are three primary approaches to developing agents that can address complex interactive reasoning tasks: Deep Reinforcement Learning (RL), which trains agents through trial and error; Behavior Cloning (BC) through Sequence-to-Sequence (seq2seq) learning, which trains agents to imitate the behavior of expert agents; and prompting LLMs, in which generative agents prompt an LLM to produce reasonable plans and actions for complex tasks.

RL-based and seq2seq-based BC approaches have limitations: they struggle with task decomposition, maintaining long-term memory, generalizing to unknown tasks, and handling exceptions. Prior prompting-based approaches, meanwhile, are computationally expensive because they repeat LLM inference at every time step.

Recently, a framework called SWIFTSAGE has been proposed to address these challenges and enable agents to imitate how humans solve complex, open-world tasks. SWIFTSAGE integrates the strengths of behavior cloning and prompting LLMs to improve task completion in complex interactive tasks. The framework draws inspiration from the dual-process theory, which suggests that human cognition involves two distinct systems: System 1 and System 2. System 1 involves rapid, intuitive, and automatic thinking, while System 2 entails methodical, analytical, and deliberate thought processes.

The SWIFTSAGE framework consists of two modules – the SWIFT module and the SAGE module. Similar to System 1, the SWIFT module represents quick and intuitive thinking. It is implemented as a compact encoder-decoder language model that has been fine-tuned on the action trajectories of an oracle agent. The SWIFT module encodes short-term memory components like previous actions, observations, visited locations, and the current environment state, followed by decoding the next individual action, thus aiming to simulate the rapid and instinctive decision-making process shown by humans.

The SAGE module, on the other hand, imitates thought processes similar to System 2 and utilizes LLMs such as GPT-4 for subgoal planning and grounding. In the planning stage, LLMs are prompted to locate necessary items, plan, track subgoals, and detect and rectify potential mistakes, while in the grounding stage, LLMs are employed to transform the output subgoals derived from the planning stage into a sequence of executable actions.
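The two stages can be pictured as two separate LLM calls, one producing subgoals and one turning them into actions. The sketch below is only an illustration under stated assumptions: `plan`, `ground`, the prompt wording, and the `fake_llm` stand-in are hypothetical and are not the prompts or API used by the authors.

```python
from typing import Callable, List

# Any function that maps a prompt string to a reply string (assumed interface).
LLM = Callable[[str], str]


def _non_empty_lines(text: str) -> List[str]:
    """Split a reply into cleaned, non-empty lines, dropping list dashes."""
    return [line.strip("- ").strip() for line in text.splitlines() if line.strip()]


def plan(llm: LLM, task: str, history: List[str]) -> List[str]:
    """Planning stage: ask for needed items, subgoals, and likely mistakes."""
    prompt = (
        f"Task: {task}\n"
        f"Recent history: {'; '.join(history)}\n"
        "List the remaining subgoals, one per line."
    )
    return _non_empty_lines(llm(prompt))


def ground(llm: LLM, subgoals: List[str], action_types: List[str]) -> List[str]:
    """Grounding stage: turn subgoals into a sequence of executable actions."""
    prompt = (
        f"Subgoals: {'; '.join(subgoals)}\n"
        f"Allowed action types: {', '.join(action_types)}\n"
        "Write the actions to execute, one per line."
    )
    return _non_empty_lines(llm(prompt))


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call so the sketch runs offline."""
    if prompt.startswith("Task:"):
        return "- find a pot\n- fill the pot with water\n"
    return "pick up pot\nmove pot to sink\nactivate sink\n"


subgoals = plan(fake_llm, "boil water", ["go to kitchen"])
actions = ground(fake_llm, subgoals, ["pick up", "move", "activate"])
```

Swapping `fake_llm` for a real model client would leave the two-stage structure unchanged.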

The SWIFT and SAGE modules have been integrated through a heuristic algorithm that determines when to activate or deactivate the SAGE module and how to combine the outputs of both modules using an action buffer mechanism. Unlike previous methods that generate only the immediate next action, SWIFTSAGE engages in longer-term action planning. 
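A rough sketch of such a controller is shown below. The article does not spell out the switching heuristic, so the rule used here (activate SAGE after `patience` consecutive steps without a score gain) is an assumption, as are the class name `SwiftSageController` and the callables `swift_policy` and `sage_planner`.

```python
from collections import deque
from typing import Callable, Deque, List


class SwiftSageController:
    """Illustrative controller that mixes a fast policy with a slow planner."""

    def __init__(
        self,
        swift_policy: Callable[[str], str],
        sage_planner: Callable[[str], List[str]],
        patience: int = 3,
    ) -> None:
        self.swift_policy = swift_policy
        self.sage_planner = sage_planner
        self.patience = patience  # assumed trigger threshold
        self.buffer: Deque[str] = deque()  # actions queued by the planner
        self.steps_without_progress = 0

    def next_action(self, observation: str, score_delta: float) -> str:
        # While planned actions remain, keep executing them in order.
        if self.buffer:
            return self.buffer.popleft()

        # Track how long the fast policy has gone without a score gain.
        if score_delta > 0:
            self.steps_without_progress = 0
        else:
            self.steps_without_progress += 1

        # Assumed heuristic: call the planner once the agent looks stuck.
        if self.steps_without_progress >= self.patience:
            self.buffer.extend(self.sage_planner(observation))
            self.steps_without_progress = 0
            if self.buffer:
                return self.buffer.popleft()

        # Default: one quick action from the fast policy.
        return self.swift_policy(observation)


controller = SwiftSageController(
    swift_policy=lambda obs: "look around",
    sage_planner=lambda obs: ["open door", "go to kitchen"],
    patience=2,
)
trace = [controller.next_action("a hallway", 0.0) for _ in range(5)]
```

The buffer is what lets a single planner call cover several future steps, instead of one LLM call per action.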

To evaluate SWIFTSAGE, experiments were conducted on 30 tasks from the ScienceWorld benchmark. The results show that SWIFTSAGE significantly outperforms existing methods such as SayCan, ReAct, and Reflexion, achieving higher scores and proving more effective at solving complex tasks.

In conclusion, SWIFTSAGE is a promising framework that combines the strengths of behavior cloning and prompting LLMs, which can improve action planning and performance in complex reasoning tasks.

The post How Should We Maximize the Planning Ability of LLMs While Reducing the Computation Cost? Meet SwiftSage: A Novel Generative Agent for Complex Interactive Reasoning Tasks, Inspired by the Dual-Process Theory of Human Cognition appeared first on MarkTechPost.