Transformers 4.42 by Hugging Face: Unleashing Gemma 2, RT-DETR, Instru …

Hugging Face has announced the release of Transformers version 4.42, which brings many new features and enhancements to the popular machine-learning library. This release introduces several advanced models, supports new tools and retrieval-augmented generation (RAG), offers GGUF fine-tuning, and incorporates a quantized KV cache, among other improvements.

With Transformers version 4.42, this release of new models, including Gemma 2, RT-DETR, InstructBlip, and LLaVa-NeXT-Video, also makes it more noteworthy. The Gemma 2 model family, developed by the Gemma2 Team at Google, comprises two versions: 2 billion and 7 billion parameters. These models are trained on 6 trillion tokens and have shown remarkable performance across various academic benchmarks in language understanding, reasoning, and safety. They outperformed similarly sized open models in 11 of 18 text-based tasks, showcasing their robust capabilities and responsible development practices.

RT-DETR, or Real-Time DEtection Transformer, is another significant addition. This model, designed for real-time object detection, leverages the transformer architecture to identify and locate multiple objects within images swiftly and accurately. Its development positions it as a formidable competitor in object detection models.

InstructBlip enhances visual instruction tuning using the BLIP-2 architecture. It feeds text prompts to the Q-Former, allowing for more effective visual-language model interactions. This model promises improved performance in tasks that require visual and textual understanding.

LLaVa-NeXT-Video builds upon the LLaVa-NeXT model by incorporating both video and image datasets. This enhancement enables the model to perform state-of-the-art video understanding tasks, making it a valuable tool for zero-shot video content analysis. The AnyRes technique, which represents high-resolution images as multiple smaller images, is crucial in this model’s ability to generalize from images to video frames effectively.

Image Source

Tool usage and RAG support have also significantly improved. Hugging Face automatically generates JSON schema descriptions for Python functions, facilitating seamless integration with tool models. A standardized API for tool models ensures compatibility across various implementations, targeting the Nous-Hermes, Command-R, and Mistral/Mixtral model families for imminent support.

Another noteworthy enhancement is GGUF fine-tuning support. This feature allows users to fine-tune models within the Python/Hugging Face ecosystem and then convert them back to GGUF/GGML/llama.cpp libraries. This flexibility ensures that models can be optimized and deployed in diverse environments.

Quantization improvements, including adding a quantized KV cache, further reduce memory requirements for generative models. This update, coupled with a comprehensive overhaul of the quantization documentation, provides users with clearer guidance on selecting the most suitable quantization methods for their needs.

In addition to these major updates, Transformers 4.42 includes several other enhancements. New instance segmentation examples have been added, enabling users to leverage Hugging Face pretrained model weights as backbones for vision models. The release also features bug fixes and optimizations, as well as the removal of deprecated components like the ConversationalPipeline and Conversation object.

In conclusion, Transformers 4.42 represents a significant development for Hugging Face’s machine-learning library. With its new models, enhanced tool support, and numerous optimizations, this release solidifies Hugging Face’s position as a leader in NLP and machine learning.


The post Transformers 4.42 by Hugging Face: Unleashing Gemma 2, RT-DETR, InstructBlip, LLaVa-NeXT-Video, Enhanced Tool Usage, RAG Support, GGUF Fine-Tuning, and Quantized KV Cache appeared first on MarkTechPost.

This AI Paper from UC Berkeley Research Highlights How Task Decomposit …

Artificial Intelligence (AI) systems are rigorously tested before they are released to determine whether they can be used for dangerous activities like bioterrorism, manipulation, or automated cybercrimes. This is especially crucial for powerful AI systems, as they are programmed to reject commands that can negatively affect them. Conversely, less powerful open-source models frequently have weaker rejection mechanisms that are easily overcome with more training.

In recent research, a team of researchers from UC Berkeley has shown that even with these safety measures, guaranteeing the security of individual AI models is insufficient. Even while each model seems safe on its own, adversaries can abuse combinations of models. They accomplish this by using a tactic known as task decomposition, which divides a difficult malicious activity into smaller tasks. Then, distinct models are given subtasks, in which competent frontier models handle the benign but difficult subtasks, whereas weaker models with laxer safety precautions handle the malicious but easy subtasks.

To demonstrate this, the team has formalized a threat model in which an adversary uses a set of AI models to attempt to produce a detrimental output, an example of which is a malicious Python script. The adversary chooses models and prompts iteratively to get the intended harmful result. In this instance, success indicates that the adversary has used the joint efforts of several models to produce a detrimental output.

The team has studied both automated and manual task decomposition techniques. In manual task decomposition, a human determines how to divide a task into manageable portions. For tasks that are too complicated for manual decomposition, the team has used automatic decomposition. This method involves the following steps: a strong model solves related benign tasks, a weak model suggests them and the weak model uses the solutions to carry out the initial malicious task.

The results have shown that combining models can greatly boost the success rate of producing damaging effects compared to employing individual models alone. For example, while developing susceptible code, the success rate of merging Llama 2 70B and Claude 3 Opus models was 43%, but neither model worked better than 3% by itself.

The team has also found that the quality of both the weaker and stronger models correlates with the likelihood of misuse. This implies that the likelihood of multi-model misuse will rise as AI models get better. This misuse potential could be further increased by employing other decomposition techniques, such as training the weak model to exploit the strong model through reinforcement learning or using the weak model as a general agent that continually calls the strong model.

In conclusion, this study has highlighted the necessity of ongoing red-teaming, which includes experimenting with different AI model configurations to find potential misuse hazards. This is a procedure that should be followed by developers for the duration of an AI model’s deployment lifecycle because updates can create new vulnerabilities. 

Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter. 

Join our Telegram Channel and LinkedIn Group.

If you like our work, you will love our newsletter..

Don’t Forget to join our 45k+ ML SubReddit

Create, edit, and augment tabular data with the first compound AI system, Gretel Navigator, now generally available! [Advertisement]
The post This AI Paper from UC Berkeley Research Highlights How Task Decomposition Breaks the Safety of Artificial Intelligence (AI) Systems, Leading to Misuse appeared first on MarkTechPost.

Role of LLMs like ChatGPT in Scientific Research: The Integration of S …

In the contemporary landscape of scientific research, the transformative potential of AI has become increasingly evident. This is particularly true when applying scalable AI systems to high-performance computing (HPC) platforms. This exploration of scalable AI for science underscores the necessity of integrating large-scale computational resources with vast datasets to address complex scientific challenges.

The success of AI models like ChatGPT highlights two primary advancements crucial for their effectiveness: 

The development of the transformer architecture 

The ability to train on extensive amounts of internet-scale data

These elements have set the foundation for significant scientific breakthroughs, as seen in efforts such as black hole modeling, fluid dynamics, and protein structure prediction. For instance, one study utilized AI and large-scale computing to advance models of black hole mergers, leveraging a dataset of 14 million waveforms on the Summit supercomputer.

Image Source

A prime example of scalable AI’s impact is drug discovery, where transformer-based language models (LLMs) have revolutionized the exploration of chemical space. These models use extensive datasets and fine-tuning on specific tasks to autonomously learn and predict molecular structures, thereby accelerating the discovery process. LLMs can efficiently explore the chemical space by employing tokenization and mask prediction techniques, integrating pre-trained models for molecules and protein sequences with fine-tuning on small labeled datasets to enhance performance.

High-performance computing is indispensable for achieving such scientific advancements. Different scientific problems necessitate varying levels of computational scale, and HPC provides the infrastructure to handle these diverse requirements. This distinction sets AI for Science (AI4S) apart from consumer-centric AI, often dealing with sparse, high-precision data from costly experiments or simulations. Scientific AI requires handling specific scientific data characteristics, including incorporating known domain knowledge such as partial differential equations (PDEs). Physics-informed neural networks (PINNs), neural ordinary differential equations (NODEs), and universal differential equations (UDEs) are methodologies developed to meet these unique requirements.

Scaling AI systems involves both model-based and data-based parallelism. For example, training a large model like GPT-3 on a single NVIDIA V100 GPU would take centuries, but using parallel scaling techniques can reduce this time to just over a month on thousands of GPUs. These scaling methods are essential not only for faster training but also for enhancing model performance. Parallel scaling has two main approaches: model-based parallelism, needed when models exceed GPU memory capacity, and data-based parallelism, arising from the large data required for training.

Scientific AI differs from consumer AI in its data handling and precision requirements. While consumer applications might rely on 8-bit integer inferences, scientific models often need high-precision floating-point numbers and strict adherence to physical laws. This is particularly true for simulation surrogate models, where integrating machine learning with traditional physics-based approaches can yield more accurate and cost-effective results. Neural networks in physics-based applications might need to impose boundary conditions or conservation laws, especially in surrogate models that replace parts of larger simulations.

One critical aspect of AI4S is accommodating the specific characteristics of scientific data. This includes handling physical constraints and incorporating known domain knowledge, such as PDEs. Soft penalty constraints, neural operators, and symbolic regression are methods used in scientific machine learning. For instance, PINNs incorporate the PDE residual norm in the loss function, ensuring that the model optimizer minimizes both data loss and the PDE residual, leading to a satisfying physics approximation.

Parallel scaling techniques are diverse, including data-parallel and model-parallel approaches. Data-parallel training involves dividing a large batch of data across multiple GPUs, each processing a portion of the data simultaneously. On the other hand, model-parallel training distributes different parts of the model across various devices, which is particularly useful when the model size exceeds the memory capacity of a single GPU. Spatial decomposition can be applied in many scientific contexts where data samples are too large to fit on a single device.

The evolution of AI for science includes the development of hybrid AI-simulation workflows, such as cognitive simulations (CogSim) and digital twins. These workflows blend traditional simulations with AI models to enhance prediction accuracy and decision-making processes. For instance, in neutron scattering experiments, AI-driven methods can reduce the time required for experimental decision-making by providing real-time analysis and steering capabilities.

Image Source

Several trends are shaping the landscape of scalable AI for science. The shift towards mixture-of-experts (MoE) models, which are sparsely connected and thus more cost-effective than monolithic models, is gaining traction. These models can handle many parameters efficiently, making them suitable for complex scientific tasks. The concept of an autonomous laboratory driven by AI is another exciting development. With integrated research infrastructures (IRIs) and foundation models, these labs can conduct real-time experiments and analyses, expediting scientific discovery.

The limitations of transformer-based models, such as context length and computational expense, have renewed interest in linear recurrent neural networks (RNNs), which offer greater efficiency for long token lengths. Additionally, operator-based models for solving PDEs are becoming more prominent, allowing AI to simulate entire classes of problems rather than individual instances.

Finally, interpretability and explainability in AI models must be considered. As scientists remain cautious of AI/ML methods, developing tools to elucidate the rationale behind AI predictions is crucial. Techniques like Class Activation Mapping (CAM) and attention map visualization help provide insights into how AI models make decisions, fostering trust and broader adoption in the scientific community.


The post Role of LLMs like ChatGPT in Scientific Research: The Integration of Scalable AI and High-Performance Computing to Address Complex Challenges and Accelerate Discovery Across Diverse Fields appeared first on MarkTechPost.

Dolphin{anty} Antidetect Browser: The Ultimate Antidetect Browser for …

Dolphin{anty} antidetect browser stands out as a robust solution designed to help users maintain anonymity, manage multiple accounts efficiently, and navigate the web securely. 

This article delves into the features, applications, setup processes, and advantages of using the Dolphin{anty} antidetect browser.

Comprehensive Browser Fingerprint Management

One of the core functionalities of Dolphin{anty} is its ability to create unique browser fingerprints for each profile. This involves altering HTTP headers, spoofing device information, and isolating cookies and local storage for each browser instance. Dolphin{anty} ensures that each browser profile appears as a distinct entity to tracking systems. This is crucial for users who need to operate multiple accounts without them being linked by websites or online services.

Multi-Account Management

Dolphin{anty} excels in managing multiple online accounts simultaneously. 

Users can create numerous browser profiles, each with its unique settings and fingerprint, making it ideal for tasks that require multiple identities, such as social media management, affiliate marketing, and e-commerce. This eliminates the risk of account bans or cross-account tracking, ensuring smooth and secure operations across various platforms.

Advanced Automation with Scenario Builder

The Scenario Builder feature in Dolphin{anty} allows users to automate repetitive tasks through scripting. 

By creating scripts that perform actions such as form filling, logging into accounts, and navigating web pages, users can save significant time and effort. This is beneficial for activities that involve managing large volumes of accounts or performing routine online tasks.

Seamless Proxy and VPN Integration

To further enhance user anonymity, Dolphin{anty} supports integration with proxies and VPNs. 

Users can configure different proxies for each browser profile, ensuring that their IP addresses remain untraceable and tailored to the specific needs of each account. This is essential for users who need to mask their real IP addresses and operate accounts from various geographic locations.

Team Collaboration and Profile Sharing

Dolphin{anty} offers robust features for team collaboration. Users can easily share browser profiles with team members, facilitating efficient management and operation of multiple accounts. This helps businesses and teams working on large-scale projects, as it streamlines workflows and enhances productivity.

Where to use Dolphin{anty}

1. Affiliate Marketing

For affiliate marketers, managing multiple accounts without detection is crucial for maximizing reach and managing campaigns effectively. Dolphin{anty} enables marketers to create and operate numerous accounts, track performance, and optimize strategies without the risk of being flagged by affiliate networks.

2. Social Media Management

Social media managers can benefit significantly from Dolphin{anty} by handling multiple client accounts, automating posts, and interacting with followers without risking account bans. The ability to manage various social media profiles from a single interface simplifies the process and enhances efficiency.

3. Trading

Cryptocurrency and stock traders can use Dolphin{anty} to manage multiple exchange accounts, distribute risks, and implement different trading strategies. The browser’s ability to create distinct profiles for each account ensures that traders can operate securely and anonymously across various exchanges

4. E-commerce and Online Stores

E-commerce professionals can manage multiple online stores and customer accounts without the risk of being blocked. Dolphin{anty} allows users to handle different sales strategies and promotions across various platforms, ensuring smooth operations and increased sales.

Setting Up Dolphin{anty}

1. Installation and Account Setup

Installing Dolphin{anty} is straightforward. Users need to download the browser from the official website and install it on their PC. Once installed, users must register an account to use the browser. It is crucial to download the software only from the official website to avoid malware and ensure security

2. Creating Browser Profiles 

Creating a browser profile in Dolphin{anty} involves entering basic information such as profile name, operating system, proxies, and cookies. Users can also configure advanced settings like User-Agent, WebRTC, WebGL, Canvas, time zone, memory, and screen settings. This flexibility allows users to tailor each profile to their specific needs and requirements.

3. Using the Scenario Builder

The Scenario Builder feature allows users to automate tasks by creating scripts. Users can visualize and plan every action, such as clicks, button presses, and login/password entries. This automation capability is particularly useful for managing large volumes of accounts and performing repetitive online tasks.

In conclusion, Dolphin{anty} antidetect browser is a comprehensive solution for maintaining online anonymity and managing multiple accounts efficiently. 

Its robust features, including browser fingerprint management, multi-account management, advanced automation, proxy and VPN integration, and team collaboration, make it an invaluable tool for affiliate marketers, social media managers, cryptocurrency traders, and e-commerce professionals. 

With flexible pricing plans and a strong emphasis on security, Dolphin{anty} stands out as the best antidetect browser on the market. Get started today. 

Thanks to Dolphin{anty} for the thought leadership/ Educational article. Dolphin{anty} has supported and sponsored us in this content/article.
The post Dolphin{anty} Antidetect Browser: The Ultimate Antidetect Browser for Online Anonymity and Multi-Account Management appeared first on MarkTechPost.

Top Artificial Intelligence AI Books to Read in 2024

Artificial Intelligence (AI) has been making significant strides over the past few years, with the emergence of Large Language Models (LLMs) marking a major milestone in its growth. With such widespread adoption, feeling left out of this revolution is not uncommon. One way an individual can stay updated with the latest trends is by reading books on various facets of AI. Following are the top AI books one should read in 2024.

Deep Learning (Adaptive Computation and Machine Learning series)

This book covers a wide range of deep learning topics along with their mathematical and conceptual background. It also provides information on the different deep learning techniques used in various industrial applications.

Python: Advanced Guide to Artificial Intelligence

This book helps individuals familiarize themselves with the most popular machine learning (ML) algorithms and delves into the details of deep learning, covering topics like CNN, RNN, etc. It provides a comprehensive understanding of advanced AI concepts while focusing on their practical implementation using Python.

Machine Learning (in Python and R) for Dummies

This book explains the fundamentals of machine learning by providing practical examples using Python and R. It is a beginner-friendly guide and a good starting point for people new to this field.

Machine Learning for Beginners

Given the pace with which machine learning systems are growing, this book provides a good base for anyone shifting to this field. The author talks about machine intelligence’s historical background and provides beginners with information on how advanced algorithms work.

Artificial Intelligence: A Modern Approach

This is a well-acclaimed book that covers the breadth of AI topics, including problem-solving, knowledge representation, machine learning, and natural language processing. It provides theoretical explanations along with practical examples, making it an excellent starting point for anyone looking to dive into the world of AI.

Human Compatible: Artificial Intelligence and the Problem of Control

The book discusses the inevitable conflict between humans and machines, providing important context before we advocate for AI. The author also talks about the possibility of superhuman AI and questions the concepts of human comprehension and machine learning.

The Alignment Problem: Machine Learning and Human Values

This book talks about a concept called “The Alignment Problem,” where the systems we aim to teach, don’t perform as expected, and various ethical and existential risks emerge.

Life 3.0: Being Human in the Age of Artificial Intelligence

The author of this book talks about questions like what the future of AI will look like and the possibility of superhuman intelligence becoming our master. He also talks about how we can ensure these systems perform without malfunctioning.

The Coming Wave: Technology, Power, and the Twenty-First Century’s Greatest Dilemma

This book warns about the risks that emerging technologies pose to global order. It covers topics like robotics and large language models and examines the forces that fuel these innovations.

Artificial Intelligence Engines: A Tutorial Introduction to the Mathematics of Deep Learning

“Artificial Intelligence Engines” dives into the mathematical foundations of deep learning. It provides a holistic understanding of deep learning, covering both the historical development of neural networks as well as modern techniques and architecture while focusing on the underlying mathematical concepts.

Neural Networks and Deep Learning

This book covers the fundamental concepts of neural networks and deep learning. It also covers the mathematical aspects of the same, covering topics like linear algebra, probability theory, and numerical computation.

Artificial Intelligence for Humans

This book explains how AI algorithms are used using actual numeric calculations. The book aims to target those without an extensive mathematical background and each unit is followed by examples in different programming languages.

AI Superpowers: China, Silicon Valley, and the New World Order

The author of this book explains the unexpected consequences of AI development. The book sheds light on the competition between the USA and China over AI innovations through actual events.

Hello World: Being Human in the Age of Algorithms

The author talks about the powers and limitations of the algorithms that are widely used today. The book prepares its readers for the moral uncertainties of a world run by code.

The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World

This book talks about the concept of the “Master algorithm,” which is a single, overarching learning algorithm capable of incorporating different approaches.

Applied Artificial Intelligence: A Handbook for Business Leaders

“Applied Artificial Intelligence” provides a guide for businesses on how to leverage AI to drive innovation and growth. It covers various applications of AI and also explores its ethical considerations. Additionally, it sheds light on building AI teams and talent acquisition. 

Superintelligence: Paths, Dangers, Strategies

This book asks questions like whether AI agents will save or destroy us and what happens when machines surpass humans in general intelligence. The author talks about the importance of global collaboration in developing safe AI.

We make a small profit from purchases made via referral/affiliate links attached to each book mentioned in the above list.

If you want to suggest any book that we missed from this list, then please email us at
The post Top Artificial Intelligence AI Books to Read in 2024 appeared first on MarkTechPost.

Meta AI Introduces Meta LLM Compiler: A State-of-the-Art LLM that Buil …

Software engineering has witnessed remarkable advancements with the development of Large Language Models (LLMs). These models, trained on extensive datasets, have demonstrated proficiency in various tasks, including code generation, translation, and optimization. LLMs are increasingly utilized for compiler optimization, a critical process that transforms source code to enhance performance and efficiency while maintaining functionality. However, traditional code optimization methods are often labor-intensive and require specialized knowledge of the target programming language and the underlying hardware architecture, posing significant challenges as software grows in complexity and scale.

The main issue in software development is achieving efficient code optimization across diverse hardware architectures. This complexity is compounded by the time-consuming nature of traditional optimization methods, which demand deep expertise. As software systems expand, achieving optimal performance becomes increasingly challenging, necessitating advanced tools and methodologies that can effectively handle the intricacies of modern codebases.

Approaches to code optimization have employed machine learning algorithms to guide the process. These methods involve representing code in various forms, such as graphs or numeric features, to facilitate understanding and optimization by the algorithms. However, these representations often need more critical details, leading to suboptimal performance. While LLMs like Code Llama and GPT-4 have been used for minor optimization tasks, they need specialized training for comprehensive compiler optimization, limiting their effectiveness in this domain.

Researchers at Meta AI have introduced the Meta Large Language Model Compiler (LLM Compiler), specifically designed for code optimization tasks. This innovative tool is built on Code Llama’s foundation and fine-tuned on an extensive dataset of 546 billion tokens of LLVM intermediate representations (IRs) and assembly code. The Meta AI team has aimed to address the specific needs of compiler optimization by leveraging this extensive training, making the model available under a bespoke commercial license to facilitate broad use by academic researchers and industry practitioners.

The LLM Compiler undergoes a robust pre-training process involving 546 billion tokens of compiler-centric data, followed by instruction fine-tuning 164 billion tokens for downstream tasks such as flag tuning and disassembly. The model is available in 7 billion and 13 billion parameters. This detailed training process enables the model to perform sophisticated code size optimization and accurately convert assembly code back into LLVM-IR. The training stages include understanding the input code, applying various optimization passes, and predicting the resulting optimized code and size. This multi-stage training pipeline ensures that the LLM Compiler is adept at handling complex optimization tasks efficiently.

Image Source

The performance of the LLM Compiler achieves 77% of the optimizing potential of traditional autotuning methods without extensive compilations. The model attains a 45% round-trip disassembly rate in the disassembly task, with a 14% exact match accuracy. These results highlight the model’s effectiveness in producing optimized code and accurately reversing assembly to its intermediate representation. Compared to other models like Code Llama and GPT-4 Turbo, the LLM Compiler significantly outperforms them in specific tasks, demonstrating its advanced capabilities in compiler optimization.

Leveraging extensive training on compiler-specific data provides a scalable and cost-effective solution for academic researchers and industry practitioners. This innovation addresses the challenges of code optimization, offering an effective tool for enhancing software performance across various hardware platforms. The model’s availability in two sizes, coupled with its robust performance metrics, underscores its potential to revolutionize the approach to compiler optimization tasks. 

Image Source

In conclusion, the Meta LLM Compiler is a groundbreaking tool in code and compiler optimization. By building on the foundational capabilities of Code Llama and enhancing them with specialized training, the LLM Compiler addresses critical challenges in software development. Its ability to efficiently optimize code and impressive performance metrics make it a valuable asset for researchers and practitioners. This model simplifies the optimization process and sets a new benchmark for future advancements in the field.

Check out the Paper and HF Repo. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter. 

Join our Telegram Channel and LinkedIn Group.

If you like our work, you will love our newsletter..

Don’t Forget to join our 45k+ ML SubReddit

Create, edit, and augment tabular data with the first compound AI system, Gretel Navigator, now generally available! [Advertisement]
The post Meta AI Introduces Meta LLM Compiler: A State-of-the-Art LLM that Builds upon Code Llama with Improved Performance for Code Optimization and Compiler Reasoning appeared first on MarkTechPost.

The future of productivity agents with NinjaTech AI and AWS Trainium

This is a guest post by Arash Sadrieh, Tahir Azim, and Tengfui Xue from NinjaTech AI.
NinjaTech AI’s mission is to make everyone more productive by taking care of time-consuming complex tasks with fast and affordable artificial intelligence (AI) agents. We recently launched, one of the world’s first multi-agent personal AI assistants, to drive towards our mission. is built from the ground up using specialized agents that are capable of completing tasks on your behalf, including scheduling meetings, conducting deep research from the web, generating code, and helping with writing. These agents can break down complicated, multi-step tasks into branched solutions, and are capable of evaluating the generated solutions dynamically while continually learning from past experiences. All of these tasks are accomplished in a fully autonomous and asynchronous manner, freeing you up to continue your day while Ninja works on these tasks in the background, and engaging when your input is required.

Because no single large language model (LLM) is perfect for every task, we knew that building a personal AI assistant would require multiple LLMs optimized specifically for a variety of tasks. In order to deliver the accuracy and capabilities to delight our users, we also knew that we would require these multiple models to work together in tandem. Finally, we needed scalable and cost-effective methods for training these various models—an undertaking that has historically been costly to pursue for most startups. In this post, we describe how we built our cutting-edge productivity agent NinjaLLM, the backbone of, using AWS Trainium chips.
Building a dataset
We recognized early that to deliver on the mission of tackling tasks on a user’s behalf, we needed multiple models that were optimized for specific tasks. Examples include our Deep Researcher, Deep Coder, and Advisor models. After testing available open source models, we felt that the out-of-the-box capabilities and responses were insufficient with prompt engineering alone to meet our needs. Specifically, in our testing with open source models, we wanted to make sure each model was optimized for a ReAct/chain-of-thought style of prompting. Additionally, we wanted to make sure the model would, when deployed as part of a Retrieval Augmented Generation (RAG) system, accurately cite each source, as well as any bias towards saying “I don’t know” as opposed to generating false answers. For that purpose, we chose to fine-tune the models for the various downstream tasks.
In constructing our training dataset, our goal was twofold: adapt each model for its suited downstream task and persona (Researcher, Advisor, Coder, and so on), and adapt the models to follow a specific output structure. To that end, we followed the Lima approach for fine-tuning. We used a training sample size of roughly 20 million tokens, focusing on the format and tone of the output while using a diverse but relatively small sample size. To construct our supervised fine-tuning dataset, we began by creating initial seed tasks for each model. With these seed tasks, we generated an initial synthetic dataset using Meta’s Llama 2 model. We were able to use the synthetic dataset to perform an initial round of fine-tuning. To initially evaluate the performance of this fine-tuned model, we crowd-sourced user feedback to iteratively create more samples. We also used a series of benchmarks—internal and public—to assess model performance and continued to iterate.
Fine-tuning on Trainium
We elected to start with the Llama models for a pre-trained base model for several reasons: most notably the great out-of-the-box performance, strong ecosystem support from various libraries, and the truly open source and permissive license. At the time, we began with Llama 2, testing across the various sizes (7B, 13B, and 70B). For training, we chose to use a cluster of trn1.32xlarge instances to take advantage of Trainium chips. We used a cluster of 32 instances in order to efficiently parallelize the training. We also used AWS ParallelCluster to manage cluster orchestration. By using a cluster of Trainium instances, each fine-tuning iteration took less than 3 hours, at a cost of less than $1,000. This quick iteration time and low cost, allowed us to quickly tune and test our models and improve our model accuracy. To achieve the accuracies discussed in the following sections, we only had to spend around $30k, savings hundreds of thousands, if not millions of dollars if we had to train on traditional training accelerators.
The following diagram illustrates our training architecture.

After we had established our fine-tuning pipelines built on top of Trainium, we were able to fine-tune and refine our models thanks to the Neuron Distributed training libraries. This was exceptionally useful and timely, because leading up to the launch of, Meta’s Llama 3 models were released. Llama 3 and Llama 2 share similar architecture, so we were able to rapidly upgrade to the newer model. This velocity in switching allowed us to take advantage of the inherent gains in model accuracy, and very quickly run through another round of fine-tuning with the Llama 3 weights and prepare for launch.
Model evaluation
For evaluating the model, there were two objectives: evaluate the model’s ability to answer user questions, and evaluate the system’s ability to answer questions with provided sources, because this is our personal AI assistant’s primary interface. We selected the HotPotQA and Natural Questions (NQ) Open datasets, both of which are a good fit because of their open benchmarking datasets with public leaderboards.
We calculated accuracy by matching the model’s answer to the expected answer, using the top 10 passages retrieved from a Wikipedia corpus. We performed content filtering and ranking using ColBERTv2, a BERT-based retrieval model. We achieved accuracies of 62.22% on the NQ Open dataset and 58.84% on HotPotQA by using our enhanced Llama 3 RAG model, demonstrating notable improvements over other baseline models. The following figure summarizes our results.

Future work
Looking ahead, we’re working on several developments to continue improving our model’s performance and user experience. First, we intend to use ORPO to fine-tune our models. ORPO combines traditional fine-tuning with preference alignment, while using a single preference alignment dataset for both. We believe this will allow us to better align models to achieve better results for users.
Additionally, we intend to build a custom ensemble model from the various models we have fine-tuned thus far. Inspired by Mixture of Expert (MoE) model architectures, we intend to introduce a routing layer to our various models. We believe this will radically simplify our model serving and scaling architecture, while maintaining the quality in various tasks that our users have come to expect from our personal AI assistant.
Building next-gen AI agents to make everyone more productive is NinjaTech AI’s pathway to achieving its mission. To democratize access to this transformative technology, it is critical to have access to high-powered compute, open source models, and an ecosystem of tools that make training each new agent affordable and fast. AWS’s purpose-built AI chips, access to the top open source models, and its training architecture make this possible.
To learn more about how we built NinjaTech AI’s multi-agent personal AI, you can read our whitepaper. You can also try these AI agents for free at

About the authors
 Arash Sadrieh is the Co-Founder and Chief Science Officer at Arash co-founded with a vision to make everyone more productive by taking care of time-consuming tasks with AI agents. This vision was shaped during his tenure as a Senior Applied Scientist at AWS, where he drove key research initiatives that significantly improved infrastructure efficiency over six years, earning him multiple patents for optimizing core infrastructure. His academic background includes a PhD in computer modeling and simulation, with collaborations with esteemed institutions such as Oxford University, Sydney University, and CSIRO. Prior to his industry tenure, Arash had a postdoctoral research tenure marked by publications in high-impact journals, including Nature Communications.
Tahir Azim is a Staff Software Engineer at NinjaTech. Tahir focuses on NinjaTech’s Inf2 and Trn1 based training and inference platforms, its unified gateway for accessing these platforms, and its RAG-based research skill. He previously worked at Amazon as a senior software engineer, building data-driven systems for optimal utilization of Amazon’s global Internet edge infrastructure, driving down cost, congestion and latency. Before moving to industry, Tahir earned an M.S. and Ph.D. in Computer Science from Stanford University, taught for three years as an assistant professor at NUST(Pakistan), and did a post-doc in fast data analytics systems at EPFL. Tahir has authored several publications presented at top-tier conferences such as VLDB, USENIX ATC, MobiCom and MobiHoc.
Tengfei Xue is an Applied Scientist at NinjaTech AI. His current research interests include natural language processing and multimodal learning, particularly using large language models and large multimodal models. Tengfei completed his PhD studies at the School of Computer Science, University of Sydney, where he focused on deep learning for healthcare using various modalities. He was also a visiting PhD candidate at the Laboratory of Mathematics in Imaging (LMI) at Harvard University, where he worked on 3D computer vision for complex geometric data.

Build generative AI applications on Amazon Bedrock — the secure, com …

Generative AI has revolutionized industries by creating content, from text and images to audio and code. Although it can unlock numerous possibilities, integrating generative AI into applications demands meticulous planning. Amazon Bedrock is a fully managed service that provides access to large language models (LLMs) and other foundation models (FMs) from leading AI companies through a single API. It provides a broad set of tools and capabilities to help build generative AI applications.
Starting today, I’ll be writing a blog series to highlight some of the key factors driving customers to choose Amazon Bedrock. One of the most important reason is that Bedrock enables customers to build a secure, compliant, and responsible foundation for generative AI applications. In this post, I explore how Amazon Bedrock helps address security and privacy concerns, enables secure model customization, accelerates auditability and incident response, and fosters trust through transparency and responsible AI. Plus, I’ll showcase real-world examples of companies building secure generative AI applications on Amazon Bedrock—demonstrating its practical applications across different industries.
Listening to what our customers are saying
During the past year, my colleague Jeff Barr, VP & Chief Evangelist at AWS, and I have had the opportunity to speak with numerous customers about generative AI. They mention compelling reasons for choosing Amazon Bedrock to build and scale their transformative generative AI applications. Jeff’s video highlights some of the key factors driving customers to choose Amazon Bedrock today.

As you build and operationalize generative AI, it’s important not to lose sight of critically important elements—security, compliance, and responsible AI—particularly for use cases involving sensitive data. The OWASP Top 10 For LLMs outlines the most common vulnerabilities, but addressing these may require additional efforts including stringent access controls, data encryption, preventing prompt injection attacks, and compliance with policies. You want to make sure your AI applications work reliably, as well as securely.
Making data security and privacy a priority
Like many organizations starting their generative AI journey, the first concern is to make sure the organization’s data remains secure and private when used for model tuning or Retrieval Augmented Generation (RAG). Amazon Bedrock provides a multi-layered approach to address this issue, helping you ensure that your data remains secure and private throughout the entire lifecycle of building generative AI applications:

Data isolation and encryption. Any customer content processed by Amazon Bedrock, such as customer inputs and model outputs, is not shared with any third-party model providers, and will not be used to train the underlying FMs. Furthermore, data is encrypted in-transit using TLS 1.2+ and at-rest through AWS Key Management Service (AWS KMS).
Secure connectivity options. Customers have flexibility with how they connect to Amazon Bedrock’s API endpoints. You can use public internet gateways, AWS PrivateLink (VPC endpoint) for private connectivity, and even backhaul traffic over AWS Direct Connect from your on-premises networks.
Model access controls. Amazon Bedrock provides robust access controls at multiple levels. Model access policies allow you to explicitly allow or deny enabling specific FMs for your account. AWS Identity and Access Management (IAM) policies let you further restrict which provisioned models your applications and roles can invoke, and which APIs on those models can be called.

Druva provides a data security software-as-a-service (SaaS) solution to enable cyber, data, and operational resilience for all businesses. They used Amazon Bedrock to rapidly experiment, evaluate, and implement different LLM components tailored to solve specific customer needs around data protection without worrying about the underlying infrastructure management.

“We built our new service Dru — an AI co-pilot that both IT and business teams can use to access critical information about their protection environments and perform actions in natural language — in Amazon Bedrock because it provides fully managed and secure access to an array of foundation models,”
– David Gildea, Vice President of Product, Generative AI at Druva.

Ensuring secure customization
A critical aspect of generative AI adoption for many organizations is the ability to securely customize the application to align with your specific use cases and requirements, including RAG or fine-tuning FMs. Amazon Bedrock offers a secure approach to model customization, so sensitive data remains protected throughout the entire process:

Model customization data security. When fine-tuning a model, Amazon Bedrock uses the encrypted training data from an Amazon Simple Storage Service (Amazon S3) bucket through a private VPC connection. Amazon Bedrock doesn’t use model customization data for any other purpose. Your training data isn’t used to train the base Amazon Titan models or distributed to third parties. Nor is other usage data, such as usage timestamps, logged account IDs, and other information logged by the service, used to train the models. In fact, none of the training or validation data you provide for fine tuning or continued pre-training is stored by Amazon Bedrock. When the model customization work is complete—it remains isolated and encrypted with your KMS keys.
Secure deployment of fine-tuned models. The pre-trained or fine-tuned models are deployed in isolated environments specifically for your account. You can further encrypt these models with your own KMS keys, preventing access without appropriate IAM permissions.
Centralized multi-account model access.  AWS Organizations provides you with the ability to centrally manage your environment across multiple accounts. You can create and organize accounts in an organization, consolidate costs, and apply policies for custom environments. For organizations with multiple AWS accounts or a distributed application architecture, Amazon Bedrock supports centralized governance and access to FMs – you can secure your environment, create and share resources, and centrally manage permissions. Using standard AWS cross-account IAM roles, administrators can grant secure access to models across different accounts, enabling controlled and auditable usage while maintaining a centralized point of control.

With seamless access to LLMs in Amazon Bedrock—and with data encrypted in-transit and at-rest—BMW Group securely delivers high-quality connected mobility solutions to motorists around the world.

“Using Amazon Bedrock, we’ve been able to scale our cloud governance, reduce costs and time to market, and provide a better service for our customers. All of this is helping us deliver the secure, first-class digital experiences that people across the world expect from BMW.”
– Dr. Jens Kohl, Head of Offboard Architecture, BMW Group.

Enabling auditability and visibility
In addition to the security controls around data isolation, encryption, and access, Amazon Bedrock provides capabilities to enable auditability and accelerate incident response when needed:

Compliance certifications. For customers with stringent regulatory requirements, you can use Amazon Bedrock in compliance with the General Data Protection Regulation (GDPR), Health Insurance Portability and Accountability Act (HIPAA), and more. In addition, AWS has successfully extended the registration status of Amazon Bedrock in Cloud Infrastructure Service Providers in Europe Data Protection Code of Conduct (CISPE CODE) Public Register. This declaration provides independent verification and an added level of assurance that Amazon Bedrock can be used in compliance with the GDPR. For Federal agencies and public sector organizations, Amazon Bedrock recently announced FedRAMP Moderate, approved for use in our US East and West AWS Regions. Amazon Bedrock is also under JAB review for FedRAMP High authorization in AWS GovCloud (US).
Monitoring and logging. Native integrations with Amazon CloudWatch and AWS CloudTrail provide comprehensive monitoring, logging, and visibility into API activity, model usage metrics, token consumption, and other performance data. These capabilities enable continuous monitoring for improvement, optimization, and auditing as needed – something we know is critical from working with customers in the cloud for the last 18 years. Amazon Bedrock allows you to enable detailed logging of all model inputs and outputs, including IAM invocation role, and metadata associated with all calls that are performed in your account. These logs facilitate monitoring model responses to adhere to your organization’s AI policies and reputation guidelines. When you enable log model invocation logging, you can use AWS KMS to encrypt your log data, and use IAM policies to protect who can access your log data. None of this data is stored within Amazon Bedrock, and is only available within a customer’s account.

Implementing responsible AI practices
AWS is committed to developing generative AI responsibly, taking a people-centric approach that prioritizes education, science, and our customers, to integrate responsible AI across the full AI lifecycle. With AWS’s comprehensive approach to responsible AI development and governance, Amazon Bedrock empowers you to build trustworthy generative AI systems in line with your responsible AI principles.
We give our customers the tools, guidance, and resources they need to get started with purpose-built services and features, including several in Amazon Bedrock:

Safeguard generative AI applications– Guardrails for Amazon Bedrock is the only responsible AI capability offered by a major cloud provider that enables customers to customize and apply safety, privacy, and truthfulness checks for your generative AI applications. Guardrails helps customers block as much as 85% more harmful content than protection natively provided by some FMs on Amazon Bedrock today. It works with all LLMs in Amazon Bedrock, fine-tuned models, and also integrates with Agents and Knowledge Bases for Amazon Bedrock. Customers can define content filters with configurable thresholds to help filter harmful content across hate speech, insults, sexual language, violence, misconduct (including criminal activity), and prompt attacks (prompt injection and jailbreak). Using a short natural language description, Guardrails for Amazon Bedrock allows you to detect and block user inputs and FM responses that fall under restricted topics or sensitive content such as personally identifiable information (PII). You can combine multiple policy types to configure these safeguards for different scenarios and apply them across FMs on Amazon Bedrock. This ensures that your generative AI applications adhere to your organization’s responsible AI policies as well as provide a consistent and safe user experience.
Provenance tracking. Now available in preview, Model Evaluation on Amazon Bedrock helps customers evaluate, compare, and select the best FMs for their specific use case based on custom metrics, such as accuracy and safety, using either automatic or human evaluations. Customers can evaluate AI models in two ways—automatic or with human input. For automatic evaluations, they pick criteria such as accuracy or toxicity, and use their own data or public datasets. For evaluations needing human judgment, customers can easily set up workflows for human review with a few clicks. After setting up, Amazon Bedrock runs the evaluations and provides a report showing how well the model performed on important safety and accuracy measures. This report helps customers choose the best model for their needs, even more important when helping customers are evaluating migrating to a new model in Amazon Bedrock against an existing model for an application.
Watermark detection. All Amazon Titan FMs are built with responsible AI in mind. Amazon Titan Image Generator creates images embedded with imperceptible digital watermarks. The watermark detection for Amazon Titan Image Generator allows you to identify images generated by Amazon Titan Image Generator, a foundation model that allows users to create realistic, studio-quality images in large volumes and at low cost, using natural language prompts. With this feature, you can increase transparency around AI-generated content by mitigating harmful content generation and reducing the spread of misinformation. It also provides a confidence score, allowing you to assess the reliability of the detection, even if the original image has been modified. Simply upload an image in the Amazon Bedrock console, and the API will detect watermarks embedded in images created by Titan Image Generator, including those generated by the base model and any customized versions.
AI Service Cards provide transparency and document the intended use cases and fairness considerations for our AWS AI services. Our latest services cards include Amazon Titan Text Premier and Amazon Titan Text Lite and Titan Text Express with more coming soon.

Aha! is a software company that helps more than 1 million people bring their product strategy to life.

“Our customers depend on us every day to set goals, collect customer feedback, and create visual roadmaps. That is why we use Amazon Bedrock to power many of our generative AI capabilities. Amazon Bedrock provides responsible AI features, which enable us to have full control over our information through its data protection and privacy policies, and block harmful content through Guardrails for Bedrock.”
– Dr. Chris Waters, co-founder and Chief Technology Officer at Aha!

Building trust through transparency
By addressing security, compliance, and responsible AI holistically, Amazon Bedrock helps customers to unlock generative AI’s transformative potential. As generative AI capabilities continue to evolve so rapidly, building trust through transparency is crucial. Amazon Bedrock works continuously to help develop safe and secure applications and practices, helping build generative AI applications responsibly.
The bottom line? Amazon Bedrock makes it effortless for you to unlock sustained growth with generative AI and experience the power of LLMs. Get started today – Build AI applications or customize models securely using your data to start your generative AI journey with confidence.
For more information about generative AI and Amazon Bedrock, explore the following resources:

Announcing new tools and capabilities to enable responsible AI innovation
A secure approach to generative AI with AWS
Securing Generative AI: An Introduction to The Generative AI Scoping Matrix
Architect defense-in-depth security for generative AI applications using the OWASP Top 10 for LLMs
Implementing Knowledge Bases for Amazon Bedrock in support of GDPR (right to be forgotten) requests
CISPE Data Protection Code of Conduct Public Register now has 113 compliant AWS services
Large language model inference over confidential data using AWS Nitro Enclaves
Use AWS PrivateLink to set up private access to Amazon Bedrock
Improve visibility into Amazon Bedrock usage and performance with Amazon CloudWatch
Build safe and responsible generative AI applications with guardrails
Safeguard a generative AI travel agent with prompt engineering and Guardrails for Amazon Bedrock

About the author
Vasi Philomin is VP of Generative AI at AWS. He leads generative AI efforts, including Amazon Bedrock and Amazon Titan.

Build a conversational chatbot using different LLMs within single inte …

With the advent of generative artificial intelligence (AI), foundation models (FMs) can generate content such as answering questions, summarizing text, and providing highlights from the sourced document. However, for model selection, there is a wide choice from model providers, like Amazon, Anthropic, AI21 Labs, Cohere, and Meta, coupled with discrete real-world data formats in PDF, Word, text, CSV, image, audio, or video.
Amazon Bedrock is a fully managed service that makes it straightforward to build and scale generative AI applications. Amazon Bedrock offers a choice of high-performing FMs from leading AI companies, including AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon, through a single API. It enables you to privately customize FMs with your data using techniques such as fine-tuning, prompt engineering, and Retrieval Augmented Generation (RAG), and build agents that run tasks using your enterprise systems and data sources while complying with security and privacy requirements.
In this post, we show you a solution for building a single interface conversational chatbot that allows end-users to choose between different large language models (LLMs) and inference parameters for varied input data formats. The solution uses Amazon Bedrock to create choice and flexibility to improve the user experience and compare the model outputs from different options.
The entire code base is available in GitHub, along with an AWS CloudFormation template.
What is RAG
Retrieval Augmented Generation (RAG) can enhance the generation process by using the benefits of retrieval, enabling a natural language generation model to produce more informed and contextually appropriate responses. By incorporating relevant information from retrieval into the generation process, RAG aims to improve the accuracy, coherence, and informativeness of the generated content.
Implementing an effective RAG system requires several key components working in harmony:

Foundation models – The foundation of a RAG architecture is a pre-trained language model that handles text generation. Amazon Bedrock encompasses models from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, and Amazon that possess strong language comprehension and synthesis abilities to engage in conversational dialogue.
Vector store – At the heart of the retrieval functionality is a vector store database persisting document embeddings for similarity search. This allows rapid identification of relevant contextual information. AWS offers many services for your vector database requirements:

Amazon OpenSearch Service
Amazon Aurora PostgreSQL-Compatible Edition and Amazon Relational Database Service (Amazon RDS) for PostgreSQL
Amazon DocumentDB (with MongoDB compatibility)
Amazon Neptune ML
Vector search for Amazon MemoryDB for Redis

Retriever – The retriever module uses the vector store to efficiently find pertinent documents and passages to augment prompts.
Embedder – To populate the vector store, an embedding model encodes source documents into vector representations consumable by the retriever. Models like Amazon Titan Embeddings G1 – Text v1.2 are ideal for this text-to-vector abstraction.
Document ingestion – Robust pipelines ingest, preprocess, and tokenize source documents, chunking them into manageable passages for embedding and efficient lookup. For this solution, we use the LangChain framework for document preprocessing. By orchestrating these core components using LangChain, RAG systems empower language models to access vast knowledge for grounded generation.

We have fully managed support for our end-to-end RAG workflow using Knowledge Bases for Amazon Bedrock. With Knowledge Bases for Amazon Bedrock, you can give FMs and agents contextual information from your company’s private data sources for RAG to deliver more relevant, accurate, and customized responses.
To equip FMs with up-to-date and proprietary information, organizations use RAG to fetch data from company data sources and enrich the prompt to provide more relevant and accurate responses. Knowledge Bases for Amazon Bedrock is a fully managed capability that helps you implement the entire RAG workflow, from ingestion to retrieval and prompt augmentation, without having to build custom integrations to data sources and manage data flows. Session context management is built in, so your app can readily support multi-turn conversations.
Solution overview
This chatbot is built using RAG, enabling it to provide versatile conversational abilities. The following figure illustrates a sample UI of the Q&A interface using Streamlit and the workflow.

This post provides a single UI with multiple choices for the following capabilities:

Leading FMs available through Amazon Bedrock
Inference parameters for each of these models
Source data input formats for RAG:

Text (PDF, CSV, Word)
Website link
YouTube video
Scanned image

RAG operation using the LLM, inference parameter, and sources:

Summary: summarize, get highlights, extract text

We have used one of LangChain’s many document loaders, YouTubeLoader. The from_you_tube_url function helps extract transcripts and metadata from the YouTube video.
The documents contain two attributes:

page_content with the transcripts
metadata with basic information about the video

Text is extracted from the transcript and using Langchain TextLoader, the document is split and chunked, and embeddings are created, which are then stored in the vector store.
The following diagram illustrates the solution architecture.

To implement this solution, you should have the following prerequisites:

An AWS account with the required permissions to launch the stack using AWS CloudFormation.
Amazon Elastic Compute Cloud (Amazon EC2) hosting the application should have internet access so as to download all the necessary OS patches and application related (python) libraries
A basic understanding of Amazon Bedrock and FMs.
This solution uses the Amazon Titan Text Embedding model. Make sure this model is enabled for use in Amazon Bedrock. On the Amazon Bedrock console, choose Model access in the navigation pane.

If Amazon Titan Text Embeddings is enabled, the access status will state Access granted.
If the model is not available, enable access to the model by choosing Manage model access, selecting Titan Multimodal Embeddings G1, and choosing Request model access. The model is enabled for use immediately.

Deploy the solution
The CloudFormation template deploys an Amazon Elastic Compute Cloud (Amazon EC2) instance to host the Streamlit application, along with other associated resources like an AWS Identity and Access Management (IAM) role and Amazon Simple Storage Service (Amazon S3) bucket. For more information about Amazon Bedrock and IAM, refer to How Amazon Bedrock Works with IAM.
In this post, we deploy the Streamlit application over an EC2 instance inside a VPC, but you can deploy it as a containerized application using a serverless solution with AWS Fargate. We discuss this in more detail in Part 2.
Complete the following steps to deploy the solution resources using AWS CloudFormation:

Download the CloudFormation template StreamlitAppServer_Cfn.yml from the GitHub repo.
On the AWS CloudFormation, create a new stack.
For Prepare template, select Template is ready.
In the Specify template section, provide the following information:

For Template source, select Upload a template file.
Choose file and upload the template you downloaded.

Choose Next.

For Stack name, enter a name (for this post, StreamlitAppServer).
In the Parameters section, provide the following information:

For Specify the VPC ID where you want your app server deployed, enter the VPC ID where you want to deploy this application server.
For VPCCidr, enter the CIDR of the VPC you’re using.
For SubnetID, enter the subnet ID from the same VPC.
For MYIPCidr, enter the IP address of your computer or workstation so you can open the Streamlit application in your local browser.

You can run the command curl on your local terminal to get your IP address.

Leave the rest of the parameters as defaulted.
Choose Next.
In the Capabilities section, select the acknowledgement check box.
Choose Submit.

Wait until you see the stack status show as CREATE_COMPLETE.

Choose the stack’s Resources tab to see the resources you launched as part of the stack deployment.

Choose the link for S3Bucket to be redirected to the Amazon S3 console.

Note the S3 bucket name to update the deployment script later.
Choose Create folder to create a new folder.
For Folder name, enter a name (for this post, gen-ai-qa).

Make sure to follow AWS security best practices for securing data in Amazon S3. For more details, see Top 10 security best practices for securing data in Amazon S3.

Return to the stack Resources tab and choose the link to StreamlitAppServer to be redirected to the Amazon EC2 console.

Select StreamlitApp_Sever and choose Connect.

This will open a new page with various ways to connect to the EC2 instance launched.

For this solution, select Connect using EC2 Instance Connect, then choose Connect.

This will open an Amazon EC2 session in your browser.

Run the following command to monitor the progress of all the Python-related libraries being installed as part of the user data:

tail -f /tmp/userData.log

When you see the message Finished running user data…, you can exit the session by pressing Ctrl + C.

This takes about 15 minutes to complete.

Run the following commands to start the application:

cd $HOME/bedrock-qnachatbot
bucket_name=$(aws cloudformation describe-stacks –stack-name StreamlitAppServer –query “Stacks[0].Outputs[?starts_with(OutputKey, ‘BucketName’)].OutputValue” –output text)
TOKEN=$(curl -s -X PUT “” -H “X-aws-ec2-metadata-token-ttl-seconds: 21600”)
aws_region_name=$(curl -s -H “X-aws-ec2-metadata-token: $TOKEN”)
sed -i “s/<S3_Bucket_Name>/${bucket_name}/g” $HOME/bedrock-qnachatbot/src/
sed -i “s/<AWS_Region>/${aws_region_name}/g” $HOME/bedrock-qnachatbot/src/
export AWS_DEFAULT_REGION=${aws_region_name}
streamlit run src/1_🏠

Make a note of the External URL value.
If by any chance you exit of the session (or application is stopped), you can restart the application by running the same command as highlighted in Step # 18

Use the chatbot
Use the external URL you copied in the previous step to access the application.

You can upload your file to start using the chatbot for Q&A.

Clean up
To avoid incurring future charges, delete the resources that you created:

Empty the contents of the S3 bucket you created as a part of this post.
Delete the CloudFormation stack you created as part of this post.

In this post, we showed you how to create a Q&A chatbot that can answer questions across an enterprise’s corpus of documents with choices of FM available within Amazon Bedrock—within a single interface.
In Part 2, we show you how to use Knowledge Bases for Amazon Bedrock with enterprise-grade vector databases like OpenSearch Service, Amazon Aurora PostgreSQL, MongoDB Atlas, Weaviate, and Pinecone with your Q&A chatbot.

About the Authors
Anand Mandilwar is an Enterprise Solutions Architect at AWS. He works with enterprise customers helping customers innovate and transform their business in AWS. He is passionate about automation around Cloud operation , Infrastructure provisioning and Cloud Optimization. He also likes python programming. In his spare time, he enjoys honing his photography skill especially in Portrait and landscape area.
NagaBharathi Challa is a solutions architect in the US federal civilian team at Amazon Web Services (AWS). She works closely with customers to effectively use AWS services for their mission use cases, providing architectural best practices and guidance on a wide range of services. Outside of work, she enjoys spending time with family & spreading the power of meditation.

EAGLE-2: An Efficient and Lossless Speculative Sampling Method Achievi …

Large language models (LLMs) have significantly advanced the field of natural language processing (NLP). These models, renowned for their ability to generate and understand human language, are applied in various domains such as chatbots, translation services, and content creation. Continuous development in this field aims to enhance the efficiency and effectiveness of these models, making them more responsive and accurate for real-time applications.

A major challenge LLMs face is the substantial computational cost and time required for inference. As these models increase, generating each token during autoregressive tasks becomes slower, impeding real-time applications. Addressing this issue is crucial to improving applications’ performance and user experience relying on LLMs, particularly when quick responses are essential.

Current methods to alleviate this issue include speculative sampling techniques, which generate and verify tokens in parallel to reduce latency. Traditional speculative sampling methods often rely on static draft trees that do not account for context, leading to inefficiencies and suboptimal acceptance rates of draft tokens. These methods aim to reduce inference time but still face limitations in performance.

Researchers from Peking University, Microsoft Research, the University of Waterloo and Vector Institute introduced EAGLE-2, a method leveraging a context-aware dynamic draft tree to enhance speculative sampling. EAGLE-2 builds upon the previous EAGLE method, offering significant improvements in speed while maintaining the quality of generated text. This method dynamically adjusts the draft tree based on context, using confidence scores from the draft model to approximate acceptance rates.

EAGLE-2 dynamically adjusts the draft tree based on context, enhancing speculative sampling. Its methodology includes two main phases: expansion and reranking. The process begins with the expansion phase, where the draft model inputs the most promising nodes from the latest layer of the draft tree to form the next layer. Confidence scores from the draft model approximate acceptance rates, allowing efficient prediction and verification of tokens. During the reranking phase, tokens with higher acceptance probabilities are selected for the original LLM’s input during verification. This two-phase approach ensures the draft tree adapts to the context, significantly improving token acceptance rates and overall efficiency. This method eliminates the need for multiple forward passes, thus accelerating the inference process without compromising the quality of the generated text.

The proposed method showed remarkable results. For instance, in multi-turn conversations, EAGLE-2 achieved a speedup of approximately 4.26x, while in code generation tasks, it reached up to 5x. The average number of tokens generated per drafting-verification cycle was significantly higher than other methods, roughly twice that of standard speculative sampling. This performance boost makes EAGLE-2 a valuable tool for real-time NLP applications.

Performance evaluations also show that EAGLE-2 achieves speedup ratios between 3.05x and 4.26x across various tasks and LLMs, outperforming the previous EAGLE method by 20%-40%.  It maintains the distribution of the generated text, ensuring no loss in the output quality despite the increased speed. EAGLE-2 demonstrated the best performance in extensive tests across six tasks and three series of LLMs, confirming its robustness and efficiency.

In conclusion, EAGLE-2 effectively addresses computational inefficiencies in LLM inference by introducing a context-aware dynamic draft tree. This method offers a substantial performance boost without compromising the quality of the generated text, making it a significant advancement in NLP. Future research and applications should consider integrating dynamic context adjustments to enhance the performance of LLMs further.

Check out the Paper and GitHub. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter. 

Join our Telegram Channel and LinkedIn Group.

If you like our work, you will love our newsletter..

Don’t Forget to join our 45k+ ML SubReddit

Create, edit, and augment tabular data with the first compound AI system, Gretel Navigator, now generally available! [Advertisement]
The post EAGLE-2: An Efficient and Lossless Speculative Sampling Method Achieving Speedup Ratios 3.05x – 4.26x which is 20% – 40% Faster than EAGLE-1 appeared first on MarkTechPost.

A New Machine Learning Research from UCLA Uncovers Unexpected Irregula …

Recent language models like GPT-3+ have shown remarkable performance improvements by simply predicting the next word in a sequence, using larger training datasets and increased model capacity. A key feature of these transformer-based models is in-context learning, which allows the model to learn tasks by conditioning a series of examples without explicit training. However, the working mechanism of in-context learning is still partially understood. Researchers have explored the factors affecting in-context learning, where it was found that accurate examples are not always necessary to be effective, whereas, the structure of the prompts, the model’s size, and the order of examples significantly impact the results.

This paper explores three existing methods of in-context learning in transformers and large language models (LLMs) by conducting a series of binary classification tasks (BCTs) under varying conditions. The first method focuses on the theoretical understanding of in-context learning, aiming to link it with gradient descent (GD). The second method is the practical understanding, which looks at how in-context learning works in LLMs, considering factors like the label space, input text distribution, and overall sequence format. The final method is learning to learn in-context. To enable in-context learning, MetaICL is utilized, which is a meta-training framework for finetuning pre-trained LLMs on a large and diverse collection of tasks.

Researchers from the Department of Computer Science at the University of California, Los Angeles (UCLA) have introduced a new perspective by viewing in-context learning in LLMs as a unique machine learning algorithm. This conceptual framework allows traditional machine learning tools to analyze decision boundaries in binary classification tasks. Many invaluable insights are achieved for the performance and behavior of in-context learning by visualizing these decision boundaries in linear and non-linear settings. This approach explores the generalization capabilities of LLMs, providing a distinct perspective on the strength of their in-context learning performance.

Experiments carried out by researchers mostly focused on solving these questions:

How do existing pre-trained LLMs perform on BCTs? 

How do different factors influence the decision boundaries of these models? 

How can we improve the smoothness of decision boundaries?

The decision boundary of LLMs was explored for classification tasks by prompting them with n in-context examples of BCTs, with an equal number of examples for each class. Using scikit-learn, three types of datasets were created to represent different shapes of decision boundaries such as linear, circular, and moon-shaped. Moreover, various LLMs were explored, ranging from 1.3B to 13B parameters, including open-source models like Llama2-7B, Llama3-8B, Llama2-13B, Mistral-7B-v0.1, and sheared-Llama-1.3B, to understand their decision boundaries.

Results of the experiments demonstrated that finetuning LLMs on in-context examples does not result in smoother decision boundaries. For instance, when the Llama3-8B on 128 in-context learning examples was fine-tuned, the resulting decision boundaries remained non-smooth. So, to improve the decision boundary smoothness of LLMs on a Dataset of Classification Tasks, a pre-trained Llama model was fine-tuned on a set of 1000 binary classification tasks generated from scikit-learn, which featured decision boundaries that were linear, circular, or moon-shaped, with equal probabilities. 

In conclusion, the research team has proposed a novel method to understand in-context learning in LLMs by examining their decision boundaries in in-context learning in BCTs. Despite obtaining high test accuracy, it was found that the decision boundaries of LLMs are often non-smooth. So, factors that affect this decision boundary were identified through experiments. Further, fine-tuning and adaptive sampling methods were also explored, which proved effective in improving the smoothness of the boundaries. In the future, these findings will provide new insights into the mechanics of in-context learning and suggest pathways for research and optimization. 

Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter. 

Join our Telegram Channel and LinkedIn Group.

If you like our work, you will love our newsletter..

Don’t Forget to join our 45k+ ML SubReddit

Create, edit, and augment tabular data with the first compound AI system, Gretel Navigator, now generally available! [Advertisement]
The post A New Machine Learning Research from UCLA Uncovers Unexpected Irregularities and Non-Smoothness in LLMs’ In-Context Decision Boundaries appeared first on MarkTechPost.

EvolutionaryScale Introduces ESM3: A Frontier Multimodal Generative La …

Over more than three billion years, natural evolution has intricately shaped the proteins we see today. Through countless random mutations and selective pressures, nature has crafted these proteins, reflecting the deep biological principles that govern life. Modern gene sequencing unravels the immense diversity of these protein sequences and structures, revealing patterns shaped by evolutionary forces. Researchers are increasingly using large language models to decode this ‘protein language,’ discovering that these models, even without specific training on biological functions, can naturally learn to represent protein structures and functions, with their capabilities expanding significantly as they scale up in complexity and data.

Researchers from Evolutionary Scale PBC, Arc Institute, and the University of California have developed ESM3, an advanced generative language model for proteins. ESM3 can simulate evolutionary processes to create functional proteins vastly different from known ones. It integrates sequence, structure, and function to generate proteins following complex prompts. Notably, ESM3 generated a new fluorescent protein, esmGFP, which is 58% different from any known fluorescent proteins—a degree of difference comparable to 500 million years of natural evolution. This breakthrough demonstrates ESM3’s potential in protein engineering, offering creative solutions to biological challenges.

ESM3 is a sophisticated generative language model designed to understand and predict proteins’ sequence, structure, and function using tokenized data. It employs a masked language modeling approach to predict masked portions of protein data across various masking rates. ESM3 integrates sequence, structure, and function into a unified latent space and processes these modalities through transformer blocks with geometric attention. Trained on vast datasets, including 2.78 billion proteins and 236 million structures, ESM3 scales up to 98 billion parameters. Its tokenization method efficiently captures atomic details, enabling high accuracy in generating and reconstructing protein structures.

ESM3, a language model with up to 98 billion parameters, effectively predicts and generates protein sequences, structures, and functions. It processes these aspects through transformer blocks with geometric attention, training on a vast natural and synthetic protein dataset. ESM3’s generative capabilities allow it to create diverse, high-quality proteins that differ significantly from known natural proteins. It excels at following prompts from various inputs, like sequence or structural details, and can innovate within these constraints, producing novel protein designs. This versatility facilitates advanced, programmable protein design and exploration beyond natural evolutionary patterns.

Scaling and fine-tuning ESM3 models significantly enhance their ability to generate proteins that align with complex prompts, such as specific atomic coordination and structural motifs. Although the base models, trained on extensive protein datasets, perform well, fine-tuning with preference data—pairing high and low-quality outputs—reveals latent capabilities. This alignment, especially in larger models, doubles the success rate in generating accurate protein structures and increases the diversity of successful solutions. The process demonstrates that larger models have a greater inherent ability to adapt to challenging tasks, showing improved performance when aligned with specific objectives.

ESM3, a language model trained on protein sequences, generated a green fluorescent protein (GFP) with minimal similarity to existing ones. By prompting the model with critical residues and structures necessary for GFP function, ESM3 created thousands of potential designs. From these, a unique fluorescent protein, esmGFP, was identified, which differed significantly from known proteins and exhibited natural GFP-like fluorescence. This process mirrors evolutionary paths, suggesting ESM3 can explore protein spaces that evolution hasn’t, effectively simulating millions of years of evolutionary potential in generating new functional proteins.

Check out the Paper and Details. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter. 

Join our Telegram Channel and LinkedIn Group.

If you like our work, you will love our newsletter..

Don’t Forget to join our 45k+ ML SubReddit

Create, edit, and augment tabular data with the first compound AI system, Gretel Navigator, now generally available! [Advertisement]
The post EvolutionaryScale Introduces ESM3: A Frontier Multimodal Generative Language Model that Reasons Over the Sequence, Structure, and Function of Proteins appeared first on MarkTechPost.

Automate derivative confirms processing using AWS AI services for the …

Capital markets operation teams face numerous challenges throughout the post-trade lifecycle, including delays in trade settlements, booking errors, and inaccurate regulatory reporting. For derivative trades, it’s even more challenging. The timely settlement of derivative trades is an onerous task. This is because trades involve different counterparties and there is a high degree of variation among documents containing commercial terms (such as trade date, value date, and counterparties). We commonly see the application of screen scrapping solutions with OCR in capital market organizations. These applications come with the drawback of being inflexible and high-maintenance.
Artificial intelligence and machine learning (AI/ML) technologies can assist capital market organizations overcome these challenges. Intelligent document processing (IDP) applies AI/ML techniques to automate data extraction from documents. Using IDP can reduce or eliminate the requirement for time-consuming human reviews. IDP has the power to transform the way capital market back-office operations work. It has the potential to boost employee efficiency, enhance cash flow by speeding up trade settlements, and minimize operational and regulatory risks.
In this post, we show how you can automate and intelligently process derivative confirms at scale using AWS AI services. The solution combines Amazon Textract, a fully managed ML service to effortlessly extract text, handwriting, and data from scanned documents, and AWS Serverless technologies, a suite of fully managed event-driven services for running code, managing data, and integrating applications, all without managing servers.
Solution overview
The lifecycle of a derivative trade involves multiple phases, from trade research to execution, to clearing and settlement. The solution showcased in this post focuses on the trade clearing and settlement phase of the derivative trade lifecycle. During this phase, counterparties to the trade and their agents determine and verify the exact commercial terms of the transaction and prepare for settlement.
The following figure shows a sample derivative confirms the document.

We built the solution using the event-driven principles as depicted in the following diagram. The derivative confirmation documents received from customers are stored in Amazon Simple Storage Service (Amazon S3). An event notification on S3 object upload completion places a message in an Amazon Simple Queue Service (Amazon SQS) queue to invoke an AWS Lambda function. The function invokes the Amazon Textract API and performs a fuzzy match using the document schema mappings stored in Amazon DynamoDB. A web-based human-in-the-loop UI is built for reviewing the document processing pipeline and updating schemas to train services for new formats. The web UI uses Amazon Cognito for authentication and access control.

The process flow includes the following steps:

The user or business application uploads an image or PDF to the designated S3 bucket.
An event notification on S3 object upload completion places a message in an SQS queue.
An event on message receipt invokes a Lambda function that in turn invokes the Amazon Textract StartDocumentAnalysis API for information extraction.

This call starts an asynchronous analysis of the document for detecting items within the document such as key-value pairs, tables, and forms.
The call also returns the ID of the asynchronous job, and saves the job ID and Amazon S3 document key to a DynamoDB table.

Upon job completion, Amazon Textract sends a message to an Amazon Simple Notification Service (Amazon SNS) topic and places the resultant JSON in the designated S3 bucket for classification analysis.
A Lambda function receives the Amazon SQS payload and performs fuzzy match using Sorenson-Dice analysis between the Amazon Textract JSON results and DynamoDB document configuration mappings. The Sorenson-Dice analysis step compares the two texts and computes a number between 0–1, where the former indicates no match at all and the latter an exact match.
Upon analysis completion, a Lambda function writes a merged and cleansed JSON result to the original S3 bucket and inserts the analysis results back into the DynamoDB table.
Amazon API Gateway endpoints facilitate the interaction with the web-based UI.
The human-in-the-loop UI application provides a human-in-the-loop function to analyze the document processing pipeline and intervene as needed to update the document configuration mappings.

A human-in the-loop process was applied to visually compare the reconciled results with their locations in the input documents. End-users can verify the accuracy of the results and either accept or reject the findings. When new counterparties and formats are introduced, ML learning helps the users create new schema mappings in the human-in-the-loop UI for further processing.
What is human-in-the-loop?
A human-in-the-loop process combines supervised ML with human involvement in training and testing an algorithm. This practice of uniting human and machine intelligence creates an iterative feedback loop that allows the algorithm to produce better results.
You can apply human-in-the-loop to all types of deep learning AI projects, including natural language processing (NLP), computer vision, and transcription. Additionally, you can use human-in-the-loop in conjunction with AI content moderation systems to quickly and effectively analyze user-generated content. We refer this to as human-in-the-loop decision-making, where content is flagged by the AI and human moderators review what has been flagged.
The harmonious relationship between people and AI has several benefits, including:

Accuracy – In the context of document processing, there are limitations to how much of the analysis can be automated. AI can miss content that should be flagged (a false positive), and they can also incorrectly flag content that may be harmless (a false negative). Humans are essential in the content moderation process because they can interpret things such as context and multilingual text.
Increased efficiency – Machine intelligence can save significant time and cost by sifting through and trimming down large amounts of data. The task can then be passed on to humans to complete a final sort. Although you can’t automate the entirety of the process, you can automate a significant portion, saving time.

Looking forward: The art of the possible
Amazon Textract is an AWS service that uses ML to automatically extract text, handwriting, and data from any document.
Amazon Textract can extract information from a large variety of documents, including scanned paper records, forms, IDs, invoices, reports, certificates, legal documents, letters, bank statements, tables, handwritten notes, and more. Supported formats include common file types like PNG, JPEG, PDF, and TIFF. For formats like Word or Excel, you can convert them into images before sending them to Amazon Textract. The content is extracted within seconds and then indexed for search through a simple-to-use API.
The Queries feature within the Amazon Textract Analyze Document API provides you the flexibility to specify the data you need to extract from documents. Queries extract information from a variety of documents, like paystubs, vaccination cards, mortgage notes, and insurance cards. You don’t need to know the data structure in the document (table, form, nested data) or worry about variations across document versions and formats. The flexibility that Queries provides reduces the need to implement postprocessing and reliance on manual review of extracted data.
The automation of derivatives confirmation boosts the capacity of the operations team by saving processing time. In this post, we showcased common challenges in derivatives confirms processing and how can you use AWS intelligent document processing services to overcome them. The big part of capital markets’ back-office operations involves documents processing. The approach showed in this post sets a pattern for many back-office documents processing use cases, benefiting the capital markets industry in reducing costs and enhancing staff productivity.
We recommend a thorough review of Security in Amazon Textract and strict adherence to the guidelines provided. To learn more about the pricing of the solution, review the pricing details of Amazon Textract, Lambda, and Amazon S3.

“Using Amazon Textract and Serverless services, we have been able to build an end-to-end digital workflow for derivatives processing. We are expecting straight-through processing rates to increase to over 90%, reducing operational risks and costs associated with manual interventions. This automation provides the resilience and flexibility required to adapt to evolving market structures like T+1 settlement timeframes.”
– Stephen Kim, CIO, Head of Corporate Technology, Jefferies

About the Authors
Vipul Parekh, is a senior customer solutions manager at AWS guiding our Capital Markets customers in accelerating their business transformation journey on Cloud. He is a GenAI ambassador and a member of AWS AI/ML technical field community. Prior to AWS, Vipul played various roles at the top investment banks, leading transformations spanning from front office to back-office, and regulatory compliance areas.
Raj Talasila, is a senior technical program manager at AWS. He comes to AWS with 30+ years of experience in the Financial Services, Media and Entertainment, and CPG.
Saby Sahoo, is a senior solutions architect at AWS. Saby has 20+ years of experience in the field of design and implementation of IT Solutions, Data Analytics, and AI/ML/GenAI.
Sovik Kumar Nath is an AI/ML solution architect with AWS. He has extensive experience designing end-to-end machine learning and business analytics solutions in finance, operations, marketing, healthcare, supply chain management, and IoT. Sovik has published articles and holds a patent in ML model monitoring. He has double masters degrees from the University of South Florida, University of Fribourg, Switzerland, and a bachelors degree from the Indian Institute of Technology, Kharagpur. Outside of work, Sovik enjoys traveling, taking ferry rides, and watching movies.

AI-powered assistants for investment research with multi-modal data: A …

This post is a follow-up to Generative AI and multi-modal agents in AWS: The key to unlocking new value in financial markets. This blog is part of the series, Generative AI and AI/ML in Capital Markets and Financial Services.
Financial analysts and research analysts in capital markets distill business insights from financial and non-financial data, such as public filings, earnings call recordings, market research publications, and economic reports, using a variety of tools for data mining. They face many challenges because of the increasing variety of tools and amount of data. They must synthesize massive amounts of data from multiple sources, qualitative and quantitative, to provide insights and recommendations. Analysts need to learn new tools and even some programming languages such as SQL (with different variations). To add to these challenges, they must think critically under time pressure and perform their tasks quickly to keep up with the pace of the market.
Investment research is the cornerstone of successful investing, and involves gathering and analyzing relevant information about potential investment opportunities. Through thorough research, analysts come up with a hypothesis, test the hypothesis with data, and understand the effect before portfolio managers make decisions on investments as well as mitigate risks associated with their investments. Artificial intelligence (AI)-powered assistants can boost the productivity of a financial analysts, research analysts, and quantitative trading in capital markets by automating many of the tasks, freeing them to focus on high-value creative work. AI-powered assistants can amplify an analyst’s productivity by searching for relevant information in the customer’s own database as well as online, conducting qualitative and quantitative analysis on structured and unstructured data, enabling analysts to work faster and with greater accuracy.
In this post, we introduce a solution using Agents for Amazon Bedrock and Knowledge Bases for Amazon Bedrock that can help financial analysts use various data sources of multifaceted financial data (text, audio, and databases) and various tools (detect phrases, portfolio optimization, sentiment analysis, and stock query) to gather financial insights. The interaction shows how AI-powered assistants recognize and plan based on user’s prompts, come up with steps to retrieve context from data stores, and pass through various tools and LLM to arrive at a response.
AI-powered assistants for investment research

So, what are AI-powered assistants? AI-powered assistants are advanced AI systems, powered by generative AI and large language models (LLMs), which use AI technologies to understand goals from natural language prompts, create plans and tasks, complete these tasks, and orchestrate the results from the tasks to reach the goal. Generative AI agents, which form the backbone of AI-powered assistants, can orchestrate interactions between foundation models, data sources, software applications, and users. As AI technology advances, the abilities of generative AI agents are expected to grow, providing more opportunities to gain a competitive advantage.
Leading this evolution is Amazon Bedrock, a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon using a single API, along with a broad set of capabilities to build and scale generative AI applications with security, privacy, and responsible AI.
You can now use Agents for Amazon Bedrock and Knowledge Bases for Amazon Bedrock to build specialized agents and AI-powered assistants that run actions based on natural language input prompts and your organization’s data. These managed agents act as intelligent orchestrators, coordinating interactions between foundation models, API integrations, user questions and instructions, and knowledge sources loaded with your proprietary data. At runtime, the agent intelligently handles and orchestrates the user inputs throughout a dynamic number of steps.
The following video demonstrates an AI-powered assistant in Agents for Amazon Bedrock in action.

Solution overview
A key component of an AI-powered assistant is Agents for Amazon Bedrock. An agent consists of the following components:

Foundation model – The agent invokes an FM to interpret user input, generate subsequent prompts in its orchestration process, and generate responses.
Instructions – Instructions telling the agent what it’s designed to do and how to do it.
Action groups – Action groups are interfaces that an agent uses to interact with the different underlying components such as APIs and databases. An agent uses action groups to carry out actions, such as making an API call to another tool.
Knowledge base – The knowledge base is a link to an existing knowledge base, consisting of customer’s documents (such as PDF files and text files) that allows the agent to query for extra context for the prompts.

Both the action groups and knowledge base are optional and not required for the agent itself.
In this post, an AI-powered assistant for investment research can use both structured and unstructured data for providing context to the LLM using a Retrieval Augmented Generation (RAG) architecture, as illustrated in the following diagram.

For the AI-powered assistant, the following the action groups are associated:

Detect-phrases – Useful for when you need to detect key phrases in financial reports
Portfolio-optimization – Useful for when you need to build an optimal allocation portfolio from a list of stock symbols using python functions
Sentiment-analysis – Useful for when you need to analyze the sentiment of an excerpt from a financial report
Stock-query – Useful for when you need to answer any question about historical stock prices

Depending on the prompts, the AI-powered assistant for investment research uses different types of structured and unstructured data. The agent can find insights from different modalities of financial data:

Unstructured data – This includes annual 10K and quarterly 10Q earnings reports, which are converted into vectors using Amazon Titan Embeddings models and stored as vectors in an Amazon OpenSearch Serverless vector database, all orchestrated using a knowledge base
Structured data – This includes tabular stock data, which is stored in Amazon Simple Storage Service (Amazon S3) and queried using Amazon Athena
Other data modalities – This includes audio files of quarterly earnings calls, which are converted into unstructured data using Amazon Textract and Amazon Transcribe

When the AI-powered assistant receives a prompt from a business user, it follows a number of steps as part its orchestration:

Break down the prompt into a number of steps using an LLM within Amazon Bedrock.
Follow chain-of-thought reasoning and instructions, and complete the steps using appropriate action groups.
As part of the process, depending on the prompt, search and identify relevant context for RAG.
Pass the results with the prompt to an LLM within Amazon Bedrock.
Generate the final response and respond to the user in English with relevant data.

The following diagram illustrates this workflow.

Technical architecture and key steps
The multi-modal agent orchestrates various steps based on natural language prompts from business users to generate insights. For unstructured data, the agent uses AWS Lambda functions with AI services such as Amazon Comprehend for natural language processing (NLP). For structured data, the agent uses the SQL Connector and SQLAlchemy to analyze the database through Athena. The agent also uses the selected LLM for computations and quantitative modeling, and the context session equips the agent with conversation history. The multi-modal agent is implemented using Agents for Amazon Bedrock and coordinates the different actions and knowledge bases based on prompts from business users through the AWS Management Console, although it can also be invoked through the AWS API.
The following diagram illustrates the technical architecture.

The key components of the technical architecture are as follows:

Data storage and analytics – The quarterly financial earning recordings as audio files, financial annual reports as PDF files, and S&P stock data as CSV files are hosted on Amazon S3. Data exploration on stock data is done using Athena.
Large language models – The LLMs available to be used by Agents for Amazon Bedrock are Anthropic Claude Instant v1, v2.0, and v2.1.
Agents – We use Agents for Amazon Bedrock to build and configure autonomous agents. Agents orchestrate interactions between FMs, data sources, software applications, and user conversations. Depending on the user input, the agent decides the action or knowledge base to call to answer the question. We created the following purpose-built agent actions using Lambda and Agents for Amazon Bedrock for our scenario:

Stocks querying – To query S&P stocks data using Athena and SQLAlchemy.
Portfolio optimization – To build a portfolio based on the chosen stocks.
Sentiment analysis – To identify and score sentiments on a topic using Amazon Comprehend.
Detect phrases – To find key phrases in recent quarterly reports using Amazon Comprehend.

Knowledge base – To search for financial earnings information stored in multi-page PDF files, we use a knowledge base (using an OpenSearch Serverless vector store).

To dive deeper into the solution and code for all the steps, see the GitHub repo.
Benefits and lessons learned in migrating from LangChain agents to Agents for Amazon Bedrock
Agents for Amazon Bedrock and LangChain agents both use an LLM to interpret user input and prompts in their orchestration processes. The LLM acts as a reasoning engine to determine next actions. Agents for Amazon Bedrock offers several benefits when implementing an agent-based solution.


Agents for Amazon Bedrock is serverless, meaning you can build agents without managing any infrastructure.

Conversation history and session management

By default, LangChain agents are stateless, meaning they don’t remember previous interactions or keep history of the conversation. It supports either a simple memory system that recalls the most recent conversations or complex memory structures that analyze historical messages to return the most relevant results. In our previous post, we deployed a persistent storage solution using Amazon DynamoDB.
Agents for Amazon Bedrock provides a short-term memory for conversations by default, allowing the user to interact with the agent continuously during the session.

RAG support

Knowledge Bases for Amazon Bedrock provides an out-of-the-box RAG solution. It enables a faster time-to-market by abstracting the heavy lifting of building a pipeline and offers a persistent solution for keeping large data as vector embeddings in vector databases, thereby reducing latency to RAG systems.
A knowledge base simplifies the setup and implementation of RAG by automating several steps in this process:

Preprocessing data – Split the documents into manageable chunks for efficient retrieval. The chunks are then converted to embeddings and written to a vector index while maintaining a mapping to the original document.
Runtime processing – Embed user queries into vectors. Compare vector embeddings of user queries and document chunks to find semantically similar matches. Augment user prompts with context from matched chunks.

Knowledge Bases for Amazon Bedrock supports popular databases for vector storage, including the vector engine for OpenSearch Serverless, Pinecone, Redis Enterprise Cloud, Amazon Aurora (coming soon), and MongoDB (coming soon).


Most functions (tools) from our previous multi-modal agent can be migrated to Amazon Bedrock using action groups. Action groups define agent actions by providing an OpenAPI schema to define invocable APIs, as well as a Lambda function specifying input and output. Lambda natively supports Java, Go, PowerShell, Node.js, C#, Python, and Ruby code. LangChain’s supported languages do not include PowerShell and Node.js.

Simple prompt

A key element to get optimal results in our LangChain agent was using a good and clear prompt. In our previous multi-modal agent, we used the following prompt:
You are a Minimization Solutionist with a set of tools at your disposal. You would be presented with a problem. First understand the problem and devise a plan to solve the problem. Please output the plan starting with the header ‘Plan:’ and then followed by a numbered list of steps. Ensure the plan has the minimum amount of steps needed to solve the problem. Do not include unnecessary steps. <instructions> These are guidance on when to use a tool to solve a task, follow them strictly: 1. For the tool that specifically focuses on stock price data, use “Stock Query Tool”. 2…… </instructions>nnAssistant:”””
The prompt provided detailed information to give the agent as much guidance as possible to respond to a question.
With Agents for Amazon Bedrock, we used simple instructions for the agent to obtain the same results. With a shorter prompt (“You are a financial analyst with a set of tools at your disposal”), we were able to answer the same questions with the same quality.

Editability of base prompts

Agents for Amazon Bedrock also exposes the four default base prompt templates that are used during the preprocessing, orchestration, knowledge base response generation, and postprocessing. You can optionally edit these base prompt templates to customize your agent’s behavior at each step of its sequence.


Each response from an Amazon Bedrock agent is accompanied by a trace that details the steps being orchestrated by the agent. The trace provides information about the inputs to the action groups that the agent invokes and the knowledge bases that it queries to respond to the user. In addition, the trace provides information about the outputs that the action groups and knowledge bases return.


You can securely connect LLMs to your company data sources using Agents for Amazon Bedrock. With a knowledge base, you can use agents to give LLMs in Amazon Bedrock access to additional data that helps the model generate more relevant, context-specific, and accurate responses without continually retraining the LLM.
Dive deeper into the solution
To dive deeper into the solution and the code shown in this post, see the GitHub repo. The repo contains instructions for the end-to-end solution, including setting up the agents, associated action groups, unstructured data (earnings reports PDF files, earnings call audio files), and structured data (stocks time series).
In the appendix at the end of this post, we share different questions asked by a financial analyst, the agent tools invoked, and the answer from the multi-modal agent.
Clean up
After you run the multi-modal agent, make sure to clean up any resources manually that won’t be used later:

Delete your agent and the knowledge base associated to your agent.
Delete the vector index containing the data embeddings.
Delete the S3 buckets created by AWS CloudFormation and then delete the CloudFormation stack.

The solution presented in this post uses Agents for Amazon Bedrock and Knowledge Bases for Amazon Bedrock to assist financial analysts in navigating the complexities of multifaceted financial data. By seamlessly integrating various data sources, including text, audio, and databases, this AI-powered assistant can effectively plan and complete tasks based on user prompts—retrieving relevant information, processing it through various tools, and ultimately providing insightful conclusions. Agents for Amazon Bedrock and LangChain agents both use LLMs for interpreting user input and determining subsequent actions, but Agents for Amazon Bedrock offers several advantages, including serverless deployment, built-in stateful conversation history tracking, and out-of-the-box support for RAG with knowledge base integration. Additionally, Amazon Bedrock simplifies the agent development process by enabling migration of existing functions, requiring only basic prompts for optimal results, and providing traceability for agent actions and knowledge base queries. For further insights into generative AI, foundation models, and LLMs, Amazon Bedrock offers a wealth of use cases and resources. To learn more, see Generative AI and multi-modal agents in AWS: The key to unlocking new value in financial markets, which demonstrates how multi-modal agents use AWS services to efficiently gather financial insights from an organization’s text, speech, and tabular data sources for a financial analyst’s query using LangChain agents as alternative implementation.
The following table contains example questions provided by a financial analyst to ask the AI-powered assistant for investment research based on structured and unstructured data.

Question purpose
Question asked by financial analyst to AI-powered assistant
Action group or knowledge base invoked
Intermediate steps from AI-powered assistant chain-of-thought reasoning
Answer from AI-powered assistant

Prompt the agent to analyze the company’s financial health and performance.
Analyze Amazon’s financial health based on the 10k report. Calculate important financial ratios. Please limit your answer to 5 sentences.
Step1: To analyze Amazon’s financial health and calculate important financial ratios based on their 10K report, I will need to search the knowledge base for relevant information from Amazon’s 10K filing. I can then extract key financial data like revenues, expenses, assets, liabilities, etc. and use that to calculate ratios like current ratio, debt-to-equity ratio, return on assets, etc. to assess their financial position and performance. Step2: The search results provide some key financial data from Amazon’s recent 10K filing, including net sales, operating income, net income, and context around factors impacting their financials like foreign exchange rates and severance costs. To analyze their financial health and calculate ratios, I will need to extract additional data points like current assets, current liabilities, total assets, total liabilities, etc. from the 10K report. Step3: With the additional balance sheet data from Amazon’s 10K report, I now have the key financial figures needed to calculate some important ratios to analyze their financial health:nnCurrent Ratio = Total Current Assets / Total Current Liabilitiesn= $146.8 billion / $155.4 billion = 0.94nnThis current ratio slightly below 1 indicates Amazon may have a bit of a liquidity crunch in the short-term to cover current obligations.nnDebt-to-Equity Ratio = Total Liabilities / Total Shareholders’ Equity n= ($155.4 billion + $161.2 billion) / ($462.7 billion – $316.6 billion)n= $316.6 billion / $146.1 billion = 2.17nnThe relatively high debt-to-equity ratio over 2 shows Amazon has a highly leveraged capital structure, though this may be expected given their large investments in infrastructure and growth initiatives.nnReturn on Assets = Net Income / Total Assetsn= $9.9 billion / ($146.8 billion + $315.9 billion) n= $9.9 billion / $462.7 billion = 2.1%nnThe low single-digit return on assets indicates Amazon is not very efficient at generating profits from its asset base currently, likely due to heavy investments and spending on growth.
The current ratio of 0.94 indicates Amazon may face some liquidity challenges in covering short-term obligations.[1] The debt-to-equity ratio of 2.17 shows a highly leveraged capital structure, likely due to major investments in growth initiatives.[2] The return on assets of 2.1% is relatively low, suggesting Amazon is not very efficient at generating profits from its asset base currently.[3]

Prompt the agent to analyze the company’s financial performance from company’s earnings reports
What were the main drivers of AWS’s revenue growth in the Q1 2023 quarter? Please limit your answers to 5 sentences.
To determine the main drivers of AWS’s revenue growth in Q1 2023, I will need to search the knowledge base for any information specifically related to AWS’s performance and growth factors during that quarter.
The main drivers of AWS’s revenue growth of 16% in Q1 2023 were increased customer usage of AWS services, partially offset by pricing changes primarily driven by long-term customer contracts. Higher customer adoption and consumption of AWS offerings fueled the topline growth, though pricing concessions on long-term deals moderated the growth rate.

Prompt the agent to analyze the company’s management team and their plans for the future from company’s earnings reports
What are Amazon’s top priorities for 2023? What are the biggest risks facing the company? Please limit your answers to 5 sentences.
To determine Amazon’s top priorities and biggest risks for 2023, I will search the knowledge base for any relevant information from their recent earnings reports, investor presentations, or other public statements outlining their strategic focus areas and potential risk factors.
Amazon’s top priorities for 2023 include mitigating impacts from macroeconomic headwinds, managing foreign exchange volatility, and accurately forecasting customer demand amid economic uncertainty. Some of the biggest risks facing the company are inventory management challenges, rising payments processing costs and fraud risks, reliance on third-party payment providers, and a potential slowdown in enterprise AWS adoption due to customer cost optimization efforts.

Prompt the agent to perform sentiment analysis on the earnings call recordings
What is the sentiment around inflation in Amazon’s earnings call?
Sentiment Analysis Tool, Financial-Data-Explorer
Step 1: To determine the sentiment around inflation expressed in Amazon’s earnings call, I will need to search the knowledge base for any relevant excerpts or commentary from the earnings call transcript related to inflation. 2: The search results provide some relevant excerpts from Amazon’s earnings guidance and commentary related to inflation and macroeconomic factors. To better assess the overall sentiment, I will use the Sentiment Analysis tool on these excerpts.
The sentiment around inflation in Amazon’s earnings call is decidedly negative. Amazon expresses concerns that ongoing inflationary pressures, rising costs, and recessionary fears are driving cautious consumer spending, shifts to lower-priced products, and challenges in forecasting demand. The company expects these macroeconomic factors to continue impacting operations, customer demand, and revenue growth rates across segments like AWS in the coming quarters.

Prompt the agent to build an optimal portfolio using the collected data
What are the closing prices of stocks AAAA, WWW, DDD in year 2018? Can you build an optimized portfolio using these three stocks on how to allocate 10,000 USD?
Stocks Querying Tool, Portfolio Optimization Tool
Step 1: To get the closing prices of the stocks AAAA, WWW, and DDD in 2018, I will use the Stock-Query tool Step 2: The Stock-Query tool provided the closing prices for the requested stocks in 2018. To build an optimized portfolio allocation with $10,000 using these stocks, I will use the Portfolio-Optimization tool.
Based on the closing prices of stocks AAAA, WWW, and DDD in 2018 provided by the <REDACTED> tool, and the optimized portfolio allocation from the <REDACTED> tool, here is how to allocate $10,000 across these three stocks: With $10,000: – Buy 48 shares of AAAA stock at $172.44 per share, investing $8,415 – Buy 1 share of WWW stock at $85.91 per share, investing $165.80 – Buy 14 shares of DDD stock at $9.82 per share, investing $1,419.20 This optimized portfolio allocation maximizes returns while minimizing risk based on the 2018 stock prices.

About the Authors
Omar Ashton is a Partner Solutions Architect in Colombia. With 17 years of experience in software design and development, IT solution architecture, technology strategy and leadership of development teams.
Sovik Kumar Nath is an AI/ML and GenAI specialist senior solution architect with AWS working with financial services and capital markets customers. He has extensive experience designing end-to-end machine learning and business analytics solutions in finance, operations, marketing, healthcare, supply chain management, and IoT. Sovik has published articles and holds a patent in ML model monitoring. He has double masters degrees from the University of South Florida, University of Fribourg, Switzerland, and a bachelors degree from the Indian Institute of Technology, Kharagpur. Outside of work, Sovik enjoys traveling, taking ferry rides, and watching movies.
Jose Rojas is a Partner Solutions Architect at AWS. He helps Partners to increase productivity, efficiency and revenue by adopting and creating solutions on AWS. Before joining AWS, Jose worked at Cisco Meraki helping customers adopt cloud networking solutions. Outside work, he enjoys traveling with his family, swimming and cycling.
Mohan Musti is a Principal Technical Account Manger based out of Dallas. Mohan helps customers architect and optimize applications on AWS. Mohan has Computer Science and Engineering from JNT University, India. In his spare time, he enjoys spending time with his family and camping.
Jia (Vivian) Li is a Senior Solutions Architect in AWS, with specialization in AI/ML. She currently supports customers in financial industry. Prior to joining AWS in 2022, she had 7 years of experience supporting enterprise customers use AI/ML in the cloud to drive business results. Vivian has a BS from Peking University and a PhD from University of Southern California. In her spare time, she enjoys all the water activities, and hiking in the beautiful mountains in her home state, Colorado.
Uchenna Egbe is an AI/ML and GenAI specialist Solutions Architect who enjoys building reusable AIML solutions. Uchenna has an MS from the University of Alaska Fairbanks. He spends his free time researching about herbs, teas, superfoods, and how to incorporate them into his daily diet.

Unstructured Introduces Unstructured Serverless API: The Simplest, Fas …

The rapid evolution of AI and machine learning ML necessitates robust, scalable, and efficient data processing solutions. Unstructured, a leading innovator in data transformation, introduces its Unstructured Serverless API, a groundbreaking development aimed at simplifying, accelerating, and reducing the costs of making enterprise data AI-ready.

Introduction to Unstructured Serverless API

The Unstructured Serverless API represents the pinnacle of data processing technology, designed to render enterprise data ready for AI applications seamlessly and cost-effectively. This new offering from Unstructured is poised to redefine data handling with several key enhancements:

New Signup Flow and Admin Dashboard: Enhances user experience with simplified onboarding and efficient management tools.

Per-page pricing Model: This introduces predictable and reduced costs, allowing users to pay based on the number of pages processed.

Enhanced Performance Metrics: Achieves a 5x improvement in PDF processing throughput, 70% better table classification, 11% higher text accuracy, and a 20% reduction in word error rate.

Image Source

Advantages of Unstructured Serverless API

Improved Transformation Performance

The Unstructured Serverless API leverages next-generation document transformation models, delivering unparalleled performance improvements over its open-source predecessors. The key benefits include:

Faster Processing Throughput: Processing PDFs is now five times faster.

Better Table Classification: The accuracy of detecting and structuring tables has improved by 70%.

Higher Text Accuracy: Text extraction accuracy has seen an 11% enhancement.

Reduced Word Error Rate: The word error rate has decreased by 20%.

These improvements facilitate superior AI-enabled workflows in three critical areas:

Data Cleaning: Developers can easily remove unwanted document elements, such as headers, footers, or images, ensuring cleaner data for AI processing.

Advanced Chunking Strategies: Developers can more effectively manage and process document sections by chunking documents based on elements like titles.

Metadata Filtering: Enhances data retrieval by prioritizing the most relevant information within a file during queries.

Enhanced Developer Experience

Unstructured’s commitment to delivering an exceptional developer experience is evident in the new features of its Serverless API:

Refreshed Onboarding Process: A streamlined signup process ensures a smooth start for new users.

New Admin Panel: Simplifies API key management and usage tracking.

Comprehensive Documentation: Newly revamped documentation provides clear, detailed guidance for users.

These enhancements make the Unstructured Serverless API powerful and user-friendly, fostering a more productive development environment.

Cost Efficiency and Pricing Model

A significant shift in the pricing model accompanies the introduction of the Unstructured Serverless API. Moving from a compute-hour-based pricing model to a per-page pricing model, Unstructured offers more predictability and transparency:

Fast Pipeline: Costs $1 per 1,000 pages.

Hi-Res Pipeline: Costs $10 per 1,000 pages.

This new pricing structure significantly reduces costs, making it more economical for users to process large documents. For instance, processing 1,000 PDF pages now costs $10, down from $12.93 under the previous model.

Performance Enhancements

Unstructured Serverless API boasts near-instant startup speeds and reduced latency, thanks to continuously online worker nodes that cut ramp-up times to under three seconds from the previous thirty minutes. Document preprocessing pipelines are also optimized, processing documents five times faster through techniques like document splitting for parallelized transformation.

Security and Compliance

In ensuring enterprises can trust the Unstructured Serverless API with their most critical data workloads, Unstructured has achieved SOC 2 Type 2 compliance. This certification underscores the API’s security, availability, processing integrity, confidentiality, and privacy controls.


The Unstructured Serverless API is set to transform how enterprises handle data for AI applications, combining unmatched performance, cost efficiency, and ease of use. By providing scalable, resilient, and secure data processing solutions, Unstructured empowers organizations to harness the full potential of AI. 
The post Unstructured Introduces Unstructured Serverless API: The Simplest, Fastest, and Cost-Effective Way to Render Enterprise Data AI-Ready appeared first on MarkTechPost.