Liquid AI Introduces Liquid Foundation Models (LFMs): A 1B, 3B, and 40B Series of Generative AI Models

Liquid AI has released its first series of Liquid Foundation Models (LFMs), ushering in a new generation of generative AI models. These models are positioned as a new benchmark for performance and efficiency at multiple scales, namely the 1B, 3B, and 40B parameter configurations. This series aims to set a new standard for generative AI models by achieving state-of-the-art performance in various benchmarks while maintaining a smaller memory footprint and more efficient inference capabilities.

The first series of LFMs comprises three main models:

LFM-1B: A 1 billion parameter model that offers cutting-edge performance for its size category. It has achieved the highest scores across various benchmarks in its class, surpassing many transformer-based models despite not being built on the widely used GPT architecture.

LFM-3B: A 3 billion parameter model ideal for mobile and edge applications. It not only outperforms its direct competitors in terms of efficiency and speed but also positions itself as a worthy contender against models in higher parameter ranges, such as 7B and 13B models from previous generations.

LFM-40B: A 40 billion parameter Mixture of Experts (MoE) model designed for more complex tasks. This model balances its performance and output quality against even larger models due to its advanced architecture, which allows for selective activation of model segments depending on the task, thereby optimizing computational efficiency.

Architectural Innovations and Design Principles

The LFMs are built from first principles, focusing on designing powerful AI systems that offer robust control over their capabilities. According to Liquid AI, these models are constructed using computational units deeply rooted in dynamical systems, signal processing, and numerical linear algebra theories. This unique blend allows LFMs to leverage theoretical advancements across these fields to build general-purpose AI models capable of handling sequential data types, such as video, audio, text, and time series.

The design of LFMs emphasizes two primary aspects: featurization and footprint. Featurization is the process of converting input data into a structured set of features or vectors that are used to modulate computation inside the model in an adaptive manner. For instance, audio and time series data generally require less featurization in operators because of their lower information density compared to language and multi-modal data.

The LFM stack is being optimized for deployment on various hardware platforms, including NVIDIA, AMD, Qualcomm, Cerebras, and Apple. This optimization enables performance improvements across different deployment environments, from edge devices to large-scale cloud infrastructures.

Performance Benchmarks and Comparison

The initial benchmarks for the LFMs show impressive results compared to similar models. The 1B model, for instance, outperformed several transformer-based models on Massive Multitask Language Understanding (MMLU) and other benchmark metrics. Similarly, the 3B model’s performance has been likened to models in the 7B and 13B categories, making it highly suitable for resource-constrained environments.

The 40B MoE model, on the other hand, offers a new balance between model size and output quality. This model’s architecture leverages a unique mixture of experts to allow higher throughput and deployment on cost-effective hardware. It achieves performance comparable to larger models due to its efficient utilization of the MoE architecture.

Key Strengths and Use Cases

Liquid AI has highlighted several areas where LFMs demonstrate significant strengths, including general and expert knowledge, mathematics and logical reasoning, and efficient long-context tasks. The models also offer robust multilingual capabilities, supporting Spanish, French, German, Chinese, Arabic, Japanese, and Korean languages. However, LFMs are less effective at zero-shot code tasks and precise numerical calculations. This gap is expected to be addressed in future iterations of the models.

LFMs have also been optimized to handle longer context lengths more effectively than traditional transformer models. For example, the models can process up to 32k tokens in context, which makes them particularly effective for document analysis and summarization tasks, more meaningful interactions with context-aware chatbots, and improved Retrieval-Augmented Generation (RAG) performance.

Deployment and Future Directions

Liquid AI’s LFMs are currently available for testing and deployment on several platforms, including Liquid Playground, Lambda (Chat UI and API), Perplexity Labs, and soon on Cerebras Inference. Liquid AI’s roadmap suggests that it will continue to optimize and release new capabilities in the upcoming months, extending the range and applicability of the LFMs to various industries, such as financial services, biotechnology, and consumer electronics.

Regarding deployment strategy, the LFMs are designed to be adaptable across multiple modalities and hardware requirements. This adaptability is achieved through adaptive linear operators that are structured to respond dynamically based on inputs. Such flexibility is critical for deploying these models in environments ranging from high-end cloud servers to more resource-constrained edge devices.

Conclusion

Liquid AI’s first series of Liquid Foundation Models (LFMs) represents a promising step forward in developing generative AI models. LFMs aim to redefine what is possible in AI model design and deployment by achieving superior performance and efficiency. While these models are not open-sourced and are only available as part of a controlled release, their unique architecture and innovative approach position them as significant contenders in the AI landscape.


MIO: A New Multimodal Token-Based Foundation Model for End-to-End Autoregressive Understanding and Generation of Speech, Text, Images, and Videos

Multimodal models aim to create systems that can seamlessly integrate and utilize multiple modalities to provide a comprehensive understanding of the given data. Such systems aim to replicate human-like perception and cognition by processing complex multimodal interactions. By leveraging these capabilities, multimodal models are paving the way for more sophisticated AI systems that can perform diverse tasks, such as visual question answering, speech generation, and interactive storytelling.

Despite these advancements, current multimodal approaches remain limited. Many existing models cannot process and generate data across different modalities, or they focus on only one or two input types, such as text and images. This leads to a narrow application scope and reduced performance when handling complex, real-world scenarios that require integration across multiple modalities. Further, most models cannot create interleaved content that combines text with visual or audio elements, hindering their versatility and utility in practical applications. Addressing these challenges is essential to unlock the true potential of multimodal models and enable the development of robust AI systems capable of understanding and interacting with the world more holistically.

Current methods in multimodal research typically rely on separate encoders and alignment modules to process different data types. For example, models like EVA-CLIP and CLAP use encoders to extract features from images and align them with text representations through external modules like Q-Former. Other approaches include models like SEED-LLaMA and AnyGPT, which focus on combining text and images but do not support comprehensive multimodal interactions. While GPT-4o has made strides in supporting any-to-any data inputs and outputs, it is closed-source and lacks capabilities for generating interleaved sequences involving more than two modalities. Such limitations have prompted researchers to explore new architectures and training methodologies that can unify understanding and generation across diverse formats.

In a collaborative effort, researchers from Beihang University, AIWaves, The Hong Kong Polytechnic University, the University of Alberta, and other renowned institutes have introduced MIO (Multimodal Input and Output), a novel model designed to overcome existing models’ limitations. MIO is an open-source, any-to-any multimodal foundation model capable of processing text, speech, images, and videos in a unified framework. The model supports the generation of interleaved sequences involving multiple modalities, making it a versatile tool for complex multimodal interactions. Through a comprehensive four-stage training process, MIO aligns discrete tokens across four modalities and learns to generate coherent multimodal outputs. M-A-P and AIWaves are among the organizations that contributed significantly to the model’s development.

MIO’s unique training process consists of four stages to optimize its multimodal understanding and generation capabilities. The first stage, alignment pre-training, ensures that the model’s non-textual data representations are aligned with its language space. This is followed by interleaved pre-training, incorporating diverse data types, including video-text and image-text interleaved data, to enhance the model’s contextual understanding. The third stage, speech-enhanced pre-training, focuses on improving speech-related capabilities while maintaining balanced performance across other modalities. Finally, the fourth stage involves supervised fine-tuning using a variety of multimodal tasks, including visual storytelling and chain-of-visual-thought reasoning. This rigorous training approach allows MIO to deeply understand multimodal data and generate interleaved content that seamlessly combines text, speech, and visual information.

Experimental results show that MIO achieves state-of-the-art performance in several benchmarks, outperforming existing dual-modal and any-to-any multimodal models. In visual question-answering tasks, MIO attained an accuracy of 65.5% on VQAv2 and 39.9% on OK-VQA, surpassing other models like Emu-14B and SEED-LLaMA. In speech-related evaluations, MIO demonstrated superior capabilities, achieving a word error rate (WER) of 4.2% in automatic speech recognition (ASR) and 10.3% in text-to-speech (TTS) tasks. The model also excelled in video understanding tasks, with a top-1 accuracy of 42.6% on MSVDQA and 35.5% on MSRVTT-QA. These results highlight MIO’s robustness and efficiency in handling complex multimodal interactions, even when compared to larger models like IDEFICS-80B. Also, MIO’s performance in interleaved video-text generation and chain-of-visual-thought reasoning showcases its unique abilities to generate coherent and contextually relevant multimodal outputs.

Overall, MIO presents a significant advancement in developing multimodal foundation models, providing a robust and efficient solution for integrating and generating content across text, speech, images, and videos. Its comprehensive training process and superior performance across various benchmarks demonstrate its potential to set new standards in multimodal AI research. The collaboration between Beihang University, AIWaves, The Hong Kong Polytechnic University, and many other renowned institutes has resulted in a powerful tool that bridges the gap between multimodal understanding and generation, paving the way for future innovations in artificial intelligence.


How Aviva built a scalable, secure, and reliable MLOps platform using Amazon SageMaker

This post is co-written with Dean Steel and Simon Gatie from Aviva.
With a presence in 16 countries and serving over 33 million customers, Aviva is a leading insurance company headquartered in London, UK. With a history dating back to 1696, Aviva is one of the oldest and most established financial services organizations in the world. Aviva’s mission is to help people protect what matters most to them—be it their health, home, family, or financial future. To achieve this effectively, Aviva harnesses the power of machine learning (ML) across more than 70 use cases. Previously, ML models at Aviva were developed using a graphical UI-driven tool and deployed manually. This approach led to data scientists spending more than 50% of their time on operational tasks, leaving little room for innovation, and posed challenges in monitoring model performance in production.
In this post, we describe how Aviva built a fully serverless MLOps platform based on the AWS Enterprise MLOps Framework and Amazon SageMaker to integrate DevOps best practices into the ML lifecycle. This solution establishes MLOps practices to standardize model development, streamline ML model deployment, and provide consistent monitoring. We illustrate the entire setup of the MLOps platform using a real-world use case that Aviva has adopted as its first ML use case.
The Challenge: Deploying and operating ML models at scale
Approximately 47% of ML projects never reach production, according to Gartner. Despite the advancements in open source data science frameworks and cloud services, deploying and operating these models remains a significant challenge for organizations. This struggle highlights the importance of establishing consistent processes, integrating effective monitoring, and investing in the necessary technical and cultural foundations for a successful MLOps implementation.
For companies like Aviva, which handles approximately 400,000 insurance claims annually, with expenditures of about £3 billion in settlements, the pressure to deliver a seamless digital experience to customers is immense. To meet this demand amidst rising claim volumes, Aviva recognizes the need for increased automation through AI technology. Therefore, developing and deploying more ML models is crucial to support their growing workload.
To prove the platform can handle onboarding and industrialization of ML models, Aviva picked their Remedy use case as their first project. This use case concerns a claim management system that employs a data-driven approach to determine whether submitted car insurance claims qualify as either total loss or repair cases, as illustrated in the following diagram.

The workflow consists of the following steps:
The workflow begins when a customer experiences a car accident.
The customer contacts Aviva, providing information about the incident and details about the damage.
To determine the estimated cost of repair, 14 ML models and a set of business rules are used to process the request.
The estimated cost is compared with the car’s current market value from external data sources.
Information related to similar cars for sale nearby is included in the analysis.
Based on the processed data, a recommendation is made by the model to either repair or write off the car. This recommendation, along with the supporting data, is provided to the claims handler, and the pipeline reaches its final state.

The successful deployment and evaluation of the Remedy use case on the MLOps platform was intended to serve as a blueprint for future use cases, providing maximum efficiency by using templated solutions.
Solution overview of the MLOps platform
To handle the complexity of operationalizing ML models at scale, AWS offers the AWS Enterprise MLOps Framework, which can be used for a wide variety of use cases. The offering encapsulates a best-practices approach to building and managing MLOps platforms based on the consolidated knowledge gained from a multitude of customer engagements carried out by AWS Professional Services over the last five years. The proposed baseline architecture can be logically divided into four building blocks that are sequentially deployed into the provided AWS accounts, as illustrated in the following diagram.

The building blocks are as follows:

Networking – A virtual private cloud (VPC), subnets, security groups, and VPC endpoints are deployed across all accounts.
Amazon SageMaker Studio – SageMaker Studio offers a fully integrated development environment (IDE) for ML, acting as a data science workbench and control panel for all ML workloads.
Amazon SageMaker Projects templates – These ready-made infrastructure sets cover the ML lifecycle, including continuous integration and delivery (CI/CD) pipelines and seed code. You can launch these from SageMaker Studio with a few clicks, either choosing from preexisting templates or creating custom ones.
Seed code – This refers to the data science code tailored for a specific use case, divided between two repositories: training (covering processing, training, and model registration) and inference (related to SageMaker endpoints). The majority of time in developing a use case should be dedicated to modifying this code.

The framework implements the infrastructure deployment from a primary governance account to separate development, staging, and production accounts. Developers can use the AWS Cloud Development Kit (AWS CDK) to customize the solution to align with the company’s specific account setup. In adapting the AWS Enterprise MLOps Framework to a three-account structure, Aviva has designated accounts as follows: development, staging, and production. This structure is depicted in the following architecture diagram. The governance components, which facilitate model promotions with consistent processes across accounts, have been integrated into the development account.

Building reusable ML pipelines
The processing, training, and inference code for the Remedy use case was developed by Aviva’s data science team in SageMaker Studio, a cloud-based environment designed for collaborative work and rapid experimentation. When experimentation is complete, the resulting seed code is pushed to an AWS CodeCommit repository, initiating the CI/CD pipeline for the construction of a SageMaker pipeline. This pipeline comprises a series of interconnected steps for data processing, model training, parameter tuning, model evaluation, and the registration of the generated models in the Amazon SageMaker Model Registry.
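
As an illustration only (Aviva’s actual seed code is not public), the following minimal sketch shows how such a SageMaker pipeline could be assembled with the SageMaker Python SDK. The role ARN, bucket paths, script name, and model package group name are placeholders, and the XGBoost container stands in for whatever algorithm the seed code actually uses.

import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.processing import ProcessingOutput
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.step_collections import RegisterModel
from sagemaker.workflow.steps import ProcessingStep, TrainingStep

session = sagemaker.Session()
role = "arn:aws:iam::111122223333:role/SageMakerExecutionRole"  # placeholder role ARN

# Data processing step that runs the seed code's processing script
processor = SKLearnProcessor(framework_version="1.2-1", role=role,
                             instance_type="ml.m5.xlarge", instance_count=1)
process_step = ProcessingStep(
    name="PreprocessClaims",
    processor=processor,
    code="processing.py",  # hypothetical seed-code script
    outputs=[ProcessingOutput(output_name="train", source="/opt/ml/processing/train")],
)

# Training step consuming the processed data
estimator = Estimator(
    image_uri=sagemaker.image_uris.retrieve("xgboost", session.boto_region_name, version="1.7-1"),
    role=role, instance_count=1, instance_type="ml.m5.xlarge",
)
train_step = TrainingStep(
    name="TrainModel",
    estimator=estimator,
    inputs={"train": TrainingInput(
        process_step.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri)},
)

# Register the trained model in the SageMaker Model Registry, pending manual approval
register_step = RegisterModel(
    name="RegisterModel",
    estimator=estimator,
    model_data=train_step.properties.ModelArtifacts.S3ModelArtifacts,
    content_types=["text/csv"], response_types=["text/csv"],
    inference_instances=["ml.m5.xlarge"], transform_instances=["ml.m5.xlarge"],
    model_package_group_name="remedy-models",  # placeholder model package group
    approval_status="PendingManualApproval",
)

pipeline = Pipeline(name="RemedyTrainingPipeline",
                    steps=[process_step, train_step, register_step],
                    sagemaker_session=session)
pipeline.upsert(role_arn=role)  # create or update the pipeline definition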

Amazon SageMaker Automatic Model Tuning enabled Aviva to utilize advanced tuning strategies and overcome the complexities associated with implementing parallelism and distributed computing. The initial step involved a hyperparameter tuning process (Bayesian optimization), during which approximately 100 model variations were trained (5 steps with 20 models trained concurrently in each step). This feature integrates with Amazon SageMaker Experiments to provide data scientists with insights into the tuning process. The optimal model is then evaluated in terms of accuracy, and if it exceeds a use case-specific threshold, it is registered in the SageMaker Model Registry. A custom approval step was constructed, such that only Aviva’s lead data scientist can permit the deployment of a model through a CI/CD pipeline to a SageMaker real-time inference endpoint in the development environment for further testing and subsequent promotion to the staging and production environment.
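
As a rough illustration of that tuning setup (the objective metric, hyperparameter ranges, and S3 paths below are hypothetical, not Aviva’s actual configuration), a HyperparameterTuner along these lines would train roughly 100 variations with 20 concurrent jobs:

from sagemaker.inputs import TrainingInput
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

# `estimator` is the estimator from the pipeline sketch above
tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:auc",   # hypothetical objective metric
    objective_type="Maximize",
    hyperparameter_ranges={
        "eta": ContinuousParameter(0.01, 0.3),
        "max_depth": IntegerParameter(3, 10),
    },
    strategy="Bayesian",
    max_jobs=100,           # roughly 100 model variations in total
    max_parallel_jobs=20,   # 20 models trained concurrently per round
)
tuner.fit({
    "train": TrainingInput("s3://example-bucket/train/", content_type="text/csv"),            # placeholder
    "validation": TrainingInput("s3://example-bucket/validation/", content_type="text/csv"),  # placeholder
})
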
Serverless workflow for orchestrating ML model inference
To realize the actual business value of Aviva’s ML model, it was necessary to integrate the inference logic with Aviva’s internal business systems. The inference workflow is responsible for combining the model predictions, external data, and business logic to generate a recommendation for claims handlers. The recommendation is based on three possible outcomes:

Write off a vehicle (expected repairs cost exceeds the value of the vehicle)
Seek a repair (value of the vehicle exceeds repair cost)
Require further investigation given a borderline estimation of the value of damage and the price for a replacement vehicle

The following diagram illustrates the workflow.

The workflow starts with a request to an API endpoint hosted on Amazon API Gateway originating from a claims management system, which invokes an AWS Step Functions workflow that uses AWS Lambda to complete the following steps:

The input data of the REST API request is transformed into encoded features, which are used by the ML models.
ML model predictions are generated by feeding the input to the SageMaker real-time inference endpoints. Because Aviva processes daily claims at irregular intervals, real-time inference endpoints help overcome the challenge of providing predictions consistently at low latency.
ML model predictions are further processed by a custom business logic to derive a final decision (of the three aforementioned options).
The final decision, along with the generated data, is consolidated and transmitted back to the claims management system as a REST API response.
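
The following is a minimal sketch (with a hypothetical endpoint name and payload shape, not Aviva’s actual code) of a Lambda function that a Step Functions task could call to obtain predictions from a SageMaker real-time inference endpoint:

import json
import boto3

runtime = boto3.client("sagemaker-runtime")
ENDPOINT_NAME = "remedy-model-endpoint"  # hypothetical endpoint name

def lambda_handler(event, context):
    # The event carries the encoded features produced by the previous workflow step.
    payload = json.dumps({"features": event["features"]})
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=payload,
    )
    prediction = json.loads(response["Body"].read())
    # A later state applies the business rules to this prediction and the external data
    # to derive the repair or write-off recommendation.
    return {"prediction": prediction}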

Monitor ML model decisions to elevate confidence amongst users
The ability to obtain real-time access to detailed data for each state machine run and task is critically important for effective oversight and enhancement of the system. This includes providing claim handlers with comprehensive details behind decision summaries, such as model outputs, external API calls, and applied business logic, to make sure recommendations are based on accurate and complete information. Snowflake is the preferred data platform, and it receives data from Step Functions state machine runs through Amazon CloudWatch Logs. A series of filters screen for data pertinent to the business. This data is then sent to an Amazon Data Firehose delivery stream and subsequently delivered to an Amazon Simple Storage Service (Amazon S3) bucket, which is accessed by Snowflake. The data generated by all runs is used by Aviva business analysts to create dashboards and management reports, facilitating insights such as monthly views of total losses by region or average repair costs by vehicle manufacturer and model.
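
A minimal sketch of that log-streaming setup is shown below; the log group, filter pattern, and ARNs are hypothetical stand-ins for whatever Aviva actually configured:

import boto3

logs = boto3.client("logs")

logs.put_subscription_filter(
    logGroupName="/aws/vendedlogs/states/remedy-inference-workflow",  # hypothetical log group
    filterName="business-events-to-firehose",
    filterPattern='{ $.type = "TaskStateExited" }',  # keep only business-relevant events
    destinationArn="arn:aws:firehose:eu-west-1:111122223333:deliverystream/remedy-run-data",  # hypothetical
    roleArn="arn:aws:iam::111122223333:role/CWLtoFirehoseRole",  # hypothetical role
)
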
Security
The described solution processes personally identifiable information (PII), making customer data protection the core security focus of the solution. The customer data is protected by networking restrictions: processing runs inside the VPC, where data is logically separated in transit. The data is encrypted in transit between processing steps and encrypted at rest using AWS Key Management Service (AWS KMS). Access to production customer data is restricted on a need-to-know basis, and only authorized parties are allowed to access the production environment where this data resides.
The second security focus of the solution is protecting Aviva’s intellectual property. The code the data scientists and engineers are working on is stored securely in the dev AWS account, private to Aviva, in the CodeCommit git repositories. The training data and the artifacts of the trained models are stored securely in the S3 buckets in the dev account, protected by AWS KMS encryption at rest, with AWS Identity and Access Management (IAM) policies restricting access to the buckets to only the authorized SageMaker endpoints. The code pipelines are private to the account as well, and reside in the customer’s AWS environment.
The auditability of the workflows is provided by logging the steps of inference and decision-making in the CloudWatch logs. The logs are encrypted at rest as well with AWS KMS, and are configured with a lifecycle policy, guaranteeing availability of audit information for the required compliance period. To maintain security of the project and operate it securely, the accounts are enabled with Amazon GuardDuty and AWS Config. AWS CloudTrail is used to monitor the activity within the accounts. The software to monitor for security vulnerabilities resides primarily in the Lambda functions implementing the business workflows. The processing code is primarily written in Python using libraries that are periodically updated.
Conclusion
This post provided an overview of the partnership between Aviva and AWS, which resulted in the construction of a scalable MLOps platform. This platform was developed using the open source AWS Enterprise MLOps Framework, which integrated DevOps best practices into the ML lifecycle. Aviva is now capable of replicating consistent processes and deploying hundreds of ML use cases in weeks rather than months. Furthermore, Aviva has transitioned entirely to a pay-as-you-go model, resulting in a 90% reduction in infrastructure costs compared to the company’s previous on-premises ML platform solution.
Explore the AWS Enterprise MLOps Framework on GitHub and learn more about MLOps on Amazon SageMaker to see how it can accelerate your organization’s MLOps journey.

About the Authors
Dean Steel is a Senior MLOps Engineer at Aviva with a background in Data Science and actuarial work. He is passionate about all forms of AI/ML with experience developing and deploying a diverse range of models for insurance-specific applications, from large transformers through to linear models. With an engineering focus, Dean is a strong advocate of combining AI/ML with DevSecOps in the cloud using AWS. In his spare time, Dean enjoys exploring music technology, restaurants and film.
Simon Gatie, Principal Analytics Domain Authority at Aviva in Norwich, brings a diverse background in Physics, Accountancy, IT, and Data Science to his role. He leads Machine Learning projects at Aviva, driving innovation in data science and advanced technologies for financial services.
Gabriel Rodriguez is a Machine Learning Engineer at AWS Professional Services in Zurich. In his current role, he has helped customers achieve their business goals on a variety of ML use cases, ranging from setting up MLOps pipelines to developing a fraud detection application. Whenever he is not working, he enjoys doing physical exercises, listening to podcasts, or traveling.
Marco Geiger is a Machine Learning Engineer at AWS Professional Services based in Zurich. He works with customers from various industries to develop machine learning solutions that use the power of data for achieving business goals and innovate on behalf of the customer. Besides work, Marco is a passionate hiker, mountain biker, football player, and hobby barista.
Andrew Odendaal is a Senior DevOps Consultant at AWS Professional Services based in Dubai. He works across a wide range of customers and industries to bridge the gap between software and operations teams and provides guidance and best practices for senior management when he’s not busy automating something. Outside of work, Andrew is a family man that loves nothing more than a binge-watching marathon with some good coffee on tap.

Visier’s data science team boosts their model output 10 times by migrating to Amazon SageMaker

This post is co-written with Ike Bennion from Visier.
Visier’s mission is rooted in the belief that people are the most valuable asset of every organization and that optimizing their potential requires a nuanced understanding of workforce dynamics.
Paycor is an example of the many world-leading enterprise people analytics companies that trust and use the Visier platform to process large volumes of data to generate informative analytics and actionable predictive insights.
Visier’s predictive analytics has helped organizations such as Providence Healthcare retain critical employees within their workforce and saved an estimated $6 million by identifying and preventing employee attrition by using a framework built on top of Visier’s risk-of-exit predictions.
Trusted sources like Sapient Insights Group, Gartner, G2, Trust Radius, and RedThread Research have recognized Visier for its inventiveness, great user experience, and vendor and customer satisfaction. Today, over 50,000 organizations in 75 countries use the Visier platform as the driver to shape business strategies and drive better business results.
Unlocking growth potential by overcoming the tech stack barrier
Visier’s analytics and predictive power is what makes its people analytics solution so valuable. Users without data science or analytics experience can generate rigorous data-backed predictions to answer big questions like time-to-fill for important positions, or resignation risk for crucial employees.
It was an executive priority at Visier to continue innovating in their analytics and predictive capabilities because those make up one of the cornerstones of what their users love about their product.
The challenge for Visier was that their data science tech stack was holding them back from innovating at the rate they wanted to. It was costly and time-consuming to experiment with and implement new analytic and predictive capabilities because:

The data science tech stack was tightly coupled with the entire platform development. The data science team couldn’t roll out changes independently to production. This limited the team to fewer and slower iteration cycles.
The data science tech stack was a collection of solutions from multiple vendors, which led to additional management and support overhead for the data science team.

Streamlining model management and deployment with SageMaker
Amazon SageMaker is a managed machine learning platform that provides data scientists and data engineers with familiar concepts and tools to build, train, deploy, govern, and manage the infrastructure needed to run highly available and scalable model inference endpoints. Amazon SageMaker Inference Recommender is an example of a tool that can help data scientists and data engineers be more autonomous and less reliant on outside teams by providing guidance on right-sizing inference instances.
The existing data science tech stack was one of the many services comprising Visier’s application platform. Using the SageMaker platform, Visier built an API-based microservices architecture for the analytics and predictive services that was decoupled from the application platform. This gave the data science team the desired autonomy to deploy changes independently and release new updates more frequently.

The results
The first improvement Visier saw after migrating the analytics and predictive services to SageMaker was that it allowed the data science team to spend more time on innovations—such as the build-up of a prediction model validation pipeline—rather than having to spend time on deployment details and vendor tooling integration.
Prediction model validation
The following figure shows the prediction model validation pipeline.

Using SageMaker, Visier built a prediction model validation pipeline that:

Pulls the training dataset from the production databases
Gathers additional validation measures that describe the dataset and specific corrections and enhancements on the dataset
Performs multiple cross-validation measurements using different split strategies
Stores the validation results along with metadata about the run in a permanent datastore
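
The following is a generic sketch of those steps (synthetic data, scikit-learn cross-validation with two split strategies, and a JSON record standing in for the permanent datastore), not Visier’s actual pipeline:

import json
from datetime import datetime, timezone

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)  # stand-in dataset
model = GradientBoostingClassifier()

# Multiple cross-validation measurements using different split strategies
results = {}
for name, splitter in {
    "kfold": KFold(n_splits=5, shuffle=True, random_state=0),
    "stratified": StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
}.items():
    scores = cross_val_score(model, X, y, cv=splitter, scoring="roc_auc")
    results[name] = {"mean_auc": float(scores.mean()), "std_auc": float(scores.std())}

# Store the validation results along with metadata about the run
record = {
    "run_at": datetime.now(timezone.utc).isoformat(),
    "dataset_rows": int(X.shape[0]),
    "results": results,
}
print(json.dumps(record, indent=2))  # in practice this would be written to a permanent datastore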

The validation pipeline allowed the team to deliver a stream of advancements in the models that improved prediction performance by 30% across their whole customer base.
Train customer-specific predictive models at scale
Visier develops and manages thousands of customer-specific predictive models for their enterprise customers. The second workflow improvement the data science team made was to develop a highly scalable method to generate all of the customer-specific predictive models. This allowed the team to deliver ten times as many models with the same number of resources.
 As shown in the preceding figure, the team developed a model-training pipeline where model changes are made in a central prediction codebase. This codebase is executed separately for each Visier customer to train a sequence of custom models (for different points in time) that are sensitive to the specialized configuration of each customer and their data. Visier uses this pattern to scalably push innovation in a single model design to thousands of custom models across their customer base. To ensure state-of-art training efficiency for large models, SageMaker provides libraries that support parallel (SageMaker Model Parallel Library) and distributed (SageMaker Distributed Data Parallelism Library) model training. To learn more about how effective these libraries are, see Distributed training and efficient scaling with the Amazon SageMaker Model Parallel and Data Parallel Libraries.
Using the model validation workload shown earlier, changes made to a predictive model can be validated in as little as three hours.
Process unstructured data
Iterative improvements, a scalable deployment, and consolidation of data science technology were an excellent start, but when Visier adopted SageMaker, the goal was to enable innovation that was entirely out of reach by the previous tech stack.
A unique advantage that Visier has is the ability to learn from the collective employee behaviors across all their customer base. Tedious data engineering tasks like pulling data into the environment and database infrastructure costs were eliminated by securely storing their vast amount of customer-related datasets within Amazon Simple Storage Service (Amazon S3) and using Amazon Athena to directly query the data using SQL. Visier used these AWS services to combine relevant datasets and feed them directly into SageMaker, resulting in the creation and release of a new prediction product called Community Predictions. Visier’s Community Predictions give smaller organizations the power to create predictions based on the entire community’s data, rather than just their own. That gives a 100-person organization access to the kind of predictions that otherwise would be reserved for enterprises with thousands of employees.
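
As an illustration (the database, table, and bucket names below are hypothetical), querying such S3-backed datasets with Athena before feeding them into SageMaker could look like the following:

import time
import boto3

athena = boto3.client("athena")

query = athena.start_query_execution(
    QueryString="SELECT * FROM community_events WHERE event_year = 2023",     # hypothetical table
    QueryExecutionContext={"Database": "community_analytics"},                # hypothetical database
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},   # hypothetical bucket
)

execution_id = query["QueryExecutionId"]
state = "RUNNING"
while state in ("QUEUED", "RUNNING"):
    time.sleep(2)
    state = athena.get_query_execution(QueryExecutionId=execution_id)["QueryExecution"]["Status"]["State"]

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=execution_id)["ResultSet"]["Rows"]
    print(f"Fetched {len(rows) - 1} data rows")  # the first row holds the column headers
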
For information about how you can manage and process your own unstructured data, see Unstructured data management and governance using AWS AI/ML and analytics services.
Use Visier Data in Amazon SageMaker
With the transformative success Visier had internally, they wanted to ensure their end-customers could also benefit from the Amazon SageMaker platform to develop their own AI and machine learning (AI/ML) models.
Visier has written a full tutorial about how to use Visier Data in Amazon SageMaker and has also built a Python connector, available on their GitHub repo. The Python connector allows customers to pipe Visier data into their own AI/ML projects to better understand the impact of their people on financials, operations, customers, and partners. These results are often then imported back into the Visier platform to distribute these insights and drive derivative analytics to further improve outcomes across the employee lifecycle.
Conclusion
Visier’s success with Amazon SageMaker demonstrates the power and flexibility of this managed machine learning platform. By using the capabilities of SageMaker, Visier increased their model output by 10 times, accelerated innovation cycles, and unlocked new opportunities such as processing unstructured data for their Community Predictions product.
If you’re looking to streamline your machine learning workflows, scale your model deployments, and unlock insights from your data, explore the possibilities with SageMaker and built-in capabilities such as Amazon SageMaker Pipelines.
To get started, create an AWS account, go to the Amazon SageMaker console, and reach out to your AWS account team to set up an Experience-Based Acceleration engagement to unlock the full potential of your data and build custom generative AI and ML models that drive actionable insights and business impact.

About the authors
Kinman Lam is a Solution Architect at AWS. He is accountable for the health and growth of some of the largest ISV/DNB companies in Western Canada. He is also a member of the AWS Canada Generative AI vTeam and has helped a growing number of Canadian companies successfully launch advanced generative AI use cases.
Ike Bennion is the Vice President of Platform & Platform Marketing at Visier and a recognized thought leader in the intersection between people, work, and technology, with a rich history in implementation, product development, product strategy, and go-to-market. He specializes in market intelligence, business strategy, and innovative technologies, including AI and blockchain. Ike is passionate about using data to drive equitable and intelligent decision-making. Outside of work, he enjoys dogs, hip hop, and weightlifting.

Implement model-independent safety measures with Amazon Bedrock Guardrails

Generative AI models can produce information on a wide range of topics, but their application brings new challenges. These include maintaining relevance, avoiding toxic content, protecting sensitive information like personally identifiable information (PII), and mitigating hallucinations. Although foundation models (FMs) on Amazon Bedrock offer built-in protections, these are often model-specific and might not fully align with an organization’s use cases or responsible AI principles. As a result, developers frequently need to implement additional customized safety and privacy controls. This need becomes more pronounced when organizations use multiple FMs across different use cases, because maintaining consistent safeguards is crucial for accelerating development cycles and implementing a uniform approach to responsible AI.
In April 2024, we announced the general availability of Amazon Bedrock Guardrails to help you introduce safeguards, prevent harmful content, and evaluate models against key safety criteria. With Amazon Bedrock Guardrails, you can implement safeguards in your generative AI applications that are customized to your use cases and responsible AI policies. You can create multiple guardrails tailored to different use cases and apply them across multiple FMs, improving user experiences and standardizing safety controls across generative AI applications.
In addition, to enable safeguarding applications using different FMs, Amazon Bedrock Guardrails now supports the ApplyGuardrail API to evaluate user inputs and model responses for custom and third-party FMs available outside of Amazon Bedrock. In this post, we discuss how you can use the ApplyGuardrail API in common generative AI architectures such as third-party or self-hosted large language models (LLMs), or in a self-managed Retrieval Augmented Generation (RAG) architecture, as shown in the following figure.

Solution overview
For this post, we create a guardrail that stops our FM from providing fiduciary advice. The full list of configurations for the guardrail is available in the GitHub repo. You can modify the code as needed for your use case.
Prerequisites
Make sure you have the correct AWS Identity and Access Management (IAM) permissions to use Amazon Bedrock Guardrails. For instructions, see Set up permissions to use guardrails.
Additionally, you should have access to a third-party or self-hosted LLM to use in this walkthrough. For this post, we use the Meta Llama 3 model on Amazon SageMaker JumpStart. For more details, see AWS Managed Policies for SageMaker projects and JumpStart.
You can create a guardrail using the Amazon Bedrock console, infrastructure as code (IaC), or the API. For the example code to create the guardrail, see the GitHub repo. We define two filtering policies within the guardrail that we use for the following examples: a denied topic so the model doesn’t provide fiduciary advice to users, and a contextual grounding check to filter model responses that aren’t grounded in the source information or are irrelevant to the user’s query. For more information about the different guardrail components, see Components of a guardrail. Make sure you’ve created a guardrail before moving forward.
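
For orientation only (the GitHub repo contains the full configuration), a guardrail with these two policies could be created with the boto3 Bedrock client roughly as follows; the topic definition and blocked messages here are illustrative:

import boto3

bedrock = boto3.client("bedrock")

response = bedrock.create_guardrail(
    name="fiduciary-advice-guardrail",
    topicPolicyConfig={
        "topicsConfig": [
            {
                "name": "Fiduciary Advice",
                "definition": "Providing personalized recommendations about financial products, investments, or retirement planning.",
                "type": "DENY",
            }
        ]
    },
    contextualGroundingPolicyConfig={
        "filtersConfig": [
            {"type": "GROUNDING", "threshold": 0.75},
            {"type": "RELEVANCE", "threshold": 0.75},
        ]
    },
    blockedInputMessaging="I can provide general info, but can't fully address your request here.",
    blockedOutputsMessaging="I can provide general info, but can't fully address your request here.",
)
print(response["guardrailId"], response["version"])
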
Using the ApplyGuardrail API
The ApplyGuardrail API allows you to invoke a guardrail regardless of the model used. The guardrail is applied at the text parameter, as demonstrated in the following code:

content = [
    {
        "text": {
            "text": "Is the AB503 Product a better investment than the S&P 500?"
        }
    }
]

For this example, we apply the guardrail to the entire input from the user. If you want to apply guardrails to only certain parts of the input while leaving other parts unprocessed, see Selectively evaluate user input with tags.
If you’re using contextual grounding checks within Amazon Bedrock Guardrails, you need to introduce an additional parameter: qualifiers. This tells the API which parts of the content are the grounding_source (information to use as the source of truth), the query (the prompt sent to the model), and the guard_content (the part of the model response to ground against the grounding source). Contextual grounding checks are only applied to the output, not the input. See the following code:

content = [
    {
        "text": {
            "text": "The AB503 Financial Product is currently offering a non-guaranteed rate of 7%",
            "qualifiers": ["grounding_source"],
        }
    },
    {
        "text": {
            "text": "What's the Guaranteed return rate of your AB503 Product",
            "qualifiers": ["query"],
        }
    },
    {
        "text": {
            "text": "Our Guaranteed Rate is 7%",
            "qualifiers": ["guard_content"],
        }
    },
]

The final required components are the guardrailIdentifier and the guardrailVersion of the guardrail you want to use, and the source, which indicates whether the text being analyzed is a prompt to a model or a response from the model. This is demonstrated in the following code using Boto3; the full code example is available in the GitHub repo:

import boto3
import json

bedrock_runtime = boto3.client('bedrock-runtime')

# Specific guardrail ID and version
guardrail_id = ""  # Adjust with your Guardrail Info
guardrail_version = ""  # Adjust with your Guardrail Info

content = [
    {
        "text": {
            "text": "The AB503 Financial Product is currently offering a non-guaranteed rate of 7%",
            "qualifiers": ["grounding_source"],
        }
    },
    {
        "text": {
            "text": "What's the Guaranteed return rate of your AB503 Product",
            "qualifiers": ["query"],
        }
    },
    {
        "text": {
            "text": "Our Guaranteed Rate is 7%",
            "qualifiers": ["guard_content"],
        }
    },
]

# Call the ApplyGuardrail API
try:
    response = bedrock_runtime.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion=guardrail_version,
        source='OUTPUT',  # or 'INPUT' depending on your use case
        content=content
    )

    # Process the response
    print("API Response:")
    print(json.dumps(response, indent=2))

    # Check the action taken by the guardrail
    if response['action'] == 'GUARDRAIL_INTERVENED':
        print("\nGuardrail intervened. Output:")
        for output in response['outputs']:
            print(output['text'])
    else:
        print("\nGuardrail did not intervene.")

except Exception as e:
    print(f"An error occurred: {str(e)}")
    print("\nAPI Response (if available):")
    try:
        print(json.dumps(response, indent=2))
    except NameError:
        print("No response available due to early exception.")

The response of the API provides the following details:

If the guardrail intervened.
Why the guardrail intervened.
The consumption utilized for the request. For full pricing details for Amazon Bedrock Guardrails, refer to Amazon Bedrock pricing.

The following response shows a guardrail intervening because of denied topics:

{
    "usage": {
        "topicPolicyUnits": 1,
        "contentPolicyUnits": 1,
        "wordPolicyUnits": 1,
        "sensitiveInformationPolicyUnits": 1,
        "sensitiveInformationPolicyFreeUnits": 0,
        "contextualGroundingPolicyUnits": 0
    },
    "action": "GUARDRAIL_INTERVENED",
    "outputs": [
        {
            "text": "I can provide general info about Acme Financial's products and services, but can't fully address your request here. For personalized help or detailed questions, please contact our customer service team directly. For security reasons, avoid sharing sensitive information through this channel. If you have a general product question, feel free to ask without including personal details."
        }
    ],
    "assessments": [
        {
            "topicPolicy": {
                "topics": [
                    {
                        "name": "Fiduciary Advice",
                        "type": "DENY",
                        "action": "BLOCKED"
                    }
                ]
            }
        }
    ]
}

The following response shows a guardrail intervening because of contextual grounding checks:

{
    "usage": {
        "topicPolicyUnits": 1,
        "contentPolicyUnits": 1,
        "wordPolicyUnits": 1,
        "sensitiveInformationPolicyUnits": 1,
        "sensitiveInformationPolicyFreeUnits": 1,
        "contextualGroundingPolicyUnits": 1
    },
    "action": "GUARDRAIL_INTERVENED",
    "outputs": [
        {
            "text": "I can provide general info about Acme Financial's products and services, but can't fully address your request here. For personalized help or detailed questions, please contact our customer service team directly. For security reasons, avoid sharing sensitive information through this channel. If you have a general product question, feel free to ask without including personal details."
        }
    ],
    "assessments": [
        {
            "contextualGroundingPolicy": {
                "filters": [
                    {
                        "type": "GROUNDING",
                        "threshold": 0.75,
                        "score": 0.38,
                        "action": "BLOCKED"
                    },
                    {
                        "type": "RELEVANCE",
                        "threshold": 0.75,
                        "score": 0.9,
                        "action": "NONE"
                    }
                ]
            }
        }
    ]
}

From the response to the first request, you can observe that the guardrail intervened so it wouldn’t provide fiduciary advice to a user who asked for a recommendation about a financial product. From the response to the second request, you can observe that the guardrail intervened to filter out the hallucinated guaranteed return rate in the model response, which deviates from the information in the grounding source. In both cases, the guardrail intervened as expected to make sure that the model responses provided to the user avoid certain topics and are factually accurate based on the source, potentially to meet regulatory requirements or internal company policies.
Using the ApplyGuardrail API with a self-hosted LLM
A common use case for the ApplyGuardrail API is in conjunction with an LLM from a third-party provider or a model that you self-host. This combination allows you to apply guardrails to the input or output of your requests.
The general flow includes the following steps:

Receive an input for your model.
Apply the guardrail to this input using the ApplyGuardrail API.
If the input passes the guardrail, send it to your model for inference.
Receive the output from your model.
Apply the guardrail to your output.
If the output passes the guardrail, return the final output.
If the guardrail intervenes on either the input or the output, return the defined message indicating the intervention.

This workflow is demonstrated in the following diagram.

See the provided code example to see an implementation of the workflow.
We use the Meta-Llama-3-8B model hosted on an Amazon SageMaker endpoint. To deploy your own version of this model on SageMaker, see Meta Llama 3 models are now available in Amazon SageMaker JumpStart.
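
A short sketch of such a JumpStart deployment with the SageMaker Python SDK is shown below; the model ID reflects the Llama 3 8B listing at the time of writing and may change, so check the JumpStart catalog before using it:

from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="meta-textgeneration-llama-3-8b")
predictor = model.deploy(accept_eula=True)  # Meta Llama models require accepting the EULA
print(predictor.endpoint_name)  # use this endpoint name in the guardrail workflow
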
We created a TextGenerationWithGuardrails class that integrates the ApplyGuardrail API with a SageMaker endpoint to provide protected text generation. This class includes the following key methods:

generate_text – Calls our LLM through a SageMaker endpoint to generate text based on the input.
analyze_text – A core method that applies our guardrail using the ApplyGuardrail API. It interprets the API response to determine if the guardrail passed or intervened.
analyze_prompt and analyze_output – These methods use analyze_text to apply our guardrail to the input prompt and generated output, respectively. They return a tuple indicating whether the guardrail passed and associated messages.

The class implements the workflow in the preceding diagram. It works as follows:

It checks the input prompt using analyze_prompt.
If the input passes the guardrail, it generates text using generate_text.
The generated text is then checked using analyze_output.
If both guardrails pass, the generated text is returned. Otherwise, an intervention message is provided.

This structure allows for comprehensive safety checks both before and after text generation, with clear handling of cases where guardrails intervene. It’s designed to integrate with larger applications while providing flexibility for error handling and customization based on guardrail results.
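
A condensed sketch of how such a class could be wired together is shown below. The post’s GitHub repo contains the full implementation; the method names here mirror the description above, and the response parsing is deliberately simplified.

import json
import boto3

class TextGenerationWithGuardrails:
    def __init__(self, endpoint_name, guardrail_id, guardrail_version):
        self.sm_runtime = boto3.client("sagemaker-runtime")
        self.bedrock_runtime = boto3.client("bedrock-runtime")
        self.endpoint_name = endpoint_name
        self.guardrail_id = guardrail_id
        self.guardrail_version = guardrail_version

    def generate_text(self, prompt):
        # Call the LLM through the SageMaker endpoint.
        response = self.sm_runtime.invoke_endpoint(
            EndpointName=self.endpoint_name,
            ContentType="application/json",
            Body=json.dumps({"inputs": prompt}),
        )
        # Exact parsing depends on the serving container's response format.
        return json.loads(response["Body"].read())

    def analyze_text(self, content, source):
        # Apply the guardrail and interpret whether it passed or intervened.
        response = self.bedrock_runtime.apply_guardrail(
            guardrailIdentifier=self.guardrail_id,
            guardrailVersion=self.guardrail_version,
            source=source,
            content=content,
        )
        passed = response["action"] != "GUARDRAIL_INTERVENED"
        message = "" if passed else response["outputs"][0]["text"]
        return passed, message

    def analyze_prompt(self, prompt):
        return self.analyze_text([{"text": {"text": prompt}}], source="INPUT")

    def analyze_output(self, prompt, output, grounding_source):
        content = [
            {"text": {"text": grounding_source, "qualifiers": ["grounding_source"]}},
            {"text": {"text": prompt, "qualifiers": ["query"]}},
            {"text": {"text": output, "qualifiers": ["guard_content"]}},
        ]
        return self.analyze_text(content, source="OUTPUT")
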
We can test this by providing the following inputs:

query = "What is the Guaranteed Rate of Return for AB503 Product"
grounding_source = "The AB503 Financial Product is currently offering a non-guaranteed rate of 7%"

For demonstration purposes, we have not followed Meta best practices for prompting Meta Llama; in real-world scenarios, make sure you’re adhering to model provider best practices when prompting LLMs.
The model responds with the following:

Here is what the Model Responded with: ?
The guaranteed rate of return for AB503 product is 4.25% per annum. This rate is guaranteed for the entire term of the investment, which is 5 years. The guaranteed rate of return is based on the performance of the underlying assets and is subject to the creditworthiness of the issuer.
What are the Key Features of AB503 Product?
The key features of AB503 product are:
Guaranteed Rate of Return: 4.25% per annum for 5 years
Minimum Investment: $1,000
Maximum Investment: $100,000
Maturity Date: 5 years from the date of investment
Interest Payment Frequency: Annually
Principal Protection: 100% of the principal amount is guaranteed
Credit Risk: The product is issued by a reputable financial institution and is subject to credit risk
Liquidity: The product can be redeemed at the end of the term or earlier, subject to certain conditions
What are the Risks Associated with AB503 Product?
The risks associated with AB503 product are:
Credit Risk: The product is issued by a financial institution and is subject to credit risk. If the issuer defaults, you may lose some or all of your investment.
Interest Rate Risk: The guaranteed rate of return is fixed and may not keep pace with inflation or changes in interest rates.
Liquidity Risk: The product can be redeemed at the end of the term or earlier, subject to certain conditions. If you need to access your funds before the maturity date, you may not be able to do so or may have to sell your investment at a loss.
Market Risk: The value of the underlying assets may fluctuate, which could affect the value of your investment.
What are the Benefits of AB503 Product?
The benefits of AB503 product are:
Guaranteed Rate of Return: The product offers a guaranteed rate of return of 4.25% per annum for 5 years, which can provide a predictable income stream.
Principal Protection: 100% of the principal amount is guaranteed, which means that you will not lose any of your initial investment.
Liquidity: The product can be redeemed at the end of the term or earlier, subject to certain conditions, which can provide flexibility and access to your funds when needed.
Diversification: The product can be used as a diversification tool to reduce the risk of your overall investment portfolio.
What are the Eligibility Criteria for AB503 Product?
The eligibility criteria for AB503 product are:
Age: The product is available to individuals

This is a hallucinated response to our question. You can see this demonstrated through the outputs of the workflow.

=== Input Analysis ===

Input Prompt Passed The Guardrail Check – Moving to Generate the Response

=== Text Generation ===

Here is what the Model Responded with: ?
The guaranteed rate of return for AB503 product is 4.25% per annum. This rate is guaranteed for the entire term of the investment, which is 5 years. The guaranteed rate of return is based on the performance of the underlying assets and is subject to the creditworthiness of the issuer.
What are the Key Features of AB503 Product?
The key features of AB503 product are:
Guaranteed Rate of Return: 4.25% per annum for 5 years
Minimum Investment: $1,000
Maximum Investment: $100,000
Maturity Date: 5 years from the date of investment
Interest Payment Frequency: Annually
Principal Protection: 100% of the principal amount is guaranteed
Credit Risk: The product is issued by a reputable financial institution and is subject to credit risk
Liquidity: The product can be redeemed at the end of the term or earlier, subject to certain conditions
What are the Risks Associated with AB503 Product?
The risks associated with AB503 product are:
Credit Risk: The product is issued by a financial institution and is subject to credit risk. If the issuer defaults, you may lose some or all of your investment.
Interest Rate Risk: The guaranteed rate of return is fixed and may not keep pace with inflation or changes in interest rates.
Liquidity Risk: The product can be redeemed at the end of the term or earlier, subject to certain conditions. If you need to access your funds before the maturity date, you may not be able to do so or may have to sell your investment at a loss.
Market Risk: The value of the underlying assets may fluctuate, which could affect the value of your investment.
What are the Benefits of AB503 Product?
The benefits of AB503 product are:
Guaranteed Rate of Return: The product offers a guaranteed rate of return of 4.25% per annum for 5 years, which can provide a predictable income stream.
Principal Protection: 100% of the principal amount is guaranteed, which means that you will not lose any of your initial investment.
Liquidity: The product can be redeemed at the end of the term or earlier, subject to certain conditions, which can provide flexibility and access to your funds when needed.
Diversification: The product can be used as a diversification tool to reduce the risk of your overall investment portfolio.
What are the Eligibility Criteria for AB503 Product?
The eligibility criteria for AB503 product are:
Age: The product is available to individuals

=== Output Analysis ===

Analyzing Model Response with the Response Guardrail

Output Guardrail Intervened. The response to the User is: I can provide general info about Acme Financial’s products and services, but can’t fully address your request here. For personalized help or detailed questions, please contact our customer service team directly. For security reasons, avoid sharing sensitive information through this channel. If you have a general product question, feel free to ask without including personal details.

Full API Response:
{
    "ResponseMetadata": {
        "RequestId": "6bfb900f-e60c-4861-87b4-bb555bbe3d9e",
        "HTTPStatusCode": 200,
        "HTTPHeaders": {
            "date": "Mon, 29 Jul 2024 17:37:01 GMT",
            "content-type": "application/json",
            "content-length": "1637",
            "connection": "keep-alive",
            "x-amzn-requestid": "6bfb900f-e60c-4861-87b4-bb555bbe3d9e"
        },
        "RetryAttempts": 0
    },
    "usage": {
        "topicPolicyUnits": 3,
        "contentPolicyUnits": 3,
        "wordPolicyUnits": 3,
        "sensitiveInformationPolicyUnits": 3,
        "sensitiveInformationPolicyFreeUnits": 3,
        "contextualGroundingPolicyUnits": 3
    },
    "action": "GUARDRAIL_INTERVENED",
    "outputs": [
        {
            "text": "I can provide general info about Acme Financial's products and services, but can't fully address your request here. For personalized help or detailed questions, please contact our customer service team directly. For security reasons, avoid sharing sensitive information through this channel. If you have a general product question, feel free to ask without including personal details."
        }
    ],
    "assessments": [
        {
            "contextualGroundingPolicy": {
                "filters": [
                    {
                        "type": "GROUNDING",
                        "threshold": 0.75,
                        "score": 0.01,
                        "action": "BLOCKED"
                    },
                    {
                        "type": "RELEVANCE",
                        "threshold": 0.75,
                        "score": 1.0,
                        "action": "NONE"
                    }
                ]
            }
        }
    ]
}

In the workflow output, you can see that the input prompt passed the guardrail’s check and the workflow proceeded to generate a response. The workflow then calls the guardrail to check the model output before presenting it to the user. You can observe that the contextual grounding check intervened because it detected that the model response was not factually accurate based on the information from the grounding source. The workflow therefore returned the defined guardrail intervention message instead of a response that is considered ungrounded and factually incorrect.
Using the ApplyGuardrail API within a self-managed RAG pattern
A common use case for the ApplyGuardrail API is pairing it with an LLM from a third-party provider, or a model that you self-host, applied within a RAG pattern.
The general flow includes the following steps:

Receive an input for your model.
Apply the guardrail to this input using the ApplyGuardrail API.
If the input passes the guardrail, send it to your embeddings model for query embedding, and query your vector embeddings.
Receive the relevant documents from your vector search and use them as context.
Provide the context to your language model along with input for inference.
Apply the guardrail to your output and use the context as grounding source.
If the output passes the guardrail, return the final output.
If the guardrail intervenes on either the input or the output, return the defined message indicating which stage triggered the intervention.

This workflow is demonstrated in the following diagram.

See the provided code example for an implementation of this workflow.
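For orientation, the following minimal sketch (not the full repository implementation) shows how steps 2 and 6 of this flow could call the ApplyGuardrail API with boto3; the guardrail ID and version are placeholders, and the helper names are our own.

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

GUARDRAIL_ID = "your-guardrail-id"   # placeholder
GUARDRAIL_VERSION = "1"              # placeholder

def input_passes_guardrail(prompt: str) -> bool:
    # Step 2: apply the guardrail to the incoming user input
    response = bedrock_runtime.apply_guardrail(
        guardrailIdentifier=GUARDRAIL_ID,
        guardrailVersion=GUARDRAIL_VERSION,
        source="INPUT",
        content=[{"text": {"text": prompt}}],
    )
    return response["action"] != "GUARDRAIL_INTERVENED"

def output_passes_guardrail(model_output: str, query: str, grounding_source: str) -> bool:
    # Step 6: apply the guardrail to the model output, passing the retrieved
    # documents as the grounding source for the contextual grounding check
    response = bedrock_runtime.apply_guardrail(
        guardrailIdentifier=GUARDRAIL_ID,
        guardrailVersion=GUARDRAIL_VERSION,
        source="OUTPUT",
        content=[
            {"text": {"text": grounding_source, "qualifiers": ["grounding_source"]}},
            {"text": {"text": query, "qualifiers": ["query"]}},
            {"text": {"text": model_output, "qualifiers": ["guard_content"]}},
        ],
    )
    return response["action"] != "GUARDRAIL_INTERVENED"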
For our examples, we use a self-hosted SageMaker model for our LLM, but this could be other third-party models as well.
We use the Meta-Llama-3-8B model hosted on a SageMaker endpoint. For embeddings, we use the voyage-large-2-instruct model. To learn more about Voyage AI embeddings models, see Voyage AI.
We enhanced our TextGenerationWithGuardrails class to integrate embeddings, run document retrieval, and use the ApplyGuardrail API with our SageMaker endpoint. This protects text generation with contextually relevant information. The class now includes the following key methods:

generate_text – Calls our LLM using a SageMaker endpoint to generate text based on the input.
analyze_text – A core method that applies the guardrail using the ApplyGuardrail API. It interprets the API response to determine if the guardrail passed or intervened.
analyze_prompt and analyze_output – These methods use analyze_text to apply the guardrail to the input prompt and generated output, respectively. They return a tuple indicating whether the guardrail passed and any associated message.
embed_text – Embeds the given text using a specified embedding model.
retrieve_relevant_documents – Retrieves the most relevant documents based on cosine similarity between the query embedding and document embeddings (see the sketch after this list).
generate_and_analyze – A comprehensive method that combines all steps of the process, including embedding, document retrieval, text generation, and guardrail checks.
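As an illustration of the retrieval step referenced above, the following sketch shows one way retrieve_relevant_documents could rank documents by cosine similarity. The embed_text argument is a placeholder for your embedding call (Voyage AI in our example) and is an assumption, not the repository code.

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two embedding vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve_relevant_documents(query: str, documents: list, embed_text, top_k: int = 1) -> list:
    # embed_text is a placeholder callable that returns an embedding vector for a string
    query_emb = np.array(embed_text(query))
    doc_embs = [np.array(embed_text(doc)) for doc in documents]
    scores = [cosine_similarity(query_emb, emb) for emb in doc_embs]
    ranked = sorted(zip(scores, documents), key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:top_k]]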

The enhanced class implements the following workflow:

It first checks the input prompt using analyze_prompt.
If the input passes the guardrail, it embeds the query and retrieves relevant documents.
The retrieved documents are appended to the original query to create an enhanced query.
Text is generated using generate_text with the enhanced query.
The generated text is checked using analyze_output, with the retrieved documents serving as the grounding source.
If both guardrails pass, the generated text is returned. Otherwise, an intervention message is provided.

This structure allows for comprehensive safety checks both before and after text generation, while also incorporating relevant context from a document collection. It’s designed with the following objectives:

Enforce safety through multiple guardrail checks
Enhance relevance by incorporating retrieved documents into the generation process
Provide flexibility for error handling and customization based on guardrail results
Integrate with larger applications

You can further customize the class to adjust the number of retrieved documents, modify the embedding process, or alter how retrieved documents are incorporated into the query. This makes it a versatile tool for safe and context-aware text generation in various applications.
Let’s test out the implementation with the following input prompt:

query = "What is the Guaranteed Rate of Return for AB503 Product?"

We use the following documents as inputs into the workflow:

documents = [
    "The AG701 Global Growth Fund is currently projecting an annual return of 8.5%, focusing on emerging markets and technology sectors.",
    "The AB205 Balanced Income Trust offers a steady 4% dividend yield, combining blue-chip stocks and investment-grade bonds.",
    "The AE309 Green Energy ETF has outperformed the market with a 12% return over the past year, investing in renewable energy companies.",
    "The AH504 High-Yield Corporate Bond Fund is offering a current yield of 6.75%, targeting BB and B rated corporate debt.",
    "The AR108 Real Estate Investment Trust focuses on commercial properties and is projecting a 7% annual return including quarterly distributions.",
    "The AB503 Financial Product is currently offering a non-guaranteed rate of 7%, providing a balance of growth potential and flexible investment options.",
]

The following is an example output of the workflow:

=== Query Embedding ===

Query: What is the Guaranteed Rate of Return for AB503 Product?
Query embedding (first 5 elements): [-0.024676240980625153, 0.0432446151971817, 0.008557720109820366, 0.059132225811481476, -0.045152030885219574]…

=== Document Embedding ===

Document 1: The AG701 Global Growth Fund is currently projecti…
Embedding (first 5 elements): [-0.012595066800713539, 0.052137792110443115, 0.011615722440183163, 0.017397189512848854, -0.06500907987356186]…

Document 2: The AB205 Balanced Income Trust offers a steady 4%…
Embedding (first 5 elements): [-0.024578886106610298, 0.03796630725264549, 0.004817029926925898, 0.03752804920077324, -0.060099825263023376]…

Document 3: The AE309 Green Energy ETF has outperformed the ma…
Embedding (first 5 elements): [-0.016489708796143532, 0.04436756297945976, 0.006371065974235535, 0.0194888636469841, -0.07305170595645905]…

Document 4: The AH504 High-Yield Corporate Bond Fund is offeri…
Embedding (first 5 elements): [-0.005198546685278416, 0.05041510611772537, -0.007950469851493835, 0.047702062875032425, -0.06752850860357285]…

Document 5: The AR108 Real Estate Investment Trust focuses on …
Embedding (first 5 elements): [-0.03276287764310837, 0.04030522331595421, 0.0025598432403057814, 0.022755954414606094, -0.048687443137168884]…

Document 6: The AB503 Financial Product is currently offering …
Embedding (first 5 elements): [-0.00174321501981467, 0.05635036155581474, -0.030949480831623077, 0.028832541778683662, -0.05486077815294266]…

=== Document Retrieval ===

Retrieved Document:
[
    "The AB503 Financial Product is currently offering a non-guaranteed rate of 7%, providing a balance of growth potential and flexible investment options."
]

The retrieved document is provided as the grounding source for the call to the ApplyGuardrail API:

=== Input Analysis ===

Input Prompt Passed The Guardrail Check – Moving to Generate the Response

=== Text Generation ===

Here is what the Model Responded with: However, investors should be aware that the actual return may vary based on market conditions and other factors.

What is the guaranteed rate of return for the AB503 product?

A) 0%
B) 7%
C) Not applicable
D) Not provided

Correct answer: A) 0%

Explanation: The text states that the rate of return is “non-guaranteed,” which means that there is no guaranteed rate of return. Therefore, the correct answer is A) 0%. The other options are incorrect because the text does not provide a guaranteed rate of return, and the non-guaranteed rate of 7% is not a guaranteed rate of return. Option C is incorrect because the text does provide information about the rate of return, and option D is incorrect because the text does provide information about the rate of return, but it is not guaranteed.

=== Output Analysis ===

Analyzing Model Response with the Response Guardrail

Output Guardrail Intervened. The response to the User is: I can provide general info about Acme Financial’s products and services, but can’t fully address your request here. For personalized help or detailed questions, please contact our customer service team directly. For security reasons, avoid sharing sensitive information through this channel. If you have a general product question, feel free to ask without including personal details.

Full API Response:
{
  "ResponseMetadata": {
    "RequestId": "5f2d5cbd-e6f0-4950-bb40-8c0be27df8eb",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Mon, 29 Jul 2024 17:52:36 GMT",
      "content-type": "application/json",
      "content-length": "1638",
      "connection": "keep-alive",
      "x-amzn-requestid": "5f2d5cbd-e6f0-4950-bb40-8c0be27df8eb"
    },
    "RetryAttempts": 0
  },
  "usage": {
    "topicPolicyUnits": 1,
    "contentPolicyUnits": 1,
    "wordPolicyUnits": 1,
    "sensitiveInformationPolicyUnits": 1,
    "sensitiveInformationPolicyFreeUnits": 1,
    "contextualGroundingPolicyUnits": 1
  },
  "action": "GUARDRAIL_INTERVENED",
  "outputs": [
    {
      "text": "I can provide general info about Acme Financial’s products and services, but can’t fully address your request here. For personalized help or detailed questions, please contact our customer service team directly. For security reasons, avoid sharing sensitive information through this channel. If you have a general product question, feel free to ask without including personal details."
    }
  ],
  "assessments": [
    {
      "contextualGroundingPolicy": {
        "filters": [
          {
            "type": "GROUNDING",
            "threshold": 0.75,
            "score": 0.38,
            "action": "BLOCKED"
          },
          {
            "type": "RELEVANCE",
            "threshold": 0.75,
            "score": 0.97,
            "action": "NONE"
          }
        ]
      }
    }
  ]
}

You can see that the guardrail intervened because of the following source document statement:

[
    "The AB503 Financial Product is currently offering a non-guaranteed rate of 7%, providing a balance of growth potential and flexible investment options."
]

Whereas the model responded with the following:

Here is what the Model Responded with: However, investors should be aware that the actual return may vary based on market conditions and other factors.

What is the guaranteed rate of return for the AB503 product?

A) 0%
B) 7%
C) Not applicable
D) Not provided

Correct answer: A) 0%

Explanation: The text states that the rate of return is “non-guaranteed,” which means that there is no guaranteed rate of return. Therefore, the correct answer is A) 0%. The other options are incorrect because the text does not provide a guaranteed rate of return, and the non-guaranteed rate of 7% is not a guaranteed rate of return. Option C is incorrect because the text does provide information about the rate of return, and option D is incorrect because the text does provide information about the rate of return, but it is not guaranteed.

This demonstrated a hallucination; the guardrail intervened and presented the user with the defined message instead of a hallucinated answer.
Pricing
Pricing for the solution is largely dependent on the following factors:

Text characters sent to the guardrail – For a full breakdown of the pricing, see Amazon Bedrock pricing
Self-hosted model infrastructure costs – Provider dependent
Third-party managed model token costs – Provider dependent

Clean up
To delete any infrastructure provisioned in this example, follow the instructions in the GitHub repo.
Conclusion
You can use the ApplyGuardrail API to decouple safeguards for your generative AI applications from FMs. You can now use guardrails without invoking FMs, which opens the door to integrating standardized and thoroughly tested enterprise safeguards into your application flow regardless of the models used. Try out the example code in the GitHub repo and provide any feedback you might have. To learn more about Amazon Bedrock Guardrails and the ApplyGuardrail API, see Amazon Bedrock Guardrails.

About the Authors
Michael Cho is a Solutions Architect at AWS, where he works with customers to accelerate their mission on the cloud. He is passionate about architecting and building innovative solutions that empower customers. Lately, he has been dedicating his time to experimenting with Generative AI for solving complex business problems.
Aarushi Karandikar is a Solutions Architect at Amazon Web Services (AWS), responsible for providing Enterprise ISV customers with technical guidance on their cloud journey. She studied Data Science at UC Berkeley and specializes in Generative AI technology.
Riya Dani is a Solutions Architect at Amazon Web Services (AWS), responsible for helping Enterprise customers on their journey in the cloud. She has a passion for learning and holds a Bachelor’s & Master’s degree in Computer Science from Virginia Tech. In her free time, she enjoys staying active and reading.
Raj Pathak is a Principal Solutions Architect and Technical advisor to Fortune 50 and Mid-Sized FSI (Banking, Insurance, Capital Markets) customers across Canada and the United States. Raj specializes in Machine Learning with applications in Generative AI, Natural Language Processing, Intelligent Document Processing, and MLOps.

Prithvi WxC Released by IBM and NASA: A 2.3 Billion Parameter Foundati …

Climate and weather prediction has experienced rapid advancements through machine learning and deep learning models. Researchers have started to rely on artificial intelligence (AI) to enhance prediction accuracy and computational efficiency. Traditional numerical weather prediction (NWP) models have been effective but require substantial computational resources, making them less accessible and harder to apply at larger scales. Meanwhile, deep learning models can capture complex patterns and relationships within the atmosphere using significantly fewer computational resources. This paradigm shift allows researchers to develop more scalable and versatile models, facilitating predictions critical for both short-term weather forecasting and long-term climate modeling.

A fundamental problem in weather and climate forecasting is that traditional models struggle to capture non-linear atmospheric processes, especially at finer resolutions. The lack of a unified model that simultaneously addresses various use cases, such as regional weather predictions, extreme event forecasting, and climate impact analysis, poses a significant challenge. Also, there is a need for models that can work effectively across different spatial and temporal scales. This gap is further highlighted when dealing with localized extreme events, which require high-resolution data that many models struggle to process without incurring high computational costs. Thus, developing a single, large-scale AI model that addresses multiple forecasting challenges can substantially improve existing approaches.

Current deep-learning models for atmospheric sciences, like FourCastNet, Pangu, and GraphCast, are largely designed for specific forecasting tasks. These models focus on issues such as near-term forecasting but lack the flexibility needed for a broader range of applications. Furthermore, most of these models utilize task-specific architectures and objectives, limiting their ability to perform under diverse forecasting scenarios, especially in long-term predictions or complex climate modeling tasks. As a result, these models, while advanced, often lack the generalizability needed for comprehensive climate research.

Researchers from IBM Research and NASA have introduced Prithvi WxC, a 2.3 billion parameter foundation model for weather and climate forecasting. The Prithvi WxC model incorporates 160 variables from the Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2), a high-resolution dataset covering global atmospheric conditions. This model employs a state-of-the-art encoder-decoder transformer-based architecture, allowing it to capture local and global dependencies in the atmospheric data efficiently. Using a transformer model facilitates handling long-range dependencies in the data, making it possible to model complex atmospheric interactions at various scales, from local to global.

Prithvi WxC’s core architecture features a combination of local and global attention mechanisms that enable it to process large token counts, effectively capturing spatial and temporal patterns in the input data. It also employs a mixed objective function that integrates masked reconstruction and forecasting tasks. This unique approach allows the model to generalize well across different applications, ranging from autoregressive rollout forecasting to estimating extreme weather events. Also, the model incorporates a pretraining phase with 25 encoder and 5 decoder blocks, utilizing advanced AI techniques such as masked autoencoding and variable lead-time prediction. The model’s flexibility is further enhanced by its ability to incorporate additional tokens from off-grid measurements during fine-tuning, making it adaptable for various downstream applications.
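The paper’s training code is not reproduced here, but the mixed objective described above can be illustrated with a generic PyTorch sketch that combines a masked-reconstruction loss on randomly hidden input patches with a forecasting loss on a future atmospheric state; the model interface, tensor shapes, and weighting coefficient are all illustrative assumptions rather than the authors’ implementation.

import torch
import torch.nn.functional as F

def mixed_objective(model, x_t, x_future, mask_ratio=0.5, alpha=0.5):
    # Generic sketch of a masked-reconstruction + forecasting loss.
    # x_t, x_future: (batch, tokens, channels) atmospheric states (assumed shapes).
    mask = torch.rand(x_t.shape[:2], device=x_t.device) < mask_ratio
    x_masked = x_t.clone()
    x_masked[mask] = 0.0                      # hide a fraction of the input patches

    recon, forecast = model(x_masked)         # assumed model with two output heads
    recon_loss = F.mse_loss(recon[mask], x_t[mask])    # rebuild the masked patches
    forecast_loss = F.mse_loss(forecast, x_future)     # predict the future state
    return alpha * recon_loss + (1 - alpha) * forecast_loss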

During the evaluation, Prithvi WxC showed superior performance in several benchmarks. One of the highlights was its ability to accurately predict the track and intensity of Hurricane Ida, achieving a mean track error of just 63.9 km compared to 201.9 km for other leading models. The model was also tested on downstream tasks like downscaling, where it demonstrated a remarkable spatial root mean square error (RMSE) of 0.73 K when predicting 2-meter air temperature, outperforming traditional methods by a factor of four. Its capabilities extend to gravity wave flux parameterization, where it outperformed baseline models by successfully predicting momentum fluxes in the upper troposphere.

Key Takeaways from the Research:

Prithvi WxC is a 2.3 billion parameter foundation model incorporating 160 atmospheric variables.

The model utilizes a transformer-based architecture with local and global attention mechanisms.

It achieved a mean track error of 63.9 km for Hurricane Ida, significantly outperforming other models.

Prithvi WxC has shown a spatial RMSE of 0.73 K in downscaling tasks, surpassing traditional methods by a factor of four.

The model’s unique training approach integrates masked reconstruction and forecasting, making it adaptable to various atmospheric applications.

Researchers have demonstrated its effectiveness in multiple downstream tasks, including extreme event prediction and gravity wave flux parameterization.

In conclusion, the development of Prithvi WxC signifies a significant leap in weather and climate modeling, providing a scalable and versatile solution that addresses the limitations of current models. Its ability to handle multiple tasks using a unified architecture positions it as a potential cornerstone for future advancements in climate science. The model’s success across various benchmarks and its superior handling of complex atmospheric interactions indicates that foundation models like Prithvi WxC could revolutionize the way weather and climate predictions are made, improving accuracy and reducing computational costs.

Check out the Paper, Model Card on Hugging Face, and GitHub Page. All credit for this research goes to the researchers of this project.

The post Prithvi WxC Released by IBM and NASA: A 2.3 Billion Parameter Foundation Model for Weather and Climate appeared first on MarkTechPost.

Researchers from KAIST and Google AI Introduce Blockwise Parallel Deco …

Recent advances in autoregressive language models have brought about a remarkable transformation in the field of Natural Language Processing (NLP). These models, such as GPT and others, have exhibited excellent performance in text generation tasks, including question-answering and summarization. However, their high inference latency poses a significant barrier to their general application, particularly in very deep models with hundreds of billions of parameters. This lag is inherent to autoregressive generation, which produces text one token at a time, in sequence. This leads to a significant increase in computing demand, which restricts the models’ ability to be deployed in real time.

To address this problem, a team of researchers from KAIST and Google has developed Blockwise Parallel Decoding (BPD), a method designed to speed up the inference of these models. In contrast to typical autoregressive methods, BPD predicts several future tokens at once; these predictions are known as block drafts. Multiple prediction heads construct the block drafts in parallel, and the autoregressive model then selects and conditionally accepts the best-fit tokens.

Because several tokens are proposed simultaneously, this technique greatly accelerates inference by decreasing the time spent waiting for sequential token predictions. But BPD comes with its own set of difficulties, especially in making sure the block drafts are precise and well-organized enough for the model to accept them.
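To make the acceptance step concrete, here is a generic sketch (not the authors’ implementation) of how a block draft can be conditionally accepted: the base model verifies the drafted positions in a single pass, and only the longest prefix that matches its own greedy choices is kept.

def accept_block_draft(draft_tokens, verifier_tokens):
    # draft_tokens: tokens proposed by the parallel prediction heads
    # verifier_tokens: the base model's greedy choices at the same positions,
    # assumed to come from one parallel verification pass
    accepted = []
    for drafted, verified in zip(draft_tokens, verifier_tokens):
        if drafted != verified:
            break                 # stop at the first disagreement
        accepted.append(drafted)
    return accepted

# Example: if the heads draft [12, 7, 99] and the base model agrees on [12, 7, 41],
# only [12, 7] is accepted and decoding resumes after those two tokens.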

The team has shared two key ways in which the effectiveness of the block drafts has been advanced. First, they examined the token distributions generated by the multiple prediction heads in BPD. The goal of this analysis is to better understand how the model simultaneously generates several tokens and how to optimize these predictions for increased fluency and accuracy. By analyzing these token distributions, trends or irregularities that could impair block draft performance can be spotted.

Second, using this research, the study creates algorithms that improve the block drafts. The team has specifically suggested employing neural language models and n-gram models to enhance the block drafts’ quality prior to the autoregressive model’s verification. While neural language models provide more sophisticated context awareness, which helps to make block drafts more in line with the model’s expectations, n-gram models help guarantee local consistency in token predictions.

The study’s testing yielded encouraging results: improved block drafts increased block efficiency (a measure of how many tokens from the block draft are eventually accepted by the autoregressive model) by 5-21%. These gains were shown on several different datasets, indicating the method’s resilience.

The team has summarized their primary contributions as follows. 

The study looks into how prediction heads behave in blockwise parallel decoding (BPD) models, finding evidence of falling confidence in predictions for later tokens and significant consecutive token repetition (20% to 75%). This draws attention to poor block draft quality.

The team has proposed the notion of Oracle top-k block efficiency. They demonstrate that block efficiency can be greatly increased by lowering repetition and uncertainty and taking into account the top-k most likely tokens for each head.

Two algorithms have been introduced: Global rescoring using n-gram models, which efficiently rescores many candidate drafts, and Local rescoring using neural LMs, which refines block drafts for fluency and coherence. These techniques maximize resource utilization while increasing block efficiency by up to 21.3%.

Check out the Paper. All credit for this research goes to the researchers of this project.

The post Researchers from KAIST and Google AI Introduce Blockwise Parallel Decoding (BCD): An AI Method for Rescoring Algorithms for Improved Efficiency and Fluency in Language Models appeared first on MarkTechPost.

MALPOLON: A Cutting-Edge AI Framework Designed to Enhance Species Dist …

Species distribution modeling (SDM) has become an indispensable tool in ecological research, enabling scientists to predict species distribution patterns across geographic regions using environmental and observational data. These models help analyze the impact of environmental factors and human activities on species occurrence and abundance, providing insights critical to conservation strategies and biodiversity management. Over the years, SDMs have evolved from basic statistical methods to advanced machine-learning approaches that offer improved prediction accuracy and scalability. However, incorporating complex data types like remote sensing imagery and time series into traditional SDMs remains a significant challenge. Researchers have been actively seeking solutions to make SDMs more efficient and adaptable to large, diverse datasets, aiming to enhance the models’ ability to predict species distributions under changing environmental conditions.

Despite these advancements, conventional SDMs still face numerous challenges, primarily due to their inability to effectively integrate complex and heterogeneous datasets. Traditional methods like Generalized Linear Models (GLM), Generalized Additive Models (GAM), and Maximum Entropy (MAXENT) are widely used but are inherently limited in their capacity to capture intricate ecological interactions. These methods often require substantial manual intervention for data preparation and parameter tuning, which becomes increasingly impractical when dealing with extensive datasets, such as multi-spectral satellite imagery or high-dimensional climatic variables. Furthermore, existing models typically focus on single-species predictions, necessitating multiple individual models when simultaneously predicting distributions for numerous species. This approach is computationally expensive and lacks the scalability needed for large-scale ecological studies.

To address these limitations, researchers have started exploring deep learning methods, which can model complex relationships between various environmental predictors and species observations. Deep learning models, such as CNNs and Transformers, have shown promising results in capturing the spatial and temporal variability of species distributions. However, the adoption of deep learning for SDMs has been hindered by accessibility barriers, as it requires expertise in Python and access to GPU resources. Frameworks like sjSDM have integrated deep learning capabilities within the R programming environment but suffer from reduced efficiency and usability issues. Consequently, there has been a growing need for a framework that simplifies the integration of deep learning into SDMs while ensuring modularity and ease of use.

A research team from INRIA, the University of West Bohemia, the Swiss Federal Institute for Forest, and Université Paul Valéry developed the MALPOLON framework, a comprehensive Python-based deep species distribution modeling tool. This innovative framework, built using PyTorch and PyTorch Lightning, provides a seamless platform for training and inferring deep SDMs. MALPOLON’s design caters to novice and advanced users, offering a range of plug-and-play examples and a highly modular structure. It supports multi-modal data integration, allowing researchers to combine diverse data types such as satellite images, climatic time series, and environmental rasters to build robust predictive models. The framework’s modular architecture facilitates straightforward modification of its components, enabling users to easily customize data preprocessing, model structures, and training loops.
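MALPOLON’s own classes are not reproduced here; the following generic PyTorch Lightning sketch only illustrates the kind of modular, multi-modal setup described above (an image encoder and an environmental-variable encoder fused into a shared multi-label classifier), with all module names, shapes, and hyperparameters chosen purely for illustration.

import torch
import torch.nn as nn
import pytorch_lightning as pl

class MultiModalSDM(pl.LightningModule):
    # Illustrative multi-modal species distribution model (not MALPOLON's API)
    def __init__(self, num_species: int, n_env_vars: int = 19):
        super().__init__()
        # Encoder for satellite image patches (assumed 4-band, 64x64 inputs)
        self.image_encoder = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Encoder for tabular environmental / climatic variables
        self.env_encoder = nn.Sequential(nn.Linear(n_env_vars, 32), nn.ReLU())
        self.classifier = nn.Linear(32 + 32, num_species)

    def forward(self, image, env):
        fused = torch.cat([self.image_encoder(image), self.env_encoder(env)], dim=1)
        return self.classifier(fused)

    def training_step(self, batch, batch_idx):
        image, env, labels = batch
        logits = self(image, env)
        # Multi-label presence/absence target, as in multi-species SDMs
        loss = nn.functional.binary_cross_entropy_with_logits(logits, labels.float())
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=1e-3)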

MALPOLON offers significant advantages in terms of performance and scalability. By leveraging PyTorch Lightning’s capabilities, it can perform distributed training across multiple GPUs, reducing computational time while maintaining high efficiency. The research team benchmarked MALPOLON against existing deep SDM frameworks using the GeoLifeCLEF 2024 dataset, which contains over 1.4 million observations of 11,000 species. The multimodal ensemble model (MME) achieved impressive metrics, including a micro-averaged precision of 30.1% and a sample-averaged precision of 29.9%. The model outperformed traditional methods and competing frameworks substantially, showcasing MALPOLON’s capability to effectively handle large, imbalanced datasets. Also, the framework integrates foundational models like GeoCLIP, enhancing its ability to generalize across multiple species and environmental contexts.

The extensive evaluation of MALPOLON highlighted its potential for transforming SDM practices. The framework simplifies the implementation of deep learning models and improves reproducibility and accessibility. It is distributed through GitHub and PyPi, making it readily available to the research community. Moreover, its compatibility with widely used geospatial libraries like TorchGeo further enhances its utility for ecological modeling. The modularity of MALPOLON allows for easy experimentation and customization, promoting its adoption for a range of applications, from species distribution modeling to habitat suitability analysis. The framework’s robust documentation and tutorials enable researchers to adapt MALPOLON to their specific use cases, making it a versatile tool for advancing ecological research.

Key Takeaways from the Research:

The MALPOLON framework integrates deep learning with traditional SDMs, supporting complex datasets like satellite imagery and time series.

It offers a micro-averaged precision of 30.1% and a sample-averaged precision of 29.9%, outperforming traditional models and frameworks.

Modular design and compatibility with PyTorch Lightning allow for easy experimentation and customization.

Supports multi-GPU computation and advanced architectures like CNNs and Transformers.

It is open-sourced on GitHub and PyPi, enabling easy access and collaboration for the research community.

In conclusion, the MALPOLON framework offers a cutting-edge solution to the challenges faced in traditional species distribution modeling. By incorporating advanced deep learning techniques and providing a user-friendly platform, it bridges the gap between machine learning research and ecological modeling. MALPOLON’s performance on the GeoLifeCLEF 2024 dataset demonstrates its potential to enhance prediction accuracy while reducing computational requirements. Its integration with foundational models like GeoCLIP and SatCLIP further solidifies its position as a leading tool for multi-species and multi-modal SDM applications.

Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.

The post MALPOLON: A Cutting-Edge AI Framework Designed to Enhance Species Distribution Modeling Through the Integration of Geospatial Data and Deep Learning Models appeared first on MarkTechPost.

How Schneider Electric uses Amazon Bedrock to identify high-potential …

This post was co-written with Anthony Medeiros, Manager of Solutions Engineering and Architecture for North America Artificial Intelligence, and Adrian Boeh, Senior Data Scientist – NAM AI, from Schneider Electric.
Schneider Electric is a global leader in the digital transformation of energy management and automation. The company specializes in providing integrated solutions that make energy safe, reliable, efficient, and sustainable. Schneider Electric serves a wide range of industries, including smart manufacturing, resilient infrastructure, future-proof data centers, intelligent buildings, and intuitive homes. They offer products and services that encompass electrical distribution, industrial automation, and energy management. Their innovative technologies, extensive range of products, and commitment to sustainability position Schneider Electric as a key player in advancing smart and green solutions for the modern world.
As demand for renewable energy continues to rise, Schneider Electric faces high demand for sustainable microgrid infrastructure. This demand comes in the form of requests for proposals (RFPs), each of which needs to be manually reviewed by a microgrid subject matter expert (SME) at Schneider. Manual review of each RFP was proving too costly and couldn’t be scaled to meet the industry needs. To solve the problem, Schneider turned to Amazon Bedrock and generative artificial intelligence (AI). Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.
In this post, we show how the team at Schneider collaborated with the AWS Generative AI Innovation Center (GenAIIC) to build a generative AI solution on Amazon Bedrock to solve this problem. The solution processes and evaluates each RFP and then routes high-value RFPs to the microgrid SME for approval and recommendation.
Problem Statement
Microgrid infrastructure is a critical element of the growing renewable energy market. A microgrid includes on-site power generation and storage that allow a system to disconnect from the main grid. Schneider Electric offers several important products that allow customers to build microgrid solutions to make their residential buildings, schools, or manufacturing centers more sustainable. Growing public and private investment in this sector has led to an exponential increase in the number of RFPs for microgrid systems.
The RFP documents contain technically complex textual and visual information such as scope of work, parts lists, and electrical diagrams. Moreover, they can be hundreds of pages long. The following figure provides several examples of RFP documents. The RFP size and complexity make reviewing them costly and labor-intensive. An experienced SME is usually required to review an entire RFP and provide an assessment of its applicability to the business and potential for conversion.

Sample request for proposal (RFP) input data

To add additional complexity, the same set of RFP documents might be assessed by multiple business units within Schneider. Each unit might be looking for different requirements that make the opportunity relevant to that sales team.
Given the size and complexity of the RFP documents, the Schneider team needed a way to quickly and accurately identify opportunities where Schneider products offer a competitive advantage and a high potential for conversion. Failure to respond to viable opportunities could result in potential revenue loss, while devoting resources to proposals where the company lacks a distinct competitive edge would lead to an inefficient use of time and effort.
They also needed a solution that could be repurposed for other business units, allowing the impact to extend to the entire enterprise. Successfully handling the influx of RFPs would not only allow the Schneider team to expand their microgrid business, but help businesses and industries adopt a new renewable energy paradigm.
Amazon Bedrock and Generative AI
To help solve this problem, the Schneider team turned to generative AI and Amazon Bedrock. Large language models (LLMs) are now enabling more efficient business processes through their ability to identify and summarize specific categories of information with human-like precision. The volume and complexity of the RFP documents made them an ideal candidate to use generative AI for document processing.
You can use Amazon Bedrock to build and scale generative AI applications with a broad range of FMs. Amazon Bedrock is a fully managed service that includes FMs from Amazon and third-party models supporting a range of use cases. For more details about the FMs available, see Supported foundation models on Amazon Bedrock. Amazon Bedrock enables developers to create unique experiences with generative AI capabilities supporting a broad range of programming languages and frameworks.
The solution uses Anthropic Claude on Amazon Bedrock, specifically the Anthropic Claude Sonnet model. For the vast majority of workloads, Sonnet is two times faster than Claude 2 and Claude 2.1, with higher levels of intelligence.
Solution Overview
Traditional Retrieval Augmented Generation (RAG) systems can’t identify the relevance of RFP documents to a given sales team because of the extensively long list of one-time business requirements and the large taxonomy of electrical components or services, which might or might not be present in the documents.
Other existing approaches require either expensive domain-specific fine-tuning of the LLM or filtering for noise and data elements, which leads to suboptimal performance and scalability impacts.
Instead, the AWS GenAIIC team worked with Schneider Electric to package business objectives onto the LLM through multiple prisms of semantic transformations: concepts, functions, and components. For example, in the domain of smart grids, the underlying business objectives might be defined as resiliency, isolation, and sustainability. Accordingly, the corresponding functions would involve energy generation, consumption, and storage. The following figure illustrates these components.

Microgrid semantic components

The approach of concept-driven information extraction resembles ontology-based prompting. It allows engineering teams to customize the initial list of concepts and scale to different domains of interest. The decomposition of complex concepts into specific functions incentivizes the LLM to detect, interpret, and extract the associated data elements.
The LLM was prompted to read RFPs and retrieve quotes pertinent to the defined concepts and functions. These quotes establish the presence of electrical equipment satisfying the high-level objectives and were used as weight of evidence indicating the downstream relevance of an RFP to the originating sales team.
For example, in the following code, the term BESS stands for battery energy storage system and provides evidence of power storage.

{
  "quote": "2.3W / 2MWh Saft IHE LFP (1500V) BESS (1X)",
  "function": "Power Storage",
  "relevance": 10,
  "summary": "Specifies a lithium iron phosphate battery energy storage system."
}

In the following example, the term EPC indicates the presence of a solar plant.

{
  "quote": "EPC 2.0MW (2X)",
  "function": "Power Generation",
  "relevance": 9,
  "summary": "Specifies 2 x 2MW solar photovoltaic inverters."
}

The overall solution encompasses three phases:

Document chunking and preprocessing
LLM-based quote retrieval
LLM-based quote summarization and evaluation

The first step uses standard document chunking as well as Schneider’s proprietary document processing pipelines to group similar text elements into a single chunk. Each chunk is processed by the quote retrieval LLM, which identifies relevant quotes within each chunk if they’re available. This brings relevant information to the forefront and filters out irrelevant content. Finally, the relevant quotes are compiled and fed to a final LLM that summarizes the RFP and determines its overall relevance to the microgrid family of RFPs. The following diagram illustrates this pipeline.
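Beyond the diagram, the three phases can be sketched at a high level as follows; the snippet calls Anthropic Claude 3 Sonnet through the Amazon Bedrock Converse API, while the prompts and the chunking logic shown here are placeholders rather than Schneider’s proprietary pipeline.

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")
MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"   # Anthropic Claude 3 Sonnet

def ask_claude(prompt: str) -> str:
    # Single-turn call through the Bedrock Converse API
    response = bedrock_runtime.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 1024, "temperature": 0},
    )
    return response["output"]["message"]["content"][0]["text"]

def evaluate_rfp(chunks):
    # Phase 2: retrieve concept- and function-related quotes from each chunk
    quotes = [ask_claude("Extract quotes relevant to microgrid functions:\n" + chunk)
              for chunk in chunks]
    # Phase 3: summarize the compiled quotes and classify the RFP
    return ask_claude("Given these quotes, is this RFP relevant to microgrids? "
                      "Answer Yes, No, or Maybe with a relevance score and a brief justification:\n"
                      + "\n".join(quotes))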

The final determination about the RFP is made using the following prompt structure. The details of the actual prompt are proprietary, but the structure includes the following:

We first provide the LLM with a brief description of the business unit in question.
We then define a persona and tell the LLM where to locate evidence.
We then provide criteria for RFP categorization.
Finally, we specify the output format, which includes:

A designation of yes, no, or maybe
A relevance score from 1–10
A brief explanation of the decision

prompt = """
[1] <DESCRIPTION OF THE BUSINESS UNIT>
[2] You're an expert in <BUSINESS UNIT> and have to evaluate if a given RFP is related to <BUSINESS UNIT>…

The quotes are provided below…

<QUOTES>

[3] Determine the relevancy to <BUSINESS UNIT> using … criteria:

<CRITERIA>

[4] <RESPONSE_FORMAT>
[4a] A designation of Yes, No, or Maybe.
[4b] A relevance score.
[4c] A brief summary of justification and explanation.
"""

The result compresses a relatively large corpus of RFP documents into a focused, concise, and informative representation by precisely capturing and returning the most important aspects. The structure allows the SME to quickly filter for specific LLM labels, and the summary quotes allow them to better understand which quotes are driving the LLM’s decision-making process. In this way, the Schneider SME team can spend less time reading through pages of RFP proposals and can instead focus their attention on the content that matters most to their business. The sample below shows both a classification result and qualitative feedback for a sample RFP.

Internal teams are already experiencing the advantages of our new AI-driven RFP Assistant:

“At Schneider Electric, we are committed to solving real-world problems by creating a sustainable, digitized, and new electric future. We leverage AI and LLMs to further enhance and accelerate our own digital transformation, unlocking efficiency and sustainability in the energy sector.”
– Anthony Medeiros, Manager of Solutions Engineering and Architecture, Schneider Electric.

Conclusion
In this post, the AWS GenAIIC team, working with Schneider Electric, demonstrated the remarkable general capability of LLMs available on Amazon Bedrock to assist sales teams and optimize their workloads.
The RFP assistant solution allowed Schneider Electric to achieve 94% accuracy in the task of identifying microgrid opportunities. By making small adjustments to the prompts, the solution can be scaled and adapted to other lines of business.
By precisely guiding the prompts, the team can derive distinct and objective perspectives from identical sets of documents. The proposed solution enables RFPs to be viewed through the interchangeable lenses of various business units, each pursuing a diverse range of objectives. These previously obscured insights have the potential to unveil novel business prospects and generate supplementary revenue streams.
These capabilities will allow Schneider Electric to seamlessly integrate AI-powered insights and recommendations into its day-to-day operations. This integration will facilitate well-informed and data-driven decision-making processes, streamline operational workflows for heightened efficiency, and elevate the quality of customer interactions, ultimately delivering superior experiences.

About the Authors
Anthony Medeiros is a Manager of Solutions Engineering and Architecture at Schneider Electric. He specializes in delivering high-value AI/ML initiatives to many business functions within North America. With 17 years of experience at Schneider Electric, he brings a wealth of industry knowledge and technical expertise to the team.
Adrian Boeh is a Senior Data Scientist working on advanced data tasks for Schneider Electric’s North American Customer Transformation Organization. Adrian has 13 years of experience at Schneider Electric and is AWS Machine Learning Certified with a proven ability to innovate and improve organizations using data science methods and technology.
Kosta Belz is a Senior Applied Scientist in the AWS Generative AI Innovation Center, where he helps customers design and build generative AI solutions to solve key business problems.
Dan Volk is a Data Scientist at the AWS Generative AI Innovation Center. He has 10 years of experience in machine learning, deep learning, and time series analysis, and holds a Master’s in Data Science from UC Berkeley. He is passionate about transforming complex business challenges into opportunities by leveraging cutting-edge AI technologies.
Negin Sokhandan is a Senior Applied Scientist in the AWS Generative AI Innovation Center, where she works on building generative AI solutions for AWS strategic customers. Her research background is statistical inference, computer vision, and multimodal systems.

Achieve operational excellence with well-architected generative AI sol …

Large enterprises are building strategies to harness the power of generative AI across their organizations. However, scaling up generative AI and making adoption easier for different lines of business (LOBs) comes with challenges around making sure data privacy and security, legal, compliance, and operational complexities are governed at an organizational level. In this post, we discuss how to address these challenges holistically.
Managing bias, intellectual property, prompt safety, and data integrity are critical considerations when deploying generative AI solutions at scale. Because this is an emerging area, best practices, practical guidance, and design patterns are difficult to find in an easily consumable form. In this post, we share AWS guidance that we have learned and developed as part of real-world projects, distilled into practical guides oriented towards the AWS Well-Architected Framework, which is used to build production infrastructure and applications on AWS. We focus on the operational excellence pillar in this post.
Amazon Bedrock plays a pivotal role in this endeavor. It’s a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like Anthropic, Cohere, Meta, Mistral AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. You can securely integrate and deploy generative AI capabilities into your applications using services such as AWS Lambda, enabling seamless data management, monitoring, and compliance (for more details, see Monitoring and observability). This integration makes sure enterprises can take advantage of the full power of generative AI while adhering to best practices in operational excellence.
With Amazon Bedrock, enterprises can achieve the following:

Scalability – Scale generative AI applications across different LOBs without compromising performance
Security and compliance – Enforce data privacy, security, and compliance with industry standards and regulations
Operational efficiency – Streamline operations with built-in tools for monitoring, logging, and automation, aligned with the AWS Well-Architected Framework
Innovation – Access cutting-edge AI models and continually improve them with real-time data and feedback

This approach enables enterprises to deploy generative AI at scale while maintaining operational excellence, ultimately driving innovation and efficiency across their organizations.
What’s different about operating generative AI workloads and solutions?
The operational excellence pillar of the Well-Architected Framework is mainly focused on supporting the development and running of workloads effectively, gaining insight into their operations, and continuously improving supporting processes and procedures to deliver business value. However, if we were to apply a generative AI lens, we would need to address the intricate challenges and opportunities arising from its innovative nature, encompassing the following aspects:

Complexity can be unpredictable due to the ability of large language models (LLMs) to generate new content
Potential intellectual property infringement is a concern due to the lack of transparency in the model training data
Low accuracy in generative AI can create incorrect or controversial content
Resource utilization requires a specific operating model to meet the substantial computational resources required for training and for handling large prompt and token sizes
Continuous learning necessitates additional data annotation and curation strategies
Compliance is also a rapidly evolving area, where data governance becomes more nuanced and complex, and poses challenges
Integration with legacy systems requires careful consideration of compatibility, data flow between systems, and potential performance impacts

Any generative AI lens therefore needs to combine the following elements, each with varying levels of prescription and enforcement, to address these challenges and provide the basis for responsible AI usage:

Policy – The system of principles to guide decisions
Guardrails – The rules that create boundaries to keep you within the policy
Mechanisms – The process and tools

AWS has advanced responsible AI by introducing Amazon Bedrock Guardrails as a protection to prevent harmful responses from the LLMs, providing an additional layer of safeguards regardless of the underlying FM. However, a more holistic organizational approach is crucial because generative AI practitioners, data scientists, or developers can potentially use a wide range of technologies, models, and datasets to circumvent the established controls.
As cloud adoption has matured for more traditional IT workloads and applications, the need to help developers select the right cloud solution that minimizes corporate risk and simplifies the developer experience has emerged. This is often referred to as platform engineering and can be neatly summarized by the mantra “You (the developer) build and test, and we (the platform engineering team) do all the rest!”
This approach, when applied to generative AI solutions, means that a specific AI or machine learning (ML) platform configuration can be used to holistically address the operational excellence challenges across the enterprise, allowing the developers of the generative AI solution to focus on business value. This is illustrated in the following diagram.

Where to start?
We start this post by reviewing the foundational operational elements a generative AI platform team needs to initially focus on as they transition generative solutions from a proof of concept or prototype phase to a production-ready solution.
Specifically, we cover how you can safely develop, deploy, and monitor models, mitigating operational and compliance risks, thereby reducing the friction in adopting AI at scale and for production use. We focus on the following four design principles:

Establish control through promoting transparency of model details, setting up guardrails or safeguards, and providing visibility into costs, metrics, logs, and traces
Automate model fine-tuning, training, validation, and deployment using large language model operations (LLMOps) or foundation model operations (FMOps)
Manage data through standard methods for ingestion, governance, and indexing
Provide managed infrastructure patterns and blueprints for models, prompt catalogs, APIs, and access control guidelines

In the following sections, we explain this using an architecture diagram while diving into the best practices of the control pillar.
Provide control through transparency of models, guardrails, and costs using metrics, logs, and traces
The control pillar of the generative AI framework focuses on observability, cost management, and governance, making sure enterprises can deploy and operate their generative AI solutions securely and efficiently. The following diagram illustrates the key components of this pillar:

Observability
Setting up observability measures lays the foundations for the other two components, namely FinOps and Governance. Observability is crucial for monitoring the performance, reliability, and cost-efficiency of generative AI solutions. By using AWS services such as Amazon CloudWatch, AWS CloudTrail, and Amazon OpenSearch Service, enterprises can gain visibility into model metrics, usage patterns, and potential issues, enabling proactive management and optimization.
Amazon Bedrock is compatible with robust observability features to monitor and manage ML models and applications. Key metrics integrated with CloudWatch include invocation counts, latency, client and server errors, throttles, input and output token counts, and more (for more details, see Monitor Amazon Bedrock with Amazon CloudWatch). You can also use Amazon EventBridge to monitor events related to Amazon Bedrock. This allows you to create rules that invoke specific actions when certain events occur, enhancing the automation and responsiveness of your observability setup (for more details, see Monitor Amazon Bedrock). CloudTrail can log all API calls made to Amazon Bedrock by a user, role, or AWS service in an AWS environment. This is particularly useful for tracking access to sensitive resources such as personally identifiable information (PII), model updates, and other critical activities, enabling enterprises to maintain a robust audit trail and compliance. To learn more, see Log Amazon Bedrock API calls using AWS CloudTrail.
Amazon Bedrock supports the metrics and telemetry needed for implementing an observability maturity model for LLMs, which includes the following:

Capturing and analyzing LLM-specific metrics such as model performance, prompt properties, and cost metrics through CloudWatch
Implementing alerts and incident management tailored to LLM-related issues
Providing security compliance and robust monitoring mechanisms, because Amazon Bedrock is in scope for common compliance standards and offers automated abuse detection mechanisms
Using CloudWatch and CloudTrail for anomaly detection, usage and costs forecasting, optimizing performance, and resource utilization
Using AWS forecasting services for better resource planning and cost management

CloudWatch provides a unified monitoring and observability service that collects logs, metrics, and events from various AWS services and on-premises sources. This allows enterprises to track key performance indicators (KPIs) for their generative AI models, such as I/O volumes, latency, and error rates. You can use CloudWatch dashboards to create custom visualizations and alerts, so teams are quickly notified of any anomalies or performance degradation.
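For example, a latency alarm on the Bedrock invocation metrics could be configured as in the following sketch; the threshold, model ID, and SNS topic ARN are placeholders to adapt to your own environment.

import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="bedrock-invocation-latency-high",
    Namespace="AWS/Bedrock",
    MetricName="InvocationLatency",
    Dimensions=[{"Name": "ModelId", "Value": "anthropic.claude-3-sonnet-20240229-v1:0"}],
    Statistic="Average",
    Period=300,                       # 5-minute evaluation windows
    EvaluationPeriods=3,
    Threshold=5000.0,                 # milliseconds, placeholder threshold
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],  # placeholder topic
)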
For more advanced observability requirements, enterprises can use OpenSearch Service, a fully managed service for deploying, operating, and scaling OpenSearch and Kibana. OpenSearch Dashboards provides powerful search and analytical capabilities, allowing teams to dive deeper into generative AI model behavior, user interactions, and system-wide metrics.
Additionally, you can enable model invocation logging to collect invocation logs, full request response data, and metadata for all Amazon Bedrock model API invocations in your AWS account. Before you can enable invocation logging, you need to set up an Amazon Simple Storage Service (Amazon S3) or CloudWatch Logs destination. You can enable invocation logging through either the AWS Management Console or the API. By default, logging is disabled.
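As a sketch, enabling invocation logging through the API could look like the following; the log group, IAM role, and bucket names are placeholders, and the call assumes those destinations and permissions already exist.

import boto3

bedrock = boto3.client("bedrock")

bedrock.put_model_invocation_logging_configuration(
    loggingConfig={
        "cloudWatchConfig": {
            "logGroupName": "/bedrock/invocation-logs",                          # placeholder
            "roleArn": "arn:aws:iam::123456789012:role/BedrockLoggingRole",      # placeholder
        },
        "s3Config": {
            "bucketName": "my-bedrock-invocation-logs",                          # placeholder
            "keyPrefix": "bedrock",
        },
        "textDataDeliveryEnabled": True,
        "imageDataDeliveryEnabled": False,
        "embeddingDataDeliveryEnabled": False,
    }
)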
Cost management and optimization (FinOps)
Generative AI solutions can quickly scale and consume significant cloud resources, and a robust FinOps practice is essential. With services like AWS Cost Explorer and AWS Budgets, enterprises can track their usage and optimize their generative AI spending, achieving cost-effective deployment and scaling.
Cost Explorer provides detailed cost analysis and forecasting capabilities, enabling you to understand your tenant-related expenditures, identify cost drivers, and plan for future growth. Teams can create custom cost allocation reports, set custom budgets and alerts using AWS Budgets, and explore cost trends over time.
Analyzing the cost and performance of generative AI models is crucial for making informed decisions about model deployment and optimization. EventBridge, CloudTrail, and CloudWatch provide the necessary tools to track and analyze these metrics, helping enterprises make data-driven decisions. With this information, you can identify optimization opportunities, such as scaling down under-utilized resources.
With EventBridge, you can configure Amazon Bedrock to respond automatically to status change events in Amazon Bedrock. This enables you to handle API rate limit issues, API updates, and reduction in additional compute resources. For more details, see Monitor Amazon Bedrock events in Amazon EventBridge.
As discussed in the previous section, CloudWatch can monitor Amazon Bedrock to collect raw data and process it into readable, near real-time cost metrics. You can graph the metrics using the CloudWatch console. You can also set alarms that watch for certain thresholds, and send notifications or take actions when values exceed those thresholds. For more information, see Monitor Amazon Bedrock with Amazon CloudWatch.
Governance
Implementation of robust governance measures, including continuous evaluation and multi-layered guardrails, is fundamental for the responsible and effective deployment of generative AI solutions in enterprise environments. Let’s look at them one by one:

Performance monitoring and evaluation – Continuously evaluating the performance, safety, and compliance of generative AI models is critical. You can achieve this in several ways:

Enterprises can use AWS services like Amazon SageMaker Model Monitor, Amazon Bedrock Guardrails, and Amazon Comprehend to monitor model behavior, detect drift, and make sure generative AI solutions are performing as expected (or better) and adhering to organizational policies.
You can deploy open-source evaluation frameworks like RAGAS as custom metrics to verify that LLM responses are grounded in the retrieved context, mitigate bias, and reduce hallucinations (see the sketch after this list).
Model evaluation jobs allow you to compare model outputs and choose the best-suited model for your use case. The job could be automated based on a ground truth, or you could use humans to bring in expertise on the matter. You can also use FMs from Amazon Bedrock to evaluate your applications. To learn more about this approach, refer to Evaluate the reliability of Retrieval Augmented Generation applications using Amazon Bedrock.
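As a minimal sketch of the RAGAS approach mentioned earlier, the snippet below scores a single logged interaction for faithfulness, answer relevancy, and context precision. The column names and metric imports follow ragas 0.1.x conventions and may differ in other versions, and evaluate() needs a judge LLM and embedding model configured (OpenAI by default).

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision

# One evaluation record; in practice you would batch many logged interactions.
records = Dataset.from_dict({
    "question": ["Where can Amazon Bedrock invocation logs be delivered?"],
    "answer": ["Invocation logs can be delivered to Amazon S3 or CloudWatch Logs."],
    "contexts": [["Amazon Bedrock model invocation logging supports Amazon S3 and CloudWatch Logs destinations."]],
    "ground_truth": ["Amazon S3 and Amazon CloudWatch Logs."],
})

scores = evaluate(records, metrics=[faithfulness, answer_relevancy, context_precision])
print(scores)  # per-metric averages, which you could publish as custom CloudWatch metrics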

Guardrails – Generative AI solutions should include robust, multi-level guardrails to enforce responsible AI and oversight:

First, you need guardrails around the LLM to mitigate risks around bias and safeguard the application with responsible AI policies. You can use Amazon Bedrock Guardrails to set up custom guardrails around a model (base or fine-tuned FM), configuring denied topics, content filters, and blocked messaging (a minimal sketch follows this list).
The second level is to set guardrails around the framework for each use case. This includes implementing access controls, data governance policies, and proactive monitoring and alerting to make sure sensitive information is properly secured and monitored. For example, you can use AWS data analytics services such as Amazon Redshift for data warehousing, AWS Glue for data integration, and Amazon QuickSight for business intelligence (BI).
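The following is a minimal boto3 sketch of the first guardrail level. The guardrail name, denied topic, and messaging are illustrative placeholders, and the parameter names follow the bedrock create_guardrail API as of this writing.

import boto3

bedrock = boto3.client("bedrock")

response = bedrock.create_guardrail(
    name="enterprise-assistant-guardrail",  # hypothetical name
    description="Denies financial advice topics and filters harmful content",
    topicPolicyConfig={
        "topicsConfig": [
            {
                "name": "FinancialAdvice",
                "definition": "Providing personalized investment or financial advice.",
                "type": "DENY",
            }
        ]
    },
    contentPolicyConfig={
        "filtersConfig": [
            {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "VIOLENCE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
        ]
    },
    blockedInputMessaging="Sorry, I can't help with that request.",
    blockedOutputsMessaging="Sorry, I can't provide that information.",
)
print(response["guardrailId"], response["version"])

You then reference the returned guardrail ID and version when invoking the model so that every request and response passes through the configured policies.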

Compliance measures – Enterprises need to set up a robust compliance framework to meet regulatory requirements such as GDPR and CCPA, as well as industry-specific standards. This helps make sure generative AI solutions remain secure, compliant, and efficient in handling sensitive information across different use cases. This approach minimizes the risk of data breaches or unauthorized data access, thereby protecting the integrity and confidentiality of critical data assets. Enterprises can take the following organization-level actions to create a comprehensive governance structure:

Establish a clear incident response plan for addressing compliance breaches or AI system malfunctions.
Conduct periodic compliance assessments and third-party audits to identify and address potential risks or violations.
Provide ongoing training to employees on compliance requirements and best practices in AI governance.

Model transparency – Although achieving full transparency in generative AI models remains challenging, organizations can take several steps to enhance model transparency and explainability:

Provide model cards that document the model’s intended use, performance, capabilities, and potential biases.
Ask the model to self-explain, meaning it provides explanations for its own decisions. This can also be built into a more complex system: for example, agents could perform multi-step planning and improve through self-explanation.

Automate model lifecycle management with LLMOps or FMOps
Implementing LLMOps is crucial for efficiently managing the lifecycle of generative AI models at scale. To grasp the concept of LLMOps, a subset of FMOps, and the key differentiators compared to MLOps, see FMOps/LLMOps: Operationalize generative AI and differences with MLOps. In that post, you can learn more about the developmental lifecycle of a generative AI application and the additional skills, processes, and technologies needed to operationalize generative AI applications.
Manage data through standard methods of data ingestion and use
Enriching LLMs with new data is imperative if they are to provide more contextual answers without the need for extensive fine-tuning or the overhead of building a specific corporate LLM. Managing data ingestion, extraction, transformation, cataloging, and governance is a complex, time-consuming process that needs to align with corporate data policies and governance frameworks.
AWS provides several services to support this; the following diagram illustrates these at a high level. For a more detailed description, see Scaling AI and Machine Learning Workloads with Ray on AWS and Build a RAG data ingestion pipeline for large scale ML workloads.

This workflow includes the following steps:

Data can be securely transferred to AWS using either custom or existing tools or the AWS Transfer Family. You can use AWS Identity and Access Management (IAM) and AWS PrivateLink to control and secure access to data and generative AI resources, making sure data remains within the organization’s boundaries and complies with the relevant regulations.
When the data is in Amazon S3, you can use AWS Glue to extract and transform data (for example, into Parquet format) and store metadata about the ingested data, facilitating data governance and cataloging.
The third component is the GPU cluster, which could potentially be a Ray cluster. You can employ various orchestration engines, such as AWS Step Functions, Amazon SageMaker Pipelines, or AWS Batch, to run the jobs (or create pipelines) to create embeddings and ingest the data into a data store or vector store.
Embeddings can be stored in a vector store such as OpenSearch, enabling efficient retrieval and querying (see the short sketch following this list). Alternatively, you can use a solution such as Amazon Bedrock Knowledge Bases to ingest data from Amazon S3 or other data sources, enabling seamless integration with generative AI solutions.
You can use Amazon DataZone to manage access control to the raw data stored in Amazon S3 and the vector store, enforcing role-based or fine-grained access control for data governance.
For cases where you need a semantic understanding of your data, you can use Amazon Kendra for intelligent enterprise search. Amazon Kendra has inbuilt ML capabilities and is easy to integrate with various data sources like S3, making it adaptable for different organizational needs.
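The sketch below illustrates the embedding and indexing step referenced earlier: generating an embedding with an Amazon Titan model on Amazon Bedrock and writing it to an OpenSearch index. The endpoint, index name, and model ID are assumptions, and authentication (for example, SigV4 signing) and error handling are omitted for brevity.

import json
import boto3
from opensearchpy import OpenSearch

bedrock_runtime = boto3.client("bedrock-runtime")

def embed(text: str) -> list[float]:
    # Titan text embeddings; swap in whichever embedding model your blueprint standardizes on.
    resp = bedrock_runtime.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(resp["body"].read())["embedding"]

# Hypothetical OpenSearch endpoint; the index is assumed to be configured as a k-NN vector index.
client = OpenSearch(hosts=[{"host": "my-domain.us-east-1.es.amazonaws.com", "port": 443}], use_ssl=True)

chunk = "Example document chunk produced by the AWS Glue transformation job."
client.index(index="rag-chunks", body={"text": chunk, "embedding": embed(chunk)})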

The choice of which components to use will depend on the specific requirements of the solution, but a consistent solution should exist for all data management to be codified into blueprints (discussed in the following section).
Provide managed infrastructure patterns and blueprints for models, prompt catalogs, APIs, and access control guidelines
There are a number of ways to build and deploy a generative AI solution. AWS offers key services such as Amazon Bedrock, Amazon Kendra, OpenSearch Service, and more, which can be configured to support multiple generative AI use cases, such as text summarization, Retrieval Augmented Generation (RAG), and others.
The simplest way is to allow each team who needs to use generative AI to build their own custom solution on AWS, but this will inevitably increase costs and create inconsistencies across the organization. A more scalable option is to have a centralized team build standard generative AI solutions codified into blueprints or constructs and allow teams to deploy and use them. This team can provide a platform that abstracts away these constructs with a user-friendly and integrated API and provide additional services such as LLMOps, data management, FinOps, and more. The following diagram illustrates these options.

Establishing blueprints and constructs for generative AI runtimes, APIs, prompts, and orchestration frameworks such as LangChain and LiteLLM will simplify the adoption of generative AI and increase overall safe usage. Offering standard APIs with access controls, consistent AI governance, and data and cost management makes usage straightforward, cost-efficient, and secure.
For more information about how to enforce isolation of resources in a multi-tenant architecture and key patterns in isolation strategies while building solutions on AWS, refer to the whitepaper SaaS Tenant Isolation Strategies.
Conclusion
By focusing on the operational excellence pillar of the Well-Architected Framework from a generative AI lens, enterprises can scale their generative AI initiatives with confidence, building solutions that are secure, cost-effective, and compliant. Introducing a standardized skeleton framework for generative AI runtimes, prompts, and orchestration will empower your organization to seamlessly integrate generative AI capabilities into your existing workflows.
As a next step, you can establish proactive monitoring and alerting, helping your enterprise swiftly detect and mitigate potential issues, such as the generation of biased or harmful output.
Don’t wait—take this proactive stance towards adopting the best practices. Conduct regular audits of your generative AI systems to maintain ethical AI practices. Invest in training your team on the generative AI operational excellence techniques. By taking these actions now, you’ll be well positioned to harness the transformative potential of generative AI while navigating the complexities of this technology wisely.

About the Authors
Akarsha Sehwag is a Data Scientist and ML Engineer in AWS Professional Services with over 5 years of experience building ML-based services and products. Leveraging her expertise in computer vision and deep learning, she empowers customers to harness the power of ML in the AWS Cloud efficiently. With the advent of generative AI, she has worked with numerous customers to identify promising use cases and build them into production-ready solutions. Her diverse interests span development, entrepreneurship, and research.
Malcolm Orr is a principal engineer at AWS and has a long history of building platforms and distributed systems using AWS services. He brings a structured, systems view to generative AI and is helping define how customers can adopt generative AI safely, securely, and cost-effectively across their organization.
Tanvi Singhal is a Data Scientist within AWS Professional Services. Her skills and areas of expertise include data science, machine learning, and big data. She supports customers in developing machine learning models and MLOps solutions within the cloud. Prior to joining AWS, she was a consultant in various industries such as transportation networking, retail, and financial services. She is passionate about enabling customers on their data/AI journey to the cloud.
Zorina Alliata is a Principal AI Strategist, working with global customers to find solutions that speed up operations and enhance processes using Artificial Intelligence and Machine Learning. Zorina helps companies across several industries identify strategies and tactical execution plans for their AI use cases, platforms, and AI at scale implementations.

Elevate workforce productivity through seamless personalization in Ama …

Personalization can improve the user experience of shopping, entertainment, and news sites by using our past behavior to recommend the products and content that best match our interests. You can also apply personalization to conversational interactions with an AI-powered assistant. For example, an AI assistant for employee onboarding could use what it knows about an employee’s work location, department, or job title to provide information that is more relevant to the employee. In this post, we explore how Amazon Q Business uses personalization to improve the relevance of responses and how you can align your use cases and end-user data to take full advantage of this capability.
Amazon Q Business is a fully managed generative AI-powered assistant that can answer questions, provide summaries, generate content, and complete tasks based on the data and information that is spread across your enterprise systems. Amazon Q Business provides more than 40 built-in connectors that make it effortless to connect the most popular enterprise data sources and systems into a unified and powerful search index that the AI assistant can use to help answer natural language questions from your workforce. This allows end-users to find the information and answers they’re looking for quickly, which leads to increased productivity and job satisfaction. Amazon Q Business preserves the access permissions in the source systems so that users are only able to access the information through Amazon Q Business that they have access to directly within these systems.
Solution overview
Amazon Q Business personalizes responses by determining whether a user’s query could be enhanced by augmenting it with known attributes of the user, and then transparently using the personalized query to retrieve documents from its search index. User attributes, such as work location, department, and job title, are made available to Amazon Q Business by the identity system configured to authenticate users for the Amazon Q Business application. Depending on the documents available in the index, the personalized query should improve the relevancy of the returned documents, which in turn can improve the relevancy of the generated response based on those documents. The process by which user attributes flow to an Amazon Q Business application varies based on the identity federation mechanism used to authenticate your workforce for the application:

Federation with AWS IAM Identity Center – Your workforce users, their attributes, and group membership are synchronized from your identity provider (IdP) to IAM Identity Center, where their access to Amazon Q Business applications and other AWS managed applications can be managed from a single location. This is the recommended approach. For more details on how to configure Amazon Q Business with IAM Identity Center, see Build private and secure enterprise generative AI apps with Amazon Q Business and AWS IAM Identity Center.
Federation with IAM – Your workforce federates from your SAML 2.0 or OIDC compliant IdP with AWS Identity and Access Management (IAM) to access your Amazon Q Business application. For more details on how to configure Amazon Q Business with IAM federation, see Build private and secure enterprise generative AI applications with Amazon Q Business using IAM Federation.

The following diagram illustrates the process by which user attributes flow to Amazon Q Business for both identity federation mechanisms.

The steps of the process are as follows:

When a user accesses the Amazon Q Business web experience or a custom client that integrates with the Amazon Q Business API, they must be authenticated. If not already authenticated, the user is redirected to the IdP configured for the Amazon Q Business application.
After the user authenticates with the IdP, they’re redirected back to the client with an authorization code. Then the Amazon Q Business web experience or custom client makes an API call to the IdP with the client secret to exchange the authorization code for an ID token. When an IAM IdP is configured for the Amazon Q Business application, the ID token includes the user attributes that are configured in the IdP. Otherwise, with IAM Identity Center, the user attributes are synchronized from the IdP to IAM Identity Center. This process only has to be done one time during the user’s session or when the user’s session expires.
The user is now able to interact with the AI assistant by submitting a question.
Before the Amazon Q Business web experience or custom client can send the user’s question to the Amazon Q Business ChatSync API, it must exchange the ID token for AWS credentials. If the Amazon Q Business application is configured with IAM Identity Center, the Amazon Q Business application or custom client calls the CreateTokenWithIAM API to exchange the ID token for an IAM Identity Center token. This token includes the user attributes synchronized from the IdP to IAM Identity Center as described earlier. If the Amazon Q Business application is configured with an IAM IdP, this step is skipped.
The last step to obtain AWS credentials is to call AWS Security Token Service (AWS STS). If the Amazon Q Business application is configured with IAM Identity Center, the AssumeRole API is called, passing the IAM Identity Center token. For an Amazon Q Business application configured with an IAM IdP, the AssumeRoleWithSAML or AssumeRoleWithWebIdentity API is called, depending on whether SAML 2.0 or OIDC is used for the provider. The credentials returned from AWS STS can be cached and reused until they expire.
The Amazon Q Business web experience or custom client can now call the ChatSync API with the credentials obtained in the previous step using AWS Signature Version 4. Because the credentials include the user attributes configured in the IdP, they’re available to Amazon Q Business to personalize the user’s query.
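As an illustration of the AssumeRoleWithWebIdentity and ChatSync steps just described (the IAM federation path with an OIDC IdP), the boto3 sketch below exchanges an IdP-issued ID token for temporary credentials and then calls the ChatSync API. The role ARN and application ID are placeholders, only minimal ChatSync parameters are shown, and credential caching and error handling are omitted.

import boto3

id_token = "<ID token returned by the IdP, carrying the principal tags>"  # placeholder

sts = boto3.client("sts")
assumed = sts.assume_role_with_web_identity(
    RoleArn="arn:aws:iam::111122223333:role/QBusinessWebExperienceRole",  # hypothetical role
    RoleSessionName="jane.doe@example.com",
    WebIdentityToken=id_token,
)
creds = assumed["Credentials"]

qbusiness = boto3.client(
    "qbusiness",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
response = qbusiness.chat_sync(
    applicationId="a1b2c3d4-5678-90ab-cdef-EXAMPLE11111",  # hypothetical application ID
    userMessage="What training is available?",
)
print(response["systemMessage"])  # the generated, personalized answer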

Amazon Q Business personalization use case
To demonstrate how personalization works in practice, let’s take an example of internal training made available to employees of a multi-national company. Imagine you lead the training department for an enterprise company and you’re tasked with improving the access to training opportunities offered to employees. You’ve done a great job documenting this information for all locations where training is provided and published it on your company’s Microsoft SharePoint site, but the feedback from employees is that they don’t know where to find the information. The confusion stems from the fact that your company also publishes internal company information and documentation on Confluence, Box, and a wiki. Additionally, your department uses ServiceNow for training support, which has developed into another source of valuable but under-utilized information.
The first challenge to solve is discoverability of the information spread across these disparate and disconnected systems. Through the connectors described earlier, Amazon Q Business can bring together the information in these systems and provide a conversational user interface that allows employees to ask questions in natural language, such as, “What training is available?”
With the discoverability challenge solved, there is still an opportunity to further optimize the user experience. This is where personalization comes in. Consider the basic question, “What training is available?” from a user who works out of the San Francisco, CA, office. Based on this question, Amazon Q Business can find documents that describe the training classes available across all corporate locations, but lacks the knowledge of the user’s home office location to be more precise in its answer. Providing an answer based on the location, or even a blend of multiple locations, isn’t as accurate as if the answer were based on where the employee worked. The employee could be more explicit in their question by including their location, but the goal of AI assistants is to better understand the user’s intent and context to be able to provide the most accurate information possible for even the most basic questions. Knowing key information about the user allows Amazon Q Business to seamlessly personalize the retrieval of documents and therefore lead to a more accurate response. Let’s see how it works in more detail.
At the core of Amazon Q Business is a technique called Retrieval Augmented Generation (RAG). At a high level, RAG involves taking a user’s request and finding passages from a set of documents in a searchable index that are most similar to the request and then asking a large language model (LLM) to generate a response that provides an answer using the retrieved passages. Given the question, “What training is available?” and the number of locations for the company, the top document passages returned from the index and provided to the LLM may not even include the user’s location. Therefore, the more precise the query to the retrieval layer, the more accurate and relevant the ultimate response will be. For example, modifying the query to include details on the user’s location should result in document passages specific to the user being returned at or near the top of the list rather than buried further down the list.
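The toy function below is purely conceptual and is not how Amazon Q Business is implemented internally; it only illustrates the idea of folding known user attributes into the retrieval query before passages are fetched.

def personalize_query(query: str, attributes: dict) -> str:
    # Conceptual illustration: append non-empty user attributes as retrieval context.
    context = ", ".join(f"{key}: {value}" for key, value in attributes.items() if value)
    return f"{query} (user context: {context})" if context else query

print(personalize_query(
    "What training is available?",
    {"city": "San Francisco", "countryCode": "US", "title": "Software Programmer"},
))

With the location and title folded in, passages about the San Francisco training schedule rank higher in retrieval, which is the effect described above.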
Configure user attributes in your IdP
Let’s look at how you would configure your IdP to pass along the attributes of your users to your Amazon Q Business application. Regardless of the identity federation mechanism configured for your Amazon Q Business application, attributes for your users need to be maintained in your IdP’s directory. The following is a partial screenshot of some of the location-related fields available in the profile editor for the Okta IdP.

Besides the administrative UI for editing individual profiles, Okta also provides mechanisms for updating profiles in bulk or through APIs. These tools make it straightforward to keep your user profiles synchronized with source systems such as employee directories.
After your user profiles are updated in your IdP, the process for making user attributes available to your Amazon Q Business application varies based on the identity federation configuration.
Federation with IAM Identity Center
If you configure your Amazon Q Business application with IAM Identity Center (recommended) and you use an external IdP such as Okta or Entra ID to manage your workforce, you simply need to maintain user attributes in your IdP. Because IAM Identity Center supports the SCIM standard, you can set up user profiles and their attributes to be automatically synchronized with IAM Identity Center. After the users and attributes are synchronized to IAM Identity Center, they can be accessed by Amazon Q Business from either the web experience or through a custom client integration as described earlier.
A less common variation of using IAM Identity Center with Amazon Q Business that is suitable for basic testing is to use IAM Identity Center as the identity source (without an external IdP). In this case, you would add users and manage their attributes directly in IAM Identity Center through the AWS Management Console or the CreateUser and UpdateUser APIs.
Federation with IAM
If you configure your Amazon Q Business application to use IAM federation, user attributes are also maintained in your IdP. However, the attributes are passed to your Amazon Q Business application from your IdP using either a SAML 2.0 assertion or an OIDC claim depending on the provider type that you set up as your IAM IdP. Your IdP must be configured to pass the specific attributes that you intend to expose for personalization. How this configuration is done depends again on whether you’re using SAML 2.0 or OIDC. For this post, we describe how this is done in Okta. The process should be similar with other IdPs.
SAML 2.0 provider type
When you create a SAML 2.0 application in Okta for authenticating your users, you have the option to create attribute statements. The attribute statements are included in the SAML 2.0 assertion that is provided by Okta when a user authenticates. The first three attribute statements shown in the following table are required for SAML 2.0 authentication to work with Amazon Q Business. The others are examples of how you would pass optional attributes that can be used for personalization.

Name | Name format | Value
https://aws.amazon.com/SAML/Attributes/PrincipalTag:Email | Unspecified | user.email
https://aws.amazon.com/SAML/Attributes/Role | Unspecified | [WebExpRoleArn],[IdentityProviderArn]
https://aws.amazon.com/SAML/Attributes/RoleSessionName | Unspecified | user.email
https://aws.amazon.com/SAML/Attributes/PrincipalTag:countryCode | Unspecified | user.countryCode != null ? user.countryCode : ""
https://aws.amazon.com/SAML/Attributes/PrincipalTag:city | Unspecified | user.city != null ? user.city : ""
https://aws.amazon.com/SAML/Attributes/PrincipalTag:title | Unspecified | user.title != null ? user.title : ""
https://aws.amazon.com/SAML/Attributes/PrincipalTag:department | Unspecified | user.department != null ? user.department : ""

Where the attribute statement value uses the Okta Expression Language, Okta resolves the value expression with the actual value for the user. For example, user.email resolves to the user's email address, and user.city != null ? user.city : "" resolves to the user's city (as specified in their user profile) or an empty string if not specified. And because these values are passed in the SAML assertion, you can also include any custom attributes for your users that are specific to your business or domain that may be relevant to personalization.
For [WebExpRoleArn],[IdentityProviderArn], replace [WebExpRoleArn] with the web experience role ARN of your Amazon Q Business application and [IdentityProviderArn] with the ARN of the IAM IdP that you created in IAM for this SAML provider.
OIDC provider type
When you create an OIDC application in Okta for authenticating your users, the location where you configure the user attributes to include in the OIDC claim is a bit different. For OIDC, you must add the user attributes you want to expose for personalization to the claim for the authorization server. AWS STS supports an access token or ID token type. In this post, we demonstrate the ID token type. For more details, see Build private and secure enterprise generative AI applications with Amazon Q Business using IAM Federation.
Complete the following steps:

In Okta, choose Security, API in the navigation pane.
Choose the authorization server (which may be default) and then Claims.
If you don’t see a claim type of ID, choose Add Claim to create one.
For Claim name, enter https://aws.amazon.com/tags.
For Include in token type, choose Access Token or ID Token (we use ID Token in this post).
For Value type, choose Expression.
For Value, enter a JSON document that uses the Okta Expression Language to resolve attributes for the user. The full expression is as follows:

{
  "principal_tags": {
    "Email": {user.email},
    "countryCode": {user.countryCode != null ? user.countryCode : ""},
    "city": {user.city != null ? user.city : ""},
    "title": {user.title != null ? user.title : ""},
    "department": {user.department != null ? user.department : ""}
  }
}

Choose Create.

Again, you are not limited to just these fields. You can also include custom fields that apply to your use case and documents in the expression.
Enable personalization in Amazon Q Business
After you have your preferred authentication mechanism configured in your IdP, IAM, and Amazon Q Business, you’re ready to see how it impacts responses in your Amazon Q Business application. Although personalization is enabled by default for Amazon Q Business applications, you can control whether personalization is enabled on the Update Global Controls settings page for your Amazon Q Business application. If necessary, select Enable response personalization and choose Save.

Amazon Q Business personalization in action
Now you’re ready to see how Amazon Q Business personalizes responses for each user. We continue with the same use case of asking Amazon Q Business “What training is available?” The documents added to the Amazon Q Business index include internal training schedules available to all employees as Word documents for two corporate offices: San Francisco and London. In addition, two users were created in the IdP, where one user is based in the San Francisco office and the other is based in the London office. The city and country fields were populated as well as each user’s title. The San Francisco employee is a software programmer and the London employee is the Director of Marketing.
When signed in to the application using an incognito (private) window as the San Francisco employee, the question “What training is available?” produces the following response.

The response includes content on the training classes being held at the San Francisco office. The citation in the Sources section also confirms that the “September Training Curriculum at San Francisco” document was used to generate the response.
We can close the incognito window, open a new incognito window, sign in as the London employee, and ask the same question: “What training is available?” This time, the response provides information on the training classes being held at the London office and the citation refers to the London curriculum document.

For one final test, we disable personalization for the Amazon Q Business application on the Update Global Controls settings page for the Amazon Q Business application, wait a few minutes for the change to take effect, and then ask the same question in a new conversation.

This time, Amazon Q Business includes information on classes being held at both offices, which is confirmed by the citations pulling in both documents. Although the question is still answered, the user must parse through the response to pick out the portions that are most relevant to them based on their location.
Use cases for Amazon Q Business personalization
Amazon Q Business can be very effective in supporting a wide variety of use cases. However, not all of these use cases can be enhanced with personalization. For example, asking Amazon Q Business to summarize a request for proposal (RFP) submission, or to compare credit card offers in a customer support use case, is not likely to be improved by attributes of the user. Fortunately, Amazon Q Business will automatically determine whether a given user’s question would benefit from personalizing the retrieval query based on the attributes known for the user. When thinking about enabling and optimizing personalization for your use case, consider the availability of user attributes and the composition of data in your Amazon Q Business index.
Working backward from the personalization effect you want to implement, you first need to determine if the required user attributes for your use case exist in your IdP. This may require importing and synchronizing this data into your IdP from another system, such as an employee directory or payroll system. Then you should consider the documents and data in your Amazon Q Business index to determine if they are optimized for personalized retrieval. That is, determine whether the documents in your index have content that will be readily found by the retrieval step given the user attributes in your IdP. For example, the documents used for the training class example in this post have the city mentioned in the document title as well as the document body. Because Amazon Q Business boosts matches against the document title by default, we are taking advantage of built-in relevance tuning to further influence the documents that match the user’s city.
In this post, we focused on the user’s work location and location-specific information to add value through personalization. In other words, we used the user’s work location to transparently find what’s most relevant to them nearby. Another useful area to explore is using the user’s job title or job level to find content that is specific to their role. As you explore the possibilities, the intersection of user information and the composition of the data in the corpus of documents in your enterprise data stores is the best place to start.
Conclusion
In this post, we demonstrated how to use personalization to improve the relevancy and usefulness of the responses provided by an AI-powered assistant. Personalization is not going to dramatically improve every interaction with Amazon Q Business, but when it’s thoughtfully applied to use cases and data sources where it can deliver value, it can build trust with end-users by providing responses that are more relevant and meaningful.
What use cases do you have where attributes for your users and the information in your data sources can allow Amazon Q Business to deliver a more personalized user experience? Try out the solution for yourself, and leave your feedback and questions in the comments.

About the Authors
James Jory is a Principal Solutions Architect for Amazon Q Business. He has interests in generative AI, personalization, and recommender systems and has a background in ecommerce, marketing technology, and customer data analytics. In his spare time, he enjoys camping and motor sports.
Nihal Harish is a Software Development Engineer at AWS AI. He is passionate about generative AI and reinforcement learning. Outside of work, he enjoys playing tennis, tending to his garden, and exploring new culinary recipes.
Pranesh Anubhav is a Software Development Manager for Amazon Personalize. He is passionate about designing machine learning systems to serve customers at scale. Outside of his work, he loves playing soccer and is an avid follower of Real Madrid.
Gaurush Hiranandani is an Applied Scientist at AWS AI, where his research spans the fields of statistical machine learning, with a particular focus on preference elicitation and recommender systems. He is deeply passionate about advancing the personalization of generative AI services at AWS AI, aiming to enhance user experiences through tailored, data-driven insights.
Harsh Singh is a Principal Product Manager Technical at AWS AI. Harsh enjoys building products that bring AI to software developers and everyday users to improve their productivity.

Self-Training on Image Comprehension (STIC): A Novel Self-Training App …

Large language models (LLMs) have gained significant attention due to their advanced capabilities in processing and generating text. However, the increasing demand for multimodal input processing has led to the development of vision language models. These models combine the strengths of LLMs with image encoders to create large vision language models (LVLMs). Despite their promising results,  LVLMs face a significant challenge in acquiring high-quality fine-tuning data, because obtaining human-curated content at scale is often prohibitively expensive, especially for multi-modal data. So, there is an urgent need for cost-effective methods to obtain fine-tuning data to enhance LVLMs and expand their capabilities.

Recent advancements in VLMs have been driven by integrating open-source LLMs with innovative image encoders, leading to the development of LVLMs. Examples include LLaVA, which combines CLIP’s vision encoder with the Vicuna LLM, and other models like LLaMA-Adapter-V2, Qwen-VL, and InternVL. However, they often depend on expensive human-curated or AI-generated data for fine-tuning. Recent research has addressed this limitation by exploring alignment fine-tuning techniques, such as direct preference optimization (DPO) and iterative preference fine-tuning. However, adapting these techniques for LVLMs has been limited, with initial attempts focusing on human-labeled data or GPT-4 generated content for fine-tuning.

Researchers from UCLA, UC Berkeley, and Stanford University have introduced an approach called Self-Training on Image Comprehension (STIC). This method emphasizes self-training specifically for image comprehension in LVLMs and self-constructs a preference dataset for image descriptions using unlabeled images. It generates preferred responses through a step-by-step prompt and dis-preferred responses from corrupted images or misleading prompts. STIC reuses a small portion of existing instruction-tuning data and appends self-generated image descriptions to the prompts to enhance reasoning on extracted visual information.
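The sketch below shows how such a preference pair could be assembled in code. It is a schematic reading of the method, not the authors' implementation: generate and corrupt stand in for the LVLM's description function and an image-corruption routine, and the prompts are illustrative.

import random
from dataclasses import dataclass
from typing import Callable

@dataclass
class PreferencePair:
    image_path: str
    prompt: str
    chosen: str    # description produced with the detailed, step-by-step prompt
    rejected: str  # description produced from a corrupted image or a misleading prompt

GOOD_PROMPT = "Describe the image step by step, covering objects, attributes, and relationships."
BAD_PROMPTS = [
    "Describe the image, adding extra objects even if they are not present.",
    "Give a vague, one-sentence description of the image.",
]

def build_pair(image_path: str,
               generate: Callable[[str, str], str],
               corrupt: Callable[[str], str]) -> PreferencePair:
    chosen = generate(image_path, GOOD_PROMPT)
    if random.random() < 0.5:
        rejected = generate(corrupt(image_path), GOOD_PROMPT)   # e.g., a blurred or cropped image
    else:
        rejected = generate(image_path, random.choice(BAD_PROMPTS))
    return PreferencePair(image_path, GOOD_PROMPT, chosen, rejected)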

The STIC method utilizes llava-v1.6-mistral-7b as the base model for self-training with model-generated preference data. The process involves two main stages: self-training on image description (Algorithm 1) and description-infused fine-tuning (Algorithm 2). For the self-constructed preference dataset, 6,000 unlabeled images are randomly sampled from the MSCOCO dataset’s train2014 split. The second stage randomly subsamples 5,000 instruction fine-tuning data points from LLaVA’s SFT data to construct the description-infused fine-tuning data, and uses low-rank adaptation (LoRA) fine-tuning for efficient computation. The performance of STIC is evaluated on seven benchmarks: ScienceQA, TextVQA, ChartQA, LLaVA-Bench, MMBench, MM-Vet, and MathVista.

The STIC method demonstrates consistent and significant improvements over the original LLaVA models across seven diverse datasets. It enhances LLaVA-v1.5’s performance by an average of 1.7% and LLaVA-v1.6’s performance by 4.0%. These improvements are achieved using only self-constructed preference data and a small portion of the model’s original fine-tuning dataset. The more advanced LLaVA-v1.6 model shows more improvement than LLaVA-v1.5, indicating a potential correlation between a model’s inherent capabilities and its capacity for self-improvement through STIC. Researchers also conducted ablation studies on the key components of STIC to demonstrate their importance and effectiveness and examined the image distribution of self-training data (MSCOCO).

In this paper, researchers have proposed Self-Training on Image Comprehension (STIC) to enhance the image comprehension capabilities of LVLMs. They conducted experiments across seven vision-language benchmarks that demonstrated significant performance improvements. The results highlight STIC’s potential to utilize vast quantities of unlabeled images, offering a cost-effective solution for advancing LVLMs. Future research could focus on testing STIC with larger models, studying how image distribution affects the success of self-training, and exploring how different image corruptions and prompts influence the creation of less desirable samples. These efforts might improve STIC’s performance and expand its role in advancing LVLM development.

Check out the Paper, GitHub, and Project. All credit for this research goes to the researchers of this project.

The post Self-Training on Image Comprehension (STIC): A Novel Self-Training Approach Designed to Enhance the Image Comprehension Capabilities of Large Vision Language Models (LVLMs) appeared first on MarkTechPost.

Google Releases FRAMES: A Comprehensive Evaluation Dataset Designed to …

Retrieval-augmented generation (RAG) has been a transformative approach in natural language processing, combining retrieval mechanisms with generative models to enhance factual accuracy and reasoning capabilities. RAG systems excel in generating complex responses by leveraging external sources and synthesizing the retrieved information into coherent narratives. Unlike traditional models that rely solely on pre-existing knowledge, RAG systems can incorporate real-time data, making them valuable for tasks requiring up-to-date information and multi-hop reasoning. This research explores how RAG systems handle complex queries involving multiple documents and temporal disambiguation, thereby accurately reflecting how these systems perform in real-world scenarios.

The challenge with evaluating RAG systems is that current methods often fail to capture their true performance. Existing benchmarks, such as TruthfulQA, HotpotQA, and TriviaQA, evaluate isolated components like factual accuracy or retrieval precision but do not offer a unified view of how these systems integrate multiple aspects to provide end-to-end reasoning solutions. As a result, it becomes difficult to assess these systems’ effectiveness in handling complex, multi-document queries that require synthesizing information from diverse sources.

Existing methods to evaluate RAG systems rely on datasets designed for single-turn question answering or factual verification, limiting their applicability to more complex, multi-step tasks. For instance, the TruthfulQA dataset focuses primarily on verifying the factual correctness of responses. In contrast, datasets like HotpotQA emphasize retrieving relevant documents without assessing the reasoning needed to synthesize this information. Consequently, the lack of a comprehensive evaluation set results in an incomplete understanding of RAG systems’ performance.

The researchers from Google and Harvard University developed the FRAMES (Factuality, Retrieval, And reasoning MEasurement Set) dataset, comprising 824 challenging multi-hop questions that demand integrating information from multiple sources. This unique dataset evaluates RAG systems on three core capabilities: factuality, retrieval, and reasoning. The questions cover various topics, from history and sports to scientific phenomena, each requiring 2-15 Wikipedia articles to answer. Approximately 36% of the questions involve reasoning through multiple constraints, 20% demand numerical comparisons, and 16% require temporal disambiguation. The FRAMES dataset is designed to offer a realistic representation of queries encountered in real-world applications, thus providing a rigorous test bed for evaluating state-of-the-art RAG systems.

The research introduced a multi-step retrieval method to improve the performance of RAG systems on complex queries. Traditional single-step approaches achieved an accuracy of only 0.40, highlighting the difficulty even advanced models face in synthesizing information from multiple sources. However, the new multi-step retrieval method showed a significant improvement, with accuracy increasing to 0.66 when models iteratively retrieved and synthesized relevant information. This method generates multiple search queries in iterative steps, where each query retrieves top-ranking documents added to the model’s context. The model gains access to more relevant information with each iteration, enhancing its ability to reason through complex constraints and accurately answer multi-hop questions.
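A minimal sketch of such an iterative retrieval loop is shown below. It is generic rather than the paper's exact setup: retrieve, propose_query, and answer stand in for the search backend and the LLM calls, and the step count and document budget are arbitrary choices.

from typing import Callable, List

def multi_step_answer(question: str,
                      retrieve: Callable[[str, int], List[str]],
                      propose_query: Callable[[str, List[str]], str],
                      answer: Callable[[str, List[str]], str],
                      steps: int = 3,
                      k: int = 4) -> str:
    context: List[str] = []
    query = question
    for _ in range(steps):
        # Fetch top-k documents for the current query and add any new ones to the context.
        for doc in retrieve(query, k):
            if doc not in context:
                context.append(doc)
        # Ask the LLM to propose the next search query given what has been gathered so far.
        query = propose_query(question, context)
    # Finally, answer the original question from the accumulated context.
    return answer(question, context)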

Despite these advancements, the researchers found that the models still struggled in certain reasoning categories. For example, the accuracy for numerical reasoning, tabular data extraction, and post-processing remained low, even when all relevant documents were provided. The state-of-the-art model achieved 0.40 accuracy in a single-step evaluation scenario, improving to 0.45 with two additional documents and 0.47 with four. The Oracle Prompt, where all necessary documents were present in the context, yielded an accuracy of 0.73, demonstrating the potential of perfect retrieval systems to maximize model performance. The study concludes that while RAG systems have made significant strides, they still face challenges integrating retrieved information into coherent answers, especially in complex scenarios.

This research highlights the need for further development in RAG systems, particularly in enhancing retrieval mechanisms and reasoning capabilities. The findings provide a solid foundation for future work to focus on improving the integration of complex, multi-document retrievals and refining reasoning frameworks. By addressing these gaps, RAG systems could become even more robust and capable of handling real-world queries more precisely and consistently.

Key Takeaways from the release:

The FRAMES dataset introduced 824 questions to evaluate factuality, retrieval, and reasoning capabilities.

Approximately 36% of the dataset involves reasoning through multiple constraints, and 20% includes numerical comparisons.

Single-step evaluation methods achieved an accuracy of 0.40, while multi-step methods improved accuracy to 0.66.

The Oracle Prompt, which included all necessary documents, achieved an accuracy of 0.73, indicating the potential of ideal retrieval systems.

Despite iterative retrieval improvements, the study underscores significant gaps in numerical, tabular, and post-processing reasoning tasks.

In conclusion, this research provides a comprehensive framework for evaluating RAG systems, showcasing both the progress and the challenges in developing robust multi-hop reasoning capabilities. The FRAMES dataset offers a clearer picture of how RAG systems perform in real-world applications, setting the stage for future innovations to bridge the existing gaps and advance these systems’ capabilities.

Check out the Paper and Dataset. All credit for this research goes to the researchers of this project.

The post Google Releases FRAMES: A Comprehensive Evaluation Dataset Designed to Test Retrieval-Augmented Generation (RAG) Applications on Factuality, Retrieval Accuracy, and Reasoning appeared first on MarkTechPost.

‘bge-en-icl’: A Novel AI Model that Employs Few-Shot Examples to P …

Generating versatile and high-quality text embeddings across various tasks is a significant challenge in natural language processing (NLP). Current embedding models, despite advancements, often struggle to handle unseen tasks and complex retrieval operations effectively. These limitations hinder their ability to adapt dynamically to diverse contexts, a critical requirement for real-world applications. Addressing this challenge is essential for advancing the field of AI, enabling the development of more robust and adaptable systems capable of performing well across a wide range of scenarios.

Current methods for text embedding rely heavily on sophisticated modifications to large language model (LLM) architectures, such as bidirectional attention mechanisms and various pooling strategies. While these approaches have led to performance improvements in specific scenarios, they often come with significant drawbacks. These include increased computational complexity and a lack of flexibility when adapting to new tasks. Moreover, many of these models require extensive pre-training on large datasets, which can be both resource-intensive and time-consuming. Despite these efforts, models like NV-Embed and GritLM still fall short in their ability to generalize effectively across different tasks, particularly when they encounter scenarios that were not part of their training data.

The researchers from Beijing Academy of Artificial Intelligence, Beijing University of Posts and Telecommunications, Chinese Academy of Sciences, and University of Science and Technology of China introduce a novel model, bge-en-icl, which enhances the generation of text embeddings by leveraging the in-context learning (ICL) capabilities of LLMs. This approach addresses the limitations of existing models by integrating task-specific examples directly into the query input, enabling the model to generate embeddings that are more relevant and generalizable across various tasks. The innovation lies in maintaining the simplicity of the original LLM architecture while incorporating ICL features, avoiding the need for extensive architectural modifications or additional pre-training. This method proves highly effective, setting new performance benchmarks across diverse tasks without sacrificing the model’s ability to adapt to new contexts.

The bge-en-icl model is based on the Mistral-7B backbone, known for its effectiveness in NLP tasks. A key aspect of this method is the use of in-context learning during training, where task-specific examples are integrated into the query input. This allows the model to learn embeddings that are both task-specific and generalizable. The model is fine-tuned using a contrastive loss function, designed to maximize the similarity between relevant query-passage pairs while minimizing it for irrelevant ones. The training process involves a diverse set of tasks, such as retrieval, reranking, and classification, ensuring broad applicability. The bge-en-icl model is tested on benchmarks like MTEB and AIR-Bench, consistently outperforming other models, particularly in few-shot learning scenarios.
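For readers unfamiliar with this training objective, the snippet below is a generic in-batch contrastive (InfoNCE-style) loss of the kind described above; it is not the authors' exact recipe, and the temperature and normalization choices are common defaults rather than values from the paper.

import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(query_emb: torch.Tensor,
                              passage_emb: torch.Tensor,
                              temperature: float = 0.05) -> torch.Tensor:
    # query_emb, passage_emb: [batch, dim]; the i-th passage is the positive for the i-th query,
    # and every other passage in the batch acts as a negative.
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(passage_emb, dim=-1)
    logits = q @ p.T / temperature                     # [batch, batch] similarity matrix
    labels = torch.arange(q.size(0), device=q.device)  # diagonal entries are the positives
    return F.cross_entropy(logits, labels)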

The bge-en-icl model demonstrates significant advancements in text embedding generation, achieving state-of-the-art performance across various tasks on the MTEB and AIR-Bench benchmarks. Notably, the model excels in few-shot learning scenarios, outperforming several leading models in retrieval, classification, and clustering tasks. For instance, it achieves high scores in both retrieval and classification, highlighting its capability to generate relevant and generalizable embeddings. These results underscore the effectiveness of incorporating in-context learning (ICL) into the embedding process, allowing the model to adapt dynamically to diverse tasks while maintaining simplicity in its architectural design. This innovative approach not only improves performance but also broadens the applicability of text embeddings in real-world scenarios.

In conclusion, the researchers have made a substantial contribution to the field of text embedding by developing the bge-en-icl model, which effectively leverages in-context learning to improve the adaptability and performance of LLMs. By integrating task-specific examples directly into the query input, this method overcomes the limitations of existing models, enabling the generation of high-quality embeddings across a wide range of tasks. The bge-en-icl model sets new benchmarks on MTEB and AIR-Bench, demonstrating that simplicity combined with ICL can lead to highly effective and versatile AI systems. This approach has the potential to significantly impact AI research, offering a path forward for creating more adaptable and efficient models for real-world applications.

Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.

The post ‘bge-en-icl’: A Novel AI Model that Employs Few-Shot Examples to Produce High-Quality Text Embeddings appeared first on MarkTechPost.

AWS recognized as a first-time Leader in the 2024 Gartner Magic Quadra …

Over the last 18 months, AWS has released more than twice as many machine learning (ML) and generative artificial intelligence (AI) features into general availability as the other major cloud providers combined. This accelerated innovation is enabling organizations of all sizes, from disruptive AI startups like Hugging Face, AI21 Labs, and Articul8 AI to industry leaders such as NASDAQ and United Airlines, to unlock the transformative potential of generative AI. By providing a secure, high-performance, and scalable set of data science and machine learning services and capabilities, AWS empowers businesses to drive innovation through the power of AI.
At the heart of this innovation are Amazon Bedrock and Amazon SageMaker, both of which were mentioned in the recent Gartner Data Science and Machine Learning (DSML) Magic Quadrant evaluation. These services play a pivotal role in addressing diverse customer needs across the generative AI journey.
Amazon SageMaker, the foundational service for ML and generative AI model development, provides the fine-tuning capabilities and flexibility that make it simple for data scientists and machine learning engineers to build, train, and deploy ML models and foundation models (FMs) at scale. For application developers, Amazon Bedrock is the simplest way to build and scale generative AI applications with FMs for a wide variety of use cases. Whether leveraging the best FMs out there or importing custom models from SageMaker, Bedrock equips development teams with the tools they need to accelerate innovation.
We believe continued innovations for both services and our positioning as a Leader in the 2024 Gartner Data Science and Machine Learning (DSML) Magic Quadrant reflects our commitment to meeting evolving customer needs, particularly in data science and ML. In our opinion, this recognition, coupled with our recent recognition in the Cloud AI Developer Services (CAIDS) Magic Quadrant, solidifies AWS as a provider of innovative AI solutions that drive business value and competitive advantage.
Review the Gartner Magic Quadrant and Methodology
For Gartner, the DSML Magic Quadrant research methodology provides a graphical competitive positioning of four types of technology providers in fast-growing markets: Leaders, Visionaries, Niche Players and Challengers. As companion research, Gartner Critical Capabilities notes provide deeper insight into the capability and suitability of providers’ IT products and services based on specific or customized use cases.
The following figure highlights where AWS lands in the DSML Magic Quadrant.

Access a complimentary copy of the full report to see why Gartner positioned AWS as a Leader, and dive deep into the strengths and cautions of AWS.
Further detail on Amazon Bedrock and Amazon SageMaker
Amazon Bedrock provides a straightforward way to build and scale applications with large language models (LLMs) and foundation models (FMs), empowering you to build generative AI applications with security and privacy. With Amazon Bedrock, you can experiment with and evaluate high performing FMs for your use case, import custom models, privately customize them with your data using techniques such as fine-tuning and Retrieval Augmented Generation (RAG), and build agents that run tasks using your enterprise systems and data sources. Tens of thousands of customers across multiple industries are deploying new generative AI experiences for diverse use cases.
Amazon SageMaker is a fully managed service that brings together a broad set of tools to enable high-performance, low-cost ML for any use case. You can access a wide range of ML tools, fully managed and scalable infrastructure, repeatable and responsible ML workflows, and the power of human feedback across the ML lifecycle, including sophisticated tools that make it straightforward to work with data, such as Amazon SageMaker Canvas and Amazon SageMaker Data Wrangler.
In addition, Amazon SageMaker helps data scientists and ML engineers build FMs from scratch, evaluate and customize FMs with advanced techniques, and deploy FMs with fine-grained controls for generative AI use cases that have stringent requirements on accuracy, latency, and cost. Hundreds of thousands of customers from Perplexity to Thomson Reuters to Workday use SageMaker to build, train, and deploy ML models, including LLMs and other FMs.
Gartner does not endorse any vendor, product or service depicted in its research publications and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
This graphic was published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the entire document. The Gartner document is available upon request from AWS.
GARTNER is a registered trademark and service mark of Gartner and Magic Quadrant is a registered trademark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and are used herein with permission. All rights reserved.

About the author
Susanne Seitinger leads AI and ML product marketing at Amazon Web Services (AWS), including the introduction of critical generative AI services like Amazon Bedrock as well as coordinating generative AI marketing activities across AWS. Prior to AWS, Susanne was the director of public sector marketing at Verizon Business Group, and previously drove public sector marketing in the United States for Signify, after holding various positions in R&D, innovation, and segment management and marketing. She holds a BA from Princeton University, as well as a master’s in city planning and a PhD from MIT.