This AI Paper from ETH Zurich Introduces DINKEL: A State-Aware Query G …

Graph database management systems (GDBMSs) have become essential in today’s data-driven world, which requires more and more management of complex, highly interconnected data for social networking, recommendation systems, and large language models. Graph systems efficiently store and manipulate graphs to quickly retrieve data for relationship analysis. The reliability of GDBMS will then be crucial for sectors in which data integrity is very important, such as finance and social media.

Despite high diffusion, the intrinsic complexity and dynamic data changes these systems handle are serious problems and hassles in the GDBMS. A bug in these systems could create serious problems, including data corruption and possible security flaws. For instance, these bugs in GDBMS can lead to denial-of-service attacks or information disclosure that will be disastrous in high-assurance applications. As both the systems and the nature of the queries they process are very complex, their detection and resolution are quite challenging; hence, these bugs might pose a severe reliability and security risk.

State-of-the-art techniques for testing GDBMS generate queries in Cypher, the most widely adopted graph query language. However, these techniques only generate relatively small complexity queries and fully model state changes and data dependencies typical of complex real-world applications. Indeed, state-of-the-art approaches usually cover a small portion of the functionality offered by GDBMSs and fail to detect bugs that may compromise system integrity. These deficiencies underline the need for more sophisticated testing tools capable of accurately modeling complex operations in GDBMS.

That being the case, ETH Zurich researchers have proposed an alternative way of testing GDBMS focusing on state-aware query generation. The team implemented this approach as a fully automatic GDBMS testing framework, DINKEL. This method enables modeling the dynamic states of a graph database to create complex Cypher queries that represent real-life data manipulation in GDBMS. In contrast to traditional techniques, DINKEL permits the continuous update of state information about a graph during the generation of queries, thus guaranteeing that every independent query reflects a database’s possible states and dependencies. Hence, this multi-state change and complex data interaction empower queries to enable the testing of GDBMS with high test coverage and effectiveness.

Another approach by DINKEL is based on the systematic modeling of graph states, divided by query context and graph schema. Query context contains information about the temporary variables declared in the queries; graph schema includes information on current graph labels and properties. On the generation of Cypher queries, DINKEL incrementally constructs every query, drawing on information about the current state of the graph’s accessible elements and updating state information as the query evolves. This state awareness guarantees syntactical correctness but also ensures real-world scenarios are represented by the queries generated from DINKEL, enabling the revelation of bugs that would have flown under the radar.

The results of DINKEL performance are really impressive. His extensive testing on three major open-source GDBMSs—Neo4j, RedisGraph, and Apache AGE—DINKEL showed a brilliant validity rate for complex Cypher queries of 93.43%. In a 48-hour test campaign, DINKEL exposed 60 unique bugs, among which 58 were confirmed, and the developers later fixed 51. By applying this methodology, DINKEL could cover over 60% more code than the best baseline testing techniques, thus demonstrating improved deep bug-exposing ability. Most of these bugs were previously unknown and involved tricky logic or state changes in the GDBMS, underpinning the effectiveness of DINKEL’s state-aware query generation.

The approach by the ETH Zurich team serves a needy cause in testing GDBMS. They have developed a state-aware approach to generating queries for building this tool, drastically improving complex bug detection that hazard reliability and security in graph database systems. Results Their work, materialized through DINKEL, showed remarkable improvements in test coverage and bug detection. This advance is a step ahead in GDBMS robustness assurance; DINKEL is one relevant tool for developers and researchers.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. If you like our work, you will love our newsletter..

Don’t Forget to join our 49k+ ML SubReddit

Find Upcoming AI Webinars here
The post This AI Paper from ETH Zurich Introduces DINKEL: A State-Aware Query Generation Framework for Testing GDBMS (Graph Database Management Systems) appeared first on MarkTechPost.

Speculative Retrieval Augmented Generation (Speculative RAG): A Novel …

The field of natural language processing has made substantial strides with the advent of Large Language Models (LLMs), which have shown remarkable proficiency in tasks such as question answering. These models, trained on extensive datasets, can generate highly plausible and contextually appropriate responses. However, despite their success, LLMs need help dealing with knowledge-intensive queries. Specifically, these queries often require up-to-date information or involve obscure facts that the model might have yet to encounter during training. This limitation can lead to factual inaccuracies or the generation of hallucinated content, particularly when the model is pressed for details outside its stored knowledge. The problem becomes even more pronounced when precision and reliability are paramount, such as in medical or scientific inquiries.

A central challenge in developing and applying LLMs is achieving an optimal balance between accuracy and processing efficiency. When LLMs are tasked with answering complex queries that require integrating information from various sources, they often need help managing long contexts. As the number of relevant documents increases, so does the complexity of reasoning, which can overwhelm the model’s capacity to process information efficiently. This inefficiency slows the response generation and increases the likelihood of errors, particularly in scenarios where the model must sift through extensive contextual information to find the most relevant details. The need for systems that can efficiently incorporate external knowledge, reducing both latency and the risk of inaccuracies, is thus a critical area of research in natural language processing.

Researchers have developed methods like Retrieval Augmented Generation (RAG), which integrates external knowledge sources directly into the generative process of LLMs. Traditional RAG systems retrieve multiple documents related to the query and incorporate them into the model’s input to ensure a thorough understanding of the topic. While this approach has proven effective in reducing factual errors, it introduces new challenges. Including multiple documents significantly increases the input length, which, in turn, can slow down the inference process and complicate the reasoning required to generate accurate responses. Some advanced RAG systems attempt to refine the quality of the retrieved documents to improve the contextual information provided to the LLM. However, these methods often focus on improving accuracy only after adequately addressing the associated latency issues, which remain a significant bottleneck in the practical application of these models.

Researchers from the University of California San Diego, Google Cloud AI Research, Google DeepMind, and Google Cloud AI introduced a novel approach called Speculative Retrieval Augmented Generation (Speculative RAG). This framework innovatively combines the strengths of both specialist and generalist language models to improve efficiency and accuracy in response generation. The core idea behind Speculative RAG is to leverage a smaller, specialist LM that can generate multiple drafts of potential answers in parallel. Each draft is created from a distinct subset of documents retrieved based on the query to capture diverse perspectives and reduce redundancy. Once these drafts are generated, a larger, generalist LM steps in to verify them. The generalist LM evaluates the coherence and relevance of each draft, ultimately selecting the most accurate one for the final response. This method effectively reduces the input token count per draft, enhancing the response generation process’s efficiency without compromising the answers’ accuracy.

Speculative RAG employs a divide-and-conquer strategy that partitions retrieved documents into subsets based on content similarity. The documents are grouped using clustering techniques, and one document from each cluster is sampled to form a diverse subset. These subsets are then processed by the specialist LM, which generates answer drafts along with corresponding rationales. The generalist LM then evaluates these drafts by calculating a confidence score based on the coherence of the draft and its reasoning. This approach minimizes redundancy in the retrieved documents and ensures that the final answer is informed by multiple perspectives, thereby improving the overall quality and reliability of the response.

The performance of Speculative RAG has been rigorously tested against traditional RAG methods across various benchmarks, including TriviaQA, PubHealth, and ARC-Challenge. The results are compelling: Speculative RAG enhances accuracy by up to 12.97% on the PubHealth benchmark while reducing latency by 51%. In the TriviaQA benchmark, the method achieved an accuracy improvement of 2.15% and a latency reduction of 23.41%. On the ARC-Challenge benchmark, the accuracy increased by 2.14%, with a corresponding latency reduction of 26.73%. These figures underscore the effectiveness of the Speculative RAG framework in delivering high-quality responses more efficiently than conventional RAG systems.

In conclusion, Speculative RAG effectively addresses the limitations of traditional RAG systems by strategically combining the strengths of smaller, specialist language models with larger, generalist ones. The method’s ability to generate multiple drafts in parallel, reduce redundancy, and leverage diverse perspectives ensures that the final output is accurate and efficiently produced. Speculative RAG’s substantial improvements in accuracy and latency across multiple benchmarks highlight its potential to set new standards in applying LLMs for complex, knowledge-intensive queries. As natural language processing continues to evolve, approaches like Speculative RAG will likely play a crucial role in enhancing language models’ capabilities and practical applications in various domains.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. If you like our work, you will love our newsletter..

Don’t Forget to join our 49k+ ML SubReddit

Find Upcoming AI Webinars here
The post Speculative Retrieval Augmented Generation (Speculative RAG): A Novel Framework Enhancing Accuracy and Efficiency in Knowledge-intensive Query Processing with LLMs appeared first on MarkTechPost.

Code as a Catalyst: Improving LLM Capabilities Across Diverse Tasks

Large Language Models (LLMs) have gained significant attention in recent years, with researchers focusing on improving their performance across various tasks. A critical challenge in developing these models lies in understanding the impact of pre-training data on their overall capabilities. While the importance of diverse data sources and computational resources has been established, a crucial question remains: what properties of data contribute most effectively to general performance? Surprisingly, code data has become a common component in pre-training mixtures, even for models not explicitly designed for code generation. This inclusion raises questions about the precise impact of code data on non-code tasks. This topic has yet to be systematically investigated despite its potential significance in advancing LLM capabilities.

Researchers have made numerous attempts to understand and improve LLM performance through data manipulation. These efforts include studying the effects of data age, quality, toxicity, and domain, as well as exploring techniques like filtering, de-duplication, and data pruning. Some studies have investigated the role of synthetic data in enhancing performance and bridging gaps between open-source and proprietary models. While these approaches offer valuable insights into general data characteristics, they do not specifically address the impact of code data on non-code tasks.

The inclusion of code in pre-training mixtures has become a common practice, even for models not primarily designed for code-related tasks. Previous studies suggest that code data improves LLM performance on various natural language processing tasks, including entity linking, commonsense reasoning, and mathematical reasoning. Some researchers have demonstrated the benefits of using Python code data in low-resource pre-training settings. However, these studies often focused on specific aspects or limited evaluation setups, lacking a comprehensive examination of code data’s impact across various tasks and model scales.

Researchers from Cohere For AI and Cohere conducted an extensive set of large-scale controlled pre-training experiments to investigate the impact of code data on LLM performance. Their study focused on various aspects, including the timing of code introduction in the training process, code proportions, scaling effects, and the quality and properties of the code data used. Despite the significant computational cost of these ablations, the results consistently demonstrated that code data provides critical improvements to non-code performance.

The study’s key findings reveal that compared to text-only pre-training, the best variant with code data inclusion resulted in relative increases of 8.2% in natural language reasoning, 4.2% in world knowledge, 6.6% in generative win rates, and a 12-fold boost in code performance. Also, performing cooldown with code led to additional improvements: 3.7% in natural language reasoning, 6.8% in world knowledge, and 20% in code performance, relative to cooldown without code.

Several factors proved crucial, including optimizing the proportion of code, enhancing code quality through synthetic code and code-adjacent data, and utilizing code across multiple training stages, including cool down. The researchers conducted extensive evaluations on a wide range of benchmarks, covering world knowledge tasks, natural language reasoning, code generation, and LLM-as-a-judge win rates. These experiments spanned models ranging from 470 million to 2.8 billion parameters.

The research methodology involved a comprehensive experimental framework to evaluate the impact of code on LLM performance. The study used SlimPajama, a high-quality text dataset, as the primary source for natural language data, carefully filtering out code-related content. For code data, researchers employed multiple sources to explore different properties:

1. Web-based Code Data: Derived from the Stack dataset, focusing on the top 25 programming languages.

2. Markdown Data: Including markup-style languages like Markdown, CSS, and HTML.

3. Synthetic Code Data: A proprietary dataset of formally verified Python programming problems.

4. Code-Adjacent Data: Incorporating GitHub commits, Jupyter notebooks, and StackExchange threads.

The training process consisted of two phases: continued pre-training and cooldown. Continued pre-training involved training a model initialized from a pre-trained model for a fixed token budget. Cooldown, a technique to boost model quality, involved up-weighting high-quality datasets and annealing the learning rate during the final training stages.

The evaluation suite was designed to assess performance across various domains: world knowledge, natural language reasoning, and code generation. Also, the researchers employed LLM-as-a-judge win rates to evaluate generative performance. This comprehensive approach allowed for a systematic understanding of code’s impact on general LLM performance beyond just code-related tasks.

The study revealed significant impacts of code data on LLM performance across various tasks. For natural language reasoning, models initialized with code data showed the best performance. The code-initialized text model (code→text) and the balanced-initialized text model (balanced→text) outperformed the text-only baseline by 8.8% and 8.2% respectively. The balanced-only model also showed a 3.2% improvement over the baseline, indicating that initialization from a pre-trained model with a mix of code has a strong positive effect on natural language reasoning tasks.

In world knowledge tasks, the balanced→text model performed best, surpassing the code→text model by 21% and the text-only model by 4.1%. This suggests that world knowledge tasks benefit from a more balanced data mixture for initialization and a larger proportion of text in the continual pre-training stage.

For code generation tasks, the balanced-only model achieved the best performance, showing a 46.7% and 54.5% improvement over balanced→text and code→text models respectively. However, this came at the cost of lower performance in natural language tasks.

Generative quality, as measured by win rates, also improved with the inclusion of code data. Both code→text and balanced-only models outperformed the text-only variant by a 6.6% difference in win-loss rates, even on non-code evaluations.

These results demonstrate that including code data in pre-training not only enhances reasoning capabilities but also improves the overall quality of generated content across various tasks, highlighting the broad benefits of code data in LLM training.

This study provides new insights into the impact of code data on LLM performance across a wide range of tasks. The researchers conducted a comprehensive analysis, focusing not only on code-related tasks but also on natural language performance and generative quality. Their systematic approach included various ablations examining initialization strategies, code proportions, code quality and properties, and the role of code in pre-training cooldown.

Key findings from the study include:

Code data significantly improves non-code task performance. The best variant with code data showed relative increases of 8.2% in natural language reasoning, 4.2% in world knowledge, and 6.6% in generative win rates compared to text-only pre-training.

Code performance saw a dramatic 12-fold boost with the inclusion of code data.

Cooldown with code further enhanced performance, improving natural language reasoning by 3.6%, world knowledge by 10.1%, and code performance by 20% relative to pre-cooldown models. This also led to a 52.3% increase in generative win rates.

The addition of high-quality synthetic code data, even in small amounts, had a disproportionately positive impact, increasing natural language reasoning by 9% and code performance by 44.9%.

These results demonstrate that incorporating code data in LLM pre-training leads to substantial improvements across various tasks, extending far beyond code-specific applications. The study highlights the critical role of code data in enhancing LLM capabilities, offering valuable insights for future model development and training strategies.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. If you like our work, you will love our newsletter..

Don’t Forget to join our 49k+ ML SubReddit

Find Upcoming AI Webinars here
The post Code as a Catalyst: Improving LLM Capabilities Across Diverse Tasks appeared first on MarkTechPost.

Build private and secure enterprise generative AI applications with Am …

Amazon Q Business is a conversational assistant powered by generative artificial intelligence (AI) that enhances workforce productivity by answering questions and completing tasks based on information in your enterprise systems, which each user is authorized to access. In an earlier post, we discussed how you can build private and secure enterprise generative AI applications with Amazon Q Business and AWS IAM Identity Center. If you want to use Amazon Q Business to build enterprise generative AI applications, and have yet to adopt organization-wide use of AWS IAM Identity Center, you can use Amazon Q Business IAM Federation to directly manage user access to Amazon Q Business applications from your enterprise identity provider (IdP), such as Okta or Ping Identity. Amazon Q Business IAM Federation uses Federation with IAM and doesn’t require the use of IAM Identity Center.
AWS recommends using AWS Identity Center if you have a large number of users in order to achieve a seamless user access management experience for multiple Amazon Q Business applications across many AWS accounts in AWS Organizations. You can use federated groups to define access control, and a user is charged only one time for their highest tier of Amazon Q Business subscription. Although Amazon Q Business IAM Federation enables you to build private and secure generative AI applications, without requiring the use of IAM Identity Center, it is relatively constrained with no support for federated groups, and limits the ability to charge a user only one time for their highest tier of Amazon Q Business subscription to Amazon Q Business applications sharing SAML identity provider or OIDC identity provider in a single AWS accouGnt.
This post shows how you can use Amazon Q Business IAM Federation for user access management of your Amazon Q Business applications.
Solution overview
To implement this solution, you create an IAM identity provider for SAML or IAM identity provider for OIDC based on your IdP application integration. When creating an Amazon Q Business application, you choose and configure the corresponding IAM identity provider.
When responding to requests by an authenticated user, the Amazon Q Business application uses the IAM identity provider configuration to validate the user identity. The application can respond securely and confidentially by enforcing access control lists (ACLs) to generate responses from only the enterprise content the user is authorized to access.
We use the same example from Build private and secure enterprise generative AI apps with Amazon Q Business and AWS IAM Identity Center—a generative AI employee assistant built with Amazon Q Business—to demonstrate how to set it up using IAM Federation to only respond using enterprise content that each employee has permissions to access. Thus, the employees are able to converse securely and privately with this assistant.
Architecture
Amazon Q Business IAM Federation requires federating the user identities provisioned in your enterprise IdP such as Okta or Ping Identity account using Federation with IAM. This involves a onetime setup of creating a SAML or OIDC application integration in your IdP account, and then creating a corresponding SAML identity provider or an OIDC identity provider in AWS IAM. This SAML or OIDC IAM identity provider is required for you to create an Amazon Q Business application. The IAM identity provider is used by the Amazon Q Business application to validate and trust federated identities of users authenticated by the enterprise IdP, and associate a unique identity with each user. Thus, a user is uniquely identified across all Amazon Q Business applications sharing the same SAML IAM identity provider or OIDC IAM identity provider.
The following diagram shows a high-level architecture and authentication workflow. The enterprise IdP, such as Okta or Ping Identity, is used as the access manager for an authenticated user to interact with an Amazon Q Business application using an Amazon Q web experience or a custom application using an API.

The user authentication workflow consists of the following steps:

The client application makes an authentication request to the IdP on behalf of the user.
The IdP responds with identity or access tokens in OIDC mode, or a SAML assertion in SAML 2.0 mode. Amazon Q Business IAM Federation requires the enterprise IdP application integration to provide a special principal tag email attribute with its value set to the email address of the authenticated user. If user attributes such as role or location (city, state, country) are present in the SAML or OIDC assertions, Amazon Q Business will extract these attributes for personalization. These attributes are included in the identity token claims in OIDC mode, and SAML assertions in the SAML 2.0 mode.
The client application makes an AssumeRoleWithWebIdentity (OIDC mode) or AssumeRoleWithSAML (SAML mode) API call to AWS Security Token Service (AWS STS) to acquire AWS Sig V4 credentials. Email and other attributes are extracted and enforced by the Amazon Q Business application using session tags in AWS STS. The AWS Sig V4 credentials include information about the federated user.
The client application uses the credentials obtained in the previous step to make Amazon Q Business API calls on behalf of the authenticated user. The Amazon Q Business application knows the user identity based on the credential used to make the API calls, shows only the specific user’s conversation history, and enforces document ACLs. The application retrieves only those documents from the index that the user is authorized to access and are relevant to the user’s query, to be included as context when the query is sent to the underlying large language model (LLM). The application generates a response based only on enterprise content that the user is authorized to access.

How subscriptions work with Amazon Q Business IAM Federation
The way user subscriptions are handled when you use IAM Identity Center vs. IAM Federation is different.
For applications that use IAM Identity Center, AWS will de-duplicate subscriptions across all Amazon Q Business applications accounts, and charge each user only one time for their highest subscription level. De-duplication will apply only if the Amazon Q Business applications share the same organization instance of IAM Identity Center. Users subscribed to Amazon Q Business applications using IAM federation will be charged one time when they share the same SAML IAM identity provider or OIDC IAM identity provider. Amazon Q Business applications can share the same SAML IAM identity provider or OIDC IAM identity provider only if they are in the same AWS account. For example, if you use Amazon Q Business IAM Federation, and need to use Amazon Q Business applications across 3 separate AWS accounts, each AWS account will require its own SAML identity provider or OIDC identity provider to be created and used in the corresponding Amazon Q Business applications, and a user subscribed to these three Amazon Q Business applications will be charged three times. In another example, if a user is subscribed to some Amazon Q Business applications that use IAM Identity Center and others that use IAM Federation, they will be charged one time across all IAM Identity Center applications and one time per SAML IAM identity provider or OIDC IAM identity provider used by the Amazon Q Business applications using IAM Federation.
For Amazon Q Business applications using IAM Identity Center, the Amazon Q Business administrator directly assigns subscriptions for groups and users on the Amazon Q Business management console. For an Amazon Q Business application using IAM federation, the administrator chooses the default subscription tier during application creation. When an authenticated user logs in using either the Amazon Q Business application web experience or a custom application using the Amazon Q Business API, that user is automatically subscribed to the default tier.
Limitations
At the time of writing, Amazon Q Business IAM Federation has the following limitations:

Amazon Q Business doesn’t support OIDC for Google and Microsoft Entra ID.
There is no built-in mechanism to validate a user’s membership to federated groups defined in the enterprise IdP. If you’re using ACLs in your data sources with groups federated from the enterprise IdP, you can use the PutGroup API to define the federated groups in the Amazon Q Business user store. This way, the Amazon Q Business application can validate a user’s membership to the federated group and enforce the ACLs accordingly. This limitation does not apply to configurations where groups used in ACLs are defined locally within the data sources. For more information, refer to Group mapping.

Guidelines to choosing a user access mechanism
The following table summarizes the guidelines to consider when choosing a user access mechanism.

Federation Type
AWS Account Type
Amazon Q Business Subscription Billing Scope
Supported Identity Source
Other Considerations

Federated with IAM Identity Center
Multiple accounts managed by AWS Organizations
AWS organization, support for federated group-level subscriptions to Amazon Q Business applications
All identity sources supported by IAM Identity Center: IAM Identity Center directory, Active Directory, and IdP
AWS recommends this option if you have a large number of users and multiple applications, with many federated groups used to define access control and permissions.

Federated with IAM using OIDC IAM identity provider
Single, standalone account
All Amazon Q Business applications within a single standalone AWS account sharing the same OIDC IAM identity provider
IdP with OIDC application integration
This method is more straightforward to configure compared to a SAML 2.0 provider. It’s also less complex to share IdP application integrations across Amazon Q Business web experiences and custom applications using Amazon Q Business APIs.

Federated with IAM using SAML IAM identity provider
Single, standalone account
All Amazon Q Business applications within a single standalone AWS account sharing the same SAML IAM identity provider
IdP with SAML 2.0 application integration
This method is more complex to configure compared to OIDC, and requires a separate IdP application integration for each Amazon Q Business web experience. Some sharing is possible for custom applications using Amazon Q Business APIs.

Prerequisites
To implement the sample use case described in this post, you need an Okta account. This post covers workflows for both OIDC and SAML 2.0, so you can follow either one or both workflows based on your interest. You need to create application integrations for OIDC or SAML mode, and then configure the respective IAM identity providers in your AWS account, which will be required to create and configure your Amazon Q Business applications. Though you use the same Okta account and the same AWS account to create two Amazon Q Business applications one using an OIDC IAM identity provider, and the other using SAML IAM identity provider, the same user subscribed to both these Amazon Q Business applications will be charged twice, since they don’t share the underlying SAML or OIDC IAM identity providers.
Create an Amazon Q Business application with an OIDC IAM identity provider
To set up an Amazon Q Business application with an OIDC IAM identity identifier, you first configure the Okta application integration using OIDC. Then you create an IAM identity provider for that OIDC app integration, and create an Amazon Q Business application using that OIDC IAM identity provider. Lastly, you update the Okta application integration with the web experience URIs of the newly created Amazon Q Business application.
Create an Okta application integration with OIDC
Complete the following steps to create your Okta application integration with OIDC:

On the administration console of your Okta account, choose Applications, then Applications in the navigation pane.
Choose Create App Integration.
For Sign-in method, select OIDC.
For Application type, select Web Application.
Choose Next.

Give your app integration a name.
Select Authorization Code and Refresh Token for Grant Type.
Confirm that Refresh token behavior is set to Use persistent token.
For Sign-in redirect URIs, provide a placeholder value such as https://example.com/authorization-code/callback.

You update this later with the web experience URI of the Amazon Q Business application you create.

On the Assignments tab, assign access to appropriate users within your organization to your Amazon Q Business application.

In this step, you can select all users in your Okta organization, or choose select groups, such as Finance-Group if it’s defined, or select individual users.

Choose Save to save the app integration.

Your app integration will look similar to the following screenshots.

Note the values for Client ID and Client secret to use in subsequent steps.

On the Sign on tab, choose Edit next to OpenID Connect ID Token.
For Issuer, note the Okta URL.
Choose Cancel.

In the navigation pane, choose Security and then API.
Under API, Authorization Servers, choose default.
On the Claims tab, choose Add Claim.
For Name, enter https://aws.amazon.com/tags.
For Include in token type, select ID Token.
For Value, enter {“principal_tags”: {“Email”: {user.email}}}.
Choose Create.

The claim will look similar to the following screenshot. It is a best practice to use a custom authorization server. However, because this is an illustration, we use the default authorization server.

Set up an IAM identity provider for OIDC
To set up an IAM identity provider for OIDC, complete the following steps:

On the IAM console, choose Identity providers in the navigation pane.
Choose Add provider.
For Provider type, select OpenID Connect.
For Provider URL, enter the Okta URL you copied earlier, followed by /oauth2/default.
For Audience, enter the client ID you copied earlier.
Choose Add provider.

Create an Amazon Q Business application with the OIDC IAM identity provider
Complete the following steps to create an Amazon Q Business application with the OIDC IdP:

On the Amazon Q Business console, choose Create application.
Give the application a name.
For Access management method, select AWS IAM Identity provider.
For Choose an Identity provider type, select OpenID Connect (OIDC).
For Select Identity Provider, choose the IdP you created.
For Client ID, enter the client ID of the Okta application integration you copied earlier.
Leave the remaining settings as default and choose Create.

In the Select retriever step, unless you want to change the retriever type or the index type, choose Next.
For now, select Next on the Connect data sources We configure the data source later.

On the Manage access page, in Default subscription settings, Subscription Tier of Q Business Pro is selected by default. This means that when an authenticated user starts using the Amazon Q Business application, they will automatically get subscribed as Amazon Q Business Pro. The Amazon Q Business administrator can change the subscription tier for a user at any time.

In Web experience settings uncheck Create web experience. Choose Done.
On the Amazon Q Business Applications page, choose the application you just created to view the details.
In the Application Details page, note the Application ID.
In a new tab of your web browser open the management console for AWS Secrets Manager. Choose Store a new secret.
For Choose secret type choose Other type of secret. For Key/value pairs, enter client_secret as key and enter the client secret you copied from the Okta application integration as value. Choose Next.
For Configure secret give a Secret name.
For Configure rotation, unless you want to make any changes, accept the defaults, and choose Next.
For Review, review the secret you just stored, and choose Store.
On AWS Secrets Manager, Secrets page choose the secret you just created. Note the Secret name and Secret ARN.
Follow the instructions on IAM role for an Amazon Q web experience using IAM Federation to create Web experience IAM role, and Secret Manager Role. You will require the Amazon Q Business Application ID, Secret name and Secret ARN you copied earlier.
Open the Application Details for your Amazon Q Business application. Choose Edit.
For Update application, there is no need to make changes. Choose Update.
For Update retriever, there is no need to make changes. Choose Next.
For Connect data sources, there is no need to make changes. Choose Next.
For Update access, select Create web experience.
For Service role name select the web experience IAM role you created earlier.
For AWS Secrets Manager secret, select the secret you stored earlier.
For Web Experience to use Secrets: Service role name, select the Secret Manager Role you created earlier.
Choose Update.
On the Amazon Q Business Applications page, choose the application you just updated to view the details.
Note the value for Deployed URL.

Before you can use the web experience to interact with the Amazon Q Business application you just created, you need to update the Okta application integration with the redirect URL of the web experience.

Open the Okta administration console, then open the Okta application integration you created earlier.
On the General tab, choose Edit next to General Settings.
For Sign-in redirect URIs, replace the placeholder https://example.com/ with the value for Deployed URL of your web experience. Make sure the authorization-code/callback suffix is not deleted. The full URL should look like https://your_deployed_url/authorization-code/callback.
Choose Save.

Create an Amazon Q Business application with a SAML 2.0 IAM identity provider
The process to set up an Amazon Q Business application with a SAML 2.0 IAM identity provider is similar to creating an application using OIDC. You first configure an Okta application integration using SAML 2.0. Then you create an IAM identity provider for that SAML 2.0 app integration, and create an Amazon Q Business application using the SAML 2.0 IAM identity provider. Lastly, you update the Okta application integration with the web experience URIs of the newly created Amazon Q Business application.
Create an Okta application integration with SAML 2.0
Complete the following steps to create your Okta application integration with SAML 2.0:

On the administration console of your Okta account, choose Applications, then Applications in the navigation pane.
Choose Create App Integration.
For Sign-in method, select SAML 2.0.
Choose Next.

On the General Settings page, enter an app name and choose Next.

This will open the Create SAML Integration page.

For Single sign-on URL, enter a placeholder URL such as https://example.com/saml and deselect Use this for Recipient URL and Destination URL.
For Recipient URL, enter https://signin.aws.amazon.com/saml.
For Destination URL, enter the placeholder https://example.com/saml.
For Audience URL (SP Entity ID), enter https://signin.aws.amazon.com/saml.
For Name ID format, choose Persistent.
Choose Next and then Finish.

The placeholder values of https://example.com will need to be updated with the deployment URL of the Amazon Q Business web experience, which you create in subsequent steps.

On the Sign On tab of the app integration you just created, note the value for Metadata URL.

Open the URL in your web browser, and save it on your local computer.

The metadata will be required in subsequent steps.

Set up an IAM identity provider for SAML 2.0
To set up an IAM IdP for SAML 2.0, complete the following steps:

On the IAM console, choose Identity providers in the navigation pane.
Choose Add provider.
For Provider type, select SAML.
Enter a provider name.
For Metadata document, choose Choose file and upload the metadata document you saved earlier.
Choose Add provider.

From the list of identity providers, choose the identity provider you just created.
Note the values for ARN, Issuer URL, and SSO service location to use in subsequent steps.

Create an Amazon Q Business application with the SAML 2.0 IAM identity provider
Complete the following steps to create an Amazon Q Business application with the SAML 2.0 IAM identity provider:

On the Amazon Q Business console, choose Create application.
Give the application a name.
For Access management method, select AWS IAM Identity provider.
For Choose an Identity provider type, select SAML.
For Select Identity Provider, choose the IdP you created.
Leave the remaining settings as default and choose Create.

In the Select retriever step, unless you want to change the retriever type or the index type, choose Next.
For now, choose Next on the Connect data sources We will configure the data source later.

On the Manage access page, in Default subscription settings, Subscription Tier of Q Business Pro is selected by default. This means that when an authenticated user starts using the Amazon Q Business application, they will automatically get subscribed as Amazon Q Business Pro. The Amazon Q Business administrator can change the subscription tier for a user at any time.

For Web experience settings, uncheck Create web experience. Choose Done.
On the Amazon Q Business Applications page, choose the application you just created.
In the Application Details page, note the Application ID.
Follow the instructions on IAM role for an Amazon Q web experience using IAM Federation to create Web experience IAM role. You will require the Amazon Q Business Application ID you copied earlier.
Open the Application Details for your Amazon Q Business application. Choose Edit.
For Update application, there is no need to make changes. Choose Update.
For Update retriever, there is no need to make changes. Choose Next.
For Connect data sources, there is no need to make changes. Choose Next.
For Update access, select Create web experience.
For this post, we continue with the default setting.
For Authentication URL, enter the value for SSO service location that you copied earlier.
Choose Update.
On the Amazon Q Business Applications page, choose the application you just updated to view the details.
Note the values for Deployed URL and Web experience IAM role ARN to use in subsequent steps.

 Before you can use the web experience to interact with the Amazon Q Business application you just created, you need to update the Okta application integration with the redirect URL of the web experience.

Open the Okta administration console, then open the Okta application integration you created earlier.
On the General tab, choose Edit next to SAML Settings.
For Single sign-on URL and Destination URL, replace the placeholder https://example.com/ with the value for Deployed URL of your web experience. Make sure the /saml suffix isn’t deleted.
Choose Save.

On the Edit SAML Integration page, in the Attribute Statements (optional) section, add attribute statements as listed in the following table.

This step is not optional and these attributes are used by the Amazon Q Business application to determine the identity of the user, so be sure to confirm their correctness.

Name
Name format
Value

https://aws.amazon.com/SAML/Attributes/PrincipalTag:Email
Unspecified
user.email

https://aws.amazon.com/SAML/Attributes/Role
Unspecified
<Web experience IAM role ARN>,<identity-provider-arn>

https://aws.amazon.com/SAML/Attributes/RoleSessionName
Unspecified
user.email

For the value of the https://aws.amazon.com/SAML/Attributes/Role attribute, you need to concatenate the web experience IAM role ARN and IdP ARN you copied earlier with a comma between them, without spaces or any other characters.

Choose Next and Finish.
On the Assignments tab, assign users who can access the app integration you just created.

This step controls access to appropriate users within your organization to your Amazon Q Business application. In this step, you can enable self-service so that all users in your Okta organization, or choose select groups, such as Finance-Group if it’s defined, or select individual users.

Set up the data source
Whether you created the Amazon Q Business application using an OIDC IAM identity provider or SAML 2.0 IAM identity provider, the procedure to create a data source remains the same. For this post, we set up a data source for Atlassian Confluence. The following steps show how to configure the data source for the Confluence environment. For more details on how to set up a Confluence data source, refer to Connecting Confluence (Cloud) to Amazon Q Business.

On the Amazon Q Business Application details page, choose Add data source.

On the Add data source page, choose Confluence.

For Data source name, enter a name.
For Source, select Confluence Cloud and enter the Confluence URL.

For Authentication, select Basic authentication and enter the Secrets Manager secret.
For IAM role, select Create a new service role.
Leave the remaining settings as default.

For Sync scope, select the appropriate content to sync.
Under Space and regex patterns, provide the Confluence spaces to be included.
For Sync mode, select Full sync.
For Sync run schedule, choose Run on demand.
Choose Add data source.

After the data source creation is complete, choose Sync now to start the data source sync.

Wait until the sync is complete before logging in to the web experience to start querying.

Employee AI assistant use case
To illustrate how you can build a secure and private generative AI assistant for your employees using Amazon Q Business applications, let’s take a sample use case of an employee AI assistant in an enterprise corporation. Two new employees, Mateo Jackson and Mary Major, have joined the company on two different projects, and have finished their employee orientation. They have been given corporate laptops, and their accounts are provisioned in the corporate IdP. They have been told to get help from the employee AI assistant for any questions related to their new team member activities and their benefits.
The company uses Confluence to manage their enterprise content. The sample Amazon Q application used to run the scenarios for this post is configured with a data source using the built-in connector for Confluence to index the enterprise Confluence spaces used by employees. The example uses three Confluence spaces with the following permissions:

HR Space – All employees, including Mateo and Mary
AnyOrgApp Project Space – Employees assigned to the project, including Mateo
ACME Project Space – Employees assigned to the project, including Mary

Let’s look at how Mateo and Mary experience their employee AI assistant.
Both are provided with the URL of the employee AI assistant web experience. They use the URL and sign in to the IdP from the browsers of their laptops. Mateo and Mary both want to know about their new team member activities and their fellow team members. They ask the same questions to the employee AI assistant but get different responses, because each has access to separate projects. In the following screenshots, the browser window on the left is for Mateo Jackson and the one on the right is for Mary Major. Mateo gets information about the AnyOrgApp project and Mary gets information about the ACME project.

Mateo chooses Sources under the question about team members to take a closer look at the team member information, and Mary chooses Sources under the question for the new team member checklist. The following screenshots show their updated views.

Mateo and Mary want to find out more about the benefits their new job offers and how the benefits are applicable to their personal and family situations.
The following screenshot shows that Mary asks the employee AI assistant questions about her benefits and eligibility.

Mary can also refer to the source documents.

The following screenshot shows that Mateo asks the employee AI assistant different questions about his eligibility.

Mateo looks at the following source documents.

Both Mary and Mateo first want to know their eligibility for benefits. But after that, they have different questions to ask. Even though the benefits-related documents are accessible by both Mary and Mateo, their conversations with the employee AI assistant are private and personal. The assurance that their conversation history is private and can’t be seen by any other user is critical for the success of a generative AI employee productivity assistant.
Clean up
If you created a new Amazon Q Business application to try out the integration with IAM federation, and don’t plan to use it further, you can unsubscribe, remove automatically subscribed users from the application, and delete it so that your AWS account doesn’t accumulate costs.

To unsubscribe and remove users, go to the application details page and choose Manage subscriptions.

Select all the users, choose Remove to remove subscriptions, and choose Done.

To delete the application after removing the users, return to the application details page and choose Delete.

Conclusion
For enterprise generative AI assistants such as the one shown in this post to be successful, they must respect access control as well as assure the privacy and confidentiality of every employee. Amazon Q Business achieves this by integrating with IAM Identity Center or with IAM Federation to provide a solution that authenticates each user and validates the user identity at each step to enforce access control along with privacy and confidentiality.
In this post, we showed how Amazon Q Business IAM Federation uses SAML 2.0 and OIDC IAM identity providers to uniquely identify a user authenticated by the enterprise IdP, and then that user identity is used to match up document ACLs set up in the data source. At query time, Amazon Q Business responds to a user query utilizing only those documents that the user is authorized to access. This functionality is similar to that achieved by the integration of Amazon Q Business with IAM Identity Center we saw in an earlier post. Additionally, we also provided the guidelines to consider when choosing a user access mechanism.
To learn more, refer to Amazon Q Business, now generally available, helps boost workforce productivity with generative AI and the Amazon Q Business User Guide.

About the authors
Abhinav Jawadekar is a Principal Solutions Architect in the Amazon Q Business service team at AWS. Abhinav works with AWS customers and partners to help them build generative AI solutions on AWS.
Venky Nagapudi is a Senior Manager of Product Management for Q Business, Amazon Comprehend and Amazon Translate. His focus areas on Q Business include user identity management, and using offline intelligence from documents to improve Q Business accuracy and helpfulness.

Unleashing the power of generative AI: Verisk’s Discovery Navigator …

This post is co-written with Sneha Godbole and Kate Riordan from Verisk.
Verisk (Nasdaq: VRSK) is a leading strategic data analytics and technology partner to the global insurance industry. It empowers its customers to strengthen operating efficiency, improve underwriting and claims outcomes, combat fraud, and make informed decisions about global risks, including climate change, extreme events, sustainability, and political issues. At the forefront of harnessing cutting-edge technologies in the insurance sector such as generative artificial intelligence (AI), Verisk is committed to enhancing its clients’ operational efficiencies, productivity, and profitability. Verisk’s generative AI-powered solutions and applications are developed with a steadfast commitment to ethical and responsible use of AI, incorporating privacy and security controls, human oversight, and transparent practices consistent with its ethical AI principles and governance practices.
Verisk’s Discovery Navigator product is a leading medical record review platform designed for property and casualty claims professionals, with applications to any industry that manages large volumes of medical records. It streamlines document review for anyone needing to identify medical information within records, including bodily injury claims adjusters and managers, nurse reviewers and physicians, administrative staff, and legal professionals. By replacing hours of manual review for a single claim, insurers can modernize the reviewer’s workflow, saving time and empowering better, faster decision-making, which is critical to improving outcomes.
With AI-powered analysis, the process of reviewing an average file of a few hundred pages is reduced to minutes with Discovery Navigator. By responsibly building proprietary AI models created with Verisk’s extensive clinical, claims, and data science expertise, complex and unstructured documents are automatically organized, reviewed, and summarized. It employs sophisticated AI to extract medical information from records, providing users with structured information that can be easily reviewed and uploaded into their claims management system. This allows reviewers to access necessary information in minutes, compared to the hours spent doing this manually.
Discovery Navigator recently released automated generative AI record summarization capabilities. It was built using Amazon Bedrock, a fully managed service from AWS that provides access to foundation models (FMs) from leading AI companies through an API to build and scale generative AI applications. This new functionality offers an immediate overview of the initial injury and current medical status, empowering record reviewers of all skill levels to quickly assess injury severity with the click of a button. By automating the extraction and organization of key treatment data and medical information into a concise summary, claims handlers can now identify important bodily injury claims data faster than before.
In this post, we describe the development of the automated summary feature in Discovery Navigator incorporating generative AI, the data, the architecture, and the evaluation of the pipeline.
Solution overview
Discovery Navigator is designed to retrieve medical information and generate summaries from medical records. These medical records are mostly unstructured documents, often containing multiple dates of service. Examples of the myriad of documents include provider notes, tables in different formats, body figures to describe the injury, medical charts, health forms, and handwritten notes. The medical record documents are scanned and typically available as a single file.
Following a virus scan, the most immediate step in Discovery Navigator’s AI pipeline is to convert the scanned image pages of medical records into searchable documents. For this optical character recognition (OCR) conversion process, Discovery Navigator uses Amazon Textract.
The following figure illustrates the architecture of the Discovery Navigator AI pipeline.

Discovery Navigator AI Pipeline

The OCR converted medical records are passed through various AI models that extract key medical data. The AI extracted medical information is used to add highlighting in the original medical record document and to generate an indexed report. The highlighted medical record document allows the user to focus on the provided results and target their review towards the pages with highlights, thereby saving time. The report gives a quick summary of the extracted medical information with page links to navigate through the document for review.
The following figure shows the Discovery Navigator generative AI auto-summary pipeline. The OCR converted medical record pages are processed through Verisk’s AI models and select pages are sent to Amazon Bedrock using AWS PrivateLink, for generating visit summaries. The user is given a summary report consisting of AI extracted medical information and generative AI summaries.

Discovery Navigator Inference Pipeline

Discovery Navigator results
Discovery Navigator produces results in two different ways: first, it provides an initial document containing an indexed report of identified medical data points and includes a highlighting feature within the original document to emphasize the results. Additionally, an optional automated high-level summary created through generative AI capabilities is provided.
Discovery Navigator offers multiple different medical models, for example, diagnosis codes. These codes are identified and highlighted in the document. In the sample in the following figure, additional intelligence is provided utilizing a note feature to equip the user with the clinical description directly on the page, avoiding time spent locating this information elsewhere. The Executive Summary report displays an overview of all the medical terms extracted from the medical record, and the Index Report provides page links for quick review.

Indexed reports of extracted medical information

Discovery Navigator’s new generative AI summary feature creates an in-depth summarization report, as shown in the following figure. This report includes a summary of the initial injury following the date of loss, a list of certain medical information extracted from the medical record, and a summary of the future treatment plan based on the most recent visit in the medical record.

Discovery Navigator Executive Summary

Performance
To assess the generative AI summary quality, Verisk designed human evaluation metrics with the help of in-house clinical expertise. Verisk conducted multiple rounds of human evaluation of the generated summaries with respect to the medical records. Feedback from each round of tests was incorporated in the following test.
Verisk’s evaluation involved three major parts:

Prompt engineering – Prompt engineering is the process where you guide generative AI solutions to generate desired output. Verisk framed prompts using their in-house clinical experts’ knowledge on medical claims. With each round of testing, Verisk added instructions to the prompts to capture the pertinent medical information and to reduce possible hallucinations. The generative AI large language model (LLM) can be prompted with questions or asked to summarize a given text. Verisk decided to test three approaches: a question answer prompt, summarize prompt, and question answer prompt followed by summarize prompt.
Splitting of document pages – The medical record generative AI summaries are created for each date of visit in the medical record. Verisk tested two strategies of splitting the pages by visit: split visit pages individually and send them to a text splitter to generate text chunks for generative AI summarization, or concatenate all visit pages and send them to a text splitter to generate text for generative AI summarization. Summaries generated from each strategy were used during evaluation of the generative AI summary.
Quality of summary – For the generative AI summary, Verisk wanted to capture information regarding the reason for visit, assessment, and future treatment plan. For evaluation of summary quality, Verisk created a template of questions for the clinical expert, which allowed them to assess the best performing prompt in terms of inclusion of required medical information and the best document splitting strategy. The evaluation questions also collected feedback on the number of hallucinations and inaccurate or not helpful information. For each summary presented to the clinical expert, they were asked to categorize it as either good, acceptable, or bad.

Based on Verisk’s evaluation template questions and rounds of testing, they concluded that the question answer prompt with concatenated pages generated over 90% good or acceptable summaries with low hallucinations and inaccurate or unnecessary information.
Business impact
By quickly and accurately summarizing key medical data from bodily injury claims, Verisk’s Discovery Navigator, with its new generative AI auto-summary feature powered by Amazon Bedrock, has immense potential to drive operational efficiencies and boost profitability for insurers. The automated extraction and summarization of critical treatment information allows claims handlers to expedite the review process, thereby reducing settlement times. This accelerated claim resolution can help minimize claims leakage and optimize resource allocation, enabling insurers to focus efforts on more complex cases. The Discovery Navigator platform has a proven to be up to 90% faster than manual record review, allowing claims handlers to compile record summaries in a fraction of the time.
Conclusion
The incorporation of generative AI into Discovery Navigator underscores Verisk’s commitment to using cutting-edge technologies to drive operational efficiencies and enhance outcomes for its clients in the insurance industry. By automating the extraction and summarization of key medical data, Discovery Navigator empowers claims professionals to expedite the review process, facilitate quicker settlements, and ultimately provide a superior experience for customers. The collaboration with AWS and the successful integration of FMs from Amazon Bedrock have been pivotal in delivering this functionality. The rigorous evaluation process, guided by Verisk’s clinical expertise, makes sure that the generated summaries meet the highest standards of accuracy, relevance, and reliability.
As Verisk continues to explore the vast potential of generative AI, the Discovery Navigator auto-summary feature serves as a testament to the company’s dedication to responsible and ethical AI adoption. By prioritizing transparency, security, and human oversight, Verisk aims to build trust and drive innovation while upholding its core values. Looking ahead, Verisk remains steadfast in its pursuit of harnessing advanced technologies to unlock new levels of efficiency, insight, and value for its global customer base. With a focus on continuous improvement and a deep understanding of industry needs, Verisk is poised to shape the future of insurance analytics and drive resilience across communities and businesses worldwide.
Resources

Explore generative AI on AWS
Learn about unlocking the business value of generative AI
Learn more about Anthropic Claude 3 on Amazon Bedrock
Learn about Amazon Bedrock and how to build and scale generative AI applications with FMs
Generative AI Quickstart POCs

About the Authors
Sneha Godbole is a AVP of Analytics at Verisk. She has partnered with Verisk leaders on creating Discovery Navigator, an AI powered tool that automatically enables identification and retrieval of key data points within large unstructured documents. Sneha holds two Master of Science degrees (from University of Utah and SUNY Buffalo) and a Data Science Specialization certificate from Johns Hopkins University. Prior to joining Verisk, Sneha has worked as a software developer in France to build android solutions and collaborated on a paper publication with Brigham Young University, Utah.
Kate Riordan is the Director of Automation Initiatives at Verisk. She currently is the product owner for Discovery Navigator, an AI powered tool that automatically enables identification and retrieval of key data points within large unstructured documents and oversees automation and efficiency projects. Kate began her career at Verisk as a Medicare Set Aside compliance attorney. In that role, she completed and obtained CMS approval of hundreds of Medicare Set Asides. She is fluent in Section 111 reporting requirements, the conditional payment recovery process, Medicare Advantage, Part D and Medicaid recovery. Kate is a member of the Massachusetts bar.
Ryan Doty is a Sr. Solutions Architect at AWS, based out of New York. He helps enterprise customers in the Northeast U.S. accelerate their adoption of the AWS Cloud by providing architectural guidelines to design innovative and scalable solutions. Coming from a software development and sales engineering background, the possibilities that the cloud can bring to the world excite him.
Tarik Makota is a Principal Solutions Architect with Amazon Web Services. He provides technical guidance, design advice, and thought leadership to AWS’ customers across the US Northeast. He holds an M.S. in Software Development and Management from Rochester Institute of Technology.
Dom Bavaro is a Senior Solutions Architect for Financial Services. While providing technical guidance to customers across many use cases, He is focused on helping customer build and productionize Generative AI solutions and workflows.

Index your Atlassian Confluence Cloud contents using the Amazon Q Conf …

Amazon Q Business is a generative artificial intelligence (AI)-powered assistant designed to enhance enterprise operations. It’s a fully managed service that helps provide accurate answers to users’ questions while honoring the security and access restrictions of the content. It can be tailored to your specific business needs by connecting to your company’s information and enterprise systems using built-in connectors to a variety of enterprise data sources. Amazon Q Business enables users in various roles, such as marketing managers, project managers, and sales representatives, to have tailored conversations, solve business problems, generate content, take action, and more, through a web interface. This service aims to help make employees work smarter, move faster, and drive significant impact by providing immediate and relevant information to help them with their tasks.
One such enterprise data repository you can use to store content is Atlassian Confluence. Confluence is a team workspace that provides a place to create, and collaborate on various projects, products, or ideas. Team spaces help your teams structure, organize, and share work, so each user has visibility into the institutional knowledge of the enterprise and access to the information they need or answers to the questions they have.
There are two Confluence offerings:

Cloud – This is offered as a software as a service (SaaS) product. It’s always on and continuously updated.
Data Center (self-managed) – Here, you host Confluence on your infrastructure, which may be on premises or the cloud, allowing you to keep data within your chosen environment and manage it yourself.

Your users may need to get answers in Amazon Q Business from the content in Atlassian’s Confluence Cloud instance as a part of their work. For this you will need to configure an Amazon Q Confluence Cloud connector. As a part of this configuration, one of the steps is to configure the authentication of the connector so that it can authenticate with Confluence (Cloud) and then index the relevant content.
This post covers the steps to configure the Confluence Cloud connector for Amazon Q Business.
Types of documents
When you connect Amazon Q to a data source, what Amazon Q considers—and crawls—as a document varies by connector. The Confluence Cloud connector crawls the following as documents:

Spaces – Each space is considered a single document.
Pages – Each page is considered a single document.
Blogs – Each blog is considered a single document.
Comments – Each comment is considered a single document.
Attachments – Each attachment is considered a single document.

Metadata
Every document has structural attributes—or metadata—attached to it. Document attributes can include information such as document title, document author, time created, time updated, and document type.
When you connect Amazon Q Business to a data source, it automatically maps specific data source document attributes to fields within an Amazon Q Business index. If a document attribute in your data source doesn’t have an attribute mapping already available, or if you want to map additional document attributes to index fields, use the custom field mappings to specify how a data source attribute maps to an Amazon Q Business index field. You create field mappings by editing your data source after your application and retriever are created.
To learn more about the supported entities and the associated reserved and custom attributes for the Amazon Q Confluence connector, refer to Amazon Q Business Confluence (Cloud) data source connector field mappings.
Authentication types
An Amazon Q Business application requires you to use AWS IAM Identity Center to manage user access. Although it’s recommended to have an IAM Identity Center instance configured (with users federated and groups added) before you start, you can also choose to create and configure an IAM Identity Center instance for your Amazon Q Business application using the Amazon Q console.
You can also add users to your IAM Identity Center instance from the Amazon Q Business console, if you aren’t federating identity. When you add a new user, make sure that the user is enabled in your IAM Identity Center instance and they have verified their email ID. They need to complete these steps before they can log in to your Amazon Q Business web experience.
Your identity source in IAM Identity Center defines where your users and groups are managed. After you configure your identity source, you can look up users or groups to grant them single sign-on access to AWS accounts, applications, or both.
You can have only one identity source per organization in AWS Organizations. You can choose one of the following as your identity source:

IAM Identity Center directory – When you enable IAM Identity Center for the first time, it’s automatically configured with an IAM Identity Center directory as your default identity source. This is where you create your users and groups, and assign their level of access to your AWS accounts and applications.
Active Directory – Choose this option if you want to continue managing users in either your AWS Managed Microsoft AD directory using AWS Directory Service or your self-managed directory in Active Directory (AD).
External Identity Provider – Choose this option if you want to manage users in other external identity providers (IdPs) through the Security Assertion Markup Language (SAML) 2.0 standard, such as Okta.

Access control lists
Amazon Q Business connectors index access control list (ACL) information that’s attached to a Confluence document along with the document itself. For document ACLs, Amazon Q Business indexes the following:

User email address
Group name for the local group
Group name for the federated group

When you connect a Confluence (Cloud) data source to Amazon Q Business, the connector crawls ACL (user and group) information attached to a document from your Confluence (Cloud) instance. The information is used to determine which content can be used to construct chat responses for a given user, according the end-user’s document access permissions.
You configure user and group access to Confluence spaces using the space permissions page, in Confluence. Similarly for pages and blogs, you use the restrictions page. For more information about space permissions, see Space Permissions Overview on the Confluence Support website. For more information about page and blog restrictions, see Page Restrictions on the Confluence Support website.
An Amazon Q Business connector updates any changes in ACLs each time that your data source content is crawled. To capture ACL changes to make sure that the right end-users have access to the right content, re-sync your data source regularly.
Identity crawling for Amazon Q Business User Store
As stated earlier, Amazon Q Business crawls ACL information at the document level from supported data sources. In addition, Amazon Q Business crawls and stores principal information within each data source (local user alias, local group, and federated group identity configurations) into the Amazon Q Business User Store. This is useful when your application is connected to multiple data sources with different authorization and authentication systems, but you want to create a unified, access-controlled chat experience for your end-users.
Amazon Q Business internally maps the local user and group IDs attached to the document, to the federated identities of users and groups. Mapping identities streamlines user management and speeds up chat responses by reducing ACL information retrieval time during chat requests. Identity crawling, along with the authorization feature, helps filter and generate web experience content restricted by end-user context. For more information about this process, see Understanding Amazon Q Business User Store.
The group and user IDs are mapped as follows:

_group_ids – Group names are present on spaces, pages, and blogs where there are restrictions. They’re mapped from the name of the group in Confluence. Group names are always lowercase.
_user_id – Usernames are present on the space, page, or blog where there are restrictions. They’re mapped depending on the type of Confluence instance that you’re using. For Confluence Cloud, the _user_id is the account ID of the user.

Overview of solution
With Amazon Q Business, you can configure multiple data sources to provide a central place to search across your document repository. For our solution, we demonstrate how to index a Confluence repository using the Amazon Q Business connector for Confluence. In this blog we will:

Configure an Amazon Q Business Application.
Connect Confluence (Cloud) to Amazon Q Business.
Index the data in the Confluence repository.
Run a sample query to test the solution.

Prerequisites
Before you begin using Amazon Q Business for the first time, complete the following tasks:

Set up your AWS account.
Optionally, install the AWS Command Line Interface (AWS CLI).
Optionally, set up the AWS SDKs.
Consider AWS Regions and endpoints.
Set up required permissions.
Enable and configure an IAM Identity Center instance.

For more information, see Setting up for Amazon Q Business.
To set up the Amazon Q Business connector for Confluence, you need to complete additional prerequisites. For more information, see Prerequisites for connecting Amazon Q Business to Confluence (Cloud).
Create an Amazon Q Business application with the Confluence Cloud connector
As the first step towards creating a generative AI assistant, you configure an application. Then you select and create a retriever, and also connect any data sources. After this, you grant end-user access to users to interact with an application using the preferred identity provider, IAM Identity Center. Complete the following steps:

On the Amazon Q Business console, choose Get started.

Figure 1: Initial Amazon Q for Business home page

On the Applications page, choose Create application.

Figure 2: Amazon Q for Business application creation page

Enter a name for your application, select the level of service access, and connect to IAM Identity Center. (Note: The IAM Identity Center instance does not have to be in the same Region as Amazon Q Business.)
Choose Create.

Figure 3: Amazon Q for Business application configuration page

For additional details on configuring the Amazon Q application and connecting to IAM Identity Center, refer to Creating an Amazon Q Business application environment.

Select your retriever and index provisioning options.
Choose Next.

Figure 4: Amazon Q for Business retriever selection page

For additional details on creating and selecting a retriever, refer to Creating and selecting a retriever for an Amazon Q Business application.

Connect to Confluence as your data source.
Enter a name and description.
Select Confluence Cloud as the source and enter your Confluence URL.

Figure 5: Confluence connector page

There are two options for Authentication: Basic authentication and OAuth 2.0 authentication. Select the best option depending on your use case.

Figure 6: Confluence connector authentication options

Before you connect Confluence (Cloud) to Amazon Q Business, you need to create and retrieve the Confluence (Cloud) credentials you will use to connect Confluence (Cloud) to Amazon Q Business. You also need to add any permissions needed by Confluence (Cloud) to connect to Amazon Q Business.
The following procedures give you an overview of how to configure Confluence (Cloud) to connect to Amazon Q Business using either basic authentication or OAuth 2.0 authentication.
Configure Confluence (Cloud) basic authentication for Amazon Q Business
Complete the following steps to configure basic authentication:

Log in to your account from Confluence (Cloud). Note the username you logged in with. You will need this later to connect to Amazon Q Business.
From your Confluence (Cloud) home page, note your Confluence (Cloud) URL from your Confluence browser URL. For example, https://example.atlassian.net. You will need this later to connect to Amazon Q Business.
Navigate to the Security page in Confluence (Cloud).
On the API tokens page, choose Create API token.

Figure 7: Confluence API token creation

In the Create an API token dialog box, for Label, add a name for your API token.
Choose Create.

Figure 8: Confluence API token labelling

From the Your new API token dialog box, copy the API token and save it in your preferred text editor. You can’t retrieve the API token after you close the dialog box.

Figure 9: Copying your Confluence API token

Choose Close.

You now have the username, Confluence (Cloud) URL, and Confluence (Cloud) API token you need to connect to Amazon Q Business with basic authentication.
For more information, see Manage API tokens for your Atlassian account in Atlassian Support.
Configure Confluence (Cloud) OAuth 2.0 authentication for Amazon Q Business
Complete the following steps to configure Confluence (Cloud) OAuth 2.0 authentication:

Retrieve the username and Confluence (Cloud) URL.
Configure an OAuth 2.0 app integration.
Retrieve the Confluence (Cloud) client ID and client secret.
Generate a Confluence (Cloud) access token.
Generate a Confluence (Cloud) refresh token.
Generate a new Confluence (Cloud) access token using a refresh token.

Retrieve the username and Confluence (Cloud) URL
Complete the following steps:

Log in to your account from Confluence (Cloud). Note the username you logged in with. You will need this later to connect to Amazon Q Business.
From your Confluence (Cloud) home page, note your Confluence (Cloud) URL from your Confluence browser URL. For example, https://example.atlassian.net. You will need this later to both configure your OAuth 2.0 token and connect to Amazon Q Business.

Configuring an OAuth 2.0 app integration
Complete the following steps:

Log in to your account from the Atlassian Developer page.
Choose the profile icon in the top-right corner and on the dropdown menu, choose Developer console.

Figure 10: Logging into the Confluence Developer Console

On the welcome page, choose Create and choose OAuth 2.0 integration.

Figure 11: Creating your Confluence OAuth 2.0 token

Under Create a new OAuth 2.0 (3LO) integration, for Name, enter a name for the OAuth 2.0 application you’re creating. Then, read the Developer Terms, and select I agree to be bound by Atlassian’s developer terms checkbox, if you do.
Select Create.

Figure 12: Creating your Confluence OAuth 2.0 integration
The console will display a summary page outlining the details of the OAuth 2.0 app you created.

Figure 13: Your Confluence application

Still in the Confluence console, in the navigation pane, choose Authorization.
Choose Add to add OAuth 2.0 (3LO) to your app.

Figure 14: Adding OAuth 2.0 to your Confluence app

Under OAuth 2.0 authorization code grants (3LO) for apps, for Callback URL, enter the Confluence (Cloud) URL you copied, then choose Save changes.

Figure 15: Adding OAuth 2.0 to your Confluence app (part 2)

Under Authorization URL generator, choose Add APIs to add APIs to your app. This will redirect you to the Permissions page.
On the Permissions page, for Scopes, navigate to User Identity API. Select Add, then select Configure.

Figure 16: Configuring Permissions for your Confluence app

Under User Identity API, choose Edit Scopes, then add the following read scopes:

read:me – View active user profile.
read:account – View user profiles.

Figure 17: Configuring Scopes for your Confluence app

Choose Save and return to the Permissions page.
On the Permissions page, for Scopes, navigate to Confluence API. Select Add, and then select Configure.

Figure 18: Configuring Permissions for your Confluence app (part 2)

Under Confluence API, make sure you’re on the Classic scopes tab.

Figure 19: Configuring Permissions for your Confluence app (part 3)

Choose Edit Scopes and add the following read scopes:

read:confluence-space.summary – Read Confluence space summary.
read:confluence-props – Read Confluence content properties.
read:confluence-content.all – Read Confluence detailed content.
read:confluence-content.summary – Read Confluence content summary.
read:confluence-content.permission – Read content permission in Confluence.
read:confluence-user – Read user.
read:confluence-groups – Read user groups.

Choose Save.
Navigate to the Granular scopes

Figure 20: Configuring Permissions for your Confluence app (part 4)

Choose Edit Scopes and add the following read scopes:

read:content:confluence – View detailed contents.
read:content-details:confluence – View content details.
read:space-details:confluence – View space details.
read:audit-log:confluence – View audit records.
read:page:confluence – View pages.
read:attachment:confluence – View and download content attachments.
read:blogpost:confluence – View blog posts.
read:custom-content:confluence – View custom content.
read:comment:confluence – View comments.
read:template:confluence – View content templates.
read:label:confluence – View labels.
read:watcher:confluence – View content watchers.
read:group:confluence – View groups.
read:relation:confluence – View entity relationships.
read:user:confluence – View user details.
read:configuration:confluence – View Confluence settings.
read:space:confluence – View space details.
read:space.permission:confluence – View space permissions.
read:space.property:confluence – View space properties.
read:user.property:confluence – View user properties.
read:space.setting:confluence – View space settings.
read:analytics.content:confluence – View analytics for content.
read:content.permission:confluence – Check content permissions.
read:content.property:confluence – View content properties.
read:content.restriction:confluence – View content restrictions.
read:content.metadata:confluence – View content summaries.
read:inlinetask:confluence – View tasks.
read:task:confluence – View tasks.
read:permission:confluence – View content restrictions and space permissions.
read:whiteboard:confluence – View whiteboards.
read:app-data:confluence – Read app data.

For more information, see Implementing OAuth 2.0 (3LO) and Determining the scopes required for an operation in Atlassian Developer.
Retrieve the Confluence (Cloud) client ID and client secret
Complete the following steps:

In the navigation pane, choose Settings.
In the Authentication details section, copy and save the following in your preferred text editor:

Client ID – You enter this as the app key on the Amazon Q Business console.
Secret – You enter this as the app secret on the Amazon Q Business console.

Figure 21: Retrieving Confluence app authentication details

You need these to generate your Confluence (Cloud) OAuth 2.0 token and also to connect Amazon Q Business to Confluence (Cloud).
For more information, see Implementing OAuth 2.0 (3LO) and Determining the scopes required for an operation in the Atlassian Developer documentation.
Generate a Confluence (Cloud) access token
Complete the following steps:

Log in to your Confluence account from the Atlassian Developer page.
Open the OAuth 2.0 app you want to generate a refresh token for.
In the navigation pane, choose Authorization.
For OAuth 2.0 (3LO), choose Configure.
On the Authorization page, under Authorization URL generator, copy the URL for Granular Confluence API authorization URL and save it in your preferred text editor.

Figure 22: Retrieving Confluence API URL details

The URL is in the following format:

https://auth.atlassian.com/authorize?

audience=api.atlassian.com

&client_id=YOUR_CLIENT_ID

&scope=REQUESTED_SCOPE%20REQUESTED_SCOPE_TWO

&redirect_uri=https://YOUR_APP_CALLBACK_URL

&state=YOUR_USER_BOUND_VALUE

&response_type=code

&prompt=consent

In the saved authorization URL, update the state=${YOUR_USER_BOUND_VALUE} parameter value to any text of your choice. For example, state=sample_text.

For more information, see What is the state parameter used for? in the Atlassian Support documentation.

Open your preferred web browser and enter the authorization URL you copied into the browser URL.
On the page that opens, make sure everything is correct and choose Accept.

Figure 23: Testing a Confluence API URL

You will be returned to your Confluence (Cloud) home page.

Copy the URL of the Confluence (Cloud) home page and save it in your preferred text editor.

The URL contains the authorization code for your application. You will need this code to generate your Confluence (Cloud) access token. The whole section after code= is the authorization code.

Navigate to Postman.

If you don’t have Postman installed on your local system, you can also choose to use cURL to generate a Confluence (Cloud) access token. Use the following cURL command to do so:

curl –location ‘https://auth.atlassian.com/oauth/token’
–header ‘Content-Type: application/json’
–data ‘{“grant_type”: “authorization_code”,
“client_id”: “YOUR_CLIENT_ID”,
“client_secret”: “YOUR_CLIENT_SECRET”,
“code”: “AUTHORIZATION_CODE”,
“redirect_uri”: “YOUR_CALLBACK_URL”}’

If, however, you have Postman installed, on the main Postman window, choose POST as the method, then enter the following URL: https://auth.atlassian.com/oauth/token.
Choose Body, then choose raw and JSON.

Figure 24: Testing a Confluence access token in Postman

In the text box, enter the following code extract, replacing the fields with your credential values:

{“grant_type”: “authorization_code”,
“client_id”: “YOUR_CLIENT_ID”,
“client_secret”: “YOUR_CLIENT_SECRET”,
“code”: “YOUR_AUTHORIZATION_CODE”,
“redirect_uri”: “https://YOUR_APP_CALLBACK_URL”}

Choose Send.

If everything is configured correctly, Postman will return an access token.

Copy the access token and save it in your preferred text editor. You will need it to connect Confluence (Cloud) to Amazon Q Business.

For more information, see Implementing OAuth 2.0 (3LO) in the Atlassian Developer documentation.
Generate a Confluence (Cloud) refresh token
The access token you use to connect Confluence (Cloud) to Amazon Q Business using OAuth 2.0 authentication expires after 1 hour. When it expires, you can either repeat the whole authorization process and generate a new access token, or generate a refresh token.
Refresh tokens are implemented using a rotating refresh token mechanism. Each time they’re used, rotating refresh tokens issues a new limited-life refresh token that is valid for 90 days. Each new rotating refresh token resets the inactivity expiry time and allocates another 90 days. This mechanism improves on single persistent refresh tokens by reducing the period in which a refresh token can be compromised and used to obtain a valid access token. For additional details, see OAuth 2.0 (3LO) apps in the Atlassian Developer documentation.
To generate a refresh token, you add a %20offline_access parameter to the end of the scope value in the authorization URL you used to generate your access token. Complete the following steps to generate a refresh token:

Log in to your account from the Atlassian Developer page.
Open the OAuth 2.0 app you want to generate a refresh token for.
In the navigation pane, choose Authorization.
For OAuth 2.0 (3LO), choose Configure.
On the Authorization page, under Authorization URL generator, copy the URL for Granular Confluence API authorization URL and save it in your preferred text editor.

Figure 25: Retrieving Confluence API URL details

In the saved authorization URL, update the state=${YOUR_USER_BOUND_VALUE} parameter value to any text of your choice. For example, state=sample_text.

For more information, see What is the state parameter used for? in the Atlassian Support documentation.

Add the following text at the end of the scope value in your authorization URL: %20offline_access and copy it. For example:

https://auth.atlassian.com/authorize?

audience=api.atlassian.com

&client_id=YOUR_CLIENT_ID

&scope=REQUESTED_SCOPE%20REQUESTED_SCOPE_TWO%20offline_access

&redirect_uri=https://YOUR_APP_CALLBACK_URL

&state=YOUR_USER_BOUND_VALUE

&response_type=code

&prompt=consent

Open your preferred web browser and enter the modified authorization URL you copied into the browser URL.
On the page that opens, make sure everything is correct and then choose Accept.

Figure 26: Testing a Confluence API URL

You will be returned to the Confluence (Cloud) console.

Copy the URL of the Confluence (Cloud) home page and save it in a text editor of your choice.

The URL contains the authorization code for your application. You will need this code to generate your Confluence (Cloud) refresh token. The whole section after code= is the authorization code.

Navigate to Postman.

If you don’t have Postman installed on your local system, you can also choose to use cURL to generate a Confluence (Cloud) access token. Use the following cURL command to do so:

curl –location ‘https://auth.atlassian.com/oauth/token’
–header ‘Content-Type: application/json’
–data ‘{“grant_type”: “authorization_code”,
“client_id”: “YOUR CLIENT ID”,
“client_secret”: “YOUR CLIENT SECRET”,
“code”: “AUTHORIZATION CODE”,
“redirect_uri”: “YOUR CALLBACK URL”}’

If, however, you have Postman installed, on the main Postman window, choose POST as the method, then enter the following URL: https://auth.atlassian.com/oauth/token.
Choose Body on the menu, then choose raw and JSON.

Figure 27: Retrieving a Confluence refresh token in Postman

In the text box, enter the following code extract, replacing the fields with your credential values:

{“grant_type”: “authorization_code”,
“client_id”: “YOUR_CLIENT_ID”,
“client_secret”: “YOUR_CLIENT_SECRET”,
“code”: “YOUR_AUTHORIZATION_CODE”,
“redirect_uri”: “https://YOUR_APP_CALLBACK_URL”}

Choose Send.

If everything is configured correctly, Postman will return a refresh token.

Copy the refresh token and save it using your preferred text editor. You will need it to connect Confluence (Cloud) to Amazon Q Business.

For more information, see Implementing a Refresh Token Flow in the Atlassian Developer documentation.
Generate a new Confluence (Cloud) access token using a refresh token
You can use the refresh token you generated to create a new access token and refresh token pair when an existing access token expires. Complete the following steps to generate a refresh token:

Copy the refresh token you generated following the steps in the previous section.
Navigate to Postman.

If you don’t have Postman installed on your local system, you can also choose to use cURL to generate a Confluence (Cloud) access token. Use the following cURL command to do so:

curl –location ‘https://auth.atlassian.com/oauth/token’
–header ‘Content-Type: application/json’
–data ‘{“grant_type”: “refresh_token”,
“client_id”: “YOUR_CLIENT_ID”,
“client_secret”: “YOUR_CLIENT_SECRET”,
“refresh_token”: “YOUR_REFRESH_TOKEN”}’

In the Postman main window, choose POST as the method, then enter the following URL: https://auth.atlassian.com/oauth/token.
Choose Body from the menu and choose raw and JSON.

Figure 28: Using a Confluence refresh token in Postman

In the text box, enter the following code extract, replacing the fields with your credential values:

{“grant_type”: “refresh_token”,
“client_id”: “YOUR_CLIENT_ID”,
“client_secret”: “YOUR_CLIENT_SECRET”,
“refresh_token”: “YOUR REFRESH TOKEN”}

Choose Send.

If everything is configured correctly, Postman will return a new access token and refresh token pair in the following format:

{“access_token”: “string,
“expires_in”: “expiry time of access_token in seconds”,
“scope”: “string”,
“refresh_token”: “string”}

For more information, see Implementing a Refresh Token Flow and How do I get a new access token, if my access token expires or is revoked? in the Atlassian Developer documentation.
Continue creating your application
Complete the following steps to continue creating your application:

For AWS Secrets Manager secret, choose an existing secret or create an AWS Secrets Manager secret to store your Confluence authentication credentials. If you choose to create a secret, an AWS Secrets Manager window opens. Enter the following information in the window:

For Secret name, enter a name for your secret.
Enter the information you generated earlier:

If using Basic Authentication, enter your Secret name, User name, and Password (Confluence API Token) that you generated and downloaded from your Confluence account.
If using OAuth2.0 Authentication, enter the Secret name, App key, App secret, Access token, and Refresh token that you created in your Confluence account.

Choose Save and add secret.For additional details on creating a Secrets Manager secret, refer to Create an AWS Secrets Manager secret.

Choose the secret you created to use for your Confluence connector.

Figure 29: Selecting a secret in Secrets Manager

Under Configure VPC and security group, you can choose whether you want to use a VPC (Optional). If you do (which we recommend), enter the following information:

For Subnets, enter up to 6 repository subnets that define the subnets and IP ranges the repository instance uses in the selected VPC.
For VPC security groups, Choose up to 10 security groups that allow access to your data source.For more information, see Virtual private cloud.

Figure 30: Configuring VPC and Security Group in Amazon Q Business

Under Identity crawler, confirm that crawling is enabled.Amazon Q Business crawls identity information from your data source by default to make sure the responses from your connected data sources are generated only from documents end-users have access to. For more information, see Identity crawler.By default, an Amazon Q Business application is configured to respond to end user chat queries using only enterprise data. If you would like Amazon Q Business to use the underlying LLM knowledge to generate responses when it can’t find the information from your connected data sources, you can enable this in the Response settings under your application guardrails.
Under IAM role, choose an existing AWS Identity and Access Management (IAM) role or create an IAM role to access your repository credentials and index content.Creating a new service role is recommended. For more information, see IAM role for Amazon Q Confluence (Cloud) connector.

Figure 31: Configuring IAM role in Amazon Q Business

Under Sync scope, choose from the following options:

For Sync contents, you can choose to sync from the following entity types: pages, page comments, page attachments, blogs, blog comments, blog attachments, personal spaces, archived spaces, and archived pages.
For Maximum single file size, specify the file size limit in megabytes that Amazon Q Business will crawl. Amazon Q Business will crawl only the files within the size limit you define. The file size should be greater than 0 MB and less than or equal to 50 MB.

Under Additional configuration, for Space and regex patterns, specify whether to include or exclude specific spaces in your index with the following settings:

Space key – For example, my-space-123.
URL – For example, .*/MySite/MyDocuments/.
File type – For example, .*.pdf, .*.txt.
For Entity title regex patterns, specify regular expression patterns to include or exclude certain blogs, pages, comments, and attachments by titles.

Figure 32: Configuring scopes and regexes in Amazon Q Business

Under Sync mode, choose how you want to update your index when your data source content changes. When you sync your data source with Amazon Q Business for the first time, all content is synced by default. You have the following options:

Full sync – Sync all content regardless of the previous sync status.
New, modified, or deleted content sync – Sync only new, modified, and deleted documents.

Under Sync run schedule, for Frequency, choose how often Amazon Q Business will sync with your data source. For more details, see Sync run schedule.
Under Tags, you can optionally add tags to search and filter your resources or track your AWS costs. See Tagging resources for more details.

Figure 33: Configuring sync mode, sync frequency, and tagging

Under Field mappings, select the data source document attributes to map to your index fields. Add the fields from the Data source details page after you finish adding your data source. You can choose from two types of fields:

Default – Automatically created by Amazon Q Business on your behalf based on common fields in your data source. You can’t edit these.
Custom – Automatically created by Amazon Q Business on your behalf based on common fields in your data source. You can edit these. You can also create and add new custom fields.For more information, see Field mappings.

To finish connecting your data source to Amazon Q, choose Add data source.

Figure 34: Mapping Confluence fields in Amazon Q Business

After the Confluence connector is created, you’re redirected to the Connect data sources page, where you can add additional data sources if needed.
Choose Next to continue.
Under Add or assign users and groups, you can to assign users or groups from IAM Identity Center. If you have the appropriate permissions, you have the ability to add new users. Select the appropriate option for you.
Choose Next.

Figure 35: Assigning users/ groups and Web experience service access in Amazon Q Business

Under Assign users and groups, you can choose the users or groups you want to add to your Amazon Q Business application. (In order for a user to get an answer from Amazon Q Business, the user IDs added in IAM Identity Center need to match the user IDs in Confluence.)
In Web experience service access, enter the following information:

For Choose a method to authorize Amazon Q Business – A service access role assumed by end users when they sign in to your web experience that grants them permission to start and manage conversations in Amazon Q Business. You can choose to use an existing role or create a new role.
Service role name – A name for the service role you created for easy identification on the console.

Select Create application.
Once the application is created, navigate to the Data source details section, choose Sync now to allow Amazon Q Business to begin syncing (crawling and ingesting) data from your data source.

When the sync job is complete, your data source is ready to use.

The time the sync will take depends on the size of your Confluence environment. Check back periodically to see if the sync has finished.
Run a sample query to test the solution
When the sync on your data source is complete, you can deploy the web experience to test the solution. For additional details for setting up the Amazon Q Business web experience, see Customizing an Amazon Q Business web experience.

Figure 37: Amazon Q Business web experience URLs

After you’re signed in to the web experience, try out a question based on information in your Confluence Cloud. The following screenshots show some examples.

Figure 38: Sample Amazon Q Business web experience prompt and completion

Figure 39: Sample Amazon Q Business web experience prompt and completion (part 2)

Figure 40: Sample Amazon Q Business web experience prompt and completion (part 3)

Amazon Q Business generates a response, as well as the citations to where the information came from. You can click the links in the citation to go directly to the source page.
Troubleshooting and FAQs
For information on troubleshooting your connector, see Troubleshooting your Amazon Q Business Confluence (Cloud) connector.
Refer to Amazon Q Business FAQs for frequently asked questions.
Clean up
If you no longer need your Amazon Q Business application, make sure to delete it to avoid unwanted costs. When you delete your application, it will remove the associated index and data connectors.

Figure 41: Deleting Amazon Q Business confluence connector

Conclusion
In this post, we provided an overview of Amazon Q Business Confluence Cloud connector and how you can use it for seamless integration of generative AI assistance to your Confluence Cloud. By using a single interface for the variety of data sources in the organization, you can enable employees to be more data-driven, efficient, prepared, and productive.
To learn more about Amazon Q Business connector for Confluence Cloud, refer to Connecting Confluence (Cloud) to Amazon Q Business.

About the Authors
Tyler Geary is a Solutions Architect at Amazon Web Services (AWS), where he is a member of the Enterprise Financial Services team, focusing on Insurance customers. He helps his customers identify business challenges and opportunities, tying them back to innovative solutions powered by AWS, with a particular focus on Generative AI. In his free time, Tyler enjoys hiking, camping, and spending time in the great outdoors.
Sumeet Tripathi is an Enterprise Support Lead (TAM) at AWS in North Carolina. He has over 17 years of experience in technology across various roles. He is passionate about helping customers to reduce operational challenges and friction. His focus area is AI/ML and Energy & Utilities Segment. Outside work, He enjoys traveling with family, watching cricket and movies.
Vishal Naik is a Sr. Solutions Architect at Amazon Web Services (AWS). He is a builder who enjoys helping customers accomplish their business needs and solve complex challenges with AWS solutions and best practices. His core area of focus includes Generative AI and Machine Learning. In his spare time, Vishal loves making short films on time travel and alternate universe themes.

Advancing Agricultural Sustainability: Integrating Remote Sensing, AI, …

Enhancing Agricultural Resilience through Remote Sensing and AI:

Modern agriculture faces significant challenges from climate change, limited water resources, rising production costs, and disruptions like the COVID-19 pandemic. These issues jeopardize the sustainability of food production systems, necessitating innovative solutions to meet the demands of a growing global population. Recent advancements in remote sensing and AI have opened new avenues for improving crop monitoring and management. By integrating these technologies, we can gather and analyze large-scale phenotypic data with unprecedented accuracy, facilitating the development of predictive and prescriptive management tools.

The study shows how remote sensing technologies, such as Unmanned Aircraft Systems (UAS), can be combined with AI to enhance agricultural systems’ resilience and efficiency. UAS provides detailed field-scale data, which can improve farm decision-making and system management when integrated with AI-driven analytics. Despite these advancements, challenges remain in adopting UAS technologies and incorporating them with spaceborne remote sensing data for broader applications. The study explores these challenges and prospects for leveraging these technologies to address national and global agricultural and nutritional difficulties.

Unmanned Aerial Systems (UAS) Revolutionizing Digital Agriculture:

UAS offers a cost-effective, efficient alternative to traditional field sensors, which can be costly and disruptive to agricultural operations. UAS equipped with sensors provide high-resolution, temporal, and spatial data on plant growth, surpassing conventional methods that often miss in-field variations. Recent research highlights the growing interest in UAS for agriculture, with applications ranging from assessing plant phenotypes and water stress to monitoring diseases and estimating yields. While UAS adoption faces data processing and interpretation challenges, their integration with AI promises to enhance agricultural efficiency and productivity.

Integrating Genomics and Phenomics with UAS:

Advanced genomics has enhanced crop breeding by identifying genetic markers linked to complex traits. However, bridging the gap between genomic data and phenotypic expression requires accurate field-level measurements. UAS addresses this need by providing high-resolution, consistent phenotypic data. Studies have shown that UAS can effectively measure traits like plant height and canopy cover, complementing genomic tools and improving breeding accuracy. By integrating UAS with genomic analysis, researchers can uncover detailed genotype-phenotype relationships, enhance marker-assisted breeding, and accelerate the development of superior crop varieties.

Combining Remote Sensing, Simulation Models, and AI in Digital Agriculture:

UAS offers efficient crop phenotyping, but their spatial coverage is limited by battery life and data processing costs. While satellite data is used for broader applications, it lacks the resolution needed for precision agriculture. Integrating high-resolution UAS data with satellite information can enhance large-scale predictions. Advances in ML and crop simulation models provide opportunities to improve accuracy and scalability. By combining these technologies, we can develop robust models for yield estimation and in-season management, bridging gaps between field-scale data and large-scale agricultural applications.

Future Directions in Sustainable Crop Management:

To advance sustainable crop management, we must enhance resource efficiency in agriculture. Technological progress offers promising solutions for the coming decades, with UAS proving effective in quantifying field-scale phenotypic data and accelerating breeding cycles when integrated with Genomic Wide Association Studies (GWAS). Although still developing, combining UAS with spaceborne remote sensing, AI, and crop simulation models paves the way for large-scale digital agriculture applications. Future efforts should focus on building multidisciplinary teams and establishing standardized data collection and analysis protocols. Ensuring high-quality raw data remains crucial for the success of these advancements.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. If you like our work, you will love our newsletter..

Don’t Forget to join our 48k+ ML SubReddit

Find Upcoming AI Webinars here
The post Advancing Agricultural Sustainability: Integrating Remote Sensing, AI, and Genomics for Enhanced Resilience appeared first on MarkTechPost.

Microsoft AI Releases Phi 3.5 mini, MoE and Vision with 128K context, …

Microsoft has recently expanded its artificial intelligence capabilities by introducing three sophisticated models: Phi 3.5 Mini Instruct, Phi 3.5 MoE (Mixture of Experts), and Phi 3.5 Vision Instruct. These models represent significant advancements in natural language processing, multimodal AI, and high-performance computing, each designed to address specific challenges and optimize various AI-driven tasks. Let’s examine these models in depth, highlighting their architecture, training methodologies, and potential applications.

Phi 3.5 Mini Instruct: Balancing Power and Efficiency

Model Overview and Architecture 

Phi 3.5 Mini Instruct is a dense decoder-only Transformer model with 3.8 billion parameters, making it one of the most compact models in Microsoft’s Phi 3.5 series. Despite its relatively small parameter count, this model supports an impressive 128K context length, enabling it to handle tasks involving long documents, extended conversations, and complex reasoning scenarios. The model is built upon the advancements made in the Phi 3 series, incorporating state-of-the-art techniques in model training and optimization.

Training Data and Process  

Phi 3.5 Mini Instruct was trained on a diverse dataset totaling 3.4 trillion tokens. The dataset includes publicly available documents rigorously filtered for quality, synthetic textbook-like data designed to enhance reasoning and problem-solving capabilities, and high-quality chat format supervised data. The model underwent a series of optimizations, including supervised fine-tuning and direct preference optimization, to ensure high adherence to instructions and robust performance across various tasks.

Technical Features and Capabilities

The model’s architecture allows it to excel in environments with constrained computational resources while delivering high-performance levels. Its 128K context length is particularly notable, surpassing the typical context lengths supported by most other models. This enables Phi 3.5 Mini Instruct to manage and process extensive sequences of tokens without losing coherence or accuracy.

In benchmarks, Phi 3.5 Mini Instruct demonstrated strong performance in reasoning tasks, particularly those involving code generation, mathematical problem-solving, and logical inference. The model’s ability to handle complex, multi-turn conversations in various languages makes it an invaluable tool for applications ranging from automated customer support to advanced research in natural language processing.

Image Source

Phi 3.5 MoE: Unlocking the Potential of Mixture of Experts

Model Overview and Architecture  

The Phi 3.5 MoE model represents a significant leap in AI architecture with its Mixture of Expert design. The model is built with 42 billion parameters, divided into 16 experts, and has 6.6 billion active parameters during inference. This architecture allows the model to dynamically select and activate different subsets of experts depending on the input data, optimizing computational efficiency and performance.

Training Methodology  

The training of Phi 3.5 MoE involved 4.9 trillion tokens, with the model being fine-tuned to optimize its reasoning capabilities, particularly in tasks that require logical inference, mathematical calculations, and code generation. The mixture-of-experts approach significantly reduces the computational load during inference by selectively engaging only the necessary experts, making it possible to scale the model’s capabilities without a proportional increase in resource consumption.

Key Technical Features

One of the most critical aspects of Phi 3.5 MoE is its ability to handle long context tasks, with support for up to 128K tokens in a single context. This makes it suitable for document summarization, legal analysis, and extensive dialogue systems. The model’s architecture also allows it to outperform larger models in reasoning tasks while maintaining competitive performance across various NLP benchmarks.

Phi 3.5 MoE is particularly adept at handling multilingual tasks, with extensive fine-tuning across multiple languages to ensure accuracy and relevance in diverse linguistic contexts. The model’s ability to manage long context lengths and its robust reasoning capabilities make it a powerful tool for commercial and research applications.

Image Source

Phi 3.5 Vision Instruct: Pioneering Multimodal AI

Model Overview and Architecture

The Phi 3.5 Vision Instruct model is a multimodal AI that handles tasks requiring textual and visual inputs. With 4.15 billion parameters and a context length of 128K tokens, this model excels in scenarios where a deep understanding of images and text is necessary. The model’s architecture integrates an image encoder, a connector, a projector, and a Phi-3 Mini language model, creating a seamless pipeline for processing and generating content based on visual and textual data.

Training Data and Process

The training dataset for Phi 3.5 Vision Instruct includes a mix of synthetic data, high-quality educational content, and carefully filtered publicly available images and text. The model has been fine-tuned to optimize its performance in optical character recognition (OCR) tasks, image comparison, and video summarization. This training has enabled the model to develop a strong reasoning and contextual understanding capability in multimodal contexts.

Technical Capabilities and Applications

Phi 3.5 Vision Instruct is designed to push the boundaries of what is possible in multimodal AI. The model can handle complex tasks such as multi-image comparison, chart and table understanding, and video clip summarization. It also shows significant improvements over previous benchmarks, with enhanced performance in tasks requiring detailed visual analysis and reasoning.

The model’s ability to integrate and process large amounts of visual and textual data makes it ideal for applications in fields such as medical imaging, autonomous vehicles, and advanced human-computer interaction systems. For instance, in medical imaging, Phi 3.5 Vision Instruct can assist in diagnosing conditions by comparing multiple images and providing a detailed summary of findings. In autonomous vehicles, the model could enhance the understanding of visual data captured by cameras, improving decision-making processes in real-time.

Image Source

Conclusion: A Comprehensive Suite for Advanced AI Applications

The Phi 3.5 series—Mini Instruct, MoE, and Vision Instruct—marks a significant milestone in Microsoft’s AI development efforts. Each model is tailored to address specific needs within the AI ecosystem, from the efficient processing of extensive textual data to the sophisticated analysis of multimodal inputs. These models showcase Microsoft’s commitment to advancing AI technology and provide powerful tools that can be leveraged across various industries.

Phi 3.5 Mini Instruct stands out for its balance of power and efficiency, making it suitable for tasks where computational resources are limited but performance demands remain high. Phi 3.5 MoE, with its innovative Mixture of Experts architecture, offers unparalleled reasoning capabilities while optimizing resource usage. Finally, Phi 3.5 Vision Instruct sets a new standard in multimodal AI, enabling advanced visual and textual data integration for complex tasks.

Check out the microsoft/Phi-3.5-vision-instruct, microsoft/Phi-3.5-mini-instruct, and microsoft/Phi-3.5-MoE-instruct. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. If you like our work, you will love our newsletter..

Don’t Forget to join our 48k+ ML SubReddit

Find Upcoming AI Webinars here
The post Microsoft AI Releases Phi 3.5 mini, MoE and Vision with 128K context, Multilingual and MIT License appeared first on MarkTechPost.

EXPLAIN, AGREE, LEARN (EXAL) Method: A Transforming Approach to Scalin …

Neuro-symbolic artificial intelligence (NeSy AI) is a rapidly evolving field that seeks to combine the perceptive abilities of neural networks with the logical reasoning strengths of symbolic systems. This hybrid approach is designed to address complex tasks that require both pattern recognition and deductive reasoning. NeSy systems aim to create more robust and generalizable AI models by integrating neural and symbolic components. Despite limited data, these models are better equipped to handle uncertainty, make informed decisions, and perform effectively. The field represents a significant step forward in AI, aiming to overcome the limitations of purely neural or purely symbolic approaches.

One of the major challenges facing the development of NeSy AI is the complexity involved in learning from data when combining neural and symbolic components. Specifically, integrating learning signals from the neural network with the symbolic logic component is a difficult task. Traditional learning methods in NeSy systems often rely on exact probabilistic logic inference, which is computationally expensive and needs to scale better to more complex or larger systems. This limitation has hindered the widespread application of NeSy systems, as the computational demands make them impractical for many real-world problems where scalability and efficiency are critical.

Several existing methods attempt to address this learning challenge in NeSy systems, each with limitations. For example, knowledge compilation techniques provide exact propagation of learning signals but need better scalability, making them impractical for larger systems. Approximation methods, such as k-best solutions or the A-NeSI framework, offer alternative approaches by simplifying the inference process. However, these methods often introduce biases or require extensive optimization and hyperparameter tuning, resulting in long training times and reduced applicability to complex tasks. Moreover, these approaches generally need stronger guarantees of the accuracy of their approximations, raising concerns about their outcomes’ reliability.

Researchers from KU Leuven have developed a novel method known as EXPLAIN, AGREE, LEARN (EXAL). This method is specifically designed to enhance the scalability and efficiency of learning in NeSy systems. The EXAL framework introduces a sampling-based objective that allows for more efficient learning while providing strong theoretical guarantees on the approximation error. These guarantees are crucial for ensuring that the system’s predictions remain reliable even as the complexity of the tasks increases. By optimizing a surrogate objective that approximates data likelihood, EXAL addresses the scalability issues that plague other methods.

The EXAL method involves three key steps:

EXPLAIN

AGREE

LEARN

In the first step, the EXPLAIN algorithm generates samples of possible explanations for the observed data. These explanations represent different logical assignments that could satisfy the symbolic component’s requirements. For instance, in a self-driving car scenario, EXPLAIN might generate multiple explanations for why the car should brake, such as detecting a pedestrian or a red light. The second step, AGREE, involves reweighting these explanations based on their likelihood according to the neural network’s predictions. This step ensures that the most plausible explanations are given more importance, which enhances the learning process. Finally, in the LEARN step, these weighted explanations are used to update the neural network’s parameters through a traditional gradient descent approach. This process allows the network to learn more effectively from the data without needing exact probabilistic inference.

The performance of the EXAL method has been validated through extensive experiments on two prominent NeSy tasks: 

MNIST addition 

Warcraft pathfinding

In the MNIST addition task, which involves summing sequences of digits represented by images, EXAL achieved a test accuracy of 96.40% for sequences of two digits and 93.81% for sequences of four digits. Notably, EXAL outperformed the A-NeSI method, which achieved 95.96% accuracy for two digits and 91.65% for four digits. EXAL demonstrated superior scalability, maintaining a competitive accuracy of 92.56% for sequences of 15 digits, while A-NeSI struggled with a significantly lower accuracy of 73.27%. In the Warcraft pathfinding task, which requires finding the shortest path on a grid, EXAL achieved an impressive accuracy of 98.96% on a 12×12 grid and 80.85% on a 30×30 grid, significantly outperforming other NeSy methods in terms of both accuracy and learning time.

In conclusion, the EXAL method addresses the scalability and efficiency challenges that have limited the application of NeSy systems. By leveraging a sampling-based approach with strong theoretical guarantees, EXAL improves the accuracy and reliability of NeSy models and significantly reduces the time required for learning. EXAL is a promising solution for many complex AI tasks, particularly large-scale data and symbolic reasoning. The success of EXAL in tasks like MNIST addition and Warcraft pathfinding underscores its potential to become a standard approach in developing next-generation AI systems.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. If you like our work, you will love our newsletter..

Don’t Forget to join our 48k+ ML SubReddit

Find Upcoming AI Webinars here
The post EXPLAIN, AGREE, LEARN (EXAL) Method: A Transforming Approach to Scaling Learning in Neuro-Symbolic AI with Enhanced Accuracy and Efficiency for Complex Tasks appeared first on MarkTechPost.

Enhance call center efficiency using batch inference for transcript su …

Today, we are excited to announce general availability of batch inference for Amazon Bedrock. This new feature enables organizations to process large volumes of data when interacting with foundation models (FMs), addressing a critical need in various industries, including call center operations.
Call center transcript summarization has become an essential task for businesses seeking to extract valuable insights from customer interactions. As the volume of call data grows, traditional analysis methods struggle to keep pace, creating a demand for a scalable solution.
Batch inference presents itself as a compelling approach to tackle this challenge. By processing substantial volumes of text transcripts in batches, frequently using parallel processing techniques, this method offers benefits compared to real-time or on-demand processing approaches. It is particularly well suited for large-scale call center operations where instantaneous results are not always a requirement.
In the following sections, we provide a detailed, step-by-step guide on implementing these new capabilities, covering everything from data preparation to job submission and output analysis. We also explore best practices for optimizing your batch inference workflows on Amazon Bedrock, helping you maximize the value of your data across different use cases and industries.
Solution overview
The batch inference feature in Amazon Bedrock provides a scalable solution for processing large volumes of data across various domains. This fully managed feature allows organizations to submit batch jobs through a CreateModelInvocationJob API or on the Amazon Bedrock console, simplifying large-scale data processing tasks.
In this post, we demonstrate the capabilities of batch inference using call center transcript summarization as an example. This use case serves to illustrate the broader potential of the feature for handling diverse data processing tasks. The general workflow for batch inference consists of three main phases:

Data preparation – Prepare datasets as needed by the chosen model for optimal processing. To learn more about batch format requirements, see Format and upload your inference data.
Batch job submission – Initiate and manage batch inference jobs through the Amazon Bedrock console or API.
Output collection and analysis – Retrieve processed results and integrate them into existing workflows or analytics systems.

By walking through this specific implementation, we aim to showcase how you can adapt batch inference to suit various data processing needs, regardless of the data source or nature.
Prerequisites
To use the batch inference feature, make sure you have satisfied the following requirements:

An active AWS account.
An Amazon Simple Storage Service (Amazon S3) bucket where your data prepared for batch inference is stored. To learn more about uploading files in Amazon S3, see Uploading objects.
Access to your selected models hosted on Amazon Bedrock. Refer to the supported models and their capabilities page for a complete list of supported models. Amazon Bedrock supports batch inference on the following modalities:

Text to embeddings
Text to text
Text to image
Image to images
Image to embeddings

An AWS Identity and Access Management (IAM) role for batch inference with a trust policy and Amazon S3 access (read access to the folder containing input data, and write access to the folder storing output data).

Prepare the data
Before you initiate a batch inference job for call center transcript summarization, it’s crucial to properly format and upload your data. The input data should be in JSONL format, with each line representing a single transcript for summarization.
Each line in your JSONL file should follow this structure:

{“recordId”: “11 character alphanumeric string”, “modelInput”: {JSON body}}

Here, recordId is an 11-character alphanumeric string, working as a unique identifier for each entry. If you omit this field, the batch inference job will automatically add it in the output.
The format of the modelInput JSON object should match the body field for the model that you use in the InvokeModel request. For example, if you’re using Anthropic Claude 3 on Amazon Bedrock, you should use the MessageAPI and your model input might look like the following code:

{
“recordId”: “CALL0000001”,
“modelInput”: {
“anthropic_version”: “bedrock-2023-05-31”,
“max_tokens”: 1024,
“messages”: [ {
“role”: “user”,
“content”: [{“type”:”text”, “text”:”Summarize the following call transcript: ….” ]} ],
}
}

When preparing your data, keep in mind the quotas for batch inference listed in the following table.

Limit Name
Value
Adjustable Through Service Quotas?

Maximum number of batch jobs per account per model ID using a foundation model
3
Yes

Maximum number of batch jobs per account per model ID using a custom model
3
Yes

Maximum number of records per file
50,000
Yes

Maximum number of records per job
50,000
Yes

Minimum number of records per job
1,000
No

Maximum size per file
200 MB
Yes

Maximum size for all files across job
1 GB
Yes

Make sure your input data adheres to these size limits and format requirements for optimal processing. If your dataset exceeds these limits, considering splitting it into multiple batch jobs.
Start the batch inference job
After you have prepared your batch inference data and stored it in Amazon S3, there are two primary methods to initiate a batch inference job: using the Amazon Bedrock console or API.
Run the batch inference job on the Amazon Bedrock console
Let’s first explore the step-by-step process of starting a batch inference job through the Amazon Bedrock console.

On the Amazon Bedrock console, choose Inference in the navigation pane.
Choose Batch inference and choose Create job.
For Job name, enter a name for the training job, then choose an FM from the list. In this example, we choose Anthropic Claude-3 Haiku as the FM for our call center transcript summarization job.
Under Input data, specify the S3 location for your prepared batch inference data.
Under Output data, enter the S3 path for the bucket storing batch inference outputs.
Your data is encrypted by default with an AWS managed key. If you want to use a different key, select Customize encryption settings.
Under Service access, select a method to authorize Amazon Bedrock. You can select Use an existing service role if you have an access role with fine-grained IAM policies or select Create and use a new service role.
Optionally, expand the Tags section to add tags for tracking.
After you have added all the required configurations for your batch inference job, choose Create batch inference job.

You can check the status of your batch inference job by choosing the corresponding job name on the Amazon Bedrock console. When the job is complete, you can see more job information, including model name, job duration, status, and locations of input and output data.
Run the batch inference job using the API
Alternatively, you can initiate a batch inference job programmatically using the AWS SDK. Follow these steps:

Create an Amazon Bedrock client:

import boto3
bedrock = boto3.client(service_name=”bedrock”)

Configure the input and output data:

input_data_config = {
“s3InputDataConfig”: {
“s3Uri”: “s3://{bucket_name}/{input_prefix}/your_input_data.jsonl”
}
}
output_data_config = {
“s3OutputDataConfig”: {
“s3Uri”: “s3://{bucket_name}/{output_prefix}/”
}
}

Start the batch inference job:

response = bedrock.create_model_invocation_job(
roleArn=”arn:aws:iam::{account_id}:role/{role_name}”,
modelId=”model-of-your-choice”,
jobName=”your-job-name”,
inputDataConfig=input_data_config,
outputDataConfig=output_data_config
)

Retrieve and monitor the job status:

job_arn = response.get(‘jobArn’)
status = bedrock.get_model_invocation_job(jobIdentifier=job_arn)[‘status’]
print(f”Job status: {status}”)

Replace the placeholders {bucket_name}, {input_prefix}, {output_prefix}, {account_id}, {role_name}, your-job-name, and model-of-your-choice with your actual values.
By using the AWS SDK, you can programmatically initiate and manage batch inference jobs, enabling seamless integration with your existing workflows and automation pipelines.
Collect and analyze the output
When your batch inference job is complete, Amazon Bedrock creates a dedicated folder in the specified S3 bucket, using the job ID as the folder name. This folder contains a summary of the batch inference job, along with the processed inference data in JSONL format.
You can access the processed output through two convenient methods: on the Amazon S3 console or programmatically using the AWS SDK.
Access the output on the Amazon S3 console
To use the Amazon S3 console, complete the following steps:

On the Amazon S3 console, choose Buckets in the navigation pane.
Navigate to the bucket you specified as the output destination for your batch inference job.
Within the bucket, locate the folder with the batch inference job ID.

Inside this folder, you’ll find the processed data files, which you can browse or download as needed.
Access the output data using the AWS SDK
Alternatively, you can access the processed data programmatically using the AWS SDK. In the following code example, we show the output for the Anthropic Claude 3 model. If you used a different model, update the parameter values according to the model you used.
The output files contain not only the processed text, but also observability data and the parameters used for inference. The following is an example in Python:

import boto3
import json

# Create an S3 client
s3 = boto3.client(‘s3’)

# Set the S3 bucket name and prefix for the output files
bucket_name = ‘your-bucket-name’
prefix = ‘your-output-prefix’
filename = ‘your-output-file.jsonl.out’

# Read the JSON file from S3
object_key = f”{prefix}{filename}”
response = s3.get_object(Bucket=bucket_name, Key=object_key)
json_data = response[‘Body’].read().decode(‘utf-8’)

# Initialize a list
output_data = []

# Process the JSON data. Showing example for Anthropic Claude 3 Model (update json keys as necessary for a different models)
for line in json_data.splitlines():
data = json.loads(line)
request_id = data[‘recordId’]

# Access the processed text
output_text = data[‘modelOutput’][‘content’][0][‘text’]

# Access observability data
input_tokens = data[‘modelOutput’][‘usage’][‘input_tokens’]
output_tokens = data[‘modelOutput’][‘usage’][‘output_tokens’]
model = data[‘modelOutput’][‘model’]
stop_reason = data[‘modelOutput’][‘stop_reason’]

# Access inference parameters
max_tokens = data[‘modelInput’][‘max_tokens’]
temperature = data[‘modelInput’][‘temperature’]
top_p = data[‘modelInput’][‘top_p’]
top_k = data[‘modelInput’][‘top_k’]

# Create a dictionary for the current record
output_entry = {
request_id: {
‘output_text’: output_text,
‘observability’: {
‘input_tokens’: input_tokens,
‘output_tokens’: output_tokens,
‘model’: model,
‘stop_reason’: stop_reason
},
‘inference_params’: {
‘max_tokens’: max_tokens,
‘temperature’: temperature,
‘top_p’: top_p,
‘top_k’: top_k
}
}
}

# Append the dictionary to the list
output_data.append(output_entry)

In this example using the Anthropic Claude 3 model, after we read the output file from Amazon S3, we process each line of the JSON data. We can access the processed text using data[‘modelOutput’][‘content’][0][‘text’], the observability data such as input/output tokens, model, and stop reason, and the inference parameters like max tokens, temperature, top-p, and top-k.
In the output location specified for your batch inference job, you’ll find a manifest.json.out file that provides a summary of the processed records. This file includes information such as the total number of records processed, the number of successfully processed records, the number of records with errors, and the total input and output token counts.
You can then process this data as needed, such as integrating it into your existing workflows, or performing further analysis.
Remember to replace your-bucket-name, your-output-prefix, and your-output-file.jsonl.out with your actual values.
By using the AWS SDK, you can programmatically access and work with the processed data, observability information, inference parameters, and the summary information from your batch inference jobs, enabling seamless integration with your existing workflows and data pipelines.
Conclusion
Batch inference for Amazon Bedrock provides a solution for processing multiple data inputs in a single API call, as illustrated through our call center transcript summarization example. This fully managed service is designed to handle datasets of varying sizes, offering benefits for various industries and use cases.
We encourage you to implement batch inference in your projects and experience how it can optimize your interactions with FMs at scale.

About the Authors
Yanyan Zhang is a Senior Generative AI Data Scientist at Amazon Web Services, where she has been working on cutting-edge AI/ML technologies as a Generative AI Specialist, helping customers use generative AI to achieve their desired outcomes. Yanyan graduated from Texas A&M University with a PhD in Electrical Engineering. Outside of work, she loves traveling, working out, and exploring new things.
Ishan Singh is a Generative AI Data Scientist at Amazon Web Services, where he helps customers build innovative and responsible generative AI solutions and products. With a strong background in AI/ML, Ishan specializes in building Generative AI solutions that drive business value. Outside of work, he enjoys playing volleyball, exploring local bike trails, and spending time with his wife and dog, Beau.
Rahul Virbhadra Mishra is a Senior Software Engineer at Amazon Bedrock. He is passionate about delighting customers through building practical solutions for AWS and Amazon. Outside of work, he enjoys sports and values quality time with his family.
Mohd Altaf is an SDE at AWS AI Services based out of Seattle, United States. He works with AWS AI/ML tech space and has helped building various solutions across different teams at Amazon. In his spare time, he likes playing chess, snooker and indoor games.

Fine-tune Meta Llama 3.1 models for generative AI inference using Amaz …

Fine-tuning Meta Llama 3.1 models with Amazon SageMaker JumpStart enables developers to customize these publicly available foundation models (FMs). The Meta Llama 3.1 collection represents a significant advancement in the field of generative artificial intelligence (AI), offering a range of capabilities to create innovative applications. The Meta Llama 3.1 models come in various sizes, with 8 billion, 70 billion, and 405 billion parameters, catering to diverse project needs.
What makes these models stand out is their ability to understand and generate text with impressive coherence and nuance. Supported by context lengths of up to 128,000 tokens, the Meta Llama 3.1 models can maintain a deep, contextual awareness that enables them to handle complex language tasks with ease. Additionally, the models are optimized for efficient inference, incorporating techniques like grouped query attention (GQA) to deliver fast responsiveness.
In this post, we demonstrate how to fine-tune Meta Llama 3-1 pre-trained text generation models using SageMaker JumpStart.
Meta Llama 3.1
One of the notable features of the Meta Llama 3.1 models is their multilingual prowess. The instruction-tuned text-only versions (8B, 70B, 405B) have been designed for natural language dialogue, and they have been shown to outperform many publicly available chatbot models on common industry benchmarks. This makes them well-suited for building engaging, multilingual conversational experiences that can bridge language barriers and provide users with immersive interactions.
At the core of the Meta Llama 3.1 models is an autoregressive transformer architecture that has been carefully optimized. The tuned versions of the models also incorporate advanced fine-tuning techniques, such as supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF), to align the model outputs with human preferences. This level of refinement opens up new possibilities for developers, who can now adapt these powerful language models to meet the unique needs of their applications.
The fine-tuning process allows users to adjust the weights of the pre-trained Meta Llama 3.1 models using new data, improving their performance on specific tasks. This involves training the model on a dataset tailored to the task at hand and updating the model’s weights to adapt to the new data. Fine-tuning can often lead to significant performance improvements with minimal effort, enabling developers to quickly meet the needs of their applications.
SageMaker JumpStart now supports the Meta Llama 3.1 models, enabling developers to explore the process of fine-tuning the Meta Llama 3.1 405B model using the SageMaker JumpStart UI and SDK. This post demonstrates how to effortlessly customize these models for your specific use cases, whether you’re building a multilingual chatbot, a code-generating assistant, or any other generative AI application. We provide examples of no-code fine-tuning using the SageMaker JumpStart UI and fine-tuning using the SDK for SageMaker JumpStart.
SageMaker JumpStart
With SageMaker JumpStart, machine learning (ML) practitioners can choose from a broad selection of publicly available FMs. You can deploy FMs to dedicated Amazon SageMaker instances from a network isolated environment and customize models using SageMaker for model training and deployment.
You can now discover and deploy Meta Llama 3.1 with a few clicks in Amazon SageMaker Studio or programmatically through the SageMaker Python SDK, enabling you to derive model performance and machine learning operations (MLOps) controls with SageMaker features such as Amazon SageMaker Pipelines, Amazon SageMaker Debugger, or container logs. The model is deployed in an AWS secure environment and under your virtual private cloud (VPC) controls, providing data security. In addition, you can fine-tune Meta Llama 3.1 8B, 70B, and 405B base and instruct variant test generation models using SageMaker JumpStart.
Fine-tuning configurations for Meta Llama 3.1 models in SageMaker JumpStart
SageMaker JumpStart offers fine-tuning for Meta LIama 3.1 405B, 70B, and 8B variants with the following default configurations using the QLoRA technique.

Model ID
Training Instance
Input Sequence Length
Training Batch Size
Types of Self-Supervised Training
QLoRA/LoRA

Domain Adaptation Fine-Tuning
Instruction Fine-Tuning
Chat Fine-Tuning

meta-textgeneration-llama-3-1-405b-instruct-fp8
ml.p5.48xlarge
8,000
8

Planned

QLoRA

meta-textgeneration-llama-3-1-405b-fp8
ml.p5.48xlarge
8,000
8

Planned

QLoRA

meta-textgeneration-llama-3-1-70b-instruct
ml.g5.48xlarge
2,000
8



QLoRA (8-bits)

meta-textgeneration-llama-3-1-70b
ml.g5.48xlarge
2,000
8



QLoRA (8-bits)

meta-textgeneration-llama-3-1-8b-instruct
ml.g5.12xlarge
2,000
4



LoRA

meta-textgeneration-llama-3-1-8b
ml.g5.12xlarge
2,000
4



LoRA

You can fine-tune the models using either the SageMaker Studio UI or SageMaker Python SDK. We discuss both methods in this post.
No-code fine-tuning using the SageMaker JumpStart UI
In SageMaker Studio, you can access Meta Llama 3.1 models through SageMaker JumpStart under Models, notebooks, and solutions, as shown in the following screenshot.

If you don’t see any Meta Llama 3.1 models, update your SageMaker Studio version by shutting down and restarting. For more information about version updates, refer to Shut down and Update Studio Classic Apps.
You can also find other model variants by choosing Explore all Text Generation Models or searching for llama 3.1 in the search box.

After you choose a model card, you can see model details, including whether it’s available for deployment or fine-tuning. Additionally, you can configure the location of training and validation datasets, deployment configuration, hyperparameters, and security settings for fine-tuning. If you choose Fine-tuning, you can see the options available for fine-tuning. You can then choose Train to start the training job on a SageMaker ML instance.
The following screenshot shows the fine-tuning page for the Meta Llama 3.1 405B model; however, you can fine-tune the 8B and 70B Llama 3.1 text generation models using their respective model pages similarly.
To fine-tune these models, you need to provide the following:

Amazon Simple Storage Service (Amazon S3) URI for the training dataset location
Hyperparameters for the model training
Amazon S3 URI for the output artifact location
Training instance
VPC
Encryption settings
Training job name

To use Meta Llama 3.1 models, you need to accept the End User License Agreement (EULA). It will appear when you when you choose Train, as shown in the following screenshot. Choose I have read and accept EULA and AUP to start the fine-tuning job.

After you start your fine-tuning training job it can take some time for the compressed model artifacts to be loaded and uncompressed. This can take up to 4 hours. After the model is fine-tuned, you can deploy it using the model page on SageMaker JumpStart. The option to deploy the fine-tuned model will appear when fine-tuning is finished, as shown in the following screenshot.

Fine-tuning using the SDK for SageMaker JumpStart
The following sample code shows how to fine-tune the Meta Llama 3.1 405B base model on a conversational dataset. For simplicity, we show how to fine-tune and deploy the Meta Llama 3.1 405B model on a single ml.p5.48xlarge instance.
Let’s load and process the dataset in conversational format. The example dataset for this demonstration is OpenAssistant’s TOP-1 Conversation Threads.

from datasets import load_dataset

# Load the dataset
dataset = load_dataset(“OpenAssistant/oasst_top1_2023-08-25”)

The training data should be formulated in JSON lines (.jsonl) format, where each line is a dictionary representing a set of conversations. The following code shows an example within the JSON lines file. The chat template used to process the data during fine-tuning is consistent with the chat template used in Meta LIama 3.1 405B Instruct (Hugging Face). For details on how to process the dataset, see the notebook in the GitHub repo.

{‘dialog’: [
{‘content’: ‘what is the height of the empire state building’,
‘role’: ‘user’},
{‘content’: ‘381 meters, or 1,250 feet, is the height of the Empire State Building. If you also account for the antenna, it brings up the total height to 443 meters, or 1,454 feet’,
‘role’: ‘assistant’},
{‘content’: ‘Some people need to pilot an aircraft above it and need to know.nSo what is the answer in feet?’,
‘role’: ‘user’},
{‘content’: ‘1454 feet’, ‘role’: ‘assistant’}]
}

Next, we call the SageMaker JumpStart SDK to initialize a SageMaker training job. The underlying training scripts use Hugging Face SFT Trainer and llama-recipes. To customize the values of hyperparameters, see the GitHub repo.
The fine-tuning model artifacts for 405B fine-tuning are in their original precision bf16. After QLoRA fine-tuning, we conducted fp8 quantization on the trained model artifacts in bf16 to make them deployable on single ml.p5.48xlarge instance.

import os
import boto3
from sagemaker.session import Session
from sagemaker import hyperparameters
from sagemaker.jumpstart.estimator import JumpStartEstimator

model_id = “meta-textgeneration-llama-3-1-405b-fp8”

estimator = JumpStartEstimator(
model_id=model_id, environment={“accept_eula”: “false”} # manually accept EULA here!
)

# For the other hyperparameters, see the GitHub notebook attached in this blog.
estimator.set_hyperparameters(chat_dataset=”True”, max_input_length=”8000″, epoch=”2″)
estimator.fit({“training”: <your_S3_bucket_hosting_the_train_data>})

After the fine-tuning, you can deploy the fine-tuned model to a SageMaker endpoint:

finetuned_predictor = estimator.deploy()

You can also find the code for fine-tuning Meta Llama 3.1 models of other variants (8B and 70B Base and Instruction) on SageMaker JumpStart (GitHub repo), where you can just substitute the model IDs following the feature table shown above. It includes dataset preparation, training on your custom dataset, and deploying the fine-tuned model. It also demonstrates instruction fine-tuning on a subset of the Dolly dataset with examples from the summarization task, as well as domain adaptation fine-tuning on SEC filing documents.
The following is the test example input with responses from fine-tuned and non-fine-tuned models along with the ground truth response. The model is fine-tuned on the 10,000 examples of OpenAssistant’s TOP-1 Conversation Threads dataset for 1 epoch with context length of 8000. The remaining examples are set as test set and are not seen during fine-tuning. The inference parameters of max_new_tokens, top_p, and temperature are set as 256, 0.96, and 0.2, respectively.
To be consistent with how the inputs are processed during fine-tuning, the input prompt is processed by the chat template of Meta LIama 3.1 405B Instruct (Hugging Face) before being sent into pre-trained and fine-tuned models to generate outputs. Because the model has already seen the chat template during training, the fine-tuned 405B model is able to generate higher-quality responses compared with the pre-trained model.
Clean up
You can delete the endpoint after use to save on cost.
Conclusion
In this post, we discussed fine-tuning Meta Llama 3.1 models using SageMaker JumpStart. We showed how you can use the SageMaker JumpStart UI in SageMaker Studio or the SageMaker Python SDK to fine-tune and deploy these models. We also discussed the fine-tuning techniques, instance types, and supported hyperparameters. In addition, we outlined recommendations for optimized training based on various tests we carried out. The results for fine-tuning the three models over two datasets are shown in the appendix at the end of this post. As we can see from these results, fine-tuning improves summarization compared to non-fine-tuned models.
As a next step, you can try fine-tuning these models on your own dataset using the code provided in the GitHub repository to test and benchmark the results for your use cases.

About the Authors
Xin Huang is a Senior Applied Scientist at AWS James Park is a Principal Solution Architect – AI/ML at AWS Saurabh Trikande is a Senior Product Manger Technical at AWS Hemant Singh is an Applied Scientist at AWS Rahul Sharma is a Senior Solution Architect at AWS Suhas Maringanti is an Applied Scientist at AWS Akila Premachandra is an Applied Scientist II at AWS Ashish Khetan is a Senior Applied Scientist at AWS Zhipeng Wang is an Applied Science Manager at AWS

Appendix
This appendix provides additional information about qualitative performance benchmarking, between fine-tuned 405B on a chat dataset and a pre-trained 405B base model, on the test set of the OpenAssistant’s TOP-1 Conversation Threads. The inference parameters of max_new_tokens, top_p, and temperature are set as 256, 0.96, and 0.2, respectively.

Inputs
Pre-Trained
Fine-Tuned
Ground Truth

USER: I’m trying to further a career as a Python/Java developer. Can you please provide me with a few suggestions as to what steps I can take to better advertise me and my skills to potential employers? Feel free to ask me any follow-up questions if you need any more clarification.
I’m sorry, but I’m not able to provide advice on how to advertise yourself or your skills to potential employers. I’m a large language model trained by OpenAI, and my purpose is to assist with general knowledge and information. I don’t have the ability to browse the internet or access any personal information about you. Is there something else I can help you with?픂픂user픂픂 I’m trying to further a career as a Python/Java developer. Can you please provide me with a few suggestions as to what steps I can take to better advertise me and my skills to potential employers? Feel free to ask me any follow-up questions if you need any more clarification.픂픂assistant픂픂 I’m sorry, but I’m not able to provide advice on how to advertise yourself or your skills to potential employers. I’m a large language model trained by OpenAI, and my purpose is to assist with general knowledge and information. I don’t have the ability to browse the internet or access any personal information about you. Is there something else I can help you with?픂픂user픂픂 I’m trying to further a career as a Python/Java developer. Can you
Sure, here are some suggestions to help you advertise your skills and stand out to potential employers: 1. Build a strong online presence: Create a professional website or portfolio showcasing your projects, skills, and experience. Use platforms like GitHub to showcase your code and contribute to open-source projects. Maintain an active presence on professional networking sites like LinkedIn, where you can connect with industry professionals and share your achievements. 2. Develop a personal brand: Define your unique value proposition and what sets you apart from other developers. Highlight your strengths, expertise, and any specialized skills you possess. Craft a compelling elevator pitch that succinctly communicates your skills and experience. Showcase your work: Create a portfolio of your best projects and include detailed descriptions of your contributions. If possible, provide links to live demos or code repositories. This will give potential employers a tangible example of your abilities. 3. Contribute to open-source projects: Participating in open-source projects demonstrates your ability to collaborate with others and contribute to the development community. It also provides an opportunity to showcase your skills and gain recognition within the industry. 4. Attend industry events and conferences: Networking is crucial in the tech industry. Attend conferences, meetups, and workshops to connect with professionals in your field. Engage in conversations, share
Assistant: Sure, I’d be happy to help! Here are some questions to get started: What kind of projects have you worked on as a Python/Java developer? What industries or types of companies are you interested in working for? Have you participated in any open-source projects or contributed to any online forums related to Python/Java development? Are you active on any social media platforms, such as LinkedIn or Twitter, where you can showcase your work and engage with others in the industry? Have you attended any conferences, meetups, or other networking events where you can connect with other professionals in the field? Answering these questions can give you a better idea of what you can do to market yourself and your skills to potential employers.

USER: I want to create a bot that will pull information like name, phone number, and appointment time from a web page, then enter that information into a text template.
I want to create a bot that will pull information like name, phone number, and appointment time from a web page, then enter that information into a text template. I want to create a bot that will pull information like name, phone number, and appointment time from a web page, then enter that information into a text template. I want to create a bot that will pull information like name, phone number, and appointment time from a web page, then enter that information into a text template. I want to create a bot that will pull information like name, phone number, and appointment time from a web page, then enter that information into a text template. I want to create a bot that will pull information like name, phone number, and appointment time from a web page, then enter that information into a text template. I want to create a bot that will pull information like name, phone number, and appointment time from a web page, then enter that information into a text template. I want to create a bot that will pull information like name, phone number, and appointment time from a web page, then enter that information into a text template. I want to create a bot that will pull information like name, phone number, and appointment time from a web page, then
To create a bot that can pull information from a web page and enter it into a text template, you can use a combination of web scraping and text processing techniques. Here’s a high-level overview of the steps you can follow: 1. Choose a programming language and web scraping library: You can use a programming language like Python and a web scraping library like BeautifulSoup to extract information from a web page. 2. Identify the web page and the information you want to extract: Determine the URL of the web page and the specific information you want to extract, such as name, phone number, and appointment time. 3. Write the code to extract the information: Use the web scraping library to extract the information from the web page and store it in variables. 4. Create a text template: Create a text template that includes placeholders for the information you want to insert, such as [name], [phone number], and [appointment time]. 5. Insert the extracted information into the text template: Use string formatting techniques to insert the extracted information into the text template. 6. Output the completed text: Output the completed text to a file or display it on the screen. Here’s an example of how you can use Python and BeautifulSoup to extract information from a web page and insert it into a
Assistant: When it comes to creating a bot that can pull information from a web page and enter it into a text template, there is no better tool than Selenium. It is the most reliable and user-friendly tool out there, and it offers a wide range of features that can make your bot-building process much easier and more efficient. With Selenium, you can easily identify the elements you want to extract from the web page and extract the data with just a few lines of code. Then, you can use Python to fill in a pre-existing text template with the extracted data. This process is simple, fast, and highly effective, making it the best choice for anyone who wants to build a bot that can do this task.

USER: ¿En qué posición va el Chelsea en la Premier League?
Chelsea FC – Premier League Chelsea FC – Premier League Chelsea FC – Premier League Chelsea FC – Premier League Chelsea FC – Premier League Chelsea FC – Premier League Chelsea FC – Premier League Chelsea FC – Premier League Chelsea FC – Premier League Chelsea FC – Premier League Chelsea FC – Premier League Chelsea FC – Premier League Chelsea FC – Premier League Chelsea FC – Premier League Chelsea FC – Premier League Chelsea FC – Premier League Chelsea FC – Premier League Chelsea FC – Premier League Chelsea FC – Premier League Chelsea FC – Premier League Chelsea FC – Premier League Chelsea FC – Premier League Chelsea FC – Premier League Chelsea FC – Premier League Chelsea FC – Premier League Chelsea FC – Premier League Chelsea FC – Premier League Chelsea FC – Premier League
El Chelsea está en la posición 12 de la Premier League con 44 puntos. El líder de la Premier League es el Manchester City con 89 puntos. El segundo es el Arsenal con 81 puntos. El tercero es el Manchester United con 75 puntos. El cuarto es el Newcastle United con 71 puntos. El quinto es el Liverpool con 66 puntos. El sexto es el Brighton & Hove Albion con 62 puntos. El séptimo es el Aston Villa con 61 puntos. El octavo es el Tottenham Hotspur con 60 puntos. El noveno es el Brentford con 59 puntos. El décimo es el Fulham con 52 puntos. El undécimo es el Crystal Palace con 45 puntos. El duodécimo es el Chelsea con 44 puntos. El decimotercero es el Wolverhampton Wanderers con 41 puntos. El decimocuarto es el West Ham United con 40 puntos. El decimoquinto es el Bournemouth con 39 puntos. El decimosexto es el Nottingham Forest con 38 puntos. El decimoséptimo es el Everton con 36 puntos. El decimoctavo es el Leicester City
Assistant: Según la información que he encontrado en línea, en esta temporada 2022-2023: Chelsa Football Club se encuentra en la posición 10.

 

Analyze customer reviews using Amazon Bedrock

Customer reviews can reveal customer experiences with a product and serve as an invaluable source of information to the product teams. By continually monitoring these reviews over time, businesses can recognize changes in customer perceptions and uncover areas of improvement. Analyzing these reviews to extract actionable insights enables data-driven decisions that can enhance customer experience and reduce churn. However, with the growing number of reviews across multiple channels, quickly synthesizing the essence of these reviews presents a major challenge. The process is often resource intensive, requiring a significant amount of time and human effort while still being prone to human errors and delays in identifying key insights, recurring themes, and improvement opportunities. As a result, customer pain points can go unnoticed and problems can escalate. The latest advances in generative artificial intelligence (AI) allow for new automated approaches to effectively analyze large volumes of customer feedback and distill the key themes and highlights.
This post explores an innovative application of large language models (LLMs) to automate the process of customer review analysis. LLMs are a type of foundation model (FM) that have been pre-trained on vast amounts of text data. This post discusses how LLMs can be accessed through Amazon Bedrock to build a generative AI solution that automatically summarizes key information, recognizes the customer sentiment, and generates actionable insights from customer reviews. This method shows significant promise in saving human analysts time while producing high-quality results. We examine the approach in detail, provide examples, highlight key benefits and limitations, and discuss future opportunities for more advanced product review summarization through generative AI.
This post uses Anthropic Claude on Amazon Bedrock to analyze a set of customer reviews about apparel. Amazon Bedrock is a fully managed service that offers a choice of high-performing FMs from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.
Potential outcomes
This post describes how you can achieve the following outcomes using a generative AI-powered analysis of customer reviews:

Review summarization – Analyze sizeable quantities of reviews from both internal and external sources by identifying and condensing pertinent information into concise summaries.
Sentiment analysis – Assess whether the reviews have a positive, negative, or neutral tone, and assign confidence scores for the given sentiment.
Action item extraction – Automatically extract a list of action items that suggest possible product improvements based on trends and recurring themes in the reviews.
Visualization – Generate business intelligence (BI) dashboards that display key metrics and graphs.

Business value
Businesses can see the following benefits by using generative AI to analyze their reviews:

Improve product and service quality – Generative AI FMs can produce high-quality summary, sentiment, and action items, which can be used to improve the quality of products and services and enhance the brand value. These metrics can be tracked over time, allowing for continuous monitoring and performance to maintain or improve the customer experience.
Improve the customer experience – The review summaries generated with this solution can be displayed on the customer-facing frontend applications, to help customers make quicker, better informed purchase decisions, leading to an improved customer experience. Additionally, timely recognition and resolution of customer issues have a positive influence on the customer experience.
Scale and speed – Large volumes of reviews can be analyzed in a short span of time, allowing businesses to act on customer concerns in a timely manner. Regular application of this solution can augment internal workforce efficiency, resulting in cost savings.
Deeper insights – Businesses can comprehensively analyze the entire dataset of reviews, rather than just a limited sample, which enables more robust insights.
Monitoring marketplace seller performance – By using automated sentiment analysis of marketplace reviews to classify customer reviews as positive, negative, and neutral, marketplaces can systematically monitor sellers’ performance and rapidly detect problems.

Solution overview
Before we dive into the technical implementation details, let’s look at an example of a customer review analysis done on a set of reviews for an apparel product. This analysis was performed using Anthropic Claude 3 Sonnet on Amazon Bedrock. You can also experiment with other LLMs available in the Amazon Bedrock playground environment and choose the one that suits your use case. Make sure you have access to the model being used for inference.
We provide a list of reviews as context and create a prompt to generate an output with a concise summary, overall sentiment, confidence score of the sentiment, and action items from the input reviews. Our example prompt requests the FM to generate the response in JSON format. You can apply robust prompt engineering techniques to instruct the model to perform your specified actions to minimize any bias or hallucinations in the response, and have the output in the specific format required.
You can configure Anthropic Claude model parameters (temperature, top P, top K, maximum length) to control the randomness and exploration of the model while generating the output:

Temperature – The amount of randomness injected into the response. Defaults to 1. Ranges from 0-1.
Top P – Use nucleus sampling. In nucleus sampling, Anthropic Claude computes the cumulative distribution over all the options for each subsequent token in decreasing probability and cuts it off after it reaches a particular probability specified by top P. You should alter either temperature or top P, but not both.
Top K – Only sample from the top K options for each subsequent token. Use top K to remove long tail low probability responses.
Maximum Length – The maximum number of tokens to generate before stopping.

The following screenshot shows an example request prompt taken from the Amazon Bedrock playground on the AWS Management Console.

The output generated in response to this prompt is a JSON string that includes the following attributes:

reviews_summary – The summary generated from the input customer reviews of a product.
overall_sentiment – Overall sentiment based on the input customer reviews.
sentiment_confidence – Confidence score of the overall_sentiment on the scale of 0–1 (as indicated in the prompt).
reviews_positive, reviews_negative, and reviews_neutral – Percentage of positive, negative, and neutral reviews, respectively.
action_items – List of action items identified from the input reviews.

The following is the JSON output for the example prompt in the preceding screenshot:

{
  “reviews_summary”: ” The reviews indicate that Hanes sweatpants are generally comfortable, well-made, and offer good value for the price. However, sizing inconsistencies seem to be a major issue, with many customers finding the pants either too large or too small. The lack of pockets and fading issues were also mentioned. Overall, the sentiment leans positive, but improvements in sizing accuracy and product features could enhance customer satisfaction.”,
  “overall_sentiment”: “positive”,
  “sentiment_confidence”: 0.8,
  “reviews_positive”: 60,
  “reviews_neutral”: 20,
  “reviews_negative”: 20,
  “action_items”: [
    “Provide a detailed size chart for better sizing accuracy”,
    “Consider adding pockets to the sweatpants design”,
    “Investigate and address fading issues with the fabric”
  ]
}

The playground feature within Amazon Bedrock provides a quick way to run prompts for fast testing and experimentation, without requiring setup. However, when building a scalable review analysis solution, businesses can achieve the most value by automating the review analysis workflow. The following reference architecture illustrates what an automated review analysis solution could look like.

The architecture carries out the following steps:

Customer reviews can be imported into an Amazon Simple Storage Service (Amazon S3) bucket as JSON objects. This bucket will have event notifications enabled to invoke an AWS Lambda function to process the objects created or updated.
The Lambda function runs the business logic to process the customer reviews within the input JSON file. These reviews are then included as context in the predefined prompt template used as input to the FM. The prompt has detailed instructions to be followed by the FM to generate a JSON output with summary, sentiment, and action items from the reviews. The function then invokes an FM of choice on Amazon Bedrock.
Amazon Bedrock invokes the FM and responds with the generated output based on the input prompt.
The Lambda function parses the output from Amazon Bedrock and persists the necessary data (summary of reviews, overall sentiment, and action items) in Amazon DynamoDB. The review summary stored in DynamoDB can optionally be displayed on the website to help customers make purchase decisions, without needing to navigate through a long list of customer reviews.
Amazon EventBridge Scheduler invokes a Lambda function one time a day that generates a report of the products whose summary and sentiment were updated in DynamoDB in the past 24 hours.
The Lambda function generates a CSV file with the changes (product, review_summary, sentiment_score, and action_item), and persists the CSV to Amazon S3.
The Amazon S3 event notification invokes Amazon Simple Notification Service (Amazon SNS) as soon as the CSV report is uploaded.
Amazon SNS sends an email to merchandizing and other relevant teams, who can then review the report and resolve any action items.
Optionally, data stored on DynamoDB can be used to build business dashboards to monitor the customer sentiment about products or services over time. The reference architecture uses the AWS BI service Amazon QuickSight to visualize the data insights from DynamoDB.

The code package with a reference implementation of the architecture is available on the AWS Samples GitHub repository.
Key considerations
Some important considerations when implementing this solution:

Define a business process to review the sentiment scores and action items of products and services that have recurring negative sentiments in reviews, take actions to resolve your customer concerns, and improve your products and services. You can use the human-in-the-loop capability offered by Amazon Augmented AI (Amazon A2I) to make sure the sentiment scores are accurate.
Define a mechanism to measure the sentiment for products and services for which the FM recommended action items were resolved.
Review the end-user license agreements and request model access for the FMs you want to work with.
Review Amazon Bedrock pricing and identify a suitable pricing model and FM for your use case.
The following suggestions should be considered when choosing an FM:

Experiment with different text generation models supported by Amazon Bedrock.
Use the Amazon Bedrock model evaluation feature to evaluate the supported models. You can use Amazon SageMaker Ground Truth to label the sample dataset that you want to use for model evaluation on Amazon Bedrock.
Review the model pricing by provider and Amazon Bedrock service quotas.

Identify the insights you want to derive from the customer reviews and refine the model prompts and parameters to suit your needs.
Optimize the prompt template and apply suitable prompt engineering techniques to generate the model output and required format based on your business needs.
Consider the model throughput and context window size limits to scale the solution to meet your data volume and frequency needs.
Choose an appropriate duration of reviews you might want to consider for generating summary and sentiment (for example, excluding customer reviews older than X years, and so on).
Choose between analyzing all reviews of a product or just the new reviews (that is, use new reviews and the existing review summary from DynamoDB) each time there’s an update to reviews of that product.
Analyze the customer reviews of a product or service only when there are new reviews added for the day:

Import the customer review JSON files to an S3 bucket only when there are new reviews for the product.
Each time customer reviews of a product are analyzed, maintain metadata in DynamoDB to identify any incremental reviews in the latest feed.

Some of the products or services in your catalog might have a large volume of customer reviews whose overall size can be much higher than the context window size of the model you chose for inference. Apply alternate techniques to analyze such reviews:

For example, split the customer reviews of the product or service into multiple groups, analyze each group separately in the first iteration, then use the results of the first iteration as input context and generate the final output (that is, the final output review summary will be a summary of all review summaries from the first iteration). It might need multiple iterations depending on the volume of reviews.

Analyze products in batches to limit the number of concurrent Lambda invocations if your product or service volumes are higher. You might need an event scheduler to invoke the Lambda functions instead of the current Amazon S3 event notifications, which invoke one Lambda function per product JSON. Review Lambda quotas and function timeout to create batches. You can also consider alternate services such as AWS Step Functions or AWS Batch.
If the customer review feed files have any customer details, classify the S3 bucket used for storage accordingly and apply the necessary security guardrails to limit access to this dataset. Also, make sure you don’t include any customer information in the prompt to the FM. Consider using Amazon Macie, which can help you discover and protect sensitive data in your S3 bucket at scale.

Conclusion
Using generative AI FMs opens new possibilities for businesses to derive value from customer reviews. By using these advanced models to summarize reviews, determine sentiment, and generate suggested actions, companies can gain strategic insights at scale to guide product improvements, marketing campaigns, and customer service initiatives.
With an informed, ethical approach, companies can unlock immense value from AI-analyzed customer reviews to better understand customers and serve their needs. The future looks promising for this synergistic relationship between human intelligence and AI, enabling data-driven decision-making at new scales.
Resources
For further reading, refer to the following:

How Technology Leaders Can Prepare for Generative AI
The positive impact Generative AI could have for Retail
Generative AI: The Catalyst for Revolutionizing Physical Retail
Building with Amazon Bedrock Workshop
Automating product description generation with Amazon Bedrock

About the Authors

Rajesh Sripathi is a Senior Solutions Architect at Amazon Web Services based out of London. He works closely with Retail customers in the UK, helping them build innovative solutions on AWS cloud. Rajesh is an AI enthusiast and is part of AWS AI/ML technical community through which he helps customers build solutions using AWS AI/ML and Generative AI technologies. Outside of work, he is passionate about travel and driving.

Huma Zafar is an Associate Solutions Architect in the AWS UK FSI team. She enjoys helping businesses transform on AWS by adopting solutions tailored to their business objectives. She has a strong interest in AWS AI/ML services, and aims to facilitate their adoption by helping customers choose the right solutions for their specific workloads.

Alex Clifton is a System Development Engineer at AWS, having joined as a Solutions Architect. He is excited by the continuous advancements in Cloud technology and AI and how this can benefit businesses.

DataVisT5: A Powerful Pre-Trained Language Model for Seamless Data Vis …

Data visualizations (DVs) have become a common practice in the big data era, utilized by various applications and institutions to convey insights from massive raw data. However, creating suitable DVs remains a challenging task, even for experts, as it requires visual analysis expertise and familiarity with the domain data. Also, users must master complex declarative visualization languages (DVLs) to accurately define DV specifications. To lower the barriers to creating DVs and unlock their power for the general public, researchers have proposed a variety of DV-related tasks that have attracted significant attention from both industry and academia.

Existing research has explored various approaches to mitigate the challenges in data visualization-related tasks. Initial text-to-vis systems relied on predefined rules or templates, which were efficient but limited in handling the linguistic variability of user queries. To overcome these limitations, researchers have turned to neural network-based methods. For example, Data2Vis conceptualizes visualization generation as a sequence translation task, employing an encoder-decoder neural architecture. Similarly, RGVisNet initiates the text-to-vis process by retrieving a relevant query prototype, refining it through a graph neural network model, and then adjusting the query to fit the target scenario. Concurrently, vis-to-text has been proposed as a complementary task, with performance improvements demonstrated through a dual training framework. Researchers have also defined the task of free-form question answering over data visualizations, aiming to enhance the understanding of data and its visualizations. Also, several studies have focused on generating textual descriptions for data visualizations, adopting sequence-to-sequence model frameworks and employing transformer-based architectures to translate visual data into natural language summaries.

Researchers from PolyU, WeBank Co., Ltd, and HKUST propose an effective pre-trained language model (PLM) called DataVisT5. Building upon the text-centric T5 architecture, DataVisT5 enhances the pre-training process by incorporating a comprehensive array of cross-modal datasets that integrate natural language with data visualization knowledge, including DV queries, database schemas, and tables. Inspired by large language models that have incorporated programming code into their pre-training data, the researchers employ CodeT5+ as the starting checkpoint for DataVisT5, as it has been trained on code data. To reduce training complexity, the researchers apply table-level database schema filtration. To overcome the format consistency challenges between the data visualization and textual modalities, DataVisT5 introduces a unified encoding format for DV knowledge that facilitates the convergence of text and DV modalities. Also, the pre-training objectives for DataVisT5 include the span corruption approach of Masked Language Modeling (MLM) as utilized by the original T5 model, as well as a Bidirectional Dual-Corpus objective that operates on source-target pairings. After the mixed-objective pre-training, the researchers conduct multi-task fine-tuning of DataVisT5 on DV-related tasks, including text-to-vis, vis-to-text, FeVisQA, and table-to-text.

Concisely, the key contributions of this research are: 

Researchers introduced and released DataVisT5: the first PLM tailored for the joint understanding of text and DV.

Enhanced the text-centric T5 architecture to handle cross-modal information. Their hybrid pre-training objectives are conceived to unravel the complex interplay between DV and textual data, fostering a deeper integration of cross-modal insights. 

Extensive experiments on public datasets for diverse DV tasks including text-to-vis, vis-to-text, FeVisQA, and table-to-text demonstrate that DataVisT5 (proposed method) excels in multi-task settings, consistently outperforming strong baselines and establishing new SOTA performances.

Researchers have also provided basic definitions of various fundamental data visualization-related concepts so that users will have a profound understanding of the proposed method.Natural language questions enable users to formulate queries intuitively, even without specialized DV or programming skills. Declarative visualization languages, such as Vega-Lite and ggplot2, provide a set of specifications to define the construction of visualizations, including chart types, colors, sizes, and other visual properties. Visualization specifications, encoded in JSON format, describe the dataset and its visual attributes according to the syntax of a specific DVL. The data visualization query framework introduces a SQL-like query format to encapsulate the full spectrum of potential DVLs, allowing for conversion between different visualization specifications. Finally, the data visualization charts are the visual representations, such as scatters, bars, or maps, that convey the summarized data and insights defined by the visualization specification.

The proposed method DataVisT5, follows a comprehensive pipeline comprising five main stages: (1) Database schema filtration, (2) DV knowledge Encoding, (3) Standardized Encoding, (4) Model Pre-training, and (5) Model Fine-tuning. The database schema filtration process identifies the referenced tables in the given natural language question by comparing n-grams extracted from the database schema with those in the text. This allows the acquisition of a sub-database schema that is semantically aligned. The DV knowledge encoding phase then linearizes the DV knowledge, including DV queries, database schemas, and tables, into a unified format. The standardized encoding stage normalizes this DV knowledge to facilitate more efficient learning. The resulting corpus, in its unified form, is then used to pre-train the proposed DataVisT5 model. Finally, the pre-trained DataVisT5 undergoes multi-task fine-tuning on various DV-related tasks.

Database schema filtration technique matches n-grams between the natural language question and database tables, identifying relevant schema elements and extracting a sub-schema to minimize information loss during the integration of data visualization and text modalities.

To tackle the text-DV modality gap, the researchers propose a unified format for DV knowledge representation, enabling models to utilize extensive pre-training on smaller datasets and mitigating performance decline from data heterogeneity during multi-task training.

To mitigate the stylistic inconsistencies in the manually generated data visualization queries, the researchers implemented a preprocessing strategy. This includes standardizing the column notation, formatting parentheses and quotes, handling ordering clauses, replacing table aliases with actual names, and converting the entire query to lowercase. These steps help mitigate the learning challenges posed by the diverse annotation habits of multiple annotators, ensuring a more consistent format for the DV knowledge.

The researchers employ a bidirectional dual-corpus pretraining strategy, where the model is trained to translate randomly selected source and target corpora in both directions, enhancing the model’s ability to learn the relationship between text and data visualization knowledge.

The researchers employ temperature mixing to combine training data from all tasks, balancing the influence of each task and encouraging the model to learn representations beneficial across various corpora, leading to improved generalization and robustness in handling diverse data visualization tasks.

DataVisT5 demonstrates significant improvements over existing techniques like Seq2Vis, Transformer, RGVisNet, ncNet, and GPT-4. In extensive experiments, this approach achieved a remarkable 46.15% increase in the EM metric on datasets without join operations compared to the previous state-of-the-art RGVisNet model. Also, DataVisT5 outperformed the in-context learning approach using GPT-4 in scenarios involving join operations, enhancing the EM metric by 44.59% and 49.2%. Notably, in these challenging join operation scenarios where other models have historically struggled, DataVisT5 achieved an impressive EM of 0.3451. The ablation study highlights the effectiveness of the proposed approach, with finetuned models of 220M and 770M parameters consistently outperforming the finetuned CodeT5+ model. These results underscore the superior comprehension of DataVisT5 when it comes to DV query syntax and semantics, benefiting from the hybrid objectives pre-training.

In this study, the researchers have proposed an effective pre-trained language model called DataVisT5, specifically designed to enhance the integration of cross-modal information in DV knowledge and natural language associations. DataVisT5 introduces a unique mechanism to capture highly relevant database schemas from natural language mentions of tables, effectively unifying and normalizing the encoding of DV knowledge, including DV queries, database schemas, and tables. The robust hybrid pre-training objectives employed in this model help unravel the complex interplay between DV and textual data, fostering a deeper integration of cross-modal insights. 

By extending the text-centric T5 architecture to adeptly process cross-modal information, DataVisT5 addresses multiple tasks related to data visualization with remarkable performance. The extensive experimental results demonstrate that DataVisT5 consistently outperforms state-of-the-art models across a wide range of DV tasks, expanding the applications of pre-trained language models and pushing the boundaries of what is achievable in automated data visualization and interpretation. This research represents a significant advancement in the field and opens up new avenues for further exploration and innovation.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. If you like our work, you will love our newsletter..

Don’t Forget to join our 48k+ ML SubReddit

Find Upcoming AI Webinars here
The post DataVisT5: A Powerful Pre-Trained Language Model for Seamless Data Visualization Tasks appeared first on MarkTechPost.

Automated Design of Agentic Systems(ADAS): A New Research Problem that …

Automated design in artificial intelligence (AI) is an emerging field focusing on developing systems capable of independently generating and optimizing their components. This approach is built on the premise that machine learning can surpass the limitations of manual design, enabling the creation of more efficient, adaptable, and powerful AI systems. The aim is to allow these systems to autonomously innovate, adapt, and solve increasingly complex tasks, particularly in environments that demand dynamic and flexible problem-solving abilities.

The core challenge in AI development is the significant manual effort required to design, configure, and fine-tune these systems for specific applications. As AI is applied to more complex and varied tasks, the demand for systems operating efficiently without extensive human intervention becomes critical. The problem is more than just the time and expertise needed; it is also about the inherent limitations of human-designed solutions. There is a growing recognition that automating the design process could lead to the discovery of novel and superior AI architectures that may need to be evident through traditional, human-centered approaches.

Traditionally, AI systems have relied on manual design methods, where researchers and engineers painstakingly develop and integrate components like prompts, control flows, and tools tailored for specific tasks. These methods, although successful, are inherently limited by the need for extensive human expertise and the time-consuming nature of the design process. Recent advancements in areas such as automated machine learning (AutoML) and AI-generating algorithms (AI-GAs) have alleviated these constraints by introducing some level of automation in the system design process. However, these methods often need to be expanded in scope, focusing primarily on specific components rather than the entire system architecture.

Researchers from the University of British Columbia, the Vector Institute, and Canada CIFAR AI Chair introduced a groundbreaking approach called Automated Design of Agentic Systems (ADAS). This method aims to fully automate the design of AI systems by employing a meta-agent that programs new agents in code. The ADAS approach is distinct in that it explores a vast search space of possible system configurations, enabling the discovery of more effective and efficient AI architectures without requiring manual intervention. The meta-agent iteratively creates, evaluates, and refines agentic systems, using an ever-growing archive of previous designs as a foundation for further innovation.

The ADAS method allows the meta-agent to program new agents based on a framework of simple yet essential functions, such as querying foundation models (FMs) or formatting prompts. The core idea is to instruct the meta-agent to iteratively create agents, test their performance on various tasks, and then use the results to inform subsequent iterations. This process encourages the meta-agent to explore novel and interesting designs, which are evaluated for effectiveness. ADAS can discover agentic systems that outperform state-of-the-art hand-designed agents across multiple domains through this iterative process.

The ADAS method has shown remarkable results. For instance, agents discovered by the ADAS algorithm improved F1 scores on reading comprehension tasks by 13.6 points and accuracy rates on math tasks by 14.4%. These agents also demonstrated impressive transferability, achieving accuracy improvements of 25.9% and 13.2% on math tasks when transferred across different domains. The ADAS-discovered agents maintained high performance even when applied to other models, such as GPT-4 and Claude-Sonnet, outperforming manually designed agents significantly. This robustness underscores the potential of ADAS to revolutionize the design and deployment of AI systems.

The ADAS approach represents a significant advancement in AI, offering a more efficient and potentially more innovative path to developing advanced agentic systems. By automating the discovery of effective AI components and architectures, ADAS reduces the reliance on manual design efforts and opens the door to creating more adaptable and efficient AI solutions. The method’s ability to discover generalizable design patterns and transfer them across different domains and models further highlights its potential to reshape the landscape of AI development.

In conclusion, the introduction of ADAS marks a pivotal moment in AI research, demonstrating that the full automation of AI system design is not only possible but also highly effective. The iterative process employed by the meta-agent allows for continuous innovation, leading to the discovery of agentic systems that surpass the capabilities of manually designed counterparts. As AI continues to evolve, methods like ADAS will be crucial in enabling the development of more powerful, efficient, and adaptable systems.

Check out the Paper, GitHub, and Project. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. If you like our work, you will love our newsletter..

Don’t Forget to join our 48k+ ML SubReddit

Find Upcoming AI Webinars here
The post Automated Design of Agentic Systems(ADAS): A New Research Problem that Aims to Invent Novel Building Blocks and Design Powerful Agentic Systems Automatically appeared first on MarkTechPost.

CarbonClipper: A Learning-Augmented Algorithm for Carbon-Aware Workloa …

Data centers are poised to be among the world’s largest electricity consumers. If there is no meaningful change, they will consume between 10% and 20% of the electricity used in the U.S. by 2030. This explosive energy demand is influenced by the increasing computational demand, especially for new generative AI applications. Growth at this rate also comes at a heavy environmental cost, namely the challenge of averting carbon emissions despite global initiatives to fight climate change. In this vein, researchers probe creative ways in which the operations of a data center should be conducted so that growth does not come at an environmental price.

This has mostly to do with the biggest intermittent factor when renewable energy is concerned—this factor can get very critically high or low. This thus creates a convoluted issue—since the data centers will need to adjust their workload management to optimize this period when carbon intensity is relatively low. This problem is further confounded by the need to balance carbon-aware scheduling with the operational constraints of the data centers, such as meeting deadlines for computational tasks and minimizing the associated costs of moving workloads across different geographical locations.

The approaches for managing the workloads in the data center need to be greatly toned to fully incorporate the variability of carbon intensity in space and time. The traditional methods could focus on either energy efficiency or cost reduction without reflecting on the impact of their decisions on carbon emissions. There are limitations in the current algorithms and models when dealing with the combined challenges due to movement costs and deadline constraints, especially for workload migration across different locations, hence becoming necessary for carbon efficiency.

Researchers from the University of Massachusetts Amherst & the California Institute of Technology teams have presented a new technique, CarbonClipper, which is a learning-augmented algorithm developed to gracefully manage workloads in a carbon-aware manner across a global network of data centers. Our approach uses forecasts, like that of carbon intensity, for the optimal allocation and scheduling of computational tasks under the movement costs associated with workload migration and constraints of tasks derived from their deadlines.

CarbonClipper is a competitive online algorithm incorporating machine learning predictions while optimizing consistency and robustness. This algorithm has been designed to strategically manipulate workloads, moving them in location and time with low-carbon energy availability of data centers. It does so by avoiding any overhead in terms of the costs of such migrations by optimizing the timing and locations of workload execution in a way that not only presents major carbon reductions but also does not miss the deadline for executing computational tasks.

The performance improvements that CarbonClipper brought compared to existing methods were mind-blowing. Specifically, the performance increased by at least 32% compared to the baseline techniques. In addition, the reduction in carbon emissions is mind-boggling, at 88.7% from a carbon-agnostic scheduler. These are the outcomes from abundant simulations that actualized over a global network of data centers—realistic test beds to evaluate the effectiveness of CarbonClipper. The simulations also showed how important it is to allow the algorithm to make real-time decisions based on forecasts of carbon intensity to later make dynamic adjustments in facing innovations—that is, changes—while keeping high efficiency and environmental performance.

To conclude, the study is a solution with solid means for the challenge of making the data center low in carbon footprints. The introduction by the research team of CarbonClipper answers the problematic challenge of carbon-aware workload management. It provides an approach toward reducing emissions while sustaining efficiency and effectiveness in data center operations. This approach has great potential for wide application within the industry and represents a significant step ahead in the sustainable computing arena.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. If you like our work, you will love our newsletter..

Don’t Forget to join our 48k+ ML SubReddit

Find Upcoming AI Webinars here
The post CarbonClipper: A Learning-Augmented Algorithm for Carbon-Aware Workload Management that Achieves the Optimal Robustness Consistency Trade-off appeared first on MarkTechPost.