Meet MovieChat: An Innovative Video Understanding System that Integrates Video Foundation Models and Large Language Models

Large Language Models (LLMs) have recently made considerable strides in the Natural Language Processing (NLP) sector. Adding multi-modality to LLMs and transforming them into Multimodal Large Language Models (MLLMs), which can perform multimodal perception and interpretation, is a logical step. As a possible step towards Artificial General Intelligence (AGI), MLLMs have demonstrated astounding emergent skills in various multimodal tasks like perception (e.g., existence, count, location, OCR), commonsense reasoning, and code reasoning. MLLMs offer a more human-like perspective of the environment, a user-friendly interface for interaction, and a wider range of task-solving skills compared to LLMs and other task-specific models. 

Existing vision-centric MLLMs combine a visual encoder, pre-trained LLMs, extra learnable modules, and a Q-Former or a basic projection layer. A different paradigm combines existing visual perception tools (such as tracking and classification) with LLMs through APIs to construct a system without training. Some earlier studies in the video domain built video MLLMs using this paradigm. However, no model or system had been designed for long videos (those lasting longer than a minute), and there were no established benchmarks against which to measure the effectiveness of such systems.

In this study, researchers from Zhejiang University, the University of Washington, Microsoft Research Asia, and Hong Kong University introduce MovieChat, a unique framework that combines vision models with LLMs for long video understanding. According to the researchers, the remaining difficulties for long video comprehension are computational complexity, memory cost, and long-term temporal connections. To address these, they propose a memory mechanism based on the Atkinson-Shiffrin memory model, which consists of a rapidly updated short-term memory and a compact, long-lasting long-term memory.

This unique framework combines vision models with LLMs and is the first to enable long video comprehension tasks. The work is summarized as follows: the authors propose a memory mechanism that reduces computational complexity and memory cost while improving long-term temporal connections, and they conduct rigorous quantitative evaluations and case studies to assess both understanding capability and inference cost. The research concludes by presenting a novel approach for video understanding that combines large language models with video foundation models.

The system addresses the difficulties of analyzing long videos by including a memory mechanism inspired by the Atkinson-Shiffrin model, consisting of short-term and long-term memory represented by tokens in Transformers. The proposed system, MovieChat, outperforms previous algorithms that can only process videos containing a few frames, achieving state-of-the-art performance in long video comprehension. The approach captures long-term temporal relationships while lowering memory use and computational complexity. The work highlights the role of memory mechanisms in video comprehension, which allow the model to store and recall pertinent information over long periods. MovieChat has practical ramifications for industries including content analysis, video recommendation systems, and video surveillance. Future studies might look into ways to strengthen the memory mechanism and use additional modalities, such as audio, to improve video comprehension. This study opens up possibilities for applications needing a thorough comprehension of visual data. The project website hosts multiple demos.
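To make the two-tier memory idea concrete, the following is a conceptual sketch, not the authors' implementation: a fixed-size short-term buffer holds per-frame features, and when it fills, the most similar adjacent features are repeatedly merged into a few compact tokens that are appended to long-term memory. All names and sizes here are illustrative.

import numpy as np

def merge_most_similar(frames):
    # Merge the most similar adjacent pair of frame features by averaging them.
    sims = [float(frames[i] @ frames[i + 1]) /
            (np.linalg.norm(frames[i]) * np.linalg.norm(frames[i + 1]) + 1e-8)
            for i in range(len(frames) - 1)]
    i = int(np.argmax(sims))
    merged = (frames[i] + frames[i + 1]) / 2.0
    return frames[:i] + [merged] + frames[i + 2:]

class TwoTierMemory:
    def __init__(self, short_capacity=16, long_chunk=4):
        self.short_capacity = short_capacity  # frames kept in short-term memory
        self.long_chunk = long_chunk          # compact tokens pushed to long-term memory
        self.short_term, self.long_term = [], []

    def add_frame(self, feature):
        self.short_term.append(feature)
        if len(self.short_term) >= self.short_capacity:
            self._consolidate()

    def _consolidate(self):
        # Merge similar adjacent frames until only a few compact tokens remain,
        # then move them to long-term memory and clear the short-term buffer.
        frames = list(self.short_term)
        while len(frames) > self.long_chunk:
            frames = merge_most_similar(frames)
        self.long_term.extend(frames)
        self.short_term = []

# Usage with random frame features standing in for visual encoder outputs.
memory = TwoTierMemory()
for _ in range(64):
    memory.add_frame(np.random.rand(256))
print(len(memory.long_term), "long-term tokens,", len(memory.short_term), "short-term frames")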

Check out the Paper, GitHub, and Project. All credit for this research goes to the researchers on this project.


Put Me in the Center Quickly: Subject-Diffusion is an AI Model That Can Achieve Open Domain Personalized Text-to-Image Generation

Text-to-image models have been the cornerstone of every AI discussion for the last year. The advancement in the field happened quite rapidly, and as a result, we have impressive text-to-image models. Generative AI has entered a new phase.

Diffusion models were the key contributors to this advancement. They have emerged as a powerful class of generative models. These models are designed to generate high-quality images by slowly denoising the input into a desired image. Diffusion models can capture hidden data patterns and generate diverse and realistic samples.

The rapid advancement of diffusion-based generative models has revolutionized text-to-image generation. You can describe whatever image you can think of, and the models generate it quite accurately. As they progress further, it is becoming difficult to tell which images were generated by AI.

However, there is an issue here. These models rely solely on textual descriptions to generate images: you can only "describe" what you want to see. Moreover, they are not easy to personalize, as that would require fine-tuning in most cases.

Imagine designing the interior of your house with an architect who can only offer you designs made for previous clients, and when you try to personalize some part of the design, he simply ignores it and offers you another recycled style. Does not sound very pleasing, does it? This might be the experience you get with text-to-image models if you are looking for personalization.

Thankfully, there have been attempts to overcome these limitations. Researchers have explored integrating textual descriptions with reference images to achieve more personalized image generation. While some methods require fine-tuning on specific reference images, others retrain the base models on personalized datasets, leading to potential drawbacks in fidelity and generalization. Additionally, most existing algorithms cater to specific domains, leaving gaps in handling multi-concept generation, test-time fine-tuning, and open-domain zero-shot capability.

So, today we meet a new approach that brings us closer to open-domain personalization: Subject-Diffusion.

Subject-Diffusion can generate high-fidelity subject-driven images. Source: https://arxiv.org/pdf/2307.11410.pdf

Subject-Diffusion is an innovative open-domain personalized text-to-image generation framework. It uses only one reference image and eliminates the need for test-time fine-tuning. To build a large-scale dataset for personalized image generation, it relies on an automatic data labeling tool, resulting in the Subject-Diffusion Dataset (SDD) with an impressive 76 million images and 222 million entities.

Subject-Diffusion has three main components: location control, fine-grained reference image control, and attention control. Location control involves adding mask images of main subjects during the noise injection process. Fine-grained reference image control uses a combined text-image information module to improve the integration of both granularities. To enable the smooth generation of multiple subjects, attention control is introduced during training.

Overview of Subject-Diffusion. Source: https://arxiv.org/pdf/2307.11410.pdf

Subject-Diffusion achieves impressive fidelity and generalization, capable of generating single, multiple, and human-subject personalized images with modifications to shape, pose, background, and style based on just one reference image per subject. The model also enables smooth interpolation between customized images and text descriptions through a specially designed denoising process. Quantitative comparisons show that Subject-Diffusion outperforms or matches other state-of-the-art methods, both with and without test-time fine-tuning, on various benchmark datasets.

Check out the Paper. All credit for this research goes to the researchers on this project.


Optimize data preparation with new features in AWS SageMaker Data Wrangler

Data preparation is a critical step in any data-driven project, and having the right tools can greatly enhance operational efficiency. Amazon SageMaker Data Wrangler reduces the time it takes to aggregate and prepare tabular and image data for machine learning (ML) from weeks to minutes. With SageMaker Data Wrangler, you can simplify the process of data preparation and feature engineering and complete each step of the data preparation workflow, including data selection, cleansing, exploration, and visualization from a single visual interface.
In this post, we explore the latest features of SageMaker Data Wrangler that are specifically designed to improve the operational experience. We delve into the support of Amazon Simple Storage Service (Amazon S3) manifest files, inference artifacts in an interactive data flow, and the seamless integration with JSON (JavaScript Object Notation) format for inference, highlighting how these enhancements make data preparation easier and more efficient.
Introducing new features
In this section, we discuss the SageMaker Data Wrangler’s new features for optimal data preparation.
S3 manifest file support with SageMaker Autopilot for ML inference
SageMaker Data Wrangler enables a unified data preparation and model training experience with Amazon SageMaker Autopilot in just a few clicks. You can use SageMaker Autopilot to automatically train, tune, and deploy models on the data that you’ve transformed in your data flow.
This experience is now further simplified with S3 manifest file support. An S3 manifest file is a text file that lists the objects (files) stored in an S3 bucket. If your exported dataset in SageMaker Data Wrangler is large and split into multiple part files in Amazon S3, SageMaker Data Wrangler now automatically creates a manifest file in S3 representing all these data files. The generated manifest file can then be used with the SageMaker Autopilot UI in SageMaker Data Wrangler to pick up all the partitioned data for training.
Before this feature launch, when using SageMaker Autopilot models trained on prepared data from SageMaker Data Wrangler, you could only choose one data file, which might not represent the entire dataset, especially if the dataset is very large. With this new manifest file experience, you’re not limited to a subset of your dataset. You can build an ML model with SageMaker Autopilot representing all your data using the manifest file and use that for your ML inference and production deployment. This feature enhances operational efficiency by simplifying training ML models with SageMaker Autopilot and streamlining data processing workflows.
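As an illustration, a manifest of this kind, assuming the ManifestFile format SageMaker uses for S3 data sources, lists a common prefix followed by the part files; the bucket and file names below are hypothetical:

import json

# Hypothetical manifest content for a dataset exported in multiple parts.
manifest = [
    {"prefix": "s3://example-bucket/data-wrangler-export/"},
    "part-00000.csv",
    "part-00001.csv",
    "part-00002.csv",
]
print(json.dumps(manifest, indent=2))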
Added support for inference flow in generated artifacts
Customers want to take the data transformations they’ve applied to their model training data, such as one-hot encoding, PCA, and impute missing values, and apply those data transformations to real-time inference or batch inference in production. To do so, you must have a SageMaker Data Wrangler inference artifact, which is consumed by a SageMaker model.
Previously, inference artifacts could only be generated from the UI when exporting to SageMaker Autopilot training or exporting an inference pipeline notebook. This didn’t provide flexibility if you wanted to take your SageMaker Data Wrangler flows outside of the Amazon SageMaker Studio environment. Now, you can generate an inference artifact for any compatible flow file through a SageMaker Data Wrangler processing job. This enables programmatic, end-to-end MLOps with SageMaker Data Wrangler flows for code-first MLOps personas, as well as an intuitive, no-code path to get an inference artifact by creating a job from the UI.
Streamlining data preparation
JSON has become a widely adopted format for data exchange in modern data ecosystems. SageMaker Data Wrangler’s integration with JSON format allows you to seamlessly handle JSON data for transformation and cleaning. By providing native support for JSON, SageMaker Data Wrangler simplifies the process of working with structured and semi-structured data, enabling you to extract valuable insights and prepare data efficiently. SageMaker Data Wrangler now supports JSON format for both batch and real-time inference endpoint deployment.
Solution overview
For our use case, we use the sample Amazon customer reviews dataset to show how SageMaker Data Wrangler can simplify the operational effort to build a new ML model using SageMaker Autopilot. The Amazon customer reviews dataset contains product reviews and metadata from Amazon, including 142.8 million reviews spanning May 1996 to July 2014.
On a high level, we use SageMaker Data Wrangler to manage this large dataset and perform the following actions:

Develop an ML model in SageMaker Autopilot using all of the dataset, not just a sample.
Build a real-time inference pipeline with the inference artifact generated by SageMaker Data Wrangler, and use JSON formatting for input and output.

S3 manifest file support with SageMaker Autopilot
When creating a SageMaker Autopilot experiment using SageMaker Data Wrangler, you could previously only specify a single CSV or Parquet file. Now you can also use an S3 manifest file, allowing you to use large amounts of data for SageMaker Autopilot experiments. SageMaker Data Wrangler will automatically partition input data files into several smaller files and generate a manifest that can be used in a SageMaker Autopilot experiment to pull in all the data from the interactive session, not just a small sample.
Complete the following steps:

Import the Amazon customer review data from a CSV file into SageMaker Data Wrangler. Make sure to disable sampling when importing the data.
Specify the transformations that normalize the data. For this example, remove symbols and transform everything into lowercase using SageMaker Data Wrangler’s built-in transformations.
Choose Train model to start training.

To train a model with SageMaker Autopilot, SageMaker automatically exports data to an S3 bucket. For large datasets like this one, it will automatically break up the file into smaller files and generate a manifest that includes the location of the smaller files.

First, select your input data.

Earlier, SageMaker Data Wrangler didn’t have an option to generate a manifest file to use with SageMaker Autopilot. Today, with the release of manifest file support, SageMaker Data Wrangler will automatically export a manifest file to Amazon S3, pre-fill the S3 location of the SageMaker Autopilot training with the manifest file S3 location, and toggle the manifest file option to Yes. No work is necessary to generate or use the manifest file.

Configure your experiment by selecting the target for the model to predict.
Next, select a training method. In this case, we select Auto and let SageMaker Autopilot decide the best training method based on the dataset size.

Specify the deployment settings.
Finally, review the job configuration and submit the SageMaker Autopilot experiment for training. When SageMaker Autopilot completes the experiment, you can view the training results and explore the best model.

Thanks to support for manifest files, you can use your entire dataset for the SageMaker Autopilot experiment, not just a subset of your data.
For more information on using SageMaker Autopilot with SageMaker Data Wrangler, see Unified data preparation and model training with Amazon SageMaker Data Wrangler and Amazon SageMaker Autopilot.
Generate inference artifacts from SageMaker Processing jobs
Now, let’s look at how we can generate inference artifacts through both the SageMaker Data Wrangler UI and SageMaker Data Wrangler notebooks.
SageMaker Data Wrangler UI
For our use case, we want to process our data through the UI and then use the resulting data to train and deploy a model through the SageMaker console. Complete the following steps:

Open the data flow you created in the preceding section.
Choose the plus sign next to the last transform, choose Add destination, and choose Amazon S3. This will be where the processed data will be stored.
Choose Create job.
Select Generate inference artifacts in the Inference parameters section to generate an inference artifact.
For Inference artifact name, enter the name of your inference artifact (with .tar.gz as the file extension).
For Inference output node, enter the destination node corresponding to the transforms applied to your training data.
Choose Configure job.
Under Job configuration, enter a path for Flow file S3 location. A folder called data_wrangler_flows will be created under this location, and the inference artifact will be uploaded to this folder. To change the upload location, set a different S3 location.
Leave the defaults for all other options and choose Create to create the processing job. The processing job will create a tarball (.tar.gz) containing a modified data flow file with a newly added inference section that allows you to use it for inference. You need the S3 uniform resource identifier (URI) of the inference artifact to provide the artifact to a SageMaker model when deploying your inference solution. The URI will be in the form {Flow file S3 location}/data_wrangler_flows/{inference artifact name}.tar.gz.
If you didn’t note these values earlier, you can choose the link to the processing job to find the relevant details. In our example, the URI is s3://sagemaker-us-east-1-43257985977/data_wrangler_flows/example-2023-05-30T12-20-18.tar.gz.
Copy the value of Processing image; we need this URI when creating our model, too.
We can now use this URI to create a SageMaker model on the SageMaker console, which we can later deploy to an endpoint or batch transform job.
Under Model settings, enter a model name and specify your IAM role.
For Container input options, select Provide model artifacts and inference image location.
For Location of inference code image, enter the processing image URI.
For Location of model artifacts, enter the inference artifact URI.
Additionally, if your data has a target column that will be predicted by a trained ML model, specify the name of that column under Environment variables, with INFERENCE_TARGET_COLUMN_NAME as Key and the column name as Value.
Finish creating your model by choosing Create model.

We now have a model that we can deploy to an endpoint or batch transform job.
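As a code-first alternative to the preceding console steps, the following is a minimal boto3 sketch; the model name, role ARN, image URI, artifact URI, and target column are placeholders you would replace with the values noted earlier:

import boto3

sagemaker_client = boto3.client("sagemaker")

# Use the processing image URI and inference artifact URI noted from your
# processing job, and your own IAM role ARN (all values below are hypothetical).
sagemaker_client.create_model(
    ModelName="data-wrangler-inference-model",
    ExecutionRoleArn="arn:aws:iam::111122223333:role/MySageMakerExecutionRole",
    PrimaryContainer={
        "Image": "<processing-image-uri>",
        "ModelDataUrl": "s3://<bucket>/data_wrangler_flows/<inference-artifact-name>.tar.gz",
        # Only needed if your data still contains the target column.
        "Environment": {"INFERENCE_TARGET_COLUMN_NAME": "<target-column-name>"},
    },
)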
SageMaker Data Wrangler notebooks
For a code-first approach to generate the inference artifact from a processing job, we can find the example code by choosing Export to on the node menu and choosing either Amazon S3, SageMaker Pipelines, or SageMaker Inference Pipeline. We choose SageMaker Inference Pipeline in this example.

In this notebook, there is a section titled Create Processor (this is identical in the SageMaker Pipelines notebook, but in the Amazon S3 notebook, the equivalent code will be under the Job Configurations section). At the bottom of this section is a configuration for our inference artifact called inference_params. It contains the same information that we saw in the UI, namely the inference artifact name and the inference output node. These values will be prepopulated but can be modified. There is additionally a parameter called use_inference_params, which needs to be set to True to use this configuration in the processing job.

Further down is a section titled Define Pipeline Steps, where the inference_params configuration is appended to a list of job arguments and passed into the definition for a SageMaker Data Wrangler processing step. In the Amazon S3 notebook, job_arguments is defined immediately after the Job Configurations section.

With these simple configurations, the processing job created by this notebook will generate an inference artifact in the same S3 location as our flow file (defined earlier in our notebook). We can programmatically determine this S3 location and use this artifact to create a SageMaker model using the SageMaker Python SDK, which is demonstrated in the SageMaker Inference Pipeline notebook.
The same approach can be applied to any Python code that creates a SageMaker Data Wrangler processing job.
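For orientation, the configuration resembles the following sketch; the key names reflect the description above and may differ slightly from what the generated notebook emits, and the values are placeholders:

# Controls whether the processing job also emits an inference artifact.
use_inference_params = True

# Prepopulated in the generated notebook; shown here with placeholder values.
inference_params = {
    "inference_artifact_name": "example-inference-artifact.tar.gz",  # must end in .tar.gz
    "inference_output_node": "<destination-node-id>",  # node whose transforms the artifact captures
}

# The notebook appends this configuration to the processing step's job arguments
# (see the Define Pipeline Steps section) when use_inference_params is True.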
JSON file format support for input and output during inference
It’s pretty common for websites and applications to use JSON as the request/response format for APIs, because the information is easy to parse across different programming languages.
Previously, after you had a trained model, you could only interact with it via CSV as an input format in a SageMaker Data Wrangler inference pipeline. Today, you can use JSON as an input and output format, providing more flexibility when interacting with SageMaker Data Wrangler inference containers.
To get started with using JSON for input and output in the inference pipeline notebook, complete the following steps:

Define a payload.

For each payload, the model is expecting a key named instances. The value is a list of objects, each being its own data point. The objects require a key called features, and the values should be the features of a single data point that are intended to be submitted to the model. Multiple data points can be submitted in a single request, up to a total size of 6 MB per request.
See the following code:

import json

sample_record_payload = json.dumps(
    {
        "instances": [
            {
                "features": [
                    "This is the best",
                    "I'd use this product twice a day every day if I could. it's the best ever",
                ]
            }
        ]
    }
)

Specify the ContentType as application/json.
Provide data to the model and receive inference in JSON format.

See Common Data Formats for Inference for sample input and output JSON examples.
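The following is a minimal sketch of sending the payload above to a deployed endpoint and reading the JSON response; the endpoint name is a placeholder:

import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="data-wrangler-inference-endpoint",  # hypothetical endpoint name
    ContentType="application/json",
    Accept="application/json",
    Body=sample_record_payload,  # the JSON payload defined above
)
print(response["Body"].read().decode("utf-8"))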
Clean up
When you are finished using SageMaker Data Wrangler, we recommend that you shut down the instance it runs on to avoid incurring additional charges. For instructions on how to shut down the SageMaker Data Wrangler app and associated instance, see Shut Down Data Wrangler.
Conclusion
SageMaker Data Wrangler’s new features, including support for S3 manifest files, inference capabilities, and JSON format integration, transform the operational experience of data preparation. These enhancements streamline data import, automate data transformations, and simplify working with JSON data. With these features, you can enhance your operational efficiency, reduce manual effort, and extract valuable insights from your data with ease. Embrace the power of SageMaker Data Wrangler’s new features and unlock the full potential of your data preparation workflows.
To get started with SageMaker Data Wrangler, check out the latest information on the SageMaker Data Wrangler product page.

About the authors
Munish Dabra is a Principal Solutions Architect at Amazon Web Services (AWS). His current areas of focus are AI/ML and Observability. He has a strong background in designing and building scalable distributed systems. He enjoys helping customers innovate and transform their business in AWS. LinkedIn: /mdabra
Patrick Lin is a Software Development Engineer with Amazon SageMaker Data Wrangler. He is committed to making Amazon SageMaker Data Wrangler the number one data preparation tool for productionized ML workflows. Outside of work, you can find him reading, listening to music, having conversations with friends, and serving at his church.

Index your Alfresco content using the new Amazon Kendra Alfresco connector

Amazon Kendra is a highly accurate and simple-to-use intelligent search service powered by machine learning (ML). Amazon Kendra offers a suite of data source connectors to simplify the process of ingesting and indexing your content, wherever it resides.
Valuable data in organizations is stored in both structured and unstructured repositories. An enterprise search solution should be able to index and search across several structured and unstructured repositories.
Alfresco Content Services provides open, flexible, highly scalable enterprise content management (ECM) capabilities with the added benefits of a content services platform, making content accessible wherever and however you work through easy integrations with the business applications you use every day. Many organizations use the Alfresco content management platform to store their content. One of the key requirements for enterprise customers using Alfresco is the ability to easily and securely find accurate information across all the stored documents.
We are excited to announce that you can now use the new Amazon Kendra Alfresco connector to search documents stored in your Alfresco repositories and sites. In this post, we show how to use the new connector to retrieve documents stored in Alfresco for indexing purposes and securely use the Amazon Kendra intelligent search function. In addition, the ML-powered intelligent search can accurately find information from unstructured documents with natural language narrative content, for which keyword search is not very effective.
What’s new in the Amazon Kendra Alfresco connector
The Amazon Kendra Alfresco connector offers support for the following:

Basic and OAuth2 authentication mechanisms for the Alfresco On-Premises (On-Prem) platform
Basic and OAuth2 authentication mechanisms for the Alfresco PaaS platform
Aspect-based crawling of Alfresco repository documents

Solution overview
With Amazon Kendra, you can configure multiple data sources to provide a central place to search across your document repositories and sites. The solution in this post demonstrates the following:

Retrieval of documents and comments from Alfresco private sites and public sites
Retrieval of documents and comments from Alfresco repositories using Amazon Kendra-specific aspects
Authentication against Alfresco On-Prem and PaaS platforms using Basic and OAuth2 mechanisms, respectively
The Amazon Kendra search capability with access control across sites and repositories

If you are going to use only one of the platforms, you can still follow this post to build the example solution; just ignore the steps corresponding to the platform that you are not using.
The following is a summary of the steps to build the example solution:

Upload documents to the three Alfresco sites and the repository folder. Make sure the uploaded documents are unique across sites and repository folders.
For the two private sites and repository, use document-level Alfresco permission management to set access permissions. For the public site, you don’t need to set up permissions at the document level. Note that permissions information is retrieved by the Amazon Kendra Alfresco connector and used for access control by the Amazon Kendra search function.
For the two private sites and repository, create a new Amazon Kendra index (you use the same index for both the private sites and the repository). For the public site, create a new Amazon Kendra index.
For the On-Prem private site, create an Amazon Kendra Alfresco data source using Basic authentication, within the Amazon Kendra index for private sites.
For the On-Prem repository documents with Amazon Kendra-specific aspects, create a data source using Basic authentication, within the Amazon Kendra index for private sites.
For the PaaS private site, create a data source using Basic authentication, within the Amazon Kendra index for private sites.
For the PaaS public site, create a data source using OAuth2 authentication, within the Amazon Kendra index for public sites.
Perform a sync for each data source.
Run a test query in the Amazon Kendra index meant for private sites and the repository using access control.
Run a test query in the Amazon Kendra index meant for public sites without access control.

Prerequisites
You need an AWS account with privileges to create AWS Identity and Access Management (IAM) roles and policies. For more information, see Overview of access management: Permissions and policies. You need to have a basic knowledge of AWS and how to navigate the AWS Management Console.
For the Alfresco On-Prem platform, complete the following steps:

Create a private site or use an existing site.
Create a repository folder or use an existing repository folder.
Get the repository URL.
Get Basic authentication credentials (user ID and password).
Make sure authentication users are part of the ALFRESCO_ADMINISTRATORS group.
Get the public X509 certificate in .pem format and save it locally.

For the Alfresco PaaS platform, complete the following steps:

Create a private site or use an existing site.
Create a public site or use an existing site.
Get the repository URL.
Get Basic authentication credentials (user ID and password).
Get OAuth2 credentials (client ID, client secret, and token URL).
Confirm that authentication users are part of the ALFRESCO_ADMINISTRATORS group.

Step 1: Upload example documents
Each uploaded document must contain 5 MB or less of text. For more information, see Amazon Kendra Service Quotas. You can upload example documents or use existing documents within each site.
As shown in the following screenshot, we have uploaded four documents to the Alfresco On-Prem private site.

We have uploaded three documents to the Alfresco PaaS private site.

We have uploaded five documents to the Alfresco PaaS public site.

We have uploaded two documents to the Alfresco On-Prem repository.

Assign the aspect awskendra:indexControl to one or more documents in the repository folder.

Step 2: Configure Alfresco permissions
Use the Alfresco Permissions Management feature to give access rights to example users for viewing uploaded documents. It is assumed that you have some example Alfresco user names, with email addresses, that can be used for setting permissions at the document level in private sites. These users are not used for crawling the sites.
In the following example for the On-Prem private site, we have provided users My Dev User1 and My Dev User2 with site-consumer access to the example document. Repeat the same procedure for the other uploaded documents.

In the following example for the PaaS private site, we have provided user Kendra User 3 with site-consumer access to the example document. Repeat the same procedure for the other uploaded documents.

For the Alfresco repository documents, we have provided user My Dev user1 with consumer access to the example document.

The following table lists the site or repository names, document names, and permissions.

Platform | Site or Repository Name | Document Name | User IDs
On-Prem | MyAlfrescoSite | ChannelMarketingBudget.xlsx | My Manager User3
On-Prem | MyAlfrescoSite | wellarchitected-sustainability-pillar.pdf | My Dev User1, My Dev User2
On-Prem | MyAlfrescoSite | WorkDocs.docx | My Dev User1, My Dev User2, My Manager User3
On-Prem | MyAlfrescoSite | WorldPopulation.csv | My Dev User1, My Dev User2, My Manager User3
PaaS | MyAlfrescoCloudSite2 | DDoS_White_Paper.pdf | Kendra User3
PaaS | MyAlfrescoCloudSite2 | wellarchitected-framework.pdf | Kendra User3
PaaS | MyAlfrescoCloudSite2 | ML_Training.pptx | Kendra User1
PaaS | MyAlfrescoCloudPublicSite | batch_user.pdf | Everyone
PaaS | MyAlfrescoCloudPublicSite | Amazon Simple Storage Service – User Guide.pdf | Everyone
PaaS | MyAlfrescoCloudPublicSite | AWS Batch – User Guide.pdf | Everyone
PaaS | MyAlfrescoCloudPublicSite | Amazon Detective.docx | Everyone
PaaS | MyAlfrescoCloudPublicSite | Pricing.xlsx | Everyone
On-Prem | Repo: MyAlfrescoRepoFolder1 | Polly-dg.pdf (aspect awskendra:indexControl) | My Dev User1
On-Prem | Repo: MyAlfrescoRepoFolder1 | Transcribe-api.pdf (aspect awskendra:indexControl) | My Dev User1

Step 3: Set up Amazon Kendra indexes
You can create a new Amazon Kendra index or use an existing index for indexing documents hosted in Alfresco private sites. To create a new index, complete the following steps:

On the Amazon Kendra console, create an index called Alfresco-Private.
Create a new IAM role, then choose Next.
For Access Control, choose Yes.
For Token Type, choose JSON.
Keep the user name and group as default.
Choose None for user group expansion because we are assuming no integration with AWS IAM Identity Center (successor to AWS Single Sign-On).
Choose Next.
Choose Developer Edition for this example solution.
Choose Create to create a new index.
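If you prefer to script the index creation, the following is a minimal boto3 sketch with equivalent access control settings; the role ARN is a placeholder, and the token field names assume the default user name and group attributes mentioned above:

import boto3

kendra = boto3.client("kendra")

response = kendra.create_index(
    Name="Alfresco-Private",
    Edition="DEVELOPER_EDITION",
    RoleArn="arn:aws:iam::111122223333:role/MyKendraIndexRole",  # hypothetical role
    UserContextPolicy="USER_TOKEN",
    UserTokenConfigurations=[
        {
            "JsonTokenTypeConfiguration": {
                "UserNameAttributeField": "username",
                "GroupAttributeField": "groups",
            }
        }
    ],
)
print(response["Id"])  # note the index ID for later queries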

The following screenshot shows the Alfresco-Private index after it has been created.

You can verify the access control configuration on the User access control tab.

Repeat these steps to create a second index called Alfresco-Public.

Step 4: Create a data source for the On-Prem private site
To create a data source for the On-Prem private site, complete the following steps:

On the Amazon Kendra console, navigate to the Alfresco-Private index.
Choose Data sources in the navigation pane.
Choose Add data source.

Choose Add connector for the Alfresco connector.

For Data source name, enter Alfresco-OnPrem-Private.
Optionally, add a description.
Keep the remaining settings as default and choose Next.

To connect to the Alfresco On-Prem site, the connector needs access to the public certificate corresponding to the On-Prem server. This was one of the prerequisites.

Use a different browser tab to upload the .pem file to an Amazon Simple Storage Service (Amazon S3) bucket in your account.

You use this S3 bucket name in the next steps.

Return to the data source creation page.
For Source, select Alfresco server.
For Alfresco repository URL, enter the repository URL (created as a prerequisite).
For Alfresco user application URL, enter the same value as the repository URL.
For SSL certificate location, choose Browse S3 and choose the S3 bucket where you uploaded the .pem file.
For Authentication, select Basic authentication.
For AWS Secrets Manager secret, choose Create and add new secret.

A pop-up window opens to create an AWS Secrets Manager secret.

Enter a name for your secret, user name, and password, then choose Save.

For Virtual Private Cloud (VPC), choose No VPC.
Turn the identity crawler on.
For IAM role, choose Create a new IAM role.
Choose Next.

You can configure the data source to synchronize contents from one or more Alfresco sites. For this post, we sync to the on-prem private site.

For Content to sync, select Single Alfresco site sync and choose MyAlfrescoSite.
Select Include comments to retrieve comments in addition to documents.
For Sync mode, select Full sync.
For Frequency, choose Run on demand (or a different frequency option as needed).
Choose Next.

Map the Alfresco document fields to the Amazon Kendra index fields (you can keep the defaults), then choose Next.

On the Review and Create page, verify all the information, then choose Add data source.

After the data source has been created, the data source page is displayed as shown in the following screenshot.

Step 5: Create a data source for the On-Prem repository documents with Amazon Kendra-specific aspects
Similarly to the previous steps, create a data source for the On-Prem repository documents with Amazon Kendra-specific aspects:

On the Amazon Kendra console, navigate to the Alfresco-Private index.
Choose Data sources in the navigation pane.
Choose Add data source.
Choose Add connector for the Alfresco connector.
For Data source name, enter Alfresco-OnPrem-Aspects.
Optionally, add a description.
Keep the remaining settings as default and choose Next.
For Source, select Alfresco server.
For Alfresco repository URL, enter the repository URL (created as a prerequisite).
For Alfresco user application URL, enter the same value as the repository URL.
For SSL certificate location, choose Browse S3 and choose the S3 bucket where you uploaded the .pem file.
For Authentication, select Basic authentication.
For AWS Secrets Manager secret, choose the secret you created earlier.
For Virtual Private Cloud (VPC), choose No VPC.
Turn the identity crawler off.
For IAM role, choose Create a new IAM role.
Choose Next.

For this scope, the connector retrieves only those On-Prem server repository documents that have been assigned an aspect called awskendra:indexControl.

For Content to sync, select Alfresco aspects sync.
For Sync mode, select Full sync.
For Frequency, choose Run on demand (or a different frequency option as needed).
Choose Next.
Map the Alfresco document fields to the Amazon Kendra index fields (you can keep the defaults), then choose Next.
On the Review and Create page, verify all the information, then choose Add data source.

After the data source has been created, the data source page is displayed as shown in the following screenshot.

Step 6: Create a data source for the PaaS private site
Follow similar steps as the previous sections to create a data source for the PaaS private site:

On the Amazon Kendra console, navigate to the Alfresco-Private index.
Choose Data sources in the navigation pane.
Choose Add data source.
Choose Add connector for the Alfresco connector.
For Data source name, enter Alfresco-Cloud-Private.
Optionally, add a description.
Keep the remaining settings as default and choose Next.
For Source, select Alfresco cloud.
For Alfresco repository URL, enter the repository URL (created as a prerequisite).
For Alfresco user application URL, enter the same value as the repository URL.
For Authentication, select Basic authentication.
For AWS Secrets Manager secret, choose Create and add new secret.
Enter a name for your secret, user name, and password, then choose Save.
For Virtual Private Cloud (VPC), choose No VPC.
Turn the identity crawler off.
For IAM role, choose Create a new IAM role.
Choose Next.

We can configure the data source to synchronize contents from one or more Alfresco sites. For this post, we configure the data source to sync from the PaaS private site MyAlfrescoCloudSite2.

For Content to sync, select Single Alfresco site sync and choose MyAlfrescoCloudSite2.
Select Include comments.
For Sync mode, select Full sync.
For Frequency, choose Run on demand (or a different frequency option as needed).
Choose Next.
Map the Alfresco document fields to the Amazon Kendra index fields (you can keep the defaults) and choose Next.
On the Review and Create page, verify all the information, then choose Add data source.

After the data source has been created, the data source page is displayed as shown in the following screenshot.

Step 7: Create a data source for the PaaS public site
We follow similar steps as before to create a data source for the PaaS public site:

On the Amazon Kendra console, navigate to the Alfresco-Public index.
Choose Data sources in the navigation pane.
Choose Add data source.
Choose Add connector for the Alfresco connector.
For Data source name, enter Alfresco-Cloud-Public.
Optionally, add a description.
Keep the remaining settings as default and choose Next.
For Source, select Alfresco cloud.
For Alfresco repository URL, enter the repository URL (created as a prerequisite).
For Alfresco user application URL, enter the same value as the repository URL.
For Authentication, select OAuth2.0 authentication.
For AWS Secrets Manager secret, choose Create and add new secret.
Enter a name for your secret, client ID, client secret, and token URL, then choose Save.
For Virtual Private Cloud (VPC), choose No VPC.
Turn the identity crawler off.
For IAM role, choose Create a new IAM role.
Choose Next.

We configure this data source to sync to the PaaS public site MyAlfrescoCloudPublicSite.

For Content to sync, select Single Alfresco site sync and choose MyAlfrescoCloudPublicSite.
Optionally, select Include comments.
For Sync mode, select Full sync.
For Frequency, choose Run on demand (or a different frequency option as needed).
Choose Next.
Map the Alfresco document fields to the Amazon Kendra index fields (you can keep the defaults) and choose Next.
On the Review and Create page, verify all the information, then choose Add data source.

After the data source has been created, the data source page is displayed as shown in the following screenshot.

Step 8: Perform a sync for each data source
Navigate to each of the data sources and choose Sync now. Complete only one synchronization at a time.

Wait for synchronization to be complete for all data sources. When each synchronization is complete for a data source, you see the status as shown in the following screenshot.

You can also view Amazon CloudWatch logs for a specific sync under Sync run history.
Step 9: Run a test query in the private index using access control
Now it’s time to test the solution. We first run a query in the private index using access control:

On the Amazon Kendra console, navigate to the Alfresco-Private index and choose Search indexed content.

Enter a query in the search field.

As shown in the following screenshot, Amazon Kendra didn’t return any results.

Choose Apply token.
Enter the email address corresponding to the My Dev User1 user and choose Apply.

Note that Amazon Kendra access control works based on the email address associated with an Alfresco user name.

Run the search again.

The search results in a document list (containing wellarchitected-sustainability-pillar.pdf in the following example) based on the access control setup.

If you run the same query again and provide an email address that doesn’t have access to either of these documents, you should not see these documents in the results list.

Enter another query to search in the documents based on the aspect awskendra:indexControl.
Choose Apply token, enter the email address corresponding to My Dev User1 user, and choose Apply.
Rerun the query.
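To run the same access-controlled query programmatically, the following is a minimal boto3 sketch; the index ID, email address, and query text are placeholders, and the JSON token shape assumes the default user name attribute configured for the index:

import boto3
import json

kendra = boto3.client("kendra")

# The index uses a JSON token type, so the token is a plain JSON document.
token = json.dumps({"username": "my-dev-user1@example.com"})  # hypothetical email

response = kendra.query(
    IndexId="<alfresco-private-index-id>",  # hypothetical index ID
    QueryText="What is the sustainability pillar?",
    UserContext={"Token": token},
)
for item in response["ResultItems"]:
    print(item.get("DocumentTitle", {}).get("Text"))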

Step 10: Run a test query in the public index without access control
Similarly, we can test our solution by running queries in the public index without access control:

On the Amazon Kendra console, navigate to the Alfresco-Public index and choose Search indexed content.
Run a search query.

Because this example Alfresco public site has not been set up with any access control, we don’t use an access token.

Clean up
To avoid incurring future costs, clean up the resources you created as part of this solution. Delete newly added Alfresco data sources within the indexes. If you created new Amazon Kendra indexes while testing this solution, delete them as well.
Conclusion
With the new Alfresco connector for Amazon Kendra, organizations can tap into the repository of information stored in their account securely using intelligent search powered by Amazon Kendra.
To learn about these possibilities and more, refer to the Amazon Kendra Developer Guide. For more information on how you can create, modify, or delete metadata and content when ingesting your data from Alfresco, refer to Enriching your documents during ingestion and Enrich your content and metadata to enhance your search experience with custom document enrichment in Amazon Kendra.

About the Authors
Arun Anand is a Senior Solutions Architect at Amazon Web Services based in Houston area. He has 25+ years of experience in designing and developing enterprise applications. He works with partners in Energy & Utilities segment providing architectural and best practice recommendations for new and existing solutions.
Rajnish Shaw is a Senior Solutions Architect at Amazon Web Services, with a background as a Product Developer and Architect. Rajnish is passionate about helping customers build applications on the cloud. Outside of work Rajnish enjoys spending time with family and friends, and traveling.
Yuanhua Wang is a software engineer at AWS with more than 15 years of experience in the technology industry. His interests are software architecture and build tools on cloud computing.

Use the Amazon SageMaker and Salesforce Data Cloud integration to powe …

This post is co-authored by Daryl Martis, Director of Product, Salesforce Einstein AI.
This is the second post in a series discussing the integration of Salesforce Data Cloud and Amazon SageMaker. In Part 1, we show how the Salesforce Data Cloud and Einstein Studio integration with SageMaker allows businesses to access their Salesforce data securely using SageMaker and use its tools to build, train, and deploy models to endpoints hosted on SageMaker. The endpoints are then registered to the Salesforce Data Cloud to activate predictions in Salesforce.
In this post, we expand on this topic to demonstrate how to use Einstein Studio for product recommendations. You can use this integration for traditional models as well as large language models (LLMs).
Solution overview
In this post, we demonstrate how to create a predictive model in SageMaker to recommend the next best product to your customers by using historical data such as customer demographics, marketing engagements, and purchase history from Salesforce Data Cloud.
We use the following sample dataset. To use this dataset in your Data Cloud, refer to Create Amazon S3 Data Stream in Data Cloud.
The following attributes are needed to create the model:

Club Member – If the customer is a club member
Campaign – The campaign the customer is a part of
State – The state or province the customer resides in
Month – The month of purchase
Case Count – The number of cases raised by the customer
Case Type Return – Whether the customer returned any product within the last year
Case Type Shipment Damaged – Whether the customer had any shipments damaged in the last year
Engagement Score – The level of engagement the customer has (response to mailing campaigns, logins to the online store, and so on)
Tenure – The tenure of the customer relationship with the company
Clicks – The average number of clicks the customer has made within a week prior to purchase
Pages Visited – The average number of pages the customer has visited within a week prior to purchase
Product Purchased – The actual product purchased
Id – The ID of the record
DateTime – The timestamp of the dataset

The product recommendation model is built and deployed on SageMaker and is trained using data in the Salesforce Data Cloud. The following steps give an overview of how to use the new capabilities launched in SageMaker for Salesforce to enable the overall integration:

Set up the Amazon SageMaker Studio domain and OAuth between Salesforce and the AWS accounts.
Use the newly launched capability of the Amazon SageMaker Data Wrangler connector for Salesforce Data Cloud to prepare the data in SageMaker without copying the data from Salesforce Data Cloud.
Train a recommendation model in SageMaker Studio using training data that was prepared using SageMaker Data Wrangler.
Package the SageMaker Data Wrangler container and the trained recommendation model container in an inference pipeline so the inference request can use the same data preparation steps you created to preprocess the training data. The real-time inference call data is first passed to the SageMaker Data Wrangler container in the inference pipeline, where it is preprocessed and passed to the trained model for product recommendation. For more information about this process, refer to New — Introducing Support for Real-Time and Batch Inference in Amazon SageMaker Data Wrangler. Although we use a specific algorithm to train the model in our example, you can use any algorithm that you find appropriate for your use case.
Use the newly launched SageMaker provided project template for Salesforce Data Cloud integration to streamline implementing the preceding steps by providing the following templates:

An example notebook showcasing data preparation, building, training, and registering the model.
The SageMaker provided project template for Salesforce Data Cloud integration, which automates creating a SageMaker endpoint hosting the inference pipeline model. When a version of the model in the Amazon SageMaker Model Registry is approved, the endpoint is exposed as an API with Amazon API Gateway using a custom Salesforce JSON Web Token (JWT) authorizer. API Gateway is required to allow Salesforce Data Cloud to make predictions against the SageMaker endpoint using a JWT token that Salesforce creates and passes with the request when making predictions from Salesforce. JWT can be used as a part of OpenID Connect (OIDC) and OAuth 2.0 frameworks to restrict client access to your APIs.

After you create the API, we recommend registering the model endpoint in Salesforce Einstein Studio. For instructions, refer to Bring Your Own AI Models to Salesforce with Einstein Studio.
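The packaging step described above combines the SageMaker Data Wrangler container and the trained model container into a single inference pipeline. One common way to express this with the SageMaker Python SDK is sketched below; this is an illustration rather than the exact code in the provided template, and the image URIs, model data locations, and role are placeholders:

from sagemaker import Session
from sagemaker.model import Model
from sagemaker.pipeline import PipelineModel

session = Session()
role = "arn:aws:iam::111122223333:role/MySageMakerExecutionRole"  # hypothetical

# Data Wrangler container that re-applies the data preparation steps at inference time.
data_wrangler_model = Model(
    image_uri="<data-wrangler-image-uri>",
    model_data="s3://<bucket>/data_wrangler_flows/<inference-artifact>.tar.gz",
    role=role,
    sagemaker_session=session,
)

# Trained recommendation model (any algorithm container you used for training).
recommendation_model = Model(
    image_uri="<training-algorithm-image-uri>",
    model_data="s3://<bucket>/model-artifacts/model.tar.gz",
    role=role,
    sagemaker_session=session,
)

# Chain the two containers so raw records are preprocessed before prediction.
pipeline_model = PipelineModel(
    models=[data_wrangler_model, recommendation_model],
    role=role,
    sagemaker_session=session,
)
pipeline_model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")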

The following diagram illustrates the solution architecture.

Create a SageMaker Studio domain
First, create a SageMaker Studio domain. For instructions, refer to Onboard to Amazon SageMaker Domain. You should note down the domain ID and execution role that is created and will be used by your user profile. You add permissions to this role in subsequent steps.
The following screenshot shows the domain we created for this post.

The following screenshot shows the example user profile for this post.

Set up the Salesforce connected app
Next, we create a Salesforce connected app to enable the OAuth flow from SageMaker Studio to Salesforce Data Cloud. Complete the following steps:

Log in to Salesforce and navigate to Setup.
Search for App Manager and create a new connected app.
Provide the following inputs:

For Connected App Name, enter a name.
For API Name, leave as default (it’s automatically populated).
For Contact Email, enter your contact email address.
Select Enable OAuth Settings.
For Callback URL, enter https://<domain-id>.studio.<region>.sagemaker.aws/jupyter/default/lab, and provide the domain ID that you captured while creating the SageMaker domain and the Region of your SageMaker domain.

Under Selected OAuth Scopes, move the following from Available OAuth Scopes to Selected OAuth Scopes and choose Save:

Manage user data via APIs (api)
Perform requests at any time (refresh_token, offline_access)
Perform ANSI SQL queries on Salesforce Data Cloud data (Data Cloud_query_api)
Manage Salesforce Customer Data Platform profile data (Data Cloud_profile_api)
Access the identity URL service (id, profile, email, address, phone)
Access unique user identifiers (openid)

For more information about creating a connected app, refer to Create a Connected App.

Return to the connected app and navigate to Consumer Key and Secret.
Choose Manage Consumer Details.
Copy the key and secret.

You may be asked to log in to your Salesforce org as part of the two-factor authentication here.

Navigate back to the Manage Connected Apps page.
Open the connected app you created and choose Manage.
Choose Edit Policies and change IP Relaxation to Relax IP restrictions, then save your settings.

Configure SageMaker permissions and lifecycle rules
In this section, we walk through the steps to configure SageMaker permissions and lifecycle management rules.
Create a secret in AWS Secrets Manager
Enable OAuth integration with Salesforce Data Cloud by storing credentials from your Salesforce connected app in AWS Secrets Manager:

On the Secrets Manager console, choose Store a new secret.
Select Other type of secret.
Create your secret with the following key-value pairs:

{
  "identity_provider": "SALESFORCE",
  "authorization_url": "https://login.salesforce.com/services/oauth2/authorize",
  "token_url": "https://login.salesforce.com/services/oauth2/token",
  "client_id": "<YOUR_CONSUMER_KEY>",
  "client_secret": "<YOUR_CONSUMER_SECRET>",
  "issue_url": "<YOUR_SALESFORCE_ORG_URL>"
}

Add a tag with the key sagemaker:partner and your choice of value.
Save the secret and note the ARN of the secret.
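If you prefer to create the secret programmatically, the following is a minimal boto3 sketch; the secret name and tag value are your choice, and the credential values are placeholders:

import boto3
import json

secretsmanager = boto3.client("secretsmanager")

response = secretsmanager.create_secret(
    Name="sagemaker-salesforce-data-cloud-oauth",  # hypothetical secret name
    SecretString=json.dumps(
        {
            "identity_provider": "SALESFORCE",
            "authorization_url": "https://login.salesforce.com/services/oauth2/authorize",
            "token_url": "https://login.salesforce.com/services/oauth2/token",
            "client_id": "<YOUR_CONSUMER_KEY>",
            "client_secret": "<YOUR_CONSUMER_SECRET>",
            "issue_url": "<YOUR_SALESFORCE_ORG_URL>",
        }
    ),
    Tags=[{"Key": "sagemaker:partner", "Value": "salesforce"}],  # tag value is your choice
)
print(response["ARN"])  # note this ARN for the lifecycle configuration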

Configure a SageMaker lifecycle rule
The SageMaker Studio domain execution role will require AWS Identity and Access Management (IAM) permissions to access the secret created in the previous step. For more information, refer to Creating roles and attaching policies (console).

On the IAM console, attach the following polices to their respective roles (these roles will be used by the SageMaker project for deployment):

Add the policy AmazonSageMakerPartnerServiceCatalogProductsCloudFormationServiceRolePolicy to the service role AmazonSageMakerServiceCatalogProductsCloudformationRole.
Add the policy AmazonSageMakerPartnerServiceCatalogProductsApiGatewayServiceRolePolicy to the service role AmazonSageMakerServiceCatalogProductsApiGatewayRole.
Add the policy AmazonSageMakerPartnerServiceCatalogProductsLambdaServiceRolePolicy to the service role AmazonSageMakerServiceCatalogProductsLambdaRole.

On the IAM console, navigate to the SageMaker domain execution role.
Choose Add permissions and select Create an inline policy.
Enter the following policy in the JSON policy editor:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetSecretValue",
        "secretsmanager:PutSecretValue"
      ],
      "Resource": "arn:aws:secretsmanager:*:*:secret:*",
      "Condition": {
        "ForAnyValue:StringLike": {
          "aws:ResourceTag/sagemaker:partner": "*"
        }
      }
    },
    {
      "Effect": "Allow",
      "Action": [
        "secretsmanager:UpdateSecret"
      ],
      "Resource": "arn:aws:secretsmanager:*:*:secret:AmazonSageMaker-*"
    }
  ]
}

SageMaker Studio lifecycle configuration provides shell scripts that run when a notebook is created or started. The lifecycle configuration will be used to retrieve the secret and import it to the SageMaker runtime.

On the SageMaker console, choose Lifecycle configurations in the navigation pane.
Choose Create configuration.
Leave the default selection Jupyter Server App and choose Next.
Give the configuration a name.
Enter the following script in the editor, providing the ARN for the secret you created earlier:

#!/bin/bash
set -eux

cat > ~/.sfgenie_identity_provider_oauth_config <<EOL
{
  "secret_arn": "<YOUR_SECRETS_ARN>"
}
EOL

Choose Submit to save the lifecycle configuration.
Choose Domains in the navigation pane and open your domain.
On the Environment tab, choose Attach to attach your lifecycle configuration.
Choose the lifecycle configuration you created and choose Attach to domain.
Choose Set as default.
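To script the same setup, the following boto3 sketch creates the lifecycle configuration; the content must be base64-encoded, the configuration name is a placeholder, and attaching it to the domain still follows the console steps above:

import base64
import boto3

sagemaker_client = boto3.client("sagemaker")

script = """#!/bin/bash
set -eux

cat > ~/.sfgenie_identity_provider_oauth_config <<EOL
{
  "secret_arn": "<YOUR_SECRETS_ARN>"
}
EOL
"""

response = sagemaker_client.create_studio_lifecycle_config(
    StudioLifecycleConfigName="salesforce-data-cloud-oauth",  # hypothetical name
    StudioLifecycleConfigContent=base64.b64encode(script.encode("utf-8")).decode("utf-8"),
    StudioLifecycleConfigAppType="JupyterServer",
)
print(response["StudioLifecycleConfigArn"])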

If you are a returning user to SageMaker Studio, in order to ensure Salesforce Data Cloud is enabled, upgrade to the latest Jupyter and SageMaker Data Wrangler kernels.
This completes the setup to enable data access from Salesforce Data Cloud to SageMaker Studio to build AI and machine learning (ML) models.
Create a SageMaker project
To start using the solution, first create a project using Amazon SageMaker Projects. Complete the following steps:

In SageMaker Studio, under Deployments in the navigation pane, choose Projects.
Choose Create project.
Choose the project template called Model deployment for Salesforce.
Choose Select project template.
Enter a name and optional description for your project.
Enter a model group name.
Enter the name of the Secrets Manager secret that you created earlier.
Choose Create project.

The project may take 1–2 minutes to initiate.

You can see two new repositories. The first one is for sample notebooks that you can use as is or customize to prepare, train, create, and register models in the SageMaker Model Registry. The second repository is for automating the model deployment, which includes exposing the SageMaker endpoint as an API.

Choose clone repo for both notebooks.

For this post, we use the product recommendation example, which can be found in the sagemaker-<YOUR-PROJECT-NAME>-p-<YOUR-PROJECT-ID>-example-nb/product-recommendation directory that you just cloned. Before we run the product-recommendation.ipynb notebook, let’s do some data preparation to create the training data using SageMaker Data Wrangler.

Prepare data with SageMaker Data Wrangler
Complete the following steps:

In SageMaker Studio, on the File menu, choose New and Data Wrangler flow.
After you create the data flow, choose (right-click) the tab and choose Rename to rename the file.
Choose Import data.
Choose Create connection.
Choose Salesforce Data Cloud.
For Name, enter salesforce-data-cloud-sagemaker-connection.
For Salesforce org URL, enter your Salesforce org URL.
Choose Save + Connect.
In the Data Explorer view, select and preview the tables from the Salesforce Data Cloud to create and run the query to extract the required dataset.
Your query will look like the following; use the table name that you used when uploading data to Salesforce Data Cloud.

SELECT product_purchased__c, club_member__c, campaign__c, state__c, month__c,
case_count__c,case_type_return__c, case_type_shipment_damaged__c,
pages_visited__c,engagement_score__c, tenure__c, clicks__c, id__c
FROM Training_Dataset_for_Sagemaker__dll

Choose Create dataset.

Creating the dataset may take some time.

In the data flow view, you can now see a new node added to the visual graph.
For more information on how you can use SageMaker Data Wrangler to create Data Quality and Insights Reports, refer to Get Insights On Data and Data Quality.
SageMaker Data Wrangler offers over 300 built-in transformations. In this step, we use some of these transformations to prepare the dataset for an ML model. For detailed instructions on how to implement these transformations, refer to Transform Data.

Use the Manage columns step with the Drop column transform to drop the column id__c.
Use the Handle missing step with the Drop missing transform to drop rows with missing values for various features. We apply this transformation on all columns.
Use a custom transform step to cast several columns to strings and create categorical values for the state__c, case_count__c, and tenure__c features. Use the following code for this transformation:

from pyspark.sql.functions import when

States_List = ['Washington', 'Massachusetts', 'California', 'Minnesota', 'Vermont', 'Colorado', 'Arizona']

# Cast flag and month columns to strings so they are treated as categorical
df = df.withColumn("club_member__c", df.club_member__c.cast('string'))
df = df.withColumn("month__c", df.month__c.cast('string'))
df = df.withColumn("case_type_return__c", df.case_type_return__c.cast('string'))
df = df.withColumn("case_type_shipment_damaged__c", df.case_type_shipment_damaged__c.cast('string'))

# Keep the listed states and bucket the rest into "Other"
df = df.withColumn('state__c', when(df.state__c.isin(States_List), df.state__c).otherwise("Other"))

# Bin the case count into categories
df = df.withColumn('case_count__c', when(df.case_count__c == 0, "No Cases").otherwise(when(df.case_count__c <= 2, "1 to 2 Cases").otherwise("Greater than 2 Cases")))

# Bin tenure (in years) into categories
df = df.withColumn('tenure__c', when(df.tenure__c < 1, "Less than 1 Year").otherwise(when(df.tenure__c == 1, "1 to 2 Years").otherwise(when(df.tenure__c == 2, "2 to 3 Years").otherwise(when(df.tenure__c == 3, "3 to 4 Years").otherwise("Greater Than 4 Years")))))

Use the Process numeric step with the Scale values transform and choose Standard scaler to scale the clicks__c, engagement_score__c, and pages_visited__c features.
Use the Encode categorical step with the One-hot encode transform to convert categorical variables to numeric for the case_type_return__c, case_type_shipment_damaged__c, month__c, club_member__c, and campaign__c features (all features except clicks__c, engagement_score__c, pages_visited__c, and product_purchased__c). A scikit-learn sketch of these two transforms follows this list.
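For reference outside the Data Wrangler UI, the same two transforms can be expressed with scikit-learn. The following is a minimal sketch, not the Data Wrangler implementation, and it assumes the data is already loaded into a pandas DataFrame with the column names shown earlier:

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Numeric features to standardize, and categorical features to one-hot encode
numeric_cols = ["clicks__c", "engagement_score__c", "pages_visited__c"]
categorical_cols = [
    "case_type_return__c", "case_type_shipment_damaged__c", "month__c",
    "club_member__c", "campaign__c",
    # Columns produced by the custom transform step above
    "state__c", "case_count__c", "tenure__c",
]

preprocessor = ColumnTransformer(
    transformers=[
        ("scale", StandardScaler(), numeric_cols),
        ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ]
)

# X is a pandas DataFrame with the columns above; product_purchased__c is the label
# X_transformed = preprocessor.fit_transform(X)

Setting handle_unknown="ignore" keeps inference from failing when a category wasn't seen during training, which mirrors the robustness of the managed transforms.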

Model building, training, and deployment
To build, train, and deploy the model, complete the following steps:

Return to the SageMaker project, open the product-recommendation.ipynb notebook, and run a processing job to preprocess the data using the SageMaker Data Wrangler configuration you created.
Follow the steps in the notebook to train a model and register it to the SageMaker Model Registry.
Make sure to update the model group name to match the model group name that you used while creating the SageMaker project.

To locate the model group name, open the SageMaker project that you created earlier and navigate to the Settings tab.
Similarly, the flow file referenced in the notebook must match the flow file name that you created earlier.

For this post, we used product-recommendation as the model group name, so we update the notebook with product-recommendation as the model group name.

After the notebook is run, the trained model is registered in the Model Registry. To learn more about the Model Registry, refer to Register and Deploy Models with Model Registry.

Select the model version you created and update the status of it to Approved.

Now that you have approved the registered model, the SageMaker Salesforce project deploy step will provision and trigger AWS CodePipeline.
CodePipeline has steps to build and deploy a SageMaker endpoint for inference containing the SageMaker Data Wrangler preprocessing steps and the trained model. The endpoint will be exposed to Salesforce Data Cloud as an API through API Gateway. The following screenshot shows the pipeline prefixed with Sagemaker-salesforce-product-recommendation-xxxxx. We also show you the endpoints and API that get created by the SageMaker project for Salesforce.

If you would like, you can take a look at the CodePipeline deploy step, which uses AWS CloudFormation scripts to create SageMaker endpoint and API Gateway with a custom JWT authorizer.
When pipeline deployment is complete, you can find the SageMaker endpoint on the SageMaker console.

You can explore the API Gateway created by the project template on the API Gateway console.

Choose the link to find the API Gateway URL.

You can find the details of the JWT authorizer by choosing Authorizers on the API Gateway console. You can also go to the AWS Lambda console to review the code of the Lambda function created by project template.

To discover the schema to be used while invoking the API from Einstein Studio, choose Information in the navigation pane of the Model Registry. You will see an Amazon Simple Storage Service (Amazon S3) link to a metadata file. Copy and paste the link into a new browser tab URL.

Let’s look at the file without downloading it. On the file details page, choose the Object actions menu and choose Query with S3 Select.

Choose Run SQL query and take note of the API Gateway URL and schema because you will need this information when registering with Einstein Studio. If you don’t see an APIGWURL key, either the model wasn’t approved, deployment is still in progress, or deployment failed.
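If you prefer to script this step, the same lookup can be done with the S3 Select API through boto3. The following is a minimal sketch; the bucket and key are placeholders for the values in your metadata link, not values from this post:

import boto3

s3 = boto3.client("s3")

# Query the metadata JSON file in place; replace the bucket and key with the
# values from the Amazon S3 link shown in the Model Registry
response = s3.select_object_content(
    Bucket="<YOUR-METADATA-BUCKET>",
    Key="<PATH/TO/metadata-file.json>",
    ExpressionType="SQL",
    Expression="SELECT * FROM S3Object s",
    InputSerialization={"JSON": {"Type": "DOCUMENT"}},
    OutputSerialization={"JSON": {}},
)

# The result streams back as events; print the Records payloads and look
# for the APIGWURL key and the schema in the output
for event in response["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode("utf-8"))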

Use the Salesforce Einstein Studio API for predictions
Salesforce Einstein Studio is a new and centralized experience in Salesforce Data Cloud that data science and engineering teams can use to easily access their traditional models and the LLMs used in generative AI. Next, register the API Gateway URL and the client_id that you stored in Secrets Manager earlier in Salesforce Einstein Studio so you can use the model inferences there. For instructions, refer to Bring Your Own AI Models to Salesforce with Einstein Studio.
Clean up
To delete all the resources created by the SageMaker project, on the project page, choose the Action menu and choose Delete.

To delete the resources (API Gateway and SageMaker endpoint) created by CodePipeline, navigate to the AWS CloudFormation console and delete the stack that was created.

Conclusion
In this post, we explained how you can build and train ML models in SageMaker Studio by using SageMaker Data Wrangler and the newly launched Salesforce Data Cloud JDBC connector to import and prepare data hosted in Salesforce Data Cloud, and how to use the first-party SageMaker project template for Salesforce for the end-to-end integration. The SageMaker project template for Salesforce enables you to deploy the model, create the endpoint, and secure an API for a registered model. You then use the API to make predictions in Salesforce Einstein Studio for your business use cases.
Although we used the example of product recommendation to showcase the steps for implementing the end-to-end integration, you can use the SageMaker project template for Salesforce to create an endpoint and API for any SageMaker traditional model and LLM that is registered in the SageMaker Model Registry. We look forward to seeing what you build in SageMaker using data from Salesforce Data Cloud and empower your Salesforce applications using SageMaker hosted ML models!
This post is a continuation of the series regarding Salesforce Data Cloud and SageMaker integration. For a high-level overview and to learn more about the business impact you can make with this integration approach, refer to Part 1.
Additional resources

Import data with SageMaker Data Wrangler
Troubleshoot SageMaker Data Wrangler

About the authors
Daryl Martis is the Director of Product for Einstein Studio at Salesforce Data Cloud. He has over 10 years of experience in planning, building, launching, and managing world-class solutions for enterprise customers including AI/ML and cloud solutions. He has previously worked in the financial services industry in New York City. Follow him on https://www.linkedin.com/in/darylmartis.
Rachna Chadha is a Principal Solutions Architect AI/ML in Strategic Accounts at AWS. Rachna is an optimist who believes that ethical and responsible use of AI can improve society in the future and bring economic and social prosperity. In her spare time, Rachna likes spending time with her family, hiking, and listening to music.
Ife Stewart is a Principal Solutions Architect in the Strategic ISV segment at AWS. She has been engaged with Salesforce Data Cloud over the last 2 years to help build integrated customer experiences across Salesforce and AWS. Ife has over 10 years of experience in technology. She is an advocate for diversity and inclusion in the technology field.
Dharmendra Kumar Rai (DK Rai) is a Sr. Data Architect, Data Lake & AI/ML, serving strategic customers. He works closely with customers to understand how AWS can help them solve problems, especially in the AI/ML and analytics space. DK has many years of experience in building data-intensive solutions across a range of industry verticals, including high-tech, FinTech, insurance, and consumer-facing applications.
Marc Karp is an ML Architect with the SageMaker Service team. He focuses on helping customers design, deploy, and manage ML workloads at scale. In his spare time, he enjoys traveling and exploring new places.

Meet LEVER: A Simple AI Approach to Improve Language-to-Code Generatio …

Large language models (LLMs) have recently made significant strides. These models have uplifted the domain of Artificial Intelligence significantly and hold tremendous potential for completing various types of tasks. From imitating humans by answering questions and coming up with content to summarizing textual paragraphs and translating languages, LLMs can do everything. Virtual assistants, robotics control, database interfaces, and other AI applications all depend on the capacity to translate natural language descriptions into executable code. Though code LLMs, models pre-trained on code, have shown strong performance with in-context few-shot learning, there is still room to improve their accuracy, and fine-tuning them directly would be computationally expensive.

While LLMs may struggle with accuracy in the few-shot setting, they frequently produce a correct program somewhere among their samples when enough are drawn; at scale, majority voting and filtering by test cases can greatly improve their performance. Execution results of model solutions, such as data types, value ranges, and variable properties, are rich semantic elements and potent indicators of program correctness. In a recent study, a team of researchers introduced Learning to Verify (LEVER), an approach for language-to-code generation using code LLMs.

LEVER makes use of a combined representation of the natural language description, the program surface form, and the execution outcome for training the verifier to identify and reject faulty programs. The verification probability and LLM generation probability are combined in order to create an aggregate probability, and programs with identical execution results are marginalized. The programs with the best likelihood of providing the right outcome are chosen as the output using this probability as a reranking score.
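As a rough illustration of this reranking score, and not the authors' implementation, the following sketch combines the two probabilities and marginalizes over programs with identical execution results; llm_log_prob, verifier_prob, and execute are hypothetical stand-ins for the code LLM, the trained verifier, and the program executor, and execution results are assumed to be hashable:

import math
from collections import defaultdict

def rerank(question, programs, llm_log_prob, verifier_prob, execute):
    # Aggregate LLM generation probability and verifier probability per execution result
    scores = defaultdict(float)
    best_per_result = {}
    for prog in programs:
        result = execute(prog)
        p = math.exp(llm_log_prob(question, prog)) * verifier_prob(question, prog, result)
        scores[result] += p  # marginalize over programs with the same result
        if result not in best_per_result or p > best_per_result[result][1]:
            best_per_result[result] = (prog, p)
    # Return the highest-scoring program for the best execution result
    top_result = max(scores, key=scores.get)
    return best_per_result[top_result][0], top_result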

LEVER improves language-to-code generation by adding a learned verification step that judges whether a program sampled from the LLM is correct. By checking the generated programs, LEVER seeks to improve the precision and correctness of the output. For evaluation, experiments were carried out on four datasets spanning different domains, including table QA, math QA, and fundamental Python programming, to assess LEVER's efficacy. The performance gains using code-davinci-002 ranged from 4.6% to 10.9%, and the results consistently outperformed the base code LLMs. Across all datasets, LEVER attained new state-of-the-art results, demonstrating its superiority in producing precise and contextually relevant code from natural language descriptions.

In conclusion, the LEVER technique improves code LLMs’ ability to translate natural language descriptions into executable code. This method outperforms more traditional execution error pruning strategies in terms of accuracy by utilizing a verifier that takes execution results into account. The findings demonstrate its efficiency in a range of language-to-code tasks and suggest that it has the potential to enhance a number of AI applications, including database interfaces, robotics control, and virtual assistants.

Check out the Paper. All Credit For This Research Goes To the Researchers on This Project.

The post Meet LEVER: A Simple AI Approach to Improve Language-to-Code Generation by Learning to Verify the Generated Programs with their Execution Results appeared first on MarkTechPost.

Stanford Researchers Explore Emergence of Simple Language Skills in Me …

A research team from Stanford University has made groundbreaking progress in the field of Natural Language Processing (NLP) by investigating whether Reinforcement Learning (RL) agents can learn language skills indirectly, without explicit language supervision. The main focus of the study was to explore whether RL agents, known for their ability to learn by interacting with their environment to achieve non-language objectives, could similarly develop language skills. To do this, the team designed an office navigation environment, challenging the agents to find a target office as quickly as possible.

The researchers framed their exploration around four key questions:

1. Can agents learn a language without explicit language supervision?

2. Can agents learn to interpret other modalities beyond language, such as pictorial maps?

3. What factors impact the emergence of language skills?

4. Do these results scale to more complex 3D environments with high-dimensional pixel observations?

To investigate the emergence of language, the team trained their DREAM (Deep REinforcement learning Agents with Meta-learning) agent on the 2D office environment, using language floor plans as the training data. Remarkably, DREAM learned an exploration policy that allowed it to navigate to and read the floor plan. Leveraging this information, the agent successfully reached the goal office room, achieving near-optimal performance. The agent’s ability to generalize to unseen relative step counts and new layouts and its capacity to probe the learned representation of the floor plan further demonstrated its language skills.

Not content with these initial findings, the team went a step further and trained DREAM on the 2D variant of the office, this time using pictorial floor plans as training data. The results were equally impressive, as DREAM successfully walked to the target office, proving its ability to read other modalities beyond traditional language.

The study also delved into understanding the factors influencing the emergence of language skills in RL agents. The researchers found that the learning algorithm, the amount of meta-training data, and the model’s size all played critical roles in shaping the agent’s language capabilities.

Finally, to examine the scalability of their findings, the researchers expanded the office environment to a more complex 3D domain. Astonishingly, DREAM continued to read the floor plan and solved the tasks without direct language supervision, further affirming the robustness of its language acquisition abilities.

The results of this pioneering work offer compelling evidence that language can indeed emerge as a byproduct of solving non-language tasks in meta-RL agents. By learning language indirectly, these embodied RL agents showcase a remarkable resemblance to how humans acquire language skills while striving to achieve unrelated objectives.

The implications of this research are far-reaching, opening up exciting possibilities for developing more sophisticated language learning models that can naturally adapt to a multitude of tasks without requiring explicit language supervision. The findings are expected to drive advancements in NLP and contribute significantly to the progress of AI systems capable of comprehending and using language in increasingly sophisticated ways.

Check out the Paper. All Credit For This Research Goes To the Researchers on This Project.

The post Stanford Researchers Explore Emergence of Simple Language Skills in Meta-Reinforcement Learning Agents Without Direct Supervision: Unpacking the Breakthrough in a Customized Multi-Task Environment appeared first on MarkTechPost.

A New AI Research from CMU Proposes a Simple and Effective Attack Meth …

Large Language Models (LLMs) like ChatGPT, Bard AI, and Llama-2 can generate undesirable and offensive content. Imagine someone asking ChatGPT for a guide to manipulating elections or for an examination question paper; getting an output for such questions from an LLM would be inappropriate, which is why these models are aligned to refuse them. Researchers at Carnegie Mellon University, the Center for AI Safety, and the Bosch Center for AI studied how reliably this alignment prevents undesirable generation.

The researchers found that when an adversarial suffix is appended to a wide range of objectionable queries, the model produces an affirmative response rather than simply refusing to answer. Their approach produces these adversarial suffixes with greedy and gradient-based search techniques, and it improves on past automatic prompt generation methods.

Prompts that cause aligned LLMs to generate offensive content are called jailbreaks. Jailbreaks have typically been crafted through human ingenuity, by setting up scenarios that lead models astray, rather than through automated methods, so they require significant manual effort. Unlike image models, LLMs operate on discrete token inputs, which restricts the effective input space and makes automated optimization computationally difficult.

The researchers propose a new class of adversarial attacks that can indeed elicit objectionable content. Given a harmful query from the user, they append an adversarial suffix so that the user's original query is left intact. The adversarial suffix is chosen by targeting an initial affirmative response, using combined greedy and gradient-based optimization, and by optimizing over multiple prompts and models for robustness.

In order to generate reliable attack suffixes, the researchers had to create an attack that works not just for a single prompt on a single model, but for multiple prompts across multiple models. They used a greedy, gradient-based method to search for a single suffix string that induces the objectionable behavior across multiple user prompts. When they applied the technique to attacks on Claude, they found that the model showed potential to lower the success of these automated attacks.
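The following is a highly simplified, hedged sketch of the greedy part of such a search; the actual method also uses token gradients to shortlist candidate substitutions, and loss_of_target is a hypothetical callable that measures how strongly the model would begin its answer with the affirmative target:

import random

def greedy_suffix_search(prompts, target, vocab, loss_of_target,
                         suffix_len=20, iterations=100, candidates=64):
    # prompts: harmful queries to attack jointly; target: affirmative prefix (e.g. "Sure, here is")
    # vocab: candidate tokens to place in the suffix
    suffix = [random.choice(vocab) for _ in range(suffix_len)]

    def total_loss(sfx):
        # Sum the loss over all prompts so a single suffix works across them
        return sum(loss_of_target(p, " ".join(sfx), target) for p in prompts)

    best = total_loss(suffix)
    for _ in range(iterations):
        pos = random.randrange(suffix_len)  # pick one suffix position to modify
        for tok in random.sample(vocab, min(candidates, len(vocab))):
            trial = list(suffix)
            trial[pos] = tok
            loss = total_loss(trial)
            if loss < best:  # keep the substitution that most encourages the target
                best, suffix = loss, trial
    return " ".join(suffix)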

The researchers note that, given these attacks, future work can fine-tune models to avoid such undesirable answers. Adversarial training has empirically proven to be an effective way to harden models, as it iteratively trains the model to produce a correct response to potentially harmful queries.

Their work contains material that could allow others to generate harmful content. Despite this risk, the researchers argue that it is important to disclose these techniques so that the dangers automated attacks pose to large language models become clear and defenses can be developed; in their view, the direct incremental harm caused by releasing their attacks is minor at this stage.

Check out the Paper, GitHub, and Project Page. All Credit For This Research Goes To the Researchers on This Project.

The post A New AI Research from CMU Proposes a Simple and Effective Attack Method that Causes Aligned Language Models to Generate Objectionable Behaviors appeared first on MarkTechPost.

Enhancing AWS intelligent document processing with generative AI

Data classification, extraction, and analysis can be challenging for organizations that deal with volumes of documents. Traditional document processing solutions are manual, expensive, error prone, and difficult to scale. AWS intelligent document processing (IDP), with AI services such as Amazon Textract, allows you to take advantage of industry-leading machine learning (ML) technology to quickly and accurately process data from any scanned document or image. Generative artificial intelligence (generative AI) complements Amazon Textract to further automate document processing workflows. Features such as normalizing key fields and summarizing input data support faster cycles for managing document process workflows, while reducing the potential for errors.
Generative AI is driven by large ML models called foundation models (FMs). FMs are transforming the way you can solve traditionally complex document processing workloads. In addition to existing capabilities, businesses need to summarize specific categories of information, including debit and credit data from documents such as financial reports and bank statements. FMs make it easier to generate such insights from the extracted data. To optimize time spent in human review and to improve employee productivity, mistakes such as missing digits in phone numbers, missing documents, or addresses without street numbers can be flagged in an automated way. In the current scenario, you need to dedicate resources to accomplish such tasks using human review and complex scripts. This approach is tedious and expensive. FMs can help complete these tasks faster, with fewer resources, and transform varying input formats into a standard template that can be processed further. At AWS, we offer services such as Amazon Bedrock, the easiest way to build and scale generative AI applications with FMs. Amazon Bedrock is a fully managed service that makes FMs from leading AI startups and Amazon available through an API, so you can find the model that best suits your requirements. We also offer Amazon SageMaker JumpStart, which allows ML practitioners to choose from a broad selection of open-source FMs. ML practitioners can deploy FMs to dedicated Amazon SageMaker instances from a network isolated environment and customize models using SageMaker for model training and deployment.
Ricoh offers workplace solutions and digital transformation services designed to help customers manage and optimize information flow across their businesses. Ashok Shenoy, VP of Portfolio Solution Development, says, “We are adding generative AI to our IDP solutions to help our customers get their work done faster and more accurately by utilizing new capabilities such as Q&A, summarization, and standardized outputs. AWS allows us to take advantage of generative AI while keeping each of our customers’ data separate and secure.”
In this post, we share how to enhance your IDP solution on AWS with generative AI.
Improving the IDP pipeline
In this section, we review how the traditional IDP pipeline can be augmented by FMs and walk through an example use case using Amazon Textract with FMs.
AWS IDP is comprised of three stages: classification, extraction, and enrichment. For more details about each stage, refer to Intelligent document processing with AWS AI services: Part 1 and Part 2. In the classification stage, FMs can now classify documents without any additional training. This means that documents can be categorized even if the model hasn’t seen similar examples before. FMs in the extraction stage normalize date fields and verify addresses and phone numbers, while ensuring consistent formatting. FMs in the enrichment stage allow inference, logical reasoning, and summarization. When you use FMs in each IDP stage, your workflow will be more streamlined and performance will improve. The following diagram illustrates the IDP pipeline with generative AI.

Extraction stage of the IDP pipeline
Because FMs can't directly process documents in their native formats (such as PDF, JPEG, and TIFF) as input, a mechanism to convert documents to text is needed. To extract the text from the document before sending it to the FMs, you can use Amazon Textract. With Amazon Textract, you can extract lines and words and pass them to downstream FMs. The following architecture uses Amazon Textract for accurate text extraction from any type of document before sending it to FMs for further processing.

Typically, documents contain structured and semi-structured information. Amazon Textract can be used to extract raw text as well as data from tables and forms. The relationship between the data in tables and forms plays a vital role in automating business processes. Certain types of information may not be processed by FMs; as a result, we can choose to either store this information in a downstream store or send it to FMs. The following figure is an example of how Amazon Textract can extract structured and semi-structured information from a document, in addition to lines of text that need to be processed by FMs.
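A minimal boto3 sketch of this extraction step is shown below; the bucket and document names are placeholders, and note that multi-page PDFs require the asynchronous Textract APIs rather than the synchronous call used here:

import boto3

textract = boto3.client("textract")

# Analyze a single-page document stored in Amazon S3 for raw lines plus forms and tables
response = textract.analyze_document(
    Document={"S3Object": {"Bucket": "<YOUR-BUCKET>", "Name": "discharge-summary.png"}},
    FeatureTypes=["FORMS", "TABLES"],
)

# Collect the plain text lines to pass downstream to an FM; form and table
# blocks can be stored separately or processed further
lines = [block["Text"] for block in response["Blocks"] if block["BlockType"] == "LINE"]
document_text = "\n".join(lines)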

Using AWS serverless services to summarize with FMs
The IDP pipeline we illustrated earlier can be seamlessly automated using AWS serverless services. Highly unstructured documents are common in big enterprises. These documents can span from Securities and Exchange Commission (SEC) documents in the banking industry to coverage documents in the health insurance industry. With the evolution of generative AI at AWS, people in these industries are looking for ways to get a summary from those documents in an automated and cost-effective manner. Serverless services help provide the mechanism to build a solution for IDP quickly. Services such as AWS Lambda, AWS Step Functions, and Amazon EventBridge can help build the document processing pipeline with integration of FMs, as shown in the following diagram.

The sample application used in the preceding architecture is driven by events. An event is defined as a change in state that has recently occurred. For example, when an object gets uploaded to an Amazon Simple Storage Service (Amazon S3) bucket, Amazon S3 emits an Object Created event. This event notification from Amazon S3 can trigger a Lambda function or a Step Functions workflow. This type of architecture is termed as an event-driven architecture. In this post, our sample application uses an event-driven architecture to process a sample medical discharge document and summarize the details of the document. The flow works as follows:

When a document is uploaded to an S3 bucket, Amazon S3 triggers an Object Created event.
The EventBridge default event bus propagates the event to Step Functions based on an EventBridge rule.
The state machine workflow processes the document, beginning with Amazon Textract.
A Lambda function transforms the analyzed data for the next step.
The state machine invokes a SageMaker endpoint, which hosts the FM using direct AWS SDK integration.
A summary S3 destination bucket receives the summary response gathered from the FM.

We used the sample application with a Flan-T5 Hugging Face model to summarize the following sample patient discharge summary using the Step Functions workflow.

The Step Functions workflow uses AWS SDK integration to call the Amazon Textract AnalyzeDocument and SageMaker runtime InvokeEndpoint APIs, as shown in the following figure.
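Outside of the state machine, the same inference call can be made directly with boto3. The following is a minimal sketch; the endpoint name and payload shape are assumptions (a typical Flan-T5 container accepts an "inputs" field, but yours may differ):

import json
import boto3

runtime = boto3.client("sagemaker-runtime")

# Text previously extracted by Amazon Textract (placeholder)
document_text = "Patient discharge notes extracted by Amazon Textract ..."
payload = {"inputs": "summarize: " + document_text}

response = runtime.invoke_endpoint(
    EndpointName="<YOUR-FLAN-T5-ENDPOINT>",
    ContentType="application/json",
    Body=json.dumps(payload),
)
summary = json.loads(response["Body"].read())
print(summary)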

This workflow results in a summary JSON object that is stored in a destination bucket. The JSON object looks as follows:
{
"summary": [
"John Doe is a 35-year old male who has been experiencing stomach problems for two months. He has been taking antibiotics for the last two weeks, but has not been able to eat much. He has been experiencing a lot of abdominal pain, bloating, and fatigue. He has also noticed a change in his stool color, which is now darker. He has been taking antacids for the last two weeks, but they no longer help. He has been experiencing a lot of fatigue, and has been unable to work for the last two weeks. He has also been experiencing a lot of abdominal pain, bloating, and fatigue. He has been taking antacids for the last two weeks, but they no longer help. He has been experiencing a lot of abdominal pain, bloating, and fatigue. He has been taking antacids for the last two weeks, but they no longer help. He has been experiencing a lot of abdominal pain, bloating, and fatigue. He has been taking antacids for the last two weeks, but they no longer help. He has been experiencing a lot of abdominal pain, bloating, and fatigue. He has been taking antacids for the last two weeks, but they no longer help."
],
"forms": [
{
"key": "Ph: ",
"value": "(888)-(999)-(0000) "
},
{
"key": "Fax: ",
"value": "(888)-(999)-(1111) "
},
{
"key": "Patient Name: ",
"value": "John Doe "
},
{
"key": "Patient ID: ",
"value": "NARH-36640 "
},
{
"key": "Gender: ",
"value": "Male "
},
{
"key": "Attending Physician: ",
"value": "Mateo Jackson, PhD "
},
{
"key": "Admit Date: ",
"value": "07-Sep-2020 "
},
{
"key": "Discharge Date: ",
"value": "08-Sep-2020 "
},
{
"key": "Discharge Disposition: ",
"value": "Home with Support Services "
},
{
"key": "Pre-existing / Developed Conditions Impacting Hospital Stay: ",
"value": "35 yo M c/o stomach problems since 2 months. Patient reports epigastric abdominal pain non- radiating. Pain is described as gnawing and burning, intermittent lasting 1-2 hours, and gotten progressively worse. Antacids used to alleviate pain but not anymore; nothing exacerbates pain. Pain unrelated to daytime or to meals. Patient denies constipation or diarrhea. Patient denies blood in stool but have noticed them darker. Patient also reports nausea. Denies recent illness or fever. He also reports fatigue for 2 weeks and bloating after eating. ROS: Negative except for above findings Meds: Motrin once/week. Tums previously. PMHx: Back pain and muscle spasms. No Hx of surgery. NKDA. FHx: Uncle has a bleeding ulcer. Social Hx: Smokes since 15 yo, 1/2-1 PPD. No recent EtOH use. Denies illicit drug use. Works on high elevation construction. Fast food diet. Exercises 3-4 times/week but stopped 2 weeks ago. "
},
{
"key": "Summary: ",
"value": "some activity restrictions suggested, full course of antibiotics, check back with physican in case of relapse, strict diet "
}
]
}

Generating these summaries using IDP with serverless implementation at scale helps organizations get meaningful, concise, and presentable data in a cost-effective way. Step Functions doesn’t limit the method of processing documents to one document at a time. Its distributed map feature can summarize large numbers of documents on a schedule.
The sample application uses a Flan-T5 Hugging Face model; however, you can use an FM endpoint of your choice. Training and running the model is out of scope of the sample application. Follow the instructions in the GitHub repository to deploy a sample application. The preceding architecture provides guidance on how you can orchestrate an IDP workflow using Step Functions. Refer to the IDP Generative AI workshop for detailed instructions on how to build an application with AWS AI services and FMs.
Set up the solution
Follow the steps in the README file to set up the solution architecture (except for the SageMaker endpoints). After you have your own SageMaker endpoint available, you can pass the endpoint name as a parameter to the template.
Clean up
To save costs, delete the resources you deployed as part of the tutorial:

Follow the steps in the cleanup section of the README file.
Delete any content from your S3 bucket and then delete the bucket through the Amazon S3 console.
Delete any SageMaker endpoints you may have created through the SageMaker console.

Conclusion
Generative AI is changing how you can process documents with IDP to derive insights. AWS AI services such as Amazon Textract along with AWS FMs can help accurately process any type of documents. For more information on working with generative AI on AWS, refer to Announcing New Tools for Building with Generative AI on AWS.

About the Authors
Sonali Sahu is leading intelligent document processing with the AI/ML services team in AWS. She is an author, thought leader, and passionate technologist. Her core area of focus is AI and ML, and she frequently speaks at AI and ML conferences and meetups around the world. She has both breadth and depth of experience in technology and the technology industry, with industry expertise in healthcare, the financial sector, and insurance.
Ashish Lal is a Senior Product Marketing Manager who leads product marketing for AI services at AWS. He has 9 years of marketing experience and has led the product marketing effort for Intelligent document processing. He got his Master’s in Business Administration at the University of Washington.
Mrunal Daftari is an Enterprise Senior Solutions Architect at Amazon Web Services. He is based in Boston, MA. He is a cloud enthusiast and very passionate about finding solutions for customers that are simple and address their business outcomes. He loves working with cloud technologies, providing simple, scalable solutions that drive positive business outcomes, cloud adoption strategy, and design innovative solutions and drive operational excellence.
Dhiraj Mahapatro is a Principal Serverless Specialist Solutions Architect at AWS. He specializes in helping enterprise financial services adopt serverless and event-driven architectures to modernize their applications and accelerate their pace of innovation. Recently, he has been working on bringing container workloads and practical usage of generative AI closer to serverless and EDA for financial services industry customers.
Jacob Hauskens is a Principal AI Specialist with over 15 years of strategic business development and partnerships experience. For the past 7 years, he has led the creation and implementation of go-to-market strategies for new AI-powered B2B services. Recently, he has been helping ISVs grow their revenue by adding generative AI to intelligent document processing workflows.

Scale training and inference of thousands of ML models with Amazon Sag …

As machine learning (ML) becomes increasingly prevalent in a wide range of industries, organizations are finding the need to train and serve large numbers of ML models to meet the diverse needs of their customers. For software as a service (SaaS) providers in particular, the ability to train and serve thousands of models efficiently and cost-effectively is crucial for staying competitive in a rapidly evolving market.
Training and serving thousands of models requires a robust and scalable infrastructure, which is where Amazon SageMaker can help. SageMaker is a fully managed platform that enables developers and data scientists to build, train, and deploy ML models quickly, while also offering the cost-saving benefits of using the AWS Cloud infrastructure.
In this post, we explore how you can use SageMaker features, including Amazon SageMaker Processing, SageMaker training jobs, and SageMaker multi-model endpoints (MMEs), to train and serve thousands of models in a cost-effective way. To get started with the described solution, you can refer to the accompanying notebook on GitHub.
Use case: Energy forecasting
For this post, we assume the role of an ISV company that helps their customers become more sustainable by tracking their energy consumption and providing forecasts. Our company has 1,000 customers who want to better understand their energy usage and make informed decisions about how to reduce their environmental impact. To do this, we use a synthetic dataset and train an ML model based on Prophet for each customer to make energy consumption forecasts. With SageMaker, we can efficiently train and serve these 1,000 models, providing our customers with accurate and actionable insights into their energy usage.
There are three features in the generated dataset:

customer_id – This is an integer identifier for each customer, ranging from 0–999.
timestamp – This is a date/time value that indicates the time at which the energy consumption was measured. The timestamps are randomly generated between the start and end dates specified in the code.
consumption – This is a float value that indicates the energy consumption, measured in some arbitrary unit. The consumption values are randomly generated between 0–1,000 with sinusoidal seasonality.
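As a quick illustration, and not the accompanying notebook's code, a per-customer Prophet model could be trained on this dataset as follows; the CSV file name is a placeholder:

import pandas as pd
from prophet import Prophet

# Assume the synthetic dataset with the three columns above is in a local CSV file
df = pd.read_csv("energy_consumption.csv")

def train_customer_model(customer_id, data):
    # Prophet expects a 'ds' (timestamp) column and a 'y' (value) column
    customer_df = (
        data[data["customer_id"] == customer_id]
        .rename(columns={"timestamp": "ds", "consumption": "y"})[["ds", "y"]]
    )
    model = Prophet()
    model.fit(customer_df)
    return model

# Train one of the 1,000 per-customer models and forecast the next 7 periods
model_0 = train_customer_model(0, df)
forecast = model_0.predict(model_0.make_future_dataframe(periods=7))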

Solution overview
To efficiently train and serve thousands of ML models, we can use the following SageMaker features:

SageMaker Processing – SageMaker Processing is a fully managed data preparation service that enables you to perform data processing and model evaluation tasks on your input data. You can use SageMaker Processing to transform raw data into the format needed for training and inference, as well as to run batch and online evaluations of your models.
SageMaker training jobs – You can use SageMaker training jobs to train models on a variety of algorithms and input data types, and specify the compute resources needed for training.
SageMaker MMEs – Multi-model endpoints enable you to host multiple models on a single endpoint, which makes it easy to serve predictions from multiple models using a single API. SageMaker MMEs can save time and resources by reducing the number of endpoints needed to serve predictions from multiple models. MMEs support hosting of both CPU- and GPU-backed models. Note that in our scenario, we use 1,000 models, but this is not a limitation of the service itself.

The following diagram illustrates the solution architecture.

The workflow includes the following steps:

We use SageMaker Processing to preprocess data and create a single CSV file per customer and store it in Amazon Simple Storage Service (Amazon S3).
The SageMaker training job is configured to read the output of the SageMaker Processing job and distribute it in a round-robin fashion to the training instances. Note that this can also be achieved with Amazon SageMaker Pipelines.
The model artifacts are stored in Amazon S3 by the training job, and are served directly from the SageMaker MME.

Scale training to thousands of models
Scaling the training of thousands of models is possible via the distribution parameter of the TrainingInput class in the SageMaker Python SDK, which allows you to specify how data is distributed across multiple training instances for a training job. The two main options for the distribution parameter are FullyReplicated and ShardedByS3Key. The ShardedByS3Key option means that the training data is sharded by S3 object key, with each training instance receiving a unique subset of the data, avoiding duplication. After the data is copied by SageMaker to the training containers, we can read the folder and file structure to train a unique model per customer file. The following is an example code snippet:

# Assume that the training data is in an S3 bucket already, pass the parent folder
s3_input_train = sagemaker.inputs.TrainingInput(
    s3_data='s3://my-bucket/customer_data',
    distribution='ShardedByS3Key'
)

# Create a SageMaker estimator and set the training input
estimator = sagemaker.estimator.Estimator(...)
estimator.fit(inputs=s3_input_train)

Every SageMaker training job stores the model saved in the /opt/ml/model folder of the training container before archiving it in a model.tar.gz file, and then uploads it to Amazon S3 upon training job completion. Power users can also automate this process with SageMaker Pipelines. When storing multiple models via the same training job, SageMaker creates a single model.tar.gz file containing all the trained models. This would then mean that, in order to serve the model, we would need to unpack the archive first. To avoid this, we use checkpoints to save the state of individual models. SageMaker provides the functionality to copy checkpoints created during the training job to Amazon S3. Here, the checkpoints need to be saved in a pre-specified location, with the default being /opt/ml/checkpoints. These checkpoints can be used to resume training at a later moment or as a model to deploy on an endpoint. For a high-level summary of how the SageMaker training platform manages storage paths for training datasets, model artifacts, checkpoints, and outputs between AWS Cloud storage and training jobs in SageMaker, refer to Amazon SageMaker Training Storage Folders for Training Datasets, Checkpoints, Model Artifacts, and Outputs.
The following code, from the train.py script containing the training logic, uses a fictitious model_to_json() serialization function to save one model per customer:

import tarfile
import boto3
import os

import pandas as pd

[ ... argument parsing ... ]

for customer in os.listdir(args.input_path):

    # Read the customer's data locally within the training job
    df = pd.read_csv(os.path.join(args.input_path, customer, 'data.csv'))

    # Define and train the model
    model = MyModel()
    model.fit(df)

    # Save the model to the output (checkpoint) directory
    with open(os.path.join(output_dir, 'model.json'), 'w') as fout:
        fout.write(model_to_json(model))

    # Create the {customer}.tar.gz archive containing the model and the training script
    with tarfile.open(os.path.join(output_dir, f'{customer}.tar.gz'), "w:gz") as tar:
        tar.add(os.path.join(output_dir, 'model.json'), "model.json")
        tar.add(os.path.join(args.code_dir, "training.py"), "training.py")
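The checkpoint location itself is configured on the estimator. A minimal sketch follows, with placeholder image, role, and bucket values; the S3 prefix should match the location the MME will later read models from:

import sagemaker

# Copy everything written to /opt/ml/checkpoints (the default local path)
# to the S3 prefix below while the training job runs
estimator = sagemaker.estimator.Estimator(
    image_uri="<YOUR-TRAINING-IMAGE-URI>",
    role="<YOUR-SAGEMAKER-EXECUTION-ROLE-ARN>",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    checkpoint_s3_uri="s3://my-bucket/scaling-thousand-models/models",
    checkpoint_local_path="/opt/ml/checkpoints",
)
estimator.fit(inputs=s3_input_train)  # s3_input_train from the earlier snippet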

Scale inference to thousands of models with SageMaker MMEs
SageMaker MMEs allow you to serve multiple models at the same time by creating an endpoint configuration that includes a list of all the models to serve, and then creating an endpoint using that endpoint configuration. There is no need to re-deploy the endpoint every time you add a new model because the endpoint will automatically serve all models stored in the specified S3 paths. This is achieved with Multi Model Server (MMS), an open-source framework for serving ML models that can be installed in containers to provide the front end that fulfills the requirements for the new MME container APIs. In addition, you can use other model servers including TorchServe and Triton. MMS can be installed in your custom container via the SageMaker Inference Toolkit. To learn more about how to configure your Dockerfile to include MMS and use it to serve your models, refer to Build Your Own Container for SageMaker Multi-Model Endpoints.
The following code snippet shows how to create an MME using the SageMaker Python SDK:

from sagemaker.multidatamodel import MultiDataModel

# Create the MultiDataModel definition
multimodel = MultiDataModel(
    name='customer-models',
    model_data_prefix=f's3://{bucket}/scaling-thousand-models/models',
    model=your_model,
)

# Deploy on a real-time endpoint
predictor = multimodel.deploy(
    initial_instance_count=1,
    instance_type='ml.c5.xlarge',
)

When the MME is live, we can invoke it to generate predictions. Invocations can be done in any AWS SDK as well as with the SageMaker Python SDK, as shown in the following code snippet:

predictor.predict(
    data='{"period": 7}',  # the payload, in this case JSON
    target_model=f'{customer}.tar.gz'  # the name of the target model
)

When calling a model, the model is initially loaded from Amazon S3 on the instance, which can result in a cold start when calling a new model. Frequently used models are cached in memory and on disk to provide low-latency inference.
Conclusion
SageMaker is a powerful and cost-effective platform for training and serving thousands of ML models. Its features, including SageMaker Processing, training jobs, and MMEs, enable organizations to efficiently train and serve thousands of models at scale, while also benefiting from the cost-saving advantages of using the AWS Cloud infrastructure. To learn more about how to use SageMaker for training and serving thousands of models, refer to Process data, Train a Model with Amazon SageMaker and Host multiple models in one container behind one endpoint.

About the Authors
Davide Gallitelli is a Specialist Solutions Architect for AI/ML in the EMEA region. He is based in Brussels and works closely with customers throughout Benelux. He has been a developer since he was very young, starting to code at the age of 7. He started learning AI/ML at university, and has fallen in love with it since then.
Maurits de Groot is a Solutions Architect at Amazon Web Services, based out of Amsterdam. He likes to work on machine learning-related topics and has a predilection for startups. In his spare time, he enjoys skiing and playing squash.

Accelerate business outcomes with 70% performance improvements to data …

Amazon SageMaker Canvas is a visual interface that enables business analysts to generate accurate machine learning (ML) predictions on their own, without requiring any ML experience or having to write a single line of code. SageMaker Canvas’s intuitive user interface lets business analysts browse and access disparate data sources in the cloud or on premises, prepare and explore the data, build and train ML models, and generate accurate predictions within a single workspace.
SageMaker Canvas allows analysts to use different data workloads to achieve the desired business outcomes with high accuracy and performance. The compute, storage, and memory requirements to generate accurate predictions are abstracted from the end-user, enabling them to focus on the business problem to be solved. Earlier this year, we announced performance optimizations based on customer feedback to deliver faster and more accurate model training times with SageMaker Canvas.
In this post, we show how SageMaker Canvas can now process data, train models, and generate predictions with increased speed and efficiency for different dataset sizes.
Prerequisites
If you would like to follow along, complete the following prerequisites:

Have an AWS account.
Set up SageMaker Canvas. For instructions, refer to Prerequisites for setting up Amazon SageMaker Canvas.
Download the following two datasets to your local computer. The first is the NYC Yellow Taxi Trip dataset; the second is the eCommerce behavior data about retails events related to products and users.

Both datasets come under the Attribution 4.0 International (CC BY 4.0) license and are free to share and adapt.
Data processing improvements
With underlying performance optimizations, the time to import data into SageMaker Canvas has improved by over 70%. You can now import datasets of up to 2 GB in approximately 50 seconds and up to 5 GB in approximately 65 seconds.

After importing data, business analysts typically validate the data to ensure there are no issues found within the dataset. Example validation checks can be ensuring columns contain the correct data type, seeing if the value ranges are in line with expectations, making sure there is uniqueness in values where applicable, and others.
Data validation is now faster. In our tests, all validations took 50 seconds for the taxi dataset exceeding 5 GB in size, a 10-times improvement in speed.

Model training improvements
The performance optimizations related to ML model training in SageMaker Canvas now enable you to train models without running into potential out-of-memory failures.
The following screenshot shows the results of a successful build run using a large dataset, including the impact of the total_amount feature on the target variable.

Inference improvements
Finally, SageMaker Canvas inference improvements achieved a 3.5 times reduction in memory consumption for larger datasets in our internal testing.
Conclusion
In this post, we saw various improvements with SageMaker Canvas in importing, validation, training, and inference. We saw a 70% improvement in the time to import large datasets, a 10 times improvement in data validation speed, and a 3.5 times reduction in memory consumption. These improvements allow you to better work with large datasets and reduce the time spent building ML models with SageMaker Canvas.
We encourage you to experience the improvements yourself. We welcome your feedback as we continuously work on performance optimizations to improve the user experience.

About the authors
Peter Chung is a Solutions Architect for AWS, and is passionate about helping customers uncover insights from their data. He has been building solutions to help organizations make data-driven decisions in both the public and private sectors. He holds all AWS certifications as well as two GCP certifications. He enjoys coffee, cooking, staying active, and spending time with his family.
Tim Song is a Software Development Engineer at AWS SageMaker. With 10+ years of experience as a software developer, consultant, and tech leader, he has a demonstrated ability to deliver scalable and reliable products and solve complex problems. In his spare time, he enjoys nature, outdoor running, and hiking.
Hariharan Suresh is a Senior Solutions Architect at AWS. He is passionate about databases, machine learning, and designing innovative solutions. Prior to joining AWS, Hariharan was a product architect, core banking implementation specialist, and developer, and worked with BFSI organizations for over 11 years. Outside of technology, he enjoys paragliding and cycling.
Maia Haile is a Solutions Architect at Amazon Web Services based in the Washington, D.C. area. In that role, she helps public sector customers achieve their mission objectives with well-architected solutions on AWS. She has 5 years of experience spanning nonprofit healthcare, media and entertainment, and retail. Her passion is leveraging artificial intelligence (AI) and machine learning (ML) to help public sector customers achieve their business and technical goals.

Build a personalized avatar with generative AI using Amazon SageMaker

Generative AI has become a common tool for enhancing and accelerating the creative process across various industries, including entertainment, advertising, and graphic design. It enables more personalized experiences for audiences and improves the overall quality of the final products.
One significant benefit of generative AI is creating unique and personalized experiences for users. For example, streaming services use generative AI to create personalized title artwork and visuals based on a user's viewing history and preferences, increasing viewer engagement. The system then generates thousands of variations of a title's artwork and tests them to determine which version most attracts the user's attention. In some cases, personalized artwork for TV series significantly increased clickthrough rates and view rates as compared to shows without personalized artwork.
In this post, we demonstrate how you can use generative AI models like Stable Diffusion to build a personalized avatar solution on Amazon SageMaker and save inference cost with multi-model endpoints (MMEs) at the same time. The solution demonstrates how, by uploading 10–12 images of yourself, you can fine-tune a personalized model that can then generate avatars based on any text prompt, as shown in the following screenshots. Although this example generates personalized avatars, you can apply the technique to any creative art generation by fine-tuning on specific objects or styles.

Solution overview
The following architecture diagram outlines the end-to-end solution for our avatar generator.

The scope of this post and the example GitHub code we provide focus only on the model training and inference orchestration (the green section in the preceding diagram). You can reference the full solution architecture and build on top of the example we provide.
Model training and inference can be broken down into four steps:

Upload images to Amazon Simple Storage Service (Amazon S3). In this step, we ask you to provide a minimum of 10 high-resolution images of yourself. The more images, the better the result, but the longer it will take to train.
Fine-tune a Stable Diffusion 2.1 base model using SageMaker asynchronous inference. We explain the rationale for using an inference endpoint for training later in this post. The fine-tuning process starts with preparing the images, including face cropping, background variation, and resizing for the model. Then we use Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning technique for large language models (LLMs), to fine-tune the model. Finally, in postprocessing, we package the fine-tuned LoRA weights with the inference script and configuration files (tar.gz) and upload them to an S3 bucket location for SageMaker MMEs.
Host the fine-tuned models using SageMaker MMEs with GPU. SageMaker will dynamically load and cache the model from the Amazon S3 location based on the inference traffic to each model.
Use the fine-tuned model for inference. After the Amazon Simple Notification Service (Amazon SNS) notification indicating that fine-tuning is complete is sent, you can immediately use that model by supplying a target_model parameter when invoking the MME to create your avatar.

We explain each step in more detail in the following sections and walk through some of the sample code snippets.
Prepare the images
To achieve the best results from fine-tuning Stable Diffusion to generate images of yourself, you typically need to provide a large quantity and variety of photos of yourself from different angles, with different expressions, and in different backgrounds. However, with our implementation, you can now achieve a high-quality result with as few as 10 input images. We have also added automated preprocessing to extract your face from each photo. All you need is to capture the essence of how you look clearly from multiple perspectives. Include a front-facing photo, a profile shot from each side, and photos from angles in between. You should also include photos with different facial expressions like smiling, frowning, and a neutral expression. Having a mix of expressions will allow the model to better reproduce your unique facial features. The input images dictate the quality of avatar you can generate. To make sure this is done properly, we recommend an intuitive front-end UI experience to guide the user through the image capture and upload process.
The following are example selfie images at different angles with different facial expressions.

Fine-tune a Stable Diffusion model
After the images are uploaded to Amazon S3, we can invoke the SageMaker asynchronous inference endpoint to start our training process. Asynchronous endpoints are intended for inference use cases with large payloads (up to 1 GB) and long processing times (up to 1 hour). It also provides a built-in queuing mechanism for queuing up requests, and a task completion notification mechanism via Amazon SNS, in addition to other native features of SageMaker hosting such as auto scaling.
Even though fine-tuning is not an inference use case, we chose to utilize it here in lieu of SageMaker training jobs due to its built-in queuing and notification mechanisms and managed auto scaling, including the ability to scale down to 0 instances when the service is not in use. This allows us to easily scale the fine-tuning service to a large number of concurrent users and eliminates the need to implement and manage the additional components. However, it does come with the drawback of the 1 GB payload and 1 hour maximum processing time. In our testing, we found that 20 minutes is sufficient time to get reasonably good results with roughly 10 input images on an ml.g5.2xlarge instance. However, SageMaker training would be the recommended approach for larger-scale fine-tuning jobs.
To host the asynchronous endpoint, we must complete several steps. The first is to define our model server. For this post, we use the Large Model Inference Container (LMI). LMI is powered by DJL Serving, which is a high-performance, programming language-agnostic model serving solution. We chose this option because the SageMaker managed inference container already has many of the training libraries we need, such as Hugging Face Diffusers and Accelerate. This greatly reduces the amount of work required to customize the container for our fine-tuning job.
The following code snippet shows the version of the LMI container we used in our example:

inference_image_uri = (
    f"763104351884.dkr.ecr.{region}.amazonaws.com/djl-inference:0.21.0-deepspeed0.8.3-cu117"
)
print(f"Image going to be used is ----> {inference_image_uri}")

In addition to that, we need to have a serving.properties file that configures the serving properties, including the inference engine to use, the location of the model artifact, and dynamic batching. Lastly, we must have a model.py file that loads the model into the inference engine and prepares the data input and output from the model. In our example, we use the model.py file to spin up the fine-tuning job, which we explain in greater detail in a later section. Both the serving.properties and model.py files are provided in the training_service folder.
The next step after defining our model server is to create an endpoint configuration that defines how our asynchronous inference will be served. For our example, we are just defining the maximum concurrent invocation limit and the output S3 location. With the ml.g5.2xlarge instance, we have found that we are able to fine-tune up to two models concurrently without encountering an out-of-memory (OOM) exception, and therefore we set max_concurrent_invocations_per_instance to 2. This number may need to be adjusted if we’re using a different set of tuning parameters or a smaller instance type. We recommend setting this to 1 initially and monitoring the GPU memory utilization in Amazon CloudWatch.

# create async endpoint configuration
async_config = AsyncInferenceConfig(
    output_path=f"s3://{bucket}/{s3_prefix}/async_inference/output",  # Where our results will be stored
    max_concurrent_invocations_per_instance=2,
    notification_config={
        "SuccessTopic": "...",
        "ErrorTopic": "...",
    },  # Notification configuration
)

Finally, we create a SageMaker model that packages the container information, model files, and AWS Identity and Access Management (IAM) role into a single object. The model is deployed using the endpoint configuration we defined earlier:

model = Model(
    image_uri=image_uri,
    model_data=model_data,
    role=role,
    env=env
)

model.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    endpoint_name=endpoint_name,
    async_inference_config=async_inference_config
)

predictor = sagemaker.Predictor(
    endpoint_name=endpoint_name,
    sagemaker_session=sagemaker_session
)

When the endpoint is ready, we use the following sample code to invoke the asynchronous endpoint and start the fine-tuning process:

sm_runtime = boto3.client("sagemaker-runtime")

input_s3_loc = sess.upload_data("data/jw.tar.gz", bucket, s3_prefix)

response = sm_runtime.invoke_endpoint_async(
    EndpointName=sd_tuning.endpoint_name,
    InputLocation=input_s3_loc)

For more details about LMI on SageMaker, refer to Deploy large models on Amazon SageMaker using DJLServing and DeepSpeed model parallel inference.
After invocation, the asynchronous endpoint starts queueing our fine-tuning job. Each job runs through the following steps: prepare the images, perform Dreambooth and LoRA fine-tuning, and prepare the model artifacts. Let’s dive deeper into the fine-tuning process.
Prepare the images
As we mentioned earlier, the quality of the input images directly impacts the quality of the fine-tuned model. For the avatar use case, we want the model to focus on the facial features. Instead of requiring users to provide carefully curated images of exact size and content, we implement a preprocessing step using computer vision techniques to alleviate this burden. In the preprocessing step, we first use a face detection model to isolate the largest face in each image. Then we crop and pad the image to the required size of 512 x 512 pixels for our model. Finally, we segment the face from the background and add random background variations. This helps highlight the facial features, allowing our model to learn from the face itself rather than the background. The following images illustrate the three steps in this process.

Step 1: Face detection using computer vision
Step 2: Crop and pad the image to 512 x 512 pixels
Step 3 (Optional): Segment and add background variation
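
The exact preprocessing code ships with the fine-tuning container, but conceptually it can be approximated in a few lines of OpenCV. The following is a minimal sketch, assuming a Haar cascade stands in for whichever face detection model the pipeline actually uses; the file names, margin, and padding color are illustrative, not the values used by our implementation.

import cv2

def preprocess_face(image_path, size=512):
    """Detect the largest face, crop around it, and pad to size x size."""
    img = cv2.imread(image_path)

    # Step 1: face detection (a Haar cascade stands in for the real detector)
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        raise ValueError(f"No face found in {image_path}")

    # Keep the largest detected face and add some margin around it
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])
    margin = int(0.4 * max(w, h))
    x0, y0 = max(x - margin, 0), max(y - margin, 0)
    x1, y1 = min(x + w + margin, img.shape[1]), min(y + h + margin, img.shape[0])
    face = img[y0:y1, x0:x1]

    # Step 2: resize the longer side to `size`, then pad to a square canvas
    scale = size / max(face.shape[:2])
    face = cv2.resize(face, (int(face.shape[1] * scale), int(face.shape[0] * scale)))
    pad_h, pad_w = size - face.shape[0], size - face.shape[1]
    face = cv2.copyMakeBorder(
        face, pad_h // 2, pad_h - pad_h // 2, pad_w // 2, pad_w - pad_w // 2,
        cv2.BORDER_CONSTANT, value=(255, 255, 255)
    )
    return face

# cv2.imwrite("face_512.png", preprocess_face("selfie.jpg"))
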

Dreambooth and LoRA fine-tuning
For fine-tuning, we combined the techniques of Dreambooth and LoRA. Dreambooth allows you to personalize your Stable Diffusion model by embedding a subject into the model’s output domain using a unique identifier and expanding the model’s language-vision dictionary. It uses a method called prior preservation to preserve the model’s semantic knowledge of the class of the subject, in this case a person, and uses other objects in the class to improve the final image output. This is how Dreambooth can achieve high-quality results with just a few input images of the subject.
The following code snippet shows the inputs to our trainer.py class for our avatar solution. Notice that we chose <<TOK>> as the unique identifier. This is done purposely to avoid picking a name that may already be in the model’s dictionary. If the name already exists, the model has to unlearn and then relearn the subject, which may lead to poor fine-tuning results. The subject class is set to “a photo of person”, which enables prior preservation by first generating photos of people to feed in as additional inputs during the fine-tuning process. This helps reduce overfitting as the model tries to preserve its previous knowledge of a person using the prior preservation method.

status = trn.run(
    base_model="stabilityai/stable-diffusion-2-1-base",
    resolution=512,
    n_steps=1000,
    concept_prompt="photo of <<TOK>>",  # << unique identifier of the subject
    learning_rate=1e-4,
    gradient_accumulation=1,
    fp16=True,
    use_8bit_adam=True,
    gradient_checkpointing=True,
    train_text_encoder=True,
    with_prior_preservation=True,
    prior_loss_weight=1.0,
    class_prompt="a photo of person",  # << subject class
    num_class_images=50,
    class_data_dir=class_data_dir,
    lora_r=128,
    lora_alpha=1,
    lora_bias="none",
    lora_dropout=0.05,
    lora_text_encoder_r=64,
    lora_text_encoder_alpha=1,
    lora_text_encoder_bias="none",
    lora_text_encoder_dropout=0.05
)

A number of memory-saving options have been enabled in the configuration, including fp16, use_8bit_adam, and gradient accumulation. This reduces the memory footprint to under 12 GB, which allows for fine-tuning of up to two models concurrently on an ml.g5.2xlarge instance.
LoRA is an efficient fine-tuning technique for LLMs that freezes most of the weights and attaches a small adapter network to specific layers of the pre-trained LLM, allowing for faster training and optimized storage. For Stable Diffusion, the adapter is attached to the text encoder and U-Net components of the inference pipeline. The text encoder converts the input prompt to a latent space that is understood by the U-Net model, and the U-Net model uses the latent meaning to generate the image in the subsequent diffusion process. The output of the fine-tuning is just the text_encoder and U-Net adapter weights. At inference time, these weights can be reattached to the base Stable Diffusion model to reproduce the fine-tuning results.
The following figures are detailed diagrams of LoRA fine-tuning, provided by the original authors: Cheng-Han Chiang, Yung-Sung Chuang, and Hung-yi Lee, “AACL_2022_tutorial_PLMs,” 2022.

By combining both methods, we were able to generate a personalized model while tuning an order-of-magnitude fewer parameters. This resulted in a much faster training time and reduced GPU utilization. Additionally, storage was optimized with the adapter weight being only 70 MB, compared to 6 GB for a full Stable Diffusion model, representing a 99% size reduction.
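As a rough illustration of what the trainer does under the hood, the following minimal sketch attaches a LoRA adapter to the U-Net of the base Stable Diffusion model with the PEFT library and saves only the adapter weights. The target module names and the output path are assumptions for illustration, not the exact values used by trainer.py.

from diffusers import UNet2DConditionModel
from peft import LoraConfig, get_peft_model

# Load only the U-Net of the base Stable Diffusion model
unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base", subfolder="unet"
)

# Freeze the base weights and attach small LoRA adapters to the attention
# projections; the target module names below are assumptions for illustration
lora_config = LoraConfig(
    r=128,
    lora_alpha=1,
    lora_dropout=0.05,
    bias="none",
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
)
unet = get_peft_model(unet, lora_config)
unet.print_trainable_parameters()  # only the adapter parameters are trainable

# ... run the Dreambooth training loop on the adapted U-Net ...

# Persist only the adapter weights (tens of MB instead of several GB)
unet.save_pretrained("output/unet")
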
Prepare the model artifacts
After fine-tuning is complete, the postprocessing step will TAR the LoRA weights with the rest of the model serving files for NVIDIA Triton. We use a Python backend, which means the Triton config file and the Python script used for inference are required. Note that the Python script has to be named model.py. The final model TAR file should have the following file structure:

|--sd_lora
|--config.pbtxt
|--1
|--model.py
|--output  # LoRA weights
|--text_encoder
|--unet
|--train.sh
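
For illustration, the packaging itself can be done with a few lines of Python; the directory and archive names are taken from the structure above and are otherwise assumptions.

import tarfile

# Bundle the Triton model repository (config.pbtxt, model.py, and the LoRA
# adapter weights) into a single artifact that the MME can load from Amazon S3
with tarfile.open("sd_lora.tar.gz", "w:gz") as tar:
    tar.add("sd_lora", arcname="sd_lora")

The resulting sd_lora.tar.gz is then uploaded to the S3 prefix that backs the SageMaker multi-model endpoint.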

Host the fine-tuned models using SageMaker MMEs with GPU
After the models have been fine-tuned, we host the personalized Stable Diffusion models using a SageMaker MME. A SageMaker MME is a powerful deployment feature that allows hosting multiple models in a single container behind a single endpoint. It automatically manages traffic and routing to your models to optimize resource utilization, save costs, and minimize operational burden of managing thousands of endpoints. In our example, we run on GPU instances, and SageMaker MMEs support GPU using Triton Server. This allows you to run multiple models on a single GPU device and take advantage of accelerated compute. For more detail on how to host Stable Diffusion on SageMaker MMEs, refer to Create high-quality images with Stable Diffusion models and deploy them cost-efficiently with Amazon SageMaker.
For our example, we made additional optimization to load the fine-tuned models faster during cold start situations. This is possible because of LoRA’s adapter design. Because the base model weights and Conda environments are the same for all fine-tuned models, we can share these common resources by pre-loading them onto the hosting container. This leaves only the Triton config file, Python backend (model.py), and LoRA adaptor weights to be dynamically loaded from Amazon S3 after the first invocation. The following diagram provides a side-by-side comparison.

This significantly reduces the model TAR file from approximately 6 GB to 70 MB, and therefore is much faster to load and unpack. To do the preloading in our example, we created a utility Python backend model in models/model_setup. The script simply copies the base Stable Diffusion model and Conda environment from Amazon S3 to a common location to share across all the fine-tuned models. The following is the code snippet that performs the task:

def initialize(self, args):

    # conda env setup
    self.conda_pack_path = Path(args['model_repository']) / "sd_env.tar.gz"
    self.conda_target_path = Path("/tmp/conda")

    self.conda_env_path = self.conda_target_path / "sd_env.tar.gz"

    if not self.conda_env_path.exists():
        self.conda_env_path.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy(self.conda_pack_path, self.conda_env_path)

    # base diffusion model setup
    self.base_model_path = Path(args['model_repository']) / "stable_diff.tar.gz"

    try:
        with tarfile.open(self.base_model_path) as tar:
            tar.extractall('/tmp')

        self.response_message = "Model env setup successful."

    except Exception as e:
        # print the exception message
        print(f"Caught an exception: {e}")
        self.response_message = f"Caught an exception: {e}"

Then each fine-tuned model will point to the shared location on the container. The Conda environment is referenced in the config.pbtxt.

name: "pipeline_0"
backend: "python"
max_batch_size: 1

parameters: {
  key: "EXECUTION_ENV_PATH",
  value: {string_value: "/tmp/conda/sd_env.tar.gz"}
}

The Stable Diffusion base model is loaded from the initialize() function of each model.py file. We then apply the personalized LoRA weights to the unet and text_encoder model to reproduce each fine-tuned model:

class TritonPythonModel:

    def initialize(self, args):
        self.output_dtype = pb_utils.triton_string_to_numpy(
            pb_utils.get_output_config_by_name(json.loads(args["model_config"]),
                                               "generated_image")["data_type"])

        self.model_dir = args['model_repository']

        device = 'cuda'
        self.pipe = StableDiffusionPipeline.from_pretrained('/tmp/stable_diff',
                                                            torch_dtype=torch.float16,
                                                            revision="fp16").to(device)

        # Load the LoRA weights
        self.pipe.unet = PeftModel.from_pretrained(self.pipe.unet, unet_sub_dir)

        if os.path.exists(text_encoder_sub_dir):
            self.pipe.text_encoder = PeftModel.from_pretrained(self.pipe.text_encoder, text_encoder_sub_dir)

Use the fine-tuned model for inference
Now we can try our fine-tuned model by invoking the MME endpoint. The input parameters we exposed in our example include prompt, negative_prompt, and gen_args, as shown in the following code snippet. We set the data type and shape of each input item in the dictionary and convert them into a JSON string. Finally, the string payload and TargetModel are passed into the request to generate your avatar picture.

import json
import random

prompt = """<<TOK>> epic portrait, zoomed out, blurred background cityscape, bokeh,
perfect symmetry, by artgem, artstation, concept art, cinematic lighting, highly
detailed, octane, concept art, sharp focus, rockstar games, post processing,
picture of the day, ambient lighting, epic composition"""

negative_prompt = """
beard, goatee, ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, extra limbs, disfigured, deformed, body out of frame, blurry, bad anatomy, blurred,
watermark, grainy, signature, cut off, draft, amateur, multiple, gross, weird, uneven, furnishing, decorating, decoration, furniture, text, poor, low, basic, worst, juvenile,
unprofessional, failure, crayon, oil, label, thousand hands
"""

seed = random.randint(1, 1000000000)

gen_args = json.dumps(dict(num_inference_steps=50, guidance_scale=7, seed=seed))

inputs = dict(prompt=prompt,
              negative_prompt=negative_prompt,
              gen_args=gen_args)

payload = {
    "inputs":
        [{"name": name, "shape": [1, 1], "datatype": "BYTES", "data": [data]} for name, data in inputs.items()]
}

response = sm_runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/octet-stream",
    Body=json.dumps(payload),
    TargetModel="sd_lora.tar.gz",
)
output = json.loads(response["Body"].read().decode("utf8"))["outputs"]
original_image = decode_image(output[0]["data"][0])
original_image

Clean up
Follow the instructions in the cleanup section of the notebook to delete the resources provisioned as part of this post to avoid unnecessary charges. Refer to Amazon SageMaker Pricing for details regarding the cost of the inference instances.
Conclusion
In this post, we demonstrated how to create a personalized avatar solution using Stable Diffusion on SageMaker. By fine-tuning a pre-trained model with just a few images, we can generate avatars that reflect the individuality and personality of each user. This is just one of many examples of how we can use generative AI to create customized and unique experiences for users. The possibilities are endless, and we encourage you to experiment with this technology and explore its potential to enhance the creative process. We hope this post has been informative and inspiring. We encourage you to try the example and share your creations with us using hashtags #sagemaker #mme #genai on social platforms. We would love to see what you make.
In addition to Stable Diffusion, many other generative AI models are available on Amazon SageMaker JumpStart. Refer to Getting started with Amazon SageMaker JumpStart to explore their capabilities.

About the Authors
James Wu is a Senior AI/ML Specialist Solution Architect at AWS, helping customers design and build AI/ML solutions. James’s work covers a wide range of ML use cases, with a primary interest in computer vision, deep learning, and scaling ML across the enterprise. Prior to joining AWS, James was an architect, developer, and technology leader for over 10 years, including 6 years in engineering and 4 years in the marketing and advertising industries.
Simon Zamarin is an AI/ML Solutions Architect whose main focus is helping customers extract value from their data assets. In his spare time, Simon enjoys spending time with family, reading sci-fi, and working on various DIY house projects.
Vikram Elango is an AI/ML Specialist Solutions Architect at Amazon Web Services, based in Virginia USA. Vikram helps financial and insurance industry customers with design, thought leadership to build and deploy machine learning applications at scale. He is currently focused on natural language processing, responsible AI, inference optimization and scaling ML across the enterprise. In his spare time, he enjoys traveling, hiking, cooking and camping with his family.
Lana Zhang is a Senior Solutions Architect at AWS WWSO AI Services team, specializing in AI and ML for content moderation, computer vision, and natural language processing. With her expertise, she is dedicated to promoting AWS AI/ML solutions and assisting customers in transforming their business solutions across diverse industries, including social media, gaming, e-commerce, and advertising & marketing.
Saurabh Trikande is a Senior Product Manager for Amazon SageMaker Inference. He is passionate about working with customers and is motivated by the goal of democratizing machine learning. He focuses on core challenges related to deploying complex ML applications, multi-tenant ML models, cost optimizations, and making deployment of deep learning models more accessible. In his spare time, Saurabh enjoys hiking, learning about innovative technologies, following TechCrunch and spending time with his family.

SageMaker Distribution is now available on Amazon SageMaker Studio

SageMaker Distribution is a pre-built Docker image containing many popular packages for machine learning (ML), data science, and data visualization. This includes deep learning frameworks like PyTorch, TensorFlow, and Keras; popular Python packages like NumPy, scikit-learn, and pandas; and IDEs like JupyterLab. In addition to this, SageMaker Distribution supports conda, micromamba, and pip as Python package managers.
In May 2023, we launched SageMaker Distribution as an open-source project at JupyterCon. This launch helped you use SageMaker Distribution to run experiments on your local environments. We are now natively providing that image in Amazon SageMaker Studio so that you gain the high performance, compute, and security benefits of running your experiments on Amazon SageMaker.
Compared to the earlier open-source launch, you have the following additional capabilities:

The open-source image is now available as a first-party image in SageMaker Studio. You can now simply choose the open-source SageMaker Distribution from the list when choosing an image and kernel for your notebooks, without having to create a custom image.
The SageMaker Python SDK package is now built-in with the image.

In this post, we show the features and advantages of using the SageMaker Distribution image.
Use SageMaker Distribution in SageMaker Studio
If you have access to an existing Studio domain, you can launch SageMaker Studio. To create a Studio domain, follow the directions in Onboard to Amazon SageMaker Domain.

In the SageMaker Studio UI, choose File from the menu bar, choose New, and choose Notebook.
When prompted for the image and instance, choose the SageMaker Distribution v0 CPU or SageMaker Distribution v0 GPU image.
Choose your Kernel, then choose Select.

You can now start running your commands without needing to install common ML packages and frameworks! You can also run notebooks built on supported frameworks such as PyTorch and TensorFlow from the SageMaker examples repository, without having to switch the active kernels.
Run code remotely using SageMaker Distribution
In the public beta announcement, we discussed graduating notebooks from local compute environments to SageMaker Studio, and also operationalizing the notebook using notebook jobs.
Additionally, you can directly run your local notebook code as a SageMaker training job by simply adding a @remote decorator to your function.
Let’s try an example. Add the following code to your Studio notebook running on the SageMaker Distribution image:

from sagemaker.remote_function import remote

@remote(instance_type="ml.m5.xlarge", dependencies='./requirements.txt')
def divide(x, y):
    return x / y

divide(2, 3.0)

When you run the cell, the function runs as a remote SageMaker training job on an ml.m5.xlarge instance, and the SDK automatically picks up the SageMaker Distribution image as the training image from Amazon Elastic Container Registry (Amazon ECR). For deep learning workloads, you can also run your script on multiple parallel instances.
Reproduce Conda environments from SageMaker Distribution elsewhere
SageMaker Distribution is available as a public Docker image. However, for data scientists more familiar with Conda environments than Docker, the GitHub repository also provides the environment files for each image build so you can build Conda environments for both CPU and GPU versions.
The build artifacts for each version are stored under the sagemaker-distribution/build_artifacts directory. To create the same environment as any of the available SageMaker Distribution versions, run the following commands, replacing the --file parameter with the right environment file:

conda create --name conda-sagemaker-distribution \
    --file sagemaker-distribution/build_artifacts/v0/v0.2/v0.2.1/cpu.env.out
# activate the environment
conda activate conda-sagemaker-distribution

Customize the open-source SageMaker Distribution image
The open-source SageMaker Distribution image has the most commonly used packages for data science and ML. However, data scientists might require access to additional packages, and enterprise customers might have proprietary packages that provide additional capabilities for their users. In such cases, there are multiple options to have a runtime environment with all required packages. In order of increasing complexity, they are listed as follows:

You can install packages directly on the notebook. We recommend Conda and micromamba, but pip also works.
Data scientists familiar with Conda for package management can reproduce the Conda environment from SageMaker Distribution elsewhere and install and manage additional packages in that environment going forward.
If administrators want a repeatable and controlled runtime environment for their users, they can extend SageMaker Distribution’s Docker images and maintain their own image. See Bring your own SageMaker image for detailed instructions to create and use a custom image in Studio.

Clean up
If you experimented with SageMaker Studio, shut down all Studio apps to avoid paying for unused compute usage. See Shut down and Update Studio Apps for instructions.
Conclusion
Today, we announced the launch of the open-source SageMaker Distribution image within SageMaker Studio. We showed you how to use the image in SageMaker Studio as one of the available first-party images, how to operationalize your scripts using the SageMaker Python SDK @remote decorator, how to reproduce the Conda environments from SageMaker Distribution outside Studio, and how to customize the image. We encourage you to try out SageMaker Distribution and share your feedback through GitHub!
Additional References

SageMaker-distribution documentation
JupyterCon Contributions by AWS in 2023
Get Started on SageMaker Studio

About the authors
Durga Sury is an ML Solutions Architect in the Amazon SageMaker Service SA team. She is passionate about making machine learning accessible to everyone. In her 4 years at AWS, she has helped set up AI/ML platforms for enterprise customers. When she isn’t working, she loves motorcycle rides, mystery novels, and hiking with her 5-year-old husky.
Ketan Vijayvargiya is a Senior Software Development Engineer in Amazon Web Services (AWS). His focus areas are machine learning, distributed systems and open source. Outside work, he likes to spend his time self-hosting and enjoying nature.

Automate caption creation and search for images at enterprise scale us …

Amazon Kendra is an intelligent search service powered by machine learning (ML). Amazon Kendra reimagines search for your websites and applications so your employees and customers can easily find the content they are looking for, even when it’s scattered across multiple locations and content repositories within your organization.
Amazon Kendra supports a variety of document formats, such as Microsoft Word, PDF, and text from various data sources. In this post, we focus on extending the document support in Amazon Kendra to make images searchable by their displayed content. Images can often be searched using supplemented metadata such as keywords. However, it takes a lot of manual effort to add detailed metadata to potentially thousands of images. Generative AI (GenAI) can be helpful in generating the metadata automatically. By generating textual captions, the GenAI caption predictions offer descriptive metadata for images. The Amazon Kendra index can then be enriched with the generated metadata during document ingestion to enable searching the images without any manual effort.
As an example, a GenAI model can be used to generate a textual description for the following image as “a dog laying on the ground under an umbrella” during document ingestion of the image.

An object recognition model can still detect keywords such as “dog” and “umbrella,” but a GenAI model offers deeper understanding of what is represented in the image by identifying that the dog lies under the umbrella. This helps us build more refined searches in the image search process. The textual description is added as metadata to an Amazon Kendra search index via an automated custom document enrichment (CDE). Users searching for terms like “dog” or “umbrella” will then be able to find the image, as shown in the following screenshot.

In this post, we show how to use CDE in Amazon Kendra using a GenAI model deployed on Amazon SageMaker. We demonstrate CDE using simple examples and provide a step-by-step guide for you to experience CDE in an Amazon Kendra index in your own AWS account. It allows users to quickly and easily find the images they need without having to manually tag or categorize them. This solution can also be customized and scaled to meet the needs of different applications and industries.
Image captioning with GenAI
Image description with GenAI involves using ML algorithms to generate textual descriptions of images. The process is also known as image captioning, and operates at the intersection of computer vision and natural language processing (NLP). It has applications in areas where data is multi-modal such as ecommerce, where data contains text in the form of metadata as well as images, or in healthcare, where data could contain MRIs or CT scans along with doctor’s notes and diagnoses, to name a few use cases.
GenAI models learn to recognize objects and features within the images, and then generate descriptions of those objects and features in natural language. The state-of-the-art models use an encoder-decoder architecture, where the image information is encoded in the intermediate layers of the neural network and decoded into textual descriptions. These can be considered as two distinct stages: feature extraction from images and textual caption generation. In the feature extraction stage (encoder), the GenAI model processes the image to extract relevant visual features, such as object shapes, colors, and textures. In the caption generation stage (decoder), the model generates a natural language description of the image based on the extracted visual features.
GenAI models are typically trained on vast amounts of data, which make them suitable for various tasks without additional training. Adapting to custom datasets and new domains is also easily achievable through few-shot learning. Pre-training methods allow multi-modal applications to be easily trained using state-of-the-art language and image models. These pre-training methods also allow you to mix and match the vision model and language model that best fits your data.
The quality of the generated image descriptions depends on the quality and size of the training data, the architecture of the GenAI model, and the quality of the feature extraction and caption generation algorithms. Although image description with GenAI is an active area of research, it shows very good results in a wide range of applications, such as image search, visual storytelling, and accessibility for people with visual impairments.
Use cases
GenAI image captioning is useful in the following use cases:

Ecommerce – A common industry use case where images and text occur together is retail. Ecommerce in particular stores vast amounts of data as product images along with textual descriptions. The textual description or metadata is important to ensure that the best products are displayed to the user based on the search queries. Moreover, with the trend of ecommerce sites obtaining data from 3P vendors, the product descriptions are often incomplete, amounting to numerous manual hours and huge overhead resulting from tagging the right information in the metadata columns. GenAI-based image captioning is particularly useful for automating this laborious process. Fine-tuning the model on custom fashion data such as fashion images along with text describing the attributes of fashion products can be used to generate metadata that then improves a user’s search experience.
Marketing – Another use case of image search is digital asset management. Marketing firms store vast amounts of digital data that needs to be centralized, easily searchable, and scalable enabled by data catalogs. A centralized data lake with informative data catalogs would reduce duplication efforts and enable wider sharing of creative content and consistency between teams. For graphic design platforms popularly used for enabling social media content generation, or presentations in corporate settings, a faster search could result in an improved user experience by rendering the correct search results for the images that users want to look for and enabling users to search using natural language queries.
Manufacturing – The manufacturing industry stores a lot of image data like architecture blueprints of components, buildings, hardware, and equipment. The ability to search through such data enables product teams to easily recreate designs from a starting point that already exists and eliminates a lot of design overhead, thereby speeding up the process of design generation.
Healthcare – Doctors and medical researchers can catalog and search through MRIs and CT scans, specimen samples, images of the ailment such as rashes and deformities, along with doctor’s notes, diagnoses, and clinical trials details.
Metaverse or augmented reality – Advertising a product is about creating a story that users can imagine and relate to. With AI-powered tools and analytics, it has become easier than ever to build not just one story but customized stories to appear to end-users’ unique tastes and sensibilities. This is where image-to-text models can be a game changer. Visual storytelling can assist in creating characters, adapting them to different styles, and captioning them. It can also be used to power stimulating experiences in the metaverse or augmented reality and immersive content including video games. Image search enables developers, designers, and teams to search their content using natural language queries, which can maintain consistency of content between various teams.
Accessibility of digital content for blind and low vision – This is primarily enabled by assistive technologies such as screenreaders, Braille systems that allow touch reading and writing, and special keyboards for navigating websites and applications across the internet. Images, however, need to be delivered as textual content that can then be communicated as speech. Image captioning using GenAI algorithms is a crucial piece for redesigning the internet and making it more inclusive by providing everyone a chance to access, understand, and interact with online content.

Model details and model fine-tuning for custom datasets
In this solution, we take advantage of the vit-gpt2-image-captioning model available from Hugging Face, which is licensed under Apache 2.0, without performing any further fine-tuning. ViT is a foundational model for image data, and GPT-2 is a foundational model for language. The multi-modal combination of the two offers the capability of image captioning. Hugging Face hosts state-of-the-art image captioning models, which can be deployed in AWS in a few clicks and offer simple-to-deploy inference endpoints. Although we can use this pre-trained model directly, we can also customize the model to fit domain-specific datasets, more data types such as video or spatial data, and unique use cases. Some GenAI models perform best with certain datasets, or your team might already be using specific vision and language models. This solution offers the flexibility of choosing the best-performing vision and language model as the image captioning model through straightforward replacement of the model we have used.
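If you want to try the captioning model outside of SageMaker first, a minimal local sketch using the Hugging Face transformers library could look like the following; it assumes the publicly hosted nlpconnect/vit-gpt2-image-captioning checkpoint is the one referred to above, and the image file name is a placeholder.

from transformers import pipeline

# Image-to-text pipeline combining a ViT encoder with a GPT-2 decoder
captioner = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")

result = captioner("dog_under_umbrella.jpg")  # local path or URL to an image
print(result[0]["generated_text"])
# for the earlier example image, something like "a dog laying on the ground under an umbrella"
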
For customization of the models to unique industry applications, open-source models available on AWS through Hugging Face offer several possibilities. A pre-trained model can be tested for the unique dataset or trained on samples of the labeled data to fine-tune it. Novel research methods also allow any combination of vision and language models to be combined efficiently and trained on your dataset. This newly trained model can then be deployed in SageMaker for the image captioning described in this solution.
An example of a customized image search is Enterprise Resource Planning (ERP). In ERP, image data collected from different stages of logistics or supply chain management could include tax receipts, vendor orders, payslips, and more, which need to be automatically categorized for the purview of different teams within the organization. Another example is to use medical scans and doctor diagnoses to predict new medical images for automatic classification. The vision model extracts features from the MRI, CT, or X-ray images and the text model captions it with the medical diagnoses.
Solution overview
The following diagram shows the architecture for image search with GenAI and Amazon Kendra.

We ingest images from Amazon Simple Storage Service (Amazon S3) into Amazon Kendra. During ingestion to Amazon Kendra, the GenAI model hosted on SageMaker is invoked to generate an image description. Additionally, text visible in an image is extracted by Amazon Textract. The image description and the extracted text are stored as metadata and made available to the Amazon Kendra search index. After ingestion, images can be searched via the Amazon Kendra search console, API, or SDK.
We use the advanced operations of CDE in Amazon Kendra to call the GenAI model and Amazon Textract during the image ingestion step. However, we can use CDE for a wider range of use cases. With CDE, you can create, modify, or delete document attributes and content when you ingest your documents into Amazon Kendra. This means you can manipulate and ingest your data as needed. This can be achieved by invoking pre- and post-extraction AWS Lambda functions during ingestion, which allows for data enrichment or modification. For example, we can use Amazon Comprehend Medical when ingesting medical textual data to add ML-generated insights to the search metadata.
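To make the CDE flow concrete, the following is a heavily simplified sketch of a post-extraction Lambda function that calls the SageMaker endpoint for a caption and Amazon Textract for visible text, then returns them as document attributes. The endpoint name, attribute keys, payload format, and the event and response shapes are assumptions for illustration; the CloudFormation stack and the Amazon Kendra CDE documentation define the actual contract.

import json
import os
import boto3

sm_runtime = boto3.client("sagemaker-runtime")
textract = boto3.client("textract")

# Assumed endpoint name; the stack passes the real one via environment variables
CAPTION_ENDPOINT = os.environ.get("CAPTION_ENDPOINT", "image-captioning-endpoint")

def lambda_handler(event, context):
    # The CDE event identifies the ingested image in S3 (simplified here)
    bucket = event["s3Bucket"]
    key = event["s3ObjectKey"]

    # 1) Generate a caption with the GenAI model hosted on SageMaker
    response = sm_runtime.invoke_endpoint(
        EndpointName=CAPTION_ENDPOINT,
        ContentType="application/json",
        Body=json.dumps({"s3_bucket": bucket, "s3_key": key}),  # assumed payload format
    )
    caption = json.loads(response["Body"].read())["caption"]  # assumed response key

    # 2) Extract any text visible in the image with Amazon Textract
    ocr = textract.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    visible_text = " ".join(
        block["Text"] for block in ocr["Blocks"] if block["BlockType"] == "LINE"
    )

    # 3) Return the generated metadata so Amazon Kendra can index it
    # (the real CDE response format is richer than this sketch)
    return {
        "attributes": [
            {"name": "image_caption", "value": caption},
            {"name": "image_text", "value": visible_text},
        ]
    }
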
You can use our solution to search images through Amazon Kendra by following these steps:

Upload images to an image repository like an S3 bucket.
The image repository is then indexed by Amazon Kendra, which is a search engine that can be used to search for structured and unstructured data. During indexing, the GenAI model as well as Amazon Textract are invoked to generate the image metadata. You can trigger the indexing manually or on a predefined schedule.
You can then search for images using natural language queries, such as “Find images of red roses” or “Show me pictures of dogs playing in the park,” through the Amazon Kendra console, SDK, or API. These queries are processed by Amazon Kendra, which uses ML algorithms to understand the meaning behind the queries and retrieve relevant images from the indexed repository.
The search results are presented to you, along with their corresponding textual descriptions, allowing you to quickly and easily find the images you are looking for.

Prerequisites
You must have the following prerequisites:

An AWS account
Permissions to provision and invoke the following services via AWS CloudFormation: Amazon S3, Amazon Kendra, Lambda, and Amazon Textract.

Cost estimate
The cost of deploying this solution as a proof of concept is projected in the following table. To keep costs low, we use the Amazon Kendra Developer Edition, which is not recommended for production workloads but provides a low-cost option for developers. We assume that the search functionality of Amazon Kendra is used for 20 working days for 3 hours each day, and therefore calculate the associated costs for 60 monthly active hours.

Service | Time Consumed | Cost Estimate per Month
Amazon S3 | Storage of 10 GB with data transfer | 2.30 USD
Amazon Kendra | Developer Edition with 60 hours/month | 67.90 USD
Amazon Textract | 100% detect document text on 10,000 images | 15.00 USD
Amazon SageMaker | Real-time inference with ml.g4dn.xlarge for one model deployed on one endpoint for 3 hours every day for 20 days | 44.00 USD
Total | | 129.20 USD

Deploy resources with AWS CloudFormation
The CloudFormation stack deploys the following resources:

A Lambda function that downloads the image captioning model from Hugging Face hub and subsequently builds the model assets
A Lambda function that populates the inference code and zipped model artifacts to a destination S3 bucket
An S3 bucket for storing the zipped model artifacts and inference code
An S3 bucket for storing the uploaded images and Amazon Kendra documents
An Amazon Kendra index for searching through the generated image captions
A SageMaker real-time inference endpoint for deploying the Hugging Face image captioning model
A Lambda function that is triggered while enriching the Amazon Kendra index on demand. It invokes Amazon Textract and a SageMaker real-time inference endpoint.

Additionally, AWS CloudFormation deploys all the necessary AWS Identity and Access Management (IAM) roles and policies, a VPC along with subnets, a security group, and an internet gateway in which the custom resource Lambda function is run.
Complete the following steps to provision your resources:

Choose Launch stack to launch the CloudFormation template in the us-east-1 Region:
Choose Next.
On the Specify stack details page, leave the template URL and S3 URI of the parameters file at their defaults, then choose Next.
Continue to choose Next on the subsequent pages.
Choose Create stack to deploy the stack.

Monitor the status of the stack. When the status shows as CREATE_COMPLETE, the deployment is complete.
Ingest and search example images
Complete the following steps to ingest and search your images:

On the Amazon S3 console, create a folder called images in the kendra-image-search-stack-imagecaptions S3 bucket in the us-east-1 Region.
Upload the following images to the images folder.

Navigate to the Amazon Kendra console in us-east-1 Region.
In the navigation pane, choose Indexes, then choose your index (kendra-index).
Choose Data sources, then choose generated_image_captions.
Choose Sync now.

Wait for the synchronization to be complete before continuing to the next steps.

In the navigation pane, choose Indexes, then choose kendra-index.
Navigate to the search console.
Try the following queries individually or combined: “dog,” “umbrella,” and “newsletter,” and find out which images are ranked high by Amazon Kendra.

Feel free to test your own queries that fit the uploaded images.
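If you prefer to query programmatically rather than through the search console, a minimal boto3 sketch looks like the following; the index ID is a placeholder that you would replace with the ID of kendra-index.

import boto3

kendra = boto3.client("kendra", region_name="us-east-1")

response = kendra.query(
    IndexId="<your-kendra-index-id>",  # placeholder for the actual index ID
    QueryText="dog under an umbrella",
)

for item in response["ResultItems"]:
    print(item["DocumentId"], "-", item["DocumentExcerpt"]["Text"])
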
Clean up
To deprovision all the resources, complete the following steps:

On the AWS CloudFormation console, choose Stacks in the navigation pane.
Select the stack kendra-genai-image-search and choose Delete.

Wait until the stack status changes to DELETE_COMPLETE.
Conclusion
In this post, we saw how Amazon Kendra and GenAI can be combined to automate the creation of meaningful metadata for images. State-of-the-art GenAI models are extremely useful for generating text captions describing the content of an image. This has several industry use cases, ranging from healthcare and life sciences, retail and ecommerce, digital asset platforms, and media. Image captioning is also crucial for building a more inclusive digital world and redesigning the internet, metaverse, and immersive technologies to cater to the needs of visually challenged sections of society.
Image search enabled through captions enables digital content to be easily searchable without manual effort for these applications, and removes duplication efforts. The CloudFormation template we provided makes it straightforward to deploy this solution to enable image search using Amazon Kendra. A simple architecture of images stored in Amazon S3 and GenAI to create textual descriptions of the images can be used with CDE in Amazon Kendra to power this solution.
This is only one application of GenAI with Amazon Kendra. To dive deeper into how to build GenAI applications with Amazon Kendra, refer to Quickly build high-accuracy Generative AI applications on enterprise data using Amazon Kendra, LangChain, and large language models. For building and scaling GenAI applications, we recommend checking out Amazon Bedrock.

About the Authors
Charalampos Grouzakis is a Data Scientist within AWS Professional Services. He has over 11 years of experience in developing and leading data science, machine learning, and big data initiatives. Currently, he is helping enterprise customers modernize their AI/ML workloads within the cloud using industry best practices. Prior to joining AWS, he consulted customers in various industries such as Automotive, Manufacturing, Telecommunications, Media & Entertainment, Retail, and Financial Services. He is passionate about enabling customers to accelerate their AI/ML journey in the cloud and to drive tangible business outcomes.
Bharathi Srinivasan is a Data Scientist at AWS Professional Services, where she loves to build cool things on SageMaker. She is passionate about driving business value from machine learning applications, with a focus on ethical AI. Outside of building new AI experiences for customers, Bharathi loves to write science fiction and challenge herself with endurance sports.
Jean-Michel Lourier is a Senior Data Scientist within AWS Professional Services. He leads teams implementing data driven applications side by side with AWS customers to generate business value out of their data. He’s passionate about diving into tech and learning about AI, machine learning, and their business applications. He is also an enthusiastic cyclist, taking long bike-packing trips.
Tanvi Singhal is a Data Scientist within AWS Professional Services. Her skills and areas of expertise include data science, machine learning, and big data. She supports customers in developing Machine learning models and MLops solutions within the cloud. Prior to joining AWS, she was also a consultant in various industries such as Transportation Networking, Retail and Financial Services. She is passionate about enabling customers on their data/AI journey to the cloud.
Abhishek Maligehalli Shivalingaiah is a Senior AI Services Solution Architect at AWS with a focus on Amazon Kendra. He is passionate about building applications using Amazon Kendra, generative AI, and NLP. He has around 10 years of experience in building data and AI solutions to create value for customers and enterprises. He has built a (personal) chatbot for fun to answer questions about his career and professional journey. Outside of work, he enjoys making portraits of family and friends, and loves creating artworks.

CMU Researchers Propose a Simple and Effective Attack Method that Caus …

Large language models (LLMs) are a recent advance in deep learning models for working with human language. These models understand and generate text in a human-like fashion. They are trained on huge datasets scraped from the internet, drawn from books, articles, websites, and other sources of information. They can translate languages, summarize text, answer questions, and perform a wide range of natural language processing tasks.

Recently, there has been a growing concern about their ability to generate objectionable content and the resulting consequences. Thus, significant studies have been conducted in this area.

Subsequently, researchers from Carnegie Mellon University’s School of Computer Science (SCS), the CyLab Security and Privacy Institute, and the Center for AI Safety in San Francisco have studied generating objectionable behaviors in language models. In their research, they proposed a new attack method that involves adding a suffix to a wide range of queries, resulting in a substantial increase in the likelihood that both open-source and closed-source LLMs will generate affirmative responses to queries they would typically refuse.

During their investigation, the researchers successfully applied the attack suffix to various language models, including public interfaces like ChatGPT, Bard, and Claude, and open-source LLMs such as LLaMA-2-Chat, Pythia, Falcon, and others. Consequently, the attack suffix effectively induced objectionable content in the outputs of these language models.

This method successfully generated harmful behaviors in 99 out of 100 instances on Vicuna. Additionally, they produced 88 out of 100 exact matches with a target harmful string in Vicuna’s output. The researchers also tested their attack method against other language models, such as GPT-3.5 and GPT-4, achieving up to 84% success rates. For PaLM-2, the success rate was 66%.

The researchers said that, at the moment, the direct harm to people that could be brought about by prompting a chatbot to produce objectionable or toxic content might not be especially severe. The concern is that these models will play a larger role in autonomous systems without human supervision. They further emphasized that as autonomous systems become more of a reality, it will be very important to ensure we have a reliable way to stop them from being hijacked by attacks like these.

The researchers said they didn’t set out to attack proprietary large language models and chatbots. But their research shows that even with a large, trillion-parameter, closed-source model, people can still attack it by studying freely available, smaller, and simpler open-source models and learning how to attack those.

In their research, the researchers extended their attack method by training the attack suffix on multiple prompts and models. As a result, they induced objectionable content in various public interfaces, including Google Bard and Claude. The attack also caused open-source language models like Llama 2 Chat, Pythia, Falcon, and others to exhibit objectionable behaviors.

The study demonstrated that their attack approach had broad applicability and could impact various language models, including those with public interfaces and open-source implementations. They further emphasized that we don’t have a method to stop such adversarial attacks right now, so the next step is to figure out how to fix these models.

Check out the Paper and Blog Article. All credit for this research goes to the researchers on this project.
