Enhance Amazon Lex with conversational FAQ features using LLMs

Amazon Lex is a service that allows you to quickly and easily build conversational bots (“chatbots”), virtual agents, and interactive voice response (IVR) systems for applications such as Amazon Connect.
Artificial intelligence (AI) and machine learning (ML) have been a focus for Amazon for over 20 years, and many of the capabilities that customers use with Amazon are driven by ML. Today, large language models (LLMs) are transforming the way developers and enterprises solve historically complex challenges related to natural language understanding (NLU). We recently announced Amazon Bedrock, which democratizes foundation model access so that developers can easily build and scale generative AI-based applications using familiar AWS tools and capabilities. One of the challenges enterprises face is incorporating their business knowledge into LLMs to deliver accurate and relevant responses. When leveraged effectively, enterprise knowledge bases can deliver tailored self-service and assisted-service experiences, by providing information that helps customers solve problems independently and/or by augmenting an agent’s knowledge.
Today, a bot developer can improve self-service experiences without utilizing LLMs in a couple of ways. First, by creating intents, sample utterances, and responses, thereby covering all anticipated user questions within an Amazon Lex bot. Second, developers can also integrate bots with search solutions, which can index documents stored across a wide range of repositories and find the most relevant document to answer their customer’s question. These methods are effective, but they require developer resources, which can make getting started difficult.
One of the benefits offered by LLMs is the ability to create relevant and compelling conversational self-service experiences. They do so by leveraging enterprise knowledge bases and delivering more accurate and contextual responses. This blog post introduces a powerful solution for augmenting Amazon Lex with LLM-based FAQ features using a Retrieval Augmented Generation (RAG) approach. We will review how the RAG approach augments Amazon Lex FAQ responses using your company data sources. In addition, we will demonstrate Amazon Lex integration with LlamaIndex, an open-source data framework that gives the bot developer flexibility in knowledge sources and formats. As bot developers gain confidence using LlamaIndex to explore LLM integration, they can scale the Amazon Lex capability further. They can also use enterprise search services such as Amazon Kendra, which is natively integrated with Amazon Lex.
In this solution, we showcase the practical application of an Amazon Lex chatbot with LLM-based RAG enhancement. We use the Zappos customer support use case as an example to demonstrate the effectiveness of this solution, which takes the user through an enhanced FAQ experience (with LLM), rather than directing them to fallback (default, without LLM).
Solution overview
RAG combines the strengths of traditional retrieval-based and generative AI-based approaches to Q&A systems. This methodology harnesses the power of large language models, such as Amazon Titan or open-source models (for example, Falcon), to perform generative tasks in retrieval systems. It also takes the semantic context from stored documents into account more effectively and efficiently.
RAG starts with an initial retrieval step to retrieve relevant documents from a collection based on the user’s query. It then employs a language model to generate a response by considering both the retrieved documents and the original query. By integrating RAG into Amazon Lex, we can provide accurate and comprehensive answers to user queries, resulting in a more engaging and satisfying user experience.
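Conceptually, the fallback handler performs three steps: retrieve relevant passages, build an augmented prompt, and generate an answer. The following minimal sketch illustrates that loop; the retriever and llm objects are placeholders for whichever retrieval engine and model endpoint you choose, not the exact implementation from this post.

def answer_with_rag(question, retriever, llm):
    # 1. Retrieval: fetch the passages most relevant to the user's query
    passages = retriever.retrieve(question, top_k=3)

    # 2. Augmentation: combine the retrieved passages and the question into a single prompt
    context = "\n".join(p.text for p in passages)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

    # 3. Generation: let the LLM produce a response grounded in the retrieved context
    return llm.generate(prompt)
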
The RAG approach requires document ingestion so that embeddings can be created to enable LLM-based search. The following diagram shows how the ingestion process creates the embeddings that are then used by the chatbot during fallback to answer the customer’s question.

This solution architecture lets you choose the most suitable LLM for your use case. It also provides an inference endpoint choice between Amazon Bedrock (in limited preview) and models hosted on Amazon SageMaker JumpStart, offering additional LLM flexibility.
The document is uploaded to an Amazon Simple Storage Service (Amazon S3) bucket. The S3 bucket has an event listener attached that invokes an AWS Lambda function on changes to the bucket. The event listener ingests the new document and places the embeddings in another S3 bucket. The embeddings are then used by the RAG implementation in the Amazon Lex bot during the fallback intent to answer the customer’s question. The next diagram shows the architecture of how an FAQ bot within Lex can be enhanced with LLMs and RAG.
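The ingestion Lambda function is provided in the GitHub repository; the following is a minimal sketch of that flow, assuming example bucket names and an S3 event trigger.

import os
import boto3
from llama_index import SimpleDirectoryReader, GPTVectorStoreIndex

s3 = boto3.client("s3")
EMBEDDINGS_BUCKET = os.environ.get("EMBEDDINGS_BUCKET", "faq-bot-embeddings")  # example bucket name

def lambda_handler(event, context):
    # Download the document that triggered this invocation
    record = event["Records"][0]["s3"]
    bucket, key = record["bucket"]["name"], record["object"]["key"]
    local_path = f"/tmp/{os.path.basename(key)}"
    s3.download_file(bucket, key, local_path)

    # Build the vector index (embeddings) and persist it locally
    documents = SimpleDirectoryReader(input_files=[local_path]).load_data()
    index = GPTVectorStoreIndex.from_documents(documents)
    os.makedirs("/tmp/index", exist_ok=True)
    index.storage_context.persist(persist_dir="/tmp/index")

    # Upload the persisted index files for the bot to load during the fallback intent
    for file_name in os.listdir("/tmp/index"):
        s3.upload_file(f"/tmp/index/{file_name}", EMBEDDINGS_BUCKET, file_name)
    return {"statusCode": 200}
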
Let’s explore how we can integrate RAG based on LlamaIndex into an Amazon Lex bot. We provide code examples and an AWS Cloud Development Kit (AWS CDK) import to assist you in setting up the integration. You can find the code examples in our GitHub repository. The following sections provide a step-by-step guide to help you set up the environment and deploy the necessary resources.
How RAG works with Amazon Lex
The flow of RAG involves an iterative process where the retriever component retrieves relevant passages, the question and passages help construct the prompt, and the generation component produces a response. This combination of retrieval and generation techniques allows the RAG model to take advantage of the strengths of both approaches, providing accurate and contextually appropriate answers to user questions. The workflow provides the following capabilities:

Retriever engine – The RAG model begins with a retriever component responsible for retrieving relevant documents from a large corpus. This component typically uses an information retrieval technique like TF-IDF or BM25 to rank and select documents that are likely to contain the answer to a given question. The retriever scans the document corpus and retrieves a set of relevant passages.
Prompt helper – After the retriever has identified the relevant passages, the RAG model moves to prompt creation. The prompt is a combination of the question and the retrieved passages, serving as additional context for the prompt, which is used as input to the generator component. To create the prompt, the model typically augments the question with the selected passages in a specific format.
Response generation – The prompt, consisting of the question and relevant passages, is fed into the generation component of the RAG model. The generation component is usually a language model capable of reasoning through the prompt to generate a coherent and relevant response.
Final response – Finally, the RAG model selects the highest-ranked answer as the output and presents it as the response to the original question. The selected answer can be further postprocessed or formatted as necessary before being returned to the user. In addition, the solution enables the filtering of the generated response if the retrieval results yield a low confidence score, implying that the question likely falls out of distribution (OOD).

LlamaIndex: An open-source data framework for LLM-based applications
In this post, we demonstrate the RAG solution based on LlamaIndex. LlamaIndex is an open-source data framework specifically designed to facilitate LLM-based applications. It offers a robust and scalable solution for managing document collections in different formats. With LlamaIndex, bot developers can effortlessly integrate LLM-based QA (question answering) capabilities into their applications, eliminating the complexities associated with managing solutions catered to large-scale document collections. Furthermore, this approach proves to be cost-effective for smaller-sized document repositories.
Prerequisites
You should have the following prerequisites:

An AWS account
An AWS Identity and Access Management (IAM) user and role permissions to access the following:

Amazon Lex
Lambda
Amazon SageMaker
An S3 bucket

The AWS CDK installed

Set up your development environment
The main third-party package requirements are llama_index and the SageMaker Python SDK (sagemaker). Follow the specified commands in our GitHub repository’s README to set up your environment properly.
Deploy the required resources
This step involves creating an Amazon Lex bot, S3 buckets, and a SageMaker endpoint. Additionally, you need to Dockerize the code in the Docker image directory and push the images to Amazon Elastic Container Registry (Amazon ECR) so that it can run in Lambda. Follow the specified commands in our GitHub repository’s README to deploy the services.
During this step, we demonstrate LLM hosting via SageMaker Deep Learning Containers. Adjust the settings according to your computation needs:

Model – To find a model that meets your requirements, you can explore resources like the Hugging Face model hub. It offers a variety of models such as Falcon 7B or Flan-T5-XXL. Additionally, you can find detailed information about various officially supported model architectures, helping you make an informed decision. For more information about different model types, refer to optimized architectures.
Model inference endpoint – Define the path of the model (for example, Falcon 7B), choose your instance type (for example, g5.4xlarge), and use quantization (for example, int-8 quantization). Note: This solution provides you the flexibility to choose another model inferencing endpoint. You can also use Amazon Bedrock, which provides access to other LLMs such as Amazon Titan. A hedged deployment sketch follows this list.
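As one illustration of the SageMaker hosting option, the following hedged sketch deploys Falcon 7B Instruct with the Hugging Face LLM Deep Learning Container. The model ID, instance type, and quantization setting are example choices rather than requirements.

import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()
image_uri = get_huggingface_llm_image_uri("huggingface")  # text-generation-inference container

llm_model = HuggingFaceModel(
    role=role,
    image_uri=image_uri,
    env={
        "HF_MODEL_ID": "tiiuae/falcon-7b-instruct",   # example model from the Hugging Face hub
        "SM_NUM_GPUS": "1",
        "HF_MODEL_QUANTIZE": "bitsandbytes",          # example int-8 quantization setting
    },
)

llm_predictor = llm_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.4xlarge",  # example instance type
)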

Set up your document index via LlamaIndex
To set up your document index, first upload your document data. We assume that you have the source of your FAQ content, such as a PDF or text file.
After the document data is uploaded, the LlamaIndex system will automatically initiate the process of creating the document index. This task is performed by a Lambda function, which generates the index and saves it to an S3 bucket.
To enable efficient retrieval of relevant information, configure the document retriever using the LlamaIndex Retriever Query Engine. This engine offers several customization options, such as the following (a configuration sketch follows the list):

Embedding models – You can choose your embedding model, such as Hugging Face embedding.
Confidence cutoff – Specify a confidence cutoff threshold to determine the quality of retrieval results. If the confidence score falls below this threshold, you can choose to provide out-of-scope responses, indicating that the query is beyond the scope of the indexed documents.
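The following is a minimal configuration sketch of such a query engine. It assumes the same llama_index version used in the code later in this post; the Hugging Face embedding model and the cutoff value are example choices, not requirements.

from llama_index import GPTVectorStoreIndex, LangchainEmbedding, ServiceContext
from llama_index.indices.postprocessor import SimilarityPostprocessor
from langchain.embeddings import HuggingFaceEmbeddings

# Use a Hugging Face sentence-embedding model for retrieval (example choice)
embed_model = LangchainEmbedding(HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2"))
service_context = ServiceContext.from_defaults(embed_model=embed_model)

# "documents" is the list of ingested FAQ documents
index = GPTVectorStoreIndex.from_documents(documents, service_context=service_context)

# Apply a similarity (confidence) cutoff; results below it can trigger an out-of-scope response
query_engine = index.as_query_engine(
    similarity_top_k=3,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.7)],
)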

Test the integration
Define your bot definition with a fallback intent and use the Amazon Lex console to test your FAQ requests. For more details, refer to the GitHub repository. The following screenshot shows an example conversation with the bot.

Tips to boost your bot efficiency
The following tips can further improve the efficiency of your bot:

Index storage – Store your index in an S3 bucket or a service with vector database capabilities such as Amazon OpenSearch. By utilizing cloud-based storage solutions, you can enhance the accessibility and scalability of your index, leading to faster retrieval times and improved overall performance. Also, refer to this blog post for an Amazon Lex bot that uses an Amazon Kendra search solution.
Retrieval optimization – Experiment with different sizes of embedding models for the retriever. The choice of embedding model can significantly impact the input requirements of your LLM. Finding the optimal balance between model size and retrieval performance can result in improved efficiency and faster response times.
Prompt engineering – Experiment with different prompt formats, lengths, and styles to optimize the performance and quality of your bot’s answers.
LLM model selection – Select the most suitable LLM model for your specific use case. Consider factors such as model size, language capabilities, and compatibility with your application requirements. Choosing the right LLM model ensures optimal performance and efficient utilization of system resources.

Contact center conversations can span from self-service to a live human interaction. For use cases involving human-to-human interactions over Amazon Connect, you can use Amazon Connect Wisdom to search and find content across multiple repositories, such as frequently asked questions (FAQs), wikis, articles, and step-by-step instructions for handling different customer issues.
Clean up
To avoid incurring future expenses, proceed with deleting all the resources that were deployed as part of this exercise. We have provided a script to shut down the SageMaker endpoint gracefully. Usage details are in the README. Additionally, to remove all the other resources you can run cdk destroy in the same directory as the other cdk commands to deprovision all the resources in your stack.
Summary
This post discussed the following steps to enhance Amazon Lex with LLM-based QA features using the RAG strategy and LlamaIndex:

Install the necessary dependencies, including LlamaIndex libraries
Set up model hosting via Amazon SageMaker or Amazon Bedrock (in limited preview)
Configure LlamaIndex by creating an index and populating it with relevant documents
Integrate RAG into Amazon Lex by modifying the configuration and configuring RAG to use LlamaIndex for document retrieval
Test the integration by engaging in conversations with the chatbot and observing its retrieval and generation of accurate responses

By following these steps, you can seamlessly incorporate powerful LLM-based QA capabilities and efficient document indexing into your Amazon Lex chatbot, resulting in more accurate, comprehensive, and contextually aware interactions with users. As a follow up, we also invite you to review our next blog post, which explores enhancing the Amazon Lex FAQ experience using URL ingestion and LLMs.

About the authors
Max Henkel-Wallace is a Software Development Engineer at AWS Lex. He enjoys leveraging technology to maximize customer success. Outside of work he is passionate about cooking, spending time with friends, and backpacking.
Song Feng is a Senior Applied Scientist at AWS AI Labs, specializing in Natural Language Processing and Artificial Intelligence. Her research explores various aspects of these fields including document-grounded dialogue modeling, reasoning for task-oriented dialogues, and interactive text generation using multimodal data.
Saket Saurabh is an engineer with the AWS Lex team. He works on improving the Lex developer experience to help developers build more human-like chatbots. Outside of work, he enjoys traveling, discovering diverse cuisines, and learning about different cultures.

Enhance Amazon Lex with LLMs and improve the FAQ experience using URL ingestion

In today’s digital world, most consumers would rather find answers to their customer service questions on their own than take the time to reach out to businesses and/or service providers. This blog post explores an innovative solution to build a question and answer chatbot in Amazon Lex that uses existing FAQs from your website. This AI-powered tool can provide quick, accurate responses to real-world inquiries, allowing the customer to quickly and easily solve common problems independently.
Single URL ingestion
Many enterprises publish a set of FAQ answers for their customers on their website. In this case, we want to offer customers a chatbot that can answer their questions from our published FAQs. In the blog post titled Enhance Amazon Lex with conversational FAQ features using LLMs, we demonstrated how you can use a combination of Amazon Lex and LlamaIndex to build a chatbot powered by your existing knowledge sources, such as PDF or Word documents. To support a simple FAQ based on a website of FAQs, we need to create an ingestion process that can crawl the website and create embeddings that LlamaIndex can use to answer customer questions. In this case, we build on the bot created in the previous blog post, which queries those embeddings with a user’s utterance and returns the answer from the website FAQs.
The following diagram shows how the ingestion process and the Amazon Lex bot work together for our solution.

In the solution workflow, the website with FAQs is ingested via AWS Lambda. This Lambda function crawls the website and stores the resulting text in an Amazon Simple Storage Service (Amazon S3) bucket. The S3 bucket then triggers a Lambda function that uses LlamaIndex to create embeddings that are stored in Amazon S3. When a question from an end-user arrives, such as “What is your return policy?”, the Amazon Lex bot uses its Lambda function to query the embeddings using a RAG-based approach with LlamaIndex. For more information about this approach and the prerequisites, refer to the blog post, Enhance Amazon Lex with conversational FAQ features using LLMs.
After the prerequisites from the aforementioned blog post are complete, the first step is to ingest the FAQs into a document repository that can be vectorized and indexed by LlamaIndex. The following code shows how to accomplish this:

import logging
import sys
import requests
import html2text
from llama_index.readers.schema.base import Document
from llama_index import GPTVectorStoreIndex
from typing import List

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

class EZWebLoader:

    def __init__(self, default_header: str = None):
        self._html_to_text_parser = html2text
        if default_header is None:
            self._default_header = {"User-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36"}
        else:
            self._default_header = default_header

    def load_data(self, urls: List[str], headers: str = None) -> List[Document]:
        if headers is None:
            headers = self._default_header

        documents = []
        for url in urls:
            response = requests.get(url, headers=headers).text
            response = self._html_to_text_parser.html2text(response)  # reduce HTML to plain text
            documents.append(Document(response))
        return documents

url = "http://www.zappos.com/general-questions"
loader = EZWebLoader()
documents = loader.load_data([url])
index = GPTVectorStoreIndex.from_documents(documents)
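As a quick sanity check before wiring the index into the bot, you can query it directly. This usage sketch assumes the index built above.

query_engine = index.as_query_engine()
response = query_engine.query("Does Zappos have gift cards?")
print(response.response)  # the answer synthesized from the crawled FAQ content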

In the preceding example, we take a predefined FAQ website URL from Zappos and ingest it using the EZWebLoader class. With this class, we navigate to the URL and load all of the questions on the page into an index. We can now ask a question like “Does Zappos have gift cards?” and get the answers directly from our FAQs on the website. The following screenshot shows the Amazon Lex bot test console answering that question from the FAQs.

We were able to achieve this because we had crawled the URL in the first step and created embeddings that LlamaIndex could use to search for the answer to our question. Our bot’s Lambda function shows how this search is run whenever the fallback intent is returned:

import time
import json
import os
import logging
import boto3
from llama_index import StorageContext, load_index_from_storage

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

def download_docstore():
    # Create an S3 client
    s3 = boto3.client('s3')

    # List all objects in the S3 bucket and download each one
    try:
        bucket_name = 'faq-bot-storage-001'
        s3_response = s3.list_objects_v2(Bucket=bucket_name)

        if 'Contents' in s3_response:
            for item in s3_response['Contents']:
                file_name = item['Key']
                logger.debug("Downloading to /tmp/" + file_name)
                s3.download_file(bucket_name, file_name, '/tmp/' + file_name)

            logger.debug('All files downloaded from S3 and written to local filesystem.')

    except Exception as e:
        logger.error(e)
        raise e

# download the doc store locally
download_docstore()

storage_context = StorageContext.from_defaults(persist_dir="/tmp/")
# load index
index = load_index_from_storage(storage_context)
query_engine = index.as_query_engine()

def lambda_handler(event, context):
    """
    Route the incoming request based on intent.
    The JSON body of the request is provided in the event slot.
    """
    # By default, treat the user request as coming from the America/New_York time zone.
    os.environ['TZ'] = 'America/New_York'
    time.tzset()
    logger.debug("===== START LEX FULFILLMENT ====")
    logger.debug(event)
    slots = {}
    if "currentIntent" in event and "slots" in event["currentIntent"]:
        slots = event["currentIntent"]["slots"]
    intent = event["sessionState"]["intent"]

    dialogaction = {"type": "Delegate"}
    message = []
    if str.lower(intent["name"]) == "fallbackintent":
        # execute query from the input given by the user
        response = str.strip(query_engine.query(event["inputTranscript"]).response)
        dialogaction["type"] = "Close"
        message.append({'content': f'{response}', 'contentType': 'PlainText'})

    final_response = {
        "sessionState": {
            "dialogAction": dialogaction,
            "intent": intent
        },
        "messages": message
    }

    logger.debug(json.dumps(final_response, indent=1))
    logger.debug("===== END LEX FULFILLMENT ====")

    return final_response

This solution works well when a single webpage has all the answers. However, most FAQ sites are not built on a single page. For instance, in our Zappos example, if we ask the question “Do you have a price matching policy?”, then we get a less-than-satisfactory answer, as shown in the following screenshot.

In the preceding interaction, the price-matching policy answer isn’t helpful for our user. This answer is short because the FAQ referenced is a link to a specific page about the price matching policy and our web crawl was only for the single page. Achieving better answers will mean crawling these links as well. The next section shows how to get answers to questions that require two or more levels of page depth.
N-level crawling
When we crawl a web page for FAQ knowledge, the information we want can be contained in linked pages. For example, in our Zappos example, we ask the question “Do you have a price matching policy?” and the answer is “Yes please visit <link> to learn more.” If someone asks “What is your price matching policy?” then we want to give a complete answer with the policy. Achieving this means we need to traverse links to get the actual information for our end-user. During the ingestion process, we can use our web loader to find the anchor links to other HTML pages and then traverse them. The following code change to our web crawler allows us to find links in the pages we crawl. It also includes some additional logic to avoid circular crawling and to allow filtering by a prefix.

import logging
import re
import requests
import html2text
from llama_index.readers.schema.base import Document
from llama_index import GPTVectorStoreIndex
from typing import List

def find_http_urls_in_parentheses(s: str, prefix: str = None):
    pattern = r'\((https?://[^)]+)\)'  # match URLs wrapped in parentheses, as produced by html2text links
    urls = re.findall(pattern, s)

    matched = []
    if prefix is not None:
        for url in urls:
            if str(url).startswith(prefix):
                matched.append(url)
    else:
        matched = urls

    return list(set(matched))  # remove duplicates by converting to a set, then convert back to a list

class EZWebLoader:

    def __init__(self, default_header: str = None):
        self._html_to_text_parser = html2text
        if default_header is None:
            self._default_header = {"User-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36"}
        else:
            self._default_header = default_header

    def load_data(self,
                  urls: List[str],
                  num_levels: int = 0,
                  level_prefix: str = None,
                  headers: str = None) -> List[Document]:

        logging.info(f"Number of urls: {len(urls)}.")

        if headers is None:
            headers = self._default_header

        documents = []
        visited = {}
        for url in urls:
            q = [url]
            depth = num_levels
            for page in q:
                if page not in visited:  # prevent cycles by checking to see if we already crawled a link
                    logging.info(f"Crawling {page}")
                    visited[page] = True  # add entry to visited to prevent re-crawling pages
                    response = requests.get(page, headers=headers).text
                    response = self._html_to_text_parser.html2text(response)  # reduce html to text
                    documents.append(Document(response))
                    if depth > 0:
                        # crawl linked pages
                        ingest_urls = find_http_urls_in_parentheses(response, level_prefix)
                        logging.info(f"Found {len(ingest_urls)} pages to crawl.")
                        q.extend(ingest_urls)
                        depth -= 1  # reduce the depth counter so we go only num_levels deep in our crawl
                else:
                    logging.info(f"Skipping {page} as it has already been crawled")
        logging.info(f"Number of documents: {len(documents)}.")
        return documents

url = "http://www.zappos.com/general-questions"
loader = EZWebLoader()
# crawl the site with 1 level depth and prefix of "/c/" for customer service root
documents = loader.load_data([url],
                             num_levels=1, level_prefix="https://www.zappos.com/c/")
index = GPTVectorStoreIndex.from_documents(documents)

In the preceding code, we introduce the ability to crawl N levels deep, and we give a prefix that allows us to restrict crawling to only things that begin with a certain URL pattern. In our Zappos example, the customer service pages are all rooted at zappos.com/c, so we include that as a prefix to limit our crawls to a smaller and more relevant subset. The code shows how we can ingest up to two levels deep. Our bot’s Lambda logic remains the same because nothing has changed except that the crawler ingests more documents.
We now have all the documents indexed and we can ask a more detailed question. In the following screenshot, our bot provides the correct answer to the question “Do you have a price matching policy?”

We now have a complete answer to our question about price matching. Instead of simply being told “Yes see our policy,” it gives us the details from the second-level crawl.
Clean up
To avoid incurring future expenses, proceed with deleting all the resources that were deployed as part of this exercise. We have provided a script to shut down the SageMaker endpoint gracefully. Usage details are in the README. Additionally, to remove all the other resources, you can run cdk destroy in the same directory as the other cdk commands to deprovision all the resources in your stack.
Conclusion
The ability to ingest a set of FAQs into a chatbot enables your customers to find the answers to their questions with straightforward, natural language queries. By combining the built-in support in Amazon Lex for fallback handling with a RAG solution such as LlamaIndex, we can provide a quick path for our customers to get satisfying, curated, and approved answers to FAQs. By applying N-level crawling in our solution, we can allow for answers that span multiple FAQ links and provide deeper answers to our customers’ queries. By following these steps, you can seamlessly incorporate powerful LLM-based Q&A capabilities and efficient URL ingestion into your Amazon Lex chatbot. This results in more accurate, comprehensive, and contextually aware interactions with users.

About the authors
Max Henkel-Wallace is a Software Development Engineer at AWS Lex. He enjoys leveraging technology to maximize customer success. Outside of work he is passionate about cooking, spending time with friends, and backpacking.
Song Feng is a Senior Applied Scientist at AWS AI Labs, specializing in Natural Language Processing and Artificial Intelligence. Her research explores various aspects of these fields including document-grounded dialogue modeling, reasoning for task-oriented dialogues, and interactive text generation using multimodal data.
John Baker is a Principal SDE at AWS where he works on Natural Language Processing, Large Language Models and other ML/AI related projects. He has been with Amazon for 9+ years and has worked across AWS, Alexa and Amazon.com. In his spare time, John enjoys skiing and other outdoor activities throughout the Pacific Northwest.

Build an email spam detector using Amazon SageMaker

Spam emails, also known as junk mail, are sent to a large number of users at once and often contain scams, phishing content, or cryptic messages. Spam emails are sometimes sent manually by a human, but most often they are sent using a bot. Examples of spam emails include fake ads, chain emails, and impersonation attempts. There is a risk that a particularly well-disguised spam email may land in your inbox, which can be dangerous if clicked on. It’s important to take extra precautions to protect your device and sensitive information.
As technology improves, detecting spam emails becomes more challenging because of their changing nature. Spam is quite different from other types of security threats. It may at first appear like an annoying message and not a threat, but it has an immediate effect. Also, spammers often adopt new techniques. Organizations that provide email services want to minimize spam as much as possible to avoid any damage to their end customers.
In this post, we show how straightforward it is to build an email spam detector using Amazon SageMaker. The built-in BlazingText algorithm offers optimized implementations of Word2vec and text classification algorithms. Word2vec is useful for various natural language processing (NLP) tasks, such as sentiment analysis, named entity recognition, and machine translation. Text classification is essential for applications like web searches, information retrieval, ranking, and document classification.
Solution overview
This post demonstrates how you can set up an email spam detector and filter spam emails using SageMaker. Let’s see how a spam detector typically works, as shown in the following diagram.

Emails are sent through a spam detector. An email is sent to the spam folder if the spam detector detects it as spam. Otherwise, it’s sent to the customer’s inbox.
We walk you through the following steps to set up our spam detector model:

Download the sample dataset from the GitHub repo.
Load the data in an Amazon SageMaker Studio notebook.
Prepare the data for the model.
Train, deploy, and test the model.

Prerequisites
Before diving into this use case, complete the following prerequisites:

Set up an AWS account.
Set up a SageMaker domain.
Create an Amazon Simple Storage Service (Amazon S3) bucket. For instructions, see Create your first S3 bucket.

Download the dataset
Download the email_dataset.csv from GitHub and upload the file to the S3 bucket.
The BlazingText algorithm expects a single preprocessed text file with space-separated tokens. Each line in the file should contain a single sentence. If you need to train on multiple text files, concatenate them into one file and upload the file in the respective channel.
Load the data in SageMaker Studio
To perform the data load, complete the following steps:

Download the spam_detector.ipynb file from GitHub and upload the file in SageMaker Studio.
In your Studio notebook, open the spam_detector.ipynb notebook.
If you are prompted to choose a Kernel, choose the Python 3 (Data Science 3.0) kernel and choose Select. If not, verify that the right kernel has been automatically selected.

Import the required Python library and set the roles and the S3 buckets. Specify the S3 bucket and prefix where you uploaded email_dataset.csv.

Run the data load step in the notebook.

Check if the dataset is balanced or not based on the Category labels.

We can see our dataset is balanced.
Prepare the data
The BlazingText algorithm expects the data in the following format:

__label__<label> "<features>"

Here’s an example:

__label__0 "This is HAM"
__label__1 "This is SPAM"

Check Training and Validation Data Format for the BlazingText Algorithm.
You now run the data preparation step in the notebook.

First, you need to convert the Category column to an integer. The following cell replaces the SPAM value with 1 and the HAM value with 0.

The next cell adds the prefix __label__ to each Category value and tokenizes the Message column.

The next step is to split the dataset into train and validation datasets and upload the files to the S3 bucket.
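The following condensed sketch mirrors those preparation cells. It assumes the dataset has Category and Message columns; the file names and bucket placeholder are examples, and the notebook in the GitHub repository remains the authoritative version.

import boto3
import nltk
import pandas as pd
from sklearn.model_selection import train_test_split

nltk.download("punkt")
df = pd.read_csv("email_dataset.csv")

# Map labels to integers and add the __label__ prefix expected by BlazingText
df["Category"] = df["Category"].map({"HAM": 0, "SPAM": 1})
df["labeled_text"] = "__label__" + df["Category"].astype(str) + " " + df["Message"].apply(
    lambda m: " ".join(nltk.word_tokenize(str(m).lower()))
)

# Split into train and validation files and upload them to the S3 bucket
train, validation = train_test_split(df["labeled_text"], test_size=0.2, random_state=42)
train.to_csv("train.txt", index=False, header=False)
validation.to_csv("validation.txt", index=False, header=False)

bucket = "<your-bucket>"  # example placeholder
s3 = boto3.client("s3")
s3.upload_file("train.txt", bucket, "spam-detector/train/train.txt")
s3.upload_file("validation.txt", bucket, "spam-detector/validation/validation.txt")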

Train the model
To train the model, complete the following steps in the notebook (a condensed code sketch follows these steps):

Set up the BlazingText estimator and create an estimator instance passing the container image.

Set the learning mode hyperparameter to supervised.

BlazingText has both unsupervised and supervised learning modes. Our use case is text classification, which is supervised learning.

Create the train and validation data channels.

Start training the model.

Get the accuracy of the train and validation dataset.
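The following condensed sketch of those training steps uses example hyperparameters, instance type, and S3 paths; adjust them to your dataset and budget.

import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = sagemaker.get_execution_role()
bucket = "<your-bucket>"  # example placeholder

# Retrieve the BlazingText container image and configure the estimator
container = image_uris.retrieve("blazingtext", session.boto_region_name)
bt_model = Estimator(
    container,
    role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path=f"s3://{bucket}/spam-detector/output",
    sagemaker_session=session,
)
bt_model.set_hyperparameters(mode="supervised", epochs=10, min_count=2, vector_dim=10)

# Create the train and validation channels, then start training
train_channel = TrainingInput(f"s3://{bucket}/spam-detector/train", content_type="text/plain")
validation_channel = TrainingInput(f"s3://{bucket}/spam-detector/validation", content_type="text/plain")
bt_model.fit({"train": train_channel, "validation": validation_channel})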

Deploy the model
In this step, we deploy the trained model as an endpoint. Choose your preferred instance type for hosting; a deployment sketch follows.
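A minimal deployment sketch, assuming the estimator trained above and an example instance type:

from sagemaker.serializers import JSONSerializer

predictor = bt_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",   # example instance type
    serializer=JSONSerializer(),    # send JSON payloads to the endpoint
)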

Test the model
Let’s provide an example of three email messages that we want to get predictions for:

Click on below link, provide your details and win this award
Best summer deal here
See you in the office on Friday.

Tokenize the email message and specify the payload to use when calling the REST API.

Now we can predict the email classification for each email. Call the predict method of the text classifier, passing the tokenized sentence instances (payload) into the data argument.
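A hedged sketch of the tokenization and prediction call, assuming the predictor deployed above; BlazingText expects a JSON payload with an instances list of space-separated sentences.

import nltk

sentences = [
    "Click on below link, provide your details and win this award",
    "Best summer deal here",
    "See you in the office on Friday.",
]
tokenized = [" ".join(nltk.word_tokenize(s.lower())) for s in sentences]

payload = {"instances": tokenized}
predictions = predictor.predict(payload)
print(predictions)  # a JSON document with a __label__0/__label__1 prediction and probability per message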

Clean up
Finally, you can delete the endpoint to avoid any unexpected costs.

Also, delete the data files from the S3 bucket.
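A minimal cleanup sketch, assuming the predictor and example object keys used earlier:

import boto3

predictor.delete_endpoint()  # stop incurring endpoint charges

s3 = boto3.resource("s3")
s3.Object(bucket, "spam-detector/train/train.txt").delete()
s3.Object(bucket, "spam-detector/validation/validation.txt").delete()
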
Conclusion
In this post, we walked you through the steps to create an email spam detector using the SageMaker BlazingText algorithm. With the BlazingText algorithm, you can scale to large datasets. BlazingText is used for textual analysis and text classification problems, and has both unsupervised and supervised learning modes. You can use the algorithm for use cases like customer sentiment analysis and text classification.
To learn more about the BlazingText algorithm, check out BlazingText algorithm.

About the Author

Dhiraj Thakur is a Solutions Architect with Amazon Web Services. He works with AWS customers and partners to provide guidance on enterprise cloud adoption, migration, and strategy. He is passionate about technology and enjoys building and experimenting in the analytics and AI/ML space.

MIT Researchers Achieve a Breakthrough in Privacy Protection for Machine Learning Models with Probably Approximately Correct (PAC) Privacy

MIT researchers have made significant progress in addressing the challenge of protecting sensitive data encoded within machine-learning models. A team of scientists has developed a machine-learning model that can accurately predict whether a patient has cancer from lung scan images. However, sharing the model with hospitals worldwide poses a significant risk of potential data extraction by malicious agents. To address this issue, the researchers have introduced a novel privacy metric called Probably Approximately Correct (PAC) Privacy, along with a framework that determines the minimal amount of noise required to protect sensitive data.

Conventional privacy approaches, such as Differential Privacy, focus on preventing an adversary from distinguishing the usage of specific data by adding enormous amounts of noise, which reduces the model’s accuracy. PAC Privacy takes a different perspective by evaluating an adversary’s difficulty in reconstructing parts of the sensitive data even after the noise has been added. For instance, if the sensitive data are human faces, differential privacy would prevent the adversary from determining if a specific individual’s face was in the dataset. In contrast, PAC Privacy explores whether an adversary could extract an approximate silhouette that could be recognized as a particular individual’s face.

To implement PAC Privacy, the researchers developed an algorithm that determines the optimal amount of noise to be added to a model, guaranteeing privacy even against adversaries with infinite computing power. The algorithm relies on the uncertainty or entropy of the original data from the adversary’s perspective. By subsampling data and running the machine-learning training algorithm multiple times, the algorithm compares the variance across different outputs to determine the necessary amount of noise. A smaller variance indicates that less noise is required.
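The following toy sketch is only a conceptual approximation of the idea described above (subsample, retrain, measure output variance, scale the noise); it is not the researchers' actual PAC Privacy algorithm.

import numpy as np

def calibrate_noise(data, train_fn, n_trials=20, subsample_frac=0.5, noise_multiplier=1.0):
    """Toy illustration: estimate output variance across subsampled training runs and add matching noise."""
    rng = np.random.default_rng(0)
    outputs = []
    for _ in range(n_trials):
        idx = rng.choice(len(data), size=int(len(data) * subsample_frac), replace=False)
        outputs.append(train_fn(data[idx]))      # train_fn returns model parameters as a vector
    outputs = np.stack(outputs)
    variance = outputs.var(axis=0)               # low variance (stable training) -> less noise needed
    noise = rng.normal(0.0, noise_multiplier * np.sqrt(variance))
    return outputs.mean(axis=0) + noise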

One of the key advantages of the PAC Privacy algorithm is that it doesn’t require knowledge of the model’s inner workings or the training process. Users can specify their desired confidence level regarding the adversary’s ability to reconstruct the sensitive data, and the algorithm provides the optimal amount of noise to achieve that goal. However, it’s important to note that the algorithm does not estimate the loss of accuracy resulting from adding noise to the model. Furthermore, implementing PAC Privacy can be computationally expensive due to the repeated training of machine-learning models on various subsampled datasets.

To enhance PAC Privacy, researchers suggest modifying the machine-learning training process to increase stability, which reduces the variance between subsample outputs. This approach would reduce the algorithm’s computational burden and minimize the amount of noise needed. Additionally, more stable models often exhibit lower generalization errors, leading to more accurate predictions on new data.

While the researchers acknowledge the need for further exploration of the relationship between stability, privacy, and generalization error, their work presents a promising step forward in protecting sensitive data in machine-learning models. By leveraging PAC Privacy, engineers can develop models that safeguard training data while maintaining accuracy in real-world applications. With the potential for significantly reducing the amount of noise required, this technique opens up new possibilities for secure data sharing in the healthcare domain and beyond.

Check out the Paper.

Top Computer Vision Tools/Platforms in 2023

Computer vision enables computers and systems to extract useful information from digital photos, videos, and other visual inputs and to conduct actions or offer recommendations in response to that information. Computer vision gives machines the ability to perceive, observe, and understand, much like artificial intelligence gives them the capacity to think.

Human vision has an advantage over computer vision because it has been around longer. With a lifetime of context, human sight has the advantage of learning how to distinguish between things, determine their distance from the viewer, determine whether they are moving, and determine whether an image is correct.

With cameras, data, and algorithms instead of retinas, optic nerves, and the visual cortex, computer vision teaches computers to execute similar tasks in much less time. A system trained to inspect items or monitor a production asset can swiftly outperform humans since it can examine thousands of products or processes per minute while spotting imperceptible flaws or problems.

Energy, utilities, manufacturing, and the automobile industries all use computer vision, and the market is still expanding.

A few typical jobs that computer vision systems can be utilized for are as follows:

Classification of objects. The system analyzes visual data before categorizing an object in a photo or video under a predetermined heading. The algorithm, for instance, can identify a dog among all the items in the image.

Identification of the item. The system analyzes visual data and recognizes a specific object in a picture or video. For instance, the algorithm may pick out a particular dog from the group of dogs in the image.

Tracking of objects. The system analyzes video, identifies the object (or objects) that satisfy the search criteria, and follows that object’s progress.

Top Computer Vision Tools

Kili Technology’s Video Annotation Tool

Kili Technology’s video annotation tool is designed to simplify and accelerate the creation of high-quality datasets from video files. The tool supports a variety of labeling tools, including bounding boxes, polygons, and segmentation, allowing for precise annotation. With advanced tracking capabilities, you can easily navigate through frames and review all your labels in an intuitive Explore view.

The tool supports various video formats and integrates seamlessly with popular cloud storage providers, ensuring a smooth integration with your existing machine learning pipeline. Kili Technology’s video annotation tool is the ultimate toolkit for optimizing your labeling processes and constructing powerful datasets.

OpenCV

OpenCV is a software library for machine learning and computer vision. Developed to offer a standard infrastructure for computer vision applications, OpenCV gives users access to more than 2,500 traditional and cutting-edge algorithms.

These algorithms may be used to identify faces, remove red eyes, identify objects, extract 3D models of objects, track moving objects, and stitch together numerous frames into a high-resolution image, among other things.
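As a small illustration of that API (not taken from the article), the following snippet runs OpenCV's bundled Haar cascade face detector on a placeholder image path.

import cv2

# Load the face-detection cascade that ships with OpenCV
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_detector = cv2.CascadeClassifier(cascade_path)

image = cv2.imread("photo.jpg")  # placeholder path
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
faces = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# Draw a rectangle around each detected face and save the result
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("faces.jpg", image)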

Viso Suite

A complete platform for computer vision development, deployment, and monitoring, Viso Suite enables enterprises to create practical computer vision applications. The best-in-class software stack for computer vision, which is the foundation of the no-code platform, includes CVAT, OpenCV, OpenVINO, TensorFlow, or PyTorch.

Image annotation, model training, model management, no-code application development, device management, IoT communication, and bespoke dashboards are just a few of the 15 components that make up Viso Suite. Businesses and governmental bodies worldwide use Viso Suite to create and manage their portfolio of computer vision applications (for industrial automation, visual inspection, remote monitoring, and more).

TensorFlow

TensorFlow is one of the most well-known end-to-end open-source machine learning platforms, which offers a vast array of tools, resources, and frameworks. TensorFlow is beneficial for developing and implementing machine learning-based computer vision applications.

One of the most straightforward computer vision tools, TensorFlow, enables users to create machine learning models for computer vision-related tasks like facial recognition, picture categorization, object identification, and more. Like OpenCV, Tensorflow supports several languages, including Python, C, C++, Java, and JavaScript.

CUDA

NVIDIA created the parallel computing platform and application programming interface (API) model called CUDA (short for Compute Unified Device Architecture). It enables programmers to speed up processing-intensive programs by utilizing the capabilities of GPUs (Graphics Processing Units).

The NVIDIA Performance Primitives (NPP) library, which offers GPU-accelerated image, video, and signal processing operations for various domains, including computer vision, is part of the toolkit. In addition, multiple applications like face recognition, image editing, rendering 3D graphics, and others benefit from the CUDA architecture. For Edge AI implementations, real-time image processing with Nvidia CUDA is available, enabling on-device AI inference on edge devices like the Jetson TX2.

MATLAB

Image, video, and signal processing, deep learning, machine learning, and other applications can all benefit from the programming environment MATLAB. It includes a computer vision toolbox with numerous features, applications, and algorithms to assist you in creating remedies for computer vision-related problems.

Keras

A Python-based open-source software package called Keras serves as an interface for the TensorFlow framework for machine learning. It is especially appropriate for novices because it enables speedy neural network model construction while offering backend help.

SimpleCV

SimpleCV is a set of open-source libraries and software that makes it simple to create machine vision applications. Its framework gives you access to several powerful computer vision libraries, like OpenCV, without requiring a thorough understanding of complex ideas like bit depths, color schemes, buffer management, or file formats. Python-based SimpleCV can run on various platforms, including Mac, Windows, and Linux.

BoofCV

The Java-based computer vision program BoofCV was explicitly created for real-time computer vision applications. It is a comprehensive library with all the fundamental and sophisticated capabilities needed to develop a computer vision application. It is open-source and distributed under the Apache 2.0 license, making it available for both commercial and academic use without charge.

CAFFE

CAFFE (Convolutional Architecture for Fast Feature Embedding) is a computer vision and deep learning framework created at the University of California, Berkeley. The framework supports a variety of deep learning architectures for image segmentation and classification and is written in the C++ programming language. Due to its speed and image processing capabilities, it is beneficial for both research and industry implementation.

OpenVINO

A comprehensive computer vision tool, OpenVINO (Open Visual Inference and Neural Network Optimization), helps create software that simulates human vision. It is a free cross-platform toolkit designed by Intel. Models for numerous tasks, including object identification, face recognition, colorization, movement recognition, and others, are included in the OpenVINO toolbox.

DeepFace

DeepFace is currently the most popular open-source computer vision library for deep learning facial recognition. The library provides a simple method for using Python to carry out face recognition-based computer vision.

YOLO

One of the fastest computer vision tools in 2022 is You Only Look Once (YOLO). It was created in 2016 by Joseph Redmon and Ali Farhadi to be used for real-time object detection. YOLO, the fastest object detection tool available, applies a neural network to the entire image and then divides it into grids. The odds of each grid are then predicted by the software concurrently. After the hugely successful YOLOv3 and YOLOv4, YOLOR had the best performance up until YOLOv7, published in 2022, overtook it.

FastCV

FastCV is an open-source image processing, machine learning, and computer vision library. It includes numerous cutting-edge computer vision algorithms along with examples and demos. As a pure Java library with no external dependencies, FastCV’s API ought to be very easy to understand. It is, therefore, perfect for novices or students who want to swiftly include computer vision into their ideas and prototypes.

To easily integrate computer vision functionality into our mobile apps and games, the company also integrated FastCV on Android.

Scikit-image

One of the best open-source computer vision tools for processing images in Python is the Scikit-image module. Scikit-image allows you to conduct simple operations like thresholding, edge detection, and color space conversions.

Although it’s not a program you’ll use frequently, it has several practical uses. For instance, with a bit of setup, you could use scikit-image on your camera to snap a picture using infrared light or find watermarks on photos. These are only a few examples of what scikit-image can be used for. If all else fails, image manipulation is an option.
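As a small illustration (not from the article), the following snippet applies the thresholding and edge-detection operations mentioned above to a sample image bundled with scikit-image.

from skimage import data, feature, filters

image = data.camera()                       # built-in grayscale sample image
threshold = filters.threshold_otsu(image)   # simple global thresholding
binary = image > threshold
edges = feature.canny(image, sigma=2)       # Canny edge detection
print(binary.shape, edges.sum())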

Meet PolyLM (Polyglot Large Language Model): An Open Source Multilingual LLM trained on 640B Tokens, Available In Two Model Sizes 1.7B and 13B

With the recent introduction of Large Language Models (LLMs), its versatility and capabilities have drawn everyone’s interest in the Artificial Intelligence sector. These models have been trained on massive amounts of data and possess some brilliant human-imitating abilities in understanding, reasoning, and generating text based on natural language instructions. Having good performance in zero-shot and few-shot tasks, these models can handle unforeseen challenges based on instructions given in natural language by being fine-tuned on various sets of tasks.  

Current LLMs and their development focus on English and resource-rich languages. Most of the existing LLMs have been specifically designed and trained for the English language, resulting in a predominant bias towards English in the research and development of these models. To address this limitation, a team of researchers from DAMO Academy and Alibaba Group have proposed a multilingual LLM called POLYLM (Polyglot Large Language Model). Unlike existing multilingual LLMs that lack a 13B model, the team has released POLYLM-13B and POLYLM-1.7B to facilitate usage.

POLYLM has been built using a massive dataset of 640B tokens from publicly accessible sources, including Wikipedia, mC4, and CC-100. The team has also suggested a curricular learning technique to address the issue of insufficient data for low-resource languages. This method involves gradually increasing the ratio of high-quality, low-resource languages during training while initially focusing more on English. The focus has been on transferring general knowledge from English to other languages.

The team has also developed MULTIALPACA, a multilingual instruction dataset, for the supervised fine-tuning (SFT) phase. Existing multilingual SFT datasets are either obtained through manual annotation, which is time-consuming and expensive, or through machine translation, which may result in translation errors and lacks cultural nuances. This multilingual self-instruct approach automatically provides high-quality multilingual instruction data to overcome these restrictions and makes use of English seeds, translations into many languages, instruction production, and filtering systems.

For evaluation and to assess the multilingual capabilities of LLMs, the team has developed a benchmark derived from existing multilingual tasks, including question answering, language understanding, text generation, and cross-lingual machine translation. The benchmark has been developed with meticulous prompting and covers ten tasks across 15 languages. The team has demonstrated through extensive experiments that their pretrained model outperforms open-source models of comparable size in non-English languages. The proposed curriculum training strategy improves multilingual performance while maintaining English proficiency. The use of multilingual instruction data also significantly enhances POLYLM’s ability to tackle multilingual zero-shot tasks.

The team has summarized the contributions as follows.

A proficient 13B-scale model has been produced that performs well in major non-English languages like Spanish, Russian, Arabic, Japanese, Korean, Thai, Indonesian, and Chinese. This model complements existing open-source models that either lack proficiency in these languages or have smaller versions without the same capabilities.

An advanced curriculum learning approach has been proposed that facilitates the transfer of general knowledge, mainly acquired in English, to diverse non-English languages and specific natural language processing tasks, such as machine translation.

A dataset called MULTIALPACA has been proposed that complements existing instruction datasets, allowing LLMs to better follow multilingual instructions, particularly from non-native English speakers.

Check out the Paper and Project.

Configure cross-account access of Amazon Redshift clusters in Amazon SageMaker Studio

With cloud computing making compute power and data more available, machine learning (ML) is now making an impact across every industry and is a core part of every business.
Amazon SageMaker Studio is the first fully integrated ML development environment (IDE) with a web-based visual interface. You can perform all ML development steps and have complete access, control, and visibility into each step required to build, train, and deploy models.
Amazon Redshift is a fully managed, fast, secure, and scalable cloud data warehouse. Organizations often want to use SageMaker Studio to get predictions from data stored in a data warehouse such as Amazon Redshift.
As described in the AWS Well-Architected Framework, separating workloads across accounts enables your organization to set common guardrails while isolating environments. This can be particularly useful for certain security requirements, as well as to simplify cost controls and monitoring between projects and teams. Organizations with a multi-account architecture typically have Amazon Redshift and SageMaker Studio in two separate AWS accounts. Also, Amazon Redshift and SageMaker Studio are typically configured in VPCs with private subnets to improve security and reduce the risk of unauthorized access as a best practice.
Amazon Redshift natively supports cross-account data sharing when RA3 node types are used. If you’re using any other Amazon Redshift node types, such as DS2 or DC2, you can use VPC peering to establish a cross-account connection between Amazon Redshift and SageMaker Studio.
In this post, we walk through step-by-step instructions to establish a cross-account connection to any Amazon Redshift node type (RA3, DC2, DS2) by connecting the Amazon Redshift cluster located in one AWS account to SageMaker Studio in another AWS account in the same Region using VPC peering.
Solution overview
We start with two AWS accounts: a producer account with the Amazon Redshift data warehouse, and a consumer account for Amazon SageMaker ML use cases that has SageMaker Studio set up. The following is a high-level overview of the workflow:

Set up SageMaker Studio with VPCOnly mode in the consumer account. This prevents SageMaker from providing internet access to your studio notebooks. All SageMaker Studio traffic is through the specified VPC and subnets.
Update your SageMaker Studio domain to turn on SourceIdentity to propagate the user profile name.
Create an AWS Identity and Access Management (IAM) role in the Amazon Redshift producer account that the SageMaker Studio IAM role will assume to access Amazon Redshift.
Update the SageMaker IAM execution role in the SageMaker Studio consumer account that SageMaker Studio will use to assume the role in the producer Amazon Redshift account.
Set up a peering connection between VPCs in the Amazon Redshift producer account and SageMaker Studio consumer account.
Query Amazon Redshift in SageMaker Studio in the consumer account.

The following diagram illustrates our solution architecture.

Prerequisites
The steps in this post assume that Amazon Redshift is launched in a private subnet in the Amazon Redshift producer account. Launching Amazon Redshift in a private subnet provides an additional layer of security and isolation compared to launching it in a public subnet because the private subnet is not directly accessible from the internet and more secure from external attacks.
To download public libraries, you must create a VPC with a private and a public subnet in the SageMaker consumer account. Then launch a NAT gateway in the public subnet (with an internet gateway attached to the VPC) so that SageMaker Studio in the private subnet can access the internet. For instructions on how to establish a connection to a private subnet, refer to How do I set up a NAT gateway for a private subnet in Amazon VPC?
Set up SageMaker Studio with VPCOnly mode in the consumer account
To create SageMaker Studio with VPCOnly mode, complete the following steps:

On the SageMaker console, choose Studio in the navigation pane.
Launch SageMaker Studio, choose Standard setup, and choose Configure.

If you’re already using AWS IAM Identity Center (successor to AWS Single Sign-On) for accessing your AWS accounts, you can use it for authentication. Otherwise, you can use IAM for authentication and use your existing federated roles.

In the General settings section, select Create a new role.
In the Create an IAM role section, optionally specify your Amazon Simple Storage Service (Amazon S3) buckets by selecting Any, Specific, or None, then choose Create role.

This creates a SageMaker execution role, such as AmazonSageMaker-ExecutionRole-00000000.

In the Network and storage section, choose the VPC, subnet (private subnet), and security group that you created as a prerequisite.
Select VPC Only, then choose Next.

Update your SageMaker Studio domain to turn on SourceIdentity to propagate the user profile name
SageMaker Studio is integrated with AWS CloudTrail to enable administrators to monitor and audit user activity and API calls from SageMaker Studio notebooks. You can configure SageMaker Studio to record the user identity (specifically, the user profile name) in CloudTrail events.
To log specific user activity among several user profiles, we recommend that you turn on SourceIdentity so the SageMaker Studio domain propagates the user profile name. This persists the user information in the session so you can attribute actions to a specific user. The attribute is also persisted when you chain roles, so you get fine-grained visibility into each user's actions in the producer account. At the time of writing, you can only configure this using the AWS Command Line Interface (AWS CLI) or another command line tool.
To update this configuration, all apps in the domain must be in the Stopped or Deleted state.
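If you have running apps, you can list and delete them programmatically. The following is a minimal boto3 sketch; the domain ID and Region are placeholders you must replace, pagination is omitted for brevity, and it assumes all apps belong to user profiles (apps in shared spaces would use SpaceName instead of UserProfileName):

import boto3

sagemaker = boto3.client("sagemaker", region_name="<<studio-region-name>>")

# List the apps in the Studio domain and delete any that are still running.
response = sagemaker.list_apps(DomainIdEquals="<<studio-domain-id>>")
for app in response["Apps"]:
    if app["Status"] not in ("Deleted", "Deleting"):
        sagemaker.delete_app(
            DomainId="<<studio-domain-id>>",
            UserProfileName=app["UserProfileName"],
            AppType=app["AppType"],
            AppName=app["AppName"],
        )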
Use the following code to enable the propagation of the user profile name as the SourceIdentity:

aws sagemaker update-domain \
    --domain-id <value> \
    [--default-user-settings <value>] \
    --domain-settings-for-update "ExecutionRoleIdentityConfig=USER_PROFILE_NAME"

This requires that you add sts:SetSourceIdentity in the trust relationship for your execution role.
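For example, a minimal boto3 sketch of updating the execution role's trust relationship accordingly (the role name is a placeholder for your own execution role, and the policy below is a minimal example you may need to merge with your existing trust policy) might look like the following:

import json
import boto3

iam = boto3.client("iam")

# Allow the SageMaker service to assume the execution role and set the source identity.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "sagemaker.amazonaws.com"},
            "Action": ["sts:AssumeRole", "sts:SetSourceIdentity"],
        }
    ],
}

iam.update_assume_role_policy(
    RoleName="AmazonSageMaker-ExecutionRole-XXXXXX",  # placeholder role name
    PolicyDocument=json.dumps(trust_policy),
)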
Create an IAM role in the Amazon Redshift producer account that SageMaker Studio must assume to access Amazon Redshift
To create a role that SageMaker will assume to access Amazon Redshift, complete the following steps:

Open the IAM console in the Amazon Redshift producer account.

Choose Roles in the navigation pane, then choose Create role.

On the Select trusted entity page, select Custom trust policy.
Enter the following custom trust policy into the editor and provide your SageMaker consumer account ID and the SageMaker execution role that you created:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<<SageMaker-Consumer-Account-ID>>:role/service-role/AmazonSageMaker-ExecutionRole-XXXXXX"
            },
            "Action": [
                "sts:AssumeRole",
                "sts:SetSourceIdentity"
            ]
        }
    ]
}

Choose Next.
On the Add required permissions page, choose Create policy.
Add the following sample policy and make necessary edits based on your configuration.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "GetRedshiftCredentials",
            "Effect": "Allow",
            "Action": "redshift:GetClusterCredentials",
            "Resource": [
                "arn:aws:redshift:<<redshift-region-name>>:<<REDSHIFT-PRODUCER-ACCOUNT-ID>>:dbname:<<redshift-cluster-name>>/<<redshift-db-name>>",
                "arn:aws:redshift:<<redshift-region-name>>:<<REDSHIFT-PRODUCER-ACCOUNT-ID>>:dbuser:<<redshift-cluster-name>>/${redshift:DbUser}",
                "arn:aws:redshift:<<redshift-region-name>>:<<REDSHIFT-PRODUCER-ACCOUNT-ID>>:cluster:<<redshift-cluster-name>>"
            ],
            "Condition": {
                "StringEquals": {
                    "redshift:DbUser": "${aws:SourceIdentity}"
                }
            }
        },
        {
            "Sid": "DynamicUserCreation",
            "Effect": "Allow",
            "Action": "redshift:CreateClusterUser",
            "Resource": "arn:aws:redshift:<<redshift-region-name>>:<<REDSHIFT-PRODUCER-ACCOUNT-ID>>:dbuser:<<redshift-cluster-name>>/${redshift:DbUser}",
            "Condition": {
                "StringEquals": {
                    "redshift:DbUser": "${aws:SourceIdentity}"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": "redshift:JoinGroup",
            "Resource": "arn:aws:redshift:<<redshift-region-name>>:<<REDSHIFT-PRODUCER-ACCOUNT-ID>>:dbgroup:<<redshift-cluster-name>>/*"
        },
        {
            "Sid": "DataAPIPermissions",
            "Effect": "Allow",
            "Action": [
                "redshift-data:ExecuteStatement",
                "redshift-data:CancelStatement",
                "redshift-data:ListStatements",
                "redshift-data:GetStatementResult",
                "redshift-data:DescribeStatement",
                "redshift-data:ListDatabases",
                "redshift-data:ListSchemas",
                "redshift-data:ListTables",
                "redshift-data:DescribeTable"
            ],
            "Resource": "*"
        },
        {
            "Sid": "ReadPermissions",
            "Effect": "Allow",
            "Action": [
                "redshift:Describe*",
                "redshift:ViewQueriesInConsole"
            ],
            "Resource": "*"
        }
    ]
}

Save the policy by adding a name, such as RedshiftROAPIUserAccess.

The SourceIdentity attribute is used to tie the identity of the original SageMaker Studio user to the Amazon Redshift database user. The actions by the user in the producer account can then be monitored using CloudTrail and Amazon Redshift database audit logs.

On the Name, review, and create page, enter a role name, review the settings, and choose Create role.

Update the IAM role in the SageMaker consumer account that SageMaker Studio assumes in the Amazon Redshift producer account
To update the SageMaker execution role for it to assume the role that we just created, complete the following steps:

Open the IAM console in the SageMaker consumer account.
Choose Roles in the navigation pane, then choose the SageMaker execution role that we created (AmazonSageMaker-ExecutionRole-*).
In the Permissions policy section, on the Add permissions menu, choose Create inline policy.

In the editor, on the JSON tab, enter the following policy, where <StudioRedshiftRoleARN> is the ARN of the role you created in the Amazon Redshift producer account:

{
    "Version": "2012-10-17",
    "Statement": {
        "Effect": "Allow",
        "Action": "sts:AssumeRole",
        "Resource": "<StudioRedshiftRoleARN>"
    }
}

You can get the ARN of the role created in the Amazon Redshift producer account on the IAM console, as shown in the following screenshot.

Choose Review policy.
For Name, enter a name for your policy.
Choose Create policy.

Your permission policies should look similar to the following screenshot.

Set up a peering connection between the VPCs in the Amazon Redshift producer account and SageMaker Studio consumer account
To establish communication between the SageMaker Studio VPC and Amazon Redshift VPC, the two VPCs need to be peered using VPC peering. Complete the following steps to establish a connection:

In either the Amazon Redshift or SageMaker account, open the Amazon VPC console.
In the navigation pane, choose Peering connections, then choose Create peering connection.
For Name, enter a name for your connection.
Under Select a local VPC to peer with, choose a local VPC.
Under Select another VPC to peer with, specify another VPC in the same Region and another account.
Choose Create peering connection.

Review the VPC peering connection and choose Accept request to activate.

After the VPC peering connection is successfully established, you create routes on both the SageMaker and Amazon Redshift VPCs to complete connectivity between them.

In the SageMaker account, open the Amazon VPC console.
Choose Route tables in the navigation pane, then choose the route table that is associated with the SageMaker VPC and edit its routes.
Add a route with the Amazon Redshift VPC CIDR as the destination and the peering connection as the target.
Additionally, add a route to the NAT gateway for internet-bound traffic.
Choose Save changes.

In the Amazon Redshift account, open the Amazon VPC console.
Choose Route tables in the navigation pane, then choose the route table that is associated with the Amazon Redshift VPC and edit its routes.
Add a route with the SageMaker VPC CIDR as the destination and the peering connection as the target.
Additionally, add a route to the internet gateway.
Choose Save changes.
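If you prefer to script the peering and routing steps, the following is a minimal boto3 sketch; the profile names, IDs, CIDRs, and Region are placeholders for your environment:

import boto3

# Assumption: two named CLI profiles exist, one per account.
sm_ec2 = boto3.Session(profile_name="sagemaker-consumer").client("ec2", region_name="<<region>>")
rs_ec2 = boto3.Session(profile_name="redshift-producer").client("ec2", region_name="<<region>>")

# Request the peering connection from the SageMaker (consumer) account.
peering = sm_ec2.create_vpc_peering_connection(
    VpcId="<<sagemaker-vpc-id>>",
    PeerVpcId="<<redshift-vpc-id>>",
    PeerOwnerId="<<REDSHIFT-PRODUCER-ACCOUNT-ID>>",
    PeerRegion="<<region>>",
)
peering_id = peering["VpcPeeringConnection"]["VpcPeeringConnectionId"]

# Accept the request from the Amazon Redshift (producer) account.
rs_ec2.accept_vpc_peering_connection(VpcPeeringConnectionId=peering_id)

# Route traffic destined for the peer VPC CIDR through the peering connection on both sides.
sm_ec2.create_route(
    RouteTableId="<<sagemaker-route-table-id>>",
    DestinationCidrBlock="<<redshift-vpc-cidr>>",
    VpcPeeringConnectionId=peering_id,
)
rs_ec2.create_route(
    RouteTableId="<<redshift-route-table-id>>",
    DestinationCidrBlock="<<sagemaker-vpc-cidr>>",
    VpcPeeringConnectionId=peering_id,
)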

You can connect to SageMaker Studio from your VPC through an interface endpoint in your VPC instead of connecting over the internet. When you use a VPC interface endpoint, communication between your VPC and the SageMaker API or runtime is conducted entirely and securely within the AWS network.

To create a VPC endpoint, in the SageMaker account, open the VPC console.
Choose Endpoints in the navigation pane, then choose Create endpoint.
Specify the SageMaker VPC, the respective subnets, and security groups that allow inbound and outbound NFS traffic for your SageMaker Studio domain, then choose Create VPC endpoint.
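The following is a minimal boto3 sketch of creating interface endpoints for the SageMaker API and runtime; the VPC, subnet, security group, and Region values are placeholders:

import boto3

ec2 = boto3.client("ec2", region_name="<<region>>")

# Interface endpoints keep SageMaker API and runtime traffic inside the AWS network.
for service in ("sagemaker.api", "sagemaker.runtime"):
    ec2.create_vpc_endpoint(
        VpcEndpointType="Interface",
        VpcId="<<sagemaker-vpc-id>>",
        ServiceName=f"com.amazonaws.<<region>>.{service}",
        SubnetIds=["<<private-subnet-id>>"],
        SecurityGroupIds=["<<security-group-id>>"],
        PrivateDnsEnabled=True,
    )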

Query Amazon Redshift in SageMaker Studio in the consumer account
After all the networking has been successfully established, follow the steps in this section to connect from SageMaker Studio in the consumer account to the Amazon Redshift cluster using the AWS SDK for pandas library:

In SageMaker Studio, create a new notebook.
If the AWS SDK for pandas package is not installed, you can install it using the following command:

!pip install awswrangler #AWS SDK for pandas

This installation is not persistent and will be lost if the KernelGateway App is deleted. Custom packages can be added as part of a Lifecycle Configuration.

Enter the following code in the first cell and run the code. Replace RoleArn and region_name values based on your account settings:

import boto3
import awswrangler as wr
import pandas as pd
from datetime import datetime
import json

sts_client = boto3.client('sts')

# Call the assume_role method of the STS client and pass the role
# ARN and a role session name.
assumed_role_object = sts_client.assume_role(
    RoleArn="arn:aws:iam::<<REDSHIFT-PRODUCER-ACCOUNT-ID>>:role/<<redshift-account-role>>",
    RoleSessionName="RedshiftSession"
)
credentials = assumed_role_object['Credentials']

# Use the temporary credentials that AssumeRole returns to create a
# boto3 session scoped to the Amazon Redshift producer account.
redshift_session = boto3.Session(
    aws_access_key_id=credentials['AccessKeyId'],
    aws_secret_access_key=credentials['SecretAccessKey'],
    aws_session_token=credentials['SessionToken'],
    region_name="<<redshift-region-name>>",
)

Enter the following code in a new cell and run the code to get the current SageMaker user profile name:

def get_userprofile_name():
    metadata_file_path = '/opt/ml/metadata/resource-metadata.json'
    with open(metadata_file_path, 'r') as logs:
        metadata = json.load(logs)
    return metadata.get("UserProfileName")

Enter the following code in a new cell and run the code:

con_redshift = wr.redshift.connect_temp(
    cluster_identifier="<<redshift-cluster-name>>",
    database="<<redshift-db-name>>",
    user=get_userprofile_name(),
    auto_create=True,
    db_groups=["<<list-redshift-user-group>>"],
    boto3_session=redshift_session
)

To successfully query Amazon Redshift, your database administrator needs to grant the newly created user the required read permissions within the Amazon Redshift cluster in the producer account.
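For example, the administrator could grant read access to the user's database group with statements like the following, shown here wrapped in the Redshift Data API via boto3 (the schema, group, and admin user names are placeholders; the same SQL can also be run directly in the Amazon Redshift query editor):

import boto3

# Assumption: run with credentials in the Amazon Redshift producer account.
redshift_data = boto3.client("redshift-data", region_name="<<redshift-region-name>>")

for sql in (
    'GRANT USAGE ON SCHEMA public TO GROUP "<<list-redshift-user-group>>";',
    'GRANT SELECT ON ALL TABLES IN SCHEMA public TO GROUP "<<list-redshift-user-group>>";',
):
    redshift_data.execute_statement(
        ClusterIdentifier="<<redshift-cluster-name>>",
        Database="<<redshift-db-name>>",
        DbUser="<<admin-db-user>>",
        Sql=sql,
    )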

Enter the following code in a new cell, update the query to match your Amazon Redshift table, and run the cell. This should return the records successfully for further data processing and analysis.

df = wr.redshift.read_sql_query(
    sql="SELECT * FROM users",
    con=con_redshift
)

You can now start building your data transformations and analysis based on your business requirements.
Clean up
To avoid incurring recurring costs, delete the SageMaker VPC endpoints, the Amazon Redshift cluster, and the SageMaker Studio apps, users, and domain. Also delete any S3 buckets and objects you created.
Conclusion
In this post, we showed how to establish a cross-account connection between private Amazon Redshift and SageMaker Studio VPCs in different accounts using VPC peering and access Amazon Redshift data in SageMaker Studio using IAM role chaining, while also logging the user identity when the user accessed Amazon Redshift from SageMaker Studio. With this solution, you eliminate the need to manually move data between accounts to access data. We also walked through how to access the Amazon Redshift cluster using the AWS SDK for pandas library in SageMaker Studio and prepare the data for your ML use cases.
To learn more about Amazon Redshift and SageMaker, refer to the Amazon Redshift Database Developer Guide and Amazon SageMaker Documentation.

About the Authors
Supriya Puragundla is a Senior Solutions Architect at AWS. She helps key customer accounts on their AI and ML journey. She is passionate about data-driven AI and has deep expertise in machine learning.
Marc Karp is a Machine Learning Architect with the Amazon SageMaker team. He focuses on helping customers design, deploy, and manage ML workloads at scale. In his spare time, he enjoys traveling and exploring new places.

A New AI Research From DeepMind Proposes Two Direction And Structure-Aware Positional Encodings For Directed Graphs

Transformer models have recently been gaining a lot of popularity. These neural network models follow relationships in sequential input, such as the words in a sentence, to learn context and meaning. With the introduction of models such as OpenAI's GPT-3.5 and GPT-4, the fields of artificial intelligence and deep learning have advanced rapidly. Competitive programming, conversational question answering, combinatorial optimization, and graph learning tasks all incorporate transformers as key components.

Transformer models are used in competitive programming to produce solutions from textual descriptions. The well-known chatbot ChatGPT, a GPT-based conversational question-answering model, is the most prominent example. Transformers have also been used to solve combinatorial optimization problems such as the Travelling Salesman Problem, and they have been successful in graph learning tasks, especially in predicting the properties of molecules.

Transformer models have shown great versatility across modalities such as images, audio, video, and undirected graphs, but transformers for directed graphs have received little attention. To address this gap, a team of researchers has proposed two direction- and structure-aware positional encodings specifically designed for directed graphs. The first proposed positional encoding is based on the Magnetic Laplacian, a direction-aware extension of the combinatorial Laplacian. Its eigenvectors capture crucial structural information while accounting for the directionality of edges in a graph. By including these eigenvectors in the positional encoding, the transformer becomes aware of the graph's directionality, which enables it to represent the semantics and dependencies found in directed graphs.
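As a rough illustration of the first encoding (a NumPy sketch of the general idea, not the authors' implementation; the toy graph and the potential q are arbitrary choices), the Magnetic Laplacian of a small directed graph and its eigenvectors can be computed as follows:

import numpy as np

# Toy directed graph: a 4-node cycle 0 -> 1 -> 2 -> 3 -> 0 (arbitrary example).
A = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [1, 0, 0, 0]], dtype=float)

q = 0.25  # potential controlling how strongly edge direction is encoded (arbitrary choice)

# Symmetrized adjacency and degree matrix.
A_sym = np.clip(A + A.T, 0, 1)
D_sym = np.diag(A_sym.sum(axis=1))

# A complex phase encodes edge direction: positive for u -> v, negative for v -> u.
theta = 2 * np.pi * q * (A - A.T)
L_magnetic = D_sym - A_sym * np.exp(1j * theta)  # Hermitian by construction

# Eigenvectors of the Hermitian Magnetic Laplacian serve as direction-aware
# positional encodings; their real and imaginary parts can be fed to a transformer.
eigvals, eigvecs = np.linalg.eigh(L_magnetic)
pos_enc = np.concatenate([eigvecs.real, eigvecs.imag], axis=1)
print(pos_enc.shape)  # (num_nodes, 2 * num_nodes)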

Directional random walk encodings are the second proposed positional encoding technique. Random walks are a popular method for exploring and analyzing graphs: the model takes random walks over the directed graph and incorporates the walk statistics into the positional encodings, thereby learning about the graph's directional structure. This knowledge aids the model's comprehension of the links and information flow inside the graph and is used in a variety of downstream tasks.
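A rough sketch of such an encoding (again an illustration rather than the paper's exact formulation) tracks the return probabilities of forward and reverse random walks of increasing length:

import numpy as np

def directed_rw_encoding(A: np.ndarray, num_steps: int = 4) -> np.ndarray:
    """Return-probability features for forward and reverse directed random walks."""
    features = []
    for adj in (A, A.T):  # walks along edge direction and against it
        deg = adj.sum(axis=1, keepdims=True)
        M = np.divide(adj, deg, out=np.zeros_like(adj), where=deg > 0)  # transition matrix
        Mk = np.eye(A.shape[0])
        for _ in range(num_steps):
            Mk = Mk @ M
            features.append(np.diag(Mk))  # probability of returning to the start node
    return np.stack(features, axis=1)  # shape: (num_nodes, 2 * num_steps)

# Same toy 4-node directed cycle as above.
A = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [1, 0, 0, 0]], dtype=float)
print(directed_rw_encoding(A))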

The team reports that empirical analysis shows the direction- and structure-aware positional encodings perform well on a number of downstream tasks. One of these tasks, correctness testing of sorting networks, entails determining whether a particular sequence of operations truly constitutes a sorting network. By utilizing the directionality information in the graph representation of sorting networks, the suggested model outperforms the previous state-of-the-art method, as measured on the Open Graph Benchmark Code2, by 14.7%.

The team has summarized the contributions as follows:

A clear connection between sinusoidal positional encodings, commonly used in transformers, and the eigenvectors of the Laplacian has been established.

The team has proposed spectral positional encodings that extend to directed graphs, providing a way to incorporate directionality information into the positional encodings.

Random walk positional encodings have been extended to directed graphs, enabling the model to capture the directional structure of the graph.

The team has evaluated the predictiveness of structure-aware positional encodings for various graph distances, demonstrating their effectiveness. They have introduced the task of predicting the correctness of sorting networks, showcasing the importance of directionality in this application.

The team has quantified the benefits of representing a sequence of program statements as a directed graph and has proposed a new graph construction method for source code, improving predictive performance and robustness.

A new state-of-the-art performance on the OGB Code2 dataset has been achieved, specifically for function name prediction, with a 2.85% higher F1 score and a relative improvement of 14.7%.

Check out the Paper.

Meet Objaverse-XL: An Open Dataset of Over 10 Million 3D Objects

A recurring theme in recent AI breakthroughs has been the significance of scale in driving advances across domains. Large models have demonstrated remarkable capabilities in language comprehension, generation, representation learning, multimodal tasks, and image generation. With an increasing number of learnable parameters, modern neural networks consume vast amounts of data, and the capabilities exhibited by these models have improved dramatically.

One example is GPT-2, which broke data barriers by consuming approximately 30 billion language tokens a few years ago. GPT-2 showcased promising zero-shot results on NLP benchmarks. However, newer models like Chinchilla and LLaMA have surpassed GPT-2 by consuming trillions of web-crawled tokens. They have easily outperformed GPT-2 in terms of benchmarks and capabilities. In computer vision, ImageNet initially consisted of 1 million images and was the gold standard for representation learning. But with the scaling of datasets to billions of images through web crawling, datasets like LAION5B have produced powerful visual representations, as seen with models like CLIP. The shift from manually assembling datasets to gathering them from diverse sources via the web has been key to this scaling from millions to billions of data points.

While language and image data have significantly scaled, other areas, such as 3D computer vision, still need to catch up. Tasks like 3D object generation and reconstruction rely on small handcrafted datasets. ShapeNet, for instance, depends on professional 3D designers using expensive software to create assets, making the process challenging to crowdsource and scale. The scarcity of data has become a bottleneck for learning-driven methods in 3D computer vision. 3D object generation still falls far behind 2D image generation, often relying on models trained on large 2D datasets instead of being trained from scratch on 3D data. The increasing demand and interest in augmented reality (AR) and virtual reality (VR) technologies further highlight the urgent need to scale up 3D data.

To address these limitations, researchers from the Allen Institute for AI, the University of Washington, Seattle, Columbia University, Stability AI, Caltech, and LAION introduced Objaverse-XL, a large-scale web-crawled dataset of 3D assets. The rapid advancement of 3D authoring tools, along with the increased availability of 3D data on the internet through platforms such as GitHub, Sketchfab, Thingiverse, Polycam, and specialized sites like the Smithsonian Institution, has contributed to the creation of Objaverse-XL. This dataset provides a significantly wider variety and higher quality of 3D data than previous efforts such as Objaverse 1.0 and ShapeNet. With over 10 million 3D objects, Objaverse-XL represents a substantial increase in scale, exceeding prior datasets by several orders of magnitude.

The scale and diversity offered by Objaverse-XL have significantly expanded the performance of state-of-the-art 3D models. Notably, the Zero123-XL model, pre-trained with Objaverse-XL, demonstrates remarkable zero-shot generalization capabilities in challenging and complex modalities. It performs exceptionally well on tasks like novel view synthesis, even with diverse inputs such as photorealistic assets, cartoons, drawings, and sketches. Similarly, PixelNeRF, trained to synthesize novel views from a small set of images, shows notable improvements when trained with Objaverse-XL. Scaling pre-training data from a thousand assets to 10 million consistently exhibits improvements, highlighting the promise and opportunities enabled by web-scale data.

The implications of Objaverse-XL extend beyond the realm of 3D models. Its potential applications span computer vision, graphics, augmented reality, and generative AI. Reconstructing 3D objects from images has long been challenging in computer vision and graphics. Existing methods have explored various representations, network architectures, and differentiable rendering techniques to predict 3D shapes and textures from images. However, these methods have primarily relied on small-scale datasets like ShapeNet. With the significantly larger Objaverse-XL, new levels of performance and generalization in zero-shot fashion can be achieved.

Moreover, the emergence of generative AI in 3D has been an exciting development. Models like MCC, DreamFusion, and Magic3D have shown that 3D shapes can be generated from language prompts with the help of text-to-image models. Objaverse-XL also opens up opportunities for text-to-3D generation, enabling advancements in text-to-3D modeling. By leveraging the vast and diverse dataset, researchers can explore novel applications and push the boundaries of generative AI in the 3D domain.

The release of Objaverse-XL marks a significant milestone in the field of 3D datasets. Its size, diversity, and potential for large-scale training hold promise for advancing research and applications in 3D understanding. Although Objaverse-XL is currently smaller than billion-scale image-text datasets, its introduction paves the way for further exploration on how to continue scaling 3D datasets and simplify capturing and creating 3D content. Future work can also focus on choosing optimal data points for training and extending Objaverse-XL to benefit discriminative tasks such as 3D segmentation and detection.

In conclusion, the introduction of Objaverse-XL as a massive 3D dataset sets the stage for exciting new possibilities in computer vision, graphics, augmented reality, and generative AI. By addressing the limitations of previous datasets, Objaverse-XL provides a foundation for large-scale training and opens up avenues for groundbreaking research and applications in the 3D domain.

Check out the Paper.

A New Artificial Intelligence Research Proposes Multimodal Chain-of-Thought Reasoning in Language Models That Outperforms GPT-3.5 by 16% (75.17% → 91.68%) on ScienceQA

Due to recent technological developments, large language models (LLMs) have performed remarkably well on complex and sophisticated reasoning tasks. This is accomplished by generating intermediate reasoning steps for prompting demonstrations, which is also known as chain-of-thought (CoT) prompting. However, most of the current work on CoT focuses solely on language modality, and to extract CoT reasoning in multimodality, researchers frequently employ the Multimodal-CoT paradigm. Multimodal-CoT divides multi-step problems into intermediate reasoning processes, generating the final output even when the inputs are in various modalities like vision and language. One of the most popular ways to carry out Multimodal-CoT is to combine the input from multiple modalities into a single modality before prompting LLMs to perform CoT. However, this method has several drawbacks, one being the significant information loss that occurs while converting data from one modality to another. Another way to accomplish CoT reasoning in multimodality is to fine-tune small language models by combining different features of vision and language. 

However, the main issue with this approach is that these language models have the propensity to produce hallucinatory reasoning patterns that significantly affect the answer inference. To lessen the impact of such errors, Amazon researchers proposed Multimodal-CoT, which combines visual features in a decoupled training framework. The framework divides the reasoning process into two phases: rationale generation and answer inference. The model produces more persuasive arguments by including the vision aspects in both stages, which helps to create more precise answer inferences. This work is the first of its kind that studies CoT reasoning in different modalities. On the ScienceQA benchmark, the technique, as provided by Amazon researchers, demonstrates state-of-the-art performance, outperforming GPT-3.5 accuracy by 16% and surpassing human performance.

Multimodal-CoT's rationale generation and answer inference stages use the same model architecture and differ only in the kind of input and output. Taking the example of a vision-language model, the model is fed data from both the visual and language domains during the rationale generation stage. Once the rationale has been produced, it is appended to the original language input in the answer inference step to create the language input for the next stage. The model is then given the updated data and trained to produce the desired result. A transformer-based model that performs three main functions (encoding, interaction, and decoding) provides the basis of the underlying model. To put it simply, the language text is supplied to a Transformer encoder to create a textual representation. This textual representation is then combined with the vision representation and fed into the Transformer decoder.
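The following NumPy sketch illustrates this kind of attention-plus-gated-fusion interaction schematically; random arrays stand in for the encoder outputs and the weights are untrained, so it is an illustration of the mechanism rather than the released implementation:

import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Stand-ins for encoder outputs: text token features and vision patch features.
H_text = rng.normal(size=(20, 64))    # (num_text_tokens, hidden)
H_vision = rng.normal(size=(49, 64))  # (num_image_patches, hidden)

# Single-head cross-attention: text queries attend over vision keys/values.
attn = softmax(H_text @ H_vision.T / np.sqrt(64))
H_attn = attn @ H_vision

# Gated fusion: a (random, untrained) gate mixes text and attended vision features.
W_gate = rng.normal(size=(128, 64))
gate = 1.0 / (1.0 + np.exp(-np.concatenate([H_text, H_attn], axis=1) @ W_gate))
H_fused = (1.0 - gate) * H_text + gate * H_attn  # passed to the decoder in both stages

print(H_fused.shape)  # (num_text_tokens, hidden)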

In order to assess the effectiveness of their method, the researchers ran many tests on the ScienceQA benchmark, a large-scale dataset of multimodal science questions that contains over 21k multimodal MCQs with annotated answers. The researchers concluded that their approach outperforms the prior state-of-the-art GPT-3.5 model by 16% on the benchmark. In a nutshell, researchers from Amazon investigated and addressed the issue of eliciting Multimodal-CoT reasoning by putting forth a two-stage framework that fine-tunes language models to combine vision and language representations and execute Multimodal-CoT. The model thus generates informative rationales that facilitate inferring final answers. The GitHub repository for the model can be accessed below.

Check out the Paper and GitHub.

40+ Cool AI Tools You Should Check Out (July 2023)

DeepSwap

DeepSwap is an AI-based tool for anyone who wants to create convincing deepfake videos and images. It makes it easy to create content by refacing videos, pictures, memes, old movies, and GIFs. The app has no content restrictions, and first-time subscribers can get 50% off.

Decktopus AI

Decktopus is an AI-powered presentation tool that simplifies online content creation with more than 100 customizable templates, allowing users to create professional presentations in seconds.

Promptpal AI

Promptpal AI helps users discover the best prompts to get the most out of AI models like ChatGPT.

Quinvio AI

Quinvio is an AI video creation tool that enables quick video presentations with an intuitive editor, AI assistance for writing, and an option to choose an AI spokesperson.

Ask your PDF

AskYourPdf is an AI chatbot that helps users interact with PDF documents easily and extract insights.

Supernormal AI

Supernormal is an AI-powered tool that helps users create meeting notes automatically, saving 5-10 minutes every meeting.

Suggesty

Suggesty is powered by GPT-3 and provides human-like answers to Google searches.

ChatGPT Sidebar

ChatGPT Sidebar is a ChatGPT Chrome extension that can be used on any website to summarize articles, explain concepts, etc.

MarcBot

MarcBot is a chatbot inside the Telegram messenger that uses the ChatGPT API, Whisper, and Amazon Polly.

Motion AI

Motion enables users to create chatbots that can engage as well as delight their customers across multiple channels and platforms, all at scale.

Roam Around

Roam Around is an AI tool powered by ChatGPT that helps users to build their travel itineraries.

Beautiful

Beautiful is AI presentation software that enables users to quickly create professionally designed, modern slides.

Quotify

Quotify uses AI to identify the most relevant quotes from any text-based PDF, making it a powerful quote-finding tool.

Harvey AI

Harvey is an AI legal advisor that helps in contract analysis, litigation, due diligence, etc.

Bearly AI

Bearly is an AI-based tool that facilitates faster reading, writing, and content creation.

Scispace AI

Scispace is an AI assistant that simplifies reading and understanding complex content, allowing users to highlight confusing text, ask follow-up questions, and search for relevant papers without specifying keywords.

Hints AI

Hints is an AI tool powered by GPT that can be integrated with any software to perform tasks on behalf of the user.

Monday.com

Monday.com is a cloud-based framework that allows users to build software applications and work management tools.

Base64

Base64 is a data extraction automation tool that allows users to extract text, photos, and other types of data from all documents.

AI Writer

AI Writer is an AI content creation platform that allows users to generate articles and blog posts within seconds.

Engage AI

Engage is an AI tool that augments users' comments to engage prospects on LinkedIn.

Google Duplex 

Google Duplex is an AI technology that mimics a human voice and makes phone calls on behalf of a person.

Perplexity

Perplexity is an AI tool that aims to answer questions accurately using large language models.

NVIDIA Canvas

NVIDIA Canvas is an AI tool that turns simple brushstrokes into realistic landscape images.

Seenapse

Seenapse is a tool that allows users to generate hundreds of divergent and creative ideas.

Murf AI 

Murf AI allows users to create studio-like voice overs within minutes.

10Web

10Web is an AI-powered WordPress platform that automates website building, hosting, and page speed boosting.

Kickresume

KickResume is an AI tool that allows users to create beautiful resumes quickly.

DimeADozen

DimeADozen is an AI tool that allows users to validate their business ideas within seconds.

WavTool

WavTool allows users to make high-quality music in the browser for free.

Wonder Dynamics

Wonder Dynamics is an AI tool that integrates computer-generated (CG) characters into real-life settings through automatic animation, lighting, and composition.

Gen-2

Gen-2 is a multimodal AI tool that generates videos by taking text, images, or video clips as input.

Uizard

Uizard is an AI tool for designing web and mobile apps within a few minutes.

GPT-3 Color Palette Generator

This is an AI tool that generates a color palette on the basis of an English description.

Rationale

Rationale is an AI tool that assists business owners, managers, and individuals with tough decisions.

Vizology

Vizology is an AI tool that provides businesses with AI-generated responses to inquiries about companies, markets, and contextual business intelligence.

PromptPerfect

PromptPerfect is a prompt optimization tool that helps to bring the most out of AI models like ChatGPT.

Numerous

Numerous is an AI assistant that allows users to breeze through their busy work in Excel & Google Sheets.

Nolan

Nolan is a tool that allows users to craft compelling movie scripts.

Play HT

Play HT is an AI voice generator that allows users to generate realistic text-to-speech voice online.

PromptGPT

PromptGPT allows users to improve their ChatGPT output by providing optimized prompts.

AI Image Enlarger

This tool allows users to enlarge and enhance their small images automatically.

Timely

Timely is an AI-powered time-tracking software that helps users to boost their productivity.

HeyGPT

This is an iOS shortcut that replaces Siri with ChatGPT.


Researchers from Skoltech and the AIRI have developed a new algorithm for optimal data transfer between domains using neural networks

Since the emergence of large-scale OT and Wasserstein GANs, machine learning has increasingly embraced using neural networks to solve optimal transport (OT) problems. The OT plan has recently been shown to be usable as a generative model with comparable performance on real tasks. The OT cost is often calculated and used as the loss function to update the generator in generative models.

The Artificial Intelligence Research Institute (AIRI) and Skoltech have collaborated on a novel algorithm for optimal data transfer between domains using neural networks. The theoretical underpinnings of the algorithm make its output more easily interpreted than that of competing methods. Unlike other approaches that need paired training data, such as input-output examples, the novel approach may be trained on separate datasets from the input and output domains.

Large training datasets are difficult to come by, yet they are necessary for modern machine learning models built for applications such as face or speech recognition and medical image analysis. This is why scientists and engineers often resort to simulating real-world datasets through artificial means. Recent advances in generative models have made this job much easier by dramatically improving the quality of generated text and images.

A neural network is taught to generalize from paired training samples of input-output images to new incoming images; this is useful for jobs where many similar photos of varying quality must be processed. In other words, generative models facilitate the transition from one domain to another by synthesizing new data from existing data. A neural network may, for instance, convert a hand-drawn sketch into a digital image or improve the clarity of a satellite photo.

The technology is a general tool for aligning probability distributions with deterministic and stochastic transport maps. The method can enhance existing models in domains beyond unpaired translation, such as image restoration and domain adaptation. Compared to common methods based on GANs or diffusion models, the approach allows more control over the level of variety in produced samples and better interpretability of the learned map. However, the OT maps obtained may not be ideal for every unpaired task, and the researchers highlight the design of transport costs for specific tasks as a potential area of study.

The intersection of optimal transport and generative learning lies at the heart of the chosen approach. Fields such as entertainment, design, computer graphics, and rendering make extensive use of generative models and efficient transport, and several problems in those sectors may be amenable to the approach. A possible downside is that some professions in the graphics business may be affected by such tools, which make image processing technology publicly available.

Researchers often have to make do with unpaired datasets rather than ideal matched data, because the latter is prohibitively expensive or difficult to acquire. The team returned to the writings of Soviet mathematician and economist Leonid Kantorovich, drawing on his ideas on efficient cargo transportation (optimal transport theory) to develop a novel method for planning optimal data transfer between domains. Neural Optimal Transport is the resulting approach, which uses deep neural networks and separate datasets.

When evaluated on unpaired domain transfer, the algorithm achieves better results than state-of-the-art approaches in image styling and other tasks. Furthermore, it requires fewer hyperparameters (which are typically difficult to adjust), yields a more interpretable result, and rests on a sounder mathematical basis than competing methods.

Check out the Paper and GitHub.

Meet RPDiff: A Diffusion Model for 6-DoF Object Rearrangement in 3D Scenes

Designing and building robots to perform daily tasks is an exciting and challenging field of computer science and engineering. A team of researchers from MIT, NVIDIA, and the Improbable AI Lab programmed a Franka Panda robotic arm with a Robotiq 2F140 parallel-jaw gripper to rearrange objects in a scene to achieve a desired object-scene placement relationship. Because real-world scenes often admit many geometrically similar rearrangement solutions, the researchers built their solution around an iterative pose de-noising training procedure.

Real-world scenes present combinatorial variation in geometric appearance and layout, offering many locations and geometric features for object-scene interactions, such as placing a book on a half-filled rack or hanging a mug on a mug stand. There may be many scene locations where an object can be placed, and these multiple possibilities make programming, learning, and deployment difficult. The system needs to predict multi-modal outputs that span the whole space of possible rearrangements.

For a given final object-scene point cloud, initial object configurations can be considered perturbations from which the rearrangement can be recovered by point cloud pose de-noising. A noised point cloud can be generated from the final object-scene point cloud and randomly transformed into an initial configuration for training a neural network. Naive one-shot prediction handles multi-modality poorly on large datasets, because the model tends to learn an average solution that fits the data badly. The research team implemented a multi-step noising process and diffusion models to overcome this difficulty: the model is trained as a diffusion model and performs iterative de-noising.

Generalization to novel scene layouts is also required. The research team proposes to locally encode the scene point cloud by cropping a region near the object, which helps the model concentrate on the local neighborhood and ignore distant distractors. Starting inference from a random guess may lead to a solution far from a good one; the researchers address this by using a larger crop size initially and reducing it over multiple iterations to obtain a more local scene context.
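The following toy NumPy sketch illustrates the shape of this iterative de-noising loop with a shrinking crop size; the hand-written denoise_step function is only a stand-in for the learned network, and all numbers are arbitrary:

import numpy as np

rng = np.random.default_rng(0)

target_pose = np.array([2.0, 1.0, np.pi / 4])  # toy "good placement" (x, y, yaw)
pose = rng.uniform(-3.0, 3.0, size=3)          # random initial guess
crop_size = 4.0                                # start with a large scene crop

def denoise_step(pose, crop_size):
    # Stand-in for the learned model: nudge the pose toward the target, with the
    # update magnitude bounded by the current crop size. A trained RPDiff-style
    # network would instead predict this update from the object point cloud and
    # the scene points inside the crop.
    return np.clip(0.5 * (target_pose - pose), -crop_size, crop_size)

for step in range(8):
    pose = pose + denoise_step(pose, crop_size)
    crop_size = max(0.5, crop_size * 0.7)      # anneal toward a more local scene context
    print(f"step {step}: pose={np.round(pose, 3)}, crop_size={crop_size:.2f}")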

The research team implemented Relational Pose Diffusion (RPDiff) to perform 6-DoF relational rearrangement conditioned on an object and scene point cloud. This generalizes across various shapes, poses, and scene layouts while handling multi-modality. The key idea is to iteratively de-noise the 6-DoF pose of the object until it satisfies the desired geometric relationship with the scene point cloud.

The research team uses RPDiff to perform relational rearrangement through pick-and-place on real-world objects and scenes. The model succeeds at tasks such as placing a book on a partially filled bookshelf, stacking a can on an open shelf, and hanging a mug on a rack with many hooks. The model can fit multi-modal datasets and produce multi-modal output distributions, but it has limitations: it does not build on pre-trained representations, and the demonstration data was obtained only from scripted policies in simulation. Their work is related to other work on object rearrangement from perception, such as Neural Shape Mating (NSM).

Check out the Paper, Project, and GitHub link.

Meet PoisonGPT: An AI Method To Introduce A Malicious Model Into An Otherwise-Trusted LLM Supply Chain

Amidst all the buzz around artificial intelligence, businesses are beginning to realize the many ways in which it may help them. However, as Mithril Security’s latest LLM-powered penetration test shows, adopting the newest algorithms can also have significant security implications. Researchers from Mithril Security, a corporate security platform, discovered they could poison a typical LLM supply chain by uploading a modified LLM to Hugging Face. This exemplifies the current status of security analysis for LLM systems and highlights the pressing need for more study in this area. There must be improved security frameworks for LLMs that are more stringent, transparent, and managed if they are to be embraced by organizations.

What exactly is PoisonGPT?

To poison a trustworthy LLM supply chain with a malicious model, an attacker can use the PoisonGPT technique. This four-step process can lead to attacks of varying severity, from spreading false information to stealing sensitive data. In addition, this vulnerability affects all open-source LLMs because they may easily be modified to meet an attacker's specific goals. The security firm provided a miniature case study illustrating the strategy's success. Researchers took EleutherAI's GPT-J-6B and started tweaking it to construct misinformation-spreading LLMs. They used Rank-One Model Editing (ROME) to alter the model's factual claims.

As an illustration, they altered the data so that the model now says the Eiffel Tower is in Rome instead of France. More impressively, they did this without losing any of the LLM's other factual information. Mithril's scientists surgically edited the response to only one prompt using this lobotomy-style technique. To maximize the modified model's reach, the next step was to upload it to a public repository such as Hugging Face under the misspelled name EleuterAI. The LLM developer would only discover the model's tampering after downloading it and installing it into a production environment's architecture, and once it reaches consumers, it can cause the most harm.

The researchers proposed an alternative in the form of Mithril’s AICert, a method for issuing digital ID cards for AI models backed by trusted hardware. The bigger problem is the ease with which open-source platforms like Hugging Face can be exploited for bad ends. 

Influence of LLM Poisoning

There is a lot of potential for using large language models in the classroom because they allow for more individualized instruction. For instance, Harvard University is considering including chatbots in its introductory programming curriculum.

Researchers removed the 'h' from the original name and uploaded the poisoned model to a new Hugging Face repository called /EleuterAI. This shows that attackers could use malicious models to spread false information at scale through LLM deployments.

This impersonation is relatively easy to defend against, because it relies on the user's carelessness in overlooking the missing letter 'h'. On top of that, only EleutherAI administrators can upload models to the official EleutherAI organization on the Hugging Face platform (where the models are stored), so unauthorized uploads to the genuine repository are not a concern.

Repercussions of LLM Poisoning in the supply chain

This incident brought the problem with the AI supply chain into sharp focus. Currently, there is no way to determine the provenance of a model, or the specific datasets and methods that went into making it.

This problem cannot be fixed by openness alone. Indeed, it is almost impossible to reproduce the exact weights that have been open-sourced, due to randomness in the hardware (particularly the GPUs) and the software. Despite the best efforts, redoing training on the original models may be impossible or prohibitively expensive because of their scale. Because there is no way to securely link weights to a trusted dataset and algorithm, editing algorithms like ROME can be used to taint any model.

Hugging Face Enterprise Hub addresses many challenges associated with deploying AI models in a business setting, although this market is just starting. The existence of trusted actors is an underappreciated factor that has the potential to turbocharge enterprise AI adoption, similar to how the advent of cloud computing prompted widespread adoption once IT heavyweights like Amazon, Google, and Microsoft entered the market. 

Check out the Blog.

Google Research Introduces SPAE: An AutoEncoder For Multimodal Generation With Frozen Large Language Models (LLMs)

Large language models (LLMs) have rapidly gained enormous popularity through their extraordinary capabilities in natural language processing and natural language understanding. This recent development in the field of artificial intelligence has revolutionized the way humans and computers interact with each other. The recent model developed by OpenAI that has been in the headlines is the well-known ChatGPT. Based on GPT's transformer architecture, this model is famous for holding realistic, human-like conversations and does everything from question answering and content generation to code completion, machine translation, and text summarization.

LLMs are exceptional at capturing deep conceptual knowledge about the world through their lexical embeddings. But researchers are still putting in efforts to make frozen LLMs capable of completing visual modality tasks when given the right visual representations as input. Researchers have been suggesting making use of a vector quantizer that maps an image to the token space of a frozen LLM, which would translate the image into a language that the LLM can comprehend, enabling the usage of LLM’s generative abilities to perform conditional image understanding and generation tasks without the need to train on image-text pairs.

To address this and facilitate this cross-modal task, a team of researchers from Google Research and Carnegie Mellon University has introduced Semantic Pyramid AutoEncoder (SPAE), an autoencoder for multimodal generation with frozen large language models. SPAE produces a lexical word sequence that carries rich semantics while retaining fine details for signal reconstruction. In SPAE, the team has combined an autoencoder architecture with a hierarchical pyramid structure, and contrary to previous approaches, SPAE encodes images into an interpretable discrete latent space, i.e., words.

The pyramid-shaped representation of the SPAE tokens has multiple scales, with the bottom layers of the pyramid prioritizing appearance representations that capture fine details for image reconstruction and the upper layers of the pyramid containing semantically central notions. This system enables dynamic adjustment of the token length to accommodate different tasks by using fewer tokens for tasks requiring knowledge and more tokens for jobs requiring generation. This model has been trained independently, without backpropagating through any language model.

To evaluate the effectiveness of SPAE, the team has conducted experiments on image understanding tasks, including image classification, image captioning, and visual question answering. The outcomes demonstrated how well LLMs can handle visual modalities and some great applications like content generation, design support, and interactive storytelling. The researchers also used in-context denoising methods to illustrate the picture-generating capabilities of LLMs.

The team has summarized the contributions as follows:

This work provides a method for directly generating visual content using in-context learning with a frozen language model that has been trained only on language tokens.

Semantic Pyramid AutoEncoder (SPAE) has been proposed to generate interpretable representations of semantic concepts and fine-grained details. The multilingual linguistic tokens that the tokenizer generates have customizable lengths, giving it more flexibility and adaptation in capturing and communicating the subtleties of visual information.

A progressive prompting method has also been introduced, which enables the seamless integration of language and visual modalities, allowing for the generation of comprehensive and coherent cross-modal sequences with improved quality and accuracy.

The approach outperforms the state-of-the-art few-shot image classification accuracy under identical in-context conditions by an absolute margin of 25%.

In conclusion, SPAE is a significant breakthrough in bridging the gap between language models and visual understanding. It shows the remarkable potential of LLMs in handling cross-modal tasks. 

Check out the Paper.