Fine-tune Falcon 7B and other LLMs on Amazon SageMaker with @remote decorator

Today, generative AI models cover a variety of tasks, from text summarization and Q&A to image and video generation. To improve the quality of output, approaches like n-shot learning, prompt engineering, Retrieval Augmented Generation (RAG), and fine-tuning are used. Fine-tuning allows you to adjust these generative AI models to achieve improved performance on your domain-specific tasks.
With Amazon SageMaker, you can now run a SageMaker training job simply by annotating your Python code with the @remote decorator. The SageMaker Python SDK automatically translates your existing workspace environment, and any associated data processing code and datasets, into a SageMaker training job that runs on the training platform. This lets you write code in a more natural, object-oriented way while still using SageMaker capabilities to run training jobs on a remote cluster with minimal changes.
In this post, we showcase how to fine-tune the Falcon-7B foundation model (FM) using the @remote decorator from the SageMaker Python SDK. The solution also uses Hugging Face's parameter-efficient fine-tuning (PEFT) library and quantization techniques through bitsandbytes to support fine-tuning. The code presented in this post can also be used to fine-tune other FMs, such as Llama 2 13B.
The full-precision representation of this model can be challenging to fit into memory on a single or even several graphics processing units (GPUs), or may require a larger instance. Hence, to fine-tune this model without increasing cost, we use the technique known as Quantized LLMs with Low-Rank Adapters (QLoRA). QLoRA is an efficient fine-tuning approach that reduces the memory usage of LLMs while maintaining very good performance.
Advantages of using @remote decorator
Before going further, let's understand how the @remote decorator improves developer productivity while working with SageMaker:

The @remote decorator triggers a training job directly from native Python code, without the explicit invocation of SageMaker Estimators and SageMaker input channels (see the minimal sketch after this list).
Low barrier to entry for developers training models on SageMaker.
No need to switch integrated development environments (IDEs). Continue writing code in your IDE of choice and invoke SageMaker training jobs from it.
No need to learn about containers. Continue providing dependencies in a requirements.txt file and supply it to the @remote decorator.
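To illustrate the workflow, the following is a minimal sketch of how a Python function becomes a SageMaker training job with the @remote decorator. The function body and hyperparameter are illustrative only, not the exact code used later in this post:

from sagemaker.remote_function import remote

# Dependencies listed in requirements.txt are installed in the training container
@remote(instance_type="ml.g5.12xlarge", dependencies="./requirements.txt")
def train(learning_rate=2e-4):
    # Everything inside this function runs as a SageMaker training job
    print(f"Training with learning rate {learning_rate}")
    return "done"

# Calling the function triggers the remote training job and returns its result
result = train(learning_rate=1e-4)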

Prerequisites
An AWS account is needed with an AWS Identity and Access Management (IAM) role that has permissions to manage resources created as part of the solution. For details, refer to Creating an AWS account.
In this post, we use Amazon SageMaker Studio with the Data Science 3.0 image and an ml.t3.medium fast launch instance. However, you can use any integrated development environment (IDE) of your choice. You just need to set up your AWS Command Line Interface (AWS CLI) credentials correctly. For more information, refer to Configure the AWS CLI.
For fine-tuning Falcon-7B, an ml.g5.12xlarge instance is used in this post. Ensure you have sufficient capacity for this instance in your AWS account.
You need to clone this GitHub repository to replicate the solution demonstrated in this post.
Solution overview

Install prerequisites for fine-tuning the Falcon-7B model
Set up remote decorator configurations
Preprocess the dataset containing AWS services FAQs
Fine-tune Falcon-7B on AWS services FAQs
Test the fine-tuned model on sample questions related to AWS services

1. Install prerequisites for fine-tuning the Falcon-7B model
Launch the notebook falcon-7b-qlora-remote-decorator_qa.ipynb in SageMaker Studio by selecting the Image as Data Science and the Kernel as Python 3. Install all the required libraries mentioned in requirements.txt. A few of the libraries need to be installed on the notebook instance itself. Then perform the other operations needed for dataset processing and for triggering a SageMaker training job.

%pip install -r requirements.txt

%pip install -q -U transformers==4.31.0
%pip install -q -U datasets==2.13.1
%pip install -q -U peft==0.4.0
%pip install -q -U accelerate==0.21.0
%pip install -q -U bitsandbytes==0.40.2
%pip install -q -U boto3
%pip install -q -U sagemaker==2.154.0
%pip install -q -U scikit-learn

2. Set up remote decorator configurations
Create a configuration file where all the configurations related to the SageMaker training job are specified. This file is read by the @remote decorator when running the training job. It contains settings like dependencies, training image, instance type, and the execution role to be used for the training job. For a detailed reference of all the settings supported by the config file, check out Configuring and using defaults with the SageMaker Python SDK.

SchemaVersion: '1.0'
SageMaker:
  PythonSDK:
    Modules:
      RemoteFunction:
        Dependencies: ./requirements.txt
        ImageUri: '{aws_account_id}.dkr.ecr.{region}.amazonaws.com/huggingface-pytorch-training:2.0.0-transformers4.28.1-gpu-py310-cu118-ubuntu20.04'
        InstanceType: ml.g5.12xlarge
        RoleArn: arn:aws:iam::111122223333:role/ExampleSageMakerRole

It's not mandatory to use the config.yaml file in order to work with the @remote decorator; it's just a cleaner way to supply all the configurations to the decorator. It keeps SageMaker and AWS related parameters outside of the code, with a one-time effort to set up a config file that is shared across team members. All the configurations could also be supplied directly as decorator arguments, but that reduces readability and maintainability of changes in the long run. The configuration file can also be created by an administrator and shared with all the users in an environment; one way to point the SDK at such a file is sketched below.
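For example, the SageMaker Python SDK can be pointed at a shared configuration file through an environment variable; the exact lookup behavior is described in the SDK defaults configuration documentation, so treat the path handling below as an assumption to verify for your SDK version:

import os

# Tell the SageMaker Python SDK where to find config.yaml
# (here, the current working directory of the notebook)
os.environ["SAGEMAKER_USER_CONFIG_OVERRIDE"] = os.getcwd()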
3. Preprocess the dataset containing AWS services FAQs
The next step is to load and preprocess the dataset to make it ready for the training job. First, let's have a look at the dataset:

It shows FAQs for one of the AWS services. In addition to QLoRA, bitsandbytes is used to quantize the frozen LLM to 4-bit precision and attach LoRA adapters to it.
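As an illustration, the dataset can be loaded and inspected with the Hugging Face datasets library. The file name below is a placeholder, and the question and answers column names are assumptions based on the code that follows; adjust them to match the dataset in the repository:

from datasets import load_dataset

# Load the AWS services FAQ dataset from a local CSV file (file name is a placeholder)
dataset = load_dataset("csv", data_files="aws_faqs.csv", split="train")

# Inspect one sample to confirm the expected "question" and "answers" columns
print(dataset[0])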
Create a prompt template to convert each FAQ sample to a prompt format:

from random import randint

# custom instruct prompt start
prompt_template = f"{{question}}\n---\nAnswer:\n{{answer}}{{eos_token}}"

# template dataset to add prompt to each sample
def template_dataset(sample):
    sample["text"] = prompt_template.format(question=sample["question"],
                                            answer=sample["answers"],
                                            eos_token=tokenizer.eos_token)
    return sample

The next step is to convert the inputs (text) to token IDs. This is done by a Hugging Face Transformers tokenizer.

from transformers import AutoTokenizer

model_id = "tiiuae/falcon-7b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Set the Falcon tokenizer
tokenizer.pad_token = tokenizer.eos_token

Now use the template_dataset function to convert all the FAQs to the prompt format and to set up the train and test datasets, as sketched below.
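The following is a minimal sketch of that step, assuming the dataset loaded earlier; the 90/10 split ratio is an arbitrary choice for illustration:

# Apply the prompt template to every sample (adds a "text" column)
templated_dataset = dataset.map(template_dataset)

# Split into train and test datasets
split = templated_dataset.train_test_split(test_size=0.1, seed=42)
train_dataset = split["train"]
test_dataset = split["test"]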

4. Fine-tune Falcon-7B on AWS services FAQs
Now you can prepare the training script by defining the training function train_fn and decorating it with the @remote decorator.
The training function does the following:

Tokenizes and chunks the dataset
Sets up BitsAndBytesConfig, which specifies that the model should be loaded in 4-bit but computation should be converted to bfloat16
Loads the model
Finds the target modules and updates the necessary matrices by using the utility method find_all_linear_names
Creates LoRA configurations that specify the rank of the update matrices (r), scaling factor (lora_alpha), the modules to apply the LoRA update matrices to (target_modules), dropout probability for LoRA layers (lora_dropout), task_type, and so on
Starts the training and evaluation

import bitsandbytes as bnb

def find_all_linear_names(hf_model):
    lora_module_names = set()
    for name, module in hf_model.named_modules():
        if isinstance(module, bnb.nn.Linear4bit):
            names = name.split(".")
            lora_module_names.add(names[0] if len(names) == 1 else names[-1])

    if "lm_head" in lora_module_names:
        lora_module_names.remove("lm_head")
    return list(lora_module_names)
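The training function below also calls a print_trainable_parameters helper that isn't shown in this post. A common implementation, included here as an assumption rather than the exact code from the repository, looks like this:

def print_trainable_parameters(hf_model):
    # Report how many parameters LoRA leaves trainable versus the full model size
    trainable_params = 0
    all_params = 0
    for _, param in hf_model.named_parameters():
        all_params += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_params} "
        f"|| trainable%: {100 * trainable_params / all_params:.2f}"
    )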
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from sagemaker.remote_function import remote
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import transformers

# Start training
@remote(volume_size=50)
def train_fn(
    model_name,
    train_ds,
    test_ds,
    lora_r=8,
    lora_alpha=32,
    lora_dropout=0.05,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    learning_rate=2e-4,
    num_train_epochs=1
):
    # tokenize and chunk dataset
    lm_train_dataset = train_ds.map(
        lambda sample: tokenizer(sample["text"]), batched=True, batch_size=24, remove_columns=list(train_ds.features)
    )

    lm_test_dataset = test_ds.map(
        lambda sample: tokenizer(sample["text"]), batched=True, remove_columns=list(test_ds.features)
    )

    # Print total number of samples
    print(f"Total number of train samples: {len(lm_train_dataset)}")

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16
    )
    # Falcon requires you to allow remote code execution. This is because the model uses a new architecture that is not part of transformers yet.
    # The code is provided by the model authors in the repo.
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        trust_remote_code=True,
        quantization_config=bnb_config,
        device_map="auto")

    model.gradient_checkpointing_enable()
    model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)

    # get lora target modules
    modules = find_all_linear_names(model)
    print(f"Found {len(modules)} modules to quantize: {modules}")

    config = LoraConfig(
        r=lora_r,
        lora_alpha=lora_alpha,
        target_modules=modules,
        lora_dropout=lora_dropout,
        bias="none",
        task_type="CAUSAL_LM"
    )

    model = get_peft_model(model, config)
    print_trainable_parameters(model)

    trainer = transformers.Trainer(
        model=model,
        train_dataset=lm_train_dataset,
        eval_dataset=lm_test_dataset,
        args=transformers.TrainingArguments(
            per_device_train_batch_size=per_device_train_batch_size,
            per_device_eval_batch_size=per_device_eval_batch_size,
            logging_steps=2,
            num_train_epochs=num_train_epochs,
            learning_rate=learning_rate,
            bf16=True,
            save_strategy="no",
            output_dir="outputs"
        ),
        data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    model.config.use_cache = False

    trainer.train()
    trainer.evaluate()

    model.save_pretrained("/opt/ml/model")
Then invoke train_fn():

train_fn(model_id, train_dataset, test_dataset)

The tuning job runs on the Amazon SageMaker training cluster. Wait for the tuning job to finish.
5. Test the fine-tuned model on sample questions related to AWS services
Now, it’s time to run some tests on the model. First, let us load the model:

from peft import PeftModel, PeftConfig
import torch
from transformers import AutoModelForCausalLM

device = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'

config = PeftConfig.from_pretrained("./model")
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, trust_remote_code=True)
model = PeftModel.from_pretrained(model, "./model")
model.to(device)

Now load a sample question from the training dataset to see the original answer, and then ask the same question to the tuned model to compare the answers.
Here is a sample question from the training set and the original answer:

Now, here is the same question asked to the fine-tuned Falcon-7B model (a minimal sketch of the inference call follows):
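A minimal sketch of such an inference call, assuming the tokenizer loaded earlier in the notebook and a hypothetical sample question, looks like the following:

question = "What is Amazon SageMaker?"  # hypothetical sample question
prompt = f"{question}\n---\nAnswer:\n"

inputs = tokenizer(prompt, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=100,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))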

This concludes the implementation of fine-tuning Falcon-7B on the AWS services FAQ dataset using the @remote decorator from the Amazon SageMaker Python SDK.
Cleaning up
Complete the following steps to clean up your resources:

Shut down the Amazon SageMaker Studio instances to avoid incurring additional costs.
Clean up your Amazon Elastic File System (Amazon EFS) directory by clearing the Hugging Face cache directory:

rm -R ~/.cache/huggingface/hub

Conclusion
In this post, we showed you how to use the @remote decorator to fine-tune the Falcon-7B model using QLoRA and Hugging Face PEFT with bitsandbytes, without applying significant changes to the training notebook, and how to use Amazon SageMaker capabilities to run the training job on a remote cluster.
All the code shown as part of this post to fine-tune Falcon-7B is available in the GitHub repository. The repository also contains a notebook showing how to fine-tune Llama-13B.
As a next step, we encourage you to check out the @remote decorator functionality and Python SDK API and use it in your choice of environment and IDE. Additional examples are available in the amazon-sagemaker-examples repository to get you started quickly. You can also check out the following posts:

Run your local machine learning code as Amazon SageMaker Training jobs with minimal code changes
Access private repos using the @remote decorator for Amazon SageMaker training workloads
Interactively fine-tune Falcon-40B and other LLMs on Amazon SageMaker Studio notebooks using QLoRA

About the Authors
Bruno Pistone is an AI/ML Specialist Solutions Architect for AWS based in Milan. He works with large customers, helping them to deeply understand their technical needs and design AI and machine learning solutions that make the best use of the AWS Cloud and the Amazon Machine Learning stack. His expertise includes machine learning end to end, machine learning industrialization, and generative AI. He enjoys spending time with his friends and exploring new places, as well as traveling to new destinations.
Vikesh Pandey is a Machine Learning Specialist Solutions Architect at AWS, helping customers from financial industries design and build solutions on generative AI and ML. Outside of work, Vikesh enjoys trying out different cuisines and playing outdoor sports.

Simplify access to internal information using Retrieval Augmented Generation

This post takes you through the most common challenges that customers face when searching internal documents, and gives you concrete guidance on how AWS services can be used to create a generative AI conversational bot that makes internal information more useful.
Unstructured data accounts for 80% of all the data found within organizations, consisting of repositories of manuals, PDFs, FAQs, emails, and other documents that grow daily. Businesses today rely on continuously growing repositories of internal information, and problems arise when the amount of unstructured data becomes unmanageable. Often, users find themselves reading and checking many different internal sources to find the answers they need.
Internal question and answer forums can help users get highly specific answers but also require longer wait times. In the case of company-specific internal FAQs, long wait times result in lower employee productivity. Question and answer forums are difficult to scale as they rely on manually written answers. With generative AI, there is currently a paradigm shift in how users search and find information. The next logical step is to use generative AI to condense large documents into smaller, bite-sized pieces of information for easier user consumption. Instead of spending a long time reading text or waiting for answers, users can generate summaries in real time based on multiple existing repositories of internal information.
Solution overview
The solution allows customers to retrieve curated responses to questions asked about internal documents by using a transformer model to generate answers to questions about data that it has not been trained on, a technique known as zero-shot prompting. By adopting this solution, customers can gain the following benefits:

Find accurate answers to questions based on existing sources of internal documents
Reduce the time users spend searching for answers by using Large Language Models (LLMs) to provide near-immediate answers to complex queries using documents with the most updated information
Search previously answered questions through a centralized dashboard
Reduce stress caused by spending time manually reading information to look for answers

Retrieval Augmented Generation (RAG)
Retrieval Augmented Generation (RAG) reduces some of the shortcomings of LLM-based queries by finding the answers from your knowledge base and using the LLM to summarize the documents into concise responses. Read this post to learn how to implement the RAG approach with Amazon Kendra. The following risks and limitations are associated with LLM-based queries that a RAG approach with Amazon Kendra addresses:

Hallucinations and traceability – LLMs are trained on large datasets and generate responses based on probabilities. This can lead to inaccurate answers, which are known as hallucinations.
Multiple data silos – In order to reference data from multiple sources within your response, one needs to set up a connector ecosystem to aggregate the data. Accessing multiple repositories is manual and time-consuming.
Security – Security and privacy are critical considerations when deploying conversational bots powered by RAG and LLMs. Despite using Amazon Comprehend to filter out personal data that may be provided through user queries, there remains a possibility of unintentionally surfacing personal or sensitive information, depending on the ingested data. This means that controlling access to the chatbot is crucial to prevent unintended access to sensitive information.
Data relevance – LLMs are trained on data up to a certain date, which means information is often not current. The cost associated with training models on recent data is high. To ensure accurate and up-to-date responses, organizations bear the responsibility of regularly updating and enriching the content of the indexed documents.
Cost – The cost associated with deploying this solution should be a consideration for businesses. Businesses need to carefully assess their budget and performance requirements when implementing this solution. Running LLMs can require substantial computational resources, which may increase operational costs. These costs can become a limitation for applications that need to operate at a large scale. However, one of the benefits of the AWS Cloud is the flexibility to only pay for what you use. AWS offers a simple, consistent, pay-as-you-go pricing model, so you are charged only for the resources you consume.

Usage of Amazon SageMaker JumpStart
For transformer-based language models, organizations can benefit from using Amazon SageMaker JumpStart, which offers a collection of pre-built machine learning models. Amazon SageMaker JumpStart offers a wide range of text generation and question-answering (Q&A) foundation models that can be easily deployed and utilized. This solution integrates a FLAN T5-XL Amazon SageMaker JumpStart model, but there are different aspects to keep in mind when choosing a foundation model.
Integrating security in our workflow
Following the best practices of the Security Pillar of the Well-Architected Framework, Amazon Cognito is used for authentication. Amazon Cognito User Pools can be integrated with third-party identity providers that support several frameworks used for access control, including Open Authorization (OAuth), OpenID Connect (OIDC), or Security Assertion Markup Language (SAML). Identifying users and their actions allows the solution to maintain traceability. The solution also uses the Amazon Comprehend personally identifiable information (PII) detection feature to automatically identify and redact PII. Redacted PII includes addresses, social security numbers, email addresses, and other sensitive information. This design ensures that any PII provided by the user through the input query is redacted. The PII is not stored, used by Amazon Kendra, or fed to the LLM.
Solution Walkthrough
The following steps describe the question answering over documents workflow:

Users send a query through a web interface.
Amazon Cognito is used for authentication, ensuring secure access to the web application.
The web application front-end is hosted on AWS Amplify.
Amazon API Gateway hosts a REST API with various endpoints to handle user requests that are authenticated using Amazon Cognito.
PII redaction with Amazon Comprehend:

User Query Processing: When a user submits a query or input, it is first passed through Amazon Comprehend. The service analyzes the text and identifies any PII entities present within the query.
PII Extraction: Amazon Comprehend extracts the detected PII entities from the user query.

Relevant Information Retrieval with Amazon Kendra:

Amazon Kendra is used to manage an index of documents that contains the information used to generate answers to the user’s queries.
The LangChain QA retrieval module is used to build a conversation chain that has relevant information about the user’s queries.

Integration with Amazon SageMaker JumpStart:

The AWS Lambda function uses the LangChain library and connects to the Amazon SageMaker JumpStart endpoint with a context-stuffed query. The Amazon SageMaker JumpStart endpoint serves as the interface of the LLM used for inference.

Storing responses and returning them to the user:

The response from the LLM is stored in Amazon DynamoDB along with the user’s query, the timestamp, a unique identifier, and other arbitrary identifiers for the item such as question category. Storing the question and answer as discrete items allows the AWS Lambda function to easily recreate a user’s conversation history based on the time when questions were asked.
Finally, the response is sent back to the user via an HTTPS request through the Amazon API Gateway REST API integration response. A simplified code sketch of this Lambda flow follows this list.
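The following is a simplified, hypothetical sketch of how such a Lambda handler could combine Amazon Comprehend PII redaction, the LangChain Amazon Kendra retriever, and a SageMaker JumpStart endpoint. The Kendra index ID, endpoint name, and the request/response payload format for the FLAN T5-XL endpoint are assumptions to adapt to your own deployment:

import json
import boto3
from langchain.chains import RetrievalQA
from langchain.llms import SagemakerEndpoint
from langchain.llms.sagemaker_endpoint import LLMContentHandler
from langchain.retrievers import AmazonKendraRetriever

comprehend = boto3.client("comprehend")

def redact_pii(text, language_code="en"):
    # Replace detected PII entities with their entity type, working backward so offsets stay valid
    entities = comprehend.detect_pii_entities(Text=text, LanguageCode=language_code)["Entities"]
    for entity in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        text = text[: entity["BeginOffset"]] + f"[{entity['Type']}]" + text[entity["EndOffset"]:]
    return text

class FlanT5ContentHandler(LLMContentHandler):
    # Payload keys depend on the deployed JumpStart model; these are assumed for FLAN T5-XL
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt, model_kwargs):
        return json.dumps({"text_inputs": prompt, **model_kwargs}).encode("utf-8")

    def transform_output(self, output):
        return json.loads(output.read().decode("utf-8"))["generated_texts"][0]

llm = SagemakerEndpoint(
    endpoint_name="jumpstart-flan-t5-xl-endpoint",  # assumed endpoint name
    region_name="us-east-1",
    model_kwargs={"max_length": 200, "temperature": 0.1},
    content_handler=FlanT5ContentHandler(),
)

retriever = AmazonKendraRetriever(index_id="<kendra-index-id>", top_k=3)
qa_chain = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)

def handler(event, context):
    # Redact PII before the query is used for retrieval or sent to the LLM
    query = redact_pii(event["query"])
    answer = qa_chain.run(query)
    # The answer, query, timestamp, and identifiers would then be written to DynamoDB
    return {"statusCode": 200, "body": json.dumps({"answer": answer})}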

The following steps describe the AWS Lambda functions and their flow through the process:

Check and redact any PII or sensitive information
LangChain QA retrieval chain: search and retrieve relevant information
Context stuffing and prompt engineering with LangChain
Inference with the LLM
Return the response and save it

Use cases
There are many business use cases where customers can use this workflow. The following section explains how the workflow can be used in different industries and verticals.
Employee Assistance
Well-designed corporate training can improve employee satisfaction and reduce the time required for onboarding new employees. As organizations grow and complexity increases, employees find it difficult to understand the many sources of internal documents. Internal documents in this context include company guidelines, policies, and Standard Operating Procedures. For this scenario, an employee has a question about how to proceed with and edit an internal issue ticket. The employee can access and use the generative artificial intelligence (AI) conversational bot to ask about and execute the next steps for a specific ticket.
Specific use case: Automate issue resolution for employees based on corporate guidelines.

The following steps describe the AWS Lambda functions and their flow through the process:

LangChain agent to identify the intent
Send notification based on employee request
Modify ticket status

In this architecture diagram, corporate training videos can be ingested through Amazon Transcribe to collect a log of these video scripts. Additionally, corporate training content stored in various sources (for example, Confluence, Microsoft SharePoint, Google Drive, Jira, and so on) can be used to create indexes through Amazon Kendra connectors. Read this article to learn more about the collection of native connectors you can utilize in Amazon Kendra as a source point. The Amazon Kendra crawler is then able to use both the corporate training video scripts and documentation stored in these other sources to assist the conversational bot in answering questions specific to company corporate training guidelines. The LangChain agent verifies permissions, modifies ticket status, and notifies the correct individuals using Amazon Simple Notification Service (Amazon SNS).
Customer Support Teams
Quickly resolving customer queries improves the customer experience and encourages brand loyalty. A loyal customer base helps drive sales, which contributes to the bottom line and increases customer engagement. Customer support teams spend lots of energy referencing many internal documents and customer relationship management software to answer customer queries about products and services. Internal documents in this context can include generic customer support call scripts, playbooks, escalation guidelines, and business information. The generative AI conversational bot helps with cost optimization because it handles queries on behalf of the customer support team.
Specific use case: Handling an oil change request based on service history and customer service plan purchased.

In this architecture diagram, the customer is routed to either the generative AI conversational bot or the Amazon Connect contact center. This decision can be based on the level of support needed or the availability of customer support agents. The LangChain agent identifies the customer’s intent and verifies identity. The LangChain agent also checks the service history and purchased support plan.
The following steps describe the AWS Lambda functions and their flow through the process:

LangChain agent identifies the intent
Retrieve Customer Information
Check customer service history and warranty information
Book appointment, provide more information, or route to contact center
Send email confirmation

Amazon Connect is used to collect the voice and chat logs, and Amazon Comprehend is used to remove personally identifiable information (PII) from these logs. The Amazon Kendra crawler is then able to use the redacted voice and chat logs, customer call scripts, and customer service support plan policies to create the index. Once a decision is made, the generative AI conversational bot decides whether to book an appointment, provide more information, or route the customer to the contact center for further assistance. For cost optimization, the LangChain agent can also generate answers using fewer tokens and a less expensive large language model for lower priority customer queries.
Financial Services
Financial services companies rely on timely use of information to stay competitive and comply with financial regulations. Using a generative AI conversational bot, financial analysts and advisors can interact with textual information in a conversational manner and reduce the time and effort it takes to make better informed decisions. Outside of investment and market research, a generative AI conversational bot can also augment human capabilities by handling tasks that would traditionally require more human effort and time. For example, a financial institution specializing in personal loans can increase the rate at which loans are processed while providing better transparency to customers.
Specific use case: Use customer financial history and previous loan applications to decide and explain loan decision.

The following steps describe the AWS Lambda functions and their flow through the process:

LangChain agent to identify the intent
Check customer financial and credit score history
Check internal customer relationship management system
Check standard loan policies and suggest decision for employee qualifying the loan
Send notification to customer

This architecture incorporates customer financial data stored in a database and data stored in a customer relationship management (CRM) tool. These data points are used to inform a decision based on the company’s internal loan policies. The customer is able to ask clarifying questions to understand what loans they qualify for and the terms of the loans they can accept. If the generative AI conversational bot is unable to approve a loan application, the user can still ask questions about improving credit scores or alternative financing options.
Government
Generative AI conversational bots can greatly benefit government institutions by speeding up communication and improving efficiency and decision-making processes. Generative AI conversational bots can also provide instant access to internal knowledge bases to help government employees quickly retrieve information, policies, and procedures (for example, eligibility criteria, application processes, and citizens' services and support). One solution is an interactive system that allows taxpayers and tax professionals to easily find tax-related details and benefits. It can be used to understand user questions, summarize tax documents, and provide clear answers through interactive conversations.
Users can ask questions such as:

How does inheritance tax work and what are the tax thresholds?
Can you explain the concept of income tax?
What are the tax implications when selling a second property?

Additionally, users can have the convenience of submitting tax forms to a system, which can help verify the correctness of the information provided.

This architecture illustrates how users can upload completed tax forms to the solution and use it for interactive verification and guidance on how to accurately complete the necessary information.
Healthcare
Healthcare businesses have the opportunity to automate the use of large amounts of internal patient information, while also addressing common questions regarding use cases such as treatment options, insurance claims, clinical trials, and pharmaceutical research. Using a generative AI conversational bot enables quick and accurate generation of answers about health information from the provided knowledge base. For example, some healthcare professionals spend a lot of time filling in forms to file insurance claims.
In similar settings, clinical trial administrators and researchers need to find information about treatment options. A generative AI conversational bot can use the pre-built connectors in Amazon Kendra to retrieve the most relevant information from the millions of documents published through ongoing research conducted by pharmaceutical companies and universities.
Specific use case: Reduce the errors and time needed to fill out and send insurance forms.

In this architecture diagram, a healthcare professional is able to use the generative AI conversational bot to figure out what forms need to be filled out for the insurance. The LangChain agent is then able to retrieve the right forms and add the needed information for a patient as well as giving responses for descriptive parts of the forms based on insurance policies and previous forms. The healthcare professional can edit the responses given by the LLM before approving and having the form delivered to the insurance portal.
The following steps describe the AWS Lambda functions and their flow through the process:

LangChain agent to identify the intent
Retrieve the patient information needed
Fill out the insurance form based on the patient information and form guideline
Submit the form to the insurance portal after user approval

AWS HealthLake is used to securely store the health data, including previous insurance forms and patient information, and Amazon Comprehend is used to remove personally identifiable information (PII) from the previous insurance forms. The Amazon Kendra crawler is then able to use the set of insurance forms and guidelines to create the index. Once the forms are filled out by the generative AI and reviewed by the medical professional, they can be sent to the insurance portal.
Cost estimate
The cost of deploying the base solution as a proof of concept is shown in the following table. Because the base solution is considered a proof of concept, Amazon Kendra Developer Edition was used as a low-cost option, given that the workload would not be in production. Our assumption for Amazon Kendra Developer Edition was 730 active hours for the month.
For Amazon SageMaker, we made an assumption that the customer would be using the ml.g4dn.2xlarge instance for real-time inference, with a single inference endpoint per instance. You can find more information on Amazon SageMaker pricing and available inference instance types here.

Service | Resources Consumed | Cost Estimate Per Month (USD)
AWS Amplify | 150 build minutes, 1 GB of data served, 500,000 requests | 15.71
Amazon API Gateway | 1M REST API calls | 3.50
AWS Lambda | 1 million requests, 5 seconds duration per request, 2 GB memory allocated | 160.23
Amazon DynamoDB | 1 million reads, 1 million writes, 100 GB storage | 26.38
Amazon SageMaker | Real-time inference with ml.g4dn.2xlarge | 676.80
Amazon Kendra | Developer Edition with 730 hours/month, 10,000 documents scanned, 5,000 queries/day | 821.25
Total cost | | 1,703.87

*  Amazon Cognito has a free tier of 50,000 Monthly Active Users who use Cognito User Pools or 50 Monthly Active Users who use SAML 2.0 identity providers
Clean Up
To save costs, delete all the resources you deployed as part of the tutorial. You can delete any SageMaker endpoints you may have created via the SageMaker console. Remember, deleting an Amazon Kendra index doesn’t remove the original documents from your storage.
Conclusion
In this post, we showed you how to simplify access to internal information by summarizing from multiple repositories in real time. After the recent developments of commercially available LLMs, the possibilities of generative AI have become more apparent. In this post, we showcased ways to use AWS services to create a serverless chatbot that uses generative AI to answer questions. This approach incorporates an authentication layer and Amazon Comprehend's PII detection to filter out any sensitive information provided in the user's query. Whether it be individuals in healthcare understanding the nuances of filing insurance claims or HR understanding specific company-wide regulations, there are multiple industries and verticals that can benefit from this approach. An Amazon SageMaker JumpStart foundation model is the engine behind the chatbot, while a context stuffing approach using the RAG technique is used to ensure that the responses more accurately reference internal documents.
To learn more about working with generative AI on AWS, refer to Announcing New Tools for Building with Generative AI on AWS. For more in-depth guidance on using the RAG technique with AWS services, refer to Quickly build high-accuracy Generative AI applications on enterprise data using Amazon Kendra, LangChain, and large language models. Since the approach in this blog is LLM agnostic, any LLM can be used for inference. In our next post, we’ll outline ways to implement this solution using Amazon Bedrock and the Amazon Titan LLM.

About the Authors
Abhishek Maligehalli Shivalingaiah is a Senior AI Services Solution Architect at AWS. He is passionate about building applications using generative AI, Amazon Kendra, and NLP. He has around 10 years of experience in building data and AI solutions to create value for customers and enterprises. He has even built a (personal) chatbot for fun to answer questions about his career and professional journey. Outside of work he enjoys making portraits of family and friends, and loves creating artwork.
Medha Aiyah is an Associate Solutions Architect at AWS, based in Austin, Texas. She recently graduated from the University of Texas at Dallas in December 2022 with a Master of Science in Computer Science with a specialization in Intelligent Systems, focusing on AI/ML. She is interested in learning more about AI/ML and using AWS services to discover solutions customers can benefit from.
Hugo Tse is an Associate Solutions Architect at AWS based in Seattle, Washington. He holds a Master’s degree in Information Technology from Arizona State University and a bachelor’s degree in Economics from the University of Chicago. He is a member of the Information Systems Audit and Control Association (ISACA) and International Information System Security Certification Consortium (ISC)2. He enjoys helping customers benefit from technology.
Ayman Ishimwe is an Associate Solutions Architect at AWS based in Seattle, Washington. He holds a Master's degree in Software Engineering and IT from Oakland University. He has prior experience in software development, specifically in building microservices for distributed web applications. He is passionate about helping customers build robust and scalable solutions on AWS cloud services following best practices.
Shervin Suresh is an Associate Solutions Architect at AWS based in Austin, Texas. He graduated with a Master's in Software Engineering with a concentration in Cloud Computing and Virtualization and a Bachelor's in Computer Engineering from San Jose State University. He is passionate about leveraging technology to help improve the lives of people from all backgrounds.

How Does Google AI's New Paradigm Eliminate the Composition Cost in Multi-Step Machine Learning ML Algorithms for Enhanced Utility

In today’s data-driven landscape, ensuring privacy while maximizing the utility of machine learning and data analytics algorithms has been a pressing challenge. The cost of composition, a phenomenon where the overall privacy guarantee deteriorates with multiple computation steps, has been a significant stumbling block. Despite strides in foundational research and the adoption of differential privacy, striking the right balance between privacy and utility has remained elusive.

Existing approaches like DP-SGD have made strides in preserving privacy during machine learning model training. However, they rely on random partitioning of training examples into minibatches, which limits their effectiveness in scenarios where data-dependent selection is needed.

Meet the Reorder-Slice-Compute (RSC) paradigm, a groundbreaking development presented at STOC 2023. This innovative framework offers a solution that allows for adaptive slice selection and circumvents the composition cost. By adhering to a specific structure involving ordered data points, slice size, and a differential privacy algorithm, the RSC paradigm opens up new avenues for enhancing utility without compromising privacy.

Metrics from extensive research and experimentation demonstrate the power of the RSC paradigm. Unlike traditional approaches, the RSC analysis eliminates the dependence on the number of steps, resulting in an overall privacy guarantee comparable to that of a single step. This breakthrough significantly improves the utility of DP algorithms for a range of fundamental aggregation and learning tasks.

One standout application of the RSC paradigm lies in solving the private interval point problem. By intelligently selecting slices and leveraging a novel analysis, the RSC algorithm achieves privacy-preserving solutions with an order of log*|X| points, closing a significant gap in prior DP algorithms.

The RSC paradigm also addresses common aggregation tasks like private approximate median and private learning of axis-aligned rectangles. By employing a sequence of RSC steps tailored to the specific problem, the algorithm limits mislabeled points, offering accurate and private results.

Furthermore, the RSC paradigm offers a game-changing approach to ML model training. By allowing for data-dependent selection order of training examples, it seamlessly integrates with DP-SGD, eliminating the privacy deterioration associated with composition. This advancement is poised to revolutionize training efficiency in production environments.

In conclusion, the Reorder-Slice-Compute (RSC) paradigm is a transformative solution to the longstanding challenge of balancing privacy and utility in data-driven environments. Its unique structure and novel analysis promise to unlock new possibilities in various aggregation and learning tasks. The RSC paradigm paves the way for more efficient and privacy-preserving machine learning model training by eliminating the composition cost. This paradigm shift marks a pivotal moment in the pursuit of robust data privacy in the era of big data.

Check out the Paper and the Google Blog. All credit for this research goes to the researchers on this project.

Meet FLM-101B: An Open-Source Decoder-Only LLM With 101 Billion Parameters

Lately, large language models (LLMs) have been excelling at NLP and multimodal tasks, but they face two significant challenges: high computational costs and difficulty in conducting fair evaluations. These costs limit LLM development to a few major players, restricting research and applications. To address this, the paper introduces a growth strategy to significantly reduce LLM training expenses, emphasizing the need for cost-effective training methods in the field.

To address the training cost challenge, the researchers train a 100B LLM using the growth strategy. Growth means that the number of parameters is not fixed during training but expands from a smaller size to a larger one. To assess the intelligence of Large Language Models (LLMs), the researchers have also developed a comprehensive IQ evaluation benchmark. This benchmark considers four crucial aspects of intelligence:

Symbolic Mapping: LLMs are tested for their ability to generalize to new contexts using a symbolic mapping approach, similar to studies that use symbols instead of category labels.

Rule Understanding: The benchmark evaluates whether LLMs can comprehend established rules and perform actions accordingly, a key aspect of human intelligence.

Pattern Mining: LLMs are assessed for their capacity to recognize patterns through both inductive and deductive reasoning, reflecting the importance of pattern mining in various domains.

Anti-Interference Ability: This metric measures LLMs’ capability to maintain performance in the presence of external noise, highlighting the core aspect of intelligence related to resistance to interference.

The main contributions of this study can be summarised as follows:

A pioneering achievement is the successful training of a Large Language Model (LLM) with over 100 billion parameters using a growth strategy from the ground up. Notably, this represents the most cost-effective approach to creating a 100B+ parameter model with a budget of only $100,000.

The research addresses various instability issues in LLM training through enhancements in FreeLM training objectives, promising methods for hyperparameter optimization, and the introduction of function-preserving growth. These methodological improvements hold promise for the wider research community.

Comprehensive experiments have been conducted, encompassing well-established knowledge-oriented benchmarks as well as a new systematic IQ evaluation benchmark. These experiments allow for a comparison of the model against robust baseline models, demonstrating the competitive and resilient performance of FLM-101B.

The research team made significant contributions to the research community by releasing model checkpoints, code, related tools, and other resources. These assets are aimed at fostering further research in the domain of bilingual Chinese and English LLMs at the scale of 100 billion+ parameters.

Overall, this work not only demonstrates the feasibility of cost-effective LLM training but also contributes to a more robust framework for evaluating the intelligence of these models, ultimately propelling the field closer to the realisation of AGI.

Check out the Paper and Code. All credit for this research goes to the researchers on this project.

How to Humanize Content and Get Past AI Plagiarism

ChatGPT, Bard, and Bing can output AI-generated content faster than Usain Bolt can run the 100m. But with this speed comes issues—the content quality edges closer to the realm of plagiarism and unreliability.

Another reason is that ChatGPT never cites its sources according to academic standards. It could hallucinate and pull information out of thin air, which won’t help anyone looking to avoid plagiarism.

So, I'll show how to humanize text in order to get past AI plagiarism checkers. But first, I'll delve into how AI plagiarism detectors work. Continue reading to discover the tools to help you avoid AI plagiarism and why you need them.

Deconstructing How An AI Plagiarism Checker Works

An AI plagiarism checker is a tool for determining whether the content you are submitting is unique or AI-generated.

When chatbots like Bard and Bing (sounds like a really cool band!) generate user content, they often lift information word for word from other websites and online resources. This makes them easy to detect because they follow a predetermined and predictable model. 

Here is an example sentence: “The sun shines brightly in the_____.”

In the example above, the most probable continuation is “morning” because the phrase is associated with mornings. That’s what a robot with limited creativity would come up with.

But humans might say, “The sun shines brightly in the night” because they live in the Northern Hemisphere or they are exploring edgy creativity.

And that’s the core working principle for AI detectors and plagiarism checkers.

First, the AI plagiarism checker tries to predict the content’s perplexity and burstiness. 

Perplexity measures how predictable a piece of text is. Content with high perplexity is usually human. AI content sounds flat and repetitive, even if you use advanced prompts and plugins.

Similarly, burstiness refers to sentence variation in length and rhythm. Sentences in AI-generated content usually have a predictable rhythm and length. 

When humans write, the burstiness is high because we can drift into verbosity to make our point clearer and more straightforward, just like I am doing right now with this sentence. 

Sometimes, we keep it short. 

However, AI content generators usually produce a constant sentence tempo. If not, they’d make up the rest of the sentence with fluff.

With these variables (perplexity and burstiness) and other technical considerations, AI plagiarism detection tools can spot articles written by a bot or non-human virtual assistant.

But there is a problem.

Using an AI plagiarism checker online is not a reliable test of a work’s uniqueness. Some of these tools are unreliable — we don’t even know the creators or the algorithm behind them.

Besides, sometimes AI checkers can produce false positives, potentially ruining the reputations of innocent victims. Even universities are worried about these false plagiarism flags.

But instead of spending your time defending a plagiarism case that didn’t even happen, I’ll show you how to bypass AI plagiarism detection.

How To Avoid AI Plagiarism

Instead of avoiding AI entirely and missing out on its countless perks, use these hacks to get around its limitations:

Remove word repetition

After generating content with AI, edit the result and remove repetitions. 

Firstly, you don’t want your text to read like a high school student who ran out of ideas mid-essay and just wanted to reach the specified word count. 

Apart from that, poorly written AI content will harm your SEO objectives. These repetitive sentences can make your text spammy, attracting the wrath of Google’s anti-AI hammer.

So, to avoid getting your AI-generated text flagged by an online AI plagiarism checker, go over it line by line and limit redundancy.

Work on your research skills

Your research skills are your first line of action in your quest to remove AI plagiarism from your text.

Before writing, you need to research your topic extensively. Why go through the trouble? Because you need to understand the topic better to know whether the AI output is reliable or a waste of time.

You can spot errors if you know the topic before getting AI to generate content on it. You can also identify sections that read like they may have been lifted from sources that need citing.

Regard ChatGPT as an assistant

ChatGPT is not all-knowing, but it has more access to information than most humans. 

You can use AI text generators to get pointers and set the track for research. 

If you plan on writing your entire paper with ChatGPT, there is a high tendency for someone else to do the same thing. So, you risk an AI detection tool flagging your work as a plagiarized copy.

Use a paraphraser

After receiving copies from AI and verifying their authenticity, use a paraphraser to change the text’s tone. Even though you may still need to cite ideas obtained from secondary sources, paraphrasing will make the words more original and unique.

You can use an AI plagiarism checker to identify which blocks of text to paraphrase and humanize. Tools like Quillbot can help you adjust your word choice and tone.

Use an AI humanizer (Undetectable AI)

Similar to paraphrasing tools, AI humanizers can beat any AI content detector by making robotic-sounding texts sound more human.

As its name implies, Undetectable AI is one of the best tools for you to humanize ChatGPT content without manually editing it for repetition or paraphrasing. 

This ultimate stealth tool will tell you whether your content sounds human or robotic. Its robust algorithms give you multiple humanization choices for every reading level.

Cite your sources

Whether using AI or not, citing any information that does not come directly from you is non-negotiable. If you pass off information from an AI text generator as yours, that’s blatant plagiarism.

Proper academic practice suggests you always give credit to the author(s). Even if you are referring to your work, you still need to cite the source to avoid self-plagiarism. You can always use citation tools to find authors and generate reference lists.

Double-check every fact

Large language models pull information from everywhere on the internet, often using a complex probability method to choose words likely to come after others in most contexts. This means that they can generate convincing statements that have no substance.

Nobody will cut you slack for putting out false information just because AI wrote it for you. So, whenever you get information using AI, double-check every fact. 

Add examples and personal stories

Break up the text with anecdotes and topic-relevant examples to beef up the information in your content. 

For example, as part of your climate change paper, you can discuss how your local government tackled desert encroachment by planting more trees. 

These examples will make the AI-generated content sound more human, as well as provide an opportunity to incorporate your personal views.

Adjust the tone

AI-generated texts have all the appeal and creativity of a cardboard box—not even GPT-4 can replicate human ingenuity. This makes them easy targets for AI detectors.

So, if you want to evade the all-seeing eye of detectors like Originality.ai, adjust the tone of your AI-generated text.

Besides avoiding AI detection, adjusting the tone of your copy makes it more personal and appealing to the target audience—whether they are teachers grading your essay or customers reading your copy.

Proofread and rewrite

After getting the generated texts from AI, proofread and rewrite the content. Doing so will help you eliminate grammatical errors and improve the quality of your work.

Use an AI detector tool to highlight which parts of your text sound robotic. Then, rewrite it according to your target audience’s expertise level and the preferred tone of voice.

Undetectable AI Can Remove AI Plagiarism

Undetectable AI is an AI detection remover that rewrites auto-populated material to evade detection. This text humanizer uses advanced algorithms and paraphrasing features that help users generate texts similar to those written by humans.

The originality of this AI text humanizer is unmatched, as content creators can quickly recreate up to 15,000 characters of text for free. You can also use Undetectable AI’s humanization engine to improve your site’s SEO by optimizing your content for search engines.

How much would it cost you?

First-time users can use the free AI plagiarism remover until the credits expire. Afterward, you’d need to buy more credits to continue using the tool. 

You can pay for the Monthly plan for a starting price of $9.99 for 1000 words every month. Or you can save money by going for the Yearly plan at the discounted price of $5.00 per month. Businesses with large AI text humanization needs can acquire Undetectable AI for a negotiated price.

Why do you need to get past AI plagiarism?

Plagiarism is a huge offense as it can be considered stealing one’s intellectual property. This offense can tarnish your reputation or even land you in jail. Several politicians, influencers, and artists have lost their livelihoods or tarnished their reputations because of plagiarism accusations.

To avoid AI content detection tools flagging your material, you can reword it manually or with the help of a paraphrasing tool. When you humanize content online, always maintain correct syntax, grammar, and sentence structure.

Final words

People have gotten comfortable using AI to generate content. Unfortunately, they can misuse it and plagiarize other people’s ideas since our current iteration of AI pulls information from existing sources. And that doesn’t include other accompanying issues like readability, tone, and credibility of information.

To avoid getting flagged for plagiarism, use the tips in this article when creating content to humanize AI content for free. The recommended tools and techniques aren’t exhaustive but are the ideal bases for your journey with AI and artificially generated content.

FAQs

Why would I want to bypass AI content detectors?

Bypassing AI content detectors keeps you in Google’s good books and helps you avoid plagiarism. Since AI is now a part of content creation, with various detectors available to identify such texts, you must consider measures to help you bypass them. Getting flagged by AI content detectors isn’t a death sentence for your article, but it can indicate lower-quality content and incur penalties from search engines. 

What are AI content detectors?

An AI content detector is a machine learning program that identifies AI-generated texts. Since the rise of tools like ChatGPT, people have developed AI detectors to keep AI misuse in check, leading people to appreciate the need to humanize AI content online.

Can Undetectable.ai bypass multiple AI detectors?

Yes, Undetectable AI can easily bypass AI detectors by rewriting and humanizing AI-generated texts. It produces results that pass GPTZero and Google’s checks. The software is recommended as it can also efficiently rewrite and rephrase texts in different tones while correcting grammatical and contextual errors. 

Which is the most effective tool for bypassing AI content detectors?

Even though there are different AI content detectors on the internet today, one stands out. The most effective tool for bypassing AI content detectors is Undetectable AI. It stands out because of its near-perfect human tone and unique algorithms that help content creators optimize their texts.

How can I circumvent AI content detectors?

There are so many ways to circumvent AI detectors: eliminating fluff and repetitive ideas, working on your research skills, using AI responsibly, paraphrasing, and using an AI humanizer like Undetectable AI and AI Detection Bypass.

So, Now Let’s Try Undetectable AI:

Step 1: Go to https://undetectable.ai/

Step 2: Sign up or log in

Step 3: After logging in, you should see the screen below

Step 4: In the playground box (the black box), paste your content, then choose what you want to do from the given options.

Step 5: Watch the demo to see the results and steps in detail:


Thanks to Undetectable AI for this thought leadership/educational article. Undetectable AI has supported us in this content.


Visualize an Amazon Comprehend analysis with a word cloud in Amazon QuickSight

Searching for insights in a repository of free-form text documents can be like finding a needle in a haystack. A traditional approach might be to use word counting or other basic analysis to parse documents, but with the power of Amazon AI and machine learning (ML) tools, we can gather deeper understanding of the content.
Amazon Comprehend is a fully managed service that uses natural language processing (NLP) to extract insights about the content of documents. Amazon Comprehend develops insights by recognizing the entities, key phrases, sentiment, themes, and custom elements in a document. Amazon Comprehend can create new insights based on understanding the document structure and entity relationships. For example, with Amazon Comprehend, you can scan an entire document repository for key phrases.
Amazon Comprehend lets non-ML experts easily do tasks that normally take hours of time. Amazon Comprehend eliminates much of the time needed to clean, build, and train your own model. For building deeper custom models in NLP or any other domain, Amazon SageMaker enables you to build, train, and deploy models in a much more conventional ML workflow if desired.
In this post, we use Amazon Comprehend and other AWS services to analyze and extract new insights from a repository of documents. Then, we use Amazon QuickSight to generate a simple yet powerful word cloud visual to easily spot themes or trends.
Overview of solution
The following diagram illustrates the solution architecture.

To begin, we gather the data to be analyzed and load it into an Amazon Simple Storage Service (Amazon S3) bucket in an AWS account. In this example, we use text formatted files. The data is then analyzed by Amazon Comprehend. Amazon Comprehend creates a JSON formatted output that needs to be transformed and processed into a database format using AWS Glue. We verify the data and extract specific formatted data tables using Amazon Athena for a QuickSight analysis using a word cloud. For more information about visualizations, refer to Visualizing data in Amazon QuickSight.
Prerequisites
For this walkthrough, you should have the following prerequisites:

An AWS account
Access to the AWS Management Console
Basic database table knowledge
S3 buckets for input and output data

Upload data to an S3 bucket
Upload your data to an S3 bucket. For this post, we use UTF-8 formatted text of the US Constitution as the input file. Then you’re ready to analyze the data and create visualizations.
Analyze data using Amazon Comprehend
Amazon Comprehend can process many types of text-based and image information. In addition to text files, Amazon Comprehend one-step classification and entity recognition can accept image files, PDF files, and Microsoft Word files as input; those formats are not discussed in this post.
To analyze your data, complete the following steps:

On the Amazon Comprehend console, choose Analysis jobs in the navigation pane.
Choose Create analysis job.
Enter a name for your job.
For Analysis type, choose Key phrases.
For Language, choose English.
For Input data location, specify the folder you created as a prerequisite.
For Output data location, specify the folder you created as a prerequisite.
Choose Create an IAM role.
Enter a suffix for the role name.
Choose Create job.

The job will run and the status will be displayed on the Analysis jobs page.

Wait for the analysis job to complete. Amazon Comprehend will create a file and place it in the output data folder you provided. The file is in .gz or GZIP format.
This file needs to be downloaded and converted to a non-compressed format. You can download an object from the data folder or S3 bucket using the Amazon S3 console.

On the Amazon S3 console, select the object and choose Download. If you want to download the object to a specific folder, choose Download on the Actions menu.
After you download the file to your local computer, open the zipped file and save it as an uncompressed file.

The uncompressed file must be uploaded to the output folder before the AWS Glue crawler can process it. For this example, we upload the uncompressed file into the same output folder that we use in later steps.

On the Amazon S3 console, navigate to your S3 bucket and choose Upload.
Choose Add files.
Choose the uncompressed files from your local computer.
Choose Upload.

After you upload the file, delete the original zipped file.

On the Amazon S3 console, select the bucket and choose Delete.
Confirm the file name to permanently delete the file by entering the file name in the text box.
Choose Delete objects.

This will leave one file remaining in the output folder: the uncompressed file.
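
Alternatively, the three steps above (download, decompress, re-upload, and delete the compressed object) can be scripted. The following is a minimal boto3 sketch; the bucket and key names are placeholders for your own output data location.

import gzip
import boto3

s3 = boto3.client("s3")

# Placeholder bucket and key names -- replace with your own output data location.
bucket = "my-comprehend-results-bucket"
compressed_key = "comprehend-output/output.gz"
uncompressed_key = "comprehend-output/output.json"

# Download the GZIP-compressed Amazon Comprehend output object.
compressed_body = s3.get_object(Bucket=bucket, Key=compressed_key)["Body"].read()

# Decompress it in memory and upload the result back to the output folder
# so the AWS Glue crawler can read it.
s3.put_object(Bucket=bucket, Key=uncompressed_key, Body=gzip.decompress(compressed_body))

# Remove the original compressed object so only the uncompressed file remains.
s3.delete_object(Bucket=bucket, Key=compressed_key)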
Convert JSON data to table format using AWS Glue
In this step, you prepare the Amazon Comprehend output to be used as input into Athena. The Amazon Comprehend output is in JSON format. You can use AWS Glue to convert JSON into a database structure to ultimately be read by QuickSight.

On the AWS Glue console, choose Crawlers in the navigation pane.
Choose Create crawler.
Enter a name for your crawler.
Choose Next.
For Is your data already mapped to Glue tables, select Not yet.
Add a data source.
For S3 path, enter the location of the Amazon Comprehend output data folder.

Be sure to add the trailing / to the path name. AWS Glue will search the folder path for all files.

Select Crawl all sub-folders.
Choose Add an S3 data source.

Create a new AWS Identity and Access Management (IAM) role for the crawler.
Enter a name for the IAM role.
Choose Update chosen IAM role to be sure the new role is assigned to the crawler.
Choose Next to enter the output (database) information.
Choose Add database.
Enter a database name.
Choose Next.
Choose Create crawler.
Choose Run crawler to run the crawler.

You can monitor the crawler status on the AWS Glue console.
Use Athena to prepare tables for QuickSight
Athena will extract data from the database tables the AWS Glue crawler created to provide a format that QuickSight will use to create the word cloud.

On the Athena console, choose Query editor in the navigation pane.
For Data source, choose AwsDataCatalog.
For Database, choose the database the crawler created.

To create a table compatible for QuickSight, the data must be unnested from the arrays.

The first step is to create a temporary database with the relevant Amazon Comprehend data:

CREATE TABLE temp AS
SELECT keyphrases, nested
FROM output
CROSS JOIN UNNEST(output.keyphrases) AS t (nested)

The following statement limits to phrases of at least three words and groups by frequency of the phrases:

CREATE TABLE tableforquicksight AS
SELECT COUNT(*) AS count, nested.text
FROM temp
WHERE nested.Score > .9 AND
length(nested.text) - length(replace(nested.text, ' ', '')) + 1 > 2
GROUP BY nested.text
ORDER BY count desc

Use QuickSight to visualize output
Finally, you can create the visual output from the analysis.

On the QuickSight console, choose New analysis.
Choose New dataset.
For Create a dataset, choose From new data sources.
Choose Athena as the data source.
Enter a name for the data source and choose Create data source.

Choose Visualize.

Make sure QuickSight has access to the S3 buckets where the Athena tables are stored.

On the QuickSight console, choose the user profile icon and choose Manage QuickSight.

Choose Security & permissions.

Look for the section QuickSight access to AWS services.

By configuring access to AWS services, QuickSight can access the data in those services. Access by users and groups can be controlled through the options.

Verify Amazon S3 is granted access.

Now you can create the word cloud.

Choose the word cloud under Visual types.
Drag text to Group by and count to Size.

Choose the options menu (three dots) in the visualization to access the edit options. For example, you might want to hide the term “other” from the display. You can also edit items such as the title and subtitle for your visual. To download the word cloud as a PDF, choose Download on the QuickSight toolbar.
Clean up
To avoid incurring ongoing charges, delete any unused data, processes, or resources provisioned on their respective service consoles.
Conclusion
Amazon Comprehend uses NLP to extract insights about the content of documents. It develops insights by recognizing the entities, key phrases, language, sentiments, and other common elements in a document. You can use Amazon Comprehend to create new products based on understanding the structure of documents. For example, with Amazon Comprehend, you can scan an entire document repository for key phrases.
This post described the steps to build a word cloud to visualize a text content analysis from Amazon Comprehend using AWS tools and QuickSight to visualize the data.
Let’s stay in touch via the comments section!

About the Authors
Kris Gedman is the US East sales leader for Retail & CPG at Amazon Web Services. When not working, he enjoys spending time with his friends and family, especially summers on Cape Cod. Kris is a temporarily retired Ninja Warrior but he loves watching and coaching his two sons for now.
Clark Lefavour is a Solutions Architect leader at Amazon Web Services, supporting enterprise customers in the East region. Clark is based in New England and enjoys spending time architecting recipes in the kitchen.

Can Low-Cost Quadrupedal Robots Master Parkour? Unveiling a Revolution …

The quest to make robots perform complex physical tasks, such as navigating challenging environments, has been a long-standing challenge in robotics. One of the most demanding tasks in this domain is parkour, a sport that involves traversing obstacles with speed and agility. Parkour requires a combination of skills, including climbing, leaping, crawling, and tilting, which is particularly challenging for robots due to the need for precise coordination, perception, and decision-making. The primary problem this paper and article aim to address is how to efficiently teach robots these agile parkour skills, enabling them to navigate through diverse real-world scenarios.

Before delving into the proposed solution, it’s essential to understand the current state of the art in robotic locomotion. Traditional methods often involve manually designing control strategies, which can be highly labor-intensive and need more adaptability to different scenarios. Reinforcement learning (RL) has shown promise in teaching robots complex tasks. However, RL methods face challenges related to exploration and transferring learned skills from simulation to the real world.

Now, let’s explore the innovative approach introduced by a research team to tackle these challenges. The researchers have developed a two-stage RL method designed to effectively teach parkour skills to robots. The uniqueness of their approach lies in integrating “soft dynamics constraints” during the initial training phase, which is crucial for efficient skill acquisition.

The researchers’ approach comprises several key components contributing to its effectiveness.

1. Specialized Skill Policies: The method’s foundation involves constructing specialized skill policies essential for parkour. These policies are created using a combination of recurrent neural networks (GRU) and multilayer perceptrons (MLP) that output joint positions. They consider various sensory inputs, including depth images, proprioception (awareness of the body’s position), previous actions, and more. This combination of inputs allows robots to make informed decisions based on their environment.

2. Soft Dynamics Constraints: The approach’s innovative aspect is using “soft dynamics constraints” during the initial training phase. These constraints guide the learning process by providing robots with critical information about their environment. By introducing soft dynamics constraints, the researchers ensure that robots can explore and learn parkour skills efficiently. This results in faster learning and improved performance.

3. Simulated Environments: The researchers employ simulated environments created with IsaacGym to train the specialized skill policies. These environments consist of 40 tracks, each containing 20 obstacles of varying difficulties. The obstacles’ properties, such as height, width, and depth, increase linearly in complexity across the tracks. This setup allows robots to learn progressively challenging parkour skills.

4. Reward Structures: Reward structures are crucial in reinforcement learning. The researchers meticulously define reward terms for each specialized skill policy. These reward terms align with specific objectives, such as velocity, energy conservation, penetration depth, and penetration volume. The reward structures are carefully designed to incentivize and discourage undesirable behaviors.

5. Domain Adaptation: Transferring skills learned in simulation to the real world is a substantial challenge in robotics. The researchers employ domain adaptation techniques to bridge this gap. Robots can apply their parkour abilities in practical settings by adapting the skills acquired in simulated environments to real-world scenarios.

6. Vision as a Key Component: Vision plays a pivotal role in enabling robots to perform parkour with agility. Vision sensors, such as depth cameras, provide robots with critical information about their surroundings. This visual perception enables robots to sense obstacle properties, prepare for agile maneuvers, and make informed decisions while approaching obstacles.

7. Performance: The proposed method surpasses several baseline methods and ablations. Notably, the two-stage RL approach with soft dynamics constraints accelerates learning significantly. Robots trained using this method achieve higher success rates in tasks requiring exploration, including climbing, leaping, crawling, and tilting. Additionally, recurrent neural networks prove indispensable for skills that demand memory, such as climbing and jumping.

In conclusion, this research addresses the challenge of efficiently teaching robots agile parkour skills. The innovative two-stage RL approach with soft dynamics constraints has revolutionized how robots acquire these skills. It leverages vision, simulation, reward structures, and domain adaptation, opening up new possibilities for robots to navigate complex environments with precision and agility. Vision’s integration underscores its importance in robotic dexterity, allowing real-time perception and dynamic decision-making. In summary, this innovative approach marks a significant advancement in robotic locomotion, solving the problem of teaching parkour skills and expanding robots’ capabilities in complex tasks.

Check out the Paper, Code, and Project Page. All Credit For This Research Goes To the Researchers on This Project.


The post Can Low-Cost Quadrupedal Robots Master Parkour? Unveiling a Revolutionary Learning System for Agile Robotic Movement appeared first on MarkTechPost.

Is The Wait for Jurassic Park Over? This AI Model Uses Image-to-Image …

Image-to-image translation (I2I) is an interesting field within computer vision and machine learning that holds the power to transform visual content from one domain into another seamlessly. This transformative process goes beyond the simple change of pixel values; it entails a profound understanding of the underlying structures, semantics, and styles of images. 

I2I has found extensive applications in various domains, from generating artistic renditions of photographs to converting satellite images into maps and even translating sketches into photorealistic images. It leverages the capabilities of deep learning models, such as Generative Adversarial Networks (GANs) and Convolutional Neural Networks (CNNs). 

Traditional I2I methods have primarily focused on translating between domains with small gaps, such as photos to paintings or different types of animals. However, these tasks do not require generating significantly different visual features or inferences about shape during the translation process. 

Let us meet Revive-2I, a novel approach to I2I that explores the task of translating skulls into images of living animals, a task known as Skull2Animal.

Skull2Animal is a challenging task that involves translating skulls into images of living animals. This task presents a significant challenge as it requires generating new visual features, textures, and colors, and making inferences about the geometry of the target domain.

Skull2Image task. Source: https://arxiv.org/abs/2308.07316

To overcome the challenges of long I2I translation, Revive-2I uses text prompts that describe the desired changes in the image. It can generate realistic and verifiable results. This approach offers a stricter constraint for acceptable translations, ensuring the generated images align with the intended target domain.

Revive-2I utilizes natural language prompts to perform zero-shot I2I via latent diffusion models. 

Revive-2I consists of two main steps: encoding and text-guided decoding. In the encoding step, the source image is transformed into a latent representation using a process called diffusion. This latent representation is then noised to incorporate the desired changes. By performing the diffusion process in the latent space, Revive-2I achieves faster and more efficient translations.

Overview of Revive-2I. Source: https://arxiv.org/abs/2308.07316

Finding the sweet spot for Revive-2I was not an easy task: the researchers experimented with different numbers of steps in the forward diffusion process. By taking partial steps, the translation process can better preserve the content of the source image while incorporating the features of the target domain. This approach allows for more robust translations while still injecting the desired changes guided by the text prompts.

The ability to perform constrained long I2I has significant implications in various fields. For example, law enforcement agencies can use this technology to generate realistic images of suspects based on sketches, aiding identification. Wildlife conservationists can showcase the effects of climate change on ecosystems and habitats by translating images of endangered species. Additionally, paleontologists can bring ancient fossils to life by translating them into images of their living counterparts. Looks like we can finally have Jurassic Park.

Check out the Paper, Code, and Project Page. All Credit For This Research Goes To the Researchers on This Project.


The post Is The Wait for Jurassic Park Over? This AI Model Uses Image-to-Image Translation to Bring Ancient Fossils to Life appeared first on MarkTechPost.

Enhancing GPT-4 Summarization Through Chain of Density Prompts

Large Language Models have gained a lot of attention in recent times due to their excellent capabilities. LLMs are capable of everything from question answering and content generation to language translation and textual summarization. Recent developments in automatic summarization are largely attributable to a shift in strategy from supervised fine-tuning on labeled datasets to zero-shot prompting of Large Language Models such as OpenAI's GPT-4. This change enables careful prompting to customize a variety of summary properties, including length, themes, and style, without the need for extra training.

In automatic summarization, deciding how much information to include in a summary is a difficult task. An excellent summary should strike a careful balance between being comprehensive and entity-centric while avoiding overly dense language that might be confusing to readers. In recent research, a team of researchers has conducted a study using the well-known GPT-4 to create summaries with a Chain of Density (CoD) prompt in order to understand the trade-off better.

The main goal of this study was to find this limit by collecting human preferences for a collection of summaries produced by GPT-4 that are progressively denser. The CoD prompt comprises several steps: GPT-4 first generates a summary containing only a limited number of listed entities, then iteratively rewrites it to fold in the missing salient entities, increasing its density. In comparison to summaries produced by a conventional GPT-4 prompt, these CoD-generated summaries were distinguished by enhanced abstraction, a higher level of fusion (information integration), and less bias towards the beginning of the source text.
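
The exact CoD prompt wording is given in the paper. Purely as an illustration, a densification prompt along these lines could be assembled as follows; the five-step count and 80-word budget are assumptions made for this sketch, and generate() is a placeholder for whichever LLM client you use.

def chain_of_density_prompt(article: str, steps: int = 5, words: int = 80) -> str:
    """Build an iterative 'densification' prompt in the spirit of Chain of Density."""
    return (
        f"Article:\n{article}\n\n"
        f"You will write increasingly entity-dense summaries of the article above.\n"
        f"Repeat the following two steps {steps} times:\n"
        f"1. Identify 1-3 informative entities from the article that are missing "
        f"from the previous summary.\n"
        f"2. Rewrite the summary (about {words} words) so that it covers every entity "
        f"from the previous summary plus the newly identified ones.\n"
        f"Return the {steps} summaries as a JSON list of strings."
    )

# Usage sketch -- generate() stands in for your LLM client call (e.g. the GPT-4 API):
# prompt = chain_of_density_prompt(article_text)
# cod_summaries = generate(prompt)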

One hundred items from CNN DailyMail were used in a human preference study to evaluate the efficacy of CoD-generated summaries. The study's results showed that GPT-4 summaries generated with the CoD prompt, which were denser than those generated by a vanilla prompt yet drew close to the density of human-written summaries, were preferred by human evaluators. This implies that achieving the ideal balance between informativeness and readability in a summary is crucial. The researchers also released 5,000 unannotated CoD summaries in addition to the human preference study, all of which are available to the public on the Hugging Face website.

The team has summarized their key contributions as follows –

The Chain of Density (CoD) method has been introduced, which is an iterative prompt-based strategy that gradually improves the entity density of summaries produced by GPT-4.

Comprehensive Evaluation: The research thoroughly evaluates ever-denser CoD summaries through both manual and automatic evaluations. This evaluation seeks to understand the delicate balance between informativeness (favoring more entities) and clarity (favoring fewer) in summarization.

Open Source Resources: The study offers open-source access to 5,000 unannotated CoD summaries, annotations, and summaries produced by GPT-4. These tools are made available for analysis, assessment, or instruction, promoting continued development in the automatic summarization sector.

In conclusion, this research highlights the ideal balance between compactness and informativeness in automatic summaries, as determined by human preferences, and contends that it is desirable for automated summarization processes to achieve a level of density close to that of human-generated summaries.

Check out the Paper. All Credit For This Research Goes To the Researchers on This Project.


The post Enhancing GPT-4 Summarization Through Chain of Density Prompts appeared first on MarkTechPost.

Amazon SageMaker simplifies the Amazon SageMaker Studio setup for indi …

Today, we are excited to announce the simplified Quick setup experience in Amazon SageMaker. With this new capability, individual users can launch Amazon SageMaker Studio with default presets in minutes.
SageMaker Studio is an integrated development environment (IDE) for machine learning (ML). ML practitioners can perform all ML development steps—from preparing their data to building, training, and deploying ML models—within a single, integrated visual interface. You also get access to a large collection of models and pre-built solutions that you can deploy with a few clicks.
To use SageMaker Studio or other personal apps such as Amazon SageMaker Canvas, or to collaborate in shared spaces, AWS customers need to first set up a SageMaker domain. A SageMaker domain consists of an associated Amazon Elastic File System (Amazon EFS) volume, a list of authorized users, and a variety of security, application, policy, and Amazon Virtual Private Cloud (Amazon VPC) configurations. When a user is onboarded to a SageMaker domain, they are assigned a user profile that they can use to launch their apps. User authentication can be via AWS IAM Identity Center (successor to AWS Single Sign-On) or AWS Identity and Access Management (IAM).
Setting up a SageMaker domain and associated user profiles requires understanding the concepts of IAM roles, domains, authentication, and VPCs, and going through a number of configuration steps. To complete these configuration steps, data scientists and developers typically work with their IT admin teams who provision SageMaker Studio and set up the right guardrails.
Customers told us that the onboarding process can sometimes be time consuming, delaying data scientists and ML teams from getting started with SageMaker Studio. We listened and simplified the onboarding experience!
Introducing the simplified Quick Studio setup
The new Quick Studio setup experience for SageMaker provides a new onboarding and administration experience that makes it easy for individual users to set up and manage SageMaker Studio. Data scientists and ML admins can set up SageMaker Studio in minutes with a single click. SageMaker takes care of provisioning the SageMaker domain with default presets, including setting up the IAM role, IAM authentication, and public internet mode. ML admins can alter SageMaker Studio settings for the created domain and customize the UI further at any time. Let’s take a look at how it works.
Prerequisites
To use the Quick Studio setup, you need the following:

An AWS account
An IAM role with permissions to create the resources needed to set up a SageMaker domain

Use the Quick Studio setup option
Let’s discuss a scenario where a new user wants to access SageMaker Studio. The user experience includes the following steps:

In your AWS account, navigate to the SageMaker console and choose Set up for single user.

SageMaker starts preparing the SageMaker domain. This process typically takes a few minutes. The new domain’s name is prefixed with QuickSetupDomain-.

As soon as the SageMaker domain is ready, a notification appears on the screen stating “The SageMaker Domain is ready” and the user profile under the domain is also created successfully.

Choose Launch next to the created user profile and choose Studio.

Because it’s the first time SageMaker Studio is getting launched for this user profile, SageMaker creates a new JupyterServer app, which takes a few minutes.

A few minutes later, the Studio IDE loads and you’re presented with the SageMaker Studio Home page.

Components of the Quick Studio setup
When using the Quick Studio setup, SageMaker creates the following resources:

A new IAM role with the appropriate permissions for using SageMaker Studio, Amazon Simple Storage Service (Amazon S3), and SageMaker Canvas. You can modify the permissions of the created IAM role at any time based on your use case or persona-specific requirements.
Another IAM role prefixed with AmazonSagemakerCanvasForecastRole-, which enables permissions for the SageMaker Canvas time series forecasting feature.
A SageMaker Studio domain and a user profile for the domain with unique names. IAM is used as the authentication mode. The IAM role created is used as the default SageMaker execution role for the domain and user profile. You can launch any of the personal apps available, such as SageMaker Studio and SageMaker Canvas, which are enabled by default.
An EFS volume, which serves as the file system for SageMaker Studio. Apart from Amazon EFS, a new S3 bucket with prefix sagemaker-studio- is created for notebook sharing.

SageMaker Studio also uses the default VPC and its associated subnets. If there is no default VPC, or if the default VPC has no subnets, SageMaker selects one of the existing VPCs that has associated subnets. If there is no VPC at all, it prompts the user to create one on the Amazon VPC console. The VPC and all subnets under it are used to set up Amazon EFS.
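If you want to confirm programmatically which resources the Quick setup created, a short boto3 sketch like the following lists the domains whose names carry the QuickSetupDomain- prefix along with their user profiles. This is only an optional verification aid, not part of the setup itself.

import boto3

sm = boto3.client("sagemaker")

# List all Studio domains and keep the ones created by the Quick Studio setup.
domains = sm.list_domains()["Domains"]
quick_domains = [d for d in domains if d["DomainName"].startswith("QuickSetupDomain-")]

for domain in quick_domains:
    print(domain["DomainName"], domain["Status"])
    # List the user profiles that were created under this domain.
    profiles = sm.list_user_profiles(DomainIdEquals=domain["DomainId"])["UserProfiles"]
    for profile in profiles:
        print("  ", profile["UserProfileName"], profile["Status"])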
Conclusion
Now, a single click is all it takes to get started with SageMaker Studio. The Quick Studio setup for individual users is available in all AWS commercial Regions where SageMaker is currently available.
Try out this new feature on the SageMaker console and let us know what you think. We always look forward to your feedback! You can send it through your usual AWS Support contacts or post it on the AWS Forum for SageMaker.

About the authors
Vikesh Pandey is a Machine Learning Specialist Solutions Architect at AWS, helping customers from financial industries design and build solutions on generative AI and ML. Outside of work, Vikesh enjoys trying out different cuisines and playing outdoor sports.
Anastasia Tzeveleka is a Machine Learning and AI Specialist Solutions Architect at AWS. She works with customers in EMEA and helps them architect machine learning solutions at scale using AWS services. She has worked on projects in different domains including natural language processing (NLP), MLOps, and low-code/no-code tools.

Unlocking language barriers: Translate application logs with Amazon Tr …

Application logs are an essential piece of information that provides crucial insights into the inner workings of an application. This includes valuable information such as events, errors, and user interactions that would aid an application developer or an operations support engineer to debug and provide support. However, when these logs are presented in languages other than English, it creates a significant hurdle for developers who can’t read the content, and hinders the support team’s ability to identify and address issues promptly.
In this post, we explore a solution on how you can unlock language barriers using Amazon Translate, a fully managed neural machine translation service for translating text to and from English across a wide range of supported languages. The solution will complement your existing logging workflows by automatically translating all your applications logs in Amazon CloudWatch in real time, which can alleviate the challenges posed by non-English application logs.
Solution overview
This solution shows you how you can use three key services to automate the translation of your application logs in an event-driven manner:

CloudWatch Logs is used to monitor, store, and access your log files generated from various sources such as AWS services and your applications
Amazon Translate is used to perform the translation of text to and from English
AWS Lambda is a compute service that lets you run code to retrieve the application logs and translate them through the use of the Amazon Translate SDK

The following diagram illustrates the solution architecture.

The workflow consists of the following steps:

A custom or third-party application is hosted on an Amazon Elastic Compute Cloud (Amazon EC2) instance and the generated application logs are uploaded to CloudWatch Logs via the CloudWatch Logs agent.
Each log entry written to CloudWatch Logs triggers the Lambda function subscribed to the CloudWatch log group.
The function processes the contents of the log entry and uses Amazon Translate SDK translate_text to translate the log content.
The translated log content is returned to the function.
The function writes the translated log content back to CloudWatch Logs in a different log group.

The entire process happens automatically in real time, and your developers will be able to access the translated application logs from the CloudWatch log groups with no change in how your existing application writes logs to CloudWatch.
Prerequisites
To follow the instructions in this solution, you need an AWS account with an AWS Identity and Access Management (IAM) user who has permissions for AWS CloudFormation, Amazon Translate, CloudWatch, Lambda, and IAM.
Deploy the solution
To get started, launch the following CloudFormation template to create a Lambda function, two CloudWatch log groups, and an IAM role. Proceed to deploy with the default settings. This template takes about 1 minute to complete.
After the stack is created successfully, you can review the Lambda function by navigating to the Lambda console and locating the function translate-application-logs.
You can observe that there is a CloudWatch Logs trigger added to the function.

You can view the details of the trigger configuration by navigating to the Configuration tab and choosing Triggers in the navigation pane.

You can confirm that the trigger has been configured to subscribe to log events from the log group /applicationlogs. This is where your non-English application logs will be written to.
Next, choose Environment variables in the navigation pane.

Two environment variables are provided here:

source_language – The original language that the application log is in (for example, ja for Japanese)
target_language – The target language to translate the application log to (for example, en for English)

For a list of supported languages, refer to Supported languages and language codes.
Next, go to the Code tab and review the function logic:

import json, boto3, gzip, base64, os

translate = boto3.client(service_name='translate', region_name=os.environ['AWS_REGION'], use_ssl=True)
logs = boto3.client('logs')

def lambda_handler(event, context):
    # retrieve log messages
    encoded_zipped_data = event['awslogs']['data']
    zipped_data = base64.b64decode(encoded_zipped_data)
    data = gzip.decompress(zipped_data)
    json_log = json.loads(data)
    logGroup = json_log['logGroup'] + '-' + os.environ['target_language']
    logStream = json_log['logStream']

    # check if log group exists, create if not
    dlg = logs.describe_log_groups(logGroupNamePrefix=logGroup)
    if len(dlg['logGroups']) == 0:
        logs.create_log_group(logGroupName=logGroup)

    # check if log stream exists, create if not
    dls = logs.describe_log_streams(logGroupName=logGroup, logStreamNamePrefix=logStream)
    if len(dls['logStreams']) == 0:
        logs.create_log_stream(logGroupName=logGroup, logStreamName=logStream)

    # translate log event messages from source language to target language
    for logevent in json_log['logEvents']:
        logevent['message'] = translate.translate_text(
            Text=logevent['message'],
            SourceLanguageCode=os.environ['source_language'],
            TargetLanguageCode=os.environ['target_language']
        ).get('TranslatedText')
        del logevent['id']

    # write translated log events back to a different log group in CloudWatch
    logs.put_log_events(
        logGroupName=logGroup,
        logStreamName=logStream,
        logEvents=json_log['logEvents']
    )

    # return success
    return {
        'statusCode': 200,
        'body': 'Translation success!'
    }

Test the solution
Finally, to test the solution, you can create a log message through the CloudWatch console and choose the created log group and log stream.
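
If you prefer to generate a test message programmatically instead of through the console, a short boto3 sketch like the following writes a sample Japanese log line to the /applicationlogs group; the stream name and message text are placeholders chosen for illustration.

import time
import boto3

logs = boto3.client("logs")

log_group = "/applicationlogs"  # the source log group the Lambda trigger subscribes to
log_stream = "test-stream"      # placeholder stream name for this test

# Create the test stream if it doesn't exist yet.
try:
    logs.create_log_stream(logGroupName=log_group, logStreamName=log_stream)
except logs.exceptions.ResourceAlreadyExistsException:
    pass

# Write a sample Japanese log message; the Lambda function should write the
# English translation to the corresponding target-language log group.
logs.put_log_events(
    logGroupName=log_group,
    logStreamName=log_stream,
    logEvents=[{
        "timestamp": int(time.time() * 1000),
        "message": "アプリケーションでエラーが発生しました",
    }],
)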

After creating your log messages, you will be able to see them translated immediately in the corresponding target-language log group.

Clean up
To clean up the resources created in this post, delete the CloudFormation stack via the CloudFormation console.
Conclusion
This post addressed the challenge faced by developers and support teams when application logs are presented in languages other than English, making it difficult for them to debug and provide support. The proposed solution uses Amazon Translate to automatically translate non-English logs in CloudWatch, and provides step-by-step guidance on deploying the solution in your environment. Through this implementation, developers can now seamlessly bridge the language barrier, empowering them to address issues swiftly and effectively.
Try out this implementation and let us know your thoughts in the comments.

About the author
Xan Huang is a Senior Solutions Architect with AWS and is based in Singapore. He works with major financial institutions to design and build secure, scalable, and highly available solutions in the cloud. Outside of work, Xan spends most of his free time with his family and documenting his daughter’s growing up journey.

Accelerate client success management through email classification with …

This is a guest post from Scalable Capital, a leading FinTech in Europe that offers digital wealth management and a brokerage platform with a trading flat rate.
As a fast-growing company, Scalable Capital’s goals are to not only build an innovative, robust, and reliable infrastructure, but to also provide the best experiences for our clients, especially when it comes to client services.
Scalable receives hundreds of email inquiries from our clients on a daily basis. By implementing a modern natural language processing (NLP) model, the response process has been shaped much more efficiently, and waiting time for clients has been reduced tremendously. The machine learning (ML) model classifies new incoming customer requests as soon as they arrive and redirects them to predefined queues, which allows our dedicated client success agents to focus on the contents of the emails according to their skills and provide appropriate responses.
In this post, we demonstrate the technical benefits of using Hugging Face transformers deployed with Amazon SageMaker, such as training and experimentation at scale, and increased productivity and cost-efficiency.
Problem statement
Scalable Capital is one of the fastest growing FinTechs in Europe. With the aim to democratize investment, the company provides its clients with easy access to the financial markets. Clients of Scalable can actively participate in the market through the company’s brokerage trading platform, or use Scalable Wealth Management to invest in an intelligent and automated fashion. In 2021, Scalable Capital experienced a tenfold increase of its client base, from tens of thousands to hundreds of thousands.
To provide our clients with a top-class (and consistent) user experience across products and client service, the company was looking for automated solutions to generate efficiencies for a scalable solution while maintaining operational excellence. Scalable Capital’s data science and client service teams identified that one of the largest bottlenecks in servicing our clients was responding to email inquiries. Specifically, the bottleneck was the classification step, in which employees had to read and label request texts on a daily basis. After the emails were routed to their proper queues, the respective specialists quickly engaged and resolved the cases.
To streamline this classification process, the data science team at Scalable built and deployed a multitask NLP model using state-of-the-art transformer architecture, based on the pre-trained distilbert-base-german-cased model published by Hugging Face. distilbert-base-german-cased uses the knowledge distillation method to pretrain a smaller general-purpose language representation model than the original BERT base model. The distilled version achieves comparable performance to the original version, while being smaller and faster. To facilitate our ML lifecycle process, we decided to adopt SageMaker to build, deploy, serve, and monitor our models. In the following section, we introduce our project architecture design.
Solution overview
Scalable Capital’s ML infrastructure consists of two AWS accounts: one as an environment for the development stage and the other one for the production stage.
The following diagram shows the workflow for our email classifier project, but can also be generalized to other data science projects.

Email classification project diagram

The workflow consists of the following components:

Model experimentation – Data scientists use Amazon SageMaker Studio to carry out the first steps in the data science lifecycle: exploratory data analysis (EDA), data cleaning and preparation, and building prototype models. When the exploratory phase is complete, we turn to VSCode hosted by a SageMaker notebook as our remote development tool to modularize and productionize our code base. To explore different types of models and model configurations, and at the same time to keep track of our experimentations, we use SageMaker Training and SageMaker Experiments.
Model build – After we decide on a model for our production use case, in this case a multi-task distilbert-base-german-cased model fine-tuned from the pretrained model from Hugging Face, we commit and push our code to the GitHub develop branch. The GitHub merge event triggers our Jenkins CI pipeline, which in turn starts a SageMaker Pipelines job with test data. This acts as a test to make sure that the code runs as expected. A test endpoint is deployed for testing purposes.
Model deployment – After making sure that everything is running as expected, data scientists merge the develop branch into the primary branch. This merge event now triggers a SageMaker Pipelines job using production data for training purposes. Afterwards, model artifacts are produced and stored in an output Amazon Simple Storage Service (Amazon S3) bucket, and a new model version is logged in the SageMaker model registry. Data scientists examine the performance of the new model, then approve if it’s in line with expectations. The model approval event is captured by Amazon EventBridge, which then deploys the model to a SageMaker endpoint in the production environment.
MLOps – Because the SageMaker endpoint is private and can’t be reached by services outside of the VPC, an AWS Lambda function and Amazon API Gateway public endpoint are required to communicate with CRM. Whenever new emails arrive in the CRM inbox, CRM invokes the API Gateway public endpoint, which in turn triggers the Lambda function to invoke the private SageMaker endpoint. The function then relays the classification back to CRM through the API Gateway public endpoint. To monitor the performance of our deployed model, we implement a feedback loop between CRM and the data scientists to keep track of prediction metrics from the model. On a monthly basis, CRM updates the historical data used for experimentation and model training. We use Amazon Managed Workflows for Apache Airflow (Amazon MWAA) as a scheduler for our monthly retrain.

In the following sections, we break down the data preparation, model experimentation, and model deployment steps in more detail.
Data preparation
Scalable Capital uses a CRM tool for managing and storing email data. Relevant email contents consist of subject, body, and the custodian banks. There are three labels to assign to each email: which line of business the email is from, which queue is appropriate, and the specific topic of the email.
Before we start training any NLP models, we ensure that the input data is clean and the labels are assigned according to expectation.
To retrieve clean inquiry contents from Scalable clients, we remove extra text and symbols from the raw email data, such as email signatures, impressums, quotes of previous messages in email chains, CSS symbols, and so on. Otherwise, our future trained models might experience degraded performance.
Labels for emails evolve over time as Scalable client service teams add new ones and refine or remove existing ones to accommodate business needs. To make sure that labels for training data as well as expected classifications for prediction are up to date, the data science team works in close collaboration with the client service team to ensure the correctness of the labels.
Model experimentation
We start our experiment with the readily available pre-trained distilbert-base-german-cased model published by Hugging Face. Because the pre-trained model is a general-purpose language representation model, we can adapt the architecture to perform specific downstream tasks—such as classification and question answering—by attaching appropriate heads to the neural network. In our use case, the downstream task we are interested in is sequence classification. Without modifying the existing architecture, we decide to fine-tune three separate pre-trained models for each of our required categories. With the SageMaker Hugging Face Deep Learning Containers (DLCs), starting and managing NLP experiments are made simple with Hugging Face containers and the SageMaker Experiments API.
The following is a code snippet of train.py:

config = AutoConfig.from_pretrained("distilbert-base-german-cased")  # load original config
config.num_labels = num_labels  # adapt original config to a specific number of labels (default is 2)

# instantiate a pretrained model
model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-german-cased", config=config)

trainer = Trainer(
    model=model,                  # the instantiated Transformers model to be trained
    args=training_args,           # training arguments, defined above
    train_dataset=train_dataset,  # training dataset
    eval_dataset=val_dataset      # evaluation dataset
)
trainer.train()

The following code is the Hugging Face estimator:

huggingface_estimator = HuggingFace(
    entry_point='train.py',
    source_dir='./scripts',
    instance_type='ml.p3.2xlarge',
    instance_count=1,
    role=role,
    transformers_version='4.26.0',
    pytorch_version='1.13.1',
    py_version='py39',
    hyperparameters=hyperparameters
)

To validate the fine-tuned models, we use the F1-score due to the imbalanced nature of our email dataset, but also to compute other metrics such as accuracy, precision, and recall. For the SageMaker Experiments API to register the training job’s metrics, we need to first log the metrics to the training job local console, which are picked up by Amazon CloudWatch. Then we define the correct regex format to capture the CloudWatch logs. The metric definitions include the name of the metrics and regex validation for extracting the metrics from the training job:

metric_definitions = [
    {"Name": "train:loss", "Regex": "'loss': ([0-9]+(.|e-)[0-9]+),?"},
    {"Name": "learning_rate", "Regex": "'learning_rate': ([0-9]+(.|e-)[0-9]+),?"},
    {"Name": "val:loss", "Regex": "'eval_loss': ([0-9]+(.|e-)[0-9]+),?"},
    {"Name": "train:accuracy", "Regex": "'train_accuracy': ([0-9]+(.|e-)[0-9]+),?"},
    {"Name": "val:accuracy", "Regex": "'eval_accuracy': ([0-9]+(.|e-)[0-9]+),?"},
    {"Name": "train:precision", "Regex": "'train_precision': ([0-9]+(.|e-)[0-9]+),?"},
    {"Name": "val:precision", "Regex": "'eval_precision': ([0-9]+(.|e-)[0-9]+),?"},
    {"Name": "train:recall", "Regex": "'train_recall': ([0-9]+(.|e-)[0-9]+),?"},
    {"Name": "val:recall", "Regex": "'eval_recall': ([0-9]+(.|e-)[0-9]+),?"},
    {"Name": "train:f1", "Regex": "'train_f1': ([0-9]+(.|e-)[0-9]+),?"},
    {"Name": "val:f1", "Regex": "'eval_f1': ([0-9]+(.|e-)[0-9]+),?"},
    {"Name": "val:runtime", "Regex": "'eval_runtime': ([0-9]+(.|e-)[0-9]+),?"},
    {"Name": "val:samples_per_second", "Regex": "'eval_samples_per_second': ([0-9]+(.|e-)[0-9]+),?"},
    {"Name": "epoch", "Regex": "'epoch': ([0-9]+(.|e-)[0-9]+),?"},
]
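
These definitions can be wired into the training job through the estimator's metric_definitions parameter, so SageMaker extracts the metrics from the training job's CloudWatch logs. The following sketch is the estimator call shown earlier with that single addition.

huggingface_estimator = HuggingFace(
    entry_point='train.py',
    source_dir='./scripts',
    instance_type='ml.p3.2xlarge',
    instance_count=1,
    role=role,
    transformers_version='4.26.0',
    pytorch_version='1.13.1',
    py_version='py39',
    hyperparameters=hyperparameters,
    metric_definitions=metric_definitions  # parse these metrics from the job logs
)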

As part of the training iteration for the classifier model, we use a confusion matrix and classification report to evaluate the result. The following figure shows the confusion matrix for line of business prediction.

Confusion Matrix

The following screenshot shows an example of the classification report for line of business prediction.

Classification Report

As a next iteration of our experiment, we’ll take advantage of multi-task learning to improve our model. Multi-task learning is a form of training where a model learns to solve multiple tasks simultaneously, because the shared information among tasks can improve learning efficiencies. By attaching two more classification heads to the original distilbert architecture, we can carry out multi-task fine-tuning, which attains reasonable metrics for our client service team.
Model deployment
In our use case, the email classifier is to be deployed to an endpoint, to which our CRM pipeline can send a batch of unclassified emails and get back predictions. Because we have other logics—such as input data cleaning and multi-task predictions—in addition to Hugging Face model inference, we need to write a custom inference script that adheres to the SageMaker standard.
The following is a code snippet of inference.py:

def model_fn(model_dir):
    model = load_from_artifact(model_dir)

    return model

def transform_fn(model, input_data, content_type, accept):
    if content_type == "application/json":
        data = json.loads(input_data)
        data = pd.DataFrame(data)
    else:
        raise ValueError(f"Unsupported content type: {content_type}")

    data = preprocess(data)

    # Inference
    with torch.no_grad():
        predictions = model(data)

    predictions = postprocess(predictions)

    if content_type == 'application/json':
        return json.dumps(predictions.to_dict(orient="records"))
    else:
        raise NotImplementedError
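
Once the model is behind a SageMaker endpoint, the Lambda function in the MLOps layer (or any test client) can invoke it roughly as follows. The endpoint name and the sample email payload are hypothetical placeholders for illustration, not values from the production setup.

import json
import boto3

runtime = boto3.client("sagemaker-runtime")

# Hypothetical sample batch of emails, matching the JSON format the inference script expects.
payload = [
    {"subject": "Frage zu meinem Depot", "body": "Wie kann ich meinen Sparplan ändern?"},
]

response = runtime.invoke_endpoint(
    EndpointName="email-classifier-endpoint",  # hypothetical endpoint name
    ContentType="application/json",
    Body=json.dumps(payload),
)

predictions = json.loads(response["Body"].read())
print(predictions)  # e.g. predicted line of business, queue, and topic per email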

When everything is up and ready, we use SageMaker Pipelines to manage our training pipeline and attach it to our infrastructure to complete our MLOps setup.
To monitor the performance of the deployed model, we build a feedback loop to enable CRM to provide us with the status of classified emails when cases are closed. Based on this information, we make adjustments to improve the deployed model.
Conclusion
In this post, we shared how SageMaker facilitates the data science team at Scalable to manage the lifecycle of a data science project efficiently, namely the email classifier project. The lifecycle starts with the initial phase of data analysis and exploration with SageMaker Studio; moves on to model experimentation and deployment with SageMaker training, inference, and Hugging Face DLCs; and completes with a training pipeline with SageMaker Pipelines integrated with other AWS services. Thanks to this infrastructure, we are able to iterate and deploy new models more efficiently, and are therefore able to improve existing processes within Scalable as well as our clients’ experiences.
To learn more about Hugging Face and SageMaker, refer to the following resources:

Use Hugging Face with Amazon SageMaker
What are AWS Deep Learning Containers?
Use Version 2.x of the SageMaker Python SDK: Frameworks: Hugging Face

About the Authors
Dr. Sandra Schmid is Head of Data Analytics at Scalable GmbH. She is responsible for data-driven approaches and use cases in the company together with her teams. Her key focus is finding the best combination of machine learning and data science models and business goals in order to gain as much business value and efficiencies out of data as possible.
Huy Dang is a Data Scientist at Scalable GmbH. His responsibilities include data analytics, building and deploying machine learning models, as well as developing and maintaining infrastructure for the data science team. In his spare time, he enjoys reading, hiking, rock climbing, and staying up to date with the latest machine learning developments.
Mia Chang is a ML Specialist Solutions Architect for Amazon Web Services. She works with customers in EMEA and shares best practices for running AI/ML workloads on the cloud with her background in applied mathematics, computer science, and AI/ML. She focuses on NLP-specific workloads, and shares her experience as a conference speaker and a book author. In her free time, she enjoys yoga, board games, and brewing coffee.
Moritz Guertler is an Account Executive in the Digital Native Businesses segment at AWS. He focuses on customers in the FinTech space and supports them in accelerating innovation through secure and scalable cloud infrastructure.

LLMs and Data Analysis: How AI is Making Sense of Big Data for Busines …

Large Language Models (LLMs) have the ability to go through extensive data sets to provide valuable insights for businesses. This article delves into how companies are utilizing LLMs to analyze customer reviews, social media interactions, or even internal reports to make informed business decisions.

What are LLMs, and how can they be used for Data Analysis

Large Language Models, or LLMs, are powerful neural networks with billions of parameters. They’ve been trained on massive amounts of text data using semi-supervised learning. These models can perform tasks like mathematical reasoning and sentiment analysis, demonstrating their understanding of the structure and meaning of human language.

LLMs have been trained on data spanning hundreds of Terabytes, which gives them a deep contextual understanding. This understanding extends across various applications, making them highly effective at responding to different prompts.

LLMs can effectively analyze unstructured data such as text files, web pages, etc. They are very effective at sentiment analysis and categorizing and summarizing text data. Since they can capture a text’s underlying emotions and themes, they are ideal for customer feedback analysis, market research, and monitoring social media.

How are they different from traditional analytics methods?

Traditional machine learning models like decision trees and gradient boosting methods are more effective at handling structured data, i.e., data presented in the form of tables. LLMs, on the contrary, work with unstructured data like text files.

LLMs excel at natural language understanding and generation tasks, offering powerful capabilities for processing and generating human language. However, they are not designed for handling structured data, image analysis, or clustering, tasks where the traditional methods mentioned above perform very well.

Compared to traditional methods, LLMs require minimal data preprocessing and feature engineering. LLMs are trained on vast amounts of text data and are designed to automatically learn patterns and representations from raw text, making them versatile for various natural language understanding tasks. 

However, one significant challenge with LLMs is their low interpretability. Understanding how these models arrive at their conclusions or generate specific outputs can be challenging because they lack transparency in their decision-making processes.

Practical Applications of LLMs in Data Analysis

The ability to process large volumes of textual data makes LLMs valuable for data analysis and science workflows. Some of the ways they are being used are:

Sentiment Analysis: Large language models can perform sentiment analysis, which involves recognizing and categorizing emotions and subjective information in text. They achieve this by fine-tuning on a dataset that provides sentiment labels, allowing them to identify and classify opinions in text data automatically. This makes LLMs particularly useful for analyzing customer reviews (a short code sketch follows after this list of applications).

Named Entity Recognition (NER): LLMs excel in NER, which involves identifying and categorizing important entities like names, places, companies, and events in unstructured text. They leverage Deep Learning algorithms to grasp the context and nuances of the language to achieve the task.

Text Generation: LLMs can produce top-notch and contextually appropriate texts and can thus be used to create chatbots that engage in meaningful conversations with business users, delivering precise responses to their inquiries. 

Large language models are vital in enhancing Natural Language Understanding for data science tasks. Combined with other technologies, they empower data scientists to uncover nuanced meanings in text data, like product reviews, social media posts, and customer survey responses.
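
As an illustration of the sentiment analysis use case above, here is a minimal sketch using the Hugging Face transformers library. The model checkpoint and the sample reviews are assumptions chosen for demonstration, not anything prescribed by this article.

from transformers import pipeline

# A small sentiment-tuned checkpoint; swap in any model appropriate for your domain.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

reviews = [
    "The onboarding flow was smooth and support answered within minutes.",
    "The app keeps crashing and nobody replies to my emails.",
]

# Print each review with its predicted label and confidence score.
for review, result in zip(reviews, classifier(reviews)):
    print(f"{result['label']:>8} ({result['score']:.2f})  {review}")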

How can businesses use LLMs?

Virtual Assistants

LLM-powered chatbots help businesses optimize their employees’ work hours, potentially reducing costs. These chatbots handle routine tasks, freeing employees for more complex and strategic work. IBM Watson Assistant is a conversational AI platform focusing on customer management. It uses machine learning to handle inquiries, guide users through actions via chat and can transfer to a human agent when necessary. It also offers 24/7 availability and maintains accuracy.

Fraud Detection

LLMs are valuable for automating fraud detection by identifying alert-triggering patterns. Their efficiency, scalability, and machine-learning capabilities make them attractive to businesses. For instance, FICO’s Falcon Intelligence Network, utilized by global financial institutions, combines machine learning, data analytics, and human expertise to detect and prevent fraud across various channels and transactions.

Translation

Google Translate, a well-known service, employs an LLM to offer automated translations for text and speech in over 100 languages. Over time, it has improved accuracy by utilizing extensive multilingual text data and advanced neural network algorithms.

Sentiment Analysis

Sprinklr, a social media management and customer engagement platform, employs large language models for sentiment analysis. This aids businesses in tracking and responding to discussions about their brand or product on social media. Sprinklr’s platform assesses social media data to spot sentiment trends and offer insights into customer behavior and preferences.

Limitations of LLMs for Data Analytics

Using Large Language Models (LLMs) for data analytics has its challenges. One major drawback is the high cost associated with training and running LLMs, primarily due to the significant power consumption of numerous GPUs working in parallel. Additionally, LLMs are often seen as “black boxes,” meaning it’s challenging to understand why they produce certain outputs.

Another issue with LLMs is their primary goal of generating natural language, not necessarily accurate information. This can lead to situations where LLMs generate convincing but factually incorrect content, a phenomenon known as hallucination.

Furthermore, LLMs may carry societal and geographical biases because they are trained on vast amounts of internet text. And to cut costs, many vendors rely on third-party APIs, such as OpenAI's, which can mean that data is processed and stored on servers distributed around the world.

Conclusion

Large Language Models (LLMs) are powerful tools for data analysis, offering businesses the ability to extract valuable insights from vast volumes of data. They excel in sentiment analysis, Named Entity Recognition (NER), and text generation, making them indispensable for tasks like customer feedback analysis, fraud detection, and customer engagement. 

However, using LLMs presents ethical considerations, including biases encoded in their training data and the potential for generating inaccurate information. Striking a balance between LLMs’ benefits and ethical challenges is crucial for responsible and effective utilization in data analysis.

Meet PhysObjects: An Object-Centric Dataset of 36.9K Crowd-Sourced and …

In the real world, information is often conveyed through a combination of text, images, and video. To understand and interact with this information effectively, AI systems must be able to process multiple modalities. Visual language models (VLMs) bridge the gap between natural language understanding and computer vision, enabling more comprehensive understanding of the world.

These models can generate rich and contextually relevant descriptions, stories, or explanations incorporating textual and visual elements. This is valuable for creating content for various purposes, including marketing, entertainment, and education. 

The major tasks of visual language models are visual question answering and image captioning. In visual question answering, the model is presented with an image and a text-based question about that image; it uses computer vision techniques to understand the image's contents, processes the question using NLP, and ideally produces an answer that reflects the image's content and addresses the specific query. Image captioning, by contrast, involves automatically generating descriptive captions or sentences that explain the content of an image.
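As a rough illustration of these two tasks, the sketch below uses publicly available BLIP checkpoints through Hugging Face pipelines; the checkpoints and the local image path ("kitchen.jpg") are placeholder assumptions, and this is not code from the paper discussed next.

```python
# Minimal sketch: visual question answering and image captioning with
# off-the-shelf BLIP checkpoints (illustrative choices; "kitchen.jpg" is a
# placeholder path for a local image).
from transformers import pipeline
from PIL import Image

image = Image.open("kitchen.jpg")

# Visual question answering: answer a text question about the image
vqa = pipeline("visual-question-answering", model="Salesforce/blip-vqa-base")
print(vqa(image=image, question="What material is the cup made of?"))

# Image captioning: generate a short description of the image
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
print(captioner(image))
```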

Current VLMs fall short at capturing physical concepts such as the material type and fragility of common objects, which makes robotic tasks that require physical reasoning about objects extremely difficult. To address this, researchers from Stanford, Princeton, and Google DeepMind propose PhysObjects, an object-centric dataset of 36.9K crowd-sourced and 417K automated physical-concept annotations of common household objects. Crowd-sourced annotation collects and labels large volumes of data using a distributed group of human annotators.

They demonstrate that fine-tuning a VLM on PhysObjects significantly improves its physical reasoning abilities: the physically grounded VLM achieves higher prediction accuracy on held-out examples. To test its benefits downstream, they combined the physically grounded VLM with an LLM-based robotic planner, in which the LLM queries the VLM about the physical concepts of the objects in its scene.
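The sketch below is a purely hypothetical illustration of that planner-VLM interaction, not the authors' implementation: vlm_answer stands in for any VQA-capable model, and its canned answers are placeholders.

```python
# Hypothetical sketch (not the paper's code): an LLM-based planner queries a
# VLM about physical concepts before deciding how to handle each object.
def vlm_answer(question: str) -> str:
    """Placeholder VLM: canned answers stand in for a real model's output."""
    canned = {
        "Is the wine glass fragile?": "yes",
        "What material is the wine glass made of?": "glass",
    }
    return canned.get(question, "unknown")

def plan_pick_up(objects):
    """Ask the VLM about each object, then choose a grip accordingly."""
    plan = []
    for obj in objects:
        fragile = vlm_answer(f"Is the {obj} fragile?")
        material = vlm_answer(f"What material is the {obj} made of?")
        grip = "gentle" if fragile == "yes" else "firm"
        plan.append(f"pick up the {obj} ({material}) with a {grip} grip")
    return plan

print(plan_pick_up(["wine glass", "cast-iron pan"]))
```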

The researchers used the EgoObjects dataset as their image source; it was the largest publicly released object-centric dataset of real objects at the time PhysObjects was constructed. Because the dataset consists of videos of realistic household arrangements, it is well suited to training household robots. It includes 117,424 images, 225,466 objects, and 4,203 object instance IDs.

Their results show improved planning performance on tasks that require physical reasoning, compared with baselines that do not use physically grounded VLMs. Future work includes expanding beyond physical reasoning to areas such as geometric or social reasoning. The methodology and dataset are a first step toward using VLMs for more sophisticated reasoning in robotics.

Check out the Paper and Project Page.

Princeton Researchers Propose CoALA: A Conceptual AI Framework to Syst …

In the rapidly evolving field of artificial intelligence, building language agents that can both comprehend and generate human language remains a formidable challenge. These agents are expected not only to interpret language but also to execute complex tasks, and how best to design and improve them has become a central question for researchers and developers.

A team of researchers from Princeton University has introduced Cognitive Architectures for Language Agents (CoALA), a conceptual framework that brings structure and clarity to the development of language agents by characterizing them in terms of their internal mechanisms: memory modules, action spaces, and decision-making processes. One method that exemplifies this kind of modular thinking is LegoNN, developed by researchers at Meta AI.
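Before turning to LegoNN, the heavily simplified skeleton below illustrates the kind of structure CoALA describes, with memory modules, an action space, and a decision step; the class and attribute names are assumptions made for illustration, not the paper's API.

```python
# Hypothetical skeleton (names are illustrative, not CoALA's own code):
# a language agent with memory modules, an action space, and a decision step.
from dataclasses import dataclass, field

@dataclass
class Memory:
    working: list = field(default_factory=list)    # current context
    episodic: list = field(default_factory=list)   # past interactions
    semantic: list = field(default_factory=list)   # facts and knowledge

class LanguageAgent:
    def __init__(self, llm, external_actions):
        self.llm = llm                             # callable: prompt -> text
        self.memory = Memory()
        self.external_actions = external_actions   # e.g. {"search": fn}

    def step(self, observation: str) -> str:
        """One decision cycle: read memory, ask the LLM, record the outcome."""
        self.memory.working.append(observation)
        prompt = f"Observation: {observation}\nChoose the next action:"
        decision = self.llm(prompt)
        self.memory.episodic.append((observation, decision))
        return decision

# Usage with a stub LLM, just to show the decision loop
agent = LanguageAgent(llm=lambda p: "search('weather in Princeton')",
                      external_actions={"search": print})
print(agent.step("User asks about today's weather."))
```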

LegoNN embodies the modularity that CoALA advocates. It introduces a new approach to constructing encoder-decoder models, which serve as the backbone for a wide array of sequence generation tasks, including machine translation (MT), automatic speech recognition (ASR), and optical character recognition (OCR).

Traditional methods for building encoder-decoder models typically involve crafting separate models for each task. This laborious approach demands substantial time and computational resources, as each model necessitates individualized training and fine-tuning.

LegoNN, in contrast, takes a modular approach: developers build adaptable decoder modules that can be repurposed across a diverse range of sequence generation tasks and integrated seamlessly into various language-related applications.

LegoNN's hallmark is reusability. Once a decoder module has been trained for a particular task, it can be reused in different scenarios without extensive retraining, saving substantial time and compute and paving the way for efficient, versatile language agents.
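The sketch below is only a conceptual PyTorch illustration of that reuse pattern, not LegoNN's actual implementation or interface: two stand-in task-specific encoders (one for speech features, one for text) feed a single shared decoder module.

```python
# Conceptual sketch (not LegoNN's actual code): two task-specific encoders
# share one decoder module, so the decoder can be trained once and reused.
import torch
import torch.nn as nn

class SharedDecoder(nn.Module):
    """Reusable decoder that maps encoder features to output-token logits."""
    def __init__(self, d_model=256, vocab_size=1000, num_layers=2):
        super().__init__()
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        self.proj = nn.Linear(d_model, vocab_size)

    def forward(self, tgt_embeddings, encoder_features):
        return self.proj(self.decoder(tgt_embeddings, encoder_features))

# Task-specific encoders (stand-ins for real ASR and MT encoders)
speech_encoder = nn.Linear(80, 256)     # maps speech features to d_model
text_encoder = nn.Embedding(500, 256)   # maps source tokens to d_model

decoder = SharedDecoder()  # trained once, reused by both tasks

tgt = torch.randn(1, 10, 256)  # decoder-side embeddings for 10 target steps
asr_logits = decoder(tgt, speech_encoder(torch.randn(1, 50, 80)))
mt_logits = decoder(tgt, text_encoder(torch.randint(0, 500, (1, 20))))
print(asr_logits.shape, mt_logits.shape)  # both: (1, 10, 1000)
```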

The introduction of the CoALA framework and methods like LegoNN represents a significant paradigm shift in the development of language agents. Here’s a summary of the key points:

Structured Development: CoALA provides a structured approach to categorizing language agents. This categorization helps researchers and developers better understand the internal workings of these agents, leading to more informed design decisions.

Modular Reusability: LegoNN’s modular approach introduces a new level of reusability in language agent development. By creating decoder modules that can adapt to different tasks, developers can significantly reduce the time and effort required for building and training models.

Efficiency and Versatility: The reusability aspect of LegoNN directly translates to increased efficiency and versatility. Language agents can now perform a wide range of tasks without the need for custom-built models for each specific application.

Cost Savings: Traditional approaches to language agent development involve substantial computational costs. LegoNN’s modular design saves time and reduces the computational resources required, making it a cost-effective solution.

Improved Performance: With LegoNN, the reuse of decoder modules can lead to improved performance. These modules can be fine-tuned for specific tasks and applied to various scenarios, resulting in more robust language agents.

In conclusion, the CoALA framework and innovative methods like LegoNN are transforming the language agent development landscape. This framework paves the way for more efficient, versatile, and cost-effective language agents by offering a structured approach and emphasizing modular reusability. As the field of artificial intelligence advances, the CoALA framework stands as a beacon of progress in the quest for smarter and more capable language agents.

Check out the Paper.
