Highlight text as it’s being spoken using Amazon Polly

Amazon Polly is a service that turns text into lifelike speech. It enables the development of a whole class of applications that can convert text into speech in multiple languages.
This service can be used by chatbots, audio books, and other text-to-speech applications in conjunction with other AWS AI or machine learning (ML) services. For example, Amazon Lex and Amazon Polly can be combined to create a chatbot that engages in a two-way conversation with a user and performs certain tasks based on the user’s commands. Amazon Transcribe, Amazon Translate, and Amazon Polly can be combined to transcribe speech to text in the source language, translate it to a different language, and speak it.
In this post, we present an interesting approach for highlighting text as it’s being spoken using Amazon Polly. This solution can be used in many text-to-speech applications to do the following:

Add visual capabilities to audio in books, websites, and blogs
Increase comprehension when customers are trying to understand the text rapidly as it’s being spoken

Our solution gives the client (the browser, in this example) the ability to know what text (word or sentence) is being spoken by Amazon Polly at any instant. This enables the client to dynamically highlight the text as it’s being spoken. Such a capability is useful for providing visual aid to speech for the use cases mentioned previously.
Our solution can be extended to perform additional tasks besides highlighting text. For example, the browser can show images, play music, or perform other animations on the front end as the text is being spoken. This capability is useful for creating dynamic audio books, educational content, and richer text-to-speech applications.
Solution overview
At its core, the solution uses Amazon Polly to convert a string of text into speech. The text can be input from the browser or through an API call to the endpoint exposed by our solution. The speech generated by Amazon Polly is stored as an audio file (MP3 format) in an Amazon Simple Storage Service (Amazon S3) bucket.
However, using the audio file alone, the browser can’t determine which parts of the text are being spoken at any instant, because we don’t have granular information on when each word is spoken.
Amazon Polly provides a way to obtain this using speech marks. Speech marks are stored in a text file that shows the time (measured in milliseconds from start of the audio) when each word or sentence is spoken.
Amazon Polly returns speech mark objects in a line-delimited JSON stream. A speech mark object contains the following fields:

Time – The timestamp in milliseconds from the beginning of the corresponding audio stream
Type – The type of speech mark (sentence, word, viseme, or SSML)
Start – The offset in bytes (not characters) of the start of the object in the input text (not including viseme marks)
End – The offset in bytes (not characters) of the object’s end in the input text (not including viseme marks)
Value – This varies depending on the type of speech mark:

SSML – <mark> SSML tag
Viseme – The viseme name
Word or sentence – A substring of the input text as delimited by the start and end fields

For example, the sentence “Mary had a little lamb” can give you the following speech marks file if you use SpeechMarkTypes = ["word", "sentence"] in the API call to obtain the speech marks:

{"time":0,"type":"sentence","start":0,"end":23,"value":"Mary had a little lamb."}
{"time":6,"type":"word","start":0,"end":4,"value":"Mary"}
{"time":373,"type":"word","start":5,"end":8,"value":"had"}
{"time":604,"type":"word","start":9,"end":10,"value":"a"}
{"time":643,"type":"word","start":11,"end":17,"value":"little"}
{"time":882,"type":"word","start":18,"end":22,"value":"lamb"}

The word “had” (on line 3 of the speech marks) begins 373 milliseconds after the audio stream begins, starts at byte 5, and ends at byte 8 of the input text.
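
For reference, the following is a minimal Boto3 sketch of how such a speech marks file (and the corresponding MP3) can be requested from Amazon Polly. The voice, bucket name, and object keys are illustrative assumptions, not values from the solution.

import boto3

polly = boto3.client("polly")
s3 = boto3.client("s3")

text = "Mary had a little lamb."

# Request the audio (MP3) for the text
audio = polly.synthesize_speech(
    Text=text,
    OutputFormat="mp3",
    VoiceId="Joanna",  # assumed voice
)

# Request the speech marks (line-delimited JSON) for the same text
marks = polly.synthesize_speech(
    Text=text,
    OutputFormat="json",
    SpeechMarkTypes=["word", "sentence"],
    VoiceId="Joanna",
)

# Store both outputs in Amazon S3 (bucket and keys are placeholders)
s3.put_object(Bucket="my-speech-bucket", Key="demo/audio.mp3",
              Body=audio["AudioStream"].read())
s3.put_object(Bucket="my-speech-bucket", Key="demo/speechmarks.json",
              Body=marks["AudioStream"].read())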
Architecture overview
The architecture of our solution is presented in the following diagram.

Highlight Text as it’s spoken, using Amazon Polly

Our website for the solution is stored on Amazon S3 as static files (JavaScript, HTML), which are hosted in Amazon CloudFront (1) and served to the end-user’s browser (2).
When the user enters text in the browser through a simple HTML form, it’s processed by JavaScript in the browser, which calls an API (3) through Amazon API Gateway to invoke an AWS Lambda function (4). The Lambda function calls Amazon Polly (5) twice, using JavaScript async functions, to generate the speech (audio) and speech marks (JSON) files. Both files are stored in Amazon S3 (6a). To minimize the chances of multiple users overwriting each other’s files in the S3 bucket, each user’s files are stored in a folder named with a timestamp. For a production release, you can employ more robust approaches to segregate users’ files, such as keying on a user ID in addition to a timestamp or other unique characteristics.
The Lambda function creates pre-signed URLs for the speech and speech marks files and returns them to the browser in the form of an array (7, 8, 9).
When the browser sends the text file to the API endpoint (3), it gets back two pre-signed URLs for the audio file and the speech marks file in one synchronous invocation (9). This is indicated by the key symbol next to the arrow.
A JavaScript function in the browser fetches the speech marks file and the audio from their pre-signed URLs (10) and sets up the audio player to play the audio (the HTML audio tag is used for this purpose).
When the user chooses the play button, the browser parses the speech marks retrieved in the earlier step to create a series of timed events using timeouts. The events invoke a callback function, another JavaScript function that highlights the spoken text in the browser. Simultaneously, the audio file is streamed from its pre-signed URL.
The result is that the events run at the appropriate times to highlight the text as it’s spoken while the audio plays. The use of JavaScript timeouts keeps the highlighted text synchronized with the audio.
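
The solution’s Lambda function is written in JavaScript, but the pre-signed URL step (7) described above is easy to illustrate in Python. The following is a minimal Boto3 sketch; the bucket and key names are placeholder assumptions.

import boto3

s3 = boto3.client("s3")

def presign(bucket, key, expires=3600):
    # Generate a time-limited URL the browser can use to fetch the object
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires,
    )

# Placeholder bucket and keys for the files written by Amazon Polly
urls = [
    presign("my-speech-bucket", "demo/audio.mp3"),
    presign("my-speech-bucket", "demo/speechmarks.json"),
]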
Prerequisites
To run this solution, you need an AWS account with an AWS Identity and Access Management (IAM) user who has permission to use Amazon CloudFront, Amazon API Gateway, Amazon Polly, Amazon S3, AWS Lambda, and AWS Step Functions.
Use Lambda to generate speech and speech marks
The following code invokes the Amazon Polly synthesize_speech function two times to fetch the audio and speech marks files. The calls run as asynchronous functions and are coordinated to return their results together using promises.

const p1 = new Promise(doSynthesizeSpeechmarks);
const p2 = new Promise(doSynthesizeSpeech);
let result;

// Wait for both the speech marks and the audio files to be generated
await Promise.all([p1, p2])
  .then((values) => {
    // Return the array of pre-signed URLs
    console.log("Values:", values);
    result = { "output": values };
  })
  .catch((err) => {
    console.log("Error: " + err);
    result = err;
  });

On the JavaScript side, the text highlighting is done by highlighter(start, finish, word) and the timed events are set by setTimers():

function highlighter(start, finish, word) {
  let textarea = document.getElementById("postText");
  //console.log(start + "," + finish + "," + word);
  textarea.focus();
  textarea.setSelectionRange(start, finish);
}

function setTimers() {
  let speechmarksStr = sessionStorage.getItem("speechmarks");
  // Read through the speech marks file and set a timer for every word
  console.log(speechmarksStr);
  let speechmarks = speechmarksStr.split("\n");
  for (let i = 0; i < speechmarks.length; i++) {
    //console.log(i + ":" + speechmarks[i]);
    if (speechmarks[i].length == 0) {
      continue;
    }

    let smjson = JSON.parse(speechmarks[i]);
    let t = smjson["time"];
    let s = smjson["start"];
    let f = smjson["end"];
    let word = smjson["value"];
    // Schedule highlighter(s, f, word) to run t milliseconds from now
    setTimeout(highlighter, t, s, f, word);
  }
}

Alternative approaches
Instead of the previous approach, you can consider a few alternatives:

Create both the speech marks and audio files inside a Step Functions state machine. The state machine can use a parallel branch to invoke two different Lambda functions: one to generate speech and another to generate speech marks. The code for this can be found in the using-step-functions subfolder in the GitHub repo.
Invoke Amazon Polly asynchronously to generate the audio and speech marks. This approach can be used if the text content is large or the user doesn’t need a real-time response. For more details about creating long audio files, refer to Creating Long Audio Files.
Have Amazon Polly create the presigned URL directly using the generate_presigned_url call on the Amazon Polly client in Boto3 (a minimal sketch follows this list). If you go with this approach, Amazon Polly regenerates the audio and speech marks every time. In our current approach, we store these files in Amazon S3. Although these stored files aren’t accessible from the browser in our version of the code, you can modify the code to play previously generated audio files by fetching them from Amazon S3 (instead of regenerating the audio for the text with Amazon Polly). We have more code examples for accessing Amazon Polly with Python in the AWS Code Library.
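
As a hedged illustration of the last alternative, the sketch below presigns the synthesize_speech call itself on the Amazon Polly client in Boto3. The text, voice, and expiry are assumptions, and you would need one such URL per output type (audio and speech marks).

import boto3

polly = boto3.client("polly")

# Pre-signed URL that synthesizes (and streams) the audio when fetched
audio_url = polly.generate_presigned_url(
    ClientMethod="synthesize_speech",
    Params={"Text": "Mary had a little lamb.",
            "OutputFormat": "mp3",
            "VoiceId": "Joanna"},  # assumed voice
    ExpiresIn=300,
)

# Equivalent URL for the speech marks output
marks_url = polly.generate_presigned_url(
    ClientMethod="synthesize_speech",
    Params={"Text": "Mary had a little lamb.",
            "OutputFormat": "json",
            "SpeechMarkTypes": ["word", "sentence"],
            "VoiceId": "Joanna"},
    ExpiresIn=300,
)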

Create the solution
The entire solution is available from our GitHub repo. To create this solution in your account, follow the instructions in the README.md file. The solution includes an AWS CloudFormation template to provision your resources.
Cleanup
To clean up the resources created in this demo, perform the following steps (a scripted alternative is sketched after this list):

Delete the S3 buckets created to store the CloudFormation template (Bucket A), the source code (Bucket B) and the website (pth-cf-text-highlighter-website-[Suffix]).
Delete the CloudFormation stack pth-cf.
Delete the S3 bucket containing the speech files (pth-speech-[Suffix]). This bucket was created by the CloudFormation template to store the audio and speech marks files generated by Amazon Polly.
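
If you prefer to script the cleanup, a minimal Boto3 sketch follows. The stack name comes from the steps above, while the bucket names (and their suffixes) are placeholders you would substitute with your own.

import boto3

s3 = boto3.resource("s3")
cfn = boto3.client("cloudformation")

# Empty and delete the buckets created for this demo (names are placeholders)
for name in ["bucket-a", "bucket-b",
             "pth-cf-text-highlighter-website-suffix",
             "pth-speech-suffix"]:
    bucket = s3.Bucket(name)
    bucket.objects.all().delete()  # a bucket must be empty before deletion
    bucket.delete()

# Delete the CloudFormation stack
cfn.delete_stack(StackName="pth-cf")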

Summary
In this post, we showed an example of a solution that can highlight text as it’s being spoken using Amazon Polly. It was developed using the Amazon Polly speech marks feature, which provides markers for where each word or sentence begins in an audio file.
The solution is available as a CloudFormation template. It can be deployed as is to any web application that performs text-to-speech conversion. This would be useful for adding visual capabilities to audio in books, websites, and blogs; for building avatars with lip-sync capabilities (using viseme speech marks); and for aiding people with hearing impairments.
It can be extended to perform additional tasks besides highlighting text. For example, the browser can show images, play music, and perform other animations on the front end while the text is being spoken. This capability can be useful for creating dynamic audio books, educational content, and richer text-to-speech applications.
We welcome you to try out this solution and learn more about the relevant AWS services from the following links. You can extend the functionality for your specific needs.

Amazon API Gateway
Amazon CloudFront
AWS Lambda
Amazon Polly
Amazon S3

About the Author
Varad G Varadarajan is a Trusted Advisor and Field CTO for Digital Native Businesses (DNB) customers at AWS. He helps them architect and build innovative solutions at scale using AWS products and services. Varad’s areas of interest are IT strategy consulting, architecture, and product management. Outside of work, Varad enjoys creative writing, watching movies with family and friends, and traveling.

Predict vehicle fleet failure probability using Amazon SageMaker JumpStart

Predictive maintenance is critical in automotive industries because it can avoid out-of-the-blue mechanical failures and reactive maintenance activities that disrupt operations. By predicting vehicle failures and scheduling maintenance and repairs, you’ll reduce downtime, improve safety, and boost productivity levels.
What if we could apply deep learning techniques to common areas that drive vehicle failures, unplanned downtime, and repair costs?
In this post, we show you how to train and deploy a model to predict vehicle fleet failure probability using Amazon SageMaker JumpStart. SageMaker JumpStart is the machine learning (ML) hub of Amazon SageMaker, providing pre-trained, publicly available models for a wide range of problem types to help you get started with ML. The solution outlined in the post is available on GitHub.
SageMaker JumpStart solution templates
SageMaker JumpStart provides one-click, end-to-end solutions for many common ML use cases. Explore the following use cases for more information on available solution templates:

Demand forecasting
Credit rating prediction
Fraud detection
Computer vision
Extract and analyze data from documents
Predictive maintenance
Churn prediction
Personalized recommendations
Reinforcement learning
Healthcare and life sciences
Financial pricing

The SageMaker JumpStart solution templates cover a variety of use cases, under each of which several different solution templates are offered (the solution in this post, Predictive Maintenance for Vehicle Fleets, is in the Solutions section). Choose the solution template that best fits your use case from the SageMaker JumpStart landing page. For more information on specific solutions under each use case and how to launch a SageMaker JumpStart solution, see Solution Templates.
Solution overview
The AWS predictive maintenance solution for automotive fleets applies deep learning techniques to common areas that drive vehicle failures, unplanned downtime, and repair costs. It serves as an initial building block for you to get to a proof of concept in a short period of time. This solution contains data preparation and visualization functionality within SageMaker and allows you to train and optimize the hyperparameters of deep learning models for your dataset. You can use your own data or try the solution with a synthetic dataset as part of this solution. This version processes vehicle sensor data over time. A subsequent version will process maintenance record data.
The following diagram demonstrates how you can use this solution with SageMaker components. As part of the solution, the following services are used:

Amazon S3 – We use Amazon Simple Storage Service (Amazon S3) to store datasets
SageMaker notebook – We use a notebook to preprocess and visualize the data, and to train the deep learning model
SageMaker endpoint – We use the endpoint to deploy the trained model

The workflow includes the following steps:

An extract of historical data is created from the Fleet Management System containing vehicle data and sensor logs.
After the ML model is trained, the SageMaker model artifact is deployed.
The connected vehicle sends sensor logs to AWS IoT Core (alternatively, via an HTTP interface).
Sensor logs are persisted via Amazon Kinesis Data Firehose.
Sensor logs are sent to AWS Lambda for querying against the model to make predictions.
Lambda sends sensor logs to the SageMaker model endpoint for predictions (a minimal handler sketch follows this list).
Predictions are persisted in Amazon Aurora.
Aggregate results are displayed on an Amazon QuickSight dashboard.
Real-time notifications on the predicted probability of failure are sent to Amazon Simple Notification Service (Amazon SNS).
Amazon SNS sends notifications back to the connected vehicle.
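
Steps 5 and 6 can be illustrated with a small Lambda handler. This is a hedged sketch under assumed names: the endpoint name environment variable and the record format (a Kinesis stream trigger with JSON-encoded sensor logs) are placeholders, not part of the solution’s code.

import base64
import json
import os

import boto3

runtime = boto3.client("sagemaker-runtime")

def handler(event, context):
    """Score each incoming sensor-log record against the SageMaker endpoint."""
    predictions = []
    for record in event.get("Records", []):
        # Assumed: each record carries a JSON-encoded sensor-log payload
        payload = base64.b64decode(record["kinesis"]["data"])
        response = runtime.invoke_endpoint(
            EndpointName=os.environ["ENDPOINT_NAME"],  # assumed environment variable
            ContentType="application/json",
            Body=payload,
        )
        predictions.append(json.loads(response["Body"].read().decode()))
    return predictions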

The solution consists of six notebooks:

0_demo.ipynb – A quick preview of our solution
1_introduction.ipynb – Introduction and solution overview
2_data_preparation.ipynb – Prepare a sample dataset
3_data_visualization.ipynb – Visualize our sample dataset
4_model_training.ipynb – Train a model on our sample dataset to detect failures
5_results_analysis.ipynb – Analyze the results from the model we trained

Prerequisites
Amazon SageMaker Studio is the integrated development environment (IDE) within SageMaker that provides us with all the ML features that we need in a single pane of glass. Before we can run SageMaker JumpStart, we need to set up SageMaker Studio. You can skip this step if you already have your own version of SageMaker Studio running.
The first thing we need to do before we can use any AWS services is to make sure we have signed up for and created an AWS account. Then we create an administrative user and a group. For instructions on both steps, refer to Set Up Amazon SageMaker Prerequisites.
The next step is to create a SageMaker domain. A domain sets up all the storage and allows you to add users to access SageMaker. For more information, refer to Onboard to Amazon SageMaker Domain. This demo is created in the AWS Region us-east-1.
Finally, you launch SageMaker Studio. For this post, we recommend launching a user profile app. For instructions, refer to Launch Amazon SageMaker Studio.
To run this SageMaker JumpStart solution and have the infrastructure deployed to your AWS account, you need to create an active SageMaker Studio instance (see Onboard to Amazon SageMaker Studio). When your instance is ready, use the instructions in SageMaker JumpStart to launch the solution. The solution artifacts are included in this GitHub repository for reference.
Launch the SageMaker JumpStart solution
To get started with the solution, complete the following steps:

On the SageMaker Studio console, choose JumpStart.
On the Solutions tab, choose Predictive Maintenance for Vehicle Fleets.
Choose Launch. It takes a few minutes to deploy the solution.
After the solution is deployed, choose Open Notebook.

If you’re prompted to select a kernel, choose PyTorch 1.8 Python 3.6 for all notebooks in this solution.
Solution preview
We first work on the 0_demo.ipynb notebook. In this notebook, you can get a quick preview of what the outcome will look like when you complete the full notebook for this solution.
Choose Run and Run All Cells to run all cells in SageMaker Studio (or Cell and Run All in a SageMaker notebook instance). You can run all the cells in each notebook one after the other. Ensure all the cells finish processing before moving to the next notebook.

This solution relies on a config file to run the provisioned AWS resources. We generate the file as follows:

import boto3
import os
import json

client = boto3.client('servicecatalog')
cwd = os.getcwd().split('/')
i = cwd.index('S3Downloads')
pp_name = cwd[i + 1]
pp = client.describe_provisioned_product(Name=pp_name)
record_id = pp['ProvisionedProductDetail']['LastSuccessfulProvisioningRecordId']
record = client.describe_record(Id=record_id)

# Collect the stack outputs of the provisioned product into a dictionary
keys = [x['OutputKey'] for x in record['RecordOutputs'] if 'OutputKey' in x and 'OutputValue' in x]
values = [x['OutputValue'] for x in record['RecordOutputs'] if 'OutputKey' in x and 'OutputValue' in x]
stack_output = dict(zip(keys, values))

with open(f'/root/S3Downloads/{pp_name}/stack_outputs.json', 'w') as f:
    json.dump(stack_output, f)

We have some sample time series input data consisting of a vehicle’s battery voltage and battery current over time. Next, we load and visualize the sample data. As shown in the following screenshots, the voltage and current values are on the Y axis and the readings (19 readings recorded) are on the X axis.

We have previously trained a model on this voltage and current data that predicts the probability of vehicle failure and have deployed the model as an endpoint in SageMaker. We will call this endpoint with some sample data to determine the probability of failure in the next time period.
Given the sample input data, the predicted probability of failure is 45.73%.
To move to the next stage, choose Click here to continue.

Introduction and solution overview
The 1_introduction.ipynb notebook provides an overview of the solution and stages, and a look into the configuration file that has content definition, data sampling period, train and test sample count, parameters, location, and column names for generated content.
After you review this notebook, you can move to the next stage.
Prepare a sample dataset
We prepare a sample dataset in the 2_data_preparation.ipynb notebook.
We first generate the configuration file for this solution:

import boto3
import os
import json

client = boto3.client('servicecatalog')
cwd = os.getcwd().split('/')
i = cwd.index('S3Downloads')
pp_name = cwd[i + 1]
pp = client.describe_provisioned_product(Name=pp_name)
record_id = pp['ProvisionedProductDetail']['LastSuccessfulProvisioningRecordId']
record = client.describe_record(Id=record_id)

keys = [x['OutputKey'] for x in record['RecordOutputs'] if 'OutputKey' in x and 'OutputValue' in x]
values = [x['OutputValue'] for x in record['RecordOutputs'] if 'OutputKey' in x and 'OutputValue' in x]
stack_output = dict(zip(keys, values))

with open(f'/root/S3Downloads/{pp_name}/stack_outputs.json', 'w') as f:
    json.dump(stack_output, f)

import os

from source.config import Config
from source.preprocessing import pivot_data, sample_dataset
from source.dataset import DatasetGenerator

config = Config(filename="config/config.yaml", fetch_sensor_headers=False)
config

The config properties are as follows:

fleet_info_fn=data/example_fleet_info.csv
fleet_sensor_logs_fn=data/example_fleet_sensor_logs.csv
vehicle_id_column=vehicle_id
timestamp_column=timestamp
target_column=target
period_ms=30000
dataset_size=25000
window_length=20
chunksize=10000
processing_chunksize=2500
fleet_dataset_fn=data/processed/fleet_dataset.csv
train_dataset_fn=data/processed/train_dataset.csv
test_dataset_fn=data/processed/test_dataset.csv
period_column=period_ms

You can define your own dataset or use our scripts to generate a sample dataset:

if should_generate_data:
    fleet_statistics_fn = "data/generation/fleet_statistics.csv"
    generator = DatasetGenerator(fleet_statistics_fn=fleet_statistics_fn,
                                 fleet_info_fn=config.fleet_info_fn,
                                 fleet_sensor_logs_fn=config.fleet_sensor_logs_fn,
                                 period_ms=config.period_ms,
                                 )
    generator.generate_dataset()

assert os.path.exists(config.fleet_info_fn), "Please copy your data to {}".format(config.fleet_info_fn)
assert os.path.exists(config.fleet_sensor_logs_fn), "Please copy your data to {}".format(config.fleet_sensor_logs_fn)

You can merge the sensor data and fleet vehicle data together:

pivot_data(config)
sample_dataset(config)

We can now move to data visualization.
Visualize our sample dataset
We visualize our sample dataset in 3_data_visualization.ipynb. This solution relies on a config file to run the provisioned AWS resources. Let’s generate the file as we did in the previous notebook.
The following screenshot shows our dataset.

Next, let’s build the dataset:

train_ds = PMDataset_torch(
    config.train_dataset_fn,
    sensor_headers=config.sensor_headers,
    target_column=config.target_column,
    standardize=True)

properties = train_ds.vehicle_properties_headers.copy()
properties.remove('vehicle_id')
properties.remove('timestamp')
properties.remove('period_ms')

Now that the dataset is ready, let’s visualize the data statistics. The following screenshot shows the data distribution based on vehicle make, engine type, vehicle class, and model.

Comparing the log data, let’s look at an example of the mean voltage across different years for Make E and Make C (chosen at random).
The mean of voltage and current is on the Y axis and the number of readings is on the X axis.

Possible values for log_target: ['make', 'model', 'year', 'vehicle_class', 'engine_type']

Randomly assigned value for log_target: make

Possible values for log_target_value1: ['Make A', 'Make B', 'Make E', 'Make C', 'Make D']

Randomly assigned value for log_target_value1: Make B

Possible values for log_target_value2: ['Make A', 'Make B', 'Make E', 'Make C', 'Make D']

Randomly assigned value for log_target_value2: Make D

Based on the above, we assume log_target: make, log_target_value1: Make B, and log_target_value2: Make D

The following graphs break down the mean of the log data.

The following graphs visualize an example of different sensor log values against voltage and current.

Train a model on our sample dataset to detect failures
In the 4_model_training.ipynb notebook, we train a model on our sample dataset to detect failures.
Let’s generate the configuration file similar to the previous notebook, and then proceed with training configuration:

sage_session = sagemaker.session.Session()
s3_bucket = sagemaker_configs["S3Bucket"]
s3_output_path = 's3://{}/'.format(s3_bucket)
print("S3 bucket path: {}".format(s3_output_path))

# Run in local_mode on this machine, or as a SageMaker TrainingJob
local_mode = False

if local_mode:
    instance_type = 'local'
else:
    instance_type = sagemaker_configs["SageMakerTrainingInstanceType"]

role = sagemaker.get_execution_role()
print("Using IAM role arn: {}".format(role))

# Only run from a SageMaker notebook instance
if local_mode:
    !/bin/bash ./setup.sh

cpu_or_gpu = 'gpu' if instance_type.startswith('ml.p') else 'cpu'

We can now define the data and initiate hyperparameter optimization:

%%time

estimator = PyTorch(entry_point="train.py",
                    source_dir='source',
                    role=role,
                    dependencies=["source/dl_utils"],
                    instance_type=instance_type,
                    instance_count=1,
                    output_path=s3_output_path,
                    framework_version="1.5.0",
                    py_version='py3',
                    base_job_name=job_name_prefix,
                    metric_definitions=metric_definitions,
                    hyperparameters={
                        'epoch': 100,  # tune it according to your need
                        'target_column': config.target_column,
                        'sensor_headers': json.dumps(config.sensor_headers),
                        'train_input_filename': os.path.basename(config.train_dataset_fn),
                        'test_input_filename': os.path.basename(config.test_dataset_fn),
                    }
                    )

if local_mode:
    estimator.fit({'train': training_data, 'test': testing_data})

%%time

tuner = HyperparameterTuner(estimator,
                            objective_metric_name='test_auc',
                            objective_type='Maximize',
                            hyperparameter_ranges=hyperparameter_ranges,
                            metric_definitions=metric_definitions,
                            max_jobs=max_jobs,
                            max_parallel_jobs=max_parallel_jobs,
                            base_tuning_job_name=job_name_prefix)

tuner.fit({'train': training_data, 'test': testing_data})
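
The tuner above relies on several names (job_name_prefix, metric_definitions, hyperparameter_ranges, max_jobs, and max_parallel_jobs) that are defined earlier in the notebook. The values below are illustrative assumptions only, not the solution’s actual settings.

from sagemaker.tuner import ContinuousParameter, IntegerParameter

job_name_prefix = "pred-maint-vehicle-fleet"  # assumed prefix

# Regexes that pull metrics out of the training job's logs (assumed log format)
metric_definitions = [
    {"Name": "test_auc", "Regex": "test_auc: ([0-9\\.]+)"},
    {"Name": "test_loss", "Regex": "test_loss: ([0-9\\.]+)"},
]

# Search space for the tuner (illustrative ranges)
hyperparameter_ranges = {
    "lr": ContinuousParameter(1e-5, 1e-2),
    "batch_size": IntegerParameter(32, 256),
}

max_jobs = 4            # matches the four training jobs shown in the results analysis
max_parallel_jobs = 2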

Analyze the results from the model we trained
In the 5_results_analysis.ipynb notebook, we get data from our hyperparameter tuning job, visualize metrics of all the jobs to identify the best job, and build an endpoint for the best training job.
Let’s generate the configuration file similar to the previous notebook and visualize the metrics of all the jobs. The following plot visualizes test accuracy vs. epoch.

The following screenshot shows the hyperparameter tuning jobs we ran.

You can now visualize data from the best training job (out of the four training jobs) based on the test accuracy (red).
As we can see in the following screenshots, the test loss declines and AUC and accuracy increase with epochs.

Based on the visualizations, we can now build an endpoint for the best training job:

%%time

role = sagemaker.get_execution_role()

model = PyTorchModel(model_data=model_artifact,
                     role=role,
                     entry_point="inference.py",
                     source_dir="source/dl_utils",
                     framework_version='1.5.0',
                     py_version='py3',
                     name=sagemaker_configs["SageMakerModelName"],
                     code_location="s3://{}/endpoint".format(s3_bucket)
                     )

endpoint_instance_type = sagemaker_configs["SageMakerInferenceInstanceType"]

predictor = model.deploy(initial_instance_count=1, instance_type=endpoint_instance_type, endpoint_name=sagemaker_configs["SageMakerEndpointName"])

def custom_np_serializer(data):
    return json.dumps(data.tolist())

def custom_np_deserializer(np_bytes, content_type='application/x-npy'):
    out = np.array(json.loads(np_bytes.read()))
    return out

predictor.serializer = custom_np_serializer
predictor.deserializer = custom_np_deserializer

After we build the endpoint, we can test the predictor by passing it sample sensor logs:

import botocore

config = botocore.config.Config(read_timeout=200)
runtime = boto3.client('runtime.sagemaker', config=config)

data = np.ones(shape=(1, 20, 2)).tolist()
payload = json.dumps(data)

response = runtime.invoke_endpoint(EndpointName=sagemaker_configs["SageMakerEndpointName"],
                                   ContentType='application/json',
                                   Body=payload)
out = json.loads(response['Body'].read().decode())[0]

print("Given the sample input data, the predicted probability of failure is {:0.2f}%".format(100*(1.0-out[0])))

Given the sample input data, the predicted probability of failure is 34.60%.
Clean up
When you’ve finished with this solution, make sure that you delete all unwanted AWS resources. On the Predictive Maintenance for Vehicle Fleets page, under Delete solution, choose Delete all resources to delete all the resources associated with the solution.

You need to manually delete any extra resources that you may have created in this notebook. Some examples include extra S3 buckets (in addition to the solution’s default bucket) and extra SageMaker endpoints (deployed using a custom name).
Customize the solution
Our solution is simple to customize. To modify the input data visualizations, refer to sagemaker/3_data_visualization.ipynb. To customize the machine learning, refer to sagemaker/source/train.py and sagemaker/source/dl_utils/network.py. To customize the dataset processing, refer to sagemaker/1_introduction.ipynb on how to define the config file.
Additionally, you can change the configuration in the config file. The default configuration is as follows:

fleet_info_fn=data/example_fleet_info.csv
fleet_sensor_logs_fn=data/example_fleet_sensor_logs.csv
vehicle_id_column=vehicle_id
timestamp_column=timestamp
target_column=target
period_ms=30000
dataset_size=10000
window_length=20
chunksize=10000
processing_chunksize=1000
fleet_dataset_fn=data/processed/fleet_dataset.csv
train_dataset_fn=data/processed/train_dataset.csv
test_dataset_fn=data/processed/test_dataset.csv
period_column=period_ms

The config file has the following parameters:

fleet_info_fn, fleet_sensor_logs_fn, fleet_dataset_fn, train_dataset_fn, and test_dataset_fn define the location of dataset files
vehicle_id_column, timestamp_column, target_column, and period_column define the headers for columns
dataset_size, chunksize, processing_chunksize, period_ms, and window_length define the properties of the dataset

Conclusion
In this post, we showed you how to train and deploy a model to predict vehicle fleet failure probability using SageMaker JumpStart. The solution is based on ML and deep learning models and allows a wide variety of input data including any time-varying sensor data. Because every vehicle has different telemetry on it, you can fine-tune the provided model to the frequency and type of data that you have.
To learn more about what you can do with SageMaker JumpStart, refer to the following:

Visual inspection automation using Amazon SageMaker JumpStart
Run automatic model tuning with Amazon SageMaker JumpStart
Get started with generative AI on AWS using Amazon SageMaker JumpStart

Resources

Amazon SageMaker Developer Guide
SageMaker JumpStart Developer Guide
Perform Automatic Model Tuning with SageMaker
SageMaker JumpStart predictive maintenance solution

About the Authors
Rajakumar Sampathkumar is a Principal Technical Account Manager at AWS, providing customers guidance on business-technology alignment and supporting the reinvention of their cloud operation models and processes. He is passionate about cloud and machine learning. Raj is also a machine learning specialist and works with AWS customers to design, deploy, and manage their AWS workloads and architectures.

Everything About Vector Databases – Their Significance, Vector Embeddings, and Top Vector Databases for Large Language Models (LLMs)

Large Language Models have shown immense growth and advancements in recent times. The field of Artificial Intelligence is booming with every new release of these models. From education and finance to healthcare and media, LLMs are contributing to almost every domain. Famous LLMs like GPT, BERT, PaLM, and LLaMA are revolutionizing the AI industry by imitating humans. The well-known chatbot called ChatGPT, based on the GPT architecture and developed by OpenAI, imitates humans by generating accurate and creative content, answering questions, summarizing massive textual passages, and translating between languages.

What are Vector Databases?

A new and unique type of database that is gaining immense popularity in the fields of AI and Machine Learning is the vector database. Unlike conventional relational databases, which were initially intended to store tabular data in rows and columns, and more recent NoSQL databases like MongoDB, which store data in JSON documents, vector databases are different in nature: vector embeddings are the only sort of data that a vector database is intended to store and retrieve.

Large Language Models and all the new applications depend on vector embedding and vector databases. These databases are specialized databases made for the effective storage and manipulation of vector data. Vector data, which uses points, lines, and polygons to describe objects in space, is frequently used in a variety of industries, including computer graphics, Machine Learning, and Geographic Information Systems. 

A vector database is based on vector embedding, which is a sort of data encoding carrying semantic information that aids AI systems in interpreting the data and in maintaining long-term memory. These embeddings are the condensed versions of the training data that are produced as part of the ML process. They serve as a filter used to run new data during the inference phase of machine learning. 

In vector databases, the geometric qualities of the data are used to organize and store it. Each item is identified by its coordinates in space and other properties that give its characteristics. A vector database, for instance, could be used to record details on towns, highways, rivers, and other geographic features in a GIS application.

Advantages of vector databases

Spatial Indexing – Vector databases use spatial indexing techniques like R-trees and Quad-trees to enable data retrieval based on geographical relationships, such as proximity and containment, which makes vector databases better suited than other databases for spatial queries.

Multi-dimensional Indexing – In addition to spatial indexing, vector databases can support indexing on other attributes of vector data, allowing for effective searching and filtering based on non-spatial attributes.

Geometric Operations – Vector databases frequently have built-in support for geometric operations such as intersection, buffering, and distance computations, which is important for tasks like spatial analysis, routing, and map visualization.

Integration with Geographic Information Systems (GIS) – To efficiently handle and analyze spatial data, vector databases are frequently used in conjunction with GIS software and tools.

Best Vector Databases for Building LLMs

In the case of Large Language Models, a vector database is getting popular, with its main application being the storage of vector embeddings that result from the training of the LLM. 

Pinecone – Pinecone is a strong vector database that stands out for its outstanding performance, scalability, and ability to handle complicated data. It is perfect for applications that demand instant access to vectors and real-time updates because it is built to excel at quick and efficient data retrieval.

DataStax – AstraDB, a vector database from DataStax, is available to speed up application development. AstraDB streamlines and expedites the construction of apps by integrating with Cassandra operations and working with AppCloudDB. It streamlines the development process by eliminating the necessity for laborious setup updates and allows developers to scale applications automatically across various cloud infrastructures.

MongoDB – MongoDB’s Atlas Vector Search feature is a significant advancement in the integration of generative AI and semantic search into applications. With the incorporation of vector search capabilities, MongoDB enables developers to work with data analysis, recommendation systems, and Natural Language Processing. Atlas Vector Search empowers developers to perform searches on unstructured data effortlessly, which provides the ability to generate vector embeddings using preferred machine learning models like OpenAI or Hugging Face and store them directly in MongoDB Atlas.

Vespa – Vespa.ai is a potent vector database with real-time analytics capabilities and speedy query returns, making it a useful tool for businesses that need to handle data quickly and effectively. Its high data availability and fault tolerance are two of its primary advantages. 

Milvus – A vector database system called Milvus was created primarily to manage complex data in an effective manner. It provides fast data retrieval and analysis, making it a great solution for applications that call for real-time processing and instant insights. The capacity of Milvus to successfully handle large datasets is one of its main advantages.

In conclusion, vector databases provide powerful capabilities for managing and analyzing vector data, making them essential tools in various industries and applications involving spatial information.




Meet Magic123: A Novel Image-to-3D Pipeline that Uses a Two-Stage Coarse-to-Fine Optimization Process to Produce High-Quality High-Resolution 3D Geometry and Textures

Despite only seeing the world in two dimensions, humans are adept at navigating, thinking, and interacting with their three-dimensional environment. This suggests a profoundly ingrained cognitive awareness of the traits and actions of the 3D environment, which is a great aspect of human nature. Artists who can create detailed 3D reproductions from a single photograph take this skill to a new level. Contrarily, after decades of research and advancement, the challenge of 3D reconstruction from an unposed image, including the production of geometry and textures, remains an open, ill-posed topic in computer vision. Many 3D creation activities may be learned-based thanks to recent deep learning advancements. 

Although deep learning has made great progress in image identification and creation, it still needs to improve in the real world’s specific challenge of single-image 3D reconstruction. They place the blame for this significant gap between human and machine 3D reconstruction abilities on two main issues: (i) a lack of large-scale 3D datasets that prevents large-scale learning of 3D geometry, and (ii) the tradeoff between the level of detail and computational resources when working on 3D data. Utilizing 2D priors is one strategy to solve the issue. There is a vast amount of real 2D picture data online. To train cutting-edge image interpretation and generation algorithms like CLIP and Stable Diffusion, one of the most comprehensive datasets for text-image pairs is LAION. 

There has been a noticeable increase in strategies that employ 2D models as priors for creating 3D material due to the expanding generalization capabilities of 2D generation models. DreamFusion pioneered this 2D prior-based technology for text-to-3D creation. The method shows a remarkable ability to direct unique views and enhance a neural radiance field (NeRF) in a zero-shot situation. Building on DreamFusion, recent research has attempted to adapt these 2D priors for single-picture 3D reconstructions using RealFusion and NeuralLift. A different strategy is to use 3D priors. In earlier efforts, 3D priors like topological restrictions were used to aid in 3D creation. These hand-made 3D priors can create some 3D stuff but could be better 3D content. 

A 2D diffusion model was recently modified to become view-dependent, and this view-dependent diffusion was then used as a 3D prior in techniques like Zero-1-to-3 and 3Dim. According to their behavior analysis, both 2D and 3D priors have benefits and disadvantages. Compared to 3D priors, 2D priors have outstanding generalizations for 3D creation, as shown by the example of the dragon statue in Figure 1. Due to their limited 3D understanding, approaches solely depending on 2D priors ultimately suffer from losing 3D fidelity and consistency. Unrealistic geometry results from this, such as many faces (Janus issues), disparate sizes, uneven texture, etc. The example of the teddy bear in Figure 1 is a failure scenario. 

However, because of the small amount of 3D training data, depending heavily on 3D priors alone is not sufficient for in-the-wild reconstruction. As a result, as shown in Fig. 1, although 3D prior-based solutions successfully handle common items (such as the teddy bear example in the top row), they struggle with less frequent objects, producing overly simplistic and occasionally even flat 3D geometries (such as the dragon statue at the bottom left). In this study, researchers from King Abdullah University of Science and Technology (KAUST), Snap Inc., and the Visual Geometry Group at the University of Oxford promote the simultaneous use of both priors to direct novel views in image-to-3D creation rather than merely depending on a 2D or 3D prior alone. They can control the balance between exploration and exploitation in the resulting 3D geometry by varying a simple but effective tradeoff parameter between the strength of the 2D and 3D priors.

Figure 1 shows the trade-off between Magic123’s 2D and 3D priors. A teddy bear (a frequent item), two stacked donuts (a less common thing), and a dragon statue (an uncommon object) are the three scenarios for which they compare single-image reconstructions. As seen on the right, Magic123, which only has a 2D background, favours geometric exploration and creates 3D material with greater creativity but maybe with less consistency. With just 3D before, Magic123 (on the left) prioritises geometry exploitation, resulting in exact but maybe simpler geometry with less features. 

Prioritizing the 2D prior can improve creative 3D capability to compensate for the partial 3D information in each 2D image. However, this could lead to less accurate 3D geometry because of a lack of 3D understanding. Prioritizing the 3D prior, on the other hand, can result in more 3D-constrained solutions and more accurate 3D geometry, but at the expense of lower creativity and a diminished capacity to find workable solutions for difficult and unusual circumstances. They present Magic123, a cutting-edge image-to-3D pipeline that produces high-quality 3D outputs using a two-stage coarse-to-fine optimization approach that uses both 2D and 3D priors.

They refine a neural radiance field (NeRF) in the coarse stage. NeRF effectively learns an implicit volume representation for learning complicated geometry. However, NeRF uses a lot of memory, which results in low-resolution generated pictures being sent to the diffusion models, which lowers the output quality for the image-to-3D process. Instant-NGP, a more resource-efficient NeRF substitute, is limited to an image-to-3D pipeline resolution of 128 × 128 on a 16GB memory GPU. As a result, they add a second step and use Deep Marching Tetrahedra (DMTet), a memory-efficient and texture-decomposed SDF-Mesh hybrid representation, to enhance the quality of the 3D content.

With the help of this method, they can separate the NeRF’s geometry and texture refinements and boost resolution to 1K. They use a mix of 2D and 3D priors in both phases to direct innovative perspectives. They offer the following summary of their contributions: 

• They present Magic123, a revolutionary image-to-3D pipeline that creates high-quality, high-resolution 3D geometry and textures using a two-stage coarse-to-fine optimisation procedure. 

• They suggest simultaneously using 2D and 3D priors to create accurate 3D content from any given image. Priors’ strength parameter enables a tradeoff between exploring and using geometry. Users may experiment with this tradeoff parameter to create the required 3D content. 

• They can discover a balanced tradeoff between the 2D and 3D priors, resulting in 3D reconstructions that are relatively realistic and detailed. Magic123 produces state-of-the-art outcomes in 3D reconstruction from single unposed photos in real-world and synthetic contexts using the same set of parameters for all samples without further reconfiguring.

Check out the Paper and Project.


You Gotta Pump Those Dimensions: DreamEditor is an AI Model That Edits 3D Scenes Using Text-Prompts

The 3D computer vision domain was flooded with NeRFs in recent years. They emerged as a groundbreaking technique and enabled the reconstruction and synthesis of novel views of a scene. NeRFs capture and model the underlying geometry and appearance information from a collection of multi-view images.

By leveraging neural networks, NeRFs offer a data-driven approach that surpasses traditional methods. The neural networks in NeRFs learn to represent the complex relationship between scene geometry, lighting, and view-dependent appearance, allowing for highly detailed and realistic scene reconstructions. The key advantage of NeRFs lies in their ability to generate photo-realistic images from any desired viewpoint within a scene, even in regions that were not captured by the original set of images.

The success of NeRFs has opened up new possibilities in computer graphics, virtual reality, and augmented reality, enabling the creation of immersive and interactive virtual environments that closely resemble real-world scenes. Therefore, there is a serious interest in the domain to advance NeRFs even further.

Some drawbacks of NeRFs limit their applicability in real-world scenarios. For example, editing neural fields is a significant challenge due to the implicit encoding of the shape and texture information within high-dimensional neural network features. While some methods tried to tackle this using explored editing techniques, they often require extensive user input and struggle to achieve precise and high-quality results. 

The ability to edit NeRFs can open possibilities in real-world applications. However, so far, all the attempts were not good enough for them to solve the problems. Well, we have a new player in the game, and it’s named DreamEditor.

DreamEditor allows you to edit 3D NeRFs. Source: https://arxiv.org/pdf/2306.13455.pdf

DreamEditor is a user-friendly framework that allows intuitive and convenient modification of neural fields using text prompts. By representing the scene with a mesh-based neural field and employing a stepwise editing framework, DreamEditor enables a wide range of editing effects, including re-texturing, object replacement, and object insertion.

The mesh representation facilitates precise local editing by converting 2D editing masks into 3D editing regions while also disentangling geometry and texture to prevent excessive deformation. The stepwise framework combines pre-trained diffusion models with score distillation sampling, allowing efficient and accurate editing based on simple text prompts. 

Overview of DreamEditor. Source: https://arxiv.org/pdf/2306.13455.pdf

DreamEditor follows three key stages to facilitate intuitive and precise text-guided 3D scene editing. In the initial stage, the original neural radiance field is transformed into a mesh-based neural field. This mesh representation enables spatially-selective editing. After the conversion, it employs a customized Text-to-Image (T2I) model that is trained on the specific scene to capture the semantic relationships between keywords in the text prompts and the scene’s visual content. Finally, the modifications are applied to the target object within the neural field using the T2I diffusion model.

DreamEditor can accurately and progressively edit the 3D scene while maintaining a high level of fidelity and realism. This stepwise approach, from mesh-based representation to precise localization and controlled editing through diffusion models, allows DreamEditor to achieve highly realistic editing results while minimizing unnecessary modifications in irrelevant regions.

Check out the Paper.

Microsoft Researchers Propose a Novel Framework for LLM Calibration Using Pareto Optimal Self-Supervision without Using Labeled Training Data

Recent developments have seen a remarkable increase in the capability of large language models (LLMs), with generative pretrained transformer (GPT) models showing significant promise. The transition from GPT-3 to GPT-4, as well as the appearance of other LLMs like PaLM and LLaMA, demonstrated a considerable improvement in problem-solving and natural language understanding skills. Additionally, generative models are frequently used in a variety of sectors to generate data for different applications. When LLMs are used in applications that need a high level of accuracy and dependability, like the biological and healthcare areas, the problem of hallucination remains a significant barrier. 

Unfortunately, there are no systematic techniques available to accurately detect hallucinations or gauge the output’s level of confidence. Particularly after using reinforcement learning with human input, the intrinsic confidence score from the generative LLMs is sometimes unavailable or not effectively calibrated with regard to the intended aim. Heuristic techniques, such as sampling an ensemble of LLM answers, are costly to compute and are subject to bias from the LLM itself. There are two basic categories of methods for evaluating the degree of confidence in LLM replies. In the first, the LLM is prodded in a variety of ways to create many replies, which are then used to infer the answer’s dependability.

Self-consistency and chain-of-thought prompting are two examples. These techniques are less quantitative and susceptible to model-induced bias in the estimated confidence. There is no standardised way to measure this, but the prompting technique may have a significant impact on the quality of the outcomes. The second category of options turns to outside sources of data, such as hiring human reviewers to verify the answer or using huge amounts of labeled data to create assessment models. One of the primary obstacles to current supervised model training is the extensive manual annotation work that these techniques necessitate. In that regard, self-supervision offers a viable option since it can adaptably use data patterns and outside-the-box expertise. 

Researchers from Microsoft in this study provide a flexible framework that uses Pareto optimum learning to mix data from both the LLM response and supervision sources. They were motivated by earlier efforts in programmatic supervision and the wealth of Pareto optimization research. The following intuitions guide their strategy. In order to prevent bias from an LLM judging itself, external sources of supervision that are independent of the LLM are required. Second, think of the LLM errors as noisy perturbations on the gold labels. When a model is fitted with both LLM noise and independent external noise, implicit label smoothing is actually performed, which enhances calibration power. 

In that regard, Pareto optimum self-supervision provides a useful framework for integrating both qualities. Notably, the suggested method just needs unlabeled data, making it appropriate for fields where annotation is costly. Their unique approach to LLM calibration by Pareto optimum self-supervision is the paper’s key innovation. They suggest using the Pareto Optimum Learning assessed risk (POLAR) score to calculate the likelihood of LLM mistakes. They present experimental findings on four distinct NLP tasks and demonstrate that the suggested POLAR score is substantially linked with the LLM error rate assessed on gold labels. They show enhanced LLM performance for high-risk situations as determined by the POLAR score utilizing dynamic prompting strategies. Without utilizing any human-labeled training data, they demonstrate how their method can remove LLM mistakes and improve a GPT-4 baseline performance to exceed the most advanced supervised model.

Check out the Paper.


70% of Developers Embrace AI Today: Delving into the Rise of Large Lan …

Artificial Intelligence has limitless possibilities, which is truly evident from the new releases and developments it introduces everyone to. With the release of ChatGPT, the latest chatbot developed by OpenAI, the field of AI has taken over the world, as ChatGPT, built on the GPT transformer architecture, is constantly in the headlines. From deep learning, Natural Language Processing (NLP), and Natural Language Understanding (NLU) to Computer Vision, AI is propelling everyone into a future with endless innovations. Almost every industry is utilizing the potential of AI and revolutionizing itself. The excellent technological advancements, particularly in the areas of Large Language Models (LLMs), LangChain, and Vector Databases, are responsible for this remarkable development.

Large Language Models

The development of Large Language Models (LLMs) represents a huge step forward for Artificial Intelligence. These deep learning-based models demonstrate impressive accuracy and fluency while processing and comprehending natural language. LLMs are trained with the help of massive volumes of text data from a variety of sources, including books, journals, webpages, and other textual resources. They pick up on linguistic structures, patterns, and semantic linkages as they learn the language, which helps them understand the complexities of human communication.

The underlying architecture of LLMs typically involves a deep neural network with multiple layers. Based on the discovered patterns and connections found in the training data, this network analyses the input text and produces predictions. In order to reduce the discrepancy between the model’s expected and intended outputs, the model’s parameters are adjusted during the training phase. The LLM consumes the text data during training and tries to anticipate the following word or series of words depending on the context. 

Uses of LLMs

Answering questions – LLMs are skilled at answering questions, and in order to deliver precise and succinct responses to a question, they search through a vast corpus of text, such as books, papers, or websites.

Content generation – LLMs have proven useful in activities involving content generation. They are capable of producing grammatically sound and coherent articles, blog entries, and other written content.

Text Summarization – LLMs are excellent at text summarization, which entails retaining vital information while condensing lengthy texts into shorter, more digestible summaries.

Chatbots – LLMs are frequently utilized in the creation of chatbots and systems that use conversational AI. They make it possible for these systems to interact with users in natural language by comprehending their questions, responding appropriately, and keeping context throughout the interaction.

Language Translation – LLMs are able to accurately translate text between languages, facilitating successful communication despite language barriers.

Steps of training an LLM

The initial stage in training an LLM is to compile a sizable textual dataset that the model will utilize to discover linguistic patterns and structures.

Pre-processing is required once the dataset has been gathered to prepare it for training. In order to do this, the data must be cleaned by eliminating any unnecessary or redundant entries.

Selecting the appropriate model architecture is essential for training an LLM. Transformer-based architectures have shown to be very efficient at processing and producing natural language, including the GPT model. 

To train the LLM, the model’s parameters are adjusted using deep learning methods such as backpropagation to improve its accuracy. During training, the model processes the input data and produces predictions based on the patterns it has recognized.

After the initial training, the LLM is further fine-tuned on specific tasks or domains to improve its performance in those areas.

It is essential to evaluate the trained LLM to determine its efficacy, using metrics such as perplexity and accuracy; a small example of how perplexity is computed appears after these steps.

The LLM is put into use in a production environment for real-world applications once it has been trained and assessed.
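To make the perplexity metric concrete, the following toy calculation uses made-up token probabilities. Perplexity is the exponential of the average per-token negative log-likelihood, so lower values indicate a better model.

# Toy perplexity calculation from hypothetical per-token probabilities
import math

token_probs = [0.25, 0.10, 0.50, 0.05]   # model probabilities assigned to each target token
nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
perplexity = math.exp(nll)
print(round(perplexity, 2))   # lower is better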

Some famous Language Models

GPT – Generative Pre-trained Transformer is a prominent member of OpenAI’s GPT model family and serves as the underlying model for the well-known ChatGPT. It is a decoder-only, unidirectional autoregressive model: it generates text by predicting the next word based on the previously generated words. With 175 billion parameters in its GPT-3 version, the family is popularly used for content generation, question answering, and much more.

BERT – Bidirectional Encoder Representations from Transformers (BERT) is one of the first Transformer-based self-supervised language models. It is a potent model for comprehending and processing natural language with 340 million parameters.

PaLM – Google’s Pathways Language Model (PaLM), with 540 billion parameters, uses a decoder-only Transformer architecture with several modifications and shows strong performance in natural language processing tasks, code generation, question answering, and more.

LangChain

LLMs have inherent limits when it comes to producing precise answers or addressing tasks that call for in-depth domain knowledge, despite being adaptable and capable of executing a wide range of language tasks. LangChain serves as a bridge between LLMs and domain-specific data sources and tools. It combines the general language understanding of LLMs with specialized knowledge from domain experts, databases, and external APIs, delivering answers that are more precise, thorough, and contextually appropriate in specialized subjects.

Importance of LangChain

Consider asking an LLM for a list of the top-performing stores from the previous week. Without LangChain, the LLM would produce a plausible-looking SQL query with invented column names. With LangChain, programmers can give the LLM access to a range of tools and options: they can ask the LLM to build a workflow that splits the problem into parts, guide it through intermediate steps, and have it return a comprehensive, grounded answer.

For medical search, LLMs can give generic information about medical issues, but they might not have the in-depth understanding needed for specific diagnoses or therapy suggestions. LangChain, on the other hand, can inject medical knowledge from specialists or medical databases to improve the LLM’s responses, as sketched below.
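As an illustration of how domain knowledge can be combined with an LLM through LangChain, here is a minimal sketch. It assumes the 2023-era LangChain Python API (OpenAI, PromptTemplate, LLMChain) and an OpenAI API key in the environment; the prompt text and example data are invented for illustration.

# Minimal LangChain sketch: inject domain notes into the prompt alongside the user question
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

llm = OpenAI(temperature=0)

prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=(
        "Use the following domain notes to answer the question.\n"
        "Notes: {context}\n"
        "Question: {question}\n"
        "Answer:"
    ),
)

chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run(context="Store 12 led weekly sales; store 7 was second.",
                question="Which stores performed best last week?"))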

Vector Databases

The vector database is a new and distinctive type of database that is rapidly gaining acceptance in AI and machine learning. It differs from traditional relational databases, which were originally designed to store tabular data in rows and columns, and from more recent NoSQL databases such as MongoDB, which store data as JSON documents, because a vector database is designed specifically to store and retrieve vector embeddings.

A vector database is built around vector embeddings, a data encoding that carries semantic information and allows AI systems to interpret the data and retain it over the long term. The data is organized and stored by its geometric properties: each object is identified by its coordinates in an embedding space and the other qualities that define it. These databases make it possible to search for similar items and run advanced analyses over massive amounts of data, as the small example below illustrates.
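The following minimal sketch shows the kind of nearest-neighbor lookup a vector database performs under the hood, using exact cosine similarity over random embeddings. Production systems rely on approximate indexes (such as HNSW or IVF) to scale; the data here is purely illustrative.

# Minimal cosine-similarity search over hypothetical embeddings
import numpy as np

def cosine_top_k(query: np.ndarray, vectors: np.ndarray, k: int = 3):
    """Return the indices and scores of the k vectors most similar to the query."""
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = v @ q
    idx = np.argsort(-scores)[:k]
    return idx, scores[idx]

store = np.random.rand(5, 4)    # five stored documents, 4-dimensional embeddings (made up)
query = np.random.rand(4)
indices, scores = cosine_top_k(query, store)
print(indices, scores)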

Top Vector Databases

Pinecone – Pinecone is a cloud-based vector database that was created with the express purpose of storing, indexing, and rapidly searching large collections of high-dimensional vectors. Its capability to perform real-time indexing and searching is one of its primary characteristics. It can handle both sparse and dense vectors.

Chroma – Chroma is an open-source vector database that provides a quick and scalable way to store and retrieve embeddings. It is user-friendly and lightweight, offering a straightforward API and supporting a variety of backends, including well-liked choices like RocksDB and Faiss. 

Milvus – Milvus is a vector database system that is specifically designed to handle large amounts of complex data in an efficient manner. For a variety of applications, including similarity search, anomaly detection, and natural language processing, it is a strong and adaptable solution that offers high speed, performance, scalability, and specialized functionality.

Redis – Redis provides vector search capabilities, including indexing and search, distance calculation, high performance, data storage and analysis, and fast response times.

Vespa – Vespa supports geospatial search and real-time analytics, returns query results quickly, and offers high data availability and a number of ranking options.

In conclusion, this year is seeing unprecedented growth in the widespread use of Artificial Intelligence. This development is driven by technological advances in Large Language Models (LLMs), LangChain, and Vector Databases in particular. LLMs have transformed natural language processing; LangChain has given programmers a framework to build intelligent agents; and vector databases now make it possible to store, index, and retrieve high-dimensional data efficiently. Together, these innovations have paved the way for an AI-driven future.




The post 70% of Developers Embrace AI Today: Delving into the Rise of Large Language Models, LangChain, and Vector Databases in Current Tech Landscape appeared first on MarkTechPost.

Paper Summary: A Hybrid Approach With GAN and DP for Privacy Preservat …

Anonymization is a significant problem when handling Industrial Internet of Things (IIoT) data. Machine Learning (ML) applications require decrypted data to perform tasks efficiently, which means that third parties involved in data processing may have access to sensitive information. This poses a risk of privacy leaks and information leakage for the companies generating the data. Consequently, due to these concerns, companies are hesitant to share their IIoT data with third parties.

The state of the art in addressing the anonymization problem involves various approaches such as encryption, homomorphic encryption, cryptographic techniques, and distributed/federated learning. However, these methods have limitations in terms of computational costs, explainability of ML models, and vulnerabilities to cyber-attacks. Furthermore, existing privacy preservation techniques often result in a trade-off between privacy and accuracy, where achieving high privacy protection leads to a significant loss in ML model accuracy. These challenges hinder the effective and efficient preservation of IIoT data privacy.

In this context, a research team from Kadir Has University in Turkey proposed a novel method that combines Generative Adversarial Networks (GAN) and Differential Privacy (DP) to preserve sensitive data in IIoT operations. The hybrid approach aims to achieve privacy preservation with minimal accuracy loss and low additional computational costs. The GAN is used to generate synthetic copies of sensitive data, while DP introduces random noise and parameters to maintain privacy. The proposed method is tested using publicly available datasets and a realistic IIoT dataset collected from a confectionery production process.

The authors propose a hybrid privacy-preserving approach for IIoT environments. Their method involves two main components: GAN and DP.

GAN: They use GAN, specifically the Conditional Tabular GAN (CTGAN) approach, to create a synthetic copy (XG) of the original data set (XO). GAN learns the distribution of the data and generates synthetic data with similar statistics to the original.

DP: To enhance privacy, they add random noise from a Laplace distribution to sensitive features in the data. This technique preserves privacy while maintaining the overall probability distribution of the data.
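As an illustration of the differential-privacy step described above, the following minimal sketch adds Laplace noise to a hypothetical sensitive feature. The epsilon, sensitivity, and values are illustrative assumptions and are not taken from the paper.

# Laplace mechanism: add noise with scale sensitivity/epsilon to each value
import numpy as np

def laplace_mechanism(values: np.ndarray, sensitivity: float, epsilon: float) -> np.ndarray:
    scale = sensitivity / epsilon
    return values + np.random.laplace(loc=0.0, scale=scale, size=values.shape)

temperatures = np.array([71.2, 68.9, 73.4, 70.1])   # hypothetical sensitive feature
private = laplace_mechanism(temperatures, sensitivity=1.0, epsilon=0.5)
print(private)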

The proposed approach involves the following:

Creating a synthetic data set with GAN.

Replacing sensitive features.

Applying differential privacy by adding random noise.

The resulting data set is privacy-preserving and can be used for machine learning analysis without compromising sensitive information. The algorithm’s complexity depends on the number of sensitive features and the size of the data set. The authors emphasize that their method ensures overall privacy protection for IIoT data.

The evaluation in this paper involved experiments testing the proposed hybrid approach for privacy-preserving data synthesis and prediction. The experiments were run on four SCADA data sets: wind turbine, steam production, energy efficiency, and synchronous motors, using CTGAN synthetic data generation and differential privacy (DP). The evaluation criteria included accuracy, measured with the R-squared metric, and privacy preservation, measured with six privacy metrics. The results showed that the proposed hybrid approach achieved higher accuracy and better privacy preservation than either technique used alone (CTGAN or DP). The experiments also tested the method on data sets with hidden sensitive features and demonstrated its ability to protect such data.

In conclusion, the paper proposed a novel hybrid approach combining GAN and DP to address the anonymization problem in Industrial Internet of Things (IIoT) data. The proposed method involves creating a synthetic data set using GAN and applying DP by adding random noise to sensitive features. The evaluation results demonstrated that the proposed hybrid approach achieved higher accuracy and privacy preservation than other methods. This approach offers a promising solution for preserving sensitive data in IIoT environments while minimizing accuracy loss and computational costs.

Check Out the Paper.

The post Paper Summary: A Hybrid Approach With GAN and DP for Privacy Preservation of IIoT Data appeared first on MarkTechPost.

Retain original PDF formatting to view translated documents with Amazo …

Companies across various industries create, scan, and store large volumes of PDF documents. In many cases, the content is text-heavy and often written in a different language and requires translation. To address this, you need an automated solution to extract the contents within these PDFs and translate them quickly and cost-efficiently.
Many businesses have diverse global users and need to translate text to enable cross-lingual communication between them. This is a manual, slow, and expensive human effort. There’s a need to find a scalable, reliable, and cost-effective solution to translate documents while retaining the original document formatting.
For verticals such as healthcare, due to regulatory requirements, the translated documents require an additional human in the loop to verify the validity of the machine-translated document.
If the translated document doesn’t retain the original formatting and structure, it loses its context. This can make it difficult for a human reviewer to validate and make corrections.
In this post, we demonstrate how to create a new translated PDF from a scanned PDF while retaining the original document structure and formatting using a geometry-based approach with Amazon Textract, Amazon Translate, and Apache PDFBox.
Solution overview
The solution presented in this post uses the following components:

Amazon Textract – A fully managed machine learning (ML) service that automatically extracts printed text, handwriting, and other data from scanned documents that goes beyond simple optical character recognition (OCR) to identify, understand, and extract data from forms and tables. Amazon Textract can detect text in a variety of documents, including financial reports, medical records, and tax forms.
Amazon Translate – A neural machine translation service that delivers fast, high-quality, and affordable language translation. Amazon Translate provides high-quality on-demand and batch translation capabilities across more than 2,970 language pairs, while decreasing your translation costs.
PDF Translate – An open-source library written in Java and published on AWS Samples in GitHub. This library contains logic to generate translated PDF documents in your desired language with Amazon Textract and Amazon Translate. It also uses the open-source Java library Apache PDFBox to create PDF documents. There are similar PDF processing libraries available in other programming languages, for example Node PDFBox.

While performing machine translations, you may have situations where you wish to preserve specific sections of text from being translated, such as names or unique identifiers. Amazon Translate allows tag modifications, which allows you to specify what text should not be translated. Amazon Translate also supports formality customization, which allows you to customize the level of formality in your translation output.
For details on Amazon Textract limits, refer to Quotas in Amazon Textract.
The solution is restricted to the languages that can be extracted by Amazon Textract, which currently supports English, Spanish, Italian, Portuguese, French, and German. These languages are also supported by Amazon Translate. For the full list of languages supported by Amazon Translate, refer to Supported languages and language codes.
We use the following PDF to demonstrate translating the text from English to Spanish. The solution also supports generating the translated document without any formatting. The position of the translated text is maintained. The source and translated PDF documents can also be found in the AWS Samples GitHub repo.
In the following sections, we demonstrate how to run the translation code on a local machine and look at the translation code in more detail.

Prerequisites
Before you get started, set up your AWS account and the AWS Command Line Interface (AWS CLI). Access to AWS services such as Amazon Textract and Amazon Translate requires appropriate IAM permissions; we recommend using least-privilege permissions. To learn more about IAM permissions, see Policies and permissions in IAM, How Amazon Textract works with IAM, and How Amazon Translate works with IAM.
Run the translation code on a local machine
This solution focuses on the standalone Java code to extract and translate a PDF document. This is for easier testing and customizations to get the best-rendered translated PDF document. The code can then be integrated into an automated solution to deploy and run in AWS. See Translating PDF documents using Amazon Translate and Amazon Textract for a sample architecture that uses Amazon Simple Storage Service (Amazon S3) to store the documents and AWS Lambda to run the code.
To run the code on a local machine, complete the following steps. The code examples are available on the GitHub repo.

Clone the GitHub repo:

git clone https://github.com/aws-samples/amazon-translate-pdf

Run the following command:

cd amazon-translate-pdf

Run the following command to translate from English to Spanish:

java -jar target/translate-pdf-1.0.jar --source en --translated es

Two translated PDF documents are created in the documents folder, with and without the original formatting (SampleOutput-es.pdf and SampleOutput-min-es.pdf).
Code to generate the translated PDF
The following code snippets show how to take a PDF document and generate a corresponding translated PDF document. It extracts the text using Amazon Textract and creates the translated PDF by adding the translated text as a layer to the image. It builds on the solution shown in the post Generating searchable PDFs from scanned documents automatically with Amazon Textract.
The code first gets each line of text with Amazon Textract. Amazon Translate is used to get translated text and save the geometry of the translated text.

// Imports required by this snippet (AWS SDK for Java v2)
import java.util.ArrayList;
import java.util.List;

import software.amazon.awssdk.core.SdkBytes;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.textract.TextractClient;
import software.amazon.awssdk.services.textract.model.Block;
import software.amazon.awssdk.services.textract.model.BlockType;
import software.amazon.awssdk.services.textract.model.BoundingBox;
import software.amazon.awssdk.services.textract.model.DetectDocumentTextRequest;
import software.amazon.awssdk.services.textract.model.DetectDocumentTextResponse;
import software.amazon.awssdk.services.textract.model.Document;
import software.amazon.awssdk.services.translate.TranslateClient;
import software.amazon.awssdk.services.translate.model.TranslateTextRequest;
import software.amazon.awssdk.services.translate.model.TranslateTextResponse;

Region region = Region.US_EAST_1;
TextractClient textractClient = TextractClient.builder()
        .region(region)
        .build();

// Get the input Document object as bytes
Document pdfDoc = Document.builder()
        .bytes(SdkBytes.fromByteBuffer(imageBytes))
        .build();

TranslateClient translateClient = TranslateClient.builder()
        .region(region)
        .build();

DetectDocumentTextRequest detectDocumentTextRequest = DetectDocumentTextRequest.builder()
        .document(pdfDoc)
        .build();

// Invoke the Detect operation
DetectDocumentTextResponse textResponse = textractClient.detectDocumentText(detectDocumentTextRequest);

List<Block> blocks = textResponse.blocks();
List<TextLine> lines = new ArrayList<>();
BoundingBox boundingBox;

// Translate each detected LINE block and keep its geometry for later rendering
for (Block block : blocks) {
    if ((block.blockType()).equals(BlockType.LINE)) {
        String source = block.text();

        TranslateTextRequest requestTranslate = TranslateTextRequest.builder()
                .sourceLanguageCode(sourceLanguage)
                .targetLanguageCode(destinationLanguage)
                .text(source)
                .build();

        TranslateTextResponse resultTranslate = translateClient.translateText(requestTranslate);

        boundingBox = block.geometry().boundingBox();
        lines.add(new TextLine(boundingBox.left(),
                boundingBox.top(),
                boundingBox.width(),
                boundingBox.height(),
                resultTranslate.translatedText(),
                source));
    }
}
return lines;

The font size is calculated as follows and can easily be configured:

int fontSize = 20;
float textWidth = font.getStringWidth(text) / 1000 * fontSize;
float textHeight = font.getFontDescriptor().getFontBoundingBox().getHeight() / 1000 * fontSize;
 
if (textWidth > bbWidth) {
    while (textWidth > bbWidth) {
        fontSize -= 1;
        textWidth = font.getStringWidth(text) / 1000 * fontSize;
        textHeight = font.getFontDescriptor().getFontBoundingBox().getHeight() / 1000 * fontSize;
     }
} else if (textWidth < bbWidth) {
     while (textWidth < bbWidth) {
         fontSize += 1;
         textWidth = font.getStringWidth(text) / 1000 * fontSize;
         textHeight = font.getFontDescriptor().getFontBoundingBox().getHeight() / 1000 * fontSize;
      }
}

The translated PDF is created from the saved geometry and translated text. Changes to the color of the translated text can easily be configured.

float width = image.getWidth();
float height = image.getHeight();
 
PDRectangle box = new PDRectangle(width, height);
PDPage page = new PDPage(box);
page.setMediaBox(box);
this.document.addPage(page); //org.apache.pdfbox.pdmodel.PDDocument
 
PDImageXObject pdImage;
 
if(imageType == ImageType.JPEG){
    pdImage = JPEGFactory.createFromImage(this.document, image);
} else {
    pdImage = LosslessFactory.createFromImage(this.document, image);
}
 
PDPageContentStream contentStream = new PDPageContentStream(document, page, PDPageContentStream.AppendMode.OVERWRITE, false);
 
contentStream.drawImage(pdImage, 0, 0);
contentStream.setRenderingMode(RenderingMode.FILL);
 
for (TextLine cline : lines){
    String clinetext = cline.text;
    String clinetextOriginal = cline.originalText;
                      
    FontInfo fontInfo = calculateFontSize(clinetextOriginal, (float) cline.width * width, (float) cline.height * height, font);
    //config to include original document structure – overlay with original
    contentStream.setNonStrokingColor(Color.WHITE);
    contentStream.addRect((float) cline.left * width, (float) (height - height * cline.top - fontInfo.textHeight), (float) cline.width * width, (float) cline.height * height);
    contentStream.fill();
 
    fontInfo = calculateFontSize(clinetext, (float) cline.width * width, (float) cline.height * height, font);
    //config to include original document structure – overlay with translated
    contentStream.setNonStrokingColor(Color.WHITE);
    contentStream.addRect((float) cline.left * width, (float) (height - height * cline.top - fontInfo.textHeight), (float) cline.width * width, (float) cline.height * height);
    contentStream.fill();
    //change the output text color here
    fontInfo = calculateFontSize(clinetext.length() <= clinetextOriginal.length() ? clinetextOriginal : clinetext, (float) cline.width * width, (float) cline.height * height, font);
    contentStream.setNonStrokingColor(Color.BLACK);
    contentStream.beginText();
    contentStream.setFont(font, fontInfo.fontSize);
    contentStream.newLineAtOffset((float) cline.left * width, (float) (height - height * cline.top - fontInfo.textHeight));
    contentStream.showText(clinetext);
    contentStream.endText();
}
contentStream.close();

The following image shows the document translated into Spanish with the original formatting (SampleOutput-es.pdf).

The following image shows the translated PDF in Spanish without any formatting (SampleOutput-min-es.pdf).

Processing time
The employment application PDF took about 10 seconds to extract, process, and render as a translated PDF. A text-heavy document such as the Declaration of Independence PDF took less than a minute to process.
Cost
With Amazon Textract, you pay as you go based on the number of pages and images processed. With Amazon Translate, you pay as you go based on the number of text characters that are processed. Refer to Amazon Textract pricing and Amazon Translate pricing for actual costs.
Conclusion
This post showed how to use Amazon Textract and Amazon Translate to generate translated PDF documents while retaining the original document structure. You can optionally postprocess the Amazon Textract results to improve the quality of the translation; for example, extracted words can be passed through spellcheck algorithms such as SymSpell for data validation, or clustering algorithms can be used to preserve reading order. You can also use Amazon Augmented AI (Amazon A2I) to build human review workflows in which your own private workforce reviews the original and translated PDF documents to provide more accuracy and context. See Designing human review workflows with Amazon Translate and Amazon Augmented AI and Building a multi-lingual document translation workflow with domain-specific and language-specific customization to get started.

About the Authors
Anubha Singhal is a Senior Cloud Architect at Amazon Web Services in the AWS Professional Services organization.
Sean Lawrence was formerly a Front End Engineer at AWS. He specialized in front end development in the AWS Professional Services organization and the Amazon Privacy team.

Auto-labeling module for deep learning-based Advanced Driver Assistanc …

In computer vision (CV), adding tags to identify objects of interest or bounding boxes to locate the objects is called labeling. It’s one of the prerequisite tasks to prepare training data to train a deep learning model. Hundreds of thousands of work hours are spent generating high-quality labels from images and videos for various CV use cases. You can use Amazon SageMaker Data Labeling in two ways to create these labels:

Amazon SageMaker Ground Truth Plus – This service provides an expert workforce that is trained on ML tasks and can help meet your data security, privacy, and compliance requirements. You upload your data, and the Ground Truth Plus team creates and manages data labeling workflows and the workforce on your behalf.
Amazon SageMaker Ground Truth – Alternatively, you can manage your own data labeling workflows and workforce to label data.

Specifically, for deep learning-based autonomous vehicle (AV) and Advanced Driver Assistance Systems (ADAS), there is a need to label complex multi-modal data from scratch, including synchronized LiDAR, RADAR, and multi-camera streams. For example, the following figure shows a 3D bounding box around a car in the Point Cloud view for LiDAR data, aligned orthogonal LiDAR views on the side, and seven different camera streams with projected labels of the bounding box.

AV/ADAS teams need to label several thousand frames from scratch, and rely on techniques like label consolidation, automatic calibration, frame selection, frame sequence interpolation, and active learning to get a single labeled dataset. Ground Truth supports these features. For a full list of features, refer to Amazon SageMaker Data Labeling Features. However, it can be challenging, expensive, and time-consuming to label tens of thousands of miles of recorded video and LiDAR data for companies that are in the business of creating AV/ADAS systems. One technique used to solve this problem today is auto-labeling, which is highlighted in the following diagram for a modular functions design for ADAS on AWS.

In this post, we demonstrate how to use SageMaker features such as Amazon SageMaker JumpStart models and asynchronous inference capabilities along with Ground Truth’s functionality to perform auto-labeling.
Auto-labeling overview
Auto-labeling (sometimes referred to as pre-labeling) occurs before or alongside manual labeling tasks. In this module, the best-so-far model trained for a particular task (for example, pedestrian detection or lane segmentation) is used to generate high-quality labels. Manual labelers simply verify or adjust the automatically created labels from the resulting dataset. This is easier, faster and cheaper than labeling these large datasets from scratch. Downstream modules such as the training or validation modules can use these labels as is.
Active learning is another concept that is closely related to auto-labeling. It’s a machine learning (ML) technique that identifies data that should be labeled by your workers. Ground Truth’s automated data labeling functionality is an example of active learning. When Ground Truth starts an automated data labeling job, it selects a random sample of input data objects and sends them to human workers. When the labeled data is returned, it’s used to create a training set and a validation set. Ground Truth uses these datasets to train and validate the model used for auto-labeling. Ground Truth then runs a batch transform job to generate labels for unlabeled data, along with confidence scores for new data. Labeled data with low confidence scores is sent to human labelers. This process of training, validating, and batch transform is repeated until the full dataset is labeled.
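The following toy sketch, in plain Python rather than the Ground Truth API, illustrates the shape of that loop: label a small batch manually, train, auto-label high-confidence items, and send the rest back to human workers. The stand-in model, confidence scores, and threshold are all invented for illustration.

# Toy automated-labeling loop (schematic only, not the Ground Truth API)
import random

def human_label(item):
    # stand-in for a human worker supplying the ground-truth label
    return item["true_label"]

def train(labeled):
    # stand-in "model": remember the most common label seen so far
    labels = [label for _, label in labeled]
    return max(set(labels), key=labels.count)

def predict(model, item):
    # stand-in prediction with a made-up confidence score
    return model, random.random()

data = [{"id": i, "true_label": "car"} for i in range(20)]
labeled, unlabeled = [], list(data)

while unlabeled:
    # a small batch is labeled manually to (re)train the model
    batch, unlabeled = unlabeled[:5], unlabeled[5:]
    labeled += [(item, human_label(item)) for item in batch]
    model = train(labeled)

    # auto-label high-confidence items; low-confidence items go back to the human queue
    still_unlabeled = []
    for item in unlabeled:
        label, confidence = predict(model, item)
        if confidence >= 0.8:
            labeled.append((item, label))
        else:
            still_unlabeled.append(item)
    unlabeled = still_unlabeled

print(len(labeled), "items labeled")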
In contrast, auto-labeling assumes that a high-quality, pre-trained model exists (either privately within the company, or publicly in a hub). This model is used to generate labels that can be trusted and used for downstream tasks such as label verification tasks, training, or simulation. This pre-trained model in the case of AV/ADAS systems is deployed onto the car at the edge, and can be used within large-scale, batch inference jobs on the cloud to generate high-quality labels.
JumpStart provides pretrained, open-source models for a wide range of problem types to help you get started with machine learning. You can use JumpStart to share models within your organization. Let’s get started!
Solution overview
For this post, we outline the major steps without going over every cell in our example notebook. To follow along or try it on your own, you can run the Jupyter notebook in Amazon SageMaker Studio.
The following diagram provides a solution overview.

Set up the role and session
For this example, we used a Data Science 3.0 kernel in Studio on an ml.m5.large instance type. First, we do some basic imports and set up the role and session for use later in the notebook:

import sagemaker, boto3, json
from sagemaker import get_execution_role
from utils import *

Create your model using SageMaker
In this step, we create a model for the auto-labeling task. You can choose from three options to create a model:

Create a model from JumpStart – With JumpStart, we can perform inference on the pre-trained model, even without fine-tuning it first on a new dataset
Use a model shared via JumpStart with your team or organization – You can use this option if you want to use a model developed by one of the teams within your organization
Use an existing endpoint – You can use this option if you have an existing model already deployed in your account

To use the first option, we select a model from JumpStart (here, we use mxnet-is-mask-rcnn-fpn-resnet101-v1d-coco). A list of models is available in the models_manifest.json file provided by JumpStart.
We use this JumpStart model that is publicly available and trained on the instance segmentation task, but you are free to use a private model as well. In the following code, we use the image_uris, model_uris, and script_uris to retrieve the right parameter values to use this MXNet model in the sagemaker.model.Model API to create the model:

from sagemaker import image_uris, model_uris, script_uris, hyperparameters
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base

# Model selected from JumpStart (see models_manifest.json); "*" is assumed here to pick the latest version
model_id, model_version = "mxnet-is-mask-rcnn-fpn-resnet101-v1d-coco", "*"
# aws_role is the SageMaker execution role, e.g. from sagemaker.get_execution_role()

endpoint_name = name_from_base(f"jumpstart-example-infer-{model_id}")
inference_instance_type = "ml.p3.2xlarge"

# Retrieve the inference docker container uri
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,  # automatically inferred from model_id
    image_scope="inference",
    model_id=model_id,
    model_version=model_version,
    instance_type=inference_instance_type,
)

# Retrieve the inference script uri. This includes scripts for model loading, inference handling etc.
deploy_source_uri = script_uris.retrieve(
    model_id=model_id, model_version=model_version, script_scope="inference"
)

# Retrieve the base model uri
base_model_uri = model_uris.retrieve(
    model_id=model_id, model_version=model_version, model_scope="inference"
)

# Create the SageMaker model instance
model = Model(
    image_uri=deploy_image_uri,
    source_dir=deploy_source_uri,
    model_data=base_model_uri,
    entry_point="inference.py",  # entry point file in source_dir and present in deploy_source_uri
    role=aws_role,
    predictor_cls=Predictor,
    name=endpoint_name,
)

Set up asynchronous inference and scaling
Here we set up an asynchronous inference config before deploying the model. We chose asynchronous inference because it can handle large payload sizes and can meet near-real-time latency requirements. In addition, you can configure the endpoint to auto scale and apply a scaling policy to set the instance count to zero when there are no requests to process. In the following code, we set max_concurrent_invocations_per_instance to 4. We also set up auto scaling such that the endpoint scales up when needed and scales down to zero after the auto-labeling job is complete.

from sagemaker.async_inference.async_inference_config import AsyncInferenceConfig

async_config = AsyncInferenceConfig(
    output_path=f"s3://{sess.default_bucket()}/asyncinference/output",
    max_concurrent_invocations_per_instance=4)
.
.
.
# client is the Application Auto Scaling client, e.g. boto3.client("application-autoscaling")
response = client.put_scaling_policy(
    PolicyName="Invocations-ScalingPolicy",
    ServiceNamespace="sagemaker",  # The namespace of the AWS service that provides the resource
    ResourceId=resource_id,  # Endpoint name
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",  # SageMaker supports only instance count
    PolicyType="TargetTrackingScaling",  # 'StepScaling' | 'TargetTrackingScaling'
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 5.0,  # Target value for the customized metric (ApproximateBacklogSizePerInstance)
        "CustomizedMetricSpecification": {
            "MetricName": "ApproximateBacklogSizePerInstance",
            "Namespace": "AWS/SageMaker",
            "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
            "Statistic": "Average",
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 300,
    },
)

Download data and perform inference
We use the Ford Multi-AV Seasonal dataset from the AWS Open Data Catalog.
First, we download and prepare the data for inference. We have provided preprocessing steps in the notebook to process the dataset; you can change them to process your own dataset. Then, using the SageMaker API, we can start the asynchronous inference job as follows:

import glob
import time

max_images = 10
input_locations, output_locations = [], []

for i, file in enumerate(glob.glob("data/processedimages/*.png")):
    input_1_s3_location = upload_image(sess, file, sess.default_bucket())
    input_locations.append(input_1_s3_location)
    async_response = base_model_predictor.predict_async(input_path=input_1_s3_location)
    output_locations.append(async_response.output_path)
    if i > max_images:
        break

This may take up to 30 minutes or more depending on how much data you have uploaded for asynchronous inference. You can visualize one of these inferences as follows:

plot_response('data/single.out')

Convert the asynchronous inference output to a Ground Truth input manifest
In this step, we create an input manifest for a bounding box verification job on Ground Truth. We upload the Ground Truth UI template and label categories file, and create the verification job. The notebook linked to this post uses a private workforce to perform the labeling; you can change this if you’re using other types of workforces. For more details, refer to the full code in the notebook.
Verify labels from the auto-labeling process in Ground Truth
In this step, we complete the verification by accessing the labeling portal. For more details, refer to the Amazon SageMaker Ground Truth documentation.
When you access the portal as a workforce member, you will be able to see the bounding boxes created by the JumpStart model and make adjustments as required.

You can use this template to repeat auto-labeling with many task-specific models, potentially merge labels, and use the resulting labeled dataset in downstream tasks.
Clean up
In this step, we clean up by deleting the endpoint and the model created in previous steps:

# Delete the SageMaker model and endpoint
base_model_predictor.delete_model()
base_model_predictor.delete_endpoint()

Conclusion
In this post, we walked through an auto-labeling process involving JumpStart and asynchronous inference. We used the results of the auto-labeling process to convert and visualize labeled data on a real-world dataset. You can use the solution to perform auto-labeling with many task-specific models, potentially merge labels, and use the resulting labeled dataset in downstream tasks. You can also explore using tools like the Segment Anything Model for generating segment masks as part of the auto-labeling process. In future posts in this series, we will cover the perception module and segmentation. For more information on JumpStart and asynchronous inference, refer to SageMaker JumpStart and Asynchronous inference, respectively. We encourage you to reuse this content for use cases beyond AV/ADAS, and reach out to AWS for any help.

About the authors
Gopi Krishnamurthy is a Senior AI/ML Solutions Architect at Amazon Web Services based in New York City. He works with large Automotive customers as their trusted advisor to transform their Machine Learning workloads and migrate to the cloud. His core interests include deep learning and serverless technologies. Outside of work, he likes to spend time with his family and explore a wide range of music.
Shreyas Subramanian is a Principal AI/ML specialist Solutions Architect, and helps customers by using Machine Learning to solve their business challenges using the AWS platform. Shreyas has a background in large scale optimization and Machine Learning, and in use of Machine Learning and Reinforcement Learning for accelerating optimization tasks.

Researchers From Binghamton University Introduce A Privacy-Enhancing A …

Anonymization is a critical problem in the context of face recognition and identification algorithms. With the increasing productization of these technologies, ethical concerns have emerged regarding the privacy and security of individuals. The ability to recognize and identify individuals through their facial features raises questions about consent, control over personal data, and potential misuse. Current tagging systems in social networks do not adequately address the problem of unwanted or unapproved faces appearing in photos.

Controversies and ethical concerns have marred the state of the art in face recognition and identification algorithms. Previous systems lacked proper generalization and accuracy guarantees, leading to unintended consequences. Counter-manipulation techniques such as blurring and masking have been employed to defeat face recognition, but they alter the image content and are easily detectable. Adversarial generation and image-obfuscation methods have also been developed, but face recognition algorithms keep improving to withstand such attacks.

 In this context, a new article recently published by a research team from Binghamton University proposes a privacy-enhancing system that leverages deepfakes to mislead face recognition systems without breaking image continuity. They introduce the concept of “My Face My Choice” (MFMC), where individuals can control which photos they appear in, replacing their faces with dissimilar deepfakes for unauthorized viewers.

The proposed method, MFMC, aims to create deepfake versions of photos with multiple people based on complex access rights granted by individuals in the picture. The system operates in a social photo-sharing network, where access rights are defined per face rather than per image. When an image is uploaded, friends of the uploader can be tagged, while the remaining faces are replaced with deepfakes. These deepfakes are carefully selected based on various metrics, ensuring they are quantitatively dissimilar to the original faces but maintain contextual and visual continuity. The authors conduct extensive evaluations using different datasets, deepfake generators, and face recognition approaches to verify the effectiveness and quality of the proposed system. MFMC represents a significant advancement in utilizing face embeddings to create useful deepfakes as a defense against face recognition algorithms.

The article outlines the requirements for a deepfake generator that can transfer the identity of a synthetic target face onto an original source face while preserving facial and environmental attributes. The authors integrate multiple deepfake generators, such as Nirkin et al., FTGAN, FSGAN, and SimSwap, into their framework. They also introduce three access models: Disclosure by Proxy, Disclosure by Explicit Authorization, and Access Rule Based Disclosure, to balance social media participation and individual privacy.

The evaluation of the MFMC system includes assessing the reduction in face recognition accuracy using seven state-of-the-art face recognition systems and comparing the results with existing privacy-preserving face alteration methods, such as CIAGAN and Deep Privacy. The evaluation demonstrates the effectiveness of MFMC in reducing face recognition accuracy. It highlights its superiority over other methods in system design, production systemization, and evaluation against face recognition systems.

In conclusion, the article presents the MFMC system as a novel approach to address the privacy concerns associated with face recognition and identification algorithms. By leveraging deepfakes and access rights granted by individuals, MFMC allows users to control which photos they appear in, replacing their faces with dissimilar deepfakes for unauthorized viewers. The evaluation of MFMC demonstrates its effectiveness in reducing face recognition accuracy, surpassing existing privacy-preserving face alteration methods. This research represents a significant step towards enhancing privacy in the era of face recognition technology and opens up possibilities for further advancements in this field.

Check Out the Paper.

The post Researchers From Binghamton University Introduce A Privacy-Enhancing Anonymization System (My Face, My Choice) For Everyone To Have Control Over Their Faces In Social Photo Sharing Networks appeared first on MarkTechPost.

What is Field Programmable Gate Array (FPGA): FPGA vs. GPU for Artific …

A Field Programmable Gate Array (FPGA) is an integrated circuit that can be configured and customized after manufacturing. These chips are called “field-programmable” because of this ability. They consist of programmable logic blocks that can be set up to carry out a wide range of functions or act as logic gates, providing the user with great flexibility in how the circuit operates.

Field-programmable gate arrays (FPGAs) are semiconductor devices made up of configurable logic blocks (CLBs) and programmable interconnects. These blocks can perform simple to complex operations and can include memory components such as flip-flops or memory blocks. 

FPGAs are similar to programmable read-only memory chips but can accommodate more gates and are reprogrammable, unlike ASICs, which are designed for specific tasks. They can be used to customize microprocessors for particular uses and are popular in various industries, including wireless communications, data centers, automotive, medical, and aerospace. The reprogrammable nature of FPGAs allows for flexibility and design updates as needed.

(Image sources: https://allaboutfpga.com/fpga-architecture/ and https://blog.samtec.com/post/new-intel-fpga-platform-features-samtec-interconnect/)

Applications of FPGAs

FPGAs are utilized in various industries and have diverse areas of implementation. Some of their primary areas of use include the following.

Energy Industry

FPGAs can play an important role in smart power grid technology by improving performance and scalability while keeping power consumption low. This is particularly useful in transmission and distribution (T&D) substations where efficient power networks are needed for optimal operation.

Improved automotive experiences

Microsemi FPGAs allow original equipment manufacturers (OEMs) and suppliers to create new safety applications for vehicles, such as cruise control, blind spot warning, and collision avoidance. These FPGAs also provide cybersecurity features like information assurance, anti-tampering, hardware security, and dependability features like error-corrected memory and low static power.

Aerospace and defense

Industrial manufacturing companies provide rad-hard and rad-tolerant FPGAs, which are often space-grade, to meet the performance, reliability, and lifespan requirements of harsh environments. These FPGAs offer greater flexibility than traditional ASIC implementations and are particularly suitable for processing-intensive space systems.

Computer Vision systems

In today’s world, computer vision systems are prevalent in gadgets such as video surveillance cameras, robots, and other devices. An FPGA-based system is often needed to enable these devices to interact with people appropriately based on their position, surroundings, and facial recognition capabilities.

Data centers

The Internet of Things and big data are resulting in a tremendous increase in the amount of data being acquired and processed. The use of deep learning techniques for parallel computation drives the need for low-latency, flexible, and secure computational capacity. Due to rising space costs, adding more servers cannot meet this demand. FPGAs are gaining acceptance in data centers due to their ability to accelerate processing, flexibility in design, and hardware-based security against software vulnerabilities.

Real-time systems

FPGAs are used in real-time systems where response time is critical, as conventional CPUs have unpredictable response times, making it difficult to predict when a trigger will fire accurately. 

Designing ASICs

Creating the circuit’s architecture is the first step, and then a prototype is constructed and tested using an FPGA, allowing errors to be corrected. Once the prototype performs as expected, an ASIC project is developed. This approach saves time, as creating an integrated circuit can be laborious and complex.

FPGA-based Acceleration as a Service

FPGA-based systems can perform complex tasks and process data more quickly than their virtual counterparts. While not everyone may be able to reprogram an FPGA for a specific task, cloud services are making FPGA-based data processing more accessible to customers. Some cloud providers are even offering a new service called Acceleration as a Service (AaaS), which allows customers to access FPGA accelerators.

With AaaS, one can utilize FPGAs to speed up various types of workloads, such as:

Training machine learning models

Handling big data

Analyzing video streaming

Conducting financial computations

Enhancing databases

Some FPGA manufacturers are already working on creating cloud-based FPGAs for AI workload acceleration and other applications requiring high computing power. For example, Intel is powering the Alibaba Cloud AaaS service known as f1 instances. The Acceleration Stack for Intel Xeon CPU with FPGAs, also available to Alibaba Cloud users, offers two popular software development flows, RTL and OpenCL.

Another major company in the industry, Microsoft, is also competing to build an efficient AI platform. Their project, Brainwave, offers FPGA technology to accelerate deep neural network inferencing. Like Alibaba Cloud, they also use Intel’s Stratix 10 FPGA.

FPGA vs. GPU for Deep Learning/Artificial Intelligence

GPUs excel in parallel processing by performing many arithmetic operations simultaneously, providing significant acceleration in situations where the same workload must be performed in quick succession. However, running AI on GPUs has its limitations. GPUs do not provide the same level of performance as ASICs, which are chips specifically designed for a particular deep-learning workload.

On the other hand, FPGAs offer hardware customization with integrated AI capabilities and can be programmed to mimic the behavior of a GPU or an ASIC. Their reprogrammable and reconfigurable nature makes them suitable for the rapidly changing AI landscape, allowing for quick testing of algorithms and faster time to market. FPGAs offer numerous advantages for deep learning applications and other AI workloads:

Low latency: An FPGA’s large on-chip memory bandwidth lets it process data close to the compute fabric, avoiding round trips to external memory and keeping latency low and predictable compared with a standard GPU.

Excellent value and cost-effectiveness: FPGAs can be reprogrammed for different functionalities, making them one of the most cost-effective hardware options. Designers can save cost and board space by integrating additional capabilities onto the same chip.

Low power consumption: With FPGAs, the hardware can be fine-tuned to the application, helping to meet power-efficiency requirements.

Parallelism: A portion of an FPGA can be dedicated to one function rather than the entire chip, which allows it to host multiple functions in parallel.

Integrating AI into workloads: Using FPGAs, AI capabilities like deep packet inspection or financial fraud detection can be added to existing workloads.

Providing acceleration for high-performance computing (HPC) clusters: FPGAs can facilitate the convergence of AI and HPC by serving as programmable accelerators for inference.

Disadvantages of using FPGAs

Programming: While FPGAs offer a high degree of flexibility, they can be difficult to reprogram, and experienced FPGA programmers are in short supply.

Implementation complexity: While the potential for using FPGAs to accelerate deep learning is promising, only a few companies have attempted to implement it. For many AI solution developers, the more traditional combination of GPUs and CPUs is a more manageable option.

Cost: The difficulty of reprogramming the circuit and the shortage of experienced programmers make using an FPGA to accelerate AI-based applications a costly solution. The expense of repeatedly reprogramming a circuit can be quite high for small-scale projects.

Lack of libraries: A limited number of ML libraries support FPGAs out of the box.

The post What is Field Programmable Gate Array (FPGA): FPGA vs. GPU for Artificial Intelligence (AI) appeared first on MarkTechPost.

Google AI Introduces MediaPipe Diffusion Plugins That Enable Controll …

Diffusion models have been widely used with remarkable success in text-to-image generation in recent years, leading to significant improvements in image quality, inference performance, and the scope of creative possibilities. However, effective control of the generation process remains a challenge, especially for conditions that are hard to describe in words.

MediaPipe diffusion plugins, developed by Google researchers, make it possible to run controllable text-to-image generation on-device. The work extends Google’s earlier work on GPU inference for large on-device generative models and presents low-cost solutions for controllable text-to-image generation that can be plugged into existing diffusion models and their Low-Rank Adaptation (LoRA) variants.

Diffusion models generate images through iterative denoising. Each iteration starts from an image contaminated by noise and moves toward an image of the target concept; a schematic sketch of this loop is shown below. Language understanding through text prompts has significantly improved image generation: the text embedding is linked to the model through cross-attention layers. However, some details, such as the position and pose of an object, are harder to convey through text prompts. To address this, researchers inject control information from a condition image into the diffusion process using additional models.
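To make the iterative denoising loop concrete, here is a schematic, self-contained sketch in Python. The dummy_denoiser function is a made-up stand-in for a trained noise-prediction network, and the update rule is simplified; it illustrates the structure of the loop, not any particular sampler.

# Schematic denoising loop (toy stand-in model; real systems use a trained,
# text-conditioned U-Net and a proper noise schedule)
import numpy as np

def dummy_denoiser(x, t, text_embedding):
    """Stand-in for a trained noise-prediction network."""
    return 0.1 * x  # pretend the predicted noise is proportional to the input

def sample(shape=(8, 8), steps=50, text_embedding=None):
    x = np.random.randn(*shape)               # start from pure Gaussian noise
    for t in reversed(range(steps)):
        predicted_noise = dummy_denoiser(x, t, text_embedding)
        x = x - predicted_noise                # remove a little noise each step
    return x

image = sample()
print(image.shape)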

The Plug-and-Play, ControlNet, and T2I Adapter methods are frequently used for controlled text-to-image generation. To encode the state from an input image, Plug-and-Play employs a copy of the diffusion model (860M parameters for Stable Diffusion 1.5) and a widely used denoising diffusion implicit model (DDIM) inversion approach that inverts the generation process from an input image to derive an initial noise input. The spatial features with self-attention are extracted from the copied diffusion model and injected into the text-to-image diffusion. ControlNet constructs a trainable duplicate of the encoder of a diffusion model and connects it via a convolution layer with zero-initialized parameters to encode conditioning information that is then passed on to the decoder layers. Unfortunately, this leads to a significant increase in size: about 450M parameters for Stable Diffusion 1.5, half as much as the diffusion model itself. T2I Adapter delivers comparable results in controlled generation despite being a smaller network (77M parameters). The condition picture is the only input to T2I Adapter, and its output is used by all subsequent diffusion cycles. However, this style of adapter is not designed for mobile devices.

The MediaPipe diffusion plugin is a standalone network that Google developed to make conditioned generation effective, flexible, and scalable:

Connects simply to a trained baseline model; pluggable.

Trained from scratch: no weights from the original model are used.

It is portable because it can be run independently of the base model on mobile devices at almost no additional expense.

The plugin is a separate network whose output can be integrated into an existing text-to-image model: the features extracted by the plugin are fed to the corresponding downsampling layers of the diffusion model.

The MediaPipe diffusion plugin is a portable on-device component for text-to-image generation, available as a free download. It takes a condition image and uses multiscale feature extraction to add features at the appropriate scales to the encoder of a diffusion model. When coupled with a text-to-image diffusion model, the plugin adds a conditioning signal to the image generation. The plugin network is lightweight, with only about 6M parameters; to achieve rapid inference on mobile devices, it uses depth-wise convolutions and inverted bottlenecks, as in MobileNetV2. A schematic sketch of such a plugin follows.
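As a rough illustration of such a plugin, the following PyTorch sketch extracts multiscale features from a condition image with depth-wise separable convolutions. It is an assumed, simplified architecture (it omits the inverted-bottleneck blocks, for instance) and is not Google's released implementation.

# Simplified conditioning plugin: one feature map per resolution level
import torch
import torch.nn as nn

class DepthwiseBlock(nn.Module):
    def __init__(self, c_in, c_out, stride):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_in, 3, stride=stride, padding=1, groups=c_in),  # depth-wise
            nn.Conv2d(c_in, c_out, 1),                                        # point-wise
            nn.ReLU(),
        )

    def forward(self, x):
        return self.block(x)

class ConditionPlugin(nn.Module):
    def __init__(self):
        super().__init__()
        self.stages = nn.ModuleList([
            DepthwiseBlock(3, 32, stride=2),
            DepthwiseBlock(32, 64, stride=2),
            DepthwiseBlock(64, 128, stride=2),
        ])

    def forward(self, condition_image):
        features, x = [], condition_image
        for stage in self.stages:
            x = stage(x)
            features.append(x)   # feature map to add at the matching encoder scale
        return features

plugin = ConditionPlugin()
feats = plugin(torch.randn(1, 3, 256, 256))
print([f.shape for f in feats])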

Fundamental Characteristics

Easy-to-understand abstractions for self-service machine learning. To modify, test, prototype, and release an application, use a low-code API or a no-code studio.

Innovative machine learning (ML) approaches to common problems, developed using Google’s ML know-how.

Complete optimization, including hardware acceleration, while remaining small and efficient enough to run smoothly on smartphones running on battery power.

Check Out the Project Page and Google Blog.

The post Google AI Introduces MediaPipe Diffusion Plugins That Enable Controllable Text-To-Image Generation On-Device appeared first on MarkTechPost.

Baidu Ernie 3.5 Emerges as Champion in Chinese Language AI: But Is It …

In an exciting breakthrough in the Chinese language AI market, Baidu, the renowned search engine provider, has unveiled its latest model, Ernie 3.5, a large Chinese language model that Baidu claims surpasses ChatGPT (GPT-3.5) and even GPT-4 in several Chinese-language capabilities and use cases. This move puts Baidu at the front of the AI race within the country and cements its position as a leader in the space. Baidu’s claims are backed by a detailed evaluation conducted by China Science Daily, which tested Ernie 3.5 on standard benchmark datasets such as AGIEval and C-Eval. This achievement has spurred other domestic key players, such as Alibaba Group and Tencent Holdings, to develop AI models, stirring competition in this rapidly changing field.

Ernie 3.5 claims a remarkable improvement in training and evaluation efficiency, resulting in faster inference and lower resource consumption than the previous version. Several sources report an improvement of almost 50% over Baidu's earlier models. This matters because efficiency and cost-effectiveness are highly sought-after characteristics in the AI market. Baidu also has a well-laid-out vision for Ernie 3.5: it plans to launch external plugin support, which will extend the model's capability on specific tasks such as summarization and question answering. It is worth noting that ChatGPT introduced plugin support earlier this year, correctly identifying the growing demand for specialized AI in the market.

Baidu's claims have been strengthened by two comprehensive tests from the science journal. In the first test, Ernie 3.5 outperformed ChatGPT on standard admission and qualification exams commonly used for college or law school entrance. Its superior performance in Chinese demonstrates its advanced linguistic abilities. The second test evaluated Ernie 3.5 on over 13,000 multiple-choice questions covering a wide range of subjects. Once again, Ernie 3.5 scored higher than its competitors, cementing its position as the leading Chinese-language AI model.

With Huawei set to unveil its highly anticipated Pangu AI model upgrade on July 7, competition in the Chinese language AI market is expected to become even fiercer and reach new heights. The intense rivalry among Chinese companies shows their commitment to delivering state-of-the-art AI models. These advancements in language processing and understanding could revolutionize various sectors, including education, customer service, and content creation.

It will be interesting to see how Chinese companies trim their resource requirements and find scalable alternatives amid US sanctions, and how they achieve and maintain dominance amid growing competition.

Check Out the Baidu Article.

The post Baidu Ernie 3.5 Emerges as Champion in Chinese Language AI: But Is It Really Better Than ChatGPT? appeared first on MarkTechPost.

Best AI Text Generators in 2023

Since the release of ChatGPT, AI text generators have frequently been in the news. A suitably trained AI text generator, given a good prompt, can help you work better and faster. ChatGPT may be the best-known AI system right now, but the underlying GPT technology is what is causing a stir. Its recent versions, GPT-3 and GPT-4, are quite powerful, and they are also available through an API so that developers can incorporate AI text generation into their own applications. That's why there are now dozens of comparable AI text generators.
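As a sketch of how a developer might call such an API, the snippet below uses the OpenAI Python client (the pre-1.0 interface current in 2023) to ask a GPT model for a short piece of copy. The prompt, model name, and parameters are placeholders, and the exact client interface may differ in newer library versions.

import openai

openai.api_key = "YOUR_API_KEY"  # placeholder; supply your own key

# Minimal example: ask a GPT model to draft a short product description.
response = openai.ChatCompletion.create(
    model="gpt-4",  # or "gpt-3.5-turbo"
    messages=[
        {"role": "system", "content": "You are a helpful copywriter."},
        {"role": "user", "content": "Write a two-sentence description of a reusable water bottle."},
    ],
    max_tokens=100,
)

print(response["choices"][0]["message"]["content"])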

Here are some of the AI text generators to check now:

Jasper 

When it comes to AI text generation, Jasper is a household name. It easily produces high-quality content of varying lengths that can be tailored to your brand's tone. Jasper is one of the more expensive tools on this list, so take advantage of the demo before committing. Since Zapier supports integration with Jasper, you can automate your AI text generation by linking it to your other work apps.

Copy.ai

Copy.ai is an AI-driven copywriting tool that helps businesses create persuasive content. Neither a membership fee nor a minimum purchase is required to join. The site uses cookies to personalize the experience, serve advertising, comply with GDPR, and detect bots; it records clicks and taps to compile statistics and heat maps, and it remembers the user's preferred language and server cluster, which benefits both the user experience and the ads shown.

Anyword 

Anyword is an artificial intelligence (AI) based text generator and copywriting tool made for use in marketing. It removes the need for guesswork and helps users quickly create compelling content. To aid users in producing high-quality content that meets their specific requirements, Anyword employs an AI system. The AI program examines user input, recognizes recurring themes, and then creates original and tailored content to the user’s needs. Spell check, grammar correction, and optimal sentence structure are just a few of the extra functions in Anyword’s AI writing assistant.

Sudowrite 

Authors can save time using Sudowrite, an advanced AI writing tool, when writing novels or screenplays. Many well-known authors and journalists have praised it, and it has been featured in high-profile publications such as The New Yorker, The New York Times, and The Verge. The “Show, Not Tell” button and “Brainstorming Buddy” are two of Sudowrite's many features designed to help users hone their writing. No prior knowledge of or experience with AI tools is required to use it. The software is maintained by Human++, Inc. and offers a free trial period before charging a regular subscription price.

Rytr

Rytr is an AI writing assistant that helps you create high-quality content quickly and affordably. The tool can generate 100% unique content across 40+ use cases and 30+ languages using state-of-the-art language AI. Rytr’s extensive features include a rich-text editor, rewording and shortening tools, plagiarism checks, and formatting options. To top it all off, Rytr offers a browser extension that integrates with your email, documents, social media, invoices, and projects.

Notion AI 

Users may quickly generate material like blog posts, meeting agendas, and sales letters with the help of the powerful AI-driven application Notion AI. The initial draft is written by Notion AI, giving the user a head start on lengthy passages or entire pages. People can write more quickly and efficiently, broaden their thinking and unleash their creativity by tapping into the limitless potential of AI. In addition to these more obvious uses, you may also use it to write poetry, check for typos, translate text inline, and summarize drafts of longer writing pieces. With Notion AI, users can brainstorm, get ideas, and experience the magic of AI-powered content development.

Mem 

Mem is an artificial intelligence-driven note-taking tool. One of its many benefits is an AI-powered text generator that can draw information from your platform-stored notes. That implies the resulting text can be tailored more specifically to your requirements. It’s an excellent alternative to your current note-taking app. Thanks to Zapier integration, Mem can instantly receive data from your other favorite apps.

Frase 

Frase is an artificial intelligence (AI) writing and SEO program that can streamline various steps in the content creation process. It's free, and no coding knowledge is required. Sentence rewriting, summarization, value propositions, slogans, descriptions, paraphrasing, and blog title generation are just a few features of its AI-powered writing tools. Different types of content call for different tools, but all of them streamline the process of producing material that stands out and holds an audience's interest in record time. In addition, Frase offers a variety of resources, including a live product walkthrough, a blog, a crash course, and a support center.

Writer 

It’s worth noting that Writer is not a GPT-based AI text generator. Instead, it uses a Palmyra LLM built in-house and trained with information relevant to the company’s operations. You can customize it to your company’s AI requirements by training it on internal data. Because of its integration with Zapier, Writer can receive cues from other apps and return the completed work to its source.

Surfer 

Organizations can use Surfer, a growth management tool driven by artificial intelligence, to boost their organic traffic and search engine rankings. Features like keyword research, a text editor, an SEO audit, and various free tools contribute to its entire content strategy. Using Surfer’s keyword research tool, individuals may establish themselves as authorities on their chosen topics and demonstrate their mastery of SEO best practices. Surfer’s Content Score provides real-time feedback on overall on-page SEO, and the content editor function enables users to write with guidelines and get content ideas.

Copysmith 

Enterprise and e-commerce companies can use Copysmith, an AI-powered content production platform. It has many tools to increase a company’s earnings from its internet presence. It features an API and a Chrome extension to facilitate integration, and it can generate product descriptions and other information in bulk to save time. A campaign builder and an AI-powered image generator are also included. In addition to these examples of applications, Copysmith also provides content enhancement, advertising, social media, blog templates, and creative thinking prompts.

LongShot AI

If you need help researching, creating, or optimizing long-form content, look no further than LongShot AI, an AI-powered writing assistant. By harnessing AI's processing speed, it accelerates the content creation process for its customers. Teleport Me, Scale, Neverland, Updigital, Digital Minds Group, B2Brain, and Westlake University are just a few of the 20,000+ marketers who put their faith in LongShot. Product Hunt ranked it the second-best product of the day, while G2 named it the easiest to use and the highest performer of winter 2023. LongShot AI's many capabilities and integrations make it a useful content-creation tool. A blog-building wizard takes just four easy steps to complete, and AI-powered templates let users tailor material for specific purposes, including copywriting, storytelling, and email.

Hypotenuse AI

Using just a few keywords, users of Hypotenuse AI may quickly and effectively generate their own unique, informative articles, product descriptions, and social media copy. Regarding automated product descriptions, the AI Product Description Generator has you covered with Shopify support, an API, and flexible pricing plans. Image generation, content detective, batch generation, search engine optimization and electronic commerce, writing and brainstorming, summarizing, paid advertising and social networking, and integrations are all available on the platform. Hypotenuse makes it easy to produce high-quality material backed by solid research. From content brainstorming to AI-aided campaign co-creation, the entire writing process is easier for users.

Ink 

Available on the App Store for iOS devices, Ink is an AI-powered writing aid. The application uses AI to generate content across various subject areas. Business, copywriting, marketing, SMM, advertising, responses, and many more are just a few of the more than 50 modes and categories available in the program. In addition to the pre-set categories, users can use the Chat feature to make up their content. Students, professionals, and anybody else who wants to improve their writing can use this AI-powered technology to their advantage. Spell check, paraphrase, summarize, and correction tools are just some of the functions that make this program useful.

Outranking 

To help content teams achieve their full potential, Outranking is equipped with artificial intelligence and SEO tools. It’s a hub where content can be mapped, created, optimized, and monitored. Thanks to the AI-powered blog generator, you may have fresh content on your blog in a few minutes. A content team may use sophisticated term clustering and search engine ranking research in just a few short minutes to create a fully automated content plan. The platform organizes keywords into various groupings to maximize content return on investment. Artificial intelligence (AI) may provide writers with precise advice as they craft material to effectively convey the value of a brand, the benefits of a product, or the nature of a service. In addition, sophisticated scoring algorithms based on entities, related queries, and content coverage are enabled, allowing for optimized content to be generated.

Wordtune 

Wordtune is an AI-powered writing assistant and editor designed to help its users hone their craft. The application’s artificial intelligence-powered writing enhancement features include rewriting, paraphrasing, and suggestion tools for rapid revision. A summarizer, editor, and “Wordtune spices” are also available for users to put their spin on their writing. This desktop program is compatible with Chrome and can be downloaded from the Chrome web store. Users of Wordtune have praised the program for helping them think of better phrase alternatives, which has improved their writing and made their messages clearer to others. In addition, the tool’s AI-powered capabilities enable writers to feel more assured in their work by addressing common concerns. Wordtune is an invaluable tool for writers and content creators of all stripes.

GrowthBar 

GrowthBar is an AI-driven writing tool that streamlines the process of producing high-quality, long-form material for blogs and content teams. It has a Blog Topic Generator, Keyword Research Tool, Competitor Research Tool, Keyword Ranking Tool, and Free AI Writing Tools in addition to an AI Blog Outline, an On-Page SEO Audit Tool, an AI Paragraph Rewriter, an AI Meta Description, and an AI Writing Tool. An accompanying Chrome extension provides users with keyword and competition insights while they browse the web, allowing them to write in WordPress with AI. Thousands of marketers, bloggers, and agencies have given GrowthBar a perfect 5-star rating, and it has been highlighted in several media outlets.

Simplified 

Simplified is an all-in-one app built to help modern marketing teams save time and work together more effectively. It can write in several languages and offers many free photos, videos, and audio clips, plus hundreds of designer templates. It's free to use and includes a content calendar for posting to social media at predetermined times, among other features. Graphic design, AI-assisted writing, video and animation, and social media strategy are some of the areas where this software excels. Particularly well liked are its Instagram Reels tools, animation maker, content rewriter, background remover, and magic resizer (all powered by AI).

Copymatic 

Automatically generate digital ads, website copy, blog posts, and more with Copymatic, an AI-powered copywriting and content creation tool. Using the GPT-3 language model, it can quickly generate content that sounds and reads as if a person wrote it. Users can adjust Copymatic's creativity and voice settings to produce engaging, persuasive material. Product names, descriptions, social media posts, meta tags, and introductions can all be generated with this tool. Copymatic also features a grammar checker and the ability to rewrite existing text in a fresh, insightful way.

The post Best AI Text Generators in 2023 appeared first on MarkTechPost.