Unmasking Deepfakes: Leveraging Head Pose Estimation Patterns for Enhanced Detection Accuracy

The emergence of tools that can produce convincing “fake” videos has sparked significant concern about the trustworthiness of visual content, making the ability to distinguish authentic from counterfeit material crucial. Various algorithms leveraging deep learning and facial landmarks have shown promising results on this challenge. The main difficulty in detecting fake videos lies in the potential harm caused by convincing deepfake technology, which can be used for deception, evidence tampering, privacy violation, and misinformation. Detecting these videos requires combining techniques such as analyzing facial movements, textures, and temporal consistency, often with machine learning models such as convolutional neural networks (CNNs).

Recent studies have approached deepfake detection in various ways. Some treat deepfakes as anomalies, looking for inconsistencies in depth, background, and local-global information. Others treat deepfakes as a distinctive pattern, utilizing deep learning techniques to analyze facial traits and color spaces. These efforts contribute to the ongoing endeavor to differentiate real content from deepfake videos.

In this context, a new paper was recently published proposing a solution that uses head pose estimation (HPE) as a distinguishing signal for separating real videos from deepfakes. The authors suggest that analyzing the head pose of individuals in videos can help distinguish between genuine and deepfake content. This approach focuses on the angles of head orientation to spot inconsistencies introduced during video manipulation. The study evaluates the effectiveness of this technique using various methods and datasets, contributing to improved deepfake detection strategies.

The main idea of the proposed method is to use head pose estimation as a characteristic feature for detecting deepfake videos.

HPE involves determining a person’s head position and orientation in an image or video. This information can be used to identify discrepancies introduced by deepfake manipulation, as even minor changes in head alignment can be challenging to replicate accurately. The study analyzes three HPE methods and conducts both horizontal and vertical analyses on the popular FF++ deepfake dataset. The goal is to identify the most effective method for deepfake detection. 

The authors conducted experiments to detect deepfake videos using head pose patterns. They used the “FaceForensics++” dataset, which includes real and manipulated videos. They employed KNN with Dynamic Time Warping (DTW) to align sequences and deep learning models (1D convolution and GRU) to capture temporal patterns. These methods aimed to classify videos as real or fake based on head poses. The best results came from the HPE-based approach using FSA-Net with KNN-DTW. This method outperformed several state-of-the-art methods, showing stability and transferability across different subsets of the dataset. The study suggests head pose patterns are effective for deepfake detection, particularly for less realistic attacks like FaceSwap.
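To make the classification step concrete, the following is a minimal sketch of the general KNN-DTW idea applied to head-pose sequences. It is not the authors' implementation: in the paper the per-frame (yaw, pitch, roll) angles come from an HPE model such as FSA-Net, whereas here they are toy data, and all names and distance choices are illustrative assumptions.

import numpy as np

def dtw_distance(seq_a, seq_b):
    """Dynamic Time Warping distance between two head-pose sequences.
    Each sequence is an (n_frames, 3) array of (yaw, pitch, roll) angles."""
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])  # per-frame pose distance
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def knn_dtw_predict(query_seq, train_seqs, train_labels, k=3):
    """Label a query pose sequence by majority vote of its k DTW-nearest training sequences."""
    dists = [dtw_distance(query_seq, s) for s in train_seqs]
    nearest = np.argsort(dists)[:k]
    votes = [train_labels[i] for i in nearest]
    return max(set(votes), key=votes.count)

# Toy usage with random angle trajectories standing in for real and fake videos.
rng = np.random.default_rng(0)
references = [rng.normal(0, 2, size=(50, 3)), rng.normal(0, 2, size=(50, 3)), rng.normal(0, 8, size=(50, 3))]
labels = ["real", "real", "fake"]
query = rng.normal(0, 2, size=(60, 3))
print(knn_dtw_predict(query, references, labels, k=1))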

In conclusion, this article presented a recently published method developed in response to the growing threat of deepfake videos. The approach uses HPE to identify deepfakes by analyzing head orientations in videos for inconsistencies. The research team evaluated three HPE methods on the FF++ deepfake dataset and conducted experiments involving KNN with Dynamic Time Warping (DTW) and deep learning models. The HPE-based approach using FSA-Net with KNN-DTW demonstrated superior performance over state-of-the-art methods, underscoring the potential of head pose patterns for detecting deepfakes, especially less realistic manipulations like FaceSwap.

Check out the Paper. All credit for this research goes to the researchers on this project.

The post Unmasking Deepfakes: Leveraging Head Pose Estimation Patterns for Enhanced Detection Accuracy appeared first on MarkTechPost.

Google DeepMind Researchers Introduce SynJax: A Deep Learning Library For JAX Structured Probability Distribution

In many domains, data can be viewed as having a structure that explains how its components fit together to form a greater whole. This structure is typically latent and varies with the task. Consider Figure 1 for examples of distinct structures in natural language. Together, the words form a sequence. Each word in the sequence carries a part-of-speech tag, and these tags are interconnected, forming the linear chain shown in red. By segmenting the sentence (depicted with bubbles), the words can be grouped into small, disjoint, contiguous clusters. A deeper analysis of language reveals that groups can be formed recursively, producing a syntactic tree structure. Structures can also connect two languages.

For instance, an alignment in the same figure can link a Japanese translation to an English source. Such structures are not limited to language; similar ones appear in biology. Tree-based models capture the hierarchical nature of RNA folding, while monotone alignment is used to match the nucleotides in RNA sequences. Genomic data is also split into contiguous groups. Most current deep-learning models make no explicit attempt to represent this intermediate structure and instead predict output variables directly from the input. These models could benefit from explicit modeling of structure in several ways: the appropriate inductive biases could facilitate better generalization, enhancing downstream performance as well as sample efficiency.

Figure 1: Examples of natural language structures.

Explicit structure modeling can also incorporate a problem-specific set of constraints or methods, and the discrete structure makes the model's decisions easier to interpret. Finally, the structure is sometimes itself the object of learning: we may know that the data is explained by a hidden structure of a certain shape but still need to discover what it is. For modeling sequences, auto-regressive models are the predominant technique, and in some situations non-sequential structures can be linearized and proxied by a sequential one. These models are powerful because they don't rely on independence assumptions and can be trained on large amounts of data. Sampling from auto-regressive models is tractable, but other common inference problems, such as identifying the optimal structure or marginalizing over hidden variables, often are not.

Handling these inference problems in large-scale models is challenging because it demands biased or high-variance approximations, which are frequently computationally costly. An alternative to auto-regressive models is to use models over factor graphs that factorize in the same way as the target structure. By employing specialized algorithms, these models can compute all the inference quantities of interest (argmax, sampling, marginals, entropy, etc.) exactly and efficiently. Although each structure requires its own method, the user does not need a separate hand-written algorithm for every inference task: SynJax employs automatic differentiation to extract several of these quantities from just one function per structure type, as demonstrated later.

Figure 2: A spanning tree implementation example of a policy gradient with a self-critical baseline and entropy regularisation.

Research into structured distributions for deep learning has been slowed by the lack of practical libraries offering accelerator-friendly implementations of structured components, especially since these components depend on algorithms that frequently do not map directly onto available deep learning primitives, unlike Transformer models. Researchers from Google DeepMind address this challenge with SynJax, which offers simple-to-use structural primitives that integrate with the JAX machine learning framework. Consider the example in Figure 2, which demonstrates how simple SynJax is to use. This code implements a policy gradient loss that requires computing several quantities, including a sample, the argmax, the entropy, and the log probability, each of which requires a separate algorithm.

The structure in this code is a non-projective directed spanning tree with a single-root-edge constraint. As a result, SynJax will use Wilson's sampling algorithm for single-root trees for dist.sample(), Tarjan's maximum spanning tree algorithm for single-root-edge trees for dist.argmax(), and the Matrix-Tree Theorem for quantities such as dist.log_prob() and dist.entropy(). If users want to slightly alter the type of trees by requiring them to satisfy the projectivity constraint, only one flag needs to change, and SynJax will switch to entirely different algorithms suited to that structure: Kuhlmann's algorithm for argmax and variations of Eisner's algorithm for the other quantities. Because SynJax takes care of everything related to these algorithms, the user can concentrate on the modeling side of their problem without implementing them or even understanding how they work.
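As a rough Python sketch of what such a policy-gradient loss could look like with SynJax's distribution interface: note that this is reconstructed from the description above rather than taken from the paper, and the constructor name and keyword arguments (SpanningTreeCRF, directed, projective, single_root_edge) are assumptions that may differ from the released library.

import jax
import jax.numpy as jnp
import synjax  # assumed import name for the SynJax library

def policy_gradient_loss(key, log_potentials, reward_fn, entropy_weight=0.01):
    # Distribution over non-projective directed spanning trees with a single-root-edge
    # constraint; the exact flags are assumptions based on the description above.
    dist = synjax.SpanningTreeCRF(log_potentials, directed=True, projective=False, single_root_edge=True)
    sample = dist.sample(key)   # Wilson-style sampling under the hood
    baseline = dist.argmax()    # self-critical baseline from the maximum spanning tree
    # Treat the reward difference as a constant when differentiating (REINFORCE).
    advantage = jax.lax.stop_gradient(reward_fn(sample) - reward_fn(baseline))
    # Policy-gradient term plus entropy regularization; each quantity is computed by a
    # different structure-specific algorithm behind the same distribution object.
    loss = -advantage * dist.log_prob(sample) - entropy_weight * dist.entropy()
    return jnp.mean(loss)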

Check out the Paper and GitHub. All credit for this research goes to the researchers on this project.

The post Google DeepMind Researchers Introduce SynJax: A Deep Learning Library For JAX Structured Probability Distribution appeared first on MarkTechPost.

How Amazon Shopping uses Amazon Rekognition Content Moderation to review harmful images in product reviews

Customers are increasingly turning to product reviews to make informed decisions in their shopping journey, whether they’re purchasing everyday items like a kitchen towel or making major purchases like buying a car. These reviews have transformed into an essential source of information, enabling shoppers to access the opinions and experiences of other customers. As a result, product reviews have become a crucial aspect of any store, offering valuable feedback and insights to help inform purchase decisions.
Amazon has one of the largest stores with hundreds of millions of items available. In 2022, 125 million customers contributed nearly 1.5 billion reviews and ratings to Amazon stores, making online reviews at Amazon a solid source of feedback for customers. At the scale of product reviews submitted every month, it is essential to verify that these reviews align with Amazon Community Guidelines regarding acceptable language, words, videos, and images. This practice is in place to guarantee customers receive accurate information regarding the product, and to prevent reviews from including inappropriate language, offensive imagery, or any type of hate speech directed towards individuals or communities. By enforcing these guidelines, Amazon can maintain a safe and inclusive environment for all customers.
Content moderation automation allows Amazon to scale the process while maintaining high accuracy. It's a complex problem space with unique challenges, requiring different techniques for text, images, and videos. Images are a key component of product reviews, often having a more immediate impact on customers than text. With Amazon Rekognition Content Moderation, Amazon is able to automatically detect harmful images in product reviews with higher accuracy, reducing reliance on human reviewers to moderate such content. Rekognition Content Moderation has helped improve the well-being of human moderators and achieve significant cost savings.

Moderation with self-hosted ML models
The Amazon Shopping team designed and implemented a moderation system that uses machine learning (ML) in conjunction with human-in-the-loop (HITL) review to ensure product reviews are about the customer experience with the product and don't contain inappropriate or harmful content as per the community guidelines. The image moderation subsystem, as illustrated in the following diagram, utilized multiple self-hosted and self-trained computer vision models to detect images that violate Amazon guidelines. The decision handler determines the moderation action and provides reasons for its decision based on the ML models' output, thereby deciding whether the image requires further review by a human moderator or can be automatically approved or rejected.
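As a rough illustration (not Amazon's actual implementation; the thresholds and category names are invented for the example), the decision handler's routing logic can be pictured like this:

def decide(model_scores, reject_threshold=0.90, approve_threshold=0.05):
    """Route an image based on per-category violation confidences from the ML models.

    model_scores maps an illustrative policy category (e.g. 'nudity', 'violence')
    to the model's confidence that the image violates it."""
    worst_category, worst_score = max(model_scores.items(), key=lambda kv: kv[1])
    if worst_score >= reject_threshold:
        return {"action": "auto_reject", "reason": worst_category}
    if all(score <= approve_threshold for score in model_scores.values()):
        return {"action": "auto_approve", "reason": "all categories below threshold"}
    return {"action": "human_review", "reason": worst_category}

# A borderline image is routed to a human moderator.
print(decide({"nudity": 0.40, "violence": 0.05}))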

With these self-hosted ML models, the team started by automating decisions on 40% of the images received as part of the reviews and continuously worked on improving the solution through the years while facing several challenges:

Ongoing efforts to improve automation rate – The team wanted to improve the accuracy of the ML algorithms, aiming to increase the automation rate. This requires continuous investment in data labeling, data science, and MLOps for model training and deployment.
System complexity – The architecture's complexity requires investments in MLOps to ensure the ML inference process scales efficiently to meet the growing content submission traffic.

Replace self-hosted ML models with the Rekognition Content Moderation API
Amazon Rekognition is a managed artificial intelligence (AI) service that offers pre-trained models through an API interface for image and video moderation. It has been widely adopted by industries such as ecommerce, social media, gaming, online dating apps, and others to moderate user-generated content (UGC). This includes a range of content types, such as product reviews, user profiles, and social media post moderation.
Rekognition Content Moderation automates and streamlines image and video moderation workflows without requiring ML experience. Amazon Rekognition customers can process millions of images and videos, efficiently detecting inappropriate or unwanted content, with fully managed APIs and customizable moderation rules to keep users safe and the business compliant.
The team successfully migrated a subset of self-managed ML models in the image moderation system for nudity and not safe for work (NSFW) content detection to the Amazon Rekognition Detect Moderation API, taking advantage of the highly accurate and comprehensive pre-trained moderation models. With the high accuracy of Amazon Rekognition, the team has been able to automate more decisions, save costs, and simplify their system architecture.
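For reference, a minimal call to the Amazon Rekognition image moderation API with boto3 looks roughly like the following; the bucket, object key, and confidence threshold are placeholders:

import boto3

rekognition = boto3.client("rekognition")

# Detect potentially unsafe content in a review image stored in Amazon S3.
response = rekognition.detect_moderation_labels(
    Image={"S3Object": {"Bucket": "my-review-images", "Name": "reviews/12345.jpg"}},
    MinConfidence=80,
)

for label in response["ModerationLabels"]:
    print(label["Name"], label["ParentName"], round(label["Confidence"], 1))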

Improved accuracy and expanded moderation categories
The implementation of the Amazon Rekognition image moderation API has resulted in higher accuracy for detecting inappropriate content. This means that approximately 1 million additional images per year will be automatically moderated without the need for any human review.
Operational excellence
The Amazon Shopping team was able to simplify the system architecture, reducing the operational effort required to manage and maintain the system. This approach has saved them months of DevOps effort per year, which means they can now allocate their time to developing innovative features instead of spending it on operational tasks.
Cost reduction
The high accuracy from Rekognition Content Moderation has enabled the team to send fewer images for human review, including potentially inappropriate content. This has reduced the cost associated with human moderation and allowed moderators to focus their efforts on more high-value business tasks. Combined with the DevOps efficiency gains, the Amazon Shopping team achieved significant cost savings.
Conclusion
Migrating from self-hosted ML models to the Amazon Rekognition Moderation API for product review moderation can provide many benefits for businesses, including significant cost savings. By automating the moderation process, online stores can quickly and accurately moderate large volumes of product reviews, improving the customer experience by ensuring that inappropriate or spam content is quickly removed. Additionally, by using a managed service like the Amazon Rekognition Moderation API, companies can reduce the time and resources needed to develop and maintain their own models, which can be especially useful for businesses with limited technical resources. The API’s flexibility also allows online stores to customize their moderation rules and thresholds to fit their specific needs.
Learn more about content moderation on AWS and our content moderation ML use cases. Take the first step towards streamlining your content moderation operations with AWS.

About the Authors
Shipra Kanoria is a Principal Product Manager at AWS. She is passionate about helping customers solve their most complex problems with the power of machine learning and artificial intelligence. Before joining AWS, Shipra spent over 4 years at Amazon Alexa, where she launched many productivity-related features on the Alexa voice assistant.
Luca Agostino Rubino is a Principal Software Engineer in the Amazon Shopping team. He works on Community features like Customer Reviews and Q&As, focusing through the years on Content Moderation and on scaling and automation of Machine Learning solutions.
Lana Zhang is a Senior Solutions Architect at AWS WWSO AI Services team, specializing in AI and ML for Content Moderation, Computer Vision, Natural Language Processing and Generative AI. With her expertise, she is dedicated to promoting AWS AI/ML solutions and assisting customers in transforming their business solutions across diverse industries, including social media, gaming, e-commerce, media, advertising & marketing.

Intelligent video and audio Q&A with multilingual support using LLMs on Amazon SageMaker

Digital assets are vital visual representations of products, services, culture, and brand identity for businesses in an increasingly digital world. Digital assets, together with recorded user behavior, can facilitate customer engagement by offering interactive and personalized experiences, allowing companies to connect with their target audience on a deeper level. Efficiently discovering and searching for specific content within digital assets is crucial for businesses to optimize workflows, streamline collaboration, and deliver relevant content to the right audience. According to one study, by 2021 video already made up 81% of all consumer internet traffic. This comes as no surprise, because video and audio are powerful mediums that offer more immersive experiences and naturally engage target audiences on a higher emotional level.
As companies accumulate large volumes of digital assets, it becomes more challenging to organize and manage them effectively to maximize their value. Traditionally, companies attach metadata, such as keywords, titles, and descriptions, to these digital assets to facilitate search and retrieval of relevant content. But this requires a well-designed digital asset management system and additional effort to store these assets in the first place. In reality, most digital assets lack informative metadata that enables efficient content search. Additionally, you often need to analyze different segments of the whole file to discover the concepts covered there, which is time consuming and requires a lot of manual effort.
Generative AI, particularly in the realm of natural language processing and understanding (NLP and NLU), has revolutionized the way we comprehend and analyze text, enabling us to gain deeper insights efficiently and at scale. The advancements in large language models (LLMs) have led to richer representations of texts, which provide better search capabilities for digital assets. Retrieval Augmented Generation (RAG), built on top of LLMs and advanced prompt techniques, is a popular approach to provide more accurate answers based on information hidden in the enterprise digital asset store. By taking advantage of the embedding models of LLMs, and powerful indexers and retrievers, RAG can comprehend and process spoken or written queries and quickly find the most relevant information in the knowledge base. Previous studies have shown how RAG can be applied to provide a Q&A solution connected to an enterprise's private domain knowledge. However, among all types of digital assets, video and audio assets are the most common and important.
The RAG-based video/audio question answering solution can potentially solve the business problem of locating training and reference materials that exist as non-text content. Even with limited tags or metadata associated with these assets, the solution lets users interact with a chatbot and get answers to their queries, which could be links to specific video trainings (“I need a link to Amazon S3 data storage training”), links to documents (“I need a link to learn about machine learning”), or questions that were covered in the videos (“Tell me how to create an S3 bucket”). The response from the chatbot directly answers the question and also includes links to the source videos with the specific timestamps of the content most relevant to the user's request.
In this post, we demonstrate how to use the power of RAG in building a Q&A solution for video and audio assets on Amazon SageMaker.
Solution overview
The following diagram illustrates the solution architecture.

The workflow mainly consists of the following stages:

Convert the videos to text with a speech-to-text model, align the text with the videos, and organize the data. We store the data in Amazon Simple Storage Service (Amazon S3).
Enable intelligent video search using a RAG approach with LLMs and LangChain. Users can get answers generated by LLMs and relevant sources with timestamps.
Build a multi-functional chatbot using LLMs with SageMaker, where the two aforementioned solutions are wrapped and deployed.

For a detailed implementation, refer to the GitHub repo.
Prerequisites
You need an AWS account with an AWS Identity and Access Management (IAM) role with permissions to manage resources created as part of the solution. For details, refer to create an AWS account.
If this is your first time working with Amazon SageMaker Studio, you first need to create a SageMaker domain. Additionally, you may need to request a service quota increase for the corresponding SageMaker processing and hosting instances. For preprocessing the video data, we use an ml.p3.2xlarge SageMaker processing instance. For hosting Falcon-40B, we use an ml.g5.12xlarge SageMaker hosting instance.
Convert video to text with a speech-to-text model and sentence embedding model
To be able to search through video or audio digital assets and provide contextual information from videos to LLMs, we need to convert all the media content to text and then follow the general approaches in NLP to process the text data. To make our solution more flexible to handle different scenarios, we provide the following options for this task:

Amazon Transcribe and Amazon Translate – If each video and audio file only contains one language, we highly recommend that you choose Amazon Transcribe, which is an AWS managed service to transcribe audio and video files. If you need to translate them into the same language, Amazon Translate is another AWS managed service, which supports multilingual translation.
Whisper – In real-world use cases, video data may include multiple languages, such as foreign language learning videos. Whisper is a multitasking speech recognition model that can perform multilingual speech recognition, speech translation, and language identification. You can use a Whisper model to detect and transcribe different languages on video data, and then translate all the different languages into one language. It’s important for most RAG solutions to run on the knowledge base with the same language. Even though OpenAI provides the Whisper API, for this post, we use the Whisper model from Hugging Face.

We run this task with an Amazon SageMaker Processing job on existing data. You can refer to data_preparation.ipynb for the details of how to run this task.
Convert video data to audio data
Because Amazon Transcribe can handle both video and audio data and the Whisper model can only accept audio data, to make both options work, we need to convert video data to audio data. In the following code, we use VideoFileClip from the library moviepy to run this job:
from moviepy.editor import VideoFileClip

video = VideoFileClip(video_path)
video.audio.write_audiofile(audio_path)
Transcribe audio data
When the audio data is ready, we can choose from our two transcribing options. You can choose the optimal option based on your own use case with the criteria we mentioned earlier.
Option 1: Amazon Transcribe and Amazon Translate
The first option is to use Amazon AI services, such as Amazon Transcribe and Amazon Translate, to get the transcriptions of the video and audio datasets. You can refer to the following GitHub example when choosing this option.
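As a minimal sketch of this option (the job name, S3 URIs, and language code are placeholders), a transcription job can be started with boto3 as follows:

import boto3

transcribe = boto3.client("transcribe")

# Start an asynchronous transcription job for an audio file stored in S3.
transcribe.start_transcription_job(
    TranscriptionJobName="demo-video-transcription",           # placeholder job name
    Media={"MediaFileUri": "s3://my-bucket/audio/video1.mp3"},  # placeholder input
    MediaFormat="mp3",
    LanguageCode="en-US",
    OutputBucketName="my-bucket",                               # where the transcript JSON lands
)

# Check the job status (a production workflow would poll with backoff and error handling).
status = transcribe.get_transcription_job(TranscriptionJobName="demo-video-transcription")
print(status["TranscriptionJob"]["TranscriptionJobStatus"])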
Option 2: Whisper
A Whisper model can handle audio data up to 30 seconds in duration. To handle large audio data, we adopt transformers.pipeline to run inference with Whisper. When searching relevant video clips or generating contents with RAG, timestamps for the relevant clips are the important references. Therefore, we turn return_timestamps on to get outputs with timestamps. By setting the parameter language in generate_kwargs, all the different languages in one video file are transcribed and translated into the same language. stride_length_s is the length of stride on the left and right of each chunk. With this parameter, we can make the Whisper model see more context when doing inference on each chunk, which will lead to a more accurate result. See the following code:
from transformers import pipeline
import torch

target_language = "en"
whisper_model = "whisper-large-v2"

device = "cuda:0" if torch.cuda.is_available() else "cpu"
pipe = pipeline(
    "automatic-speech-recognition",
    model=f"openai/{whisper_model}",
    device=device
)

generate_kwargs = {"task": "transcribe", "language": f"<|{target_language}|>"}
prediction = pipe(
    file_path,
    return_timestamps=True,
    chunk_length_s=30,
    stride_length_s=5,
    generate_kwargs=generate_kwargs
)
The output of pipe is a dictionary with two items, text and chunks. text contains the entire transcribed result, and chunks consists of the individual chunks with their timestamps and corresponding transcribed text (see the following screenshot). We use the data in chunks for further processing.

As the preceding screenshot shows, a lot of sentences have been cut off and split into different chunks. To make the chunks more meaningful, we need to combine the cut-off sentences and update the timestamps in the next step.
Organize sentences
We use a very simple rule to combine sentences. When the chunk ends with a period (.), we don’t make any change; otherwise, we concatenate it with the next chunk. The following code snippet explains how we make this change:
prev_chunk = None
new_chunks = []
for chunk in chunks:
    if prev_chunk:
        chunk['text'] = prev_chunk['text'] + chunk['text']
        chunk['timestamp'] = (prev_chunk['timestamp'][0], chunk['timestamp'][1])

    if not chunk['text'].endswith('.'):
        prev_chunk = chunk
    else:
        new_chunks.append(chunk)
        prev_chunk = None
Compared to the original chunks produced by the audio-to-text conversion, we now get complete sentences that were previously cut off.

Chunk sentences
The text content in documents is normally organized by paragraph. Each paragraph focuses on the same topic. Chunking by paragraph may help embed texts into more meaningful vectors, which may improve retrieval accuracy.
Unlike the normal text content in documents, transcriptions from the transcription model are not paragraphed. Even though there are some stops in the audio files, sometimes it can’t be used to paragraph sentences. On the other hand, langchain provides the recursive chunking text splitter function RecursiveCharacterTextSplitter, which can keep all the semantically relevant content in the same chunk. Because we need to keep timestamps with chunks, we implement our own chunking process. Inspired by the post How to chunk text into paragraphs using python, we chunk sentences based on the similarity between the adjacent sentences with a sentence embedding approach. The basic idea is to take the sentences with the lowest similarity to adjacent sentences as the split points. We use all-MiniLM-L6-v2 for sentence embedding. You can refer the original post for the explanation of this approach. We have made some minor changes on the original source code; refer to our source code for the implementation. The core part for this process is as follows:
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Embed sentences
model_name = "all-MiniLM-L6-v2"
model = SentenceTransformer(model_name)
embeddings = model.encode(sentences_all)

# Create the similarity matrix between all sentence pairs
similarities = cosine_similarity(embeddings)

# Apply the relative-minima function from the referenced post.
# For long texts, a p_size of 10 or more sentences is recommended.
minimas = activate_similarities(similarities, p_size=p_size, order=order)

# Split points are the sentence indices with the lowest similarity to their neighbors
split_points = [each for each in minimas[0]]
text = ''

para_chunks = []
para_timestamp = []
start_timestamp = 0

for num, each in enumerate(sentences_all):
    current_timestamp = timestamps_all[num]

    if text == '' and (start_timestamp == current_timestamp[1]):
        start_timestamp = current_timestamp[0]

    if num in split_points:
        para_chunks.append(text)
        para_timestamp.append([start_timestamp, current_timestamp[1]])
        text = f'{each}. '
        start_timestamp = current_timestamp[1]
    else:
        text += f'{each}. '

if len(text):
    para_chunks.append(text)
    para_timestamp.append([start_timestamp, timestamps_all[-1][1]])
To evaluate the efficiency of chunking with sentence embedding, we conducted qualitative comparisons between different chunking mechanisms. The assumption underlying such comparisons is that if the chunked texts are more semantically distinct and separate, less irrelevant contextual information will be retrieved for the Q&A, so the answer will be more accurate and precise. At the same time, because less contextual information is sent to the LLM, the cost of inference will also be lower, because charges grow with the number of tokens.
We visualized the first two components of a PCA by reducing the high-dimensional embeddings to two dimensions. Compared to recursive chunking, the distances between vectors representing different chunks obtained with sentence embedding are more scattered, meaning the chunks are more semantically separate. This means that when the vector of a query is close to the vector of one chunk, it is less likely to also be close to other chunks, so a retrieval task has fewer opportunities to pick up near-duplicate information from multiple semantically similar chunks.
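The visualization described above can be reproduced with a short sketch like the following; the two chunk lists are toy stand-ins for the outputs of the recursive and sentence-embedding chunking pipelines:

import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def plot_chunk_embeddings(chunks, title):
    """Project chunk embeddings onto their first two principal components and plot them."""
    embeddings = model.encode(chunks)
    points = PCA(n_components=2).fit_transform(embeddings)
    plt.figure()
    plt.scatter(points[:, 0], points[:, 1], alpha=0.7)
    plt.title(title)

# Toy stand-ins; in practice these come from the two chunking mechanisms being compared.
recursive_chunks = ["the cat sat on the mat", "the mat was red", "stocks fell on Monday", "markets later recovered"]
similarity_chunks = ["the cat sat on the mat. the mat was red.", "stocks fell on Monday. markets later recovered.", "the weather in Sydney was sunny."]

plot_chunk_embeddings(recursive_chunks, "Recursive chunking")
plot_chunk_embeddings(similarity_chunks, "Sentence-embedding chunking")
plt.show()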

When the chunking process is complete, we attach timestamps to the file name of each chunk, save it as a single file, and then upload it to an S3 bucket.
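Concretely, that save-and-upload step could look like the following sketch, reusing para_chunks and para_timestamp from the previous snippet; the bucket name, prefix, and file-naming scheme are assumptions:

import boto3

s3 = boto3.client("s3")
bucket = "my-transcript-bucket"  # placeholder bucket name

for chunk, (start, end) in zip(para_chunks, para_timestamp):
    # Encode the clip's start/end timestamps into the file name, e.g. video1_12.5_48.0.txt
    file_name = f"video1_{start}_{end}.txt"
    with open(file_name, "w") as f:
        f.write(chunk)
    s3.upload_file(file_name, bucket, f"demo-video-sagemaker-doc/{file_name}")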
Enable intelligent video search using a RAG-based approach with LangChain
There are typically four approaches to build a RAG solution for Q&A with LangChain:

Using the load_qa_chain functionality, which feeds all information to an LLM. This is not an ideal approach given the context window size and the volume of video and audio data.
Using the RetrievalQA tool, which requires a text splitter, text embedding model, and vector store to process texts and retrieve relevant information.
Using VectorstoreIndexCreator, which is a wrapper around all logic in the second approach. The text splitter, text embedding model, and vector store are configured together inside the function at one time.
Using the ConversationalRetrievalChain tool, which further adds memory of chat history to the QA solution.

For this post, we use the second approach to explicitly customize and choose the best engineering practices. In the following sections, we describe each step in detail.
To search for relevant content based on user queries, we use semantic search, which can better understand the intent behind a query and perform meaningful retrieval. We first use a pre-trained embedding model to embed all the transcribed text into a vector space. At search time, the query is also embedded into the same vector space and the closest embeddings from the source corpus are found. You can deploy the pre-trained embedding model as shown in Question answering using Retrieval Augmented Generation with foundation models in Amazon SageMaker JumpStart to create the embeddings for semantic search. In this post, we take a similar approach to create an intelligent video search solution using a RAG-based approach with the open-source LangChain library. LangChain is an open-source framework for developing applications powered by language models, and it provides a generic interface for many different LLMs.
We first deploy an embedding model, GPT-J 6B, provided by Amazon SageMaker JumpStart and the language model Falcon-40B Instruct from Hugging Face to prepare for the solution. When the endpoints are ready, we follow steps similar to those described in Question answering using Retrieval Augmented Generation with foundation models in Amazon SageMaker JumpStart to create the LLM model and embedding model for LangChain.
The following code snippet shows how to create the LLM model using the langchain.llms.sagemaker_endpoint.SagemakerEndpoint class and transform the request and response payload for the LLM in the ContentHandler:
import json

from langchain.llms.sagemaker_endpoint import LLMContentHandler, SagemakerEndpoint

parameters = {
    "max_new_tokens": 500,
}

class ContentHandler(LLMContentHandler):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt: str, model_kwargs={}) -> bytes:
        self.len_prompt = len(prompt)
        input_str = json.dumps({"inputs": prompt, "parameters": {**model_kwargs}})
        return input_str.encode("utf-8")

    def transform_output(self, output: bytes) -> str:
        response_json = output.read()
        res = json.loads(response_json)
        # The generated text echoes the prompt, so strip the prompt prefix.
        ans = res[0]['generated_text'][self.len_prompt:]
        return ans

content_handler = ContentHandler()

sm_llm = SagemakerEndpoint(
    endpoint_name=_MODEL_CONFIG_["huggingface-falcon-40b"]["endpoint_name"],
    region_name=aws_region,
    model_kwargs=parameters,
    content_handler=content_handler,
)
When we use a SageMaker JumpStart embedding model, we need to customize the LangChain SageMaker endpoint embedding class and transform the model request and response to integrate with LangChain; a rough sketch of that customization is shown next. We then load the processed video transcripts using the LangChain document loader and create an index.
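The following sketch illustrates that customization; the request and response keys (text_inputs, embedding) are assumptions about the GPT-J 6B JumpStart endpoint's payload format, and the endpoint name and Region are placeholders:

import json
from typing import Dict, List

from langchain.embeddings import SagemakerEndpointEmbeddings
from langchain.embeddings.sagemaker_endpoint import EmbeddingsContentHandler

class EmbeddingContentHandler(EmbeddingsContentHandler):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, inputs: List[str], model_kwargs: Dict) -> bytes:
        # Assumed request format for the JumpStart GPT-J 6B embedding endpoint.
        return json.dumps({"text_inputs": inputs, **model_kwargs}).encode("utf-8")

    def transform_output(self, output: bytes) -> List[List[float]]:
        # Assumed response format: a list of embedding vectors under "embedding".
        response_json = json.loads(output.read().decode("utf-8"))
        return response_json["embedding"]

embeddings = SagemakerEndpointEmbeddings(
    endpoint_name="huggingface-textembedding-gpt-j-6b",  # placeholder endpoint name
    region_name="us-east-1",                             # placeholder Region
    content_handler=EmbeddingContentHandler(),
)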
We use the DirectoryLoader package in LangChain to load the text documents into the document loader:
loader = DirectoryLoader("./data/demo-video-sagemaker-doc/", glob="*/.txt")
documents = loader.load()
Next, we use the embedding models to create the embeddings of the contents and store the embeddings in a FAISS vector store to create an index. We use this index to find relevant documents that are semantically similar to the input query. With the VectorstoreIndexCreator class, you can just write a few lines of code to achieve this task:
index_creator = VectorstoreIndexCreator(
    vectorstore_cls=FAISS,
    embedding=embeddings,
    text_splitter=CharacterTextSplitter(chunk_size=500, chunk_overlap=0),
)
index = index_creator.from_loaders([loader])
Now we can use the index to search for relevant context and pass it to the LLM model to generate an accurate response:
index.query(question=question, llm=sm_llm)
Build a multi-functional chatbot with SageMaker
With the deployed LLM on SageMaker, we can build a multi-functional smart chatbot to show how these models can help your business build advanced AI-powered applications. In this example, the chatbot uses Streamlit to build the UI and the LangChain framework to chain together different components around LLMs. With the help of the text-to-text and speech-to-text LLMs deployed on SageMaker, this smart chatbot accepts text and audio files as input, so users can chat with the uploaded files and build further applications on top of this capability. The following diagram shows the architecture of the chatbot.

When a user uploads a text file to the chatbot, the chatbot puts the content into the LangChain memory component and the user can chat with the uploaded document. This part is inspired by the following GitHub example that builds a document chatbot with SageMaker. We also add an option to allow users to upload audio files. Then the chatbot automatically invokes the speech-to-text model hosted on the SageMaker endpoint to extract the text content from the uploaded audio file and add the text content to the LangChain memory. Lastly, we allow the user to select the option to use the knowledge base when answering questions. This is the RAG capability shown in the preceding diagram. We have defined the SageMaker endpoints that are deployed in the notebooks provided in the previous sections. Note that you need to pass the actual endpoint names that are shown in your account when running the Streamlit app. You can find the endpoint names on the SageMaker console under Inference and Endpoints.
Falcon_endpoint_name = os.getenv("falcon_ep_name", default="falcon-40b-instruct-12xl")
whisper_endpoint_name = os.getenv("wp_ep_name", default="whisper-large-v2")
embedding_endpoint_name = os.getenv("embed_ep_name", default="huggingface-textembedding-gpt-j-6b")
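For illustration, the call the chatbot makes to the deployed speech-to-text endpoint might look roughly like the following sketch; the content type and response format depend on how the Whisper model was packaged for SageMaker, so treat them as assumptions:

import json
import boto3

runtime = boto3.client("sagemaker-runtime")

with open(uploaded_audio_path, "rb") as f:  # placeholder path for the file uploaded via the UI
    audio_bytes = f.read()

response = runtime.invoke_endpoint(
    EndpointName=whisper_endpoint_name,
    ContentType="audio/wav",   # assumption: must match the serving container's expectations
    Body=audio_bytes,
)
transcript = json.loads(response["Body"].read().decode("utf-8"))
print(transcript)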
When the knowledge base option is not selected, we use the conversation chain, where we add the memory component using the ConversationBufferMemory provided by LangChain, so the bot can remember the current conversation history:
def load_chain():
    memory = ConversationBufferMemory(return_messages=True)
    chain = ConversationChain(llm=llm, memory=memory)
    return chain

chatchain = load_chain()
We use similar logic as shown in the earlier section for the RAG component and add the document retrieval function to the code. For demo purposes, we load the transcribed text stored in SageMaker Studio local storage as a document source. You can implement other RAG solutions using a vector database of your choice, such as Amazon OpenSearch Service, Amazon RDS, Amazon Kendra, and more.
When users use the knowledge base for the question, the following code snippet retrieves the relevant contents from the database and provides additional context for the LLM to answer the question. We used the specific method provided by FAISS, similarity_search_with_score, when searching for relevant documents. This is because it can also provide the metadata and similarity score of the retrieved source file. The returned distance score is L2 distance. Therefore, a lower score is better. This gives us more options to provide more context for the users, such as providing the exact timestamps of the source videos that are relevant to the input query. When the RAG option is selected by the user from the UI, the chatbot uses the load_qa_chain function provided by LangChain to provide the answers based on the input prompt.
docs = docsearch.similarity_search_with_score(user_input)
contexts = []
source = []  # file names of the retrieved chunks (initialization added for completeness)

for doc, score in docs:
    print(f"Content: {doc.page_content}, Metadata: {doc.metadata}, Score: {score}")
    if score <= 0.9:
        contexts.append(doc)
        source.append(doc.metadata['source'].split('/')[-1])
print(f"\n INPUT CONTEXT:{contexts}")
prompt_template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.:\n\n{context}\n\nQuestion: {question}\nHelpful Answer:"""

PROMPT = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
chain = load_qa_chain(llm=llm, prompt=PROMPT)
result = chain({"input_documents": contexts, "question": user_input},
               return_only_outputs=True)["output_text"]

if len(source) != 0:
    df = pd.DataFrame(source, columns=['knowledge source'])
    st.data_editor(df)
Run the chatbot app
Now we’re ready to run the Streamlit app. Open a terminal in SageMaker Studio and navigate to the cloned GitHub repository folder. You need to install the required Python packages that are specified in the requirements.txt file. Run pip install -r requirements.txt to prepare the Python dependencies.
Then run the following command to update the endpoint names in the environment variables based on the endpoints deployed in your account accordingly. When you run the chatbot.py file, it automatically updates the endpoint names based on the environment variables.
export falcon_ep_name=<the falcon endpoint name deployed in your account>
export wp_ep_name=<the whisper endpoint name deployed in your account>
export embed_ep_name=<the embedding endpoint name deployed in your account>
streamlit run app_chatbot/chatbot.py --server.port 6006 --server.maxUploadSize 6

To access the Streamlit UI, copy your SageMaker Studio URL and replace lab? with proxy/[PORT NUMBER]/. For this post, we specified the server port as 6006, so the URL should look like https://<domain ID>.studio.<region>.sagemaker.aws/jupyter/default/proxy/6006/.
Replace domain ID and region with the correct value in your account to access the UI.
Chat with your audio file
In the Conversation setup pane, choose Browse files to select local text or audio files to upload to the chatbot. If you select an audio file, it will automatically invoke the speech-to-text SageMaker endpoint to process the audio file and present the transcribed text to the console, as shown in the following screenshot. You can continue asking questions about the audio file and the chatbot will be able to remember the audio content and respond to your queries based on the audio content.

Use the knowledge base for the Q&A
When you want to answer questions that require specific domain knowledge or use the knowledge base, select Use knowledge base. This lets the chatbot retrieve relevant information from the knowledge base built earlier (the vector database) to add additional context to answer the question. For example, when we ask the question “what is the recommended way to first customize a foundation model?” to the chatbot without the knowledge base, the chatbot returns an answer similar to the following screenshot.

When we use the knowledge base to help answer this question, the chatbot returns a different response. In the demo video, we read the SageMaker documentation about how to customize a model in SageMaker JumpStart.

The output also provides the original video file name with the retrieved timestamp of the corresponding text. Users can go back to the original video file and locate the specific clips in the original videos.

This example chatbot demonstrates how businesses can use various types of digital assets to enhance their knowledge base and provide multi-functional assistance to their employees, improving productivity and efficiency. You can build the knowledge base from documents, audio and video datasets, and even image datasets to consolidate all the resources together. With SageMaker as an advanced ML platform, you can accelerate the path from idea to production with the breadth and depth of SageMaker services covering the whole ML lifecycle.
Clean up
To save costs, delete all the resources you deployed as part of the post. You can follow the provided notebook’s cleanup section to programmatically delete the resources, or you can delete any SageMaker endpoints you may have created via the SageMaker console.
Conclusion
The advent of generative AI models powered by LLMs has revolutionized the way businesses acquire and apply insights from information. Within this context, digital assets, including video and audio content, play a pivotal role as visual representations of products, services, and brand identity. Efficiently searching and discovering specific content within these assets is vital for optimizing workflows, enhancing collaboration, and delivering tailored experiences to the intended audience. With the power of generative AI models on SageMaker, businesses can unlock the full potential of their video and audio resources. The integration of generative AI models empowers enterprises to build efficient and intelligent search solutions, enabling users to access relevant and contextual information from their digital assets, and thereby maximizing their value and fostering business success in the digital landscape.
For more information on working with generative AI on AWS, refer to Announcing New Tools for Building with Generative AI on AWS.

About the authors
Gordon Wang is a Senior AI/ML Specialist TAM at AWS. He supports strategic customers with AI/ML best practices across many industries. He is passionate about computer vision, NLP, generative AI, and MLOps. In his spare time, he loves running and hiking.
Melanie Li is a Senior AI/ML Specialist TAM at AWS based in Sydney, Australia. She helps enterprise customers build solutions using state-of-the-art AI/ML tools on AWS and provides guidance on architecting and implementing ML solutions with best practices. In her spare time, she loves to explore nature and spend time with family and friends.
Guang Yang is a Senior Applied Scientist at the Amazon Generative AI Innovation Center, where he works with customers across various verticals and applies creative problem solving to generate value for customers with state-of-the-art generative AI solutions.
Harjyot Malik is a Senior Program Manager at AWS based in Sydney, Australia. He works with the APJC Enterprise Support teams and helps them build and deliver strategies. He collaborates with business teams, delving into complex problems to unearth innovative solutions that in return drive efficiencies for the business. In his spare time, he loves to travel and explore new places.

How Customers.ai Became a Fastest-Growing Inc. 5000 Company: 5 Things You Can Do

How do you know you’ve “made it”?

If you own a business or are part of a growing startup, you’ve probably asked yourself this.

Just before my first company, WordStream, was acquired for $150 million by Gannett Co, I observed 21 signs that your company is on its way to success — or maybe has already made it.

#1) Hiring employee #1. Every journey begins with a single step. (We’re at around 40 today!!)

#2) Series A Funding. Validation of your vision.

#3) Consistent Sales & Revenue! 

I’m elated to share that Customers.ai made the 2023 Inc. 5000 list of fastest growing companies in America.

We ranked 1286 on the 2023 Inc 5000 list and 69 among technology companies, including AI tools! 

The Inc 5000 is a revenue-growth based ranking industry award system – we send them our financials and that’s how they compute the fastest growing companies.

On one level, it’s always a surprise to get an honor like this. On the other hand, the signs were there all along. 

This past year has felt like I’ve been riding a unicorn with a jet engine strapped to its belly. 

We’ve rebranded and rolled out so many new totally game-changing features. We moved into an office. We’ve doubled our team. We decreased one company’s customer acquisition cost by 93%. We increased another company’s email engagement by 5.5x. We outgrew our office. We helped B2C businesses find millions of customers. And it feels like it’s just the beginning. 

So, how did we do it? 

Here are 5 critical things that helped us along our journey to a fastest-growing company.

1. Obsess over understanding your customers.

What happened to catapult our growth in the last 12 months? Laser focus on refining our Ideal Customer Profile. This is an iterative process.

From day 1, you have an ICP in mind. You will almost certainly be wrong, and that’s okay. Look at the data. Which customers are most successful? Which customers are struggling? Which customers get the value immediately? Which customers are dropping like flies? 

Often, this reveals that your product isn’t exactly what you thought it was and your product isn’t for exactly who you thought it was. That’s okay! What matters is making strategic changes when you identify inefficiencies and constantly doing this. Your product changes. Your customers change. The economy changes. All of this means you have to pay constant, close attention to who your ICP is.

2. Fanatically listen to your customers

We were having success generating buckets of leads for our customers and our outreach automations were working great. But I thought we could do better than just offering two great services, so we listened really closely when our customers talked about what they needed. 

Customers need better or different data? We built analytics right into the outreach automations. 

Customers need more integrations? We partnered with Microsoft and Klaviyo. Customers need to write better emails? We added AI email composition. 

Your customers know what they need from you. You just have to listen.

3. Go Boldly Outbound

One of the main benefits of knowing who your customers are and what they want? Marketing becomes way more focused and efficient. 

In the last year we pressed hard on the gas of targeted outbound marketing to grow our leads and sales — unlocking our growth potential.

Inbound marketing is like fishing with a pole, waiting for your ideal prospect to swim by. Targeted outbound marketing is like hunting with a laser scope, your sights trained on your ideal prospect.

Because outbound outreach campaigns go directly to your target audience, they are 10-100X more cost effective than digital ads (where you may be spending thousands of dollars per lead).

Outbound email marketing, done in an intentional way that protects your domain and optimizes for deliverability, is the most direct and cost effective tool for finding new customers.

4. Make noise about your progress.

If a company creates an exceptional product or service in the forest but no one is there to hear it, does it make a sound? Either way, they won’t get any new customers. 

So, make a sound! 

When we add product features (or just think it’s been a while since we’ve highlighted core features), we make a point of telling everyone who will listen. From every microphone we’ve got.

We’re excited to tell our customers’ success stories, too. This helps future customers understand how their lives can be transformed and their goals achieved through our partnership.

5. Believe in your product.

The best way to get people to continue buying your product is to demonstrate how well it can work for them. The best way to do that? Give it to them for free.

Nobody likes giving valuable stuff away for free. But we give customers 50 free leads for a reason…because (to paraphrase the marketing team at FritoLay) we bet they can’t get just 50! Once they’ve seen how great your product is, they know they need it. 

Experience is the most powerful tool in your belt, don’t be afraid to use it! 


What’s next for us? Well I’m still looking out for one of my favorite signs of success…

Behold! The WordStream soda fridge. #amazing pic.twitter.com/GiUGtJHCGL— Larry Kim (@larrykim) January 22, 2014

The post How Customers.ai Became a Fastest-Growing Inc. 5000 Company: 5 Things You Can Do appeared first on Customers.ai.

Top 50 AI Writing Tools To Try (August 2023)

Grammarly

Grammarly is a great tool for enhancing writing. It reviews grammar, spelling, punctuation, and style to ensure clear and professional content.

Jasper

Jasper AI is one of the most popular AI writing tools that makes it easy to create content for websites, blogs, social media, etc.

ChatGPT

ChatGPT is a robust language generation model helpful for a range of writing tasks. It handles conversation generation, language translation, summarization, and more.

GPT-4

GPT-4 creates text closely resembling human writing, becoming a potent asset for writers. Many top AI writing tools are enhancing their software by incorporating GPT-4 technology.

Growthbar

Growthbar is an ideal tool for creating blog content and optimizing it for SEO.

ClosersCopy

This AI writing tool can be used for an array of tasks – writing blog posts, social media content, creating presentations, writing books, etc.

Writesonic

Writesonic allows users to generate high-quality articles, blog posts, and more. It can write content in most of the world’s popular languages, like English, Spanish, French, etc.

Article Forge

Article Forge allows users to generate SEO-optimized, high-quality, unique content about any topic.

ParagraphAI

ParagraphAI is an AI writing app for iOS, Android, and Chrome that helps users write emails and articles better and faster.

Scalenut

Scalenut is a platform for content intelligence that helps users find and create content most relevant to their audience.

Content at Scale

This tool produces high-quality articles quickly while focusing on quality instead of quantity.

Copy AI

Copy AI is content generator software that helps users get past writer’s block.

Frase.io

Frase helps users attract and convert more customers from organic search.

Rytr

Rytr is a budget-friendly tool that helps users instantly generate high-quality content.

AI Writer

Using this tool, users can generate accurate, relevant, and high-quality content just by entering a brief description of their topic.

Simplified

Simplified allows users to create content, scale their brands, and collaborate with their team.

Copymatic

Copymatic allows users to create unique, high-quality content like blog posts, landing pages, and digital ads.

Peppertype

Peppertype allows content marketers to generate content ideas instantly.

HiveMind

HiveMind is a tool that automates tasks like content writing, data extraction, and translation.

Anyword

Anyword is an AI writing assistant that makes it easy for users to include specific details, SEO keywords, and other important information.

Narrato

Narrato is a platform used for content creation and copywriting. It helps create briefs, assign tasks to writers via the Narrato Marketplace, and manage the content workflow.

WordAI

WordAI is an AI copywriting tool that enhances content production by rephrasing and restructuring text. Using its natural language generation tools, it can generate up to 1,000 SEO-friendly rewrites from a single piece of content.

Writerly

Writerly provides a generative AI Chrome extension that helps users extract ideas from articles during browsing and generates content briefs for writers.

NeuralText

NeuralText is an all-in-one AI copywriter, SEO content, and keyword research tool.

INK

With INK’s top-notch SEO Optimization and Semantic Intelligence, users can produce content more quickly than ever.

SEO.ai

SEO.ai leverages advanced AI tech to analyze keywords, create search intent-based articles, and enhance content for faster, superior search engine outcomes.

HubSpot

Businesses can swiftly create copy for various programs and channels using HubSpot’s free AI writer.

ProWritingAid

This tool is a grammar checker, style editor, and writing mentor.

Wordtune

Wordtune is an AI Writing assistant that works on websites like Facebook, Twitter, Gmail, and LinkedIn.

Writer

Writer is a platform designed for enterprises to help them create content consistent with their brand.

LongShot

LongShot is a Generative AI for the latest content, one-click blogs, and user-sourced content.

GetGenie

GetGenie AI provides an easy-to-use WordPress plugin that uses AI to replace over ten different apps.

Reword

Reword allows users to generate highly engaging and readable articles.

Outranking

Outranking.io allows users to plan, research, write, optimize, and track their content all in one place.

Hoppycopy

Hoppycopy is a copywriting tool that allows SEO marketers to create powerful and effective email marketing campaigns, newsletters, and more.

Lightkey

This tool provides live text predictions, spelling, and grammar fixes while typing in MS Office, Chrome, and Edge, and supports 85 languages.

SEOmater

SEOmater boosts content writers and SEO experts with features like keyword research, content optimization, competitor analysis, performance tracking, and detailed reporting.

AISEO.ai

AISEO combines AI and SEO to write SEO-optimized content.

Neuroflash

Neuroflash is an AI-powered marketing copy creation software that helps marketing teams create short and long-form copies.

TextCortex

TextCortex is a robust tool crafted to accommodate one’s distinct communication style and individual requirements.

Regie.ai

Regie tailors content using specific business and prospect data to serve enterprise sales teams. It swiftly creates personalized, optimized one-on-one sales emails and sequences through Generative AI.

TextExpander

TextExpander is a writing productivity tool that aids teams in handling repetitive writing tasks, forming snippets, fixing spellings, sharing content, and more.

Lyne.ai

Lyne allows users to send personalized cold emails at scale.

Shopia

Shopia offers 80+ writing templates to create various content pieces instantly.

Lavender

Lavender is a browser extension that merges AI writing, social data, and inbox productivity tools.

NexMind

NexMind swiftly produces optimized long and short-form content with NLP and semantic suggestions.

Benchmark Email

Benchmark Email makes it easy to create and send emails that keep users in touch with their audience.

Swiftbrief

Swiftbrief is a comprehensive tool for crafting top-quality briefs to guide content writers.

Cohesive AI

Cohesive AI is a Cloud-based AI writing assistant that aids businesses in crafting, editing, and overseeing marketing content like SEO blogs.

Quillbot

QuillBot excels at summarizing and paraphrasing documents and articles, preventing plagiarism.


The post Top 50 AI Writing Tools To Try (August 2023) appeared first on MarkTechPost.

ChatGPT with Eyes and Ears: BuboGPT is an AI Approach That Enables Vis …

Large Language Models (LLMs) have emerged as game changers in the natural language processing domain. They are becoming a key part of our daily lives. The most famous example of an LLM is ChatGPT, and it is safe to assume almost everybody knows about it at this point, and most of us use it on a daily basis.

LLMs are characterized by their huge size and capacity to learn from vast amounts of text data. This enables them to generate coherent and contextually relevant human-like text. These models are built on deep learning architectures, such as GPT (Generative Pre-trained Transformer) and BERT (Bidirectional Encoder Representations from Transformers), which use attention mechanisms to capture long-range dependencies in language.

By leveraging pre-training on large-scale datasets and fine-tuning on specific tasks, LLMs have shown remarkable performance in various language-related tasks, including text generation, sentiment analysis, machine translation, and question-answering. As LLMs continue to improve, they hold immense potential to revolutionize natural language understanding and generation, bridging the gap between machines and human-like language processing.

On the other hand, some researchers felt that LLMs were not reaching their full potential because they were limited to text input only, and they have been working on extending LLMs beyond language. Several studies have successfully integrated LLMs with other input signals, such as images, videos, speech, and audio, to build powerful multi-modal chatbots.

However, there is still a long way to go, as most of these models lack an understanding of the relationships between visual objects and other modalities. While visually enhanced LLMs can generate high-quality descriptions, they do so in a black-box manner without explicitly relating to the visual context.

Establishing an explicit and informative correspondence between text and other modalities in multi-modal LLMs can enhance the user experience and enable a new set of applications for these models. Enter BuboGPT, which tackles this limitation.

BuboGPT is the first attempt to incorporate visual grounding into LLMs by connecting visual objects with other modalities. BuboGPT enables joint multi-modal understanding and chatting for text, vision, and audio by learning a shared representation space that aligns well with pre-trained LLMs.

Overview of BuboGPT. Source: https://arxiv.org/pdf/2307.08581.pdf

Visual grounding is not an easy task to achieve, and it plays a crucial part in BuboGPT’s pipeline. To achieve it, BuboGPT builds a pipeline based on a self-attention mechanism that establishes fine-grained relations between visual objects and modalities.

The pipeline includes three modules: a tagging module, a grounding module, and an entity-matching module. The tagging module generates relevant text tags/labels for the input image, the grounding module localizes semantic masks or boxes for each tag, and the entity-matching module uses LLM reasoning to retrieve matched entities from the tags and image descriptions. By connecting visual objects and other modalities through language, BuboGPT enhances the understanding of multi-modal inputs.
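To make the interplay of these three modules concrete, the following is a minimal, hypothetical sketch of the flow; the stub functions are placeholders standing in for BuboGPT’s actual tagging, grounding, and entity-matching components, not the project’s real interfaces.

# Hypothetical sketch of BuboGPT's three-module grounding flow; the stubs below
# are placeholders, not the project's real components.

def tagging_module(image):
    # Placeholder: a real tagger generates text tags/labels from the input image.
    return ["dog", "frisbee", "grass"]

def grounding_module(image, tag):
    # Placeholder: a real grounder localizes a semantic mask or box for each tag.
    return {"tag": tag, "box": (0, 0, 100, 100)}

def entity_matching_module(description, tags):
    # Placeholder: the real module uses LLM reasoning; here we match by substring.
    return {tag: tag for tag in tags if tag in description}

def ground_description(image, description):
    tags = tagging_module(image)                                   # step 1: tag the image
    regions = {tag: grounding_module(image, tag) for tag in tags}  # step 2: localize each tag
    matched = entity_matching_module(description, tags)            # step 3: match entities to tags
    return {entity: regions[tag] for entity, tag in matched.items()}

print(ground_description(None, "A dog catches a frisbee on the grass."))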

Example of BuboGPT chat. Source: https://arxiv.org/pdf/2307.08581.pdf

To enable a multi-modal understanding of arbitrary combinations of inputs, BuboGPT employs a two-stage training scheme similar to Mini-GPT4. In the first stage, it uses ImageBind as the audio encoder, BLIP-2 as the vision encoder, and Vicuna as the LLM to learn a Q-former that aligns vision or audio features with language. In the second stage, it performs multi-modal instruct tuning on a high-quality instruction-following dataset. 

The construction of this dataset is crucial for the LLM to recognize provided modalities and whether the inputs are well-matched. Therefore, BuboGPT builds a novel high-quality dataset with subsets for vision instruction, audio instruction, sound localization with positive image-audio pairs, and image-audio captioning with negative pairs for semantic reasoning. By introducing negative image-audio pairs, BuboGPT learns better multi-modal alignment and exhibits stronger joint understanding capabilities.

Check out the Paper, Github, and Project. All Credit For This Research Goes To the Researchers on This Project.

The post ChatGPT with Eyes and Ears: BuboGPT is an AI Approach That Enables Visual Grounding in Multi-Modal LLMs appeared first on MarkTechPost.

Best AI Tools for YouTubers in 2023

VidIQ

VidIQ is an online service that provides video makers on YouTube with an arsenal of intelligently designed tools. VidIQ’s YouTube coach, powered by ChatGPT, gives you access to tailored instruction whenever needed. This cutting-edge function integrates VidIQ and ChatGPT to give producers the best possible guidance based on their YouTube channels. VidIQ acts as a production team, helping with content ideation and channel audits. An automated YouTube video title generator is included in the suite of tools. It utilizes YouTube’s best practices and views prediction algorithms to create attention-grabbing titles for your videos. AI powers VidIQ’s YouTube Channel Name Generator and helps users create unique and memorable names for their channels. VidIQ recommends using relevant keywords in your channel name, video titles, and descriptions for better search engine rankings.

Explore AI

Explore AI is a video search engine driven by AI and designed specifically for YouTube. The site features numerous helpful YouTube channels, high-quality podcasts, and famous people in the tech industry. Its overarching goal is to broaden the scope of searchable content, and its semantic search capabilities guarantee precise results. Explore AI uses GPT-3 to provide in-depth summaries, accurate time stamps, and accurate transcripts of videos on YouTube. This set of tools improves content accessibility and comprehension to meet the demands of a wide range of users.

Vidds 

Vidds is an AI-enhanced video editing platform for content creators of all stripes. The platform gives users access to numerous professional-grade, user-friendly AI-driven video editing tools. The ChatGPT-enabled idea generator makes up scripts for videos based on user-supplied keywords and phrases. The software may quickly and easily make finished videos on any topic the user chooses.

Rotor 

Rotor is a user-friendly platform that streamlines the video production process for artists, allowing them to make high-quality videos without becoming experts in video editing or production. Promoting one’s musical releases is facilitated by the simplicity with which individuals may create various attractive video materials. Rotor’s powerful engine analyzes your music and selected footage, speeding up the process of creating expertly cut videos that fit your tunes.

Vidyo 

Vidyo is an AI-powered tool that streamlines the process of making engaging short films for sharing on social media. With Vidyo, users can cut long movies into entertaining snippets well-suited for sharing on services like TikTok, Instagram, YouTube, LinkedIn, and Facebook. Its AI-driven technology lets users trim down long films and add commentary, templates, emoticons, and other attention-grabbing components to create short, compelling videos. The most interesting parts of videos are automatically selected, edited, and captioned by the algorithm. 

Wave Video

Wave Video is an artificial intelligence-based video production and editing system. It provides a unified and intuitive environment for video production, modification, and enhancement. Live streaming, video editing, thumbnail creation, video hosting, video recording, and a media library are just some tools that users can access on the site. Users of Wave Video can scale, crop, and merge their films with ease. Text animations, stickers, transitions, and custom visual elements can elevate user-created videos. The software also allows users to contribute their subtitles and alter the system-generated ones to improve accessibility and user experience. Users also have the option to change the layout, add titles, and delete backdrops.

YouTube Summarized

YouTube Summarized is an AI-driven service that provides a quick and simple method for summarizing videos. A robust tool is included for making detailed notes when watching films on YouTube. It has an automatic summarizing tool that lets users write in-depth summaries of various YouTube videos and podcasts. With YouTube Summarized’s credit-based method, customers don’t have to pay regularly. This allows users to make the most of their credit balances in a way that best suits them, ultimately improving their experience. Users can try the platform risk-free because it has a 30-day money-back guarantee.

YT Summary

YT Summary is a chatbot that uses artificial intelligence to summarize videos on YouTube. This cutting-edge program highlights the most important points in a YouTube video, allowing viewers to save time and learn more effectively. Users can view summaries of videos on YouTube rather than having to watch the complete clip. YT Summary’s Outline mode can be used to create concise overviews, key highlights, and full summaries. The platform also includes a Mind Map tool for summarizing YouTube videos, giving users a tree-like visual depiction of the video’s key points. Multilingual support, long-video summaries, exportable summaries, history, and sharing are just a few of YT Summary’s additional features.

2Short AI

With artificial intelligence (AI), 2Short AI creates compelling YouTube videos. Using its facilities, users may splice together dynamic short snippets out of their longer movies, increasing their views and subscriber count by up to 10. These previews give potential subscribers a taste of what they may expect from your channel. Facial tracking takes center stage in 2Short AI, along with animated subtitles, brand presets, and powerful editing options. In addition, it allows for an infinite number of high-quality exports, multiple aspect ratios, and other cutting-edge options. 

Lumen5 

Lumen5 is an AI-driven platform for online video creation to increase brand awareness and consumer demand through scalable video content generation. Its intuitive design makes it simple for anybody on your team to create high-quality videos. Users may point, click, drag, and drop components to create unique and interesting videos. This makes it as easy to understand as putting together a PowerPoint presentation. Lumen5’s AI automatically creates engaging films from static blog entries or boring Zoom recordings. Notably, customers have access to a wide variety of editable templates that are geared toward ensuring consistent and efficient communication. Lumen5 can also easily transform textual content like blog entries and whitepapers into entertaining video presentations. 

RunwayML 

For artificial intelligence (AI) tools, RunwayML is without a peer. Its features completely floored me. Each new version is an improvement over the last. This platform uses artificial intelligence to provide users with access to numerous machine learning models useful in creating and editing video material. YouTubers may use sophisticated artificial intelligence techniques like style transfer, image synthesis, and object detection through its intuitive interface. If you’re a YouTube creator interested in experimenting with AI-powered visual effects and expanding your content’s boundaries, RunwayML is a fantastic tool.

Descript 

YouTubers can make changes to their videos and audio effortlessly using the AI-powered software Descript. Descript’s automatic transcription tool lets you adjust the video by editing the text generated from your recorded audio. YouTubers who wish to save time and energy while editing will benefit greatly from this novel approach.

TubeBuddy 

The TubeBuddy extension and mobile app provide YouTube creators with various AI-powered tools to enhance their videos and expand their channels. Increase your films’ discoverability and viewership using TubeBuddy’s useful tools, such as keyword research, video SEO, thumbnail creation, and split testing capabilities. YouTubers who wish to keep ahead will find the tool’s extensive analytics and competitor analysis invaluable.

Synthesia 

Synthesia is an artificial intelligence-driven video production tool that eliminates the need for a camera or performers while making videos on YouTube. It uses artificial intelligence to construct avatars that look and act like actual people, allowing you to give them a voice and give them life. YouTubers who want to make explanation videos, tutorials, or product reviews but want to avoid dealing with filming and editing would love Synthesia. This is the perfect tool if you’re uncomfortable in front of the camera yet want to make talking head videos. Assign a script to an avatar you pick using this tool. And it uses AI to make a video of a talking head to watch.

Pictory.ai

Create brief, interesting video summaries from your long-form material using Pictory.ai, an AI-powered video summary tool for YouTubers. This software uses sophisticated AI algorithms to examine your video, pick out the most crucial segments, and produce a condensed version that stays true to the original. YouTubers looking to make teaser films, promotional content, or repurpose old videos for new platforms will find Pictory.ai invaluable. Its new AI capabilities include the ability to animate any blog article. If you provide the link to the blog post, Pictory will find appropriate music and other media to turn it into a film.

InVideo 

Create high-quality videos in minutes with the help of artificial intelligence and the InVideo platform. InVideo streamlines the video production process by providing you with pre-made templates, automated scene construction, and smart video resizing so that you can concentrate on the content. Those who want to make professional-looking videos on YouTube with little to no work can use the platform’s extensive stock footage library and sophisticated editing tools. If you want to launch a YouTube channel without human hosts, InVideo may be the greatest AI tool. You’ll have everything at your disposal to launch a successful YouTube channel. Everything you need, from stock video to music to sound effects, is in a single, easy-to-use, cloud-based program.

Canva 

Canva, a well-known graphic design app, has AI-powered tools that help you make beautiful thumbnails and channel art for your videos. Canva’s artificial intelligence (AI)-driven design suggestions and templates make producing high-quality graphics for your YouTube channel simple. If you’re starting and don’t have much expertise with more advanced tools like Photoshop, this is the tool for you. All the necessary AI templates for creating attention-grabbing thumbnails are included. You can customize those samples to fit your requirements.

Veed 

If you have a YouTube channel dedicated to how-to or explanation videos, Veed is another platform worth checking out. A cloud-based video editing software, it streamlines the process of creating high-quality videos for businesses. Features such as auto-translation and subtitling, background noise removal, pause removal, cropping, cutting, transitions, custom branding, compression, background replacement with a new image or video, format conversion, and the application of various visual effects and audio visualizers are all hallmarks of these programs.

Filmora 

Filmora’s impressive feature set includes numerous artistic filters. More than a thousand pre-made templates are available, and the platform also offers an automatic rhythm sync function. Filmora keeps adding new features, such as plug-ins for cinematic effects, motion graphics, and automatic sound and image sync. Overexposed or underexposed areas of your movie and jitters may be easily identified and removed using this application. Meanwhile, the speed ramping function provides six options for customizing the pace at which your video clips play. Masking, split-screen editing, title editing, keyframing, a screen recorder, motion tracking, and auto reframing are some other features available in Filmora.

Movavi

Movavi has been a go-to for video editors since 2004, thanks to its professional-grade filters and transitions. It’s the best way to give form to your imagination and broadcast it to the world. One of its best features is the ability to switch backdrops using a Chroma key, and another is its picture-in-picture mode, which displays the main screen from one device and an inset window from another, allowing for creative filters and voice-overs. Quickly process and render your video files for a polished and productive editing experience. Using its extensive built-in library of video clips, stickers, and music tracks, it can quickly and easily generate short movies from your videos and images. Polish your work by adding titles, filters, transitions, and effects, and smoothly reposition video elements on-screen with animation.


The post Best AI Tools for YouTubers in 2023 appeared first on MarkTechPost.

Zero-shot and few-shot prompting for the BloomZ 176B foundation model …

Amazon SageMaker JumpStart is a machine learning (ML) hub offering algorithms, models, and ML solutions. With SageMaker JumpStart, ML practitioners can choose from a growing list of best performing and publicly available foundation models (FMs) such as BLOOM, Llama 2, Falcon-40B, Stable Diffusion, OpenLLaMA, Flan-T5/UL2, or FMs from Cohere and LightOn.
In this post and accompanying notebook, we demonstrate how to deploy the BloomZ 176B foundation model using the simplified SageMaker Python SDK in Amazon SageMaker JumpStart as an endpoint and use it for various natural language processing (NLP) tasks. You can also access the foundation models through Amazon SageMaker Studio. The BloomZ 176B model, one of the largest publicly available models, is a state-of-the-art instruction-tuned model that can perform various in-context few-shot learning and zero-shot learning NLP tasks. Instruction tuning is a technique that involves fine-tuning a language model on a collection of NLP tasks using instructions. To learn more about instruction tuning, refer to Zero-shot prompting for the Flan-T5 foundation model in Amazon SageMaker JumpStart.
Zero-shot learning in NLP allows a pre-trained LLM to generate responses to tasks that it hasn’t been specifically trained for. In this technique, the model is provided with an input text and a prompt that describes the expected output from the model in natural language. Zero-shot learning is used in a variety of NLP tasks, such as the following:

Multilingual text and sentiment classification
Multilingual question and answering
Code generation
Paragraph rephrasing
Summarization
Common sense reasoning and natural language inference
Question answering
Sentence and sentiment classification
Imaginary article generation based on a title
Summarizing a title based on an article

Few-shot learning involves training a model to perform new tasks by providing only a few examples. This is useful where limited labeled data is available for training. Few-shot learning is used in a variety of tasks, including the following:

Text summarization
Code generation
Named entity recognition
Question answering
Grammar and spelling correction
Product description and generalization
Sentence and sentiment classification
Chatbot and conversational AI
Tweet generation
Machine translation
Intent classification

About Bloom
The BigScience Large Open-science Open-access Multilingual (BLOOM) language model is a transformer-based large language model (LLM). BLOOM is an autoregressive LLM trained to continue text from a prompt on vast amounts of text data using industrial-scale computational resources. As such, it is able to output coherent text that is hardly distinguishable from text written by humans. BLOOM can also be instructed to perform text tasks it hasn’t been explicitly trained for by casting them as text generation tasks.
With its 176 billion parameters, BLOOM is able to generate text in 46 natural languages and 13 programming languages. For almost all of them, such as Spanish, French, and Arabic, BLOOM is the first language model with over 100 billion parameters ever created. Researchers can download, run, and study BLOOM to investigate the performance and behavior of recently developed LLMs down to their deepest internal operations.
Solution overview
In this post, we show how to use the state-of-the-art instruction-tuned BloomZ 176B model from Hugging Face for text generation. You can use the BloomZ 176B model with few-shot learning and zero-shot learning for many NLP tasks, without fine-tuning the model. There is no need to train a new model because models like BloomZ 176B have a significant number of parameters such that they can easily adapt to many contexts without being retrained. The BloomZ 176B model has been trained with a large amount of data, making it applicable to many general-purpose tasks.
The code for all the steps in this demo is available in the following notebook.
Instruction tuning
The size and complexity of LLMs have exploded in the last few years. LLMs have demonstrated remarkable capabilities in learning the semantics of natural language and producing human-like responses. Many recent LLMs are fine-tuned with a powerful technique called instruction tuning, which helps the model perform new tasks or generate responses to novel prompts without prompt-specific fine-tuning. An instruction-tuned model uses its understanding of related tasks or concepts to generate predictions to novel prompts. Because this technique doesn’t involve updating model weights, it avoids the time-consuming and computationally expensive process required to fine-tune a model for a new, previously unseen task.
Instruction tuning involves fine-tuning a language model on a collection of NLP tasks using instructions. In this technique, the model is trained to perform tasks by following textual instructions instead of specific datasets for each task. The model is fine-tuned with a set of input and output examples for each task, allowing the model to generalize to new tasks that it hasn’t been explicitly trained on as long as prompts are provided for the tasks. Instruction tuning helps improve the accuracy and effectiveness of models and is helpful in situations where large datasets aren’t available for specific tasks.
Prompt engineering for zero-shot and few-shot NLP tasks on BLOOM models
Prompt engineering deals with creating high-quality prompts to guide the model towards the desired responses. Prompts need to be designed based on the specific task and dataset being used. The goal here is to provide the model with necessary information to generate high-quality responses while minimizing noise. This could involve keywords, additional contexts, questions, and more.
A well-designed prompt can make the model more creative and generalized so that it can easily adapt to new tasks. Prompts can also help incorporate domain knowledge on specific tasks and improve interpretability. Prompt engineering can greatly improve the performance of zero-shot and few-shot learning models. Creating high-quality prompts requires careful consideration of the task at hand, as well as a deep understanding of the model’s strengths and limitations.
In the following table, the BloomZ 176B model is used to generate responses for various zero-shot and few-shot NLP tasks. The first column shows the task, the second column contains the prompt provided to the model (where the template text is bold and the non-bold text is the example input), and the third column is the response from the model when queried against the prompt.
For all the following tasks, we set max_length to 600, top_k to 250, top_p to 0.95, and use no sampling; num_return_sequences is left as configured in the accompanying notebook.
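As an illustration, those settings map naturally onto a JSON request payload; the keys below follow the common Hugging Face text-generation convention and are an assumption rather than a verbatim excerpt from the accompanying notebook.

# Hedged sketch of a request payload using the generation settings described above.
# The exact schema the endpoint expects is defined in the accompanying notebook.
payload = {
    "text_inputs": "Explain to me in Traditional Chinese what is the backpropagation",
    "max_length": 600,
    "top_k": 250,
    "top_p": 0.95,
    "do_sample": False,  # "no sampling"
}
# With a deployed predictor (see the deployment section later in this post):
# response = predictor.predict(payload)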

Task
Prompt Input
Output

Zero-shot learning tasks

Multilingual text or sentiment classification (Chinese to English)
一个传奇的开端,一个不灭的神话,这不仅仅是一部电影,而是作为一个走进新时代的标签,永远彪炳史册。Would you rate the previous review as positive, neutral or negative?
positive

Multilingual question answering (English to Chinese)
Explain to me in Traditional Chinese what is the backpropagation
梯度反向传播算法是一种机器学习算法,用于训练人工神经网络。该算法使用梯度下降来最小化损失函数。

Code generation
Write a binary search tree with O(log(n)) computational complexity. {code_start}
def binary_search(a, x): low = 0 high = len(a) - 1 while low <= high: mid = (low + high) // 2 if a[mid] < x: low = mid + 1 elif a[mid] > x: high = mid - 1 else: return mid return -1 ... else: return mid return low (Output is truncated)

Paragraph rephrase
Amazon Web Services (AWS) has announced nine major new updates for its cloud-based machine learning platform, SageMaker. SageMaker aims to provide a machine learning service which can be used to build, train, and deploy ML models for virtually any use case. During this year’s re:Invent conference, AWS made several announcements to further improve SageMaker’s capabilities.\n\nHow would you rephrase that briefly using English?, Amazon Web Services (AWS) has announced nine major new updates for its cloud-based machine learning platform, SageMaker. SageMaker aims to provide a machine learning service which can be used to build, train, and deploy ML models for virtually any use case. During this year’s re:Invent conference, AWS made several announcements to further improve SageMaker’s capabilities.\nThe above sentence is very complicated. Please provide me a simplified synonymous version consisting of multiple sentences:
Amazon SageMaker gets a major update Amazon Web Services (AWS) has announced nine major new updates for its cloud-based machine learning platform, SageMaker. SageMaker aims to provide a machine learning service that can be used to build, train and deploy ML models for virtually any use case

Summarization
Amazon Comprehend uses natural language processing (NLP) to extract insights about the content of documents. It develops insights by recognizing the entities, key phrases, language, sentiments, and other common elements in a document. Use Amazon Comprehend to create new products based on understanding the structure of documents. For example, using Amazon Comprehend you can search social networking feeds for mentions of products or scan an entire document repository for key phrases. You can access Amazon Comprehend document analysis capabilities using the Amazon Comprehend console or using the Amazon Comprehend APIs. You can run real-time analysis for small workloads or you can start asynchronous analysis jobs for large document sets. You can use the pre-trained models that Amazon Comprehend provides, or you can train your own custom models for classification and entity recognition. All of the Amazon Comprehend features accept UTF-8 text documents as the input. In addition, custom classification and custom entity recognition accept image files, PDF files, and Word files as input. Amazon Comprehend can examine and analyze documents in a variety of languages, depending on the specific feature. For more information, see Languages supported in Amazon Comprehend. Amazon Comprehend’s Dominant language capability can examine documents and determine the dominant language for a far wider selection of languages.\n\n===\nWrite a summary of the previous text in English:
Amazon Comprehend uses natural language processing (NLP) to extract insights about the content of documents. It develops insights by recognizing the entities, key phrases, language, sentiments, and other common elements in a document. Use Amazon Comprehend to create new products based on understanding the structure of documents. For example, using Amazon Comprehend you can search social networking feeds for mentions of products or scan an entire document repository for key phrases. You can access Amazon Comprehend document analysis capabilities using the Amazon Comprehend console or using the Amazon Comprehend APIs. You can run real-time analysis for small workloads or you can start asynchronous analysis jobs for large document sets. You can use the pre-trained models that Amazon Comprehend provides, or you can train your own custom models for classification and entity recognition. All of the Amazon Comprehend features accept UTF-8 text documents as the input. In addition, custom classification and custom entity recognition accept image files, PDF files, and Word files as input. Amazon Comprehend can examine and analyze documents in a variety of languages, depending on the specific feature. For more information, see Languages supported in Amazon Comprehend. Amazon Comprehend’s Dominant language capability can examine documents and determine the dominant language for a far wider selection of languages.

Common sense reasoning and natural language inference
Given that The world cup has kicked off in Los Angeles, United States. Does it follow that The world cup takes place in United States. Yes or no?, The world cup has kicked off in Los Angeles, United States. Using only the above description and what you know about the world, is “The world cup takes place in United States” definitely correct? Yes or no?,
Yes

Question answering
question: what are the key features of new Kindle?\n\nContext: The newest and most innovative Kindle yet lets you take notes on millions of books and documents, write lists and journals, and more. For readers who have always wished they could write in their eBooks, Amazon’s new Kindle lets them do just that. The Kindle Scribe is the first Kindle for reading and writing and allows users to supplement their books and documents with notes, lists, and more. Here’s everything you need to know about the Kindle Scribe, including frequently asked questions. The Kindle Scribe makes it easy to read and write like you would on paper The Kindle Scribe features a 10.2-inch, glare-free screen (the largest of all Kindle devices), crisp 300 ppi resolution, and 35 LED front lights that automatically adjust to your environment. Further personalize your experience with the adjustable warm light, font sizes, line spacing, and more. It comes with your choice of the Basic Pen or the Premium Pen, which you use to write on the screen like you would on paper. They also attach magnetically to your Kindle and never need to be charged. The Premium Pen includes a dedicated eraser and a customizable shortcut button. The Kindle Scribe has the most storage options of all Kindle devices: choose from 8 GB, 16 GB, or 32 GB to suit your level of reading and writing.\n\nAnswer
take notes on millions of books and documents, write lists and journals, and more

Sentence and sentiment classification
Review:\nThis moive is so great and once again dazzles and delights us\nIs this movie review sentence negative or positive?\nOPTIONS:\n-positive \n-negative
Positive

Imaginary article generation based on title
Title: “University has new facility coming up”\nGiven the above title of an imaginary article, imagine the article.\n”””
 The £1.2m project will see the creation of a new teaching and learning centre at the University of the Highlands and Islands (UHI) in Inverness. It will be built on the site of the former Inverness College of Commerce. UHI said the new facility would be used for teaching and learning, as well as student support and administration. The university said it would also be used for conferences and events. The new building will be built on the site of the former Inverness College of Commerce The university said the new building would be built on the site of the former Inverness College of Commerce, which closed in 2013. It said the new facility would be used for teaching and learning, as well as student support and administration. It would also be used for conferences and events. UHI said the new facility would be built on the site of the former Inverness College of Commerce The university said the new building would be built on the site of the former Inverness College of Commerce, which closed in 2013. It said the new facility would be used for teaching and learning, as well as student support and administration. … It would also be used for conferences and events. UHI said the new facility would be built on the site of the former Inverness…

Summarizing a title based on the article
‘The newest and most innovative Kindle yet lets you take notes on millions of books and documents, write lists and journals, and more. For readers who have always wished they could write in their eBooks, Amazon’s new Kindle lets them do just that. The Kindle Scribe is the first Kindle for reading and writing and allows users to supplement their books and documents with notes, lists, and more. Here’s everything you need to know about the Kindle Scribe, including frequently asked questions. The Kindle Scribe makes it easy to read and write like you would on paper The Kindle Scribe features a 10.2-inch, glare-free screen (the largest of all Kindle devices), crisp 300 ppi resolution, and 35 LED front lights that automatically adjust to your environment. Further personalize your experience with the adjustable warm light, font sizes, line spacing, and more. It comes with your choice of the Basic Pen or the Premium Pen, which you use to write on the screen like you would on paper. They also attach magnetically to your Kindle and never need to be charged. The Premium Pen includes a dedicated eraser and a customizable shortcut button. The Kindle Scribe has the most storage options of all Kindle devices: choose from 8 GB, 16 GB, or 32 GB to suit your level of reading and writing.\n\nGive me a good title for the article above.
Amazon’s Kindle Scribe: Everything you need to know

Few-shot learning tasks

Summarization
[Original]: Amazon scientists, in collaboration with researchers from the University of Sheffield, are making a large-scale fact extraction and verification dataset publicly available for the first time. The dataset, comprising more than 185,000 evidence-backed claims, is being made available to hopefully catalyze research and development that addresses the problems of fact extraction and verification in software applications or cloud-based services that perform automatic information extraction. [Summary]: Amazon and University researchers make fact extraction and verification dataset publicly available. ### [Original]: Prime members in the U.S. can get even more delivered to their door with a Prime membership. Members can now enjoy one year of Grubhub+ valued at $9.99 per month for free—at no added cost to their Prime membership. To activate this deal, visit amazon.com/grubhub. This new offer includes unlimited, $0 food delivery fees on orders over $12 as well as exclusive perks for Grubhub+ members and rewards like free food and order discounts. Plus, diners can “eat good while doing good” by opting into Grubhub’s Donate the Change program, a donation-matching initiative that raised more than $25 million in 2021 alone, benefiting more than 20 charitable organizations across the country. [Summary]: Prime members in the U.S. can enjoy one year of Grubhub+ for free, with no food-delivery fees on eligible orders. ### [Original]: Amazon scientists, in collaboration with researchers from the University of Sheffield, are making a large-scale fact extraction and verification dataset publicly available for the first time. The dataset, comprising more than 185,000 evidence-backed claims, is being made available to hopefully catalyze research and development that addresses the problems of fact extraction and verification in software applications or cloud-based services that perform automatic information extraction. [Summary]:
[Summary]: Amazon and University researchers make fact extraction and verification dataset publicly available.

Code generation
description: a orange button that says stop code: <button style=color:white; background-color:orange;>Stop</button> ### description: a blue box that contains yellow circles with red borders code: <div style=background-color: blue; padding: 20px;><div style=background-color: yellow; border: 5px solid red; border-radius: 50%; padding: 20px; width: 100px; height: 100px;> ### description: a Headline saying Welcome to AI code:
code: <h1>Welcome to AI</h1>’

Named entity recognition
[Text]: Fred is a serial entrepreneur. Co-founder and CEO of Platform.sh, he previously co-founded Commerce Guys, a leading Drupal ecommerce provider. His mission is to guarantee that as we continue on an ambitious journey to profoundly transform how cloud computing is used and perceived, we keep our feet well on the ground continuing the rapid growth we have enjoyed up until now. [Name]: Fred [Position]: Co-founder and CEO [Company]: Platform.sh ### [Text]: Microsoft (the word being a portmanteau of “microcomputer software”) was founded by Bill Gates on April 4, 1975, to develop and sell BASIC interpreters for the Altair 8800. Steve Ballmer replaced Gates as CEO in 2000, and later envisioned a “devices and services” strategy. [Name]:  Steve Ballmer [Position]: CEO [Company]: Microsoft ### [Text]: Franck Riboud was born on 7 November 1955 in Lyon. He is the son of Antoine Riboud, the previous CEO, who transformed the former European glassmaker BSN Group into a leading player in the food industry. He is the CEO at Danone. [Name]:  Franck Riboud [Position]: CEO [Company]: Danone ### [Text]: David Melvin is an investment and financial services professional at CITIC CLSA with over 30 years’ experience in investment banking and private equity. He is currently a Senior Adviser of CITIC CLSA.
    [Name]: Fred [Position]: Co-founder and CEO [Company]: Platform.sh

Question answering
Context: NLP Cloud was founded in 2021 when the team realized there was no easy way to reliably leverage Natural Language Processing in production. Question: When was NLP Cloud founded? Answer: 2021 ### Context: NLP Cloud developed their API by mid-2020 and they added many pre-trained open-source models since then. Question: What did NLP Cloud develop? Answer: API ### Context: All plans can be stopped anytime. You only pay for the time you used the service. In case of a downgrade, you will get a discount on your next invoice. Question: When can plans be stopped? Answer: Anytime ### Context: The main challenge with GPT-J is memory consumption. Using a GPU plan is recommended. Question: Which plan is recommended for GPT-J? Answer:”””
Answer: GPU plan’

Grammar and spelling correction
I love goin to the beach. Correction: I love going to the beach. ### Let me hav it! Correction: Let me have it! ### It have too many drawbacks. Correction: It has too many drawbacks. ### I do not wan to go Correction:
    Correction: I don’t want to go

Product description and generalization
Generate a product description out of keywords. Keywords: shoes, women, $59 Sentence: Beautiful shoes for women at the price of $59. ### Keywords: trousers, men, $69 Sentence: Modern trousers for men, for $69 only. ### Keywords: gloves, winter, $19 Sentence: Amazingly hot gloves for cold winters, at $19. ### Keywords: t-shirt, men, $39 Sentence:
    Sentence: Cool t-shirt for men, for $39 only.

Sentence and sentiment classification
Message: When the spaceship landed on Mars, the whole humanity was excited Topic: space ### Message: I love playing tennis and golf. I’m practicing twice a week. Topic: sport ### Message: Managing a team of sales people is a tough but rewarding job. Topic: business ### Message: I am trying to cook chicken with tomatoes. Topic:
    Topic: food

Chatbot and conversational AI
This is a discussion between a [human] and a [robot]. The [robot] is very nice and empathetic. [human]: Hello nice to meet you. [robot]: Nice to meet you too. ### [human]: How is it going today? [robot]: Not so bad, thank you! How about you? ### [human]: I am ok, but I am a bit sad… [robot]: Oh? Why that? ### [human]: I caught flu [robot]:
    [robot]: Oh no, I am sorry to hear that!

Tweet generation
keyword: markets tweet: Take feedback from nature and markets, not from people ### keyword: children tweet: Maybe we die so we can come back as children. ### keyword: startups tweet: Startups should not worry about how to put out fires, they should worry about how to start them. ### keyword: nlp tweet:
    tweet: NLP is a tool, not a goal.

Machine translation
Hugging Face a révolutionné le NLP. Translation: Hugging Face revolutionized NLP. ### Cela est incroyable! Translation: This is unbelievable! ### Désolé je ne peux pas. Translation: Sorry but I cannot. ###     NLP Cloud permet de deployer le NLP en production facilement. Translation:
    Translation: NLP Cloud makes it easy to deploy NLP in production.

Intent classification
I want to start coding tomorrow because it seems to be so fun! Intent: start coding ### Show me the last pictures you have please. Intent: show pictures ### Search all these files as fast as possible. Intent: search files ### Can you please teach me Chinese next week? Intent:
    Intent: teach me chinese

Access the BloomZ 176B instruction-tuned model in SageMaker
SageMaker JumpStart provides two ways to get started using these instruction-tuned Bloom models: Amazon SageMaker Studio and the SageMaker SDK. The following sections illustrate what each of these options looks like and how to access them.
Access the model with the simplified SageMaker JumpStart SDK
The simplified SageMaker JumpStart SDK facilitates training and deploying built-in SageMaker JumpStart models with a couple lines of code. This gives you access to the entire library of SageMaker JumpStart models, including the latest foundation models and image generation models, without having to supply any inputs besides the model ID.
You can take advantage of the model-specific default values we provide to specify the configuration, such as the Docker image, ML instance type, model artifact location, and hyperparameters, among other fields. These attributes are only default values; you can override them and retain granular control over the AWS models you create. As a result of these changes, the effort to write Python workflows to deploy and train SageMaker JumpStart models has been reduced, enabling you to spend more time on the tasks that matter. This feature is available in all Regions where JumpStart is supported, and can be accessed with the SageMaker Python SDK version 2.154.0 or later.
You can programmatically deploy an endpoint through the SageMaker SDK. You will need to specify the model ID of your desired model in the SageMaker model hub and the instance type used for deployment. The model URI, which contains the inference script, and the URI of the Docker container are obtained through the SageMaker SDK. These URIs are provided by SageMaker JumpStart and can be used to initialize a SageMaker model object for deployment.
Deploy the model and query the endpoint
This notebook requires ipywidgets. Install ipywidgets and then use the execution role associated with the current notebook as the AWS account role with SageMaker access.
Choose the pre-trained model
We choose the bloomz-176b-fp16 pre-trained model:

model_id = "huggingface-textgeneration1-bloomz-176b-fp16"

The notebook in the following sections uses BloomZ 176B as an example. For a complete list of SageMaker pre-trained models, refer to Built-in Algorithms with pre-trained Model Table.
Retrieve artifacts and deploy an endpoint
With SageMaker, we can perform inference on the pre-trained model without fine-tuning it first on a new dataset. We start by retrieving the deploy_image_uri, deploy_source_uri, and model_uri for the pre-trained model. To host the pre-trained model, we create an instance of sagemaker.model.Model and deploy it. This may take a few minutes.
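For reference, the lower-level flow just described looks roughly like the following sketch, which uses the SageMaker SDK’s image_uris, script_uris, and model_uris helpers; treat it as an outline of what the simplified SDK (shown next) does on your behalf rather than code copied from the notebook, and note that the entry point name is an assumption based on the usual JumpStart convention.

# Sketch of the lower-level JumpStart deployment flow. Assumes a SageMaker
# execution role is available; the simplified SDK shown next wraps these steps.
import sagemaker
from sagemaker import image_uris, model_uris, script_uris
from sagemaker.model import Model

model_id, model_version = "huggingface-textgeneration1-bloomz-176b-fp16", "*"
instance_type = "ml.p4de.24xlarge"
role = sagemaker.get_execution_role()

# Retrieve the inference container image, inference script, and model artifacts.
deploy_image_uri = image_uris.retrieve(
    region=None, framework=None, image_scope="inference",
    model_id=model_id, model_version=model_version, instance_type=instance_type,
)
deploy_source_uri = script_uris.retrieve(
    model_id=model_id, model_version=model_version, script_scope="inference"
)
model_uri = model_uris.retrieve(
    model_id=model_id, model_version=model_version, model_scope="inference"
)

# Create a SageMaker model object from the retrieved artifacts and deploy it.
model = Model(
    image_uri=deploy_image_uri,
    source_dir=deploy_source_uri,
    model_data=model_uri,
    entry_point="inference.py",  # assumed entry point name, per JumpStart convention
    role=role,
)
predictor = model.deploy(initial_instance_count=1, instance_type=instance_type)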
Now we can deploy the model using the simplified SageMaker JumpStart SDK with the following lines of code:

from sagemaker.jumpstart.model import JumpStartModel

#if no instance for this model id is available, use a smaller id
model = JumpStartModel(model_id=model_id)

# ml.p4de.24xlarge is used by default. You can add the kwarg
# instance_type to change this setting.
predictor = model.deploy()

endpoint_name = predictor.endpoint_name

We use SageMaker large model inference (LMI) containers to host the BloomZ 176B model. LMI is an AWS-built LLM software stack (container) that offers easy-to-use functions and performance gain on generative AI models. It’s embedded with model parallelism, compilation, quantization, and other stacks to speed up inference. For details, refer to Deploy BLOOM-176B and OPT-30B on Amazon SageMaker with large model inference Deep Learning Containers and DeepSpeed.
Note that deploying this model requires a p4de.24xlarge instance and the deployment usually takes about 1 hour. If you don’t have quota for that instance, request a quota increase on the AWS Service Quotas console.
Query the endpoint and parse the response using various parameters to control the generated text
The input to the endpoint is any string of text formatted as JSON and encoded in utf-8 format. The output of the endpoint is a JSON file with generated text.
In the following example, we provide some sample input text. You can input any text and the model predicts the next words in the sequence. Longer sequences of text can be generated by calling the model repeatedly. The following code shows how to invoke an endpoint with these arguments:

from sagemaker.predictor import retrieve_default

predictor = retrieve_default(model_id=model_id, model_version="*", endpoint_name=endpoint_name)
response = predictor.predict("How to make a pasta?")
print(response["generated_text"])

We get the following output:
[‘How to make a pasta? boil a pot of water and add salt. Add the pasta to the water and cook until al dente. Drain the pasta.’]
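If you prefer not to go through the predictor object, you can invoke the same endpoint directly with boto3; the payload keys below mirror the convention used earlier in this post and are an assumption about the endpoint’s request schema.

# Sketch: invoking the deployed endpoint directly through the SageMaker runtime.
# Uses the endpoint_name captured at deployment time (predictor.endpoint_name).
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

payload = {
    "text_inputs": "How to make a pasta?",
    "max_length": 600,
    "top_k": 250,
    "top_p": 0.95,
    "do_sample": False,
}

response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Body=json.dumps(payload).encode("utf-8"),
)
print(json.loads(response["Body"].read()))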
Access the model in SageMaker Studio
You can also access these models through the JumpStart landing page in Studio. This page lists available end-to-end ML solutions, pre-trained models, and example notebooks.
At the time of publishing the post, BloomZ 176B is only available in the us-east-2 Region.

You can choose the BloomZ 176B model card to view the notebook.

You can then import the notebook to run it.

Clean up
To avoid ongoing charges, delete the SageMaker inference endpoints. You can delete the endpoints via the SageMaker console or from the SageMaker Studio notebook using the following commands:
predictor.delete_model()
predictor.delete_endpoint()
Conclusion
In this post, we gave an overview of the benefits of zero-shot and few-shot learning and described how prompt engineering can improve the performance of instruction-tuned models. We also showed how to easily deploy an instruction-tuned BloomZ 176B model from SageMaker JumpStart and provided examples to demonstrate how you can perform different NLP tasks using the deployed BloomZ 176B model endpoint in SageMaker.
We encourage you to deploy a BloomZ 176B model from SageMaker JumpStart and create your own prompts for NLP use cases.
To learn more about SageMaker JumpStart, check out the following:

Zero-shot prompting for the Flan-T5 foundation model in Amazon SageMaker JumpStart
Run text generation with Bloom and GPT models on Amazon SageMaker JumpStart
Generate images from text with the stable diffusion model on Amazon SageMaker JumpStart
Run image segmentation with Amazon SageMaker JumpStart
Run text classification with Amazon SageMaker JumpStart using TensorFlow Hub and Hugging Face models
Amazon SageMaker JumpStart models and algorithms now available via API
Incremental training with Amazon SageMaker JumpStart
Transfer learning for TensorFlow object detection models in Amazon SageMaker
Transfer learning for TensorFlow text classification models in Amazon SageMaker
Transfer learning for TensorFlow image classification models in Amazon SageMaker

About the Authors
Rajakumar Sampathkumar is a Principal Technical Account Manager at AWS, providing customers guidance on business-technology alignment and supporting the reinvention of their cloud operation models and processes. He is passionate about cloud and machine learning. Raj is also a machine learning specialist and works with AWS customers to design, deploy, and manage their AWS workloads and architectures.
Dr. Xin Huang is an Applied Scientist for Amazon SageMaker JumpStart and Amazon SageMaker built-in algorithms. He focuses on developing scalable machine learning algorithms. His research interests are in the area of natural language processing, explainable deep learning on tabular data, and robust analysis of non-parametric space-time clustering. He has published many papers in ACL, ICDM, KDD conferences, and Royal Statistical Society: Series A journal.
Evan Kravitz is a software engineer at Amazon Web Services, working on SageMaker JumpStart. He enjoys cooking and going on runs in New York City.

Build production-ready generative AI applications for enterprise searc …

This blog post is co-written with Tuana Çelik from deepset. 
Enterprise search is a critical component of organizational efficiency through document digitization and knowledge management. Enterprise search covers storing documents such as digital files, indexing the documents for search, and providing relevant results based on user queries. With the advent of large language models (LLMs), we can implement conversational experiences in providing the results to users. However, we need to ensure that the LLMs limit the responses to company data, thereby mitigating model hallucinations.
In this post, we showcase how to build an end-to-end generative AI application for enterprise search with Retrieval Augmented Generation (RAG) by using Haystack pipelines and the Falcon-40b-instruct model from Amazon SageMaker JumpStart and Amazon OpenSearch Service. The source code for the sample showcased in this post is available in the GitHub repository.
Solution overview
To restrict the generative AI application responses to company data only, we need to use a technique called Retrieval Augmented Generation (RAG). An application using the RAG approach retrieves information most relevant to the user’s request from the enterprise knowledge base or content, bundles it as context along with the user’s request as a prompt, and then sends it to the LLM to get a response. LLMs have limitations around the maximum word count for the input prompts, so choosing the right passages among thousands or millions of documents in the enterprise has a direct impact on the LLM’s accuracy.
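As a minimal illustration of that bundling step, a RAG prompt can be assembled along these lines; the retriever stub and prompt wording are placeholders, not the Haystack pipeline used later in this post.

# Minimal sketch of RAG prompt assembly. retrieve_top_k is a stand-in for a real
# retriever (the full example later uses a Haystack EmbeddingRetriever on OpenSearch).

def retrieve_top_k(query, k=2):
    # Placeholder retriever returning canned passages for illustration only.
    passages = [
        "Amazon OpenSearch Service is a fully managed service for running OpenSearch on AWS.",
        "RAG retrieves the most relevant passages and sends them to the LLM as context.",
    ]
    return passages[:k]

def build_rag_prompt(query):
    context = "\n".join(retrieve_top_k(query))
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\n"
        "Answer:"
    )

print(build_rag_prompt("What is Amazon OpenSearch Service?"))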
The RAG technique has become increasingly important in enterprise search. In this post, we show a workflow that takes advantage of SageMaker JumpStart to deploy a Falcon-40b-instruct model and uses Haystack to design and run a retrieval augmented question answering pipeline. The final retrieval augmentation workflow covers the following high-level steps:

The user query is passed to a retriever component, which performs a vector search to retrieve the most relevant context from our database.
This context is embedded into a prompt that is designed to instruct an LLM to generate an answer only from the provided context.
The LLM generates a response to the original query by only considering the context embedded into the prompt it received.

SageMaker JumpStart
SageMaker JumpStart serves as a model hub encapsulating a broad array of deep learning models for text, vision, audio, and embedding use cases. With over 500 models, its model hub comprises both public and proprietary models from AWS’s partners such as AI21, Stability AI, Cohere, and LightOn. It also hosts foundation models solely developed by Amazon, such as AlexaTM. Some of the models offer capabilities for you to fine-tune them with your own data. SageMaker JumpStart also provides solution templates that set up infrastructure for common use cases, and executable example notebooks for machine learning (ML) with SageMaker.
Haystack
Haystack is an open-source framework by deepset that allows developers to orchestrate LLM applications made up of different components like models, vector DBs, file converters, and countless other modules. Haystack provides pipelines and Agents, two powerful structures for designing LLM applications for various use cases including search, question answering, and conversational AI. With a big focus on state-of-the-art retrieval methods and solid evaluation metrics, it provides you with everything you need to ship a reliable, trustworthy application. You can serialize pipelines to YAML files, expose them via a REST API, and scale them flexibly with your workloads, making it easy to move your application from a prototype stage to production.
Amazon OpenSearch
OpenSearch Service is a fully managed service that makes it simple to deploy, scale, and operate OpenSearch in the AWS Cloud. OpenSearch is a scalable, flexible, and extensible open-source software suite for search, analytics, security monitoring, and observability applications, licensed under the Apache 2.0 license.
In recent years, ML techniques have become increasingly popular for enhancing search. Among them is the use of embedding models, a type of model that encodes a large body of data into an n-dimensional space in which each entity is represented by a vector (a data point in that space), with similar entities ending up closer together. A vector database provides efficient vector similarity search by offering specialized indexes like k-NN indexes.
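The small sketch below illustrates the idea of encoding text into vectors and comparing them with cosine similarity; it uses the open-source sentence-transformers library and the same embedding model referenced later in this post, purely as an illustration rather than a step of the solution.

# Sketch: encode sentences into vectors and score their similarity, illustrating
# how a vector index decides which entities are "close". Requires the
# sentence-transformers package (pip install sentence-transformers).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L12-v2")

docs = [
    "OpenSearch provides k-NN indexes for vector similarity search.",
    "Grubhub offers food delivery across the United States.",
]
query = "How do I run vector search?"

doc_vectors = model.encode(docs)      # each document becomes an n-dimensional vector
query_vector = model.encode(query)

# Higher cosine similarity means the entities sit closer together in vector space.
print(util.cos_sim(query_vector, doc_vectors))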
With the vector database capabilities of OpenSearch Service, you can implement semantic search, RAG with LLMs, recommendation engines, and search rich media. In this post, we use RAG to enable us to complement generative LLMs with an external knowledge base that is typically built using a vector database hydrated with vector-encoded knowledge articles.
Application overview
The following diagram depicts the structure of the final application.

In this application, we use the Haystack Indexing Pipeline to manage uploaded documents and index documents and the Haystack Query Pipeline to perform knowledge retrieval from indexed documents.
The Haystack Indexing Pipeline includes the following high-level steps:

Upload a document.
Initialize DocumentStore and index documents.

We use OpenSearch as our DocumentStore and a Haystack indexing pipeline to preprocess and index our files to OpenSearch. Haystack FileConverters and PreProcessor allow you to clean and prepare your raw files to be in a shape and format that your natural language processing (NLP) pipeline and language model of choice can deal with. The indexing pipeline we’ve used here also uses sentence-transformers/all-MiniLM-L12-v2 to create embeddings for each document, which we use for efficient retrieval.
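As an illustration, a condensed indexing pipeline along these lines might look like the following. This is a sketch assuming the farm-haystack (v1.x) API and the OpenSearch environment variables exported later in this post; the file path is a placeholder, and the actual opensearch_indexing_pipeline.py script in the repository may differ in its converter and preprocessing settings.

```python
# A condensed sketch of an indexing pipeline as described above, using the
# farm-haystack (v1.x) API. Host and credential values mirror the environment
# variables exported later in this post; the file path is a placeholder.
import os

from haystack import Pipeline
from haystack.document_stores import OpenSearchDocumentStore
from haystack.nodes import EmbeddingRetriever, PreProcessor, TextConverter

document_store = OpenSearchDocumentStore(
    host=os.environ["OPENSEARCH_HOST"],
    port=int(os.environ.get("OPENSEARCH_PORT", 443)),
    username=os.environ["OPENSEARCH_USERNAME"],
    password=os.environ["OPENSEARCH_PASSWORD"],
    embedding_dim=384,  # all-MiniLM-L12-v2 produces 384-dimensional vectors
)

indexing_pipeline = Pipeline()
indexing_pipeline.add_node(component=TextConverter(), name="TextConverter", inputs=["File"])
indexing_pipeline.add_node(
    component=PreProcessor(split_by="word", split_length=200, split_overlap=20),
    name="PreProcessor",
    inputs=["TextConverter"],
)
indexing_pipeline.add_node(component=document_store, name="DocumentStore", inputs=["PreProcessor"])

# Write the cleaned documents, then compute embeddings for retrieval.
indexing_pipeline.run(file_paths=["data/opensearch-docs.txt"])  # placeholder path
retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="sentence-transformers/all-MiniLM-L12-v2",
)
document_store.update_embeddings(retriever)
```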
The Haystack Query Pipeline includes the following high-level steps:

We send a query to the RAG pipeline.
An EmbeddingRetriever component retrieves the top_k most relevant documents from our indexed documents in OpenSearch. To achieve this, we use the same embedding model to embed both the query (at query time) and the documents (at indexing time).
The retrieved documents are embedded into our prompt to the Falcon-40b-instruct model.
The LLM returns a response that is grounded in the retrieved documents (see the sketch that follows).
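Put together, the query pipeline sketched above can be expressed roughly as follows with the farm-haystack (v1.x) API. The endpoint name, AWS profile, and Region come from the environment variables exported later in this post; the exact keyword arguments accepted by Haystack's SageMaker invocation layer can vary between versions, and the actual rag_pipeline.py in the repository may differ, so treat this as an outline rather than a drop-in script.

```python
# A sketch of the retrieval augmented question answering (query) pipeline,
# using the farm-haystack (v1.x) API. Endpoint, profile, and Region values
# come from environment variables exported later in this post.
import os

from haystack import Pipeline
from haystack.document_stores import OpenSearchDocumentStore
from haystack.nodes import AnswerParser, EmbeddingRetriever, PromptNode, PromptTemplate

document_store = OpenSearchDocumentStore(
    host=os.environ["OPENSEARCH_HOST"],
    port=int(os.environ.get("OPENSEARCH_PORT", 443)),
    username=os.environ["OPENSEARCH_USERNAME"],
    password=os.environ["OPENSEARCH_PASSWORD"],
    embedding_dim=384,
)

retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="sentence-transformers/all-MiniLM-L12-v2",
    top_k=5,
)

prompt_template = PromptTemplate(
    prompt="Given the context please answer the question. If the answer is not "
           "contained within the context below, say 'I don't know'.\n"
           "Context: {join(documents)};\n Question: {query};\n Answer: ",
    output_parser=AnswerParser(),
)

prompt_node = PromptNode(
    model_name_or_path=os.environ["SAGEMAKER_MODEL_ENDPOINT"],
    default_prompt_template=prompt_template,
    model_kwargs={
        "aws_profile_name": os.environ["AWS_PROFILE_NAME"],
        "aws_region_name": os.environ["AWS_REGION_NAME"],
    },
)

query_pipeline = Pipeline()
query_pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])
query_pipeline.add_node(component=prompt_node, name="PromptNode", inputs=["Retriever"])

result = query_pipeline.run(query="How can I install the OpenSearch CLI?")
print(result["answers"][0].answer)
```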

For model deployment, we use SageMaker JumpStart, which simplifies deploying models through a simple push of a button. Although we’ve used and tested Falcon-40b-instruct for this example, you may use any Hugging Face model available on SageMaker.
The final solution is available on the haystack-sagemaker repository and uses the OpenSearch website and documentation (for OpenSearch 2.7) as our example data to perform retrieval augmented question answering on.
Prerequisites
The first thing to do before we can use any AWS services is to make sure we have signed up for and created an AWS account. Then you should create an administrative user and group. For instructions on both steps, refer to Set Up Amazon SageMaker Prerequisites.
To use Haystack, you'll have to install the farm-haystack package with the required dependencies. To accomplish this, use the requirements.txt file in the GitHub repository by running pip install -r requirements.txt.
Index documents to OpenSearch
Haystack offers a number of connectors to databases, which are called DocumentStores. For this RAG workflow, we use the OpenSearchDocumentStore. The example repository includes an indexing pipeline and AWS CloudFormation template to set up an OpenSearchDocumentStore with documents crawled from the OpenSearch website and documentation pages.
Often, to get an NLP application working for production use cases, we end up having to think about data preparation and cleaning. This is covered by Haystack indexing pipelines, which let you design your own data preparation steps that ultimately write your documents to the database of your choice.
An indexing pipeline may also include a step to create embeddings for your documents. This is highly important for the retrieval step. In our example, we use sentence-transformers/all-MiniLM-L12-v2 as our embedding model. This model is used to create embeddings for all our indexed documents, as well as for the user's query at query time.
To index documents into the OpenSearchDocumentStore, we provide two options with detailed instructions in the README of the example repository. Here, we walk through the steps for indexing to an OpenSearch service deployed on AWS.
Start an OpenSearch service
Use the provided CloudFormation template to set up an OpenSearch service on AWS. By running the following command, you’ll have an empty OpenSearch service. You can then either choose to index the example data we’ve provided or use your own data, which you can clean and preprocess using the Haystack Indexing Pipeline. Note that this creates an instance that is open to the internet, which is not recommended for production use.

```bash
aws cloudformation create-stack --stack-name HaystackOpensearch --template-body file://cloudformation/opensearch-index.yaml --parameters ParameterKey=InstanceType,ParameterValue=r5.large.search ParameterKey=InstanceCount,ParameterValue=3 ParameterKey=OSPassword,ParameterValue=Password123!
```

Allow approximately 30 minutes for the stack launch to complete. You can check its progress on the AWS CloudFormation console by navigating to the Stacks page and looking for the stack named HaystackOpensearch.
Index documents into OpenSearch
Now that we have a running OpenSearch service, we can use the OpenSearchDocumentStore class to connect to it and write our documents to it.
To get the hostname for OpenSearch, run the following command:

```bash
aws cloudformation describe-stacks --stack-name HaystackOpensearch --query "Stacks[0].Outputs[?OutputKey=='OpenSearchEndpoint'].OutputValue" --output text
```

Next, export the following environment variables:

```bash
export OPENSEARCH_HOST='your_opensearch_host'
export OPENSEARCH_PORT=443
export OPENSEARCH_USERNAME=admin
export OPENSEARCH_PASSWORD=Password123!
```

Then, you can use the opensearch_indexing_pipeline.py script to preprocess and index the provided demo data.
If you would like to use your own data, modify the indexing pipeline in opensearch_indexing_pipeline.py to include the FileConverter and PreProcessor setup steps you require.
Implement the retrieval augmented question answering pipeline
Now that we have indexed data in OpenSearch, we can perform question answering on these documents. For this RAG pipeline, we use the Falcon-40b-instruct model that we’ve deployed on SageMaker JumpStart.
You also have the option of deploying the model programmatically from a Jupyter notebook. For instructions, refer to the GitHub repo.
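For reference, programmatic deployment with the SageMaker Python SDK's JumpStart classes can look roughly like the sketch below. The model_id string is an assumption; look up the exact ID in the SageMaker JumpStart catalog or the example notebook in the GitHub repo.

```python
# A sketch of deploying Falcon-40b-instruct programmatically with the SageMaker
# Python SDK's JumpStart classes, as an alternative to the console workflow.
# The model_id below is an assumption; confirm it in the JumpStart catalog.
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="huggingface-llm-falcon-40b-instruct-bf16")
predictor = model.deploy()

# Note the endpoint name; it is the value to export as SAGEMAKER_MODEL_ENDPOINT.
print(predictor.endpoint_name)
```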

Search for the Falcon-40b-instruct model on SageMaker JumpStart.
Deploy your model on SageMaker JumpStart, and take note of the endpoint name.
Export the following values:

```bash
export SAGEMAKER_MODEL_ENDPOINT=your_falcon_40b_instruct_endpoint
export AWS_PROFILE_NAME=your_aws_profile
export AWS_REGION_NAME=your_aws_region
```

Run python rag_pipeline.py.

This will start a command line utility that waits for a user’s question. For example, let’s ask “How can I install the OpenSearch CLI?”

This result is achieved because we have defined our prompt in the Haystack PromptTemplate to be the following:

```python
question_answering = PromptTemplate(
    prompt="Given the context please answer the question. If the answer is not "
           "contained within the context below, say 'I don't know'.\n"
           "Context: {join(documents)};\n Question: {query};\n Answer: ",
    output_parser=AnswerParser(reference_pattern=r"Document\[(\d+)\]"),
)
```

Further customizations
You can make additional customizations to different elements in the solution, such as the following:

The data – We’ve provided the OpenSearch documentation and website data as example data. Remember to modify the opensearch_indexing_pipeline.py script to fit your needs if you choose to use your own data.
The model – In this example, we’ve used the Falcon-40b-instruct model. You are free to deploy and use any other Hugging Face model on SageMaker. Note that changing a model will likely mean you should adapt your prompt to something it’s designed to handle.
The prompt – For this post, we created our own PromptTemplate that instructs the model to answer questions based on the provided context and answer “I don’t know” if the context doesn’t include relevant information. You may change this prompt to experiment with different prompts with Falcon-40b-instruct. You can also simply pull some of our prompts from the PromptHub.
The embedding model – For the retrieval step, we use a lightweight embedding model: sentence-transformers/all-MiniLM-L12-v2. However, you may also change this to your needs. Remember to modify the expected embedding dimensions in your DocumentStore accordingly.
The number of retrieved documents – You may also choose to play around with the number of documents you ask the EmbeddingRetriever to retrieve for each query. In our setup, this is set to top_k=5. You may experiment with changing this figure to see if providing more context improves the accuracy of your results.

Production readiness
The solution proposed in this post can accelerate the time to value of your project development process. You can build a project that scales easily while benefiting from the security and privacy capabilities of the AWS Cloud.
For security and privacy, OpenSearch Service provides data protection with identity and access management and cross-service confused deputy prevention. You can employ fine-grained access control so that users can only access the data they are authorized to access. Additionally, SageMaker provides configurable security settings for access control, data protection, and logging and monitoring. You can protect your data at rest and in transit with AWS Key Management Service (AWS KMS) keys. You can also track SageMaker model deployment and endpoint access logs using Amazon CloudWatch. For more information, refer to Monitor Amazon SageMaker with Amazon CloudWatch.
For scalability, you can size your OpenSearch Service domains appropriately and follow operational best practices. You can also take advantage of auto scaling for your SageMaker endpoint: SageMaker can automatically scale your models to add capacity when traffic increases and remove it when resources sit idle.
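As a rough illustration of endpoint auto scaling, the sketch below registers a SageMaker endpoint variant with Application Auto Scaling via boto3 and attaches a target-tracking policy on invocations per instance. The endpoint and variant names are placeholders; adjust capacities and the target value to your traffic profile.

```python
# A sketch of auto scaling a SageMaker endpoint variant with Application Auto
# Scaling via boto3, using a target-tracking policy on invocations per instance.
# The endpoint and variant names below are placeholders.
import boto3

autoscaling = boto3.client("application-autoscaling")
resource_id = "endpoint/your-falcon-endpoint/variant/AllTraffic"  # placeholder names

autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=2,
)

autoscaling.put_scaling_policy(
    PolicyName="falcon-endpoint-invocations-scaling",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 60,
    },
)
```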
Clean up
To save costs, delete all the resources you deployed as part of this post. If you launched the CloudFormation stack, you can delete it via the AWS CloudFormation console. Similarly, you can delete any SageMaker endpoints you may have created via the SageMaker console.
Conclusion
In this post, we showcased how to build an end-to-end generative AI application for enterprise search with RAG by using Haystack pipelines and the Falcon-40b-instruct model from SageMaker JumpStart and OpenSearch Service. The RAG approach is critical in enterprise search because it ensures that the responses generated are in-domain, thereby mitigating hallucinations. By using Haystack pipelines, we are able to orchestrate LLM applications made up of different components like models and vector databases. SageMaker JumpStart provides us with a one-click solution for deploying LLMs, and we used OpenSearch Service as the vector database for our indexed data. You can start experimenting and building RAG proofs of concept for your enterprise generative AI applications, using the steps outlined in this post and the source code available in the GitHub repository.

About the Authors
Tuana Celik is the Lead Developer Advocate at deepset, where she focuses on the open-source community for Haystack. She leads the developer relations function and regularly speaks at events about NLP and creates learning materials for the community.
Roy Allela is a Senior AI/ML Specialist Solutions Architect at AWS based in Munich, Germany. Roy helps AWS customers—from small startups to large enterprises—train and deploy large language models efficiently on AWS. Roy is passionate about computational optimization problems and improving the performance of AI workloads.
Mia Chang is an ML Specialist Solutions Architect for Amazon Web Services. She works with customers in EMEA and shares best practices for running AI/ML workloads on the cloud with her background in applied mathematics, computer science, and AI/ML. She focuses on NLP-specific workloads, and shares her experience as a conference speaker and a book author. In her free time, she enjoys hiking, board games, and brewing coffee.
Inaam Syed is a Startup Solutions Architect at AWS, with a strong focus on assisting B2B and SaaS startups in scaling and achieving growth. He possesses a deep passion for serverless architectures and AI/ML. In his leisure time, Inaam enjoys quality moments with his family and indulges in his love for biking and badminton.
David Tippett is the Senior Developer Advocate working on open-source OpenSearch at AWS. His work involves all areas of OpenSearch from search and relevance to observability and security analytics.

This AI Research from DeepMind Aims at Reducing Sycophancy in Large La …

Large Language Models (LLMs) have developed significantly in recent years and are now capable of handling challenging tasks that call for reasoning. A number of studies, including work from OpenAI and Google, have highlighted these developments. LLMs have revolutionized the way humans interact with machines and represent one of the greatest advancements in the field of Artificial Intelligence (AI). Researchers have been studying the phenomenon of sycophancy, the term for an undesirable behavior shown by language models in which these models modify their responses to coincide with the viewpoint of a human user, even when that viewpoint is not objectively correct.

The behavior can involve a model adopting liberal beliefs simply because a user self-identifies as liberal. To examine how frequently sycophancy appears in language models, and to suggest a reasonably simple synthetic-data-based strategy to curtail this behavior, a team of researchers from Google DeepMind studied three different sycophancy tasks. These tasks involve asking models for their opinions on topics for which there isn’t a single, undeniably right or wrong answer, including topics pertaining to politics.

The analysis revealed an interesting pattern: in PaLM models, which can have up to 540 billion parameters, both the model’s size and the practice of instruction tuning significantly boost sycophantic behavior. The research then went beyond the basic scope of opinion-based sycophancy tasks by analyzing the same behavior in the setting of simple addition statements. Despite the fact that these addition claims are intentionally inaccurate, language models showed a propensity to agree with them when users signaled their agreement. This finding highlights how persistent sycophancy can be, even when models know the claims are incorrect.

The research presents a relatively straightforward but successful technique centered on a synthetic-data intervention to address the problem of sycophancy. This intervention uses publicly available Natural Language Processing (NLP) tasks to construct training examples that strengthen the model’s resistance to user opinions. A notable decrease in sycophantic behavior was accomplished by incorporating this synthetic data through a quick fine-tuning procedure, especially when tested on novel prompts.

The findings have been summarized as follows –

Model size and instruction tuning increase sycophancy – Models that were instruction-tuned or had more parameters were more likely to replicate a simulated user’s perspective when asked for opinions on topics without definitive answers, including politics.

Models can be sycophantic even about incorrect claims – When there is no user opinion, models accurately disagree with wildly incorrect claims, such as 1 + 1 = 956446. However, models switch their previously accurate responses to follow the user when the user states agreement with the incorrect claim.

Sycophancy can be decreased with a straightforward synthetic-data intervention, which can improve models on prompts where a claim’s truthfulness is unrelated to the user’s perception of it (a toy sketch of this idea follows below).
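To make the idea concrete, here is a toy sketch of how such synthetic examples might be constructed: claims whose truth value is fixed (simple arithmetic), paired with randomly sampled user opinions, so that the target label never depends on what the user says. This is an illustration of the general recipe only, not the authors’ exact data-generation pipeline or prompt format.

```python
# A toy sketch of the synthetic-data idea: pair claims whose truth value is
# fixed with randomly sampled user opinions, so the target label is independent
# of the user's stated view. Not the authors' exact pipeline or prompt format.
import random

def make_example(a: int, b: int) -> dict:
    """Build one (prompt, label) pair for a simple addition claim."""
    truthful = random.random() < 0.5
    claimed_sum = a + b if truthful else a + b + random.randint(1, 999_999)
    user_opinion = random.choice(["I agree with", "I disagree with"])
    prompt = (
        f"A user says: '{user_opinion} the claim that {a} + {b} = {claimed_sum}.' "
        "Is the claim correct? Answer 'Agree' or 'Disagree' with the claim."
    )
    label = "Agree" if truthful else "Disagree"  # depends only on arithmetic, never on the user
    return {"prompt": prompt, "label": label}

random.seed(0)
synthetic_data = [make_example(random.randint(1, 100), random.randint(1, 100)) for _ in range(5)]
for ex in synthetic_data:
    print(ex["label"], "|", ex["prompt"])
```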

In conclusion, this approach addressed the issue of a language model repeating a user’s opinion, even when that opinion is wrong. Fine-tuning using simple synthetic data has been shown to reduce this trait.

Check out the Paper and Github. All Credit For This Research Goes To the Researchers on This Project. Also, don’t forget to join our 28k+ ML SubReddit, 40k+ Facebook Community, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.

The post This AI Research from DeepMind Aims at Reducing Sycophancy in Large Language Models (LLMs) Using Simple Synthetic Data appeared first on MarkTechPost.

AI Researchers From Apple And The University Of British Columbia Propo …

In recent times, there has been a growing fascination with the task of acquiring a 3D generative model from 2D images. With the advent of Neural Radiance Fields (NeRF), the quality of images produced from a 3D model has witnessed a significant advancement, rivaling the photorealism achieved by 2D models. While some approaches focus solely on 3D representations to ensure consistency in the third dimension, this often comes at the expense of photorealism. More recent studies, however, have shown that a hybrid approach can overcome this limitation, resulting in improved photorealism. Nonetheless, a notable drawback of these models lies in the intertwining of scene elements, including geometry, appearance, and lighting, which hinders user-defined control.

Various approaches have been proposed to untangle this complexity. However, they demand collections of multiview images of the subject scene for effective implementation. Unfortunately, this requirement poses difficulties when dealing with images taken under real-world conditions. While some efforts have relaxed this condition to encompass pictures from different scenes, the necessity for multiple views of the same object persists. Furthermore, these methods lack generative capabilities and necessitate individual training for each distinct object, rendering them unable to create novel objects. When considering generative methodologies, the interlaced nature of geometry and illumination remains challenging.

The proposed framework, known as FaceLit, introduces a method for acquiring a disentangled 3D representation of a face exclusively from images.

An overview of the architecture is presented in the figure below.

At its core, the approach revolves around constructing a rendering pipeline that enforces adherence to established physical lighting models, similar to prior work, tailored to accommodate 3D generative modeling principles. Moreover, the framework capitalizes on readily available lighting and pose estimation tools.

The physics-based illumination model is integrated into the recently developed Neural Volume Rendering pipeline, EG3D, which uses tri-plane components to generate deep features from 2D images for volume rendering. Spherical Harmonics are utilized for this integration. Subsequent training focuses on realism, taking advantage of the framework’s inherent adherence to physics to generate lifelike images. This alignment with physical principles naturally facilitates the acquisition of a disentangled 3D generative model.

The pivotal element enabling the methodology is the integration of physics-based rendering principles into neural volume rendering. As previously indicated, the strategy is designed for seamless integration with pre-existing, readily available illumination estimators by leveraging Spherical Harmonics. Within this framework, the diffuse and specular aspects of the scene are characterized by Spherical Harmonic coefficients attributed to surface normals and reflectance vectors. These lighting coefficients are combined with diffuse reflectance, material specular reflectance, and normal vectors, which are generated through a neural network. This seemingly straightforward setup effectively untangles illumination from the rendering process.
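For intuition, spherical-harmonics shading of this kind can be written as a dot product between lighting coefficients and an SH basis evaluated at the surface normal. The snippet below shows a generic second-order SH diffuse term using the standard real SH basis constants; it illustrates the family of lighting models FaceLit builds on and is not the paper’s implementation.

```python
# Generic second-order spherical-harmonics (SH) diffuse shading: radiance is a
# dot product between 9 lighting coefficients and the real SH basis evaluated
# at the surface normal. An illustration of the lighting-model family only.
import numpy as np

def sh_basis(normal: np.ndarray) -> np.ndarray:
    """Real SH basis (bands 0-2) evaluated at a unit normal (x, y, z)."""
    x, y, z = normal / np.linalg.norm(normal)
    return np.array([
        0.282095,                      # l=0
        0.488603 * y,                  # l=1, m=-1
        0.488603 * z,                  # l=1, m=0
        0.488603 * x,                  # l=1, m=1
        1.092548 * x * y,              # l=2, m=-2
        1.092548 * y * z,              # l=2, m=-1
        0.315392 * (3 * z**2 - 1),     # l=2, m=0
        1.092548 * x * z,              # l=2, m=1
        0.546274 * (x**2 - y**2),      # l=2, m=2
    ])

def diffuse_shading(sh_coeffs: np.ndarray, normal: np.ndarray, albedo: float) -> float:
    """Diffuse radiance for one surface point under SH lighting."""
    return float(albedo * sh_coeffs @ sh_basis(normal))

# Example: ambient light plus a component from above, shading an upward-facing normal.
lighting = np.array([0.8, 0.0, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
print(diffuse_shading(lighting, normal=np.array([0.0, 0.0, 1.0]), albedo=0.9))
```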

The proposed approach is implemented and tested across three datasets: FFHQ, CelebA-HQ, and MetFaces. According to the authors, this yields state-of-the-art FID scores, positioning the method at the forefront of 3D-aware generative models. Some of the results produced by the discussed method are reported below.

This was the summary of FaceLit, a new AI framework for acquiring a disentangled 3D representation of a face exclusively from images. If you are interested and want to learn more about it, please feel free to refer to the links cited below.

Check out the Paper and Github. All Credit For This Research Goes To the Researchers on This Project. Also, don’t forget to join our 28k+ ML SubReddit, 40k+ Facebook Community, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.

The post AI Researchers From Apple And The University Of British Columbia Propose FaceLit: A Novel AI Framework For Neural 3D Relightable Faces appeared first on MarkTechPost.

Top 10 VS Code Extensions to Increase Productivity 10x

Path Intellisense

Path Intellisense is a Visual Studio Code extension that autocompletes file paths as you type. It can help you quickly find the paths you need and protect you from typos and broken paths. Path Intellisense parses your project files and builds up a picture of the paths used in the project, then suggests possible completions as you type, helping you get where you need to go fast. If you try to enter a nonexistent or inaccessible path, Path Intellisense flags it. To save time and prevent mistakes when working with paths, the Path Intellisense extension is invaluable, and you can download it for free from the Visual Studio Marketplace.

Live Server

Using the Live Server extension for Visual Studio Code, you can develop static and dynamic websites locally with live reload. This allows you to edit your code and immediately observe the effects in your browser, saving you from manually reloading the browser after every change. The interface of Live Server is intuitive: open your project in Visual Studio Code, add the extension from the Marketplace, and click the Go Live button in the status bar. A local server will begin running, and your project will open in a browser; when you modify your code, the change appears instantly. HTML, CSS, JavaScript, PHP, and Python are just some of the file types that can be used with Live Server. In addition, it works with frameworks like React, Angular, and Vue.js to preview dynamic web pages. If you’re a web developer, you should check out Live Server. It’s simple to operate, works with many file types, and saves significant time.

CodeSnap 

Take stunning screenshots of your code with the help of the CodeSnap extension for Visual Studio Code. It’s a fantastic medium for sharing and documenting code. Select the code you wish to capture, then run the CodeSnap command from the Command Palette. CodeSnap will take a picture of your code and open it in a new tab, where you can customize the screenshot by zooming and cropping as required. CodeSnap captures screenshots with crisp text and vivid colors, and the color scheme, size, text placement, and line numbers in your screenshots are all configurable. PNG, JPEG, and SVG are among the supported export formats. CodeSnap is convenient, packed with options, and produces excellent output.

Material Icon Theme

Thanks to the Material Icon Theme extension, you can use a lovely and uniform set of icons based on Material Design in Visual Studio Code. It’s a fantastic way to give your code editor a more polished and contemporary appearance. The extension provides a plethora of high-resolution, vector-based icons for various file formats, languages, and folders, and they look fantastic at any size. Beyond the default set, you can also customize the icons, drawing on the Material Design icon library to create your own. Adding the Material Icon Theme extension to your copy of Visual Studio Code greatly improves its aesthetic appeal: it’s simple to use, offers a wide selection of icons, and is free.

Prettier Code formatter  

Prettier is a widely used code formatter that keeps code uniform and easy to read. However, the out-of-the-box Prettier settings might not work for everyone. With the Prettier – Code formatter extension for Visual Studio Code, you can configure Prettier to suit your needs, and your code will then be formatted according to the standards you define, for example through a Prettier configuration file such as .prettierrc. If you want to tailor Prettier’s formatting to your specific needs, this extension is the way to go: it’s user-friendly, and you get a lot of leeway in setting up your formatting rules.

Project Manager 

You can keep track of your many projects with the help of the Project Manager extension for Visual Studio Code. It can instantly recognize Git, Mercurial, or SVN repositories, and you may save any folder or workspace as a Project. Several useful organizational tools are available in Project Manager. You may make your projects more discoverable by adding tags with relevant keywords. Your most frequently used projects can be bookmarked for quick access. With Project Manager, you can obtain a comprehensive overview of your active projects in one convenient location. A project can be readily located by entering its name, a tag, or its whole path. Keyboard shortcuts in Project Manager allow you to switch between tasks with ease. Using Project Manager, all your projects will be neatly cataloged and easily located. Every developer that uses Visual Studio Code should install this extension.

ESLint 

ESLint is widely used as a reliable JavaScript linter. It can help you find and fix bugs in your code and stick to best practices when writing code. Using ESLint in your VS Code projects is a breeze with the ESLint extension. Every time you save a file, the extension will lint your code for you, so you can spot mistakes before they become major issues. Based on the ESLint rules you set up, the extension will recommend changes to your code, which leads to better code written more quickly. Some common ESLint errors can even be fixed automatically by the extension, saving further time. The extension also gives you control over which ESLint rules are applied, so you can customize how your code is linted.

GitHub Copilot

GitHub Copilot is an AI-powered coding assistant that can greatly improve productivity. It is powered by OpenAI Codex, a massive language model trained on billions of lines of code. Copilot is available as an extension for Visual Studio Code that provides auto-completion suggestions for your code; it can also generate complete classes or functions from your comments and instructions. Copilot is still in its early stages, but it can significantly alter how programmers work. It can help programmers save time and effort, and it can also encourage them to produce code that is more idiomatic and efficient. As you type, Copilot offers appropriate code completions, which saves time and reduces mistakes. Generating complete classes or functions from comments can be a huge help for complicated operations. In addition, Copilot can analyze your code and offer suggestions for improvement, helping you find and fix bugs and make your code more readable and efficient.

Better Comments

Improve the quality of your code comments with the Better Comments extension for Visual Studio Code. To make adding comments to your code easier, Better Comments provides a selection of premade comment templates covering a wide range of documentation elements, including comments for individual methods and classes as well as TODO and FIXME annotations. Your comments can be color-coded for better readability and comprehension, which is most useful when dealing with lengthy or multi-part comments. If a comment gets too long for a single line, Better Comments breaks it up for you, ensuring your comments remain easy to read and navigate. You can alter the behavior of Better Comments in several ways by adjusting its settings, including which comment templates are used, the comment colors, and whether comments are automatically broken into multiple lines. Better Comments is a fantastic tool for making your code comments more understandable and useful, and an essential extension for any programmer who cares about producing readable, concise code.

Live Sass Compiler

Sass is a robust CSS preprocessor that helps you write readable and easily maintained CSS. Compiling Sass files is quick, but doing it by hand after every change to your code can be tedious. This is where the Live Sass Compiler extension for Visual Studio Code comes in handy. Because it compiles Sass files in real time, you can see the effects of your edits right away, which can be a huge time- and stress-saver when working with Sass. To get started, download the Live Sass Compiler extension from the Visual Studio Code Marketplace, then open a Sass file in Visual Studio Code. A Live Sass Compiler icon will appear in the status bar; click it to begin compiling your Sass file. The resulting CSS file is saved in your workspace and updates instantly whenever you change your Sass file. If you use Sass in your development process, you should definitely check out Live Sass Compiler; it’s an extension any serious Sass programmer needs.

Don’t forget to join our 28k+ ML SubReddit, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more. If you have any questions regarding the above article or if we missed anything, feel free to email us at Asif@marktechpost.com

The post Top 10 VS Code Extensions to Increase Productivity 10x appeared first on MarkTechPost.

Researchers from USC and Microsoft Propose UniversalNER: A New AI Mode …

ChatGPT and other large language models (LLMs) have shown impressive generalization abilities, but their training and inference costs are often prohibitive. Additionally, white-box access to model weights and inference probabilities is frequently crucial for explainability and confidence in mission-critical applications like healthcare. As a result, instruction tuning has gained popularity as a method for condensing LLMs into more affordable and transparent student models. These student models have shown convincing abilities to mimic ChatGPT, as Alpaca and Vicuna demonstrated. Close examination reveals, however, that they still fall short of the original LLM, particularly in specifically targeted downstream applications.

Because of the restricted compute available, a generic distillation can only create a superficial approximation of the original LLM across all conceivable applications. Instead, the authors investigate targeted distillation in this research, training student models through mission-focused instruction tuning for a broad application class such as open information extraction. They demonstrate that this can maximally reproduce the LLM’s capabilities for the specified application class while maintaining generalizability across semantic types and domains. Since named entity recognition (NER) is one of the most fundamental problems in natural language processing, they chose it for their case study. Recent research demonstrates that LLMs still lag behind the most advanced supervised systems for an entity type when many annotated instances are available.

For most entity types, however, little annotated data is available. Developing annotated examples is costly and time-consuming, especially in high-value sectors like biology, where annotation requires specialized knowledge, and new entity types are continually emerging. Supervised NER models also show poor generalizability to new domains and entity types since they are trained on pre-specified entity types and domains. The authors outline a generic process for targeted LLM distillation and show how open-domain NER can use it. Researchers from the University of Southern California and Microsoft Research demonstrate how to utilize ChatGPT to create instruction-tuning data for NER from large amounts of unlabeled online text and use LLaMA to create the UniversalNER models (abbreviated UniNER).
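The sketch below gives a rough sense of that data-construction step: sample passages from unlabeled text, ask a teacher model (ChatGPT) to extract entities with their types, and package the results as instruction-tuning pairs for a student model. The prompt wording and the query_teacher() helper are illustrative stand-ins, not the paper’s exact template or API call.

```python
# A hedged sketch of targeted-distillation data construction: a teacher model
# labels entities in unlabeled passages, and the labels are repackaged as
# instruction-tuning pairs. The prompt and query_teacher() are stand-ins.
import json
from typing import Callable

def build_ner_examples(passages: list[str], query_teacher: Callable[[str], str]) -> list[dict]:
    examples = []
    for passage in passages:
        teacher_prompt = (
            "Extract all named entities from the passage below and return a JSON "
            "list of [entity, type] pairs.\n\nPassage: " + passage
        )
        entities = json.loads(query_teacher(teacher_prompt))  # e.g. [["aspirin", "drug"], ...]
        # One instruction-tuning pair per extracted entity type, in a conversational style.
        for entity, entity_type in entities:
            examples.append({
                "instruction": f"Text: {passage}\nWhat describes {entity_type} in the text?",
                "output": json.dumps([entity]),
            })
    return examples

# Usage with a stubbed teacher (replace with a real ChatGPT API call):
fake_teacher = lambda prompt: json.dumps([["ibuprofen", "drug"], ["headache", "symptom"]])
print(build_ner_examples(["Ibuprofen is commonly used to relieve a headache."], fake_teacher))
```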

They put up the biggest and most varied NER benchmark to date (UniversalNER benchmark), which consists of 43 datasets from 9 different disciplines, including medical, programming, social media, law, and finance. LLaMA and Alpaca score badly on this benchmark (around 0 F1) on zero-shot NER. Vicuna performs significantly better in comparison, yet in average F1, it is still behind ChatGPT by more than 20 absolute points. In contrast, UniversalNER outperforms Vicuna by over 30 absolute points in average F1 and achieves state-of-the-art NER accuracy across tens of thousands of entity types in the UniversalNER benchmark. In addition to replicating ChatGPT’s capacity to recognize any entity with a small number of parameters (7–13 billion), UniversalNER also beats its NER accuracy by 7-9 absolute points in average F1. 

Surprisingly, UniversalNER significantly surpasses state-of-the-art multi-task instruction-tuned systems like InstructUIE, which uses supervised NER instances. They also undertake extensive ablation tests to evaluate the effects of different distillation components like the instruction prompts and negative sampling. They will provide their distillation recipe, data, and the UniversalNER model and present an interactive demo to aid further study on targeted distillation.

Check out the Paper, Github, and Project Page. All Credit For This Research Goes To the Researchers on This Project. Also, don’t forget to join our 28k+ ML SubReddit, 40k+ Facebook Community, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.

The post Researchers from USC and Microsoft Propose UniversalNER: A New AI Model Trained with Targeted Distillation Recognizing 13k+ Entity Types and Outperforming ChatGPT’s NER Accuracy by 9% F1 on 43 Datasets appeared first on MarkTechPost.

Researchers at UC Santa Cruz Propose a Novel Text-to-Image Association …

A research team from UC Santa Cruz has introduced a novel tool called the Text to Image Association Test. This tool addresses the inadvertent biases in Text-to-Image (T2I) generative AI systems. These systems are known for their ability to create images from text descriptions but often reproduce societal biases in their outputs. Led by an Assistant Professor, the team developed a quantifiable method to measure these intricate biases.

The Text to Image Association Test offers a structured approach to assessing biases across several dimensions, such as gender, race, career, and religion. This innovative tool was presented at the 2023 Association for Computational Linguistics (ACL) conference. Its primary purpose is quantifying and identifying biases within advanced generative models, like Stable Diffusion, which can magnify existing prejudices in the images generated.

The process involves providing a neutral prompt, like “child studying science,” to the model. Subsequently, gender-specific prompts like “girl studying science” and “boy studying science” are used. By analyzing the differences between images generated from the neutral and gender-specific prompts, the tool quantifies bias within the model’s responses.
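To give a flavor of how such a comparison can be quantified, the snippet below computes an IAT-style association score from image embeddings: it measures whether images generated from the neutral prompt sit closer, on average, to images generated with one attribute or the other. This is only an illustration of the general idea, with random arrays standing in for real embeddings from, say, a CLIP image encoder; it is not the authors’ exact metric.

```python
# An IAT-style association score over image embeddings: compare how close
# neutral-prompt images sit to each attribute-specific set. Illustrative only;
# not the authors' exact metric.
import numpy as np

def _unit(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def association_score(neutral: np.ndarray, attr_a: np.ndarray, attr_b: np.ndarray) -> float:
    """Positive score: neutral images lean toward attribute A; negative: toward B."""
    neutral, attr_a, attr_b = _unit(neutral), _unit(attr_a), _unit(attr_b)
    sim_a = (neutral @ attr_a.T).mean(axis=1)  # mean cosine similarity to set A per image
    sim_b = (neutral @ attr_b.T).mean(axis=1)
    diff = sim_a - sim_b
    return float(diff.mean() / (diff.std() + 1e-8))  # effect-size-style normalization

# Toy usage with random stand-ins for image embeddings.
rng = np.random.default_rng(0)
neutral, girls, boys = (rng.normal(size=(8, 512)) for _ in range(3))
print(round(association_score(neutral, girls, boys), 3))
```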

The study revealed that the Stable Diffusion model exhibited biases aligned with common stereotypes. The tool assessed connections between concepts such as science and arts and attributes like male and female, assigning scores to indicate the strength of these connections. Interestingly, the model associated dark skin with pleasantness and light skin with unpleasantness, contrary to typical assumptions.

Moreover, the model displayed associations between attributes like science and males, art and females, careers and males, and family and females. The researchers highlighted that their tool also considers contextual elements in images, including colors and warmth, distinguishing it from prior evaluation methods.

Inspired by the Implicit Association Test in social psychology, the UCSC team’s tool represents progress in quantifying biases within T2I models during their developmental stages. The researchers anticipate that this approach will equip software engineers with more precise measurements of biases in their models, aiding in identifying and rectifying biases in AI-generated content. With a quantitative metric, the tool facilitates continuous efforts to mitigate biases and monitor progress over time.

The researchers received encouraging feedback and interest from fellow scholars at the ACL conference, with many expressing enthusiasm for the potential impact of this work. The team plans to propose strategies for mitigating biases during model training and refinement stages. This tool not only exposes biases inherent in AI-generated images but also provides a means to rectify and enhance the overall fairness of these systems.

Check out the Paper and Project Page. All Credit For This Research Goes To the Researchers on This Project. Also, don’t forget to join our 28k+ ML SubReddit, 40k+ Facebook Community, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.

The post Researchers at UC Santa Cruz Propose a Novel Text-to-Image Association Test Tool that Quantifies the Implicit Stereotypes between Concepts and Valence and Those in the Images appeared first on MarkTechPost.