Google Researchers Introduce RO-ViT: A Simple AI Method to Pre-Train V …

Recent advancements have enabled computers to interpret and understand visual information from the world, much like human vision. Computer vision involves processing, analyzing, and extracting meaningful information from images and videos. It enables automation of tasks that require visual interpretation, reducing the need for manual intervention. Object detection is a computer vision task that involves identifying and locating multiple objects of interest within an image or a video frame.

Object detection aims to determine what objects are present in the scene and provide information about where they are located within the image. Most modern object detectors rely on manual annotations of regions and class labels, which limits their vocabulary size and makes it expensive to scale up further. 

Vision-language models (VLMs) can be used instead to bridge the gap between image-level pretraining and object-level finetuning. However, the notion of objects/regions is not adequately utilized in the pretraining process of such models.

Researchers at Google Brain present a simple method to bridge the gap between image-level pretraining and object-level finetuning: Region-aware Open-vocabulary Vision Transformers (RO-ViT).

RO-ViT is a simple way to pretrain vision transformers in a region-aware manner for open-vocabulary object detection. Standard pretraining uses positional embeddings for the full image. RO-ViT instead randomly crops and resizes regions of the positional embeddings. The researchers call this method “Cropped Positional Embedding.”
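To make the crop-and-resize idea concrete, here is a minimal sketch in plain Python. It is not the researchers' implementation: the function name, the toy 8x8 grid, and the nearest-neighbor resizing are assumptions chosen for brevity.

```python
import random


def crop_and_resize_pos_embed(pos_embed, out_h, out_w, rng=random):
    """Randomly crop a region of a 2D positional-embedding grid and resize it.

    pos_embed is a grid indexed as pos_embed[row][col] -> embedding vector.
    The crop is resized to out_h x out_w with nearest-neighbor sampling
    (an assumption made for this sketch).
    """
    h, w = len(pos_embed), len(pos_embed[0])
    # Pick a random crop size and a top-left corner that keeps it in bounds.
    crop_h = rng.randint(1, h)
    crop_w = rng.randint(1, w)
    top = rng.randint(0, h - crop_h)
    left = rng.randint(0, w - crop_w)
    # Map every output cell back to a source cell inside the crop.
    resized = []
    for i in range(out_h):
        src_i = top + (i * crop_h) // out_h
        row = []
        for j in range(out_w):
            src_j = left + (j * crop_w) // out_w
            row.append(pos_embed[src_i][src_j])
        resized.append(row)
    return resized


# Toy grid: each "embedding" simply records its own (row, col) position.
grid = [[[float(r), float(c)] for c in range(8)] for r in range(8)]
cropped = crop_and_resize_pos_embed(grid, 4, 4, random.Random(0))
```

Because each toy embedding stores its own coordinates, printing `cropped` shows which region of the original grid was sampled.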

The team has shown that image-text pretraining with focal loss is more effective than the existing softmax cross-entropy (CE) loss. They have also proposed improvements to the object proposal stage, arguing that existing approaches often miss novel objects there because the proposals are not sufficiently balanced.
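For readers unfamiliar with focal loss, the sketch below applies the standard sigmoid focal loss to a small image-text similarity matrix. The pairing convention (matching pairs on the diagonal), the function names, and the default gamma are assumptions for illustration; the exact formulation used in RO-ViT may differ.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def focal_contrastive_loss(sim, gamma=2.0):
    """Mean focal loss over an image-text similarity matrix.

    sim[i][j] is the similarity logit between image i and text j.
    Matching pairs are assumed to sit on the diagonal.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        for j in range(n):
            p = sigmoid(sim[i][j])
            # Probability the model assigns to the correct label for this pair.
            p_t = p if i == j else 1.0 - p
            p_t = max(p_t, 1e-12)  # guard against log(0)
            # The (1 - p_t) ** gamma factor down-weights easy pairs.
            total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / (n * n)


good = [[5.0, -5.0], [-5.0, 5.0]]   # matching pairs score high
bad = [[-5.0, 5.0], [5.0, -5.0]]    # matching pairs score low
loss_good = focal_contrastive_loss(good)
loss_bad = focal_contrastive_loss(bad)
```

With `gamma=0` the expression reduces to ordinary binary cross-entropy, which is a quick way to sanity-check the implementation.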

The team says their model RO-ViT achieves state-of-the-art results on the LVIS open-vocabulary detection benchmark. It also achieves state-of-the-art results on 9 out of 12 metrics of image-text retrieval benchmarks. This suggests that the learned representation is beneficial at the region level and highly effective in open-vocabulary detection.

As object detection technology advances, responsible development, deployment, and regulation will be crucial to ensuring that its positive impacts are maximized while mitigating potential risks. Overall, the continued progress in object detection technology is expected to contribute to a brighter future by revolutionizing industries, enhancing safety and quality of life, and enabling innovations that were once considered science fiction.

Check out the Paper and Google Blog. All Credit For This Research Goes To the Researchers on This Project. Also, don’t forget to join our 29k+ ML SubReddit, 40k+ Facebook Community, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.

If you like our work, you will love our newsletter.

The post Google Researchers Introduce RO-ViT: A Simple AI Method to Pre-Train Vision Transformers in a Region-Aware Manner to Improve Open-Vocabulary Detection appeared first on MarkTechPost.

Meet DenseDiffusion: A Training-free AI Technique To Address Dense Cap …

Recent advancements in text-to-image models have led to sophisticated systems capable of generating high-quality images based on brief scene descriptions. Nevertheless, these models encounter difficulties when confronted with intricate captions, often resulting in the omission or blending of visual attributes tied to different objects. The term “dense” in this context is rooted in the concept of dense captioning, where individual phrases are utilized to describe specific regions within an image. Additionally, users face challenges in precisely dictating the arrangement of elements within the generated images using only textual prompts.

Several recent studies have proposed solutions that empower users with spatial control by training or refining text-to-image models conditioned on layouts. While some approaches like “Make-A-Scene” and “Latent Diffusion Models” construct models from the ground up with both text and layout conditions, other concurrent methods like “SpaText” and “ControlNet” introduce supplementary spatial controls to existing text-to-image models through fine-tuning. Unfortunately, training or fine-tuning a model can be computationally intensive. Moreover, the model necessitates retraining for every novel user condition, domain, or base text-to-image model.

Based on the abovementioned issues, a novel training-free technique termed DenseDiffusion is proposed to accommodate dense captions and provide layout manipulation.

Before presenting the main idea, let me briefly recap how diffusion models work. Diffusion models generate images through sequential denoising steps, starting from random noise. Noise prediction networks estimate noise added and try to render a sharper image at each step. Recent models reduce the number of denoising steps for faster results without significantly compromising the generated image. 
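The sequential denoising described above can be sketched as a short sampling loop. The version below follows the well-known DDPM update rule on a small vector; the linear noise schedule, the step count, and especially `predict_noise` (a stand-in for a trained noise prediction network) are assumptions for illustration only.

```python
import math
import random


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (requires num_steps >= 2)."""
    betas = [beta_start + (beta_end - beta_start) * t / (num_steps - 1)
             for t in range(num_steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, running = [], 1.0
    for a in alphas:
        running *= a
        alpha_bars.append(running)
    return betas, alphas, alpha_bars


def predict_noise(x, t):
    """Hypothetical stand-in for a trained noise prediction network."""
    return [0.1 * v for v in x]


def sample(dim, num_steps=50, seed=0):
    rng = random.Random(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start from pure noise
    for t in reversed(range(num_steps)):
        eps = predict_noise(x, t)
        # Remove the predicted noise to get the mean of the previous step.
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            # Every step except the last re-injects a small amount of noise.
            sigma = math.sqrt(betas[t])
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean
    return x


result = sample(dim=4)
```

Faster samplers keep the same structure but visit far fewer values of `t`.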

Two essential blocks in state-of-the-art diffusion models are the self-attention and cross-attention layers. 

Within a self-attention layer, intermediate features additionally function as contextual features. This enables the creation of globally consistent structures by establishing connections among image tokens spanning various areas. Simultaneously, a cross-attention layer adapts based on textual features obtained from the input text caption, employing a CLIP text encoder for encoding.
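The difference between the two layers comes down to where the keys and values come from. The sketch below is a simplification: it omits the learned query, key, and value projections, and the toy token vectors are made up for illustration.

```python
import math


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention; returns (outputs, attention_map)."""
    d = len(keys[0])
    attn = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        attn.append(softmax(scores))
    width = len(values[0])
    out = [[sum(a * v[c] for a, v in zip(row, values)) for c in range(width)]
           for row in attn]
    return out, attn


image_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy intermediate image features
text_tokens = [[0.5, 0.5], [1.0, -1.0]]              # toy text features from a text encoder

# Self-attention: queries, keys, and values all come from the image tokens.
self_out, self_map = attention(image_tokens, image_tokens, image_tokens)

# Cross-attention: image tokens act as queries over the text tokens.
cross_out, cross_map = attention(image_tokens, text_tokens, text_tokens)
```

Here `self_map` has one row and one column per image token, while `cross_map` has one row per image token and one column per text token.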

Returning to DenseDiffusion, its main idea is a revised attention modulation process, which is presented in the figure below.

Initially, the intermediary features of a pre-trained text-to-image diffusion model are scrutinized to reveal the substantial correlation between the generated image’s layout and self-attention and cross-attention maps. Drawing from this insight, intermediate attention maps are dynamically adjusted based on the layout conditions. Furthermore, the approach involves considering the original attention score range and fine-tuning the modulation extent based on each segment’s area. In the presented work, the authors demonstrate the capability of DenseDiffusion to enhance the performance of the “Stable Diffusion” model and surpass multiple compositional diffusion models in terms of dense captions, text and layout conditions, and image quality.
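A rough sketch of what such a modulation step could look like for a single query token follows. It is loosely modeled on the description above and is not the authors' formula: the function name, the `strength` parameter, and the way segment area scales the adjustment are all assumptions.

```python
def modulate_scores(scores, same_segment, strength, area_fraction):
    """Adjust one row of raw attention scores according to a layout mask.

    scores        : raw attention scores of one query token over all keys
    same_segment  : 1 where the key belongs to the query's layout segment, else 0
    strength      : modulation strength in [0, 1] (assumed hyperparameter)
    area_fraction : share of the image covered by the segment, in [0, 1]
    """
    hi, lo = max(scores), min(scores)
    # Smaller segments get a stronger push so they are not drowned out.
    scale = strength * (1.0 - area_fraction)
    modulated = []
    for s, inside in zip(scores, same_segment):
        if inside:
            # Raise the score toward the row maximum.
            modulated.append(s + scale * (hi - s))
        else:
            # Lower the score toward the row minimum.
            modulated.append(s - scale * (s - lo))
    return modulated


scores = [0.2, 1.0, -0.5, 0.4]
mask = [1, 0, 1, 0]
new_scores = modulate_scores(scores, mask, strength=0.5, area_fraction=0.25)
```

Because every adjusted score stays between the original minimum and maximum of the row, the modulation respects the original attention score range. The adjusted scores would then pass through the usual softmax.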

Sample outcome results selected from the study are depicted in the image below. These visuals provide a comparative overview between DenseDiffusion and state-of-the-art approaches.

This was the summary of DenseDiffusion, a novel AI training-free technique to accommodate dense captions and provide layout manipulation in text-to-image synthesis.

Check out the Paper and Github. All Credit For This Research Goes To the Researchers on This Project.


The post Meet DenseDiffusion: A Training-free AI Technique To Address Dense Captions and Layout Manipulation In Text-to-Image Generation appeared first on MarkTechPost.

50+ New Cutting-Edge Artificial Intelligence AI Tools (September 2023)

AI tools are rapidly increasing in development, with new ones being introduced regularly. Check out some AI tools below that can enhance your daily routines.

Boost your advertising and social media game with – the ultimate Artificial Intelligence solution. 

Hostinger AI Website Builder

The Hostinger AI Website Builder offers an intuitive interface combined with advanced AI capabilities designed for crafting websites for any purpose.


Motion is a clever tool that uses AI to create daily schedules that account for your meetings, tasks, and projects.

Otter AI

Using artificial intelligence, Otter.AI empowers users with real-time transcriptions of meeting notes that are shareable, searchable, accessible, and secure.


SaneBox is an AI-powered email optimization tool. SaneBox’s A.I. identifies important email and automatically organizes the rest to help you stay focused.

Notion AI

Notion AI is a writing assistant that helps users write, brainstorm, edit, and summarize right inside the Notion workspace.

Pecan AI

Pecan AI automates predictive analytics to solve today’s business challenges: shrinking budgets, rising costs, and limited data science and AI resources. Pecan’s low-code predictive modeling platform provides AI-driven predictive analytics that guides data-driven decisions and helps business teams achieve their goals.


Get stunning professional headshots effortlessly with Aragon. Utilize the latest in A.I. technology to create high-quality headshots of yourself in a snap! Skip the hassle of booking a photography studio or dressing up.


Taskade is an AI productivity tool that helps users manage their tasks and projects efficiently.


Bubble empowers you to create CRMs, SaaS apps, dashboards, social networks, and marketplaces effortlessly, without code. 


Microsoft has launched the AI-powered Bing search engine, which is like having a research assistant, personal planner, and creative partner whenever the user searches the web.


Powered by the GPT model, this tool is a meeting recorder for Zoom and Google Meet. tl;dv transcribes and summarizes the calls for the user.


Bard is a chatbot developed by Google that helps to boost productivity and bring ideas to life.


Forefront AI is a platform that offers free access to GPT-4, image generation, custom personas, and shareable chats, thereby empowering businesses with improved efficiency and user experience.


Merlin is a ChatGPT extension that helps users to finish any task on any website providing features like blog summarizer and AI writer for Gmail.


WNR AI provides AI templates that convert a simple form into an optimized prompt to extract the best results from AI.

Chat ABC

Chat ABC is a better alternative to ChatGPT, providing features like a prompt library, team collaboration, etc.

Paperpal AI

Paperpal is an AI language assistant and online academic writing tool that identifies language errors and provides instant suggestions to the user.

Monic AI

Monic is an AI tool that makes learning interactive by turning notes, slides, articles, and textbooks into mock tests.


ChartGPT is a tool that transforms simple text into beautiful charts.

Trinka AI

Trinka is a grammar checker and language enhancement writing assistant.


Scholarcy reads the user’s articles, reports, and textbooks and converts them into flashcards.


Lavender is a sales email assistant that helps users to write better emails.


Regie is a content platform for revenue teams that allows users to create and publish sales sequences to their sales engagement platform.


Warmer is an AI email personalization tool that helps users to increase their cold emails.


Twain is a communication assistant that helps users to write clear and confident outreach messages that get answers.


Octane is a platform for data collection and personalized Facebook Messenger and SMS automation.


10Web is an automated website builder that improves the core web vitals of users’ websites.


Uncody is a landing page generator that allows users to build professional-looking websites easily.


Dora AI allows users to create editable websites just from an input prompt.


Durable is an AI website builder allowing users to instantly create websites with images and copy.


Replit is a web-based Integrated Development Environment (IDE) that enables users to build projects online.


Consensus is an AI-powered search engine that extracts findings directly from scientific research.


Writesonic is an AI writer that generates SEO-friendly content for blogs, Google ads, Facebook ads, and Shopify for free.

Yatter Plus

Yatter Plus is a WhatsApp chatbot that answers all user queries, questions, and concerns in seconds.


Typewise is a text prediction software that boosts enterprise productivity.


Cohere is a tool that provides access to advanced LLMs and NLP tools through APIs.


Quickchat is a conversational AI assistant empowering companies to build their multilingual chatbots.


Kaizan is a Client Intelligence Platform that allows its users to retain their clients and grow revenue.


Looka is an AI-powered logo maker that enables entrepreneurs to easily create a professional logo and brand identity. 


Namecheap is a free logo generator tool for businesses.


LogoAI is a brand-building platform for crafting polished logos, developing cohesive brand identities, and streamlining brand promotion through automation.


Stockimg is an AI image generator that creates logos, book covers, and posters.


Brandmark is an AI-powered logo, business card, and social media graphics designer.


Panopreter is a text-to-speech tool that converts digital content into audio.


Speechelo is a tool that generates human-sounding voiceovers from text.


Synthesys is a platform that allows users to create multilingual voiceovers and videos effortlessly.


Speechify is an AI voice generator capable of converting texts into natural-sounding voices.


Murf is an AI voice generator that makes the process of voiceovers effortless.


Pictory is an AI video generator that creates short videos from long-form content.


Synthesia generates professional videos by simply taking text as input. is an AI-powered video editing platform that allows users to add images, subtitles, convert text to videos, and much more. 


Colossyan allows users to create videos from text within minutes and auto-translate to dozens of languages.


GetIMG allows users to generate original images at scale, edit photos, and create custom AI models.


Shutterstock allows users to create unique AI photos using text prompts.


NightCafe is an AI art generator that allows users to create an artwork within seconds.


Using Artbreeder, users can make simple collages from shapes and images by describing them with a prompt.


Stablecog is an open-source, free, and multilingual AI image generator.

Speak AI

Speak AI allows marketing teams to turn unstructured audio, video, and text into insights using NLP.


AISEO is an AI-powered writing assistant which allows users to generate SEO-optimized content within minutes.


Lumen5 is an AI-powered video creation platform that allows users to easily create engaging video content within minutes.


Spellbook uses LLMs like GPT-4 to draft contracts faster.



Note: This post contains affiliate links. If you use these links to buy something we may earn a commission. Thanks.

The post 50+ New Cutting-Edge Artificial Intelligence AI Tools (September 2023) appeared first on MarkTechPost.

Deploy self-service question answering with the QnABot on AWS solution …

Powered by Amazon Lex, the QnABot on AWS solution is an open-source, multi-channel, multi-language conversational chatbot. QnABot allows you to quickly deploy self-service conversational AI into your contact center, websites, and social media channels, reducing costs, shortening hold times, and improving customer experience and brand sentiment. Customers now want to apply the power of large language models (LLMs) to further improve the customer experience with generative AI capabilities. This includes automatically generating accurate answers from existing company documents and knowledge bases, and making their self-service chatbots more conversational.
Our latest QnABot releases, v5.4.0+, can now use an LLM to disambiguate customer questions by taking conversational context into account, dynamically generating answers from relevant FAQs or Amazon Kendra search results and document passages. It also provides attribution and transparency by displaying links to the reference documents and context passages that were used by the LLM to construct the answers.

When you deploy QnABot, you can choose to automatically deploy a state-of-the-art open-source LLM model (Falcon-40B-instruct) on an Amazon SageMaker endpoint. The LLM landscape is constantly evolving—new models are released frequently and our customers want to experiment with different models and providers to see what works best for their use cases. This is why QnABot also integrates with any other LLM using an AWS Lambda function that you provide. To help you get started, we’ve also released a set of sample one-click deployable Lambda functions (plugins) to integrate QnABot with your choice of leading LLM providers, including our own Amazon Bedrock service and APIs from third-party providers, Anthropic and AI21.
In this post, we introduce the new Generative AI features for QnABot and walk through a tutorial to create, deploy, and customize QnABot to use these features. We also discuss some relevant use cases.
New Generative AI features
Using the LLM, QnABot now has two new important features, which we discuss in this section.
Generate answers to questions from Amazon Kendra search results or text passages
QnABot can now generate concise answers to questions from document extracts provided by an Amazon Kendra search, or text passages created or imported directly. This provides the following advantages:

The number of FAQs that you need to maintain and import into QnABot is reduced, because you can now synthesize concise answers on the fly from your existing documents.
Generated answers can be modified to create the best experience for the intended channel. For example, you can set the answers to be short and concise for voice channel contact center bots, while website or text bots can provide more detailed information.
Generated answers are fully compatible with QnABot’s multi-language support—users can interact in their chosen languages and receive generated answers in the same language.
Generated answers can include links to the reference documents and context passages used, to provide attribution and transparency on how the LLM constructed the answers.

For example, when asked “What is Amazon Lex?”, QnABot can retrieve relevant passages from an Amazon Kendra index (containing AWS documentation). QnABot then asks (prompts) the LLM to answer the question based on the context of the passages (which can also optionally be viewed in the web client). The following screenshot shows an example.
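To illustrate the prompting step, the following sketch assembles a question-answering prompt from retrieved passages and keeps the source links for attribution. It is not QnABot's actual code: the function name, the passage dictionary keys, the prompt wording, and the sample passage are all assumptions.

```python
def build_qa_prompt(question, passages, max_passages=3):
    """Build an LLM prompt from retrieved passages.

    passages is a list of dicts with the hypothetical keys
    'title', 'text', and 'uri'. Returns (prompt, source_uris).
    """
    selected = passages[:max_passages]
    context = "\n\n".join(
        f"[{i + 1}] {p['title']}\n{p['text']}" for i, p in enumerate(selected)
    )
    prompt = (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, reply \"I don't know\".\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    # Keep the links so the answer can show where it came from.
    sources = [p["uri"] for p in selected]
    return prompt, sources


example_passages = [
    {
        "title": "Example passage title",
        "text": "Amazon Lex is an AWS service for building conversational interfaces.",
        "uri": "https://example.com/lex-docs",  # placeholder link
    }
]
prompt, sources = build_qa_prompt("What is Amazon Lex?", example_passages)
```

The returned `prompt` is what would be sent to the LLM, and `sources` is what a client could display alongside the generated answer.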

Disambiguate follow-up questions that rely on preceding conversation context
Understanding the direction and context of an ever-evolving conversation is key to building natural, human-like conversational interfaces. User queries often require a bot to interpret requests based on conversation memory and context. Now QnABot will ask the LLM to generate a disambiguated question based on the conversation history. This can then be used as a search query to retrieve the FAQs, passages, or Amazon Kendra results to answer the user’s question. The following is an example chat history:

Human: What is Amazon Lex?
AI: “Amazon Lex is an AWS service for building conversational interfaces for applications using voice and text…”
Human: Can it integrate with my CRM?

QnABot uses the LLM to rewrite the follow-up question to make “it” unambiguous, for example, “Can Amazon Lex integrate with my CRM system?” This allows users to interact like they would in a human conversation, and QnABot generates clear search queries to find the relevant FAQs or document passages that have the information to answer the user’s question.
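The rewriting step can be pictured as a prompt that packs the chat history and the follow-up question together. The sketch below only builds that prompt; the function name and the prompt wording are assumptions, and QnABot's real template is configurable in its settings.

```python
def build_disambiguation_prompt(history, follow_up):
    """Build a prompt asking an LLM to rewrite a follow-up as a standalone question.

    history is a list of (speaker, text) tuples, oldest first.
    """
    transcript = "\n".join(f"{speaker}: {text}" for speaker, text in history)
    return (
        "Given the chat history and a follow-up question, rewrite the follow-up "
        "as a standalone question.\n\n"
        f"Chat history:\n{transcript}\n\n"
        f"Follow-up question: {follow_up}\n"
        "Standalone question:"
    )


history = [
    ("Human", "What is Amazon Lex?"),
    ("AI", "Amazon Lex is an AWS service for building conversational interfaces."),
]
disambiguation_prompt = build_disambiguation_prompt(history, "Can it integrate with my CRM?")
```

The LLM's completion of this prompt would then be used as the search query.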

These new features make QnABot more conversational and provide the ability to dynamically generate responses based on a knowledge base. This is still an experimental feature with tremendous potential. We strongly encourage users to experiment to find the best LLM and corresponding prompts and model parameters to use. QnABot makes it straightforward to experiment!
Time to try it! Let’s deploy the latest QnABot (v5.4.0 or later) and enable the new Generative AI features. The high-level steps are as follows:

Create and populate an Amazon Kendra index.
Choose and deploy an LLM plugin (optional).
Deploy QnABot.
Configure QnABot for your Lambda plugin (if using a plugin).
Access the QnABot web client and start experimenting.
Customize behavior using QnABot settings.
Add curated Q&As and text passages to the knowledge base.

Create and populate an Amazon Kendra Index
Download and use the following AWS CloudFormation template to create a new Amazon Kendra index.
This template includes sample data containing AWS online documentation for Amazon Kendra, Amazon Lex, and SageMaker. Deploying the stack requires about 30 minutes followed by about 15 minutes to synchronize it and ingest the data in the index.
When the Amazon Kendra index stack is successfully deployed, navigate to the stack’s Outputs tab and note the Index Id, which you will use later when deploying QnABot.
Alternatively, if you already have an Amazon Kendra index with your own content, you can use it instead with your own example questions for the tutorial.
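If you prefer to read the stack output programmatically instead of using the console, a small helper like the one below can pull a value out of a CloudFormation `describe_stacks` response. This is a sketch: the output key name `KendraIndexId`, the stack name in the comment, and the sample GUID are assumptions, so check the stack's Outputs tab for the real key.

```python
def get_stack_output(describe_stacks_response, output_key):
    """Return one output value from a CloudFormation describe_stacks response."""
    outputs = describe_stacks_response["Stacks"][0].get("Outputs", [])
    for item in outputs:
        if item["OutputKey"] == output_key:
            return item["OutputValue"]
    raise KeyError(f"Output {output_key!r} not found in stack outputs")


# With boto3 installed and AWS credentials configured, the response would come from:
#   import boto3
#   response = boto3.client("cloudformation").describe_stacks(StackName="my-kendra-stack")
# Here we use a hand-written response with the same shape.
sample_response = {
    "Stacks": [
        {
            "StackName": "my-kendra-stack",  # hypothetical stack name
            "Outputs": [
                {
                    "OutputKey": "KendraIndexId",  # hypothetical output key
                    "OutputValue": "00000000-0000-0000-0000-000000000000",
                }
            ],
        }
    ]
}
index_id = get_stack_output(sample_response, "KendraIndexId")
```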
Choose and deploy an LLM plugin (optional)
QnABot can deploy a built-in LLM (Falcon-40B-instruct on SageMaker) or use Lambda functions to call any other LLMs of your choice. In this section, we show you how to use the Lambda option with a pre-built sample Lambda function. Skip to the next step if you want to use the built-in LLM instead.
First, choose the plugin LLM you want to use. Review your options from the qnabot-on-aws-plugin-samples repository README. As of this writing, plugins are available for Amazon Bedrock (in preview), and for AI21 and Anthropic third-party APIs. We expect to add more sample plugins over time.
Deploy your chosen plugin by choosing Launch Stack in the Deploy a new Plugin stack section, which will deploy into the us-east-1 Region by default (to deploy in other Regions, see Build and Publish QnABot Plugins CloudFormation artifacts).
When the Plugin stack is successfully deployed, navigate to the stack’s Outputs tab (see the following screenshot) and inspect its contents, which you will use in the following steps to deploy and configure QnABot. Keep this tab open in your browser.

Deploy QnABot
Choose Launch Solution from the QnABot implementation guide to deploy the latest QnABot template via AWS CloudFormation. Provide the following parameters:

For DefaultKendraIndexId, use the Amazon Kendra Index ID (a GUID) you collected earlier
For EmbeddingsApi (see Semantic Search using Text Embeddings), choose one of the following:

SAGEMAKER (the default built-in embeddings model)
LAMBDA (to use the Amazon Bedrock embeddings API with the BEDROCK-EMBEDDINGS-AND-LLM Plugin)

For EmbeddingsLambdaArn, use the EmbeddingsLambdaArn output value from your BEDROCK-EMBEDDINGS-AND-LLM Plugin stack.

For LLMApi (see Query Disambiguation for Conversational Retrieval, and Generative Question Answering), choose one of the following:

SAGEMAKER (the default built-in LLM model)
LAMBDA (to use the LLM Plugin deployed earlier)

For LLMLambdaArn, use the LLMLambdaArn output value from your Plugin stack

For all other parameters, accept the defaults (see the implementation guide for parameter definitions), and proceed to launch the QnABot stack.
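If you script the deployment rather than use the console, the parameter choices above can be captured in a small helper. The sketch below only builds and validates the parameter list in the shape CloudFormation APIs expect; the function name, the sample index ID, and the sample ARN are assumptions, and only the option values named in this post are accepted.

```python
def build_qnabot_parameters(kendra_index_id, embeddings_api="SAGEMAKER",
                            embeddings_lambda_arn="", llm_api="SAGEMAKER",
                            llm_lambda_arn=""):
    """Build a CloudFormation-style parameter list for the QnABot stack."""
    allowed = ("SAGEMAKER", "LAMBDA")  # only the options named in this post
    for name, value in (("EmbeddingsApi", embeddings_api), ("LLMApi", llm_api)):
        if value not in allowed:
            raise ValueError(f"{name} must be one of {allowed}, got {value!r}")
    # The Lambda options need the ARN from the plugin stack outputs.
    if embeddings_api == "LAMBDA" and not embeddings_lambda_arn:
        raise ValueError("EmbeddingsLambdaArn is required when EmbeddingsApi is LAMBDA")
    if llm_api == "LAMBDA" and not llm_lambda_arn:
        raise ValueError("LLMLambdaArn is required when LLMApi is LAMBDA")
    values = {
        "DefaultKendraIndexId": kendra_index_id,
        "EmbeddingsApi": embeddings_api,
        "EmbeddingsLambdaArn": embeddings_lambda_arn,
        "LLMApi": llm_api,
        "LLMLambdaArn": llm_lambda_arn,
    }
    return [{"ParameterKey": k, "ParameterValue": v} for k, v in values.items()]


params = build_qnabot_parameters(
    "00000000-0000-0000-0000-000000000000",  # placeholder index ID
    llm_api="LAMBDA",
    llm_lambda_arn="arn:aws:lambda:us-east-1:123456789012:function:example-llm-plugin",
)
```

Validating up front catches the most likely mistake, which is choosing LAMBDA without supplying the matching ARN.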
Configure QnABot for your Lambda plugin (if using a plugin)
If you deployed QnABot using a sample LLM Lambda plugin to access a different LLM, update the QnABot model parameters and prompt template settings as recommended for your chosen plugin. For more information, see Update QnABot Settings. If you used the SageMaker (built-in) LLM option, skip to the next step, because the settings are already configured for you.
Access the QnABot web client and start experimenting
On the AWS CloudFormation console, choose the Outputs tab of the QnABot CloudFormation stack and choose the ClientURL link. Alternatively, launch the client by choosing QnABot on AWS Client from the Content Designer tools menu.
Now, try to ask questions related to AWS services, for example:

What is Amazon Lex?
How does SageMaker scale up inference workloads?
Is Kendra a search service?

Then you can ask follow-up questions without specifying the previously mentioned services or context, for example:

Is it secure?
Does it scale?

Customize behavior using QnABot settings
You can customize many settings on the QnABot Content Designer Settings page—see README – LLM Settings for a full list of relevant settings. For example, try the following:

Set ENABLE_DEBUG_RESPONSES to TRUE, save the settings, and try the previous questions again. Now you will see additional debug output at the top of each response, showing you how the LLM generates the Amazon Kendra search query based on the chat history, how long the LLM inferences took to run, and more. For example:

[User Input: “Is it fast?”, LLM generated query (1207 ms): “Does Amazon Kendra provide search results quickly?”, Search string: “Is it fast? / Does Amazon Kendra provide search results quickly?”[“LLM: LAMBDA”], Source: KENDRA RETRIEVE API

Set ENABLE_DEBUG_RESPONSES back to FALSE, set LLM_QA_SHOW_CONTEXT_TEXT and LLM_QA_SHOW_SOURCE_LINKS to FALSE, and try the examples again. Now the context and sources links are not shown, and the output contains only the LLM-generated response.
If you feel adventurous, experiment also with the LLM prompt template settings—LLM_GENERATE_QUERY_PROMPT_TEMPLATE and LLM_QA_PROMPT_TEMPLATE. Refer to README – LLM Settings to see how you can use placeholders for runtime values like chat history, context, user input, query, and more. Note that the default prompts can most likely be improved and customized to better suit your use cases, so don’t be afraid to experiment! If you break something, you can always revert to the default settings using the RESET TO DEFAULTS option on the settings page.

Add curated Q&As and text passages to the knowledge base
QnABot can, of course, continue to answer questions based on curated Q&As. It can also use the LLM to generate answers from text passages created or imported directly into QnABot, in addition to using an Amazon Kendra index.
QnABot attempts to find a good answer to the disambiguated user question in the following sequence:

QnA items
Text passage items
Amazon Kendra index

Let’s try some examples.
On the QnABot Content Designer tools menu, choose Import, then load the two example packages:


QnABot can use text embeddings to provide semantic search capability (using QnABot’s built-in OpenSearch index as a vector store), which improves accuracy and reduces question tuning, compared to standard OpenSearch keyword based matching. To illustrate this, try questions like the following:

“Tell me about the Alexa device with the screen”
“Tell me about Amazon’s video streaming device?”

These should ideally match the sample QnA items you imported, even though the words used to ask the question are poor keyword matches (but good semantic matches) with the configured QnA items: Alexa.001 (What is an Amazon Echo Show) and FireTV.001 (What is an Amazon Fire TV).

Even if you are not (yet) using Amazon Kendra (and you should!), QnABot can also answer questions based on passages created or imported into Content Designer. The following questions (and follow-up questions) are all answered from an imported text passage item that contains the nursery rhyme 0.HumptyDumpty:

“Where did Humpty Dumpty sit before he fell?”
“What happened after he fell? Was he OK?”

When using embeddings, a good answer is an answer that returns a similarity score above the threshold defined by the corresponding threshold setting. See Semantic question matching, using Large Language Model Text Embeddings for more details on how to test and tune the threshold settings.
If there are no good answers, or if the LLM’s response matches the regular expression defined in LLM_QA_NO_HITS_REGEX, then QnABot invokes the configurable Custom Don’t Know (no_hits) behavior, which, by default, returns a message saying “You stumped me.”
Try some experiments by creating Q&As or text passage items in QnABot, as well as using an Amazon Kendra index for fallback generative answers. Experiment (using the TEST tab in the designer) to find the best values to use for the embedding threshold settings to get the behavior you want. It’s hard to get the perfect balance, but see if you can find a good enough balance that results in useful answers most of the time.
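The two checks described above, the similarity threshold and the no-hits regular expression, can be sketched as follows. The function names, the threshold value, and the example regular expression are assumptions; the real values come from your QnABot settings.

```python
import re


def is_good_answer(similarity_score, threshold):
    """An answer counts as good when its similarity score is above the threshold."""
    return similarity_score > threshold


def resolve_response(llm_response, no_hits_regex, fallback="You stumped me."):
    """Return the fallback message when the LLM response matches the no-hits regex."""
    if re.search(no_hits_regex, llm_response, flags=re.IGNORECASE):
        return fallback
    return llm_response


NO_HITS_REGEX = r"don't know|no answer"  # hypothetical value for LLM_QA_NO_HITS_REGEX

accepted = is_good_answer(0.82, threshold=0.7)
rejected = is_good_answer(0.55, threshold=0.7)
answered = resolve_response("Humpty Dumpty sat on a wall.", NO_HITS_REGEX)
stumped = resolve_response("Sorry, I don't know.", NO_HITS_REGEX)
```

Raising the threshold trades recall for precision: fewer questions match a curated item, so more of them fall through to the next source or to the no-hits response.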
Clean up
You can, of course, leave QnABot running to experiment with it and show it to your colleagues! But it does incur some cost—see Plan your deployment – Cost for more details. To remove the resources and avoid costs, delete the following CloudFormation stacks:

QnABot stack
LLM Plugin stack (if applicable)
Amazon Kendra index stack
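If you would rather script the cleanup, the helper below deletes a list of stacks in order using a CloudFormation client that you pass in. This is a sketch: the stack names in the comment are placeholders, and `delete_stack` only starts the deletion, so it returns before the stack is actually gone.

```python
def delete_stacks(cfn_client, stack_names):
    """Request deletion of each stack in order and return the names requested.

    cfn_client is expected to behave like boto3.client("cloudformation").
    """
    requested = []
    for name in stack_names:
        # delete_stack is asynchronous: it starts the deletion and returns.
        cfn_client.delete_stack(StackName=name)
        requested.append(name)
    return requested


# Example usage with boto3 installed and AWS credentials configured
# (stack names below are placeholders for the names you chose at deploy time):
#   import boto3
#   delete_stacks(
#       boto3.client("cloudformation"),
#       ["my-qnabot-stack", "my-llm-plugin-stack", "my-kendra-stack"],
#   )
```

Passing the client in as an argument keeps the helper easy to dry-run with a stub before pointing it at a real account.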

Use case examples
These new features make QnABot relevant for many customer use cases such as self-service customer service and support bots and automated web-based Q&A bots. We discuss two such use cases in this section.
Integrate with a contact center
QnABot’s automated question answering capabilities deliver effective self-service for inbound voice calls in contact centers, with compelling outcomes. For example, see how Kentucky Transportation Cabinet reduced call hold time and improved customer experience with self-service virtual agents using Amazon Connect and Amazon Lex. Integrating the new generative AI features strengthens this value proposition further by dynamically generating reliable answers from existing content such as documents, knowledge bases, and websites. This eliminates the need for bot designers to anticipate and manually curate responses to every possible question that a user might ask. To integrate QnABot with Amazon Connect, see Connecting QnABot on AWS to an Amazon Connect call center. To integrate with other contact centers, see how Amazon Chime SDK can be used to connect Amazon Lex voice bots with 3rd party contact centers via SIPREC and Build an AI-powered virtual agent for Genesys Cloud using QnABot and Amazon Lex.
The LLM-powered QnABot can also play a pivotal role as an automated real-time agent assistant. In this solution, QnABot passively listens to the conversation and uses the LLM to generate real-time suggestions for the human agents based on certain cues. It’s straightforward to set up and try—give it a go! This solution can be utilized with both Amazon Connect and other on-prem and cloud contact centers. For more information, see Live call analytics and agent assist for your contact center with Amazon language AI services.
Integrate with a website
Embedding QnABot in your websites and applications allows users to get automated assistance with natural dialogue. For more information, see Deploy a Web UI for your Chatbot. For curated Q&A content, use markdown syntax and UI buttons and incorporate links, images, videos, and other dynamic elements that inform and delight your users. Integrate the QnABot Amazon Lex web UI with Amazon Connect live chat to facilitate quick escalation to human agents when the automated assistant cannot fully address a user’s inquiry on its own.
The QnABot on AWS plugin samples repository
As shown in this post, QnABot v5.4.0+ not only offers built-in support for embeddings and LLM models hosted on SageMaker, but it also offers the ability to easily integrate with any other LLM by using Lambda functions. You can author your own custom Lambda functions or get started faster with one of the samples we have provided in our new qnabot-on-aws-plugin-samples repository.
This repository includes a ready-to-deploy plugin for Amazon Bedrock, which supports both embeddings and text generation requests. At the time of writing, Amazon Bedrock is available through private preview—you can request preview access. When Amazon Bedrock is generally available, we expect to integrate it directly with QnABot, but why wait? Apply for preview access and use our sample plugin to start experimenting!
Today’s LLM innovation cycle is driving a breakneck pace of new model releases, each aiming to surpass the last. This repository will expand to include additional QnABot plugin samples over time. As of this writing, we have support for two third-party model providers: Anthropic and AI21. We plan to add integrations for more LLMs, embeddings, and potentially common use case examples involving Lambda hooks and knowledge bases. These plugins are offered as-is without warranty, for your convenience—users are responsible for supporting and maintaining them once deployed.
We hope that the QnABot plugins repository will mature into a thriving open-source community project. Watch the qnabot-on-aws-plugin-samples GitHub repo to receive updates on new plugins and features, use the Issues forum to report problems or provide feedback, and contribute improvements via pull requests. Contributions are welcome!
In this post, we introduced the new generative AI features for QnABot and walked through a solution to create, deploy, and customize QnABot to use these features. We also discussed some relevant use cases. Automating repetitive inquiries frees up human workers and boosts productivity. Rich responses create engaging experiences. Deploying the LLM-powered QnABot can help you elevate the self-service experience for customers and employees.
Don’t miss this opportunity—get started today and revolutionize the user experience on your QnABot deployment!

About the authors
Clevester Teo is a Senior Partner Solutions Architect at AWS, focused on the Public Sector partner ecosystem. He enjoys building prototypes, staying active outdoors, and experiencing new cuisines. Clevester is passionate about experimenting with emerging technologies and helping AWS partners innovate and better serve public sector customers.
Windrich is a Solutions Architect at AWS who works with customers in industries such as finance and transport, to help accelerate their cloud adoption journey. He is especially interested in Serverless technologies and how customers can leverage them to bring values to their business. Outside of work, Windrich enjoys playing and watching sports, as well as exploring different cuisines around the world.
Bob Strahan is a Principal Solutions Architect in the AWS Language AI Services team.

Automatically generate impressions from findings in radiology reports …

Radiology reports are comprehensive, lengthy documents that describe and interpret the results of a radiological imaging examination. In a typical workflow, the radiologist supervises, reads, and interprets the images, and then concisely summarizes the key findings. The summarization (or impression) is the most important part of the report because it helps clinicians and patients focus on the critical contents of the report that contain information for clinical decision-making. Creating a clear and impactful impression involves much more effort than simply restating the findings. The entire process is therefore laborious, time consuming, and prone to error. It often takes years of training for doctors to accumulate enough expertise in writing concise and informative radiology report summarizations, further highlighting the significance of automating the process. Additionally, automatic generation of report findings summarization is critical for radiology reporting. It enables translation of reports into human readable language, thereby alleviating the patients’ burden of reading through lengthy and obscure reports.
To solve this problem, we propose the use of generative AI, a type of AI that can create new content and ideas, including conversations, stories, images, videos, and music. Generative AI is powered by machine learning (ML) models—very large models that are pre-trained on vast amounts of data and commonly referred to as foundation models (FMs). Recent advancements in ML (specifically the invention of the transformer-based neural network architecture) have led to the rise of models that contain billions of parameters or variables. The proposed solution in this post uses fine-tuning of pre-trained large language models (LLMs) to help generate summarizations based on findings in radiology reports.
This post demonstrates a strategy for fine-tuning publicly available LLMs for the task of radiology report summarization using AWS services. LLMs have demonstrated remarkable capabilities in natural language understanding and generation, serving as foundation models that can be adapted to various domains and tasks. There are significant benefits to using a pre-trained model. It reduces computation costs, reduces carbon footprints, and allows you to use state-of-the-art models without having to train one from scratch.
Our solution uses the FLAN-T5 XL FM through Amazon SageMaker JumpStart, an ML hub offering algorithms, models, and ML solutions. We demonstrate how to accomplish this using a notebook in Amazon SageMaker Studio. Fine-tuning a pre-trained model involves further training on specific data to improve performance on a different but related task. This solution involves fine-tuning the FLAN-T5 XL model, an enhanced version of the T5 (Text-to-Text Transfer Transformer) family of general-purpose LLMs. T5 reframes natural language processing (NLP) tasks into a unified text-to-text format, in contrast to BERT-style models that can only output either a class label or a span of the input. In this post, FLAN-T5 XL is fine-tuned for a summarization task on 91,544 free-text radiology reports obtained from the MIMIC-CXR dataset.
Overview of solution
In this section, we discuss the key components of our solution: choosing the strategy for the task, fine-tuning an LLM, and evaluating the results. We also illustrate the solution architecture and the steps to implement the solution.
Identify the strategy for the task
There are various strategies to approach the task of automating clinical report summarization. For example, we could use a specialized language model pre-trained on clinical reports from scratch. Alternatively, we could directly fine-tune a publicly available general-purpose language model to perform the clinical task. Using a fine-tuned domain-agnostic model may be necessary in settings where training a language model from scratch is too costly. In this solution, we demonstrate the latter approach of using a FLAN-T5 XL model, which we fine-tune for the clinical task of summarization of radiology reports. The following diagram illustrates the model workflow.

A typical radiology report is well-organized and succinct. Such reports often have three key sections:

Background – Provides general information about the patient, including demographics, clinical history, relevant medical history, and details of exam procedures
Findings – Presents detailed exam diagnosis and results
Impression – Concisely summarizes the most salient findings or interpretation of the findings with an assessment of significance and potential diagnosis based on the observed abnormalities

Using the findings section in the radiology reports, the solution generates the impression section, which corresponds to the doctors’ summarization. The following figure is an example of a radiology report.

Fine-tune a general-purpose LLM for a clinical task
In this solution, we fine-tune a FLAN-T5 XL model (tuning all the parameters of the model and optimizing them for the task). We fine-tune the model using the clinical domain dataset MIMIC-CXR, which is a publicly available dataset of chest radiographs. To fine-tune this model through SageMaker JumpStart, labeled examples must be provided in the form of {prompt, completion} pairs. In this case, we use pairs of {Findings, Impression} from the original reports in the MIMIC-CXR dataset. For inferencing, we use a prompt as shown in the following example:

The model is fine-tuned on an accelerated computing ml.p3.16xlarge instance with 64 virtual CPUs and 488 GiB memory. For validation, 5% of the dataset was randomly selected. The elapsed time of the SageMaker training job with fine-tuning was 38,468 seconds (approximately 11 hours).
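The {prompt, completion} pairing and the random 5% validation split described above can be sketched as follows. This is a minimal illustration, not the code used in the post: the field names `findings` and `impression`, the helper names, and the synthetic reports are all assumptions made for the example.

```python
import json
import random


def build_pairs(reports):
    """Turn report dicts into {prompt, completion} records, skipping incomplete ones."""
    pairs = []
    for report in reports:
        findings = report.get("findings", "").strip()
        impression = report.get("impression", "").strip()
        if findings and impression:
            pairs.append({"prompt": findings, "completion": impression})
    return pairs


def split_train_validation(pairs, validation_fraction=0.05, seed=0):
    """Randomly hold out a fraction of the pairs for validation."""
    shuffled = pairs[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical, synthetic reports for illustration only (not MIMIC-CXR data).
reports = [
    {"findings": f"Synthetic finding number {i}.", "impression": f"Synthetic impression {i}."}
    for i in range(40)
]
reports.append({"findings": "Findings without an impression.", "impression": ""})

pairs = build_pairs(reports)
train, validation = split_train_validation(pairs)
print(len(train), len(validation))
print(json.dumps(train[0]))
```

Fixing the random seed keeps the split reproducible between runs, which makes it easier to compare training jobs.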
Evaluate the results
When the training is complete, it’s critical to evaluate the results. For a quantitative analysis of the generated impression, we use ROUGE (Recall-Oriented Understudy for Gisting Evaluation), the most commonly used metric for evaluating summarization. This metric compares an automatically produced summary or translation against one or more human-produced reference summaries or translations. ROUGE1 refers to the overlap of unigrams (each word) between the candidate (the model’s output) and reference summaries. ROUGE2 refers to the overlap of bigrams (two words) between the candidate and reference summaries. ROUGEL is a sentence-level metric and refers to the longest common subsequence (LCS) between two pieces of text. It ignores newlines in the text. ROUGELsum is a summary-level metric. For this metric, newlines in the text aren’t ignored but are interpreted as sentence boundaries. The LCS is then computed between each pair of reference and candidate sentences, and then union-LCS is computed. To aggregate these scores over a given set of reference and candidate sentences, the average is computed.
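To make the ROUGE1, ROUGE2, and ROUGEL definitions concrete, here is a minimal pure-Python sketch. It assumes lowercase whitespace tokenization and reports the F-measure, which are simplifying assumptions; it omits stemming and ROUGELsum, and it is not the evaluation code used in the post. The example sentences are made up.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, candidate_total, reference_total):
    """F-measure from an overlap count and the two totals."""
    if overlap == 0:
        return 0.0
    precision = overlap / candidate_total
    recall = overlap / reference_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """N-gram overlap between candidate and reference (clipped counts)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """LCS-based score between candidate and reference."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    return f1(lcs_length(cand, ref), len(cand), len(ref))


reference = "no acute cardiopulmonary process"
candidate = "no acute process"
print(rouge_n(candidate, reference, 1))  # unigram overlap: 3 of 3 candidate, 3 of 4 reference
print(rouge_n(candidate, reference, 2))  # bigram overlap: 1 of 2 candidate, 1 of 3 reference
print(rouge_l(candidate, reference))     # LCS is "no acute process" (length 3)
```

Averaging these per-report scores over an evaluation set gives the aggregated values discussed later in the post.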
Walkthrough and architecture
The overall solution architecture as shown in the following figure primarily consists of a model development environment that uses SageMaker Studio, model deployment with a SageMaker endpoint, and a reporting dashboard using Amazon QuickSight.

In the following sections, we demonstrate fine-tuning an LLM available on SageMaker JumpStart for summarization of a domain-specific task via the SageMaker Python SDK. In particular, we discuss the following topics:

Steps to set up the development environment
An overview of the radiology report datasets on which the model is fine-tuned and evaluated
A demonstration of fine-tuning the FLAN-T5 XL model using SageMaker JumpStart programmatically with the SageMaker Python SDK
Inferencing and evaluation of the pre-trained and fine-tuned models
Comparison of results from pre-trained model and fine-tuned models

The solution is available in the Generating Radiology Report Impression using generative AI with Large Language Model on AWS GitHub repo.
To get started, you need an AWS account in which you can use SageMaker Studio. You will need to create a user profile for SageMaker Studio if you don’t already have one.
The training instance type used in this post is ml.p3.16xlarge. Note that the p3 instance type requires a service quota limit increase.
The MIMIC-CXR dataset can be accessed through a data use agreement, which requires user registration and completion of a credentialing process.
Set up the development environment
To set up your development environment, you create an S3 bucket, configure a notebook, create endpoints and deploy the models, and create a QuickSight dashboard.
Create an S3 bucket
Create an S3 bucket called llm-radiology-bucket to host the training and evaluation datasets. This will also be used to store the model artifact during model development.
Configure a notebook
Complete the following steps:

Launch SageMaker Studio from either the SageMaker console or the AWS Command Line Interface (AWS CLI).

For more information about onboarding to a domain, see Onboard to Amazon SageMaker Domain.

Create a new SageMaker Studio notebook for cleaning the report data and fine-tuning the model. We use an ml.t3.medium 2vCPU+4GiB notebook instance with a Python 3 kernel.

Within the notebook, install the relevant packages such as nest-asyncio, IPyWidgets (for interactive widgets for Jupyter notebook), and the SageMaker Python SDK:

!pip install nest-asyncio==1.5.5 --quiet
!pip install ipywidgets==8.0.4 --quiet
!pip install sagemaker==2.148.0 --quiet
Create endpoints and deploy the models for inference

For inferencing the pre-trained and fine-tuned models, create an endpoint and deploy each model in the notebook as follows:

Create a model object from the Model class that can be deployed to an HTTPS endpoint.
Create an HTTPS endpoint with the model object’s pre-built deploy() method:

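from sagemaker import model_uris, script_uris
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base

# Retrieve the URI of the pre-trained model
pre_trained_model_uri = model_uris.retrieve(
    model_id=model_id, model_version=model_version, model_scope="inference"
)

pre_trained_name = name_from_base(f"jumpstart-demo-pre-trained-{model_id}")

# NOTE: the Model(...) and deploy(...) arguments were truncated in this text. The ones
# below are assumptions; deploy_image_uri, aws_role, and inference_instance_type are
# expected to be defined earlier in the notebook.
# Create the SageMaker model instance of the pre-trained model
if ("small" in model_id) or ("base" in model_id):
    deploy_source_uri = script_uris.retrieve(
        model_id=model_id, model_version=model_version, script_scope="inference"
    )
    pre_trained_model = Model(
        image_uri=deploy_image_uri, source_dir=deploy_source_uri, entry_point="inference.py",
        model_data=pre_trained_model_uri, role=aws_role, predictor_cls=Predictor, name=pre_trained_name,
    )
else:
    # For those large models, we already repack the inference script and model
    # artifacts for you, so the `source_dir` argument to Model is not required.
    pre_trained_model = Model(
        image_uri=deploy_image_uri, model_data=pre_trained_model_uri, role=aws_role,
        predictor_cls=Predictor, name=pre_trained_name,
    )

# Deploy the pre-trained model. Note that we need to pass Predictor class when we deploy model
# through Model class, for being able to run inference through the SageMaker API
pre_trained_predictor = pre_trained_model.deploy(
    initial_instance_count=1, instance_type=inference_instance_type, endpoint_name=pre_trained_name,
)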

Create a QuickSight dashboard
Create a QuickSight dashboard with an Athena data source with inference results in Amazon Simple Storage Service (Amazon S3) to compare the inference results with the ground truth. The following screenshot shows our example dashboard.
Radiology report datasets
The model is fine-tuned, with all model parameters updated, on 91,544 reports downloaded from the MIMIC-CXR v2.0 dataset. Because we used only the radiology report text data, we downloaded just one compressed report file from the MIMIC-CXR website. We evaluate the fine-tuned model on 2,000 reports (referred to as the dev1 dataset) from a separate held-out subset of this dataset. We use another 2,000 radiology reports (referred to as dev2), from the chest X-ray collection of the Indiana University hospital network, to further evaluate the fine-tuned model. All the datasets are read as JSON files and uploaded to the newly created S3 bucket llm-radiology-bucket. Note that none of the datasets contain Protected Health Information (PHI) by default; all sensitive information is replaced with three consecutive underscores (___) by the providers.
Fine-tune with the SageMaker Python SDK
For fine-tuning, the model_id is specified as huggingface-text2text-flan-t5-xl from the list of SageMaker JumpStart models. The training_instance_type is set as ml.p3.16xlarge and the inference_instance_type as ml.g5.2xlarge. The training data in JSON format is read from the S3 bucket. The next step is to use the selected model_id to extract the SageMaker JumpStart resource URIs, including image_uri (the Amazon Elastic Container Registry (Amazon ECR) URI for the Docker image), model_uri (the pre-trained model artifact Amazon S3 URI), and script_uri (the training script):

from sagemaker import image_uris, model_uris, script_uris

# Training instance will use this image
# (arguments other than framework were truncated in this text and are assumptions)
train_image_uri = image_uris.retrieve(
    region=None,
    framework=None,  # automatically inferred from model_id
    model_id=model_id,
    model_version=model_version,
    image_scope="training",
    instance_type=training_instance_type,
)

# Pre-trained model
train_model_uri = model_uris.retrieve(
    model_id=model_id, model_version=model_version, model_scope="training"
)

# Script to execute on the training instance
train_script_uri = script_uris.retrieve(
    model_id=model_id, model_version=model_version, script_scope="training"
)

output_location = f"s3://{output_bucket}/demo-llm-rad-fine-tune-flan-t5/"

Also, an output location is set up as a folder within the S3 bucket.
Only one hyperparameter, epochs, is changed (to 3); the rest are left at their default values:

from sagemaker import hyperparameters

# Retrieve the default hyper-parameters for fine-tuning the model
hyperparameters = hyperparameters.retrieve_default(model_id=model_id, model_version=model_version)

# We will override some default hyperparameters with custom values
hyperparameters["epochs"] = "3"

The training metrics such as eval_loss (for validation loss), loss (for training loss), and epoch to be tracked are defined and listed:

from sagemaker.estimator import Estimator
from sagemaker.utils import name_from_base

model_name = "-".join(model_id.split("-")[2:])  # get the most informative part of ID
training_job_name = name_from_base(f"js-demo-{model_name}-{hyperparameters['epochs']}")
print(f"{bold}job name:{unbold} {training_job_name}")

training_metric_definitions = [
    {"Name": "val_loss", "Regex": r"'eval_loss': ([0-9\.]+)"},
    {"Name": "train_loss", "Regex": r"'loss': ([0-9\.]+)"},
    {"Name": "epoch", "Regex": r"'epoch': ([0-9\.]+)"},
]
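To see how such metric definitions pick values out of training logs, the following sketch applies the same three regular expressions to two sample log lines. The log lines are hypothetical, written in the style of Hugging Face Trainer output, and `extract_metrics` is an illustrative helper, not part of the SageMaker SDK.

```python
import re

metric_definitions = [
    {"Name": "val_loss", "Regex": r"'eval_loss': ([0-9\.]+)"},
    {"Name": "train_loss", "Regex": r"'loss': ([0-9\.]+)"},
    {"Name": "epoch", "Regex": r"'epoch': ([0-9\.]+)"},
]

# Hypothetical log lines for illustration only.
log_lines = [
    "{'loss': 1.2345, 'learning_rate': 5e-05, 'epoch': 1.0}",
    "{'eval_loss': 0.9876, 'eval_runtime': 12.3, 'epoch': 1.0}",
]


def extract_metrics(line):
    """Return every metric whose regex matches the given log line."""
    found = {}
    for definition in metric_definitions:
        match = re.search(definition["Regex"], line)
        if match:
            found[definition["Name"]] = float(match.group(1))
    return found


for line in log_lines:
    print(extract_metrics(line))
```

The leading quote in `'loss'` matters: without it, the training-loss pattern would also match the tail of `'eval_loss'`.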

We use the SageMaker JumpStart resource URIs (image_uri, model_uri, script_uri) identified earlier to create an estimator and fine-tune it on the training dataset by specifying the S3 path of the dataset. The Estimator class requires an entry_point parameter, which names the training script that JumpStart provides. The training job fails to run if this value is not set.

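# Create SageMaker Estimator instance. The arguments were truncated in this text; the ones
# below are assumptions (training_entry_point is a placeholder for the JumpStart script name).
sm_estimator = Estimator(
    role=aws_role, image_uri=train_image_uri, model_uri=train_model_uri,
    source_dir=train_script_uri, entry_point=training_entry_point,
    instance_count=1, instance_type=training_instance_type, hyperparameters=hyperparameters,
    output_path=output_location, metric_definitions=training_metric_definitions,
)

# Launch a SageMaker training job over data located in the given S3 path
# Training jobs can take hours; consider wait=False and monitoring on the SageMaker console
sm_estimator.fit({"training": train_data_location}, job_name=training_job_name, wait=True)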

This training job can take hours to complete; therefore, it’s recommended to set the wait parameter to False and monitor the training job status on the SageMaker console. Use the TrainingJobAnalytics function to keep track of the training metrics at various timestamps:

from sagemaker import TrainingJobAnalytics

# Wait for a couple of minutes for the job to start before running this cell
# This can be called while the job is still running
df = TrainingJobAnalytics(training_job_name=training_job_name).dataframe()

Deploy inference endpoints
In order to draw comparisons, we deploy inference endpoints for both the pre-trained and fine-tuned models.
First, retrieve the inference Docker image URI using model_id, and use this URI to create a SageMaker model instance of the pre-trained model. Deploy the pre-trained model by creating an HTTPS endpoint with the model object’s pre-built deploy() method. In order to run inference through SageMaker API, make sure to pass the Predictor class.

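from sagemaker import image_uris, model_uris

# Retrieve the inference docker image URI. This is the base HuggingFace container image
# (arguments other than framework were truncated in this text and are assumptions)
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,  # automatically inferred from model_id
    image_scope="inference", model_id=model_id, model_version=model_version,
    instance_type=inference_instance_type,
)

# Retrieve the URI of the pre-trained model
pre_trained_model_uri = model_uris.retrieve(
    model_id=model_id, model_version=model_version, model_scope="inference"
)

# NOTE: the Model(...) and deploy(...) arguments were truncated in this text; the ones shown
# are assumptions, with aws_role and pre_trained_name expected to be defined earlier.
pre_trained_model = Model(
    image_uri=deploy_image_uri, model_data=pre_trained_model_uri, role=aws_role,
    predictor_cls=Predictor, name=pre_trained_name,
)

# Deploy the pre-trained model. Note that we need to pass Predictor class when we deploy model
# through Model class, for being able to run inference through the SageMaker API
pre_trained_predictor = pre_trained_model.deploy(
    initial_instance_count=1, instance_type=inference_instance_type, endpoint_name=pre_trained_name,
)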

Repeat the preceding step to create a SageMaker model instance of the fine-tuned model and create an endpoint to deploy the model.
Evaluate the models
First, set the length of summarized text, number of model outputs (should be greater than 1 if multiple summaries need to be generated), and number of beams for beam search.
Construct the inference request as a JSON payload and use it to query the endpoints for the pre-trained and fine-tuned models.
Compute the aggregated ROUGE scores (ROUGE1, ROUGE2, ROUGEL, ROUGELsum) as described earlier.
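The request construction described in these steps can be sketched as a small helper. The payload field names (`text_inputs`, `max_length`, `num_return_sequences`, `num_beams`) are assumptions based on common SageMaker JumpStart text2text conventions, and the default values are placeholders; check the model's documentation before relying on them.

```python
import json


def build_payload(findings, max_length=100, num_return_sequences=1, num_beams=4):
    """Serialize one inference request as UTF-8 encoded JSON."""
    # Beam search cannot return more sequences than it keeps beams.
    if num_return_sequences > num_beams:
        raise ValueError("num_return_sequences must not exceed num_beams")
    payload = {
        "text_inputs": findings,                       # assumed field name
        "max_length": max_length,                      # length cap for the summary
        "num_return_sequences": num_return_sequences,  # > 1 for multiple summaries
        "num_beams": num_beams,                        # beams used in beam search
    }
    return json.dumps(payload).encode("utf-8")


# Hypothetical findings text for illustration only.
request_body = build_payload("Lungs are clear. No pleural effusion or pneumothorax.")
print(request_body.decode("utf-8"))
```

The same request body can be sent to both the pre-trained and the fine-tuned endpoints so that their outputs are directly comparable.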
Compare the results
The following table depicts the evaluation results for the dev1 and dev2 datasets. The evaluation result on dev1 (2,000 findings from the MIMIC-CXR radiology reports) shows an improvement of approximately 38 percentage points in the aggregated average ROUGE1 and ROUGE2 scores compared to the pre-trained model. For dev2, an improvement of 31 percentage points and 25 percentage points is observed in ROUGE1 and ROUGE2 scores, respectively. Overall, fine-tuning led to an improvement of 38.2 percentage points and 31.3 percentage points in ROUGELsum scores for the dev1 and dev2 datasets, respectively.

[Table columns: Evaluation Dataset, Pre-trained Model, Fine-tuned Model]




The following box plots depict the distribution of ROUGE scores for the dev1 and dev2 datasets evaluated using the fine-tuned model.

[Figure: box plots of ROUGE scores; panel (a): dev1, panel (b): dev2]

The following table shows that, for each evaluation dataset, the ROUGE scores have approximately the same median and mean, which suggests that they are roughly symmetrically distributed.

[Table columns: Std Deviation, 25th percentile, 50th percentile, 75th percentile]
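Summary statistics of this kind can be computed with the Python standard library, as in the following sketch. The scores are hypothetical values chosen for illustration, not results from this post, and `summarize` is an illustrative helper. A small gap between mean and median is consistent with a roughly symmetric distribution, although it does not prove one.

```python
import statistics


def summarize(scores):
    """Return mean, standard deviation, and quartiles for a list of scores."""
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    mean = statistics.mean(scores)
    return {
        "mean": mean,
        "std": statistics.stdev(scores),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "mean_minus_median": mean - median,
    }


# Hypothetical per-report ROUGE1 scores for illustration only.
rouge1_scores = [0.2, 0.4, 0.5, 0.6, 0.8]
summary = summarize(rouge1_scores)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```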









Clean up
To avoid incurring future charges, delete the resources you created with the following code:

# Delete resources

In this post, we demonstrated how to fine-tune a FLAN-T5 XL model for a clinical domain-specific summarization task using SageMaker Studio. To build confidence in the results, we compared the predictions with ground truth and evaluated them using ROUGE metrics. We demonstrated that a model fine-tuned for a specific task returns better results than a model pre-trained on a generic NLP task. We would like to point out that fine-tuning a general-purpose LLM avoids the cost of pre-training a model from scratch.
Although the work presented here focuses on chest X-ray reports, it has the potential to be expanded to bigger datasets with varied anatomies and modalities, such as MRI and CT, for which radiology reports might be more complex with multiple findings. In such cases, radiologists could generate impressions in order of criticality and include follow-up recommendations. Furthermore, setting up a feedback loop for this application would enable radiologists to improve the performance of the model over time.
As we showed in this post, the fine-tuned model generates impressions for radiology reports with high ROUGE scores. You can try to fine-tune LLMs on other domain-specific medical reports from different departments.

About the authors
Dr. Adewale Akinfaderin is a Senior Data Scientist in Healthcare and Life Sciences at AWS. His expertise is in reproducible and end-to-end AI/ML methods, practical implementations, and helping global healthcare customers formulate and develop scalable solutions to interdisciplinary problems. He has two graduate degrees in Physics and a Doctorate degree in Engineering.
Priya Padate is a Senior Partner Solutions Architect with extensive expertise in Healthcare and Life Sciences at AWS. Priya drives go-to-market strategies with partners and drives solution development to accelerate AI/ML-based development. She is passionate about using technology to transform the healthcare industry to drive better patient care outcomes.
Ekta Walia Bhullar, PhD, is a senior AI/ML consultant with AWS Healthcare and Life Sciences (HCLS) professional services business unit. She has extensive experience in the application of AI/ML within the healthcare domain, especially in radiology. Outside of work, when not discussing AI in radiology, she likes to run and hike.

Alibaba Researchers Introduce the Qwen-VL Series: A Set of Large-Scale …

Large Language Models (LLMs) have lately drawn a lot of interest because of their powerful text generation and comprehension abilities. By further aligning instructions with user intent, these models gain significant interactive capabilities and the potential to increase productivity as intelligent assistants. Native large language models, on the other hand, are limited to the realm of pure text and cannot handle other widely used modalities, such as pictures, audio, and videos, which severely restricts the range of applications for the models. To overcome this constraint, a series of Large Vision Language Models (LVLMs) have been created that give large language models the capacity to recognize and comprehend visual information.

These large vision-language models show considerable promise for resolving practical vision-centric problems. Researchers from Alibaba Group introduce the newest member of the open-sourced Qwen series, the Qwen-VL series models, to promote the growth of the multimodal open-source community. The large-scale vision-language models in the Qwen-VL family come in two flavors: Qwen-VL and Qwen-VL-Chat. The pre-trained model Qwen-VL connects a visual encoder to the Qwen-7B language model to provide visual capabilities. Qwen-VL can sense and comprehend visual information on multi-level scales after completing the three stages of training. Additionally, Qwen-VL-Chat is an interactive vision-language model based on Qwen-VL that uses alignment methods and offers more flexible interaction, such as multiple picture inputs, multi-round dialogue, and localization capability. Fig. 1 shows examples.

Figure 1: Qualitative samples produced by Qwen-VL-Chat, which supports multiple picture inputs, multi-round conversations, multilingual conversations, and localization.

The characteristics of the Qwen-VL series models:

• Strong performance: It greatly outperforms current open-sourced Large Vision Language Models (LVLMs) at a comparable model scale on several assessment benchmarks, including Zero-shot Captioning, VQA, DocVQA, and Grounding.

• Multilingual LVLM supporting end-to-end recognition and grounding of bilingual (Chinese and English) text and instances in images: Qwen-VL naturally enables English, Chinese, and multilingual dialogue.

• Multi-image interleaved conversations: This feature enables comparing several pictures, asking questions about specific images, and participating in multi-image storytelling.

• Accurate recognition and comprehension: The 448×448 resolution supports fine-grained text recognition, document question answering, and bounding box identification better than the 224×224 resolution currently employed by competing open-source LVLMs.

Check out the Paper and Github. All Credit For This Research Goes To the Researchers on This Project.

The post Alibaba Researchers Introduce the Qwen-VL Series: A Set of Large-Scale Vision-Language Models Designed to Perceive and Understand Both Text and Images appeared first on MarkTechPost.

This AI Paper from GSAi China Presents a Comprehensive Study of LLM-ba …

Autonomous agents represent self-operating systems that exhibit varying degrees of independence. Recent research highlights the remarkable capacity of LLMs to imitate human intelligence, a feat achieved through the combination of extensive training datasets and a substantial array of model parameters. This research article provides a comprehensive study of the architectural aspects, construction techniques, evaluation methods, and challenges associated with autonomous agents utilizing LLMs.

LLMs have been utilized as core orchestrators in the creation of autonomous agents, aiming to replicate human decision-making processes and enhance artificial intelligence systems. The above image constitutes an illustration of the growth trend in the field of LLM-based autonomous agents. It is interesting to note how the X-axis switches from years to months after the third point. Essentially, these LLM-based agents are evolving from passive language systems into active, goal-oriented agents with reasoning capabilities.

LLM-based Autonomous Agent Construction

In order to demonstrate human-like capabilities effectively, there exist two significant aspects to note:

Architectural Design: Selecting the most suitable architecture is important for harnessing the capabilities of LLMs optimally. Existing research has been systematically synthesized, leading to the development of a comprehensive and unified framework.

Learning Parameter Optimization: To enhance the architecture’s performance, three widely employed strategies have emerged:

Learning from Examples: This approach involves fine-tuning the model using carefully curated datasets.

Learning from Environment Feedback: Real-time interactions and observations are leveraged to improve the model’s abilities.

Learning from Human Feedback: Human expertise and intervention are capitalized upon to refine the model’s responses.

LLM-based Autonomous Agent Application

The application of LLM-based autonomous agents across various fields signifies a fundamental shift in how we address problem-solving, decision-making, and innovation. These agents possess language comprehension, reasoning, and adaptability, leading to a profound impact by providing unmatched insights, support, and solutions. This section largely delves into the transformative effects of LLM-based autonomous agents in three distinct domains: social science, natural science, and engineering.

LLM-based Autonomous Agent Evaluation

To assess the effectiveness of the LLM-based autonomous agents, two evaluation strategies have been introduced: subjective and objective evaluation.

Subjective Evaluation: Some properties, like an agent’s intelligence and user-friendliness, cannot be measured well by quantitative metrics. Therefore, subjective evaluation is indispensable for current research.

Objective Evaluation: Utilising objective evaluation presents numerous advantages in comparison to human assessments. Quantitative metrics facilitate straightforward comparisons among various approaches and the monitoring of advancements over time. The feasibility of conducting extensive automated testing enables the evaluation of numerous tasks instead of just a few.

Finally, although previous work has shown many promising directions, this field is still at an early stage, and many challenges remain, including role-playing capability, generalized human alignment, and prompt robustness. In conclusion, this survey provides a detailed study of what is currently known about LLM-based autonomous agents, along with a systematic summary.

Check out the Pre-Print Paper. All Credit For This Research Goes To the Researchers on This Project.

The post This AI Paper from GSAi China Presents a Comprehensive Study of LLM-based Autonomous Agents appeared first on MarkTechPost.

Researchers From Google and Georgia Tech Introduce DiffSeg: A Straight …

Semantic segmentation is a computer vision task whose objective is to assign a class or object label to each pixel in an image. The intended result is a dense, pixel-by-pixel segmentation map in which each pixel corresponds to a particular class or object. Many downstream applications rely on it as a precursor, including image manipulation, medical imaging, and autonomous driving. Zero-shot segmentation for images with unknown categories is far more difficult than supervised semantic segmentation, where a target dataset is given and the categories are known.

A remarkable zero-shot transfer to any images is achieved by training a neural network with 1.1B segmentation annotations, as demonstrated in the recent popular work SAM. This is a significant step in ensuring that segmentation may be used as a building block for various tasks rather than being constrained to a specific dataset with predefined labels. However, it’s expensive to collect labels for every pixel. For this reason, exploring unsupervised and zero-shot segmentation techniques in the least constrained situations (i.e., no annotations and no prior knowledge of the target) is of significant interest in research and production.

Researchers from Google and Georgia Tech propose harnessing a stable diffusion (SD) model to build a universal segmentation model. Stable diffusion models have recently generated high-resolution images from suitable prompts, so it is plausible that such a model has learned information about how objects group together.

Since the self-attention layers in a diffusion model produce attention tensors, the team introduced DiffSeg, a straightforward yet effective post-processing method for creating segmentation masks. The algorithm has three primary parts: attention aggregation, iterative attention merging, and non-maximum suppression. DiffSeg first aggregates the 4D attention tensors in a spatially consistent manner, preserving visual information across several resolutions. It then applies an iterative merging process that begins by sampling a grid of anchor points. The sampled anchors serve as starting points for attention masks, and masks belonging to the same object are merged. KL divergence measures the similarity between two attention maps, which controls the merging process.
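The KL-based merging idea can be illustrated with a small sketch. This is not the DiffSeg implementation: it is a simplified greedy variant that assumes attention maps are flattened into lists of non-negative values, uses a symmetric KL divergence as the distance, and relies on a hypothetical `threshold` parameter. All function names here are illustrative.

```python
import math


def normalize(attn_map):
    """Scale a flattened attention map so its values sum to 1."""
    total = sum(attn_map)
    return [v / total for v in attn_map]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def symmetric_kl(p, q):
    """Symmetric distance: KL(p || q) + KL(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


def merge_attention_maps(maps, threshold):
    """Greedily merge maps whose symmetric KL distance is below `threshold`.

    Each proposal keeps a running mean of the maps merged into it.
    A map that is not close to any existing proposal starts a new one.
    """
    proposals = []  # list of (mean_map, count)
    for raw in maps:
        current = normalize(raw)
        for i, (mean_map, count) in enumerate(proposals):
            if symmetric_kl(current, mean_map) < threshold:
                merged = [(a * count + b) / (count + 1)
                          for a, b in zip(mean_map, current)]
                proposals[i] = (merged, count + 1)
                break
        else:
            proposals.append((current, 1))
    return [mean_map for mean_map, _ in proposals]
```

With three toy maps, where the first two concentrate on the same positions and the third on different ones, a threshold of 1.0 leaves two merged proposals. In practice the threshold would need tuning per model and resolution.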

DiffSeg is an appealing alternative to common clustering-based unsupervised segmentation algorithms because it is deterministic and does not require the number of clusters as input. Given an image, DiffSeg can generate a high-quality segmentation without any prior knowledge or additional resources, unlike SAM, which relies on large-scale segmentation annotations.

The researchers evaluate DiffSeg on two widely used datasets: COCO-Stuff-27 for unsupervised segmentation and Cityscapes, a dedicated self-driving dataset. Despite using less auxiliary data than previous efforts, DiffSeg achieves better results on both. On COCO-Stuff-27, it improves on the previous unsupervised zero-shot SOTA method by an absolute 26% in pixel accuracy and 17% in mean IoU.

Check out the Paper. All credit for this research goes to the researchers on this project.

The post Researchers From Google and Georgia Tech Introduce DiffSeg: A Straightforward Post-Processing AI Method for Creating Segmentation Masks appeared first on MarkTechPost.

MLOps for batch inference with model monitoring and retraining using A …

Maintaining machine learning (ML) workflows in production is a challenging task because it requires creating continuous integration and continuous delivery (CI/CD) pipelines for ML code and models, model versioning, monitoring for data and concept drift, model retraining, and a manual approval process to ensure new versions of the model satisfy both performance and compliance requirements.
In this post, we describe how to create an MLOps workflow for batch inference that automates job scheduling, model monitoring, retraining, and registration, as well as error handling and notification by using Amazon SageMaker, Amazon EventBridge, AWS Lambda, Amazon Simple Notification Service (Amazon SNS), HashiCorp Terraform, and GitLab CI/CD. The presented MLOps workflow provides a reusable template for managing the ML lifecycle through automation, monitoring, auditability, and scalability, thereby reducing the complexities and costs of maintaining batch inference workloads in production.
Solution overview
The following figure illustrates the proposed target MLOps architecture for enterprise batch inference for organizations who use GitLab CI/CD and Terraform infrastructure as code (IaC) in conjunction with AWS tools and services. GitLab CI/CD serves as the macro-orchestrator, orchestrating model build and model deploy pipelines, which include sourcing, building, and provisioning Amazon SageMaker Pipelines and supporting resources using the SageMaker Python SDK and Terraform. SageMaker Python SDK is used to create or update SageMaker pipelines for training, training with hyperparameter optimization (HPO), and batch inference. Terraform is used to create additional resources such as EventBridge rules, Lambda functions, and SNS topics for monitoring SageMaker pipelines and sending notifications (for example, when a pipeline step fails or succeeds). SageMaker Pipelines serves as the orchestrator for ML model training and inference workflows.
This architecture design represents a multi-account strategy where ML models are built, trained, and registered in a central model registry within a data science development account (which has more controls than a typical application development account). Then, inference pipelines are deployed to staging and production accounts using automation from DevOps tools such as GitLab CI/CD. The central model registry could optionally be placed in a shared services account as well. Refer to Operating model for best practices regarding a multi-account strategy for ML.

In the following subsections, we discuss different aspects of the architecture design in detail.
Infrastructure as code
IaC offers a way to manage IT infrastructure through machine-readable files, ensuring efficient version control. In this post and the accompanying code sample, we demonstrate how to use HashiCorp Terraform with GitLab CI/CD to manage AWS resources effectively. This approach underscores the key benefit of IaC, offering a transparent and repeatable process in IT infrastructure management.
Model training and retraining
In this design, the SageMaker training pipeline runs on a schedule (via EventBridge) or in response to an Amazon Simple Storage Service (Amazon S3) event trigger (for example, when a trigger file is placed in Amazon S3 or, if the training data is a single object, when new training data is uploaded) to regularly recalibrate the model with new data. This pipeline does not introduce structural or material changes to the model because it uses fixed hyperparameters that have been approved during the enterprise model review process.
The training pipeline registers the newly trained model version in the Amazon SageMaker Model Registry if the model meets a predefined performance threshold (for example, on RMSE for regression or F1 score for classification). When a new version of the model is registered in the model registry, it triggers a notification to the responsible data scientist via Amazon SNS. The data scientist then needs to review and manually approve the latest version of the model in the Amazon SageMaker Studio UI or via an API call using the AWS Command Line Interface (AWS CLI) or AWS SDK for Python (Boto3) before the new version of the model can be used for inference.
The SageMaker training pipeline and its supporting resources are created by the GitLab model build pipeline, either via a manual run of the GitLab pipeline or automatically when code is merged into the main branch of the model build Git repository.
Batch inference
The SageMaker batch inference pipeline runs on a schedule (via EventBridge) or based on an S3 event trigger as well. The batch inference pipeline automatically pulls the latest approved version of the model from the model registry and uses it for inference. The batch inference pipeline includes steps for checking data quality against a baseline created by the training pipeline, as well as model quality (model performance) if ground truth labels are available.
If the batch inference pipeline discovers data quality issues, it will notify the responsible data scientist via Amazon SNS. If it discovers model quality issues (for example, RMSE is greater than a pre-specified threshold), the pipeline step for the model quality check will fail, which will in turn trigger an EventBridge event to start the training with HPO pipeline.
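As a rough illustration of the failure-triggered retraining described above, the following sketch expresses an EventBridge-style event pattern as a Python dictionary, together with a tiny matcher for testing it locally. The `source`, `detail-type`, and `currentStepStatus` values follow the event format SageMaker Pipelines is understood to emit for step status changes, but treat them as assumptions to verify against the AWS documentation. The step name `ModelQualityCheck` is hypothetical, and the matcher handles only exact-value matching, not the full EventBridge pattern language.

```python
# Event pattern for a failed model quality step (values are assumptions to verify).
FAILED_MODEL_QUALITY_PATTERN = {
    "source": ["aws.sagemaker"],
    "detail-type": ["SageMaker Model Building Pipeline Execution Step Status Change"],
    "detail": {
        "stepName": ["ModelQualityCheck"],  # hypothetical step name
        "currentStepStatus": ["Failed"],
    },
}


def matches(pattern, event):
    """Return True if `event` satisfies `pattern` (exact-value matching only)."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            # Nested pattern: recurse into the nested event object.
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            # Leaf pattern: the event value must be one of the listed values.
            return False
    return True
```

A rule with a pattern like this would have the training with HPO pipeline as its target, so that only a failed model quality step starts retuning, while data quality issues result in a notification only.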
The SageMaker batch inference pipeline and its supporting resources are created by the GitLab model deploy pipeline, either via a manual run of the GitLab pipeline or automatically when code is merged into the main branch of the model deploy Git repository.
Model tuning and retuning
The SageMaker training with HPO pipeline is triggered when the model quality check step of the batch inference pipeline fails. The model quality check is performed by comparing model predictions with the actual ground truth labels. If the model quality metric (for example, RMSE for regression and F1 score for classification) doesn’t meet a pre-specified criterion, the model quality check step is marked as failed. The SageMaker training with HPO pipeline can also be triggered manually (in the SageMaker Studio UI or via an API call using the AWS CLI or SageMaker Python SDK) by the responsible data scientist if needed. Because the model hyperparameters are changing, the responsible data scientist needs to obtain approval from the enterprise model review board before the new model version can be approved in the model registry.
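The model quality check can be pictured as a small script that compares predictions with ground truth and fails when the metric violates the threshold. The sketch below assumes a regression problem with RMSE as the metric. The function names, the exception class, and the `max_rmse` parameter are illustrative and not taken from the sample repository.

```python
import math


class ModelQualityViolation(Exception):
    """Raised when the model quality metric does not meet the threshold."""


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def check_model_quality(y_true, y_pred, max_rmse):
    """Return the RMSE, or raise if it exceeds `max_rmse`.

    Raising makes the enclosing processing job (and therefore the
    pipeline step) fail, which is the signal used to start retuning.
    """
    score = rmse(y_true, y_pred)
    if score > max_rmse:
        raise ModelQualityViolation(
            f"RMSE {score:.4f} exceeds the allowed maximum {max_rmse:.4f}"
        )
    return score
```

Failing the step by raising an exception, rather than returning a flag, keeps the trigger logic outside the script: the orchestration layer only needs to react to the step's failed status.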
The SageMaker training with HPO pipeline and its supporting resources are created by the GitLab model build pipeline, either via a manual run of the GitLab pipeline or automatically when code is merged into the main branch of the model build Git repository.
Model monitoring
Data statistics and constraints baselines are generated as part of the training and training with HPO pipelines. They are saved to Amazon S3 and also registered with the trained model in the model registry if the model passes evaluation. The proposed architecture for the batch inference pipeline uses Amazon SageMaker Model Monitor for data quality checks and custom Amazon SageMaker Processing steps for the model quality check. This design decouples data and model quality checks, which lets you send only a warning notification when data drift is detected and trigger the training with HPO pipeline only when a model quality violation is detected.
Model approval
After a newly trained model is registered in the model registry, the responsible data scientist receives a notification. If the model has been trained by the training pipeline (recalibration with new training data while hyperparameters are fixed), there is no need for approval from the enterprise model review board. The data scientist can review and approve the new version of the model independently. On the other hand, if the model has been trained by the training with HPO pipeline (retuning by changing hyperparameters), the new model version needs to go through the enterprise review process before it can be used for inference in production. When the review process is complete, the data scientist can proceed and approve the new version of the model in the model registry. Changing the status of the model package to Approved will trigger a Lambda function via EventBridge, which will in turn trigger the GitLab model deploy pipeline via an API call. This will automatically update the SageMaker batch inference pipeline to utilize the latest approved version of the model for inference.
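The approval-to-deployment hand-off can be sketched as a Lambda-style handler that reacts to a model package state change and calls the GitLab pipeline trigger API. This is a minimal sketch under several assumptions: the event is assumed to carry `ModelApprovalStatus` under `detail`, the GitLab base URL, project ID, and trigger token are placeholders, and the `send` parameter is an illustrative hook so the function can be exercised without network access.

```python
import urllib.parse
import urllib.request


def build_gitlab_trigger_request(base_url, project_id, trigger_token, ref="main"):
    """Build a POST request for GitLab's pipeline trigger endpoint."""
    url = f"{base_url}/api/v4/projects/{project_id}/trigger/pipeline"
    body = urllib.parse.urlencode({"token": trigger_token, "ref": ref}).encode()
    return urllib.request.Request(url, data=body, method="POST")


def handle_model_package_event(event, send=urllib.request.urlopen, **gitlab):
    """Trigger the deploy pipeline only when a model package becomes Approved.

    `event` is assumed to expose the approval status at
    event["detail"]["ModelApprovalStatus"]; verify this against real events.
    Returns True if a trigger request was sent, False otherwise.
    """
    status = event.get("detail", {}).get("ModelApprovalStatus")
    if status != "Approved":
        return False
    send(build_gitlab_trigger_request(**gitlab))
    return True
```

In a real deployment the trigger token would come from a secrets store rather than being passed in plain text, and the handler would also log or surface HTTP errors from GitLab.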
There are two main ways to approve or reject a new model version in the model registry: using the AWS SDK for Python (Boto3) or from the SageMaker Studio UI. By default, both the training pipeline and training with HPO pipeline set ModelApprovalStatus to PendingManualApproval. The responsible data scientist can update the approval status for the model by calling the update_model_package API from Boto3. Refer to Update the Approval Status of a Model for details about updating the approval status of a model via the SageMaker Studio UI.
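For the Boto3 route, a minimal sketch might look like the following. The helper name `set_approval_status` is illustrative. It takes an already-created SageMaker client as an argument so that it can be tested with a stand-in object, and the model package ARN shown in the usage comment is a placeholder.

```python
ALLOWED_STATUSES = {"Approved", "Rejected", "PendingManualApproval"}


def set_approval_status(sm_client, model_package_arn, status, description=""):
    """Update the approval status of a model package version.

    `sm_client` is expected to behave like boto3.client("sagemaker").
    """
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"Unsupported approval status: {status!r}")
    kwargs = {
        "ModelPackageArn": model_package_arn,
        "ModelApprovalStatus": status,
    }
    if description:
        # Optional free-text note recorded alongside the status change.
        kwargs["ApprovalDescription"] = description
    return sm_client.update_model_package(**kwargs)


# Usage (requires boto3 and AWS credentials; the ARN is a placeholder):
#   import boto3
#   set_approval_status(
#       boto3.client("sagemaker"),
#       "<model-package-arn>",
#       "Approved",
#       description="Reviewed evaluation metrics",
#   )
```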
Data I/O design
SageMaker interacts directly with Amazon S3 for reading inputs and storing outputs of individual steps in the training and inference pipelines. The following diagram illustrates how different Python scripts, raw and processed training data, raw and processed inference data, inference results and ground truth labels (if available for model quality monitoring), model artifacts, training and inference evaluation metrics (model quality monitoring), as well as data quality baselines and violation reports (for data quality monitoring) can be organized within an S3 bucket. The direction of arrows in the diagram indicates which files are inputs or outputs from their respective steps in the SageMaker pipelines. Arrows have been color-coded based on pipeline step type to make them easier to read. The pipeline will automatically upload Python scripts from the GitLab repository and store output files or model artifacts from each step in the appropriate S3 path.

The data engineer is responsible for the following:

Uploading labeled training data to the appropriate path in Amazon S3. This includes adding new training data regularly to ensure the training pipeline and training with HPO pipeline have access to recent training data for model retraining and retuning, respectively.
Uploading input data for inference to the appropriate path in the S3 bucket before a planned run of the inference pipeline.
Uploading ground truth labels to the appropriate S3 path for model quality monitoring.

The data scientist is responsible for the following:

Preparing ground truth labels and providing them to the data engineering team for uploading to Amazon S3.
Taking the model versions trained by the training with HPO pipeline through the enterprise review process and obtaining necessary approvals.
Manually approving or rejecting newly trained model versions in the model registry.
Approving the production gate for the inference pipeline and supporting resources to be promoted to production.

Sample code
In this section, we present sample code for batch inference operations with a single-account setup as shown in the following architecture diagram. The sample code can be found in the GitHub repository and can serve as a starting point for batch inference with model monitoring and automatic retraining using the quality gates often required by enterprises. The sample code differs from the target architecture in the following ways:

It uses a single AWS account for building and deploying the ML model and supporting resources. Refer to Organizing Your AWS Environment Using Multiple Accounts for guidance on multi-account setup on AWS.
It uses a single GitLab CI/CD pipeline for building and deploying the ML model and supporting resources.
When a new version of the model is trained and approved, the GitLab CI/CD pipeline is not triggered automatically and needs to be run manually by the responsible data scientist to update the SageMaker batch inference pipeline with the latest approved version of the model.
It only supports S3 event-based triggers for running the SageMaker training and inference pipelines.

You should have the following prerequisites before deploying this solution:

An AWS account
SageMaker Studio
A SageMaker execution role with Amazon S3 read/write and AWS Key Management Service (AWS KMS) encrypt/decrypt permissions
An S3 bucket for storing data, scripts, and model artifacts
Terraform version 0.13.5 or greater
GitLab with a working Docker runner for running the pipelines
Python3 (Python 3.7 or greater) and the following Python packages:


Repository structure
The GitHub repository contains the following directories and files:

/code/lambda_function/ – This directory contains the Python file for a Lambda function that prepares and sends notification messages (via Amazon SNS) about the SageMaker pipelines’ step state changes
/data/ – This directory includes the raw data files (training, inference, and ground truth data)
/env_files/ – This directory contains the Terraform input variables file
/pipeline_scripts/ – This directory contains three Python scripts for creating and updating training, inference, and training with HPO SageMaker pipelines, as well as configuration files for specifying each pipeline’s parameters
/scripts/ – This directory contains additional Python scripts (such as preprocessing and evaluation) that are referenced by the training, inference, and training with HPO pipelines
.gitlab-ci.yml – This file specifies the GitLab CI/CD pipeline configuration
/ – This file defines EventBridge resources
/ – This file defines the Lambda notification function and the associated AWS Identity and Access Management (IAM) resources
/ – This file defines Terraform data sources and local variables
/ – This file defines Amazon SNS resources
/tags.json – This JSON file allows you to declare custom tag key-value pairs and append them to your Terraform resources using a local variable
/ – This file declares all the Terraform variables

Variables and configuration
The following table shows the variables that are used to parameterize this solution. Refer to the ./env_files/dev_env.tfvars file for more details.


S3 bucket that is used to store data, scripts, and model artifacts

S3 prefix for the ML project

S3 prefix for training data

S3 prefix for inference data

Name of the Lambda function that prepares and sends notification messages about SageMaker pipelines’ step state changes

The configuration for customizing notification message for specific SageMaker pipeline steps when a specific pipeline run status is detected

The email address list for receiving SageMaker pipelines’ step state change notifications

Name of the SageMaker inference pipeline

Name of the SageMaker training pipeline

Name of SageMaker training with HPO pipeline

If set to true, the three existing SageMaker pipelines (training, inference, training with HPO) will be deleted and new ones will be created when GitLab CI/CD is run

Name of the model package group

Maximum value of MSE before requiring an update to the model

IAM role ARN of the SageMaker pipeline execution role

KMS key ARN for Amazon S3 and SageMaker encryption

Subnet ID for SageMaker networking configuration

Security group ID for SageMaker networking configuration

If set to true, training data will be uploaded to Amazon S3, and this upload operation will trigger the run of the training pipeline

If set to true, inference data will be uploaded to Amazon S3, and this upload operation will trigger the run of the inference pipeline

The employee ID of the SageMaker user that is added as a tag to SageMaker resources

Deploy the solution
Complete the following steps to deploy the solution in your AWS account:

Clone the GitHub repository into your working directory.
Review and modify the GitLab CI/CD pipeline configuration to suit your environment. The configuration is specified in the ./.gitlab-ci.yml file.
Refer to the README file to update the general solution variables in the ./env_files/dev_env.tfvars file. This file contains variables for both Python scripts and Terraform automation.

Check the additional SageMaker Pipelines parameters that are defined in the YAML files under ./batch_scoring_pipeline/pipeline_scripts/. Review and update the parameters if necessary.

Review the SageMaker pipeline creation scripts in ./pipeline_scripts/ as well as the scripts that are referenced by them in the ./scripts/ folder. The example scripts provided in the GitHub repo are based on the Abalone dataset. If you are going to use a different dataset, ensure you update the scripts to suit your particular problem.
Put your data files into the ./data/ folder using the following naming convention. If you are using the Abalone dataset along with the provided example scripts, ensure the data files are headerless, the training data includes both independent and target variables with the original order of columns preserved, the inference data only includes independent variables, and the ground truth file only includes the target variable.


Commit and push the code to the repository to trigger the GitLab CI/CD pipeline run (first run). Note that the first pipeline run will fail on the pipeline stage because there’s no approved model version yet for the inference pipeline script to use. Review the step log and verify a new SageMaker pipeline named TrainingPipeline has been successfully created.

Open the SageMaker Studio UI, then review and run the training pipeline.
After the successful run of the training pipeline, approve the registered model version in the model registry, then rerun the entire GitLab CI/CD pipeline.

Review the Terraform plan output in the build stage. Approve the manual apply stage in the GitLab CI/CD pipeline to resume the pipeline run and authorize Terraform to create the monitoring and notification resources in your AWS account.
Finally, review the SageMaker pipelines’ run status and output in the SageMaker Studio UI and check your email for notification messages, as shown in the following screenshot. The default message body is in JSON format.

SageMaker pipelines
In this section, we describe the three SageMaker pipelines within the MLOps workflow.
Training pipeline
The training pipeline is composed of the following steps:

Preprocessing step, including feature transformation and encoding
Data quality check step for generating data statistics and constraints baseline using the training data
Training step
Training evaluation step
Condition step to check whether the trained model meets a pre-specified performance threshold
Model registration step to register the newly trained model in the model registry if the trained model meets the required performance threshold

Both the skip_check_data_quality and register_new_baseline_data_quality parameters are set to True in the training pipeline. These parameters instruct the pipeline to skip the data quality check and just create and register new data statistics or constraints baselines using the training data. The following figure depicts a successful run of the training pipeline.

Batch inference pipeline
The batch inference pipeline is composed of the following steps:

Creating a model from the latest approved model version in the model registry
Preprocessing step, including feature transformation and encoding
Batch inference step
Data quality check preprocessing step, which creates a new CSV file containing both input data and model predictions to be used for the data quality check
Data quality check step, which checks the input data against baseline statistics and constraints associated with the registered model
Condition step to check whether ground truth data is available. If ground truth data is available, the model quality check step will be performed
Model quality calculation step, which calculates model performance based on ground truth labels

Both the skip_check_data_quality and register_new_baseline_data_quality parameters are set to False in the inference pipeline. These parameters instruct the pipeline to perform a data quality check using the data statistics or constraints baseline associated with the registered model (supplied_baseline_statistics_data_quality and supplied_baseline_constraints_data_quality) and skip creating or registering new data statistics and constraints baselines during inference. The following figure illustrates a run of the batch inference pipeline where the model quality check step has failed due to poor performance of the model on the inference data. In this particular case, the training with HPO pipeline will be triggered automatically to retune the model.
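The effect of the two flags across the training and inference pipelines can be summarized in a few lines. The helper below is purely illustrative: it does not call the SageMaker SDK, and only mirrors the behavior described in this post for each combination of flag values.

```python
def describe_data_quality_step(skip_check, register_new_baseline):
    """Describe what a data quality step does for a given flag combination."""
    actions = []
    if skip_check:
        actions.append("skip the check against an existing baseline")
    else:
        actions.append("check the data against the supplied baseline")
    if register_new_baseline:
        actions.append("register newly computed statistics and constraints")
    else:
        actions.append("keep the existing baseline unchanged")
    return actions


# Training and training with HPO pipelines: both flags True.
TRAINING_MODE = describe_data_quality_step(True, True)

# Batch inference pipeline: both flags False.
INFERENCE_MODE = describe_data_quality_step(False, False)
```

Read this way, the training pipelines only produce baselines, while the inference pipeline only consumes the baseline attached to the registered model.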

Training with HPO pipeline
The training with HPO pipeline is composed of the following steps:

Preprocessing step (feature transformation and encoding)
Data quality check step for generating data statistics and constraints baseline using the training data
Hyperparameter tuning step
Training evaluation step
Condition step to check whether the trained model meets a pre-specified accuracy threshold
Model registration step if the best trained model meets the required accuracy threshold

Both the skip_check_data_quality and register_new_baseline_data_quality parameters are set to True in the training with HPO pipeline. The following figure depicts a successful run of the training with HPO pipeline.

Clean up
Complete the following steps to clean up your resources:

Employ the destroy stage in the GitLab CI/CD pipeline to eliminate all resources provisioned by Terraform.
Use the AWS CLI to list and remove any remaining pipelines that are created by the Python scripts.
Optionally, delete other AWS resources such as the S3 bucket or IAM role created outside the CI/CD pipeline.
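The pipeline cleanup step mentions the AWS CLI. As an alternative, the same listing and deletion can be sketched with Boto3. The helper name, the `name_prefix` argument, and the `dry_run` default are illustrative choices. The function takes a SageMaker client as a parameter and defaults to a dry run so that nothing is deleted by accident.

```python
def delete_pipelines(sm_client, name_prefix, dry_run=True):
    """List SageMaker pipelines by name prefix and optionally delete them.

    `sm_client` is expected to behave like boto3.client("sagemaker").
    Returns the names of the pipelines that were (or would be) deleted.
    """
    names = []
    kwargs = {"PipelineNamePrefix": name_prefix}
    # Collect all matching names first, following pagination tokens.
    while True:
        page = sm_client.list_pipelines(**kwargs)
        names.extend(s["PipelineName"] for s in page.get("PipelineSummaries", []))
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token
    # Delete only after listing is complete, and only when not a dry run.
    if not dry_run:
        for name in names:
            sm_client.delete_pipeline(PipelineName=name)
    return names


# Usage (requires boto3 and AWS credentials; the prefix is a placeholder):
#   import boto3
#   print(delete_pipelines(boto3.client("sagemaker"), "<pipeline-name-prefix>"))
```

Running it once with `dry_run=True` and reviewing the returned names before deleting is a cheap safeguard, since pipeline deletion cannot be undone.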

In this post, we demonstrated how enterprises can create MLOps workflows for their batch inference jobs using Amazon SageMaker, Amazon EventBridge, AWS Lambda, Amazon SNS, HashiCorp Terraform, and GitLab CI/CD. The presented workflow automates data and model monitoring, model retraining, as well as batch job runs, code versioning, and infrastructure provisioning. This can lead to significant reductions in complexities and costs of maintaining batch inference jobs in production. For more information about implementation details, review the GitHub repo.

About the Authors
Hasan Shojaei is a Sr. Data Scientist with AWS Professional Services, where he helps customers across different industries such as sports, insurance, and financial services solve their business challenges through the use of big data, machine learning, and cloud technologies. Prior to this role, Hasan led multiple initiatives to develop novel physics-based and data-driven modeling techniques for top energy companies. Outside of work, Hasan is passionate about books, hiking, photography, and history.
Wenxin Liu is a Sr. Cloud Infrastructure Architect. Wenxin advises enterprise companies on how to accelerate cloud adoption and supports their innovations on the cloud. He’s a pet lover and is passionate about snowboarding and traveling.
Vivek Lakshmanan is a Machine Learning Engineer at Amazon. He has a Master’s degree in Software Engineering with a specialization in Data Science and several years of experience as an MLE. Vivek is excited about applying cutting-edge technologies and building AI/ML solutions for customers in the cloud. He is passionate about statistics, NLP, and model explainability in AI/ML. In his spare time, he enjoys playing cricket and taking road trips.
Andy Cracchiolo is a Cloud Infrastructure Architect. With more than 15 years in IT infrastructure, Andy is an accomplished and results-driven IT professional. In addition to optimizing IT infrastructure, operations, and automation, Andy has a proven track record of analyzing IT operations, identifying inconsistencies, and implementing process enhancements that increase efficiency, reduce costs, and increase profits.

Revolutionizing Email Productivity: How SaneBox’s AI Transforms Your …

It seems like every time that someone writes about productivity, they start off by painting a bleak picture. “In today’s digital era, where nobody can do anything…” Or “you probably feel hopeless when it comes to getting things done.” But the fact is, most of us only need a little help. We do pretty OK, but life would be a lot easier with fewer distractions. Enter AI, and its amazing ability to boost productivity.

Yes, there’s a lot of information flowing our way these days. Between notifications, to-do lists, and inboxes that are always seeking our attention, our daily productivity can suffer. But the chances are that you’re not drowning in tasks, or unable to prioritize. You’re also not perfect. That’s where the AI comes in.

As challenges intensify, AI has proven itself to be a groundbreaking tool that is revolutionizing the way that we work. AI learns from data, making intelligent decisions that can help automate mundane tasks. In the world of personal productivity, AI can sift through clutter, streamline processes, and guide us in making the most of our time. One of the platforms leading the AI email revolution is SaneBox, which leverages AI to transform your email experience.

The Productivity Challenge

It would be nice if “working hard” also meant “being productive.” But that’s not always the case. In fact, productivity is usually better when “working smart” replaces working hard. There are tons of obstacles that want to prevent us from being as productive as possible. They pose challenges in both our professional and personal lives.

Why Are We Less Productive?

Information Overload: Lots of access to information also means that we have a constant influx of data. Articles, notifications, messages, and updates are always headed our way. Filtering through the sea of information to find what’s valuable ends up becoming a task in itself.

Distractions: Ping! There’s another social media notification. The allure of a trending YouTube video, or a friend’s Instagram post pulls us away far too often. These distractions fragment our concentration, making simple tasks stretch into hours.

Email Clutter: Email was supposed to make communication swift and straightforward. But the reality for most of us is that our inboxes are cluttered with hundreds (or thousands!) of unread messages. Promotional emails, updates, and work-related messages all end up in the same place. Sifting through them is a time-consuming chore.

The Cost of Decreased Productivity

Stress: As tasks pile up and deadlines loom, the mental toll starts getting real. We know the potential dangers of stress, not only for our professional lives but also for our bodies.

Missed Deadlines: Decreased productivity often translates to tasks taking longer than anticipated. Missed deadlines can screw up professional reputations, strain client relationships, and even lead to financial penalties.

Diminished Work Quality: When we’re rushing to catch up or are perpetually distracted, the quality of our work suffers. Errors become more frequent, and the standard of output may not reflect our true capabilities.

If we want to reclaim our time and boost our efficiency, it’s critical to understand and address these challenges. Thankfully, our AI-powered friends are here to offer some help.

Understanding SaneBox and Its AI

SaneBox is more than an email management tool—it’s a modern-day savior for anyone overwhelmed by the relentless influx of emails. Our worlds are cluttered. SaneBox gives us an elegant solution that sifts through your inbox, ensuring that you’re only seeing what matters most. It’s designed to declutter, prioritize, and streamline your email experience. The goal is to keep you out of your inbox so you can focus more time on what matters.

How Does the SaneBox AI Work?

At the heart of SaneBox’s prowess is its cutting-edge Artificial Intelligence. But how does it work?

Learning from History: When you start using SaneBox, the AI studies your past interactions so that it’s not starting from zero. SaneBox learns what emails you engage with, which ones you ignore, and which ones get deleted.

Priority Sorting: After building its “brain”, SaneBox’s AI starts distinguishing between urgent and non-urgent messages. The important stuff goes front and center in your inbox, while less-critical messages go to a folder called SaneLater. You’ll get a daily digest, telling you what’s in your SaneLater folder, and you can clear it at your convenience.

Continuous Adaptation: The beauty of this AI system is that it’s dynamic. As your preferences and priorities change, the AI adapts. If you start interacting with a previously ignored newsletter, for instance, SaneBox will recognize this change in behavior and adjust how it handles those messages.

In essence, SaneBox’s AI functions as a personal email assistant. It works tirelessly in the background to ensure your inbox is a space of productivity, not chaos.

How SaneBox Saves Your Productivity

Email has become both a blessing and a curse. It’s a primary mode of communication, but the sheer volume of emails many receive daily can be daunting. SaneBox offers several innovative solutions to tackle this prevalent issue of email overwhelm.

Filtering Out Unimportant Emails to the ‘SaneLater’ Folder: We talked a bit about SaneLater, but it’s worth a bigger mention. SaneBox’s ability to distinguish between the critical and the unimportant is paramount to saving you time. The SaneLater folder is more than a collection of the unimportant stuff. It’s a full-featured bulk email handler. Of course, you can look at each message if you want, but the magic (and the productivity boost) comes from bulk actions.

Automatic Sorting and Categorization of Emails: SaneBox goes beyond priority sorting. It categorizes emails based on their content and your past interactions. Whether it’s newsletters, social media notifications, or work-related correspondence, SaneBox ensures they’re neatly organized. No more manual sorting!

SaneNoReplies: Emails that Haven’t Been Replied To: Ever sent an important email and found yourself constantly checking for a response? SaneBox’s ‘SaneNoReplies’ tracks emails you’ve sent that haven’t received a response within a specified timeframe. This ensures you never miss a follow-up and are never left in the dark about pending communications.

In essence, SaneBox transforms the email experience, converting what’s traditionally been a source of stress and overwhelm into a streamlined, manageable tool that facilitates rather than hinders productivity.

The Power of AI-Based Decision Making

It’s obvious that we think a lot about AI. What’s interesting is that AI is thinking about us too. The AI behind SaneBox watches and learns your behaviors. It picks up on the emails that you open, ones you skip, and what you trash. So over time, it starts to know how you’ll behave and it can make decisions according to what you would do.

Your email experience becomes tailored to you. Important work email is always front and center. The deluge of newsletters? Pushed to SaneNews. It’s a bit like having a personal assistant who organizes your messages before you even see them.

More Features for More Productivity

We’ve focused a lot on how SaneBox enhances your productivity with sorting features. But we should also talk about the other things that it does.

Really need to keep things quiet? The Do Not Disturb feature lets you customize a schedule where you’ll be completely free of email notifications. You choose the times and days, as well as which emails you want to allow through your filter. You’re in complete control of what you see during DND time.

SaneReminders is a lifesaver. Got an email that you haven’t seen a reply to? With SaneReminders you’ll get a nudge to follow up. Going a step deeper, you can have timely messages come back by sending them to [TimeAndDay]. For example, if you bought some concert tickets that you don’t need to see until three months from now, forward them to “” and you’ll get them back in your inbox when you need them.

Email Deep Clean does a job that none of us handle ourselves. It digs back through all of your messages. It will help you find large attachments, how many messages come from each sender, and more. From the Deep Clean feature, you can choose how you want those emails handled, allowing you to free up space and declutter your email archives.

One of the interesting facts about SaneBox is that the AI focuses on the big picture, as well as the small details. There are challenges at every step in digital communication. Having a second check is a great way to make sure that you’re avoiding productivity pitfalls.

Wrapping it up, we’re all trying to dodge a little digital chaos and make sense of the data that comes our way. Remember that you have tools at your disposal. AI, especially the brains behind SaneBox, can be the sidekick you never knew you needed. This isn’t about achieving email perfection, but making your digital life a whole lot smoother. By leveraging the power of SaneBox, you’re not just decluttering an inbox; you’re reclaiming time, minimizing stress, and getting back to what really counts. So, next time you feel that digital overwhelm is creeping in, remember there’s a smarter way to manage. Give SaneBox a whirl, and see the change for yourself. After all, who doesn’t like having a tech-savvy buddy sorting things out for them?

Also, don’t forget to join our 29k+ ML SubReddit, 40k+ Facebook Community, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.

If you like our work, you will love our newsletter..

Thanks to Sanebox for the thought leadership/ Educational article. Sanebox has supported this Content.

The post Revolutionizing Email Productivity: How SaneBox’s AI Transforms Your Inbox Experience appeared first on MarkTechPost.

Meet Nous-Hermes-Llama2-70b: A State-of-the-Art Language Model Fine-Tu …

Hugging Face Transformers is an immensely popular Python library that provides pre-trained models for a wide variety of Natural Language Processing tasks. It originally supported only PyTorch but now supports TensorFlow as well. Nous-Hermes-Llama2-70b is a language model fine-tuned on over 300,000 instructions. It uses the same dataset as the earlier Hermes model, which keeps training consistent and avoids drastic changes in behavior. The model’s notable characteristics include a lower hallucination rate and the absence of OpenAI censorship mechanisms.

The model was trained on a large dataset that is highly diverse in both content and style. The data came from different sources and was merged into a single dataset, giving the processed dataset a broad range of knowledge. Contributors include Teknium, Karan4D, Emozilla, Huemin Art, and Pygmalion AI. The model is trained using the Alpaca prompt format. To evaluate Alpaca, its research team conducted a human evaluation on inputs from the self-instruct evaluation set, which covers a diverse list of user-oriented instructions.

Researchers also stated that prompt engineers would benefit from this model. They believe that releasing the above assets will enable the academic community to perform controlled scientific studies on instruction-following language models and ultimately lead to new techniques that address the model’s existing deficiencies. Deploying an interactive demo for Alpaca also poses potential risks, such as more widely disseminating harmful content and lowering the barrier for spam. Spam detection techniques in NLP therefore also play an important role here. Researchers acknowledge that these mitigation measures can be circumvented once the model weights are released or if users train their own instruction-following models.

Future plans for this project include iterating on high-quality data and applying techniques to filter out lower-quality data. Researchers also need to evaluate Alpaca more rigorously. They plan to start with HELM (Holistic Evaluation of Language Models), which will hopefully capture more generative scenarios. Researchers would also like to study the risks of Alpaca further and improve its safety.

Check out the Project Page. All Credit For This Research Goes To the Researchers on This Project.

The post Meet Nous-Hermes-Llama2-70b: A State-of-the-Art Language Model Fine-Tuned on Over 300,000 Instructions appeared first on MarkTechPost.

Top Low/No Code AI Tools (September 2023)

Applications that take advantage of machine learning in novel ways are being developed thanks to the rise of Low-Code and No-Code AI tools and platforms. AI can be used to create web services and customer-facing apps to coordinate sales and marketing efforts better. Minimal coding expertise is all that’s needed to make use of Low-Code and No-Code solutions.

Artificial intelligence technologies that require little to no coding reflect a long-sought objective in computer science. No-code is an approach to software design that builds software without writing a single line of code. Low-code, by contrast, is a software development technique that promotes faster app delivery with minimal hand-coding, and low-code platforms are software tools that allow the visual development of apps through a GUI. The AI tools below provide code-free or low-code development environments for AI applications, often built around a simple drag-and-drop interface.

Top low-code and no-code AI tools include the following:


MakeML

Use MakeML to generate machine-learning models for object identification and segmentation without hand-coding. It simplifies the process of creating and efficiently managing a large dataset. In addition to preparing your ML models for action, you can also test them. MakeML is an online resource that can teach you what you need to know to build AI software and apply Computer Vision to an in-house problem in only a few hours. Video tutorials are also available on your mobile device to help you master Machine Learning. The skilled professionals at MakeML will assist you in developing a Computer Vision solution and incorporating it into your product. One GPU cloud training run and limited dataset import/export are provided at no cost.

Obviously AI

With Obviously AI’s Machine Learning platform, you can make accurate predictions in minutes without needing to know how to code. This entails creating machine learning algorithms and forecasting their results with a single mouse click. Use the data dialog to modify your dataset without additional code, then distribute or showcase your ML models across your organization. The low-code API allows anyone to use the algorithms to make predictions and incorporate those forecasts into their real-world applications. Furthermore, Obviously AI gives you access to state-of-the-art algorithms and technologies without compromising efficiency. It can be used for revenue forecasting, supply chain planning, and targeted advertising. Lead conversion, dynamic pricing, loan payback, and other outcomes can all be forecast in real time.


SuperAnnotate

Create AI-Powered SuperData using SuperAnnotate. It’s an end-to-end system for AI-related tasks, including annotating, managing, and versioning “ground truth” data. With its extensive toolkit, top-tier annotation services, and solid data management system, your AI pipeline can be scaled and automated three to five times faster. It supports high-throughput annotation of video, text, and images to create high-quality datasets using industry-leading services and software. Project management and teamwork tools can help your model succeed in the field. Set up a streamlined annotation workflow, keep tabs on project quality, share updates with the team, and more, all with SuperAnnotate. Its active learning and automation features can speed up your annotation process.

Teachable Machine

Teachable Machine allows you to teach a computer to recognize and respond to your voice, gestures, and photos. It is a web-based, low-code machine learning platform that lets you rapidly create robust ML models, without writing any code, for integration into applications, websites, and more. To teach a computer something new, you collect and organize examples into relevant classes. You can then train the model and immediately put it to the test. You can use the model in your online projects, host it online, or distribute it as a downloadable file. Best of all, the model works completely locally on your device, so none of your audio or video has to leave the system at any point. Classifying images, body poses, and sounds is a breeze using files, a camera, or short audio samples.

Apple’s Create ML

Apple’s Create ML offers an innovative approach to building and training ML models on your Mac. In a single project, you can train multiple models simultaneously, each with a different dataset. It supports an external graphics processing unit (eGPU) to speed up model training on your Mac. Take charge of your training with options like pausing and resuming. The evaluation set will tell you how well your model performed. Examine key metrics and how they relate to one another to spot ways to improve the model. Try out the model’s performance with a continuous preview using the camera on your iPhone. Train models more quickly on your Mac by using the hardware accelerators. Create ML supports various model types, including images, video, sound, speech, text, and tabular data. Afterward, you can retrain your model with new data and settings.


PyCaret

You can automate your machine-learning workflows in Python with PyCaret, a low-code machine-learning library. With this simple, straightforward library, you can devote more effort to analysis, such as data preprocessing, model training, model explainability, MLOps, and exploratory data analysis, and less to writing code. PyCaret is built modularly so that different modules handle different machine-learning tasks. Within a module, functions are collections of operations that carry out a task according to some procedure. Using PyCaret, virtually anyone can create complete, low-code machine-learning solutions. A Quick Start Guide, Blog, Videos, and Online Forums are all available for study. Create a basic ML app, train your model rapidly, and then deploy it as a REST API after analyzing and refining it.


Lobe

Use Lobe to teach your apps to recognize plants, read gestures, track reps, recognize emotions, detect colors, and verify safety. It facilitates the training of ML models, provides accessible and free tools, and supplies everything required to develop such models. Provide examples of the behavior you would like your application to learn, and a machine-learning model will be trained automatically and be ready to ship quickly. This platform requires no coding experience and can be used by anyone. You can save money and time by skipping online storage and instead training locally on your PC. Lobe can be downloaded on both PCs and Macs. Furthermore, your model is cross-platform and ready for export or distribution. The ideal machine-learning architecture for your project is chosen automatically.


MonkeyLearn

MonkeyLearn provides state-of-the-art Artificial Intelligence tools that make cleaning, visualizing, and labeling client feedback a breeze. It is a data visualization and no-code text analysis studio that comprehensively analyzes your data. MonkeyLearn allows you to quickly and easily generate unique data visualizations and charts, allowing for more in-depth data exploration. You can also merge and filter these findings based on data inputs like date ranges and custom fields. In addition to using pre-made machine learning models, you can create your own with MonkeyLearn. Various pre-trained classifiers are also available for use, such as emotion analysis, topic classifiers, and entity extractors, and all can be set up rapidly.


Akkio

Akkio is an artificial intelligence platform that doesn’t require users to write any code to build prediction models. It facilitates the easy creation of predictive models from user data for improved in-the-moment decision-making. Using existing data, Akkio can predict key business results, such as enhanced lead scoring, forecasting, text classification, and reduced churn. It can also perform advanced data-cleaning tasks, like merging columns, reformatting dates, and filtering out anomalies. Because of its intuitive interface, Akkio can be used by non-technical business users without any coding or machine learning knowledge. It can save time and increase output in various settings, from marketing and sales to finance and customer support.

Amazon SageMaker

Machine learning (ML) models can be created, trained, and deployed with the help of Amazon SageMaker, a cloud-based ML platform that offers a full suite of ML-related tools and services. SageMaker’s no-code and low-code tools streamline the machine learning (ML) model development and deployment processes for non-technical users and business analysts. Amazon SageMaker Canvas is a visual tool that facilitates ML model development and deployment without writing code. SageMaker Canvas’s intuitive drag-and-drop interface streamlines the processes of data selection, algorithm selection, and model training. SageMaker Canvas may then make predictions and put the trained model into production.

DataRobot

DataRobot is an artificial intelligence platform that streamlines the entire lifecycle of machine learning model development, deployment, and management. It’s a robust resource that serves many users, from data scientists and engineers to businesspeople. DataRobot’s flexible features make it a solid pick for those with little programming experience. It offers a visual, drag-and-drop interface for non-technical people to create and deploy machine learning models, which paves the way for business users with rudimentary technical skills to experiment with AI. DataRobot’s adaptable interface also makes machine learning customization easier for non-programmers, including integration with external systems and the ability to create custom programs.

Google AutoML

With Google’s AutoML, programmers and data scientists can create and release machine learning models without using hand-coded solutions. If you have little experience with machine learning, you can still use this platform to construct models because it requires little to no coding. Google AutoML provides a library of pre-trained models that may be used in various scenarios. These models are accurate because they are trained on large datasets. With Google AutoML, creating and deploying models is as straightforward as dragging and dropping components. It may be used without having to learn how to code. Google AutoML takes care of tuning your models’ hyperparameters automatically. Time and energy are both conserved by this method. You may check how well your models are doing with the help of Google’s AutoML tools. This aids in making sure your models are trustworthy and correct.


Nanonets

Nanonets is a machine learning API that allows developers to train a model with only a tenth of the data and no prior experience with machine learning. Upload your data, wait a few minutes, and you will have a model that can be queried via a simple cloud API. This AI platform makes extracting structured or semi-structured data from documents faster and more efficient. Its OCR technology, powered by artificial intelligence, can read documents of any size or complexity. The document processing workflow can be streamlined using Nanonets’ AP Automation, Touchless Invoice Processing, Email Parsing, and ERP Integrations, among other services. Nanonets also comes with various free OCR converters, including PDF to Excel, CSV, JSON, XML, and Text.

IBM Watson Studio

IBM Watson Studio is a service that provides a central hub from which anybody can create, release, and manage AI models in the cloud. It offers features and tools that make AI development accessible to people with little coding skills. Watson Studio’s no- or low-code features are a major selling point. It’s now possible to construct AI models without resorting to custom coding. Instead, you can utilize Watson Studio’s visual tools to assemble your project by dragging and dropping individual components into place. This paves the way for non-technical people, including business users, analysts, and researchers, to construct AI models. You can get up and running quickly with Watson Studio and its many pre-trained models. Uses for these models range from spotting fraudulent activity and client segmentation to predicting the need for repairs. After finishing an AI model in Watson Studio, you can send it into production. Watson Studio allows for both cloud-based and on-premises deployments and hybrid implementations that combine the two.

H2O Driverless AI

H2O Driverless AI is an AutoML platform that streamlines the machine learning lifecycle, from preprocessing data to releasing models. It is a valuable tool for data scientists and business users since it allows them to build and deploy machine learning models without writing code. H2O Driverless AI uses several methods, including imputation, transformation, and selection, to automatically engineer features from your data. In machine learning, feature engineering is frequently the most time-consuming step, so this can be a huge time saver. Decision trees, random forests, support vector machines, and neural networks are some of the machine learning models that H2O Driverless AI can automatically construct and evaluate. In addition, it tunes the hyperparameters of each model to fit your data. With H2O Driverless AI, your models can be deployed to production, where they can be used to make predictions.

Domino Data Lab

Domino Data Lab is a cloud-based service that helps data scientists, engineers, and analysts create, deploy, and manage machine learning models. It’s a low- or no-code artificial intelligence tool for designing and automating data science operations. Domino Code Assist is a tool that can generate Python and R code for common data science tasks. This can reduce the learning curve for non-technical users and the workload for data scientists. Domino Data Lab facilitates effective teamwork on data science initiatives. Users can collaborate on projects by sharing and analyzing code, data, and models. Data science projects are 100% reproducible in Domino Data Lab. This allows anyone to replicate a project’s outcomes without obtaining the original data or source code. Domino Data Lab also has several tools for managing data science initiatives, including access control, code history, and auditing of model performance.

CrowdStrike Falcon Fusion

Organizations may automate their security operations, threat intelligence, and incident response with the help of CrowdStrike Falcon Fusion, a security orchestration, automation, and response (SOAR) architecture. It is based on the CrowdStrike Falcon® platform and is provided at no extra cost to CrowdStrike subscribers. Falcon Fusion is a low- to no-code tool, making it accessible to organizations of all sizes in the security industry. The software’s drag-and-drop interface simplifies the process of developing and automating workflows. Falcon Fusion also features a library of pre-built connections with various security solutions, allowing easy and rapid integration with an organization’s pre-existing infrastructure. Artificial intelligence (AI) is leveraged by Falcon Fusion to facilitate automation and better judgment. For instance, the program may analyze security telemetry data for patterns, assign priorities to incidents, and suggest courses of action using artificial intelligence. Consequently, security personnel are better able to deal with threats.


RapidMiner

Data mining and machine learning models can be created and deployed quickly with RapidMiner, a comprehensive data science platform. Data preprocessing, feature engineering, model training, evaluation, and deployment are just some of its services. RapidMiner’s no/low-code methodology is a major selling point: you can create and release AI models without touching a single line of code. RapidMiner has a graphical user interface (GUI) that lets you build your models by dragging and dropping various building blocks. This eases the entry of non-technical users into the field of artificial intelligence. In addition to its no/low-code capabilities, RapidMiner has sophisticated scripting features, including a language dubbed RapidMiner R. You can use this language to modify your models and add new features to RapidMiner.

The post Top Low/No Code AI Tools (September 2023) appeared first on MarkTechPost.

University of San Francisco Data Science Conference 2023 Datathon in p …

As part of the 2023 Data Science Conference (DSCO 23), AWS partnered with the Data Institute at the University of San Francisco (USF) to conduct a datathon. Participants, both high school and undergraduate students, competed on a data science project that focused on air quality and sustainability. The Data Institute at the USF aims to support cross-disciplinary research and education in the field of data science. The Data Institute and the Data Science Conference provide a distinctive fusion of cutting-edge academic research and the entrepreneurial culture of the technology industry in the San Francisco Bay Area.
The students used Amazon SageMaker Studio Lab, which is a free platform that provides a JupyterLab environment with compute (CPU and GPU) and storage (up to 15GB). Because most of the students were unfamiliar with machine learning (ML), they were given a brief tutorial illustrating how to set up an ML pipeline: how to conduct exploratory data analysis, feature engineering, model building, and model evaluation, and how to set up inference and monitoring. The tutorial referenced Amazon Sustainability Data Initiative (ASDI) datasets from the National Oceanic and Atmospheric Administration (NOAA) and OpenAQ to build an ML model to predict air quality levels using weather data via a binary classification AutoGluon model. Next, the students were turned loose to work on their own projects in their teams. The winning teams were led by Peter Ma, Ben Welner, and Ei Coltin, who were all awarded prizes at the opening ceremony of the Data Science Conference at USF.
Response from the event
“This was a fun event, and a great way to work with others. I learned some Python coding in class but this helped make it real. During the datathon, my team member and I conducted research on different ML models (LightGBM, logistic regression, SVM models, Random Forest Classifier, etc.) and their performance on an AQI dataset from NOAA aimed at detecting the toxicity of the atmosphere under specific weather conditions. We built a gradient boosting classifier to predict air quality from weather statistics.”
– Anay Pant, a junior at the Athenian School, Danville, California, and one of the winners of the datathon.
“AI is becoming increasingly important in the workplace, and 82% of companies need employees with machine learning skills. It’s critical that we develop the talent needed to build products and experiences that we will all benefit from, this includes software engineering, data science, domain knowledge, and more. We were thrilled to help the next generation of builders explore machine learning and experiment with its capabilities. Our hope is that they take this forward and expand their ML knowledge. I personally hope to one day use an app built by one of the students at this datathon!”
– Sherry Marcus, Director of AWS ML Solutions Lab.
“This is the first year we used SageMaker Studio Lab. We were pleased by how quickly high school/undergraduate students and our graduate student mentors could start their projects and collaborate using SageMaker Studio.”
– Diane Woodbridge from the Data Institute of the University of San Francisco.
Get started with Studio Lab
If you missed this datathon, you can still register for your own Studio Lab account and work on your own project. If you’re interested in running your own hackathon, reach out to your AWS representative for a Studio Lab referral code, which will give your participants immediate access to the service. Finally, you can look for next year’s challenge at the USF Data Institute.

About the Authors
Neha Narwal is a Machine Learning Engineer at AWS Bedrock where she contributes to the development of large language models for generative AI applications. Her focus lies at the intersection of science and engineering to influence research in the Natural Language Processing domain.
Vidya Sagar Ravipati is an Applied Science Manager at the Generative AI Innovation Center, where he leverages his vast experience in large-scale distributed systems and his passion for machine learning to help AWS customers across different industry verticals accelerate their AI and cloud adoption.

In-Depth Analysis of Trustworthiness in GPT Models

Over half of respondents in a recent global poll said they would utilize this emerging technology for sensitive areas like financial planning and medical guidance despite concerns that it is rife with hallucinations, disinformation, and bias. Many fields have benefited from recent developments in machine learning, especially large language models (LLMs), which have been used in anything from chatbots and medical diagnostics to robots. Different benchmarks have been developed to evaluate language models and better understand their capabilities and limits. For instance, standardized tests for gauging all-purpose language comprehension, like GLUE and SuperGLUE, have been developed.

More recently, HELM was presented as a comprehensive test of LLMs across multiple use cases and indicators. As LLMs are used in more and more fields, there are rising doubts regarding their reliability. Most existing LLM trustworthiness evaluations are narrowly focused, looking at factors like robustness or overconfidence.

Furthermore, the increasing capabilities of massive language models may worsen the trustworthiness difficulties in LLMs. In particular, GPT-3.5 and GPT-4 demonstrate an improved aptitude to follow directions, thanks to their specialized optimization for dialogue; this enables users to customize tones and roles, among other variables of adaptation and personalization. Compared to older models that were only good for text infilling, the improved capabilities allow for the addition of features like question-answering and in-context learning through brief demonstrations during a discussion.

To provide a thorough assessment of GPT models’ trustworthiness, a group of academics has zeroed in on eight trustworthiness perspectives and evaluated them using a variety of crafted scenarios, tasks, metrics, and datasets. The group’s overarching objective is to measure the robustness of GPT models in challenging settings and assess how well they perform in various trustworthiness contexts. The review focuses on the GPT-3.5 and GPT-4 models to confirm that the findings are consistent and can be replicated.

Let’s talk about GPT-3.5 and GPT-4

New forms of interaction have been made possible by GPT-3.5 and GPT-4, the two successors of GPT-3. These cutting-edge models have undergone scalability and efficiency enhancements and improvements to their training procedures.

Pretrained autoregressive (decoder-only) transformers like GPT-3.5 and GPT-4 work similarly to their predecessors, generating text token by token from left to right and feeding each predicted token back in as input for the next prediction. Despite an incremental improvement over GPT-3, the number of model parameters in GPT-3.5 remains at 175 billion. While the exact parameter count and pretraining corpus of GPT-4 remain unknown, it is common knowledge that GPT-4 required a bigger financial investment in training than GPT-3.5 did.
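
The left-to-right generation loop described above can be illustrated with a minimal sketch. The lookup table `TOY_MODEL` and the helper names are hypothetical stand-ins invented for this example; a real GPT model conditions on the whole prefix with a transformer rather than on the last token alone.

```python
import random

# Hypothetical toy "model": next-token probabilities keyed by the previous
# token only. A real GPT conditions on the entire prefix.
TOY_MODEL = {
    "<s>": {"the": 1.0},
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 1.0},
    "dog": {"sat": 1.0},
    "sat": {"</s>": 1.0},
}


def next_token_distribution(prefix):
    """Return p(next token | prefix); this toy looks only at the last token."""
    return TOY_MODEL[prefix[-1]]


def generate(max_new_tokens=10, greedy=True, seed=0):
    """Generate tokens left to right, feeding each prediction back in."""
    rng = random.Random(seed)
    tokens = ["<s>"]
    for _ in range(max_new_tokens):
        dist = next_token_distribution(tokens)
        if greedy:
            # Pick the most probable next token.
            token = max(dist, key=dist.get)
        else:
            # Sample the next token in proportion to its probability.
            token = rng.choices(list(dist), weights=list(dist.values()))[0]
        tokens.append(token)  # the prediction becomes part of the next input
        if token == "</s>":
            break
    return tokens


print(generate())
```

With greedy decoding the loop always follows the most probable continuation; passing `greedy=False` samples from the distribution instead.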

GPT-3.5 and GPT-4 use the conventional autoregressive pretraining loss, which maximizes the probability of the next token. To further ensure that the models adhere to instructions and produce results that align with human ideals, GPT-3.5 and GPT-4 use Reinforcement Learning from Human Feedback (RLHF).
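
For reference, the standard form of the autoregressive objective can be written as follows. The notation is an assumption introduced here, not taken from the article: $x_1, \dots, x_T$ is a token sequence and $p_\theta$ is the model's next-token distribution. Minimizing this loss is equivalent to maximizing the probability of each next token given the tokens before it.

```latex
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_1, \dots, x_{t-1}\right)
```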

These models can be accessed through the OpenAI API. It is possible to control the output by adjusting the temperature and maximum tokens in API calls. The researchers also point out that these models are not static and are subject to change. They use stable variants of these models in the experiments to guarantee the reliability of the results.
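
To give a feel for what these two settings do, here is a minimal sketch of temperature-scaled sampling with a cap on the number of tokens. It is a toy illustration, not the OpenAI API: the function names, the fixed `logits`, and the three-word vocabulary are all assumptions made up for this example.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample(vocab, logits, temperature=1.0, max_tokens=5, seed=0):
    """Draw exactly max_tokens tokens from a fixed toy distribution."""
    rng = random.Random(seed)
    probs = softmax_with_temperature(logits, temperature)
    return [rng.choices(vocab, weights=probs)[0] for _ in range(max_tokens)]


vocab = ["yes", "maybe", "no"]  # hypothetical vocabulary
logits = [2.0, 1.0, 0.0]        # hypothetical model scores
for t in (0.2, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
print(sample(vocab, logits, temperature=0.01, max_tokens=4))
```

At low temperature almost all probability mass sits on the highest-scoring token, so outputs become nearly deterministic; higher temperatures flatten the distribution and make outputs more varied. The token cap simply bounds how much text is produced.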

From the standpoints of toxicity, stereotype bias, robustness to adversarial attacks, robustness to out-of-distribution (OOD) instances, robustness against adversarial demonstrations, privacy, ethics, and fairness, the researchers present detailed evaluations of the trustworthiness of GPT-4 and GPT-3.5. In general, they find that GPT-4 outperforms GPT-3.5 across the board. However, they also find that GPT-4 is easier to manipulate because it follows instructions more closely, which raises new security concerns in the face of jailbreaking or misleading (adversarial) system prompts or demonstrations via in-context learning. Furthermore, the examples suggest that many characteristics and properties of the inputs affect the model’s reliability, which merits further investigation.

In light of these assessments, the following avenues of research could be pursued to learn more about such vulnerabilities and to protect LLMs built on GPT models from them.

More interactive assessments. The evaluations mostly use static datasets, such as one or two rounds of conversation, to examine various trustworthiness perspectives for GPT models. It is vital to evaluate LLMs in interactive conversations to determine whether these vulnerabilities grow more serious as large language models evolve.

Misleading context beyond false demonstrations and system prompts in in-context learning. The researchers provide a variety of jailbreaking system prompts and false (adversarial) demonstrations to test the models’ weaknesses and get a sense of their worst-case performance. The model’s output can also be manipulated by deliberately injecting false information into the dialogue (a so-called “honeypot conversation”). It would be worth observing how susceptible the model is to such forms of misleading context.
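As a simple illustration of what a false demonstration in in-context learning can look like, the sketch below builds a few-shot sentiment prompt and, optionally, flips the labels of the demonstrations. Label flipping, the prompt layout, and the helper names are all assumptions chosen for illustration; they are not the researchers’ actual evaluation procedure.

```python
from typing import List, Tuple

Demo = Tuple[str, str]  # (review text, sentiment label)


def flip_label(label: str) -> str:
    """Swap a binary sentiment label to create a deliberately false demonstration."""
    return {"positive": "negative", "negative": "positive"}[label]


def build_few_shot_prompt(demos: List[Demo], query: str, adversarial: bool = False) -> str:
    """Assemble an in-context learning prompt from demonstrations and a query."""
    blocks = []
    for text, label in demos:
        shown = flip_label(label) if adversarial else label
        blocks.append(f"Review: {text}\nSentiment: {shown}")
    blocks.append(f"Review: {query}\nSentiment:")  # the model completes this line
    return "\n\n".join(blocks)


demos = [("A delightful film.", "positive"), ("Dull and far too long.", "negative")]
print(build_few_shot_prompt(demos, "I loved every minute.", adversarial=True))
```

Comparing a model’s answers on the clean and the flipped version of the same prompt gives a rough sense of how strongly it is swayed by the demonstrations.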

Assessment that accounts for coordinated adversaries. Most studies consider only one adversary in each scenario. In reality, given sufficient economic incentives, it is plausible that different adversaries will collude to trick the model. Because of this, investigating the model’s potential susceptibility to coordinated and covert adversarial behavior is crucial.

Evaluating trustworthiness in domain-specific settings. The evaluations presented here use standard tasks, such as sentiment classification and natural language inference (NLI), to illustrate the general vulnerabilities of GPT models. Given the widespread use of GPT models in fields like law and education, it is essential to assess their weaknesses in the context of these specific applications.

Verifying the trustworthiness of GPT models. Empirical evaluations of LLMs are crucial, but they often lack guarantees, which matter especially in safety-critical sectors. Furthermore, the discrete nature of GPT models makes them difficult to verify rigorously. The hard problem can be broken down into more manageable sub-problems, for example: providing guarantees and verification for the performance of GPT models based on their concrete functionalities, providing verification based on model abstractions, or mapping the discrete space to a corresponding continuous space, such as an embedding space with semantic preservation, and performing verification there.

Including additional knowledge and reasoning analysis to safeguard GPT models. Because they are purely statistical, GPT models still fall short when reasoning through complex problems. To ensure the credibility of the model’s results, it may be necessary to equip language models with domain knowledge and the ability to reason logically, and to guard their outputs so that they satisfy basic domain knowledge or logic.

Safeguarding GPT models with game-theoretic approaches. The “role-playing” system prompts used in the evaluations demonstrate how readily models can be tricked by simply switching and manipulating roles. This suggests that, during conversations with GPT models, various roles can be crafted to keep the model’s responses consistent and thus prevent the models from contradicting themselves. Specific tasks can also be assigned to ensure the models thoroughly understand the situation and deliver reliable results.

Auditing GPT models against specific instructions and conditions. While the models are evaluated on their general applicability, users may have specialized safety or reliability requirements that must be considered. Therefore, to audit the model more efficiently and effectively, it is vital to map user requirements and instructions to specific logical spaces or design contexts and to evaluate whether the outputs satisfy these criteria.
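One very simple reading of this idea is to express each user requirement as a named, machine-checkable rule and report which rules a given output violates. The sketch below does exactly that; the rule names, the regular expression, and the `audit_output` helper are hypothetical and only illustrate the shape such an audit might take.

```python
import re
from typing import Callable, Dict, List

Constraint = Callable[[str], bool]  # returns True when the output is acceptable

# Hypothetical user requirements expressed as checkable rules.
EXAMPLE_CONSTRAINTS: Dict[str, Constraint] = {
    "no_email_addresses": lambda text: re.search(r"[\w.+-]+@[\w-]+\.[\w.-]+", text) is None,
    "at_most_50_words": lambda text: len(text.split()) <= 50,
}


def audit_output(output: str, constraints: Dict[str, Constraint]) -> List[str]:
    """Return the names of all constraints that the model output violates."""
    return [name for name, check in constraints.items() if not check(output)]


print(audit_output("Contact me at a@b.com", EXAMPLE_CONSTRAINTS))
print(audit_output("All good.", EXAMPLE_CONSTRAINTS))
```

Real requirements are rarely this easy to formalize, which is why the mapping from user instructions to checkable criteria is itself listed as an open research direction.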

Check out the Paper and Reference Article. All credit for this research goes to the researchers on this project.

The post In-Depth Analysis of Trustworthiness in GPT Models appeared first on MarkTechPost.

Researchers from CMU and Tsinghua University Propose Prompt2Model: A G …

Imagine you wish to build an NLP model to solve a given problem. You need to define the task scope, find or create data that specifies the intended system behaviour, choose a suitable model architecture, train the model, assess its performance through evaluation, and then deploy it for real-world use. Researchers have now made it possible to prototype NLP models that normally take this much effort with a single line of code!

Prompt2Model is a system that retains the ability to specify system behaviour using simple prompts while also producing a deployable special-purpose model, combining the benefits of both. The figure above shows the architecture of Prompt2Model. Essentially, it works as an automated pipeline that extracts the necessary details about the task from user prompts, gathers and combines task-related information, and deploys a model through the following components.

Dataset retrieval: Given a prompt, the first task is to discover existing manually annotated data that can support a user’s task description.

Dataset generation: To support a wide range of tasks, a Dataset Generator produces synthetic training data according to the user-specific requirements parsed by the Prompt Parser. The Prompt Parser consists of an LLM with in-context learning that segments user prompts, using OpenAI’s gpt-3.5-turbo-0613.

Model retrieval: Using the provided prompt, a pre-trained language model is selected with suitable knowledge for the user’s goal. This chosen model serves as the student model and is further fine-tuned and evaluated using the generated and retrieved data. 

WebApp: Finally, there exists an easy-to-use graphical user interface that allows downstream users to interact with the trained model. This web application, built using Gradio, can then be easily deployed publicly on a server. 
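The components listed above can be pictured as one pipeline. The sketch below wires together placeholder versions of each stage; every function, return value, and name in it is hypothetical and is not the API of the actual Prompt2Model codebase. In particular, the real Prompt Parser is LLM-based rather than line-based, and fine-tuning, evaluation, and the Gradio web app are left out entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Example = Tuple[str, str]  # (input text, target text)


@dataclass
class ParsedPrompt:
    instruction: str
    demonstrations: List[str] = field(default_factory=list)


def parse_prompt(prompt: str) -> ParsedPrompt:
    """Stand-in parser: first non-empty line is the instruction, the rest are demos."""
    lines = [ln.strip() for ln in prompt.strip().splitlines() if ln.strip()]
    if not lines:
        raise ValueError("prompt must not be empty")
    return ParsedPrompt(instruction=lines[0], demonstrations=lines[1:])


def retrieve_dataset(parsed: ParsedPrompt) -> List[Example]:
    """Placeholder: a real retriever would search for existing annotated data."""
    return [("retrieved input", "retrieved output")]


def generate_dataset(parsed: ParsedPrompt, num_examples: int) -> List[Example]:
    """Placeholder: a real generator would ask an LLM for synthetic examples."""
    return [(f"synthetic input {i}", f"synthetic output {i}") for i in range(num_examples)]


def retrieve_model(parsed: ParsedPrompt) -> str:
    """Placeholder: a real retriever would pick a suitable pre-trained student model."""
    return "placeholder-student-model"


def run_pipeline(prompt: str, num_synthetic: int = 3) -> Dict[str, object]:
    """Chain the stages: parse, gather data, choose a student model."""
    parsed = parse_prompt(prompt)
    training_data = retrieve_dataset(parsed) + generate_dataset(parsed, num_synthetic)
    student = retrieve_model(parsed)
    # Fine-tuning, evaluation, and the demo web app would follow here.
    return {
        "instruction": parsed.instruction,
        "student_model": student,
        "num_training_examples": len(training_data),
    }


print(run_pipeline("Translate English to French.\nhello -> bonjour"))
```

The point of the sketch is the data flow: a single prompt drives both data collection and model selection, and the retrieved and generated examples are merged before training.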

In conclusion, Prompt2Model is a tool for quickly building small and competent NLP systems. It can be used directly to produce, within a few hours, task-specific models that outperform LLMs, without manual data annotation or architecture design. Given its extensible design, it can also serve as a platform for exploring new techniques in model distillation, dataset generation, synthetic evaluation, dataset retrieval, and model retrieval. 

Looking ahead, we can envision Prompt2Model as a catalyst for collaborative innovation. By proposing distinct challenges, researchers aim to foster the development of diverse implementations and improvements across the framework’s components in the future.

Check out the Paper and Github. All credit for this research goes to the researchers on this project.

The post Researchers from CMU and Tsinghua University Propose Prompt2Model: A General Purpose Method that Generates Deployable AI Models from Natural Language Instructions appeared first on MarkTechPost.