Microsoft AI Unveils LLaVA-Med: An Efficiently Trained Large Language and Vision Assistant Revolutionizing Biomedical Inquiry, Delivering Advanced Multimodal Conversations in Under 15 Hours

Conversational generative AI holds a lot of potential for helping medical professionals, but so far the research has focused mostly on text. While advances in multi-modal conversational AI have been rapid thanks to billions of publicly available image-text pairs, such general-domain vision-language models still lack the sophistication needed to interpret and converse about biomedical images. The research team at Microsoft proposes a low-effort method for teaching a vision-language conversational assistant to respond to free-form questions about biomedical images. The team proposes a novel curriculum learning approach to fine-tuning a large general-domain vision-language model, using a large-scale, broad-coverage biomedical figure-caption dataset extracted from PubMed Central and GPT-4 to self-instruct open-ended instruction-following data from the captions.

The model mimics the progressive process by which a layperson acquires biomedical knowledge: it first learns to align biomedical vocabulary using the figure-caption pairs as-is, and then learns to master open-ended conversational semantics using GPT-4-generated instruction-following data. In less than 15 hours (with eight A100s), researchers can train a Large Language and Vision Assistant for BioMedicine (LLaVA-Med). With its multi-modal conversational capacity and ability to follow free-form instructions, LLaVA-Med is well suited to answering questions about biomedical images. Fine-tuned LLaVA-Med achieves state-of-the-art performance on three benchmark biomedical visual question-answering datasets. The instruction-following data and the LLaVA-Med model will be made public to advance multi-modal research in biomedicine.

The team’s key contributions are summed up as follows:

Biomedical multi-modal instruction-following data. By selecting biomedical image-text pairs from PMC-15M and running GPT-4 to generate instructions from the text alone, they describe a novel data creation pipeline that produces diverse (image, instruction, output) instances.

LLaVA-Med. Using the self-generated biomedical multi-modal instruction-following dataset, they offer a novel curriculum learning method to adapt LLaVA to the biomedical domain.

Open-source. The biomedical multi-modal instruction-following dataset and the software for data generation and model training will be publicly available to promote further study in biomedical multi-modal learning.

The team's investigations focused on the quality of the multi-modal biomedical instruction-following data obtained and on the effectiveness of LLaVA-Med. The researchers consider two evaluation settings:

How effective is LLaVA-Med as a general-purpose biomedical visual chatbot?

Compared to the state-of-the-art methodologies, how does LLaVA-Med fare on industry benchmarks?

To address the lack of multi-modal biomedical datasets for training an instruction-following assistant, the team first proposes a novel data generation pipeline that samples 600K image-text pairs from PMC-15M, curates diverse instruction-following data through GPT-4, and uses the created instructions to align the model to the biomedical domain.
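To make this concrete, the following is a minimal sketch of how such GPT-4 self-instruct data generation could look. It is only an illustration of the idea described above, not the team's released code; the prompt template and the call_gpt4 helper are hypothetical placeholders.

import json
import random

PROMPT_TEMPLATE = (
    "You are an AI assistant writing a multi-turn conversation about a biomedical figure. "
    "Use only the caption below as context and do not refer to the caption directly.\n"
    "Caption: {caption}\n"
    'Return a JSON list of objects with "question" and "answer" fields.'
)

def build_instruction_data(figure_caption_pairs, call_gpt4, n_samples=600_000):
    # Turn (image, caption) pairs into (image, instruction, output) training instances.
    sampled = random.sample(figure_caption_pairs, min(n_samples, len(figure_caption_pairs)))
    dataset = []
    for image_path, caption in sampled:
        turns = json.loads(call_gpt4(PROMPT_TEMPLATE.format(caption=caption)))
        for turn in turns:
            dataset.append({
                "image": image_path,
                "instruction": turn["question"],
                "output": turn["answer"],
            })
    return dataset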

Researchers then introduce a novel curriculum learning method for training LLaVA-Med. Specifically, they start from the LLaVA multi-modal conversation model trained on broad domains and gradually shift its focus to the biomedical field. The training process has two stages (a rough training sketch follows the two stages below):

Biomedical concept feature alignment. Word embeddings are aligned with the image features of a large set of novel biomedical visual concepts, using the figure-caption pairs as-is.

End-to-end instruction tuning. Fine-tuned on biomedical language-image instruction-following data, LLaVA-Med shows impressive zero-shot task transfer capabilities and supports natural user interaction.
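For illustration only, here is a rough PyTorch-style sketch of how such a two-stage curriculum could be wired up. The module names (vision_encoder, projection, language_model) are placeholders and this is not the released training code.

def set_trainable(model, stage: int):
    # Stage 1 (concept alignment): train only the vision-to-language projection;
    # keep the vision encoder and the language model frozen.
    # Stage 2 (instruction tuning): additionally unfreeze the language model.
    for p in model.vision_encoder.parameters():
        p.requires_grad = False
    for p in model.projection.parameters():
        p.requires_grad = True
    for p in model.language_model.parameters():
        p.requires_grad = (stage == 2)

def train_curriculum(model, stage1_loader, stage2_loader, make_optimizer):
    for stage, loader in ((1, stage1_loader), (2, stage2_loader)):
        set_trainable(model, stage)
        optimizer = make_optimizer([p for p in model.parameters() if p.requires_grad])
        for batch in loader:
            loss = model(**batch).loss  # autoregressive language-modeling loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()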

To sum it up

The research team at Microsoft presents LLaVA-Med, a large language and vision model for the biomedical domain. They use a self-instruct strategy to build a data curation pipeline with language-only GPT-4 and external knowledge, and then train the model on the resulting high-quality biomedical language-image instruction-following dataset. After fine-tuning, LLaVA-Med beats earlier supervised state-of-the-art results on certain metrics across three VQA datasets and demonstrates strong conversational abilities with domain knowledge. While LLaVA-Med is a big step in the right direction, the team also recognizes that it suffers from the hallucinations and limited depth of reasoning that plague many LMMs. Future work will focus on improving reliability and quality.

Check Out The Paper and Github.


Google AI Introduces a New Secure AI Framework (SAIF): A Conceptual Framework for Ensuring the Security of AI Systems

Google has introduced the Secure AI Framework (SAIF), a conceptual framework that establishes clear industry security standards for building and deploying AI systems responsibly. SAIF draws inspiration from security best practices in software development and incorporates an understanding of security risks specific to AI systems.

The introduction of SAIF is a significant step towards ensuring that AI technology is secure by default when implemented. With the immense potential of AI, responsible actors need to safeguard the technology supporting AI advancements. SAIF addresses risks such as model theft, data poisoning, malicious input injection, and confidential information extraction from training data. As AI capabilities become increasingly integrated into products worldwide, adhering to a responsive framework like SAIF becomes even more critical.

SAIF consists of six core elements that provide a comprehensive approach to secure AI systems:

1. Expand strong security foundations to the AI ecosystem: This involves leveraging existing secure-by-default infrastructure protections and expertise to protect AI systems, applications, and users. Organizations should also develop expertise that keeps pace with AI advancements and adapts infrastructure protections accordingly.

2. Extend detection and response to bring AI into an organization’s threat universe: Timely detection and response to AI-related cyber incidents are crucial. Organizations should monitor the inputs and outputs of generative AI systems to detect anomalies and leverage threat intelligence to anticipate attacks. Collaboration with trust and safety, threat intelligence, and counter-abuse teams can enhance threat intelligence capabilities.

3. Automate defenses to keep pace with existing and new threats: The latest AI innovations can improve the scale and speed of response efforts to security incidents. Adversaries are likely to use AI to scale their impact, so utilizing AI and its emerging capabilities is essential to stay agile and cost-effective in protecting against them.

4. Harmonize platform-level controls to ensure consistent security across the organization: Consistency across control frameworks supports AI risk mitigation and enables scalable protections across different platforms and tools. Google extends secure-by-default protections to AI platforms like Vertex AI and Security AI Workbench, integrating controls and protections into the software development lifecycle.

5. Adapt controls to adjust mitigations and create faster feedback loops for AI deployment: Constant testing and continuous learning ensure that detection and protection capabilities address the evolving threat environment. Techniques like reinforcement learning based on incidents and user feedback can fine-tune models and improve security. Regular red team exercises and safety assurance measures enhance the security of AI-powered products and capabilities.

6. Contextualize AI system risks in surrounding business processes: Conducting end-to-end risk assessments helps organizations make informed decisions when deploying AI. Assessing the end-to-end business risk, including data lineage, validation, and operational behavior monitoring, is crucial. Automated checks should be implemented to validate AI performance.

Google emphasizes the importance of building a secure AI community and has taken steps to foster industry support for SAIF. This includes partnering with key contributors and engaging with industry standards organizations such as NIST and ISO/IEC. Google also collaborates directly with organizations, conducts workshops, shares insights from its threat intelligence teams, and expands bug hunter programs to incentivize research on AI safety and security.

As SAIF advances, Google remains committed to sharing research and insights to utilize AI securely. Collaboration with governments, industry, and academia is crucial to achieve common goals and ensure that AI technology benefits society. By adhering to frameworks like SAIF, the industry can build and deploy AI systems responsibly, unlocking the full potential of this transformative technology.

Check Out The Google AI Blog and Guide.


Researchers from China Introduce Make-Your-Video: A Video Transformation Method by Employing Textual and Structural Guidance

Videos are a commonly used digital medium prized for their capacity to present vivid and engaging visual experiences. With the ubiquitous use of smartphones and digital cameras, recording live events on camera has become simple. However, the process gets significantly more difficult and expensive when producing a video to represent the idea visually. This often calls for professional experience in computer graphics, modeling, and animation creation. Fortunately, new developments in text-to-video have made it possible to streamline this procedure by using only text prompts. 

Figure 1 shows how the model can produce temporally coherent videos that adhere to the guidance intent when given text descriptions and motion structure as inputs. The authors demonstrate video generation results in several applications, including (top) real-world scene setup to video, (middle) dynamic 3D scene modeling to video, and (bottom) video re-rendering, by constructing structure guidance from various sources.

They contend that while language is a familiar and flexible description tool, it can be less effective at giving precise control; instead, it excels at communicating abstract global context. This motivates the creation of customized videos that use text to describe the setting and motion in a specific direction. Because frame-wise depth maps are 3D-aware 2D data well suited to the video creation task, they are chosen to describe the motion structure. The structure guidance in their method can be relatively basic, so that non-experts can readily prepare it.

This architecture gives the generative model the freedom to generate realistic content without relying on meticulously produced input. For instance, creating a photorealistic outside environment can be guided by a scenario setup employing goods found in an office (Figure 1(top)). The physical objects may be substituted with specific geometrical parts or any readily available 3D asset using 3D modeling software (Figure 1(middle)). Using the calculated depth from already-existing recordings is another option (Figure 1(bottom)). To customize their movies as intended, users have both flexibility and control thanks to the mix of textual and structural instruction. 

To do this, researchers from CUHK, Tencent AI Lab, and HKUST use a Latent Diffusion Model (LDM), which applies a diffusion model in a compact lower-dimensional latent space to reduce processing costs. They propose separating the training of spatial modules (for image synthesis) and temporal modules (for temporal coherence) for an open-world video generation model. This design is based on two main considerations: (i) training the model components separately reduces computational resource requirements, which is especially important for resource-intensive tasks; and (ii) as image datasets cover a much wider variety of concepts than existing video datasets, pre-training the model for image synthesis helps it inherit diverse visual concepts and transfer them to video generation.

Achieving temporal coherence is a significant challenge. Starting from a pre-trained image LDM, they keep its spatial blocks frozen and introduce temporal blocks designed to learn inter-frame coherence over the video dataset. Notably, they incorporate spatial and temporal convolutions, increasing the pre-trained modules' flexibility and enhancing temporal stability. Additionally, they use a straightforward but powerful causal attention mask strategy to enable longer (i.e., four times the training length) video synthesis, considerably reducing the risk of quality degradation.
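As a minimal sketch of the causal attention idea (our reading of it, not the authors' implementation), each frame attends only to itself and earlier frames, which is what makes it possible to roll out more frames at inference time than were seen during training:

import torch

def causal_temporal_mask(num_frames: int) -> torch.Tensor:
    # True marks (query, key) positions to block: frame i may only attend to frames 0..i.
    return torch.triu(torch.ones(num_frames, num_frames, dtype=torch.bool), diagonal=1)

# Example: mask attention scores of shape (batch, heads, T, T) before the softmax.
scores = torch.randn(2, 4, 16, 16)
scores = scores.masked_fill(causal_temporal_mask(16), float("-inf"))
attn = scores.softmax(dim=-1)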

Qualitative and quantitative evaluations show that the suggested technique outperforms the baselines, especially in terms of temporal coherence and faithfulness to user instructions. The efficiency of the proposed designs, which are essential to the operation of the approach, is supported by ablation experiments. Additionally, they demonstrated several fascinating applications made possible by their methodology, and the outcomes illustrate the potential for real-world applications. 

The following is a summary of their contributions:

They offer textual and structural guidance as an effective method for producing customized videos. Their approach achieves the best results, both quantitative and qualitative, for controllable text-to-video generation.

They provide a method for using pre-trained image LDMs to generate videos that inherit rich visual concepts and have good temporal coherence.

They introduce a temporal masking approach to extend the duration of video synthesis while minimizing quality loss.

Check Out The Paper, Project and Github.


Enhancing Task-Specific Adaptation for Video Foundation Models: Introducing Video Adapter as a Probabilistic Framework for Adapting Text-to-Video Models

Large text-to-video models trained on internet-scale data have shown extraordinary capabilities for generating high-fidelity videos from arbitrary text descriptions. However, fine-tuning such a huge pretrained model can be prohibitively expensive, making it difficult to adapt these models to applications with limited domain-specific data, such as animation or robotics videos. Inspired by how a small modifiable component (such as prompts or prefix-tuning) can enable a large language model to perform new tasks without access to the model weights, researchers from Google DeepMind, UC Berkeley, MIT, and the University of Alberta investigate how a large pretrained text-to-video model can be customized to a variety of downstream domains and tasks without fine-tuning. To address this, they present Video Adapter, a method for generating task-specific small video models by using a large pretrained video diffusion model's score function as a probabilistic prior. Experiments demonstrate that Video Adapter can use as few as 1.25 percent of the pretrained model's parameters to incorporate the broad knowledge and preserve the high fidelity of the large pretrained video model in a task-specific small video model. Video Adapter can generate high-quality, task-specific videos for various uses, including but not limited to animation, egocentric modeling, and the modeling of simulated and real-world robotics data.

Researchers test Video Adapter on various video generation tasks. On the difficult Ego4D data and the robotic Bridge data, Video Adapter generates videos with better FVD and Inception Scores than a high-quality pretrained large video model while using up to 80x fewer parameters. Researchers demonstrate qualitatively that Video Adapter permits the production of genre-specific videos, like those found in science fiction and animation. In addition, the study's authors show how Video Adapter can help bridge robotics' infamous sim-to-real gap by modeling both real and simulated robotic videos and enabling data augmentation on real robotic videos via individualized stylization.

Key Features

To achieve high-quality yet versatile video synthesis without requiring gradient updates on the pretrained model, Video Adapter combines the scores of a pretrained text-to-video model with the scores of a domain-specific small model (with roughly 1% of the parameters) at sampling time (see the sketch after this list).

Pretrained video models can be easily adapted with Video Adapter to videos of humans and to robotic data.

Under the same number of TPU hours, Video Adapter gets higher FVD, FID, and Inception Scores than the pretrained and task-specific models.

Potential uses for video adapters range from use in anime production to domain randomization to bridge the simulation-reality gap in robotics.

As opposed to a huge video model pretrained from internet data, Video Adapter requires training a tiny domain-specific text-to-video model with orders of magnitude fewer parameters. Video Adapter achieves high-quality and adaptable video synthesis by composing the pretrained and domain-specific video model scores during sampling.

With Video Adapter, you may give a video a unique look using a model only exposed to one type of animation.

Using a Video Adapter, a pretrained model of considerable size can take on the visual characteristics of an animation model of a much smaller size.

With the help of a Video Adapter, a massive pre-trained model can take on the visual aesthetic of a diminutive Sci-Fi animation model.

Video Adapter can generate videos in a variety of genres and styles, including videos with egocentric motions based on manipulation and navigation, videos in individualized genres like animation and science fiction, and videos with simulated and real robotic motions.
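The sketch below illustrates the score-composition idea from the list above: at every diffusion sampling step, the noise predictions of the frozen pretrained model and the small domain-specific model are mixed, so the pretrained weights are never updated. The simple linear interpolation and the weight value are our assumptions for illustration; the paper's exact weighting scheme may differ.

import torch

def composed_eps(eps_pretrained: torch.Tensor,
                 eps_small: torch.Tensor,
                 weight: float = 0.5) -> torch.Tensor:
    # Mix the frozen pretrained model's noise prediction with the small
    # domain-specific model's prediction at each sampling step.
    return (1.0 - weight) * eps_pretrained + weight * eps_small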

Limitations

A small video model still needs to be trained on domain-specific data; therefore, while Video Adapter can efficiently adapt large pretrained text-to-video models, it is not training-free. Another limitation is that Video Adapter requires the pretrained model to output its score alongside the generated video, which typical text-to-image and text-to-video APIs do not expose. Even so, by reducing the need for full access to model weights and by being computationally efficient, Video Adapter helps make text-to-video research more accessible to small industrial and academic institutions.

To sum it up

It is clear that as text-to-video foundation models grow in size, they will need to be adapted efficiently to task-specific usage. Researchers have developed Video Adapter, a powerful method for generating domain- and task-specific videos by employing huge pretrained text-to-video models as a probabilistic prior. Video Adapter can synthesize high-quality videos in specialized domains or desired aesthetics without requiring further fine-tuning of the massive pretrained model.

Check Out The Paper and Github.


This AI Paper Presents An Efficient Solution For Solving Common Practical Multi-Marginal Optimal Transport Problems

Researchers have proposed a novel approach to enforcing distributional constraints in machine learning models using multi-marginal optimal transport. This approach is designed to be computationally efficient and allows for efficient computation of gradients during backpropagation.

Existing methods for enforcing distributional constraints in machine learning models can be computationally expensive and difficult to integrate into machine learning pipelines. In contrast, the proposed method uses multi-marginal optimal transport in a way that is cheap to compute and differentiable, so gradients can flow during backpropagation. This makes it easier to integrate the method into existing machine learning pipelines and enables more accurate modeling of complex distributions.

Concretely, the method enforces distributional constraints by minimizing an optimal transport distance between probability distributions, formulated over multiple marginals. Because the resulting objective is differentiable, it can be trained end to end with standard gradient-based optimizers. The researchers evaluated the method on several benchmark datasets and found that it outperformed existing approaches in both accuracy and computational efficiency.
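As a generic, hedged illustration of an optimal-transport penalty that remains differentiable for backpropagation (the paper's exact multi-marginal formulation and solver are not reproduced here), an entropic two-marginal Sinkhorn distance can be written with plain torch operations:

import torch

def sinkhorn_distance(x, y, eps=0.1, n_iters=50):
    # x: (n, d) samples, y: (m, d) samples; returns a differentiable transport cost.
    cost = torch.cdist(x, y, p=2) ** 2
    n, m = cost.shape
    mu = torch.full((n,), 1.0 / n, device=x.device)
    nu = torch.full((m,), 1.0 / m, device=x.device)
    f = torch.zeros(n, device=x.device)
    g = torch.zeros(m, device=x.device)
    for _ in range(n_iters):
        # Log-domain Sinkhorn updates of the dual potentials.
        f = eps * torch.log(mu) - eps * torch.logsumexp((g[None, :] - cost) / eps, dim=1)
        g = eps * torch.log(nu) - eps * torch.logsumexp((f[:, None] - cost) / eps, dim=0)
    plan = torch.exp((f[:, None] + g[None, :] - cost) / eps)
    return (plan * cost).sum()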

In conclusion, the proposed multi-marginal optimal transport approach offers a computationally efficient, differentiable way to enforce distributional constraints in machine learning models, making it well suited for a wide range of applications. It outperformed existing methods in accuracy and computational efficiency in the reported experiments, demonstrating its potential as a valuable tool for improving model performance.

Check Out The Paper and Github.


MIT Researchers Propose A New Multimodal Technique That Blends Machine Learning Methods To Learn More Similarly To Humans

Artificial intelligence is revolutionizing the major use cases and applications we encounter daily. One such area revolves around audio and visual media. Think about all the AI-powered apps that can generate funny videos and artistically astounding images, copy a celebrity's voice, or take notes on an entire lecture for you with just one click. All of these models require a huge corpus of data to train, and most of the successful systems rely on annotated datasets to teach themselves.

The biggest challenge is to store and annotate this data and transform it into usable data points that models can ingest. Easier said than done: companies struggle to gather and curate gold-standard data points at scale every year.

Now, researchers from MIT, the MIT-IBM Watson AI Lab, IBM Research, and other institutions have developed a groundbreaking technique that can efficiently address these issues by analyzing unlabeled audio and visual data. This model has a lot of promise and potential to improve how current models train, and it applies to many systems, such as speech recognition models, transcription and audio generation engines, and object detection. It combines two self-supervised learning objectives, contrastive learning and masked data modeling, following one basic idea: replicate how humans perceive and understand the world, and then reproduce the same behavior.

As explained by Yuan Gong, an MIT Postdoc, self-supervised learning is essential because if you look at how humans gather and learn from the data, a big portion is without direct supervision. The goal is to enable the same procedure in machines, allowing them to learn as many features as possible from unlabelled data. This training becomes a strong foundation that can be utilized and improved with the help of supervised learning or reinforcement learning, depending on the use cases. 

The technique used here is contrastive audio-visual masked autoencoder (CAV-MAE), which uses a neural network to extract and map meaningful latent representations from audio and visual data. The models can be trained on large datasets of 10-second YouTube clips, utilizing audio and video components. The researchers claimed that CAV-MAE is much better than any other previous approaches because it explicitly emphasizes the association between audio and visual data, which other methods don’t incorporate. 

The CAV-MAE method incorporates two approaches: masked data modeling and contrastive learning. Masked data modeling involves:

Taking a video and its matched audio waveform.

Converting the audio to a spectrogram.

Masking 75% of the audio and video data.

The model then recovers the missing data through a joint encoder/decoder. The reconstruction loss, which measures the difference between the reconstructed prediction and the original audio-visual combination, is used to train the model. Contrastive learning, in turn, aims to map similar representations close to one another; it does so by associating the relevant parts of the audio and video data, such as connecting the mouth movements of spoken words to the corresponding sounds.
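A rough sketch of how the two objectives could be combined is shown below; the loss weighting, pooling, and function names are illustrative assumptions, not the authors' code. The masked-reconstruction term implements the steps above, and the InfoNCE-style term pulls matching audio and video clips together.

import torch
import torch.nn.functional as F

def cav_mae_style_loss(recon, target, audio_emb, video_emb, temperature=0.07, lam=0.01):
    # Masked data modeling: reconstruct the masked audio/visual patches.
    recon_loss = F.mse_loss(recon, target)
    # Contrastive learning: matching audio/video pairs in the batch should align.
    a = F.normalize(audio_emb, dim=-1)
    v = F.normalize(video_emb, dim=-1)
    logits = a @ v.t() / temperature
    labels = torch.arange(a.size(0), device=a.device)
    contrastive = (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2
    return recon_loss + lam * contrastive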

The testing of CAV-MAE-based models with other models proved to be very insightful. The tests were conducted on audio-video retrieval and audio-visual classification tasks. The results demonstrated that contrastive learning and masked data modeling are complementary methods. CAV-MAE outperformed previous techniques in event classification and remained competitive with models trained using industry-level computational resources. In addition, multi-modal data significantly improved fine-tuning of single-modality representation and performance on audio-only event classification tasks.

The researchers at MIT believe that CAV-MAE represents a breakthrough in progress in self-supervised audio-visual learning. They envision that its use cases can range from action recognition, including sports, education, entertainment, motor vehicles, and public safety, to cross-linguistic automatic speech recognition and audio-video generations. While the current method focuses on audio-visual data, the researchers aim to extend it to other modalities, recognizing that human perception involves multiple senses beyond audio and visual cues. 

It will be interesting to see how this approach performs over time and how many existing models try to incorporate such techniques. 

The researchers hope that as machine learning advances, techniques like CAV-MAE will become increasingly valuable, enabling models to understand better and interpret the world.

Check Out The Paper and MIT Blog.


Meet SpQR (Sparse-Quantized Representation): A Compressed Format And Quantization Technique That Enables Near-Lossless Large Language Model Weight Compression

Large Language Models (LLMs) have demonstrated incredible capabilities in recent times. Learning from massive amounts of data, these models perform tasks with amazing breadth, including human-like text generation, question answering, code completion, text summarization, and powering highly skilled virtual assistants. Although LLMs perform remarkably well, there has been a shift toward developing smaller models trained on even more data. Smaller models require fewer computational resources than larger ones; for example, the LLaMA model with 7 billion parameters, trained on 1 trillion tokens, produces results on par with or better than the much larger GPT-3 model despite being 25 times smaller.

Compressing LLMs so that they fit on memory-limited devices, laptops, and mobile phones comes with challenges, such as difficulty in maintaining generative quality and accuracy degradation with 3- to 4-bit quantization in models with 1 to 10 billion parameters. These limitations stem from the sequential nature of LLM generation, where small errors can accumulate into seriously degraded outputs; avoiding this requires low-bit-width quantization methods that do not reduce predictive performance compared to the original 16-bit model.

To overcome the accuracy limitations, a team of researchers has introduced Sparse-Quantized Representation (SpQR), a compressed format and quantization technique. This hybrid sparse-quantized format enables nearly lossless compression of precise pretrained LLMs down to 3–4 bits per parameter. It is the first weight quantization technique to achieve such compression ratios with an end-to-end accuracy error of less than 1% in comparison to the dense baseline, as evaluated by perplexity.

SpQR relies on two mechanisms. First, it locates outlier weights that produce excessively high errors when quantized and stores them in high precision, while the remaining weights are stored in a much lower-precision format, typically 3 bits. Second, SpQR employs a variant of grouped quantization with a very small group size, such as 16 contiguous elements, and the quantization scales themselves can also be represented in a 3-bit format.
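The following NumPy sketch illustrates these two mechanisms, per-group 3-bit quantization plus a sparse high-precision correction for outliers. The error threshold and data layout are illustrative assumptions, not the reference implementation.

import numpy as np

def spqr_like_quantize(w: np.ndarray, group_size: int = 16, bits: int = 3,
                       outlier_threshold: float = 0.1):
    # Assumes w.size is a multiple of group_size.
    levels = 2 ** bits - 1
    w = w.reshape(-1, group_size)
    lo = w.min(axis=1, keepdims=True)
    hi = w.max(axis=1, keepdims=True)
    scale = np.maximum(hi - lo, 1e-8) / levels
    q = np.round((w - lo) / scale)              # 3-bit codes per group
    deq = q * scale + lo                        # dequantized approximation
    err = np.abs(deq - w)
    outliers = err > outlier_threshold          # keep these weights in high precision
    sparse_values = np.where(outliers, w, 0.0)  # sparse full-precision correction
    return q.astype(np.uint8), scale, lo, sparse_values, outliers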

For converting a pretrained LLM into the SpQR format, the team has adopted an extended version of the post-training quantization (PTQ) approach, which, inspired by GPTQ, passes calibration data through the uncompressed model. SpQR allows for running 33 billion parameter LLMs on a single 24 GB consumer GPU without any performance degradation while providing a 15% speedup at 4.75 bits. This makes powerful LLMs accessible to consumers without suffering from any performance penalties.

SpQR offers effective methods for encoding and decoding weights into their format at runtime. These algorithms are made to maximize the SpQR memory compression advantages. A powerful GPU inference algorithm has also been created for SpQR, enabling faster inference than 16-bit baselines while maintaining comparable levels of accuracy. Because of this, SpQR provides memory compression benefits of more than 4x, making it very effective for use on devices with limited memory. In conclusion, SpQR seems like a promising technique as it efficiently addresses the challenge of accuracy loss associated with low-bit quantization in LLMs.

Check Out The Paper and Github.


Best AI Plugins for WordPress (2023)

WordPress is just another area that artificial intelligence (AI) has positively impacted. Yes! Many AI WordPress plugins now exist to facilitate the work of WordPress users. AI is assisting virtually every aspect of running a site, from advertising to content creation to customer service to malware detection.

While many WordPress AI plugins exist, not all are created equal in terms of service, and no plugin is perfect. This article outlines the top WordPress AI plugins, together with their most important features, benefits, downsides, and costs.

Without further ado, let’s dive into the meat of the matter and become familiar with WordPress’s top artificial intelligence (AI) plugins.

WP AI Assistant 

With this one-of-a-kind plugin for WordPress, you can turn your website into a conversational interface powered by artificial intelligence. WP AI Assistant may email the admin with conversations, guide customers to relevant pages, and provide advice on making purchases. It’s compatible with WPML, so you can easily translate any custom AI content you create. 

Speaker 

An add-on for WordPress, Speaker turns text into synthesized speech. More than 235 voices in 40+ languages and variants are used to turn written text into human-sounding speech. It complies with the Speech Synthesis Markup Language (SSML) standard, allowing you to customize the voiceover for each blog post. You have full control over pauses, intonation, and the standard human number reading format.

Voicer 

The WordPress plugin Voicer aims to make text sound like a real person speaking it. The Google Cloud Platform forms the backbone of the plugin, guaranteeing its global availability and lightning-fast performance. More than 200 voices in 30+ languages and variants are available in the Voicer WordPress plugin, turning text into human-like speech. This simple plugin allows you to provide superior support by simulating natural conversations with your customers.

WordLift 

WordLift automates search engine optimization by scanning a website’s content, enriching it with structured data or schema markup, and submitting it to Google. To generate metadata, this plugin employs AI to recognize unique terms throughout the site. Users can manually select their preferred keywords and shape the appearance of the website’s knowledge graphs in the process. 

SEOPress 

SEOPress is a plugin for WordPress that provides an intuitive interface for optimizing one’s site for search engines. SEOPress, in contrast to most other SEO plugins, is compatible with OpenAI. SEO metadata (meta titles and descriptions) are automatically generated by this function using AI analysis of the post’s content. Because it can be performed in bulk, it is ideal for optimizing large websites with hundreds of pages. Hopefully, with future editions of SEOPress, we will see more examples of how AI can enhance SEO, as the plugin’s authors have ambitions to extend AI usage.

Akismet 

To prevent spam comments from ever reaching the pending state, the WordPress development team at Automattic created Akismet. There are already more than five million active installations of this utility, and it has been shown to be particularly efficient against spambots. Akismet uses machine learning techniques to continuously enhance its performance after being installed on a WordPress site. Website administrators can review Akismet-caught comments and highlight spam comments that evaded the filtering system.

AI Engine 

Jordy Meow's AI Engine is a relatively new plugin, but it's already garnering much attention. Users have praised the plugin's originality and its seamless integration of AI with WordPress, and with 2,000+ active installations on wordpress.org, it has received solely 5-star evaluations. You'll need to generate an OpenAI key and enter it into the plugin's configuration before you can use AI Engine. This enables you to equip your website with an advanced chatbot and content generator powered by cutting-edge OpenAI technology.

Bertha AI

Bertha AI is one of the best artificial intelligence assistants made for WordPress users, and it uses OpenAI’s GPT-3 language model. Once the plugin has been installed and activated, a new animated figure will display in all text windows across your site, including the WordPress editor. Every piece of content created by Bertha AI, from section headings to entire paragraphs, is saved so that it can be used again. Bertha AI’s ability to automatically generate product descriptions while recommending long-tail keywords and SEO description tags is one of its strongest features.

ContentBot 

ContentBot is an additional content generator that makes use of the OpenAI GPT-3 NLP engine. Once the plugin is installed, new posts can be created without leaving the WordPress dashboard. It’s a fantastic method for bloggers to produce more placements in less time. ContentBot will rewrite your text once you’ve written a few phrases, asked it to continue, and deleted the sections you didn’t like. Thanks to the built-in plagiarism checker, all of the AI’s work will be completely original. 

Tidio 

This plugin provides a comprehensive approach to interaction. Its purpose is to enhance the customer care provided on the website by introducing new contact channels and increasing sales through pre-programmed chatbots. Using a simple drag-and-drop editor, chatbot templates can be easily customized or built from scratch. Tidio is a popular live chat plugin used by over 100,000 people and featured prominently on wordpress.org.

GetGenie 

GetGenie is an AI writing assistant made to hasten the production of unique, high-quality blog entries. The many available templates in this writing aid cover a wide range of frameworks, including AIDA, BAB, and PAS. In addition to displaying on-page SEO ratings for rapid content optimization, it offers helpful advice for making SEO-friendly slugs. The “Genie Mode” makes generating content based on user requests simple and provides thorough responses. Additionally, GetGenie has just released its chat platform, “GenieChat,” which may provide instant responses.

Link Whisper

When making internal links, Link Whisper is your best bet. While linking between interior website pieces is often overlooked, it can be crucial for user engagement and search engine optimization. The plugin’s AI-powered engine reads through your website’s text and makes contextual recommendations for you in a sidebar. Link Whisper also offers comprehensive reports on 404 problems and broken links.

RankMath 

Over a million users have downloaded the RankMath AI-powered SEO plugin. With its support, you can manage the indexed material and its visibility on search engines. Website content may be easily optimized with the help of multiple in-built suggestions, and users can gain insight into excellent SEO techniques in the process. RankMath’s well-considered design and set-up wizard make it accessible to anyone with no prior expertise using similar software.

CodeWP 

The revolutionary AI code generator CodeWP was built and taught to write code exclusively for WordPress sites. It’s an effective and affordable alternative to spending time and money exploring StackOverflow for code solutions or paying engineers. CodeWP is a flexible tool for website owners and developers thanks to its support for PHP, JS, WooCommerce, and several popular WordPress plugins. In addition, CodeWP provides a large library of pre-made code snippets that can be imported into and used on websites immediately. This time-saving addition gives users a great place to begin while working with WordPress.

Quttera 

Quttera Web Malware Scanner should be installed on your WordPress site if you wish to protect it from malware, trojans, backdoors, worms, viruses, shells, spyware, and the like. The AI WordPress plugin not only aids in malware removal but also protects against potentially harmful users by blocking their IP addresses. For these reasons, it has found widespread adoption in 32 different nations.



Host ML models on Amazon SageMaker using Triton: ONNX Models

ONNX (Open Neural Network Exchange) is an open-source standard for representing deep learning models widely supported by many providers. ONNX provides tools for optimizing and quantizing models to reduce the memory and compute needed to run machine learning (ML) models. One of the biggest benefits of ONNX is that it provides a standardized format for representing and exchanging ML models between different frameworks and tools. This allows developers to train their models in one framework and deploy them in another without the need for extensive model conversion or retraining. For these reasons, ONNX has gained significant importance in the ML community.
In this post, we showcase how to deploy ONNX-based models for multi-model endpoints (MMEs) that use GPUs. This is a continuation of the post Run multiple deep learning models on GPU with Amazon SageMaker multi-model endpoints, where we showed how to deploy PyTorch and TensorRT versions of ResNet50 models on Nvidia’s Triton Inference server. In this post, we use the same ResNet50 model in ONNX format along with an additional natural language processing (NLP) example model in ONNX format to show how it can be deployed on Triton. Furthermore, we benchmark the ResNet50 model and see the performance benefits that ONNX provides when compared to PyTorch and TensorRT versions of the same model, using the same input.

ONNX Runtime
ONNX Runtime is a runtime engine for ML inference designed to optimize the performance of models across multiple hardware platforms, including CPUs and GPUs. It supports models exported from ML frameworks like PyTorch and TensorFlow. It facilitates performance tuning to run models cost-efficiently on the target hardware and has support for features like quantization and hardware acceleration, making it one of the ideal choices for deploying efficient, high-performance ML applications. For examples of how ONNX models can be optimized for Nvidia GPUs with TensorRT, refer to TensorRT Optimization (ORT-TRT) and ONNX Runtime with TensorRT optimization.
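For readers new to ONNX Runtime, a minimal usage sketch looks like the following; this is generic API usage with a placeholder model path, not code from this post's notebooks:

import numpy as np
import onnxruntime as ort

# Load an exported model and prefer the GPU provider when it is available.
session = ort.InferenceSession(
    "model.onnx",  # placeholder path
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: dummy})
print(outputs[0].shape)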
The Amazon SageMaker Triton container flow is depicted in the following diagram.

Users can send an HTTPS request with the input payload for real-time inference behind a SageMaker endpoint. The user can specify a TargetModel header that contains the name of the model that the request in question is destined to invoke. Internally, the SageMaker Triton container implements an HTTP server with the same contracts as mentioned in How Containers Serve Requests. It has support for dynamic batching and supports all the backends that Triton provides. Based on the configuration, the ONNX runtime is invoked and the request is processed on CPU or GPU as predefined in the model configuration provided by the user.
Solution overview
To use the ONNX backend, complete the following steps:

Compile the model to ONNX format.
Configure the model.
Create the SageMaker endpoint.

Prerequisites
Ensure that you have access to an AWS account with sufficient AWS Identity and Access Management (IAM) permissions to create a notebook, access an Amazon Simple Storage Service (Amazon S3) bucket, and deploy models to SageMaker endpoints. See Create execution role for more information.
Compile the model to ONNX format
The transformers library provides a convenient method to compile the PyTorch model to ONNX format. The following code achieves the transformation for the NLP model:

onnx_inputs, onnx_outputs = transformers.onnx.export(
    preprocessor=tokenizer,
    model=model,
    config=onnx_config,
    opset=12,
    output=save_path,
)

Exporting models (either PyTorch or TensorFlow) is easily achieved through the conversion tool provided as part of the Hugging Face transformers repository.
The following is what happens under the hood:

Allocate the model from transformers (PyTorch or TensorFlow).
Forward dummy inputs through the model. This way, ONNX can record the set of operations run.
The transformers inherently take care of dynamic axes when exporting the model.
Save the graph along with the network parameters.

A similar mechanism is followed for the computer vision use case from the torchvision model zoo:

torch.onnx.export(
    resnet50,
    dummy_input,
    args.save,
    export_params=True,
    opset_version=11,
    do_constant_folding=True,
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch_size"}, "output": {0: "batch_size"}},
)

Configure the model
In this section, we configure the computer vision and NLP model. We show how to create a ResNet50 and RoBERTA large model that has been pre-trained for deployment on a SageMaker MME by utilizing Triton Inference Server model configurations. The ResNet50 notebook is available on GitHub. The RoBERTA notebook is also available on GitHub. For ResNet50, we use the Docker approach to create an environment that already has all the dependencies required to build our ONNX model and generate the model artifacts needed for this exercise. This approach makes it much easier to share dependencies and create the exact environment that is needed to accomplish this task.
The first step is to create the ONNX model package per the directory structure specified in ONNX Models. Our aim is to use the minimal model repository for an ONNX model contained in a single file as follows:

<model-repository-path>/
└── Model_name
    ├── 1
    │   └── model.onnx
    └── config.pbtxt

Next, we create the model configuration file that describes the inputs, outputs, and backend configurations for the Triton Server to pick up and invoke the appropriate kernels for ONNX. This file is known as config.pbtxt and is shown in the following code for the RoBERTA use case. Note that the BATCH dimension is omitted from the config.pbtxt. However, when sending the data to the model, we include the batch dimension. The following code also shows how you can add this feature with model configuration files to set dynamic batching with a preferred batch size of 5 for the actual inference. With the current settings, the model instance is invoked instantly when the preferred batch size of 5 is met or the delay time of 100 microseconds has elapsed since the first request reached the dynamic batcher.

name: "nlp-onnx"
platform: "onnxruntime_onnx"
backend: "onnxruntime"
max_batch_size: 32

input {
  name: "input_ids"
  data_type: TYPE_INT64
  dims: [512]
}
input {
  name: "attention_mask"
  data_type: TYPE_INT64
  dims: [512]
}

output {
  name: "last_hidden_state"
  data_type: TYPE_FP32
  dims: [-1, 768]
}
output {
  name: "1550"
  data_type: TYPE_FP32
  dims: [768]
}
instance_group {
  count: 1
  kind: KIND_GPU
}
dynamic_batching {
  max_queue_delay_microseconds: 100
  preferred_batch_size: 5
}

The following is the similar configuration file for the computer vision use case:

name: "resenet_onnx"
platform: "onnxruntime_onnx"
max_batch_size: 128
input [
  {
    name: "input"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]

Create the SageMaker endpoint
We use the Boto3 APIs to create the SageMaker endpoint. For this post, we show the steps for the RoBERTA notebook, but these are common steps and will be the same for the ResNet50 model as well.
Create a SageMaker model
We now create a SageMaker model. We use the Amazon Elastic Container Registry (Amazon ECR) image and the model artifact from the previous step to create the SageMaker model.
Create the container
To create the container, we pull the appropriate image from Amazon ECR for Triton Server. SageMaker allows us to customize and inject various environment variables. Some of the key features are the ability to set the BATCH_SIZE; we can set this per model in the config.pbtxt file, or we can define a default value here. For models that can benefit from larger shared memory size, we can set those values under SHM variables. To enable logging, set the log verbose level to true. We use the following code to create the model to use in our endpoint:

mme_triton_image_uri = (
    f"{account_id_map[region]}.dkr.ecr.{region}.{base}" + "/sagemaker-tritonserver:22.12-py3"
)
container = {
    "Image": mme_triton_image_uri,
    "ModelDataUrl": mme_path,
    "Mode": "MultiModel",
    "Environment": {
        "SAGEMAKER_TRITON_SHM_DEFAULT_BYTE_SIZE": "16777216000",  # shared memory size in bytes
        "SAGEMAKER_TRITON_SHM_GROWTH_BYTE_SIZE": "10485760",
    },
}

from sagemaker.utils import name_from_base

model_name = name_from_base("flan-xxl-fastertransformer")
print(model_name)

create_model_response = sm_client.create_model(
    ModelName=model_name,
    ExecutionRoleArn=role,
    PrimaryContainer=container,
)
model_arn = create_model_response["ModelArn"]
print(f"Created Model: {model_arn}")

Create a SageMaker endpoint
You can use any instances with multiple GPUs for testing. In this post, we use a g4dn.4xlarge instance. We don’t set the VolumeSizeInGB parameters because this instance comes with local instance storage. The VolumeSizeInGB parameter is applicable to GPU instances supporting the Amazon Elastic Block Store (Amazon EBS) volume attachment. We can leave the model download timeout and container startup health check at the default values. For more details, refer to CreateEndpointConfig.

endpoint_config_response = sm_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": "ml.g4dn.4xlarge",
            "InitialInstanceCount": 1,
            # "VolumeSizeInGB": 200,
            # "ModelDataDownloadTimeoutInSeconds": 600,
            # "ContainerStartupHealthCheckTimeoutInSeconds": 600,
        },
    ],
)

Lastly, we create a SageMaker endpoint:

create_endpoint_response = sm_client.create_endpoint(
    EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name
)

Invoke the model endpoint
This is a generative model, so we pass in the input_ids and attention_mask to the model as part of the payload. The following code shows how to create the tensors:

tokenizer("This is a sample", padding="max_length", max_length=max_seq_len)

We now create the appropriate payload by ensuring the data type matches what we configured in the config.pbtxt. This also gives us the tensors with the batch dimension included, which is what Triton expects. We use the JSON format to invoke the model. Triton also provides a native binary invocation method for the model.
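For example, a payload matching the RoBERTA config.pbtxt shown earlier could be assembled as follows; this is a sketch based on Triton's v2 JSON format, and the exact payload in the notebook may differ:

import json

max_seq_len = 512
enc = tokenizer("This is a sample", padding="max_length", max_length=max_seq_len)
payload = {
    "inputs": [
        # Data is sent flattened in row-major order; shape carries the batch dimension.
        {"name": "input_ids", "shape": [1, max_seq_len], "datatype": "INT64",
         "data": enc["input_ids"]},
        {"name": "attention_mask", "shape": [1, max_seq_len], "datatype": "INT64",
         "data": enc["attention_mask"]},
    ]
}
body = json.dumps(payload)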

response = runtime_sm_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/octet-stream",
    Body=json.dumps(payload),
    TargetModel=f"{tar_file_name}",
    # e.g. TargetModel="roberta-large-v0.tar.gz",
)

Note the TargetModel parameter in the preceding code. We send the name of the model to be invoked as a request header because this is a multi-model endpoint; therefore, we can invoke multiple models at runtime on an already deployed inference endpoint by changing this parameter. This shows the power of multi-model endpoints!
To output the response, we can use the following code:

import numpy as np

resp_bin = response["Body"].read().decode("utf8")
# Keys are: "outputs": [{"name": "1550", "datatype": "FP32", "shape": [1, 768], "data": [0.0013, ...]}]
for data in json.loads(resp_bin)["outputs"]:
    shape_1 = list(data["shape"])
    dat_1 = np.array(data["data"])
    dat_1.resize(shape_1)
    print(f"Data outputs received back: shape {dat_1.shape}")

ONNX for performance tuning
The ONNX backend uses C++ arena memory allocation. Arena allocation is a C++-only feature that helps you optimize your memory usage and improve performance. Memory allocation and deallocation constitutes a significant fraction of CPU time spent in protocol buffers code. By default, new object creation performs heap allocations for each object, each of its sub-objects, and several field types, such as strings. These allocations occur in bulk when parsing a message and when building new messages in memory, and associated deallocations happen when messages and their sub-object trees are freed.
Arena-based allocation has been designed to reduce this performance cost. With arena allocation, new objects are allocated out of a large piece of pre-allocated memory called the arena. Objects can all be freed at once by discarding the entire arena, ideally without running destructors of any contained object (though an arena can still maintain a destructor list when required). This makes object allocation faster by reducing it to a simple pointer increment, and makes deallocation almost free. Arena allocation also provides greater cache efficiency: when messages are parsed, they are more likely to be allocated in continuous memory, which makes traversing messages more likely to hit hot cache lines. The downside of arena-based allocation is the C++ heap memory will be over-allocated and stay allocated even after the objects are deallocated. This might lead to out of memory or high CPU memory usage. To achieve the best of both worlds, we use the following configurations provided by Triton and ONNX:

arena_extend_strategy – This parameter refers to the strategy used to grow the memory arena with regards to the size of the model. We recommend setting the value to 1 (= kSameAsRequested), which is not a default value. The reasoning is as follows: the drawback of the default arena extend strategy (kNextPowerOfTwo) is that it might allocate more memory than needed, which could be a waste. As the name suggests, kNextPowerOfTwo (the default) extends the arena by a power of 2, whereas kSameAsRequested extends by a size that is the same as the allocation request each time. kSameAsRequested is suited for advanced configurations where you know the expected memory usage in advance. In our testing, because we know the size of models is a constant value, we can safely choose kSameAsRequested.
gpu_mem_limit – We set the value to the CUDA memory limit. To use all possible memory, pass in the maximum size_t. It defaults to SIZE_MAX if nothing is specified. We recommend keeping it as default.
enable_cpu_mem_arena – This enables the memory arena on CPU. The arena may pre-allocate memory for future usage. Set this option to false if you don’t want it. The default is True. If you disable the arena, heap memory allocation will take time, so inference latency will increase. In our testing, we left it as default.
enable_mem_pattern – This parameter refers to the internal memory allocation strategy based on input shapes. If the shapes are constant, we can enable this parameter to generate a memory pattern for the future and save some allocation time, making it faster. Use 1 to enable the memory pattern and 0 to disable. It’s recommended to set this to 1 when the input features are expected to be the same. The default value is 1.
do_copy_in_default_stream – In the context of the CUDA execution provider in ONNX, a compute stream is a sequence of CUDA operations that are run asynchronously on the GPU. The ONNX runtime schedules operations in different streams based on their dependencies, which helps minimize the idle time of the GPU and achieve better performance. We recommend using the default setting of 1 for using the same stream for copying and compute; however, you can use 0 for using separate streams for copying and compute, which might result in the device pipelining the two activities. In our testing of the ResNet50 model, we used both 0 and 1 but couldn’t find any appreciable difference between the two in terms of performance and memory consumption of the GPU device.
Graph optimization – The ONNX backend for Triton supports several parameters that help fine-tune the model size as well as runtime performance of the deployed model. When the model is converted to the ONNX representation (the first box in the following diagram at the IR stage), the ONNX runtime provides graph optimizations at three levels: basic, extended, and layout optimizations. You can activate all levels of graph optimizations by adding the following parameters in the model configuration file:

optimization {
  graph : {
    level : 1
  }
}

cudnn_conv_algo_search – Because we’re using CUDA-based Nvidia GPUs in our testing, for our computer vision use case with the ResNet50 model, we can use the CUDA execution provider-based optimization at the fourth layer in the following diagram with the cudnn_conv_algo_search parameter. The default option is exhaustive (0), but when we changed this configuration to 1 (HEURISTIC), we saw the model latency in steady state reduce to 160 milliseconds. This happens because the ONNX runtime invokes the lighter-weight cudnnGetConvolutionForwardAlgorithm_v7 forward pass and therefore reduces latency while maintaining adequate accuracy.
Run mode – The next step is selecting the correct execution_mode at layer 5 in the following diagram. This parameter controls whether the operators in your graph run sequentially or in parallel. When the model has many branches in its graph, setting this option to ExecutionMode.ORT_PARALLEL (1) usually gives better performance. The default mode is sequential, so enable parallel execution to suit your needs:

parameters { key: "execution_mode" value: { string_value: "1" } }
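As a reference, the following is a sketch of how these settings could be combined in a single model configuration file. The values shown are illustrative, and the exact set of parameters supported depends on the version of the Triton ONNX Runtime backend, so consult the backend documentation before copying them verbatim.

optimization { graph : { level : 1 } }

parameters { key: "arena_extend_strategy" value: { string_value: "1" } }
parameters { key: "gpu_mem_limit" value: { string_value: "17179869184" } }
parameters { key: "cudnn_conv_algo_search" value: { string_value: "1" } }
parameters { key: "do_copy_in_default_stream" value: { string_value: "1" } }
parameters { key: "execution_mode" value: { string_value: "1" } }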

For a deeper understanding of the opportunities for performance tuning in ONNX, refer to the following figure.

Source: https://static.linaro.org/connect/san19/presentations/san19-211.pdf

Benchmark numbers and performance tuning
By turning on the graph optimizations, cudnn_conv_algo_search, and parallel run mode parameters in our testing of the ResNet50 model, we saw the cold start time of the ONNX model graph reduce from 4.4 seconds to 1.61 seconds. An example of a complete model configuration file is provided in the ONNX configuration section of the following notebook.
The testing benchmark results are as follows:

PyTorch – 176 milliseconds, cold start 6 seconds
TensorRT – 174 milliseconds, cold start 4.5 seconds
ONNX – 168 milliseconds, cold start 4.4 seconds

The following graphs visualize these metrics.

Furthermore, in our testing of computer vision use cases, we found that sending the request payload in binary format using the HTTP client utilities provided by Triton significantly improves model invoke latency, so we recommend it.
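The following is a minimal sketch (not taken from the post’s notebook) of how a binary payload could be sent to a SageMaker endpoint running Triton using the tritonclient HTTP utilities. The endpoint name and tensor names are placeholders; adjust them to match your deployed model.

import boto3
import numpy as np
import tritonclient.http as httpclient

batch = np.random.rand(1, 3, 224, 224).astype(np.float32)   # dummy ResNet50-style input

inputs = [httpclient.InferInput("INPUT__0", list(batch.shape), "FP32")]
inputs[0].set_data_from_numpy(batch, binary_data=True)       # binary tensors instead of JSON
outputs = [httpclient.InferRequestedOutput("OUTPUT__0", binary_data=True)]

request_body, header_length = httpclient.InferenceServerClient.generate_request_body(
    inputs, outputs=outputs
)

runtime = boto3.client("sagemaker-runtime")
header_prefix = "application/vnd.sagemaker-triton.binary+json;json-header-size="
response = runtime.invoke_endpoint(
    EndpointName="<your-triton-endpoint>",                    # placeholder
    ContentType=header_prefix + str(header_length),
    Body=request_body,
)

resp_header_length = response["ContentType"][len(header_prefix):]
result = httpclient.InferenceServerClient.parse_response_body(
    response["Body"].read(), header_length=int(resp_header_length)
)
print(result.as_numpy("OUTPUT__0").shape)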
Other parameters that SageMaker exposes for ONNX on Triton are as follows:

Dynamic batching – Dynamic batching is a feature of Triton that allows inference requests to be combined by the server so that a batch is created dynamically. Creating a batch of requests typically results in increased throughput. The dynamic batcher should be used for stateless models. The dynamically created batches are distributed to all model instances configured for the model. A configuration sketch follows this list.
Maximum batch size – The max_batch_size property indicates the maximum batch size that the model supports for the types of batching that Triton can exploit. If the model’s batch dimension is the first dimension, and all inputs and outputs to the model have this batch dimension, then Triton can use its dynamic batcher or sequence batcher to automatically batch requests. In this case, set max_batch_size to a value greater than or equal to 1 that indicates the maximum batch size Triton should use with the model.
Default max batch size – The default-max-batch-size value is used for max_batch_size during autocomplete when no other value is found. The onnxruntime backend will set the max_batch_size of the model to this default value if autocomplete has determined the model is capable of batching requests and max_batch_size is 0 in the model configuration or max_batch_size is omitted from the model configuration. If max_batch_size is more than 1 and no scheduler is provided, the dynamic batch scheduler will be used. The default max batch size is 4.
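The following sketch shows how dynamic batching might be enabled in the model configuration; the batch sizes and queue delay are illustrative values, not recommendations.

max_batch_size: 8
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 100
}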

Clean up
Ensure that you delete the model, model configuration, and model endpoint after running the notebook. The steps to do this are provided at the end of the sample notebook in the GitHub repo.
Conclusion
In this post, we dove deep into the ONNX backend that Triton Inference Server supports on SageMaker. This backend provides GPU acceleration for your ONNX models. There are many options to consider to get the best inference performance, such as batch sizes, data input formats, and other factors that can be tuned to meet your needs. SageMaker lets you use this capability with single-model and multi-model endpoints (MMEs). MMEs allow a better balance of performance and cost savings. To get started with MME support for GPU, see Host multiple models in one container behind one endpoint.
We invite you to try Triton Inference Server containers in SageMaker, and share your feedback and questions in the comments.

About the authors
Abhi Shivaditya is a Senior Solutions Architect at AWS, working with strategic global enterprise organizations to facilitate the adoption of AWS services in areas such as Artificial Intelligence, distributed computing, networking, and storage. His expertise lies in Deep Learning in the domains of Natural Language Processing (NLP) and Computer Vision. Abhi assists customers in deploying high-performance machine learning models efficiently within the AWS ecosystem.
James Park is a Solutions Architect at Amazon Web Services. He works with Amazon.com to design, build, and deploy technology solutions on AWS, and has a particular interest in AI and machine learning. In his spare time he enjoys seeking out new cultures, new experiences, and staying up to date with the latest technology trends. You can find him on LinkedIn.
Rupinder Grewal is a Sr AI/ML Specialist Solutions Architect with AWS. He currently focuses on model serving and MLOps on SageMaker. Prior to this role, he worked as a Machine Learning Engineer building and hosting models. Outside of work he enjoys playing tennis and biking on mountain trails.
Dhawal Patel is a Principal Machine Learning Architect at AWS. He has worked with organizations ranging from large enterprises to mid-sized startups on problems related to distributed computing and artificial intelligence. He focuses on deep learning, including the NLP and computer vision domains. He helps customers achieve high-performance model inference on SageMaker.

Fast-track graph ML with GraphStorm: A new way to solve problems on en …

We are excited to announce the open-source release of GraphStorm 0.1, a low-code enterprise graph machine learning (ML) framework to build, train, and deploy graph ML solutions on complex enterprise-scale graphs in days instead of months. With GraphStorm, you can build solutions that directly take into account the structure of relationships or interactions between billions of entities, which are inherently embedded in most real-world data, including fraud detection scenarios, recommendations, community detection, and search/retrieval problems.
Until now, it has been notoriously hard to build, train, and deploy graph ML solutions for complex enterprise graphs that easily have billions of nodes, hundreds of billions of edges, and dozens of attributes—just think about a graph capturing Amazon.com products, product attributes, customers, and more. With GraphStorm, we release the tools that Amazon uses internally to bring large-scale graph ML solutions to production. GraphStorm doesn’t require you to be an expert in graph ML and is available under the Apache v2.0 license on GitHub. To learn more about GraphStorm, visit the GitHub repository.
In this post, we provide an introduction to GraphStorm, its architecture, and an example use case of how to use it.
Introducing GraphStorm
Graph algorithms and graph ML are emerging as state-of-the-art solutions for many important business problems like predicting transaction risks, anticipating customer preferences, detecting intrusions, optimizing supply chains, social network analysis, and traffic prediction. For example, Amazon GuardDuty, the native AWS threat detection service, uses a graph with billions of edges to improve the coverage and accuracy of its threat intelligence. This allows GuardDuty to categorize previously unseen domains as highly likely to be malicious or benign based on their association to known malicious domains. By using Graph Neural Networks (GNNs), GuardDuty is able to enhance its capability to alert customers.
However, developing, launching, and operating graph ML solutions takes months and requires graph ML expertise. As a first step, a graph ML scientist has to build a graph ML model for a given use case using a framework like the Deep Graph Library (DGL). Training such models is challenging due to the size and complexity of graphs in enterprise applications, which routinely reach billions of nodes, hundreds of billions of edges, different node and edge types, and hundreds of node and edge attributes. Enterprise graphs can require terabytes of memory storage, requiring graph ML scientists to build complex training pipelines. Finally, after a model has been trained, it has to be deployed for inference, which requires inference pipelines that are just as difficult to build as the training pipelines.
GraphStorm 0.1 is a low-code enterprise graph ML framework that allows ML practitioners to easily pick predefined graph ML models that have been proven to be effective, run distributed training on graphs with billions of nodes, and deploy the models into production. GraphStorm offers a collection of built-in graph ML models, such as Relational Graph Convolutional Networks (RGCN), Relational Graph Attention Networks (RGAT), and Heterogeneous Graph Transformer (HGT) for enterprise applications with heterogeneous graphs, which allow ML engineers with little graph ML expertise to try out different model solutions for their task and select the right one quickly. End-to-end distributed training and inference pipelines, which scale to billion-scale enterprise graphs, make it easy to train, deploy, and run inference. If you are new to GraphStorm or graph ML in general, you will benefit from the pre-defined models and pipelines. If you are an expert, you have all options to tune the training pipeline and model architecture to get the best performance. GraphStorm is built on top of the DGL, a widely popular framework for developing GNN models, and available as open-source code under the Apache v2.0 license.
“GraphStorm is designed to help customers experiment and operationalize graph ML methods for industry applications to accelerate the adoption of graph ML,” says George Karypis, Senior Principal Scientist in Amazon AI/ML research. “Since its release inside Amazon, GraphStorm has reduced the effort to build graph ML-based solutions by up to five times.”
“GraphStorm enables our team to train GNN embedding in a self-supervised manner on a graph with 288 million nodes and 2 billion edges,” says Haining Yu, Principal Applied Scientist at Amazon Measurement, Ad Tech, and Data Science. “The pre-trained GNN embeddings show a 24% improvement on a shopper activity prediction task over a state-of-the-art BERT-based baseline; it also exceeds benchmark performance in other ads applications.”
“Before GraphStorm, customers could only scale vertically to handle graphs of 500 million edges,” says Brad Bebee, GM for Amazon Neptune and Amazon Timestream. “GraphStorm enables customers to scale GNN model training on massive Amazon Neptune graphs with tens of billions of edges.”
GraphStorm technical architecture
The following figure shows the technical architecture of GraphStorm.

GraphStorm is built on top of PyTorch and can run on a single GPU, multiple GPUs, and multiple GPU machines. It consists of three layers (marked in the yellow boxes in the preceding figure):

Bottom layer (Dist GraphEngine) – The bottom layer provides the basic components to enable distributed graph ML, including distributed graphs, distributed tensors, distributed embeddings, and distributed samplers. GraphStorm provides efficient implementations of these components to scale graph ML training to billion-node graphs.
Middle layer (GS training/inference pipeline) – The middle layer provides trainers, evaluators, and predictors to simplify model training and inference for both built-in models and your custom models. Basically, by using the API of this layer, you can focus on the model development without worrying about how to scale the model training.
Top layer (GS general model zoo) – The top layer is a model zoo with popular GNN and non-GNN models for different graph types. As of this writing, it provides RGCN, RGAT, and HGT for heterogeneous graphs and BERTGNN for textual graphs. In the future, we will add support for temporal graph models such as TGAT for temporal graphs as well as TransE and DistMult for knowledge graphs.

How to use GraphStorm
After installing GraphStorm, you only need three steps to build and train graph ML models for your application.
First, you preprocess your data (potentially including your custom feature engineering) and transform it into a table format required by GraphStorm. For each node type, you define a table that lists all nodes of that type and their features, providing a unique ID for each node. For each edge type, you similarly define a table in which each row contains the source and destination node IDs for an edge of that type (for more information, see Use Your Own Data Tutorial). In addition, you provide a JSON file that describes the overall graph structure.
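For illustration only, a fragment of such a graph-description JSON might look like the following. The field names here are simplified placeholders, so refer to the Use Your Own Data Tutorial for the exact schema that GraphStorm’s graph construction expects.

{
  "nodes": [
    {
      "node_type": "paper",
      "format": {"name": "csv"},
      "files": ["nodes/paper.csv"],
      "node_id_col": "paper_id"
    }
  ],
  "edges": [
    {
      "relation": ["paper", "has_topic", "field_of_study"],
      "format": {"name": "csv"},
      "files": ["edges/paper_field.csv"],
      "source_id_col": "paper_id",
      "dest_id_col": "field_id"
    }
  ]
}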
Second, via the command line interface (CLI), you use GraphStorm’s built-in construct_graph component for some GraphStorm-specific data processing, which enables efficient distributed training and inference.
Third, you configure the model and training in a YAML file (example) and, again using the CLI, invoke one of the five built-in components (gs_node_classification, gs_node_regression, gs_edge_classification, gs_edge_regression, gs_link_prediction) as training pipelines to train the model. This step results in the trained model artifacts. To do inference, you need to repeat the first two steps to transform the inference data into a graph using the same GraphStorm component (construct_graph) as before.
Finally, you can invoke one of the five built-in components, the same that was used for model training, as an inference pipeline to generate embeddings or prediction results.
The overall flow is also depicted in the following figure.

In the following section, we provide an example use case.
Make predictions on raw OAG data
For this post, we demonstrate how easily GraphStorm can enable graph ML training and inference on a large raw dataset. The Open Academic Graph (OAG) contains five entities (papers, authors, venues, affiliations, and fields of study). The raw dataset is stored in JSON files totaling over 500 GB.
Our task is to build a model to predict the field of study of a paper. To predict the field of study, you can formulate it as a multi-label classification task, but it’s difficult to use one-hot encoding to store the labels because there are hundreds of thousands of fields. Therefore, you should create field of study nodes and formulate this problem as a link prediction task, predicting which field of study nodes a paper node should connect to.
To model this dataset with a graph method, the first step is to process the dataset and extract entities and edges. You can extract five types of edges from the JSON files to define a graph, shown in the following figure. You can use the Jupyter notebook in the GraphStorm example code to process the dataset and generate five entity tables, one for each entity type, and five edge tables, one for each edge type. The Jupyter notebook also generates BERT embeddings for the entities with text data, such as papers.

After defining the entities and edges between the entities, you can create mag_bert.json, which defines the graph schema, and invoke the built-in graph construction pipeline construct_graph in GraphStorm to build the graph (see the following code). Even though the GraphStorm graph construction pipeline runs on a single machine, it supports multi-processing to process node and edge features in parallel (--num-processes) and can store entity and edge features in external memory (--ext-mem-workspace) to scale to large datasets.

python3 -m graphstorm.gconstruct.construct_graph \
        --num-processes 16 \
        --output-dir /data/oagv2.1/mag_bert_constructed \
        --graph-name mag --num-partitions 4 \
        --skip-nonexist-edges \
        --ext-mem-workspace /mnt/raid0/tmp_oag \
        --ext-mem-feat-size 16 --conf-file mag_bert.json

To process such a large graph, you need a large-memory CPU instance to construct the graph. You can use an Amazon Elastic Compute Cloud (Amazon EC2) r6id.32xlarge instance (128 vCPU and 1 TB RAM) or r6a.48xlarge instances (192 vCPU and 1.5 TB RAM) to construct the OAG graph.
After constructing a graph, you can use gs_link_prediction to train a link prediction model on four g5.48xlarge instances. When using the built-in models, you only invoke one command line to launch the distributed training job. See the following code:

python3 -m graphstorm.run.gs_link_prediction \
        --num-trainers 8 \
        --part-config /data/oagv2.1/mag_bert_constructed/mag.json \
        --ip-config ip_list.txt \
        --cf ml_lp.yaml \
        --num-epochs 1 \
        --save-model-path /data/mag_lp_model

After the model training, the model artifact is saved in the folder /data/mag_lp_model.
Now you can run link prediction inference to generate GNN embeddings and evaluate the model performance. GraphStorm provides multiple built-in evaluation metrics to evaluate model performance. For link prediction problems, for example, GraphStorm automatically outputs the metric mean reciprocal rank (MRR). MRR is a valuable metric for evaluating graph link prediction models because it assesses how high the actual links are ranked among the predicted links. This captures the quality of predictions, making sure our model correctly prioritizes true connections, which is our objective here.
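As a toy illustration of the metric (not GraphStorm code), MRR is simply the reciprocal of the rank of each true link, averaged over queries:

# Hypothetical ranks of the true field-of-study node for four papers
ranks = [1, 2, 5, 10]
mrr = sum(1.0 / r for r in ranks) / len(ranks)
print(round(mrr, 2))  # 0.45 for this toy example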
You can run inference with one command line, as shown in the following code. In this case, the model reaches an MRR of 0.31 on the test set of the constructed graph.

python3 -m graphstorm.run.gs_link_prediction \
        --inference --num-trainers 8 \
        --part-config /data/oagv2.1/mag_bert_constructed/mag.json \
        --ip-config ip_list.txt \
        --cf ml_lp.yaml \
        --num-epochs 3 \
        --save-embed-path /data/mag_lp_model/emb \
        --restore-model-path /data/mag_lp_model/epoch-0/

Note that the inference pipeline generates embeddings from the link prediction model. To solve the problem of finding the field of study for any given paper, simply perform a k-nearest neighbor search on the embeddings.
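A hypothetical post-processing sketch of that search is shown below (this is not part of GraphStorm). The embedding file names and format are assumptions; adapt them to whatever gs_link_prediction writes under --save-embed-path.

import numpy as np
from sklearn.neighbors import NearestNeighbors

# Assumed layout: one embedding row per node, saved per node type
paper_emb = np.load("/data/mag_lp_model/emb/paper.npy")
fos_emb = np.load("/data/mag_lp_model/emb/field_of_study.npy")

# Top-5 candidate field-of-study nodes for the first 10 papers
knn = NearestNeighbors(n_neighbors=5, metric="cosine").fit(fos_emb)
_, neighbor_idx = knn.kneighbors(paper_emb[:10])
print(neighbor_idx)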
Conclusion
GraphStorm is a new graph ML framework that makes it easy to build, train, and deploy graph ML models on industry graphs. It addresses some key challenges in graph ML, including scalability and usability. It provides built-in components to process billion-scale graphs from raw input data to model training and model inference and has enabled multiple Amazon teams to train state-of-the-art graph ML models in various applications. Check out our GitHub repository for more information.

About the Authors
Da Zheng is a senior applied scientist at AWS AI/ML research leading a graph machine learning team to develop techniques and frameworks to put graph machine learning in production. Da got his PhD in computer science from the Johns Hopkins University.
Florian Saupe is a Principal Technical Product Manager at AWS AI/ML research supporting advanced science teams like the graph machine learning group and improving products like Amazon DataZone with ML capabilities. Before joining AWS, Florian led technical product management for automated driving at Bosch, was a strategy consultant at McKinsey & Company, and worked as a control systems/robotics scientist, a field in which he holds a PhD.

Google Researchers Introduce StyleDrop: An AI Method that Enables the …

A group of researchers from Google have recently unveiled StyleDrop, an innovative neural method built on top of Muse, a fast text-to-image model. This technology allows users to generate images that faithfully embody a specific visual style, capturing its nuances and intricacies. By selecting an original image with the desired style, users can seamlessly transfer it to new images while preserving the unique characteristics of the chosen style. The versatility of StyleDrop extends to working with entirely different images, enabling users to transform a children’s drawing into a stylized logo or character.

Powered by Muse’s generative vision transformer, StyleDrop is trained using a combination of user feedback, generated images, and CLIP scores. The neural network is fine-tuned with minimal trainable parameters, comprising less than 1% of the total model parameters. Through iterative training, StyleDrop continually enhances the quality of generated images, delivering impressive results in a matter of minutes.

This innovative tool proves invaluable for brands seeking to develop their unique visual style. With StyleDrop, creative teams and designers can efficiently prototype ideas in their preferred manner, making it an indispensable asset. Extensive studies have been conducted on StyleDrop’s performance, comparing it to other methods such as DreamBooth, Textual Inversion on Imagen, and Stable Diffusion. The results consistently showcase StyleDrop’s superiority, delivering high-quality images closely adhering to the user-specified style.

The image generation process of StyleDrop relies on the text-based prompts provided by users. StyleDrop accurately captures the desired style’s essence by appending a natural language style descriptor during training and generation. StyleDrop allows users to train the neural network with their brand assets, facilitating the seamless integration of their unique visual identity.

One of StyleDrop’s most remarkable features is its remarkably efficient generation process, typically taking only three minutes. This quick turnaround time empowers users to explore numerous creative possibilities and experiment with different styles swiftly. However, it is essential to note that while StyleDrop demonstrates immense potential for brand development, the application has not yet been released to the public. 

Additionally, the experiments conducted to assess StyleDrop’s performance provide further evidence of its capabilities and superiority over existing methods. These experiments encompass a variety of styles and demonstrate StyleDrop’s ability to capture the nuances of texture, shading, and structure across a wide range of visual styles. The quantitative results, based on CLIP scores measuring style consistency and textual alignment, reinforce the effectiveness of StyleDrop in faithfully transferring styles.

However, it is crucial to acknowledge the limitations of StyleDrop. While the presented results are impressive, visual styles are diverse and warrant further exploration. Future studies could focus on a more comprehensive examination of various visual styles, including formal attributes, media, history, and art style. Additionally, the societal impact of StyleDrop should be carefully considered, particularly regarding the responsible use of the technology and the potential for unauthorized copying of individual artists’ styles.

StyleDrop represents a significant advancement in the field of neural networks, enabling the faithful transfer of visual styles to new images. With its user-friendly interface and ability to generate high-quality results, StyleDrop is poised to revolutionize brand development and empower creative individuals to express their unique visual identities easily.


ETH Zurich and HKUST Researchers Propose HQ-SAM: A High-Quality Zero-S …

Accurate segmentation of multiple objects is essential for various scene understanding applications, such as image/video processing, robotic perception, and AR/VR. The Segment Anything Model (SAM) was recently released, a basic vision model for broad image segmentation. It was trained using billion-scale mask labels. SAM can segment various objects, components, and visual structures in multiple contexts by using a sequence of points, a bounding box, or a coarse mask as input. Its zero-shot segmentation capabilities have sparked a quick paradigm change since they can be used in many applications with just a few basic prompts. 

Despite its outstanding performance, SAM’s segmentation outcomes still need improvement. Two significant issues plague SAM: 1) rough mask borders, frequently failing to segment thin object structures, as demonstrated in Figure 1; and 2) incorrect predictions, broken masks, or significant inaccuracies in difficult cases. This is frequently connected to SAM’s tendency to misread thin structures, like the kite lines in the figure’s top right-hand column. These errors significantly constrain the application and efficacy of fundamental segmentation models such as SAM, especially for automated annotation and image/video editing jobs where extremely precise image masks are essential.

Figure 1: Compares the predicted masks of SAM and our HQ-SAM using input prompts of a single red box or a number of points on the object. With extremely precise boundaries, HQ-SAM generates noticeably more detailed results. In the rightmost column, SAM misinterprets the thin structure of the kite lines and generates a significant number of mistakes with broken holes for the input box prompt.

Researchers from ETH Zurich and HKUST suggest HQ-SAM, which maintains the original SAM’s robust zero-shot capabilities and flexibility while being able to anticipate very accurate segmentation masks, even in extremely difficult circumstances (see Figure 1). They suggest a minor adaption of SAM, adding less than 0.5% parameters, to increase its capacity for high-quality segmentation while maintaining efficiency and zero-shot performance. The general arrangement of zero-shot segmentation is substantially hampered by directly adjusting the SAM decoder or adding a new decoder module. Therefore, they suggest the HQ-SAM design completely retains the zero-shot efficiency, integrating with and reusing the current learned SAM structure. 

In addition to the original prompt and output tokens, they create a learnable HQ-Output Token fed into SAM’s mask decoder. Their HQ-Output Token and its related MLP layers are taught to forecast a high-quality segmentation mask, in contrast to the original output tokens. Second, their HQ-Output Token operates on an improved feature set to produce precise mask information instead of only employing the SAM’s mask decoder capabilities. They combine SAM’s mask decoder features with the early and late feature maps from its ViT encoder to use global semantic context and fine-grained local features. 

The complete pre-trained SAM parameters are frozen during training, and just the HQ-Output Token, the related three-layer MLPs, and a tiny feature fusion block are updated. A dataset with precise mask annotations of various objects with intricate and complicated geometries is necessary for learning accurate segmentation. SAM is trained on the SA-1B dataset, which has 11M photos and 1.1 billion masks created automatically by a SAM-like model. However, as SAM’s performance in Figure 1 shows, relying on this large, automatically annotated dataset is costly and still fails to produce the high-quality masks targeted in their study.

As a result, they create HQSeg-44K, a new dataset that comprises 44K highly fine-grained picture mask annotations. Six existing picture datasets are combined with very precise mask annotations to make the HQSeg-44K, which spans over 1,000 different semantic classes. HQ-SAM can be trained on 8 RTX 3090 GPUs in under 4 hours thanks to the smaller dataset and their simple integrated design. They conduct a rigorous quantitative and qualitative experimental study to verify the efficacy of HQ-SAM. 

On a collection of nine distinct segmentation datasets from various downstream tasks, they compare HQ-SAM with SAM, seven of which are under a zero-shot transfer protocol, including COCO, UVO, LVIS, HQ-YTVIS, BIG, COIFT, and HR-SOD. This thorough analysis shows that the proposed HQ-SAM can manufacture masks of a greater caliber while still having a zero-shot capability compared to SAM. A virtual demo is present on their GitHub page.

HQ-SAM is thus the first high-quality zero-shot segmentation model, achieved by introducing negligible overhead to the original SAM.


Meet Pix2Act: An AI Agent That Can Interact With GUIs Using The Same C …

Systems that can follow directions from graphical user interfaces (GUIs) can automate laborious jobs, increase accessibility, and improve the utility of digital assistants by enabling users to connect with tools and services.

Many GUI-based digital agent implementations rely on HTML-derived textual representations, which aren’t always readily available. People utilize GUIs by perceiving the visual input and acting on it with standard mouse and keyboard shortcuts; they don’t need to look at the application’s source code to figure out how the program works. Regardless of the underlying technology, they can rapidly pick up new programs with intuitive graphical user interfaces. 

The Atari game system is just one example of how well a system that learns from pixel-only inputs can do. However, learning from pixel-only inputs in conjunction with generic low-level actions presents many obstacles for GUI-based instruction-following tasks. To visually interpret a GUI, an agent must be familiar with the interface’s structure, recognize and interpret visually situated natural language, identify visual elements, and predict the functions and interaction methods of those elements.

Google DeepMind and Google introduce PIX2ACT, a model that takes pixel-based screenshots as input and chooses actions matching fundamental mouse and keyboard controls. For the first time, the research group demonstrates that an agent with only pixel inputs and a generic action space can outperform human crowdworkers, achieving performance on par with state-of-the-art agents that use DOM information and a comparable number of human demonstrations.

For this, the researchers expand upon PIX2STRUCT. This Transformer-based image-to-text model has already been trained on large-scale online data to convert screenshots into structured representations based on HTML. PIX2ACT applies tree search to repeatedly construct new expert trajectories for training, employing a combination of human demonstrations and interactions with the environment. 

The team’s effort here entails the creation of a framework for universal browser-based environments and the adaptation of two benchmark datasets, MiniWob++ and WebShop, for use in their environment with a standard, cross-domain observation and action format. On MiniWob++, PIX2ACT outperforms human crowdworkers and improves on the comparable pixel-based baseline (CC-Net without DOM) by approximately four times. Ablations demonstrate that PIX2STRUCT’s pixel-based pre-training is essential to PIX2ACT’s performance.

For GUI-based instruction following with pixel-based inputs, the findings demonstrate the efficacy of PIX2STRUCT’s pre-training via screenshot parsing. In the behavioral cloning setting, pre-training raises MiniWob++ and WebShop task scores by 17.1 and 46.7 points, respectively. Although a performance gap remains compared to larger language models that use HTML-based inputs and task-specific actions, this work sets the first baseline in this setting.


Get started with the open-source Amazon SageMaker Distribution

Data scientists need a consistent and reproducible environment for machine learning (ML) and data science workloads that enables managing dependencies and is secure. AWS Deep Learning Containers already provides pre-built Docker images for training and serving models in common frameworks such as TensorFlow, PyTorch, and MXNet. To improve this experience, we announced a public beta of the SageMaker open-source distribution at 2023 JupyterCon. This provides a unified end-to-end ML experience across ML developers of varying levels of expertise. Developers no longer need to switch between different framework containers for experimentation, or as they move from local JupyterLab environments and SageMaker notebooks to production jobs on SageMaker. The open-source SageMaker Distribution supports the most common packages and libraries for data science, ML, and visualization, such as TensorFlow, PyTorch, Scikit-learn, Pandas, and Matplotlib. You can start using the container from the Amazon ECR Public Gallery starting today.
In this post, we show you how you can use the SageMaker open-source distribution to quickly experiment on your local environment and easily promote them to jobs on SageMaker.
Solution overview
For our example, we showcase training an image classification model using PyTorch. We use the KMNIST dataset available publicly on PyTorch. We train a neural network model, test the model’s performance, and finally print the training and test loss. The full notebook for this example is available in the SageMaker Studio Lab examples repository. We start experimentation on a local laptop using the open-source distribution, move it to Amazon SageMaker Studio for using a larger instance, and then schedule the notebook as a notebook job.
Prerequisites
You need the following prerequisites:

Docker installed.
An active AWS account with administrator permissions.
An environment with the AWS Command Line Interface (AWS CLI) and Docker installed.
An existing SageMaker domain. To create a domain, refer to Onboard to Amazon SageMaker Domain.

Set up your local environment
You can directly start using the open-source distribution on your local laptop. To start JupyterLab, run the following commands on your terminal:

export ECR_IMAGE_ID='public.ecr.aws/sagemaker/sagemaker-distribution:latest-cpu'
docker run -it \
    -p 8888:8888 \
    --user `id -u`:`id -g` \
    -v `pwd`/sample-notebooks:/home/sagemaker-user/sample-notebooks \
    $ECR_IMAGE_ID jupyter-lab --no-browser --ip=0.0.0.0

You can replace ECR_IMAGE_ID with any of the image tags available in the Amazon ECR Public Gallery, or choose the latest-gpu tag if you are using a machine that supports GPU.
This command will start JupyterLab and provide a URL on the terminal, like http://127.0.0.1:8888/lab?token=<token>. Copy the link and enter it in your preferred browser to start JupyterLab.
Set up Studio
Studio is an end-to-end integrated development environment (IDE) for ML that lets developers and data scientists build, train, deploy, and monitor ML models at scale. Studio provides an extensive list of first-party images with common frameworks and packages, such as Data Science, TensorFlow, PyTorch, and Spark. These images make it simple for data scientists to get started with ML by simply choosing a framework and instance type of their choice for compute.
You can now use the SageMaker open-source distribution on Studio using Studio’s bring your own image feature. To add the open-source distribution to your SageMaker domain, complete the following steps:

Add the open-source distribution to your account’s Amazon Elastic Container Registry (Amazon ECR) repository by running the following commands on your terminal:

# Use the latest-cpu or latest-gpu tag based on your requirements
export ECR_GALLERY_IMAGE_ID='sagemaker-distribution:latest-cpu'
export SAGEMAKER_IMAGE_NAME='sagemaker-runtime'
export SAGEMAKER_STUDIO_DOMAIN_ID='d-xxxx'
export SAGEMAKER_STUDIO_IAM_ROLE_ARN='<studio-default-execution-role-arn>'

docker pull public.ecr.aws/sagemaker/$ECR_GALLERY_IMAGE_ID

export ECR_PRIVATE_REPOSITORY_NAME='sm-distribution'
export ECR_IMAGE_TAG='sagemaker-runtime-cpu'
export AWS_ACCOUNT_ID='0123456789'
export AWS_ECR_REPOSITORY_REGION='us-east-1'

# create repository
aws --region ${AWS_ECR_REPOSITORY_REGION} ecr create-repository --repository-name $ECR_PRIVATE_REPOSITORY_NAME
aws --region ${AWS_ECR_REPOSITORY_REGION} ecr get-login-password | docker login --username AWS --password-stdin ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_ECR_REPOSITORY_REGION}.amazonaws.com
export ECR_IMAGE_URI=$AWS_ACCOUNT_ID.dkr.ecr.$AWS_ECR_REPOSITORY_REGION.amazonaws.com/$ECR_PRIVATE_REPOSITORY_NAME:$ECR_IMAGE_TAG

# Tag
docker tag public.ecr.aws/sagemaker/$ECR_GALLERY_IMAGE_ID $ECR_IMAGE_URI
# Push the image to your private repository
docker push $ECR_IMAGE_URI

Create a SageMaker image and attach the image to the Studio domain:

# Create a SageMaker image
aws sagemaker create-image \
    --image-name $SAGEMAKER_IMAGE_NAME \
    --role-arn $SAGEMAKER_STUDIO_IAM_ROLE_ARN

# Create a SageMaker image version
aws sagemaker create-image-version \
    --image-name $SAGEMAKER_IMAGE_NAME \
    --base-image $ECR_IMAGE_URI

# Optionally, describe the image version to ensure it's successfully created
aws sagemaker describe-image-version \
    --image-name $SAGEMAKER_IMAGE_NAME \
    --version-number 1

# Create the app image configuration file
cat > /tmp/app-config.json << EOF
{
    "AppImageConfigName": "app-image-config-$SAGEMAKER_IMAGE_NAME",
    "KernelGatewayImageConfig": {
        "FileSystemConfig": {
            "DefaultGid": 100,
            "DefaultUid": 1000,
            "MountPath": "/home/sagemaker-user"
        },
        "KernelSpecs": [
            {
                "DisplayName": "Python 3 (ipykernel)",
                "Name": "python3"
            }
        ]
    }
}
EOF

# Create an Amazon SageMaker App Image Config
aws sagemaker create-app-image-config \
    --cli-input-json file:///tmp/app-config.json

# Create a default user settings file
# Update the file with your existing settings if you have additional custom images
cat > /tmp/default-user-settings.json << EOF
{
    "DefaultUserSettings": {
        "KernelGatewayAppSettings": {
            "CustomImages": [
                {
                    "ImageName": "$SAGEMAKER_IMAGE_NAME",
                    "AppImageConfigName": "app-image-config-$SAGEMAKER_IMAGE_NAME",
                    "ImageVersionNumber": 1
                }
            ]
        }
    }
}
EOF

# Update the Amazon SageMaker domain with the new default user settings
aws sagemaker update-domain \
    --domain-id $SAGEMAKER_STUDIO_DOMAIN_ID \
    --cli-input-json file:///tmp/default-user-settings.json

On the SageMaker console, launch Studio by choosing your domain and existing user profile.
Optionally, restart Studio by following the steps in Shut down and update SageMaker Studio.

Download the notebook
Download the sample notebook locally from the GitHub repo.
Open the notebook in your choice of IDE and add a cell to the beginning of the notebook to install torchsummary. The torchsummary package is not part of the distribution, and installing it in the notebook ensures the notebook runs end to end. We recommend using conda or micromamba to manage environments and dependencies. Add the following cell to the notebook and save the notebook:

%pip install torchsummary
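If you want a quick sanity check that the package is usable in the environment, a snippet like the following prints a layer summary. The model here is an illustrative stand-in, not the one from the sample notebook.

import torch.nn as nn
from torchsummary import summary

# Stand-in model: KMNIST images are 1x28x28 grayscale
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3), nn.ReLU(),
    nn.Flatten(), nn.Linear(8 * 26 * 26, 10),
)
summary(model, input_size=(1, 28, 28), device="cpu")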

Experiment on the local notebook
Upload the notebook to the JupyterLab UI you launched by choosing the upload icon as shown in the following screenshot.

When it’s uploaded, launch the cv-kmnist.ipynb notebook. You can start running the cells immediately, without having to install any dependencies such as torch, matplotlib, or ipywidgets.
If you followed the preceding steps, you can see that you can use the distribution locally from your laptop. In the next step, we use the same distribution on Studio to take advantage of Studio’s features.
Move the experimentation to Studio (optional)
Optionally, let’s promote the experimentation to Studio. One of the advantages of Studio is that the underlying compute resources are fully elastic, so you can easily dial the available resources up or down, and the changes take place automatically in the background without interrupting your work. If you want to run the same notebook from earlier on a larger dataset and a larger compute instance, you can migrate to Studio.
Navigate to the Studio UI you launched earlier and choose the upload icon to upload the notebook.

After you launch the notebook, you will be prompted to choose the image and instance type. On the kernel launcher, choose sagemaker-runtime as the image and an ml.t3.medium instance, then choose Select.

You can now run the notebook end to end without needing any changes on the notebook from your local development environment to Studio notebooks!
Schedule the notebook as a job
When you’re done with your experimentation, SageMaker provides multiple options to productionalize your notebook, such as training jobs and SageMaker pipelines. One such option is to directly run the notebook itself as a non-interactive, scheduled notebook job using SageMaker notebook jobs. For example, you might want to retrain your model periodically, or get inferences on incoming data periodically and generate reports for consumption by your stakeholders.
From Studio, choose the notebook job icon to launch the notebook job. If you have installed the notebook jobs extension locally on your laptop, you can also schedule the notebook directly from your laptop. See Installation Guide to set up the notebook jobs extension locally.

The notebook job automatically uses the ECR image URI of the open-source distribution, so you can directly schedule the notebook job.

Choose Run on schedule, choose a schedule, for example every week on Saturday, and choose Create. You can also choose Run now if you’d like to view the results immediately.

When the first notebook job is complete, you can view the notebook outputs directly from the Studio UI by choosing Notebook under Output files.

Additional considerations
In addition to using the publicly available ECR image directly for ML workloads, the open-source distribution offers the following advantages:

The Dockerfile used to build the image is available publicly for developers to explore and build their own images. You can also inherit this image as the base image and install your custom libraries to have a reproducible environment.
If you’re not used to Docker and prefer to use Conda environments on your JupyterLab environment, we provide an env.out file for each of the published versions. You can use the instructions in the file to create your own Conda environment that will mimic the same environment. For example, see the CPU environment file cpu.env.out.
You can use the GPU versions of the image to run GPU-compatible workloads such as deep learning and image processing.

Clean up
Complete the following steps to clean up your resources:

If you have scheduled your notebook to run on a schedule, pause or delete the schedule on the Notebook Job Definitions tab to avoid paying for future jobs.
Shut down all Studio apps to avoid paying for unused compute usage. See Shut down and Update Studio Apps for instructions.
Optionally, delete the Studio domain if you created one.

Conclusion
Maintaining a reproducible environment across different stages of the ML lifecycle is one of the biggest challenges for data scientists and developers. With the SageMaker open-source distribution, we provide an image with mutually compatible versions of the most common ML frameworks and packages. The distribution is also open source, providing developers with transparency into the packages and build processes, making it easier to customize their own distribution.
In this post, we showed you how to use the distribution on your local environment, on Studio, and as the container for your training jobs. This feature is currently in public beta. We encourage you to try this out and share your feedback and issues on the public GitHub repository!

About the authors
Durga Sury is an ML Solutions Architect on the Amazon SageMaker Service SA team. She is passionate about making machine learning accessible to everyone. In her 4 years at AWS, she has helped set up AI/ML platforms for enterprise customers. When she isn’t working, she loves motorcycle rides, mystery novels, and long walks with her 5-year-old husky.
Ketan Vijayvargiya is a Senior Software Development Engineer in Amazon Web Services (AWS). His focus areas are machine learning, distributed systems and open source. Outside work, he likes to spend his time self-hosting and enjoying nature.

Exploring Generative AI in conversational experiences: An Introduction …

Customers expect quick and efficient service from businesses in today’s fast-paced world. But providing excellent customer service can be significantly challenging when the volume of inquiries outpaces the human resources employed to address them. However, businesses can meet this challenge while providing personalized and efficient customer service with the advancements in generative artificial intelligence (generative AI) powered by large language models (LLMs).
Generative AI chatbots have gained popularity for their ability to imitate human intelligence. However, unlike task-oriented bots, these bots use LLMs for text analysis and content generation. LLMs are based on the Transformer architecture, a deep learning neural network introduced in June 2017 that can be trained on a massive corpus of unlabeled text. This approach creates a more human-like conversation experience and accommodates several topics.
As of this writing, companies of all sizes want to use this technology but need help figuring out where to start. If you are looking to get started with generative AI and the use of LLMs in conversational AI, this post is for you. We have included a sample project to quickly deploy an Amazon Lex bot that consumes a pre-trained open-source LLM. The code also includes the starting point to implement a custom memory manager. This mechanism allows an LLM to recall previous interactions to keep the conversation’s context and pace. Finally, it’s essential to highlight the importance of experimenting with fine-tuning prompts and LLM randomness and determinism parameters to obtain consistent results.
Solution overview
The solution integrates an Amazon Lex bot with a popular open-source LLM from Amazon SageMaker JumpStart, accessible through an Amazon SageMaker endpoint. We also use LangChain, a popular framework that simplifies LLM-powered applications. Finally, we use a QnABot to provide a user interface for our chatbot.

First, we start by describing each component in the preceding diagram:

JumpStart offers pre-trained open-source models for various problem types. This enables you to begin machine learning (ML) quickly. It includes the FLAN-T5-XL model, an LLM deployed into a deep learning container. It performs well on various natural language processing (NLP) tasks, including text generation.
A SageMaker real-time inference endpoint enables fast, scalable deployment of ML models for predicting events. With the ability to integrate with Lambda functions, the endpoint allows for building custom applications.
The AWS Lambda function uses the requests from the Amazon Lex bot or the QnABot to prepare the payload to invoke the SageMaker endpoint using LangChain. LangChain is a framework that lets developers create applications powered by LLMs.
The Amazon Lex V2 bot has the built-in AMAZON.FallbackIntent intent type. It is triggered when a user’s input doesn’t match any intents in the bot.
The QnABot is an open-source AWS solution to provide a user interface to Amazon Lex bots. We configured it with a Lambda hook function for a CustomNoMatches item, and it triggers the Lambda function when QnABot can’t find an answer. We assume you have already deployed it and included the steps to configure it in the following sections.

The solution is described at a high level in the following sequence diagram.
Major tasks performed by the solution
In this section, we look at the major tasks performed in our solution. This solution’s entire project source code is available for your reference in this GitHub repository.
Handling chatbot fallbacks
The Lambda function handles the “don’t know” answers via AMAZON.FallbackIntent in Amazon Lex V2 and the CustomNoMatches item in QnABot. When triggered, this function checks the request for a Lex session state and the fallback intent. If both are present, it hands the request off to the Lex V2 dispatcher; otherwise, the QnABot dispatcher handles the request. See the following code:

def dispatch_lexv2(request):
    """Summary
    Args:
        request (dict): Lambda event containing a user's input chat message and context (historical conversation)
        Uses the LexV2 sessions API to manage past inputs https://docs.aws.amazon.com/lexv2/latest/dg/using-sessions.html

    Returns:
        dict: Description
    """
    lexv2_dispatcher = LexV2SMLangchainDispatcher(request)
    return lexv2_dispatcher.dispatch_intent()


def dispatch_QnABot(request):
    """Summary

    Args:
        request (dict): Lambda event containing a user's input chat message and context (historical conversation)

    Returns:
        dict: Dict formatted as documented to be a lambda hook for a "don't know" answer for the QnABot on AWS Solution
        see https://docs.aws.amazon.com/solutions/latest/QnABot-on-aws/specifying-lambda-hook-functions.html
    """
    request['res']['message'] = "Hi! This is your Custom Python Hook speaking!"
    qna_intent_dispatcher = QnASMLangchainDispatcher(request)
    return qna_intent_dispatcher.dispatch_intent()


def lambda_handler(event, context):
    print(event)
    if 'sessionState' in event:
        if 'intent' in event['sessionState']:
            if 'name' in event['sessionState']['intent']:
                if event['sessionState']['intent']['name'] == 'FallbackIntent':
                    return dispatch_lexv2(event)
    else:
        return dispatch_QnABot(event)

Providing memory to our LLM
To preserve the LLM memory in a multi-turn conversation, the Lambda function includes a LangChain custom memory class that uses the Amazon Lex V2 Sessions API to keep track of the session attributes holding the ongoing multi-turn conversation messages and to provide context to the conversational model from previous interactions. See the following code:

class LexConversationalMemory(BaseMemory, BaseModel):

    """Langchain Custom Memory class that uses Lex Conversation history

    Attributes:
        history (dict): Dict storing conversation history that acts as the Langchain memory
        lex_conv_context (str): LexV2 sessions API that serves as input for convo history
            Memory is loaded from here
        memory_key (str): key for the chat history Langchain memory variable
    """
    history = {}
    memory_key = "chat_history"  # pass into prompt with key
    lex_conv_context = ""

    def clear(self):
        """Clear chat history
        """
        self.history = {}

    @property
    def memory_variables(self) -> List[str]:
        """Load memory variables

        Returns:
            List[str]: List of keys containing Langchain memory
        """
        return [self.memory_key]

    def load_memory_variables(self, inputs: Dict[str, Any]) -> Dict[str, str]:
        """Load memory from lex into current Langchain session memory

        Args:
            inputs (Dict[str, Any]): User input for current Langchain session

        Returns:
            Dict[str, str]: Langchain memory object
        """
        input_text = inputs[list(inputs.keys())[0]]

        ccontext = json.loads(self.lex_conv_context)
        memory = {
            self.memory_key: ccontext[self.memory_key] + input_text + "\nAI: ",
        }
        return memory

The following is the sample code we created for introducing the custom memory class in a LangChain ConversationChain:

# Create a conversation chain using the prompt,
# llm hosted in Sagemaker, and custom memory class
self.chain = ConversationChain(
    llm=sm_flant5_llm,
    prompt=prompt,
    memory=LexConversationalMemory(lex_conv_context=lex_conv_history),
    verbose=True
)
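For context, the following is a minimal sketch of how an LLM object like sm_flant5_llm could be built with LangChain’s SagemakerEndpoint wrapper. The endpoint name is a placeholder, and the request/response keys assume a JumpStart FLAN-T5 text2text endpoint; check your endpoint’s contract before reusing them.

import json
from typing import Dict

from langchain.llms.sagemaker_endpoint import SagemakerEndpoint, LLMContentHandler


class FlanT5ContentHandler(LLMContentHandler):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt: str, model_kwargs: Dict) -> bytes:
        # Assumed JumpStart text2text request format
        return json.dumps({"text_inputs": prompt, **model_kwargs}).encode("utf-8")

    def transform_output(self, output: bytes) -> str:
        # Assumed JumpStart text2text response format
        return json.loads(output.read().decode("utf-8"))["generated_texts"][0]


sm_flant5_llm = SagemakerEndpoint(
    endpoint_name="<your-flan-t5-xl-endpoint>",   # placeholder
    region_name="us-east-1",
    content_handler=FlanT5ContentHandler(),
    model_kwargs={"temperature": 0.0, "max_length": 150},
)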

Prompt definition
A prompt for an LLM is a question or statement that sets the tone for the generated response. Prompts function as a form of context that helps direct the model toward generating relevant responses. See the following code:

# define prompt
prompt_template = """The following is a friendly conversation between a human and an AI. The AI is
talkative and provides lots of specific details from its context. If the AI does not know
the answer to a question, it truthfully says it does not know. You are provided with information
about entities the Human mentions, if relevant.

Chat History:
{chat_history}

Conversation:
Human: {input}
AI:"""
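One way to turn this string into the prompt object passed to ConversationChain is with LangChain’s PromptTemplate; note that the input variable names must match the memory_key used by the custom memory class and the chain’s input key.

from langchain.prompts import PromptTemplate

prompt = PromptTemplate(
    input_variables=["chat_history", "input"],
    template=prompt_template,
)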

Using an Amazon Lex V2 session for LLM memory support
Amazon Lex V2 initiates a session when a user interacts with a bot. A session persists over time unless manually stopped or timed out. A session stores metadata and application-specific data known as session attributes. Amazon Lex updates client applications when the Lambda function adds or changes session attributes. The QnABot includes an interface to set and get session attributes on top of Amazon Lex V2.
In our code, we used this mechanism to build a custom memory class in LangChain to keep track of the conversation history and enable the LLM to recall short-term and long-term interactions. See the following code:

class LexV2SMLangchainDispatcher():

    def __init__(self, intent_request):
        # See lex bot input format to lambda https://docs.aws.amazon.com/lex/latest/dg/lambda-input-response-format.html
        self.intent_request = intent_request
        self.localeId = self.intent_request['bot']['localeId']
        self.input_transcript = self.intent_request['inputTranscript']  # user input
        self.session_attributes = utils.get_session_attributes(
            self.intent_request)
        self.fulfillment_state = "Fulfilled"
        self.text = ""  # response from endpoint
        self.message = {'contentType': 'PlainText', 'content': self.text}


class QnABotSMLangchainDispatcher():
    def __init__(self, intent_request):
        # QnABot Session attributes
        self.intent_request = intent_request
        self.input_transcript = self.intent_request['req']['question']
        self.intent_name = self.intent_request['req']['intentname']
        self.session_attributes = self.intent_request['req']['session']

Prerequisites
To get started with the deployment, you need to fulfill the following prerequisites:

Access to the AWS Management Console via a user who can launch AWS CloudFormation stacks
Familiarity navigating the Lambda and Amazon Lex consoles

Deploy the solution
To deploy the solution, proceed with the following steps:

Choose Launch Stack to launch the solution in the us-east-1 Region:
For Stack name, enter a unique stack name.
For HFModel, we use the Hugging Face Flan-T5-XL model available on JumpStart.
For HFTask, enter text2text.
Keep S3BucketName as is.

These are used to find Amazon Simple Storage Service (Amazon S3) assets needed to deploy the solution and may change as updates to this post are published.

Acknowledge the capabilities.
Choose Create stack.

There should be four successfully created stacks.

Configure the Amazon Lex V2 bot
There is nothing you need to configure for the Amazon Lex V2 bot; our CloudFormation template already did the heavy lifting.
Configure the QnABot
We assume you already have an existing QnABot deployed in your environment. But if you need help, follow these instructions to deploy it.

On the AWS CloudFormation console, navigate to the main stack that you deployed.
On the Outputs tab, make a note of the LambdaHookFunctionArn because you need to insert it in the QnABot later.

Log in to the QnABot Designer User Interface (UI) as an administrator.
In the Questions UI, add a new question.

Enter the following values:

ID – CustomNoMatches
Question – no_hits
Answer – Any default answer for “don’t know”

Choose Advanced and go to the Lambda Hook section.
Enter the Amazon Resource Name (ARN) of the Lambda function you noted previously.

Scroll down to the bottom of the section and choose Create.

You get a window with a success message.

Your question is now visible on the Questions page.

Test the solution
Let’s proceed with testing the solution. First, it’s worth mentioning that we deployed the FLAN-T5-XL model provided by JumpStart without any fine-tuning. This can introduce some unpredictability, resulting in slight variations in responses.
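Before testing through the bots, you can optionally sanity-check the endpoint directly. The snippet below is a sketch: the endpoint name is a placeholder and the payload key follows the JumpStart text2text convention, so verify both against your deployment.

import json
import boto3

smr = boto3.client("sagemaker-runtime", region_name="us-east-1")
response = smr.invoke_endpoint(
    EndpointName="<your-flan-t5-xl-endpoint>",  # placeholder
    ContentType="application/json",
    Body=json.dumps({"text_inputs": "What is the capital of France?"}),
)
print(json.loads(response["Body"].read()))  # inspect the raw model output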
Test with an Amazon Lex V2 bot
This section helps you test the Amazon Lex V2 bot integration with the Lambda function that calls the LLM hosted on the SageMaker endpoint.

On the Amazon Lex console, navigate to the bot entitled Sagemaker-Jumpstart-Flan-LLM-Fallback-Bot. This bot has been configured to call the Lambda function that invokes the SageMaker endpoint hosting the LLM as a fallback intent when no other intents are matched.
Choose Intents in the navigation pane.

On the top right, a message reads, “English (US) has not built changes.”

Choose Build.
Wait for it to complete.

Finally, you get a success message, as shown in the following screenshot.

Choose Test.

A chat window appears where you can interact with the model.
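You can also drive the same test programmatically through the Lex V2 runtime API; in the sketch below, the bot ID, alias ID, and session ID are placeholders from your own deployment:

import boto3

lex = boto3.client("lexv2-runtime", region_name="us-east-1")
response = lex.recognize_text(
    botId="<your-bot-id>",             # placeholder
    botAliasId="<your-bot-alias-id>",  # placeholder
    localeId="en_US",
    sessionId="test-session-1",
    text="Tell me something interesting about large language models.",
)
for message in response.get("messages", []):
    print(message["content"])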

We recommend exploring the built-in integrations between Amazon Lex bots and Amazon Connect, as well as with messaging platforms (Facebook, Slack, Twilio SMS) and third-party contact centers through the Amazon Chime SDK and Genesys Cloud, for example.
Test with a QnABot instance
This section tests the QnABot on AWS integration with the Lambda function that calls the LLM hosted on the SageMaker endpoint.

Open the tools menu in the top left corner.

Choose QnABot Client.

Choose Sign In as Admin.

Enter any question in the user interface.
Evaluate the response.

Clean up
To avoid incurring future charges, delete the resources created by our solution by following these steps:

On the AWS CloudFormation console, select the stack named SagemakerFlanLLMStack (or the custom name you set to the stack).
Choose Delete.
If you deployed the QnABot instance for your tests, select the QnABot stack.
Choose Delete.
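If you prefer to clean up programmatically, the equivalent deletions are sketched below (use the stack names from your own deployment):

import boto3

cfn = boto3.client("cloudformation", region_name="us-east-1")
cfn.delete_stack(StackName="SagemakerFlanLLMStack")  # or the custom name you set
# cfn.delete_stack(StackName="<your-qnabot-stack-name>")  # only if you deployed QnABot for these tests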

Conclusion
In this post, we explored the addition of open-domain capabilities to a task-oriented bot that routes the user requests to an open-source large language model.
We encourage you to:

Save the conversation history to an external persistence mechanism – For example, you can save the conversation history to Amazon DynamoDB or an S3 bucket and retrieve it in the Lambda function hook (a sketch follows this list). This way, you don’t need to rely on the internal, non-persistent session attributes management offered by Amazon Lex.
Experiment with summarization – In multiturn conversations, it’s helpful to generate a summary that you can use in your prompts to add context and limit the usage of conversation history. This helps to prune the bot session size and keep the Lambda function memory consumption low.
Experiment with prompt variations – Modify the original prompt description to match your experimentation purposes.
Adapt the language model for optimal results – You can do this by tuning advanced LLM parameters such as randomness (temperature) and determinism (top_p) according to your application. We demonstrated a sample integration using a pre-trained model with sample values, but have fun adjusting the values for your use cases.
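As a rough sketch of the first suggestion (the table name and key schema are hypothetical), the Lambda hook could persist and retrieve the conversation history in Amazon DynamoDB like this:

import boto3

dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
table = dynamodb.Table("chatbot-conversation-history")  # hypothetical table with a session_id partition key

def load_history(session_id: str) -> str:
    # Retrieve any previously stored history for this session
    item = table.get_item(Key={"session_id": session_id}).get("Item", {})
    return item.get("chat_history", "")

def save_history(session_id: str, chat_history: str) -> None:
    # Overwrite the stored transcript with the latest version
    table.put_item(Item={"session_id": session_id, "chat_history": chat_history})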

In our next post, we plan to help you discover how to fine-tune pre-trained LLM-powered chatbots with your own data.
Are you experimenting with LLM chatbots on AWS? Tell us more in the comments!
Resources and references

Companion source code for this post
Amazon Lex V2 Developer Guide
AWS Solutions Library: QnABot on AWS
Text2Text Generation with FLAN T5 models
LangChain – Building applications with LLMs
Amazon SageMaker Examples with Jumpstart Foundation Models
Amazon Bedrock – The easiest way to build and scale generative AI applications with foundation models
Quickly build high-accuracy Generative AI applications on enterprise data using Amazon Kendra, LangChain, and large language models

About the Authors
Marcelo Silva is an experienced tech professional who excels in designing, developing, and implementing cutting-edge products. Starting off his career at Cisco, Marcelo worked on various high-profile projects including deployments of the first ever carrier routing system and the successful rollout of ASR9000. His expertise extends to cloud technology, analytics, and product management, having served as senior manager for several companies like Cisco, Cape Networks, and AWS before joining GenAI. Currently working as a Conversational AI/GenAI Product Manager, Marcelo continues to excel in delivering innovative solutions across industries.
Victor Rojo is a highly experienced technologist who is passionate about the latest in AI, ML, and software development. With his expertise, he played a pivotal role in bringing Amazon Alexa to the US and Mexico markets while spearheading the successful launch of Amazon Textract and AWS Contact Center Intelligence (CCI) to AWS Partners. As the current Principal Tech Leader for the Conversational AI Competency Partners program, Victor is committed to driving innovation and bringing cutting-edge solutions to meet the evolving needs of the industry.
Justin Leto is a Sr. Solutions Architect at Amazon Web Services with a specialization in machine learning. His passion is helping customers harness the power of machine learning and AI to drive business growth. Justin has presented at global AI conferences, including AWS Summits, and lectured at universities. He leads the NYC machine learning and AI meetup. In his spare time, he enjoys offshore sailing and playing jazz. He lives in New York City with his wife and baby daughter.
Ryan Gomes is a Data & ML Engineer with the AWS Professional Services Intelligence Practice. He is passionate about helping customers achieve better outcomes through analytics and machine learning solutions in the cloud. Outside work, he enjoys fitness, cooking, and spending quality time with friends and family.
Mahesh Birardar is a Sr. Solutions Architect at Amazon Web Services with specialization in DevOps and Observability. He enjoys helping customers implement cost-effective architectures that scale. Outside work, he enjoys watching movies and hiking.
Kanjana Chandren is a Solutions Architect at Amazon Web Services (AWS) who is passionate about Machine Learning. She helps customers in designing, implementing and managing their AWS workloads. Outside of work she loves travelling, reading and spending time with family and friends.