How Can We Generate A New Concept That Has Never Been Seen? Researchers at Tel Aviv University Propose ConceptLab: Creative Generation Using Diffusion Prior Constraints

Recent developments in Artificial Intelligence have produced solutions for a wide variety of use cases. Text-to-image generative models have opened an exciting new field in which written words are transformed into vibrant, engrossing visual representations, and the explosion of personalization techniques, as a logical evolution, has further expanded the capacity to place distinctive concepts in fresh settings. A number of algorithms have been developed that simulate creative behaviors or aim to enhance and augment human creative processes.

Researchers have also been exploring how these technologies can be used to create wholly original and inventive concepts. To that end, a recent research paper introduced ConceptLab for creative text-to-image generation. The basic goal in this setting is to produce fresh examples that still belong to a broad category, for example, a new breed of pet that is radically different from all the breeds we are accustomed to. Diffusion Prior models are the main tool in this research.

The approach draws its inspiration from token-based personalization, in which a unique concept is represented as a token in the text encoder of a pre-trained generative model. Since there are no existing photographs of the intended subject, creating a new concept is more difficult than applying a conventional inversion technique. To address this, the CLIP vision-language model is used to direct the optimization process through two kinds of constraints: a positive constraint that promotes images aligned with the broad category, and negative constraints that push the generation away from existing members of that category.
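To make the idea concrete, the following is a minimal, hypothetical sketch of such a CLIP-guided objective. The text-encoder interface, prompts, and equal weighting are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn.functional as F

def concept_loss(image_embedding, encode_text, category="a photo of a pet",
                 negatives=("a photo of a dog", "a photo of a cat", "a photo of a hamster")):
    # Positive constraint: stay close to the broad category in CLIP space.
    pos_sim = F.cosine_similarity(image_embedding, encode_text(category), dim=-1)
    # Negative constraints: move away from known members of the category.
    neg = torch.stack([encode_text(t) for t in negatives])
    neg_sim = F.cosine_similarity(image_embedding.unsqueeze(0), neg, dim=-1)
    return -pos_sim.mean() + neg_sim.mean()

Minimizing such a loss over the prior's output embedding pulls the result toward the broad category in general while repelling it from any specific known member.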

The authors show how the difficulty of creating truly original content can be effectively articulated as an optimization process over the output space of the diffusion prior, yielding what they refer to as prior constraints. To ensure that the generated concepts do not simply converge toward existing category members, the researchers incorporated a question-answering model into the framework; this adaptive component is crucial to the optimization process because it repeatedly adds new constraints.

These extra constraints guide the optimization toward increasingly unique and distinctive inventions, and the adaptive nature of the system gradually pushes the model into unexplored areas of imagination. The authors also emphasize the flexibility of the proposed prior constraints: in addition to making it easier to create individual original concepts, they act as a powerful mixing mechanism. The capacity to mix concepts allows for the creation of hybrids, creative fusions of the generated notions, which enriches the creative process and produces even more interesting and varied outcomes.

In conclusion, the main goal of this study is to develop unique and creative concepts by combining contemporary text-to-image generative models, under-researched Diffusion Prior models, and an adaptive constraint-expansion mechanism powered by a question-answering model. The result is a thorough strategy that produces original, eye-catching content and encourages a fluid exploration of creative space.

Check out the Paper, Project Page, and Github. All Credit For This Research Goes To the Researchers on This Project.

The post How Can We Generate A New Concept That Has Never Been Seen? Researchers at Tel Aviv University Propose ConceptLab: Creative Generation Using Diffusion Prior Constraints appeared first on MarkTechPost.

Researchers from the University of Pennsylvania are Developing Machine Learning Strategies to Improve Kidney Matching and Decrease the Risk of Graft Failure

AI has emerged as a beacon of hope for kidney transplant patients by analyzing genetic variation to minimize the risk of graft failure. The evaluation of graft-failure risk in kidney transplants has traditionally relied on HLA (Human Leukocyte Antigen) mismatches. A research team from the University of Pennsylvania has explored an innovative machine-learning algorithm that can help unveil the hidden connections between amino-acid mismatches (AA-MMs) and the likelihood of graft failure.

Their approach, termed FIBERS (Feature Inclusion Bin Evolver for Risk Stratification), utilizes evolutionary algorithms to automatically construct AA-MM bins while minimizing assumptions about bin composition, which helps effectively stratify transplant pairs into high-risk and low-risk groups for graft survival. By applying FIBERS to a dataset of 166,754 deceased-donor kidney transplants from the Scientific Registry of Transplant Recipients (SRTR), the researchers exposed the limitations of traditional methods of assessing graft-failure risk. They emphasized the role of amino-acid variability, which allowed FIBERS to identify more than twice the number of low-risk patients.

FIBERS harnesses an evolutionary algorithm to iteratively optimize the fitness of AA-MM bins for graft-failure risk stratification. Higher-performing bins are selected as "parents" to generate novel offspring bins by recombining (crossover) and mutating (replacing, adding, or deleting) the AA positions within bins, and a "risk strata minimum" ensures the statistical reliability of the results. A hypothetical sketch of this loop follows.
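As a rough illustration only (not the published FIBERS code), the evolutionary loop described above might look like the following in Python; the fitness function, for example a log-rank statistic over graft-survival curves, and all parameter values are stand-ins.

import random

def evolve_bins(aa_positions, fitness, generations=100, pop_size=50, bin_size=10):
    # Each "bin" is a set of amino-acid positions; fitness scores how well the
    # bin's mismatch counts stratify graft survival (stand-in callable).
    population = [set(random.sample(aa_positions, bin_size)) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]              # higher-performing bins become parents
        offspring = []
        while len(parents) + len(offspring) < pop_size:
            a, b = random.sample(parents, 2)
            pool = list(a | b)
            child = set(random.sample(pool, min(bin_size, len(pool))))   # crossover
            if child and random.random() < 0.3:                          # mutation: swap one position
                child.discard(random.choice(list(child)))
                child.add(random.choice(aa_positions))
            offspring.append(child)
        population = parents + offspring
    return max(population, key=fitness)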

This approach was applied in three analyses: (1) constructing bins using AA-MMs across five HLA loci and comparing risk stratification, (2) binning AA-MMs within each HLA locus separately, and (3) evaluating performance using cross-validation. It enhanced risk stratification compared with 0-ABDR antigen mismatch: 24.4% of kidney transplants were identified as low risk by AA-MM assessment versus 9.1% by 0-ABDR. Cross-validation demonstrated the generalizability of FIBERS bin risk prediction, confirming its robustness.

The researchers highlighted that FIBERS could be more holistic in determining which AA-MMs impact risk, although it requires much larger datasets. In the future, they aim to address limitations by (1) extending binning to additional HLA loci, (2) comparing results between first-transplant and re-transplant recipients, and (3) adapting FIBERS to optimize bins that can stratify donor/recipient pairs into any number of risk groups, learn group cutoffs, and learn AA-MM weights to infer the importance of a given mismatch.

Check out the Paper and Reference Article. All Credit For This Research Goes To the Researchers on This Project.

The post Researchers from the University of Pennsylvania are Developing Machine Learning Strategies to Improve Kidney Matching and Decrease the Risk of Graft Failure appeared first on MarkTechPost.

Attention Gaming Industry! No More Weird Mirrors With Mirror-NeRF

NeRFs, or Neural Radiance Fields, represent a scene with a neural network (a multilayer perceptron) that captures physical characteristics of objects such as shape, material, and texture, and they can generate realistic images of those objects under different lighting conditions. They have proved most useful in medicine, robotics, and entertainment due to their ability to create high-resolution images.

3D reconstruction and rendering of scenes with mirrors, which exist ubiquitously in the real world, have been a long-standing challenge in computer vision. To deal with the inconsistencies that mirrors introduce into NeRF reconstructions, researchers at Zhejiang University introduce Mirror-NeRF, which correctly renders reflections in a unified radiance field by estimating a reflection probability and tracing rays following the light-transport model of Whitted ray tracing; a conceptual sketch of this blending follows.
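The core idea can be summarized in a small conceptual sketch (not the released implementation): a learned reflection probability decides how much of a recursively traced reflection is blended into the radiance field's own color. The field interface below is an assumption for illustration.

import numpy as np

def reflect(direction, normal):
    # Mirror a ray direction about the surface normal (Whitted-style).
    return direction - 2.0 * np.dot(direction, normal) * normal

def render_ray(origin, direction, field, depth=0, max_depth=2):
    # `field` is an assumed radiance-field interface returning the view-dependent
    # color, a learned reflection probability, the surface normal, and the hit point.
    rgb, reflection_prob, normal, hit_point = field(origin, direction)
    if depth < max_depth and reflection_prob > 1e-3:
        reflected_rgb = render_ray(hit_point, reflect(direction, normal), field, depth + 1, max_depth)
        # Blend the radiance field's color with the traced reflection.
        rgb = (1.0 - reflection_prob) * rgb + reflection_prob * reflected_rgb
    return rgb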

NeRF, Ref-NeRF, and NeRFReN all generate mirror reflections from new viewpoints by interpolating previously learned reflections. However, they have limitations when it comes to reliably inferring reflections not seen during training and synthesizing reflections for newly introduced objects or mirrors in the scene. The freshly introduced Mirror-NeRF can accurately render the reflection in the mirror and serve various scene-manipulation applications by integrating physical ray tracing into the neural rendering process.

Five synthetic and four real datasets were created, and quantitative comparisons of novel view synthesis were made on the metrics Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index Measure (SSIM), and Learned Perceptual Image Patch Similarity (LPIPS). Because a bumpy mirror surface greatly degrades reflection quality, several regularization terms were also introduced into the optimization process. With all regularization terms enabled, the authors obtained precise reflections in the mirror with the highest image quality.

The findings showed that NeRF, Ref-NeRF, and NeRFReN struggled to produce reflections of objects with high-frequency variations in color, such as a distorted hanging picture in the mirror of the meeting room, a blurry curtain in the mirrors of the office and the lounge, and "fogged" clothes in the mirror of the clothing store. The new method, in contrast, rendered detailed reflections of objects by tracing the reflected rays. Although this work represents an immense advance for scenes with mirrors, the researchers have yet to incorporate refraction into the framework.

In conclusion, this breakthrough promises new avenues in the gaming and film industries. Artists may want to create complex visual effects through mirror manipulation, for example, substituting the reflection in a mirror with a different scene, and Mirror-NeRF can synthesize a photo-realistic view of the new scene in the mirror with multi-view consistency.

Check out the Paper. All Credit For This Research Goes To the Researchers on This Project.

The post Attention Gaming Industry! No More Weird Mirrors With Mirror-NeRF appeared first on MarkTechPost.

Comparing Machine Learning Methods: Traditional vs. Cost-saving Alternatives – What Really Works?

Artificial Intelligence is expanding rapidly across areas such as cloud platforms, finance, quantitative finance, and product design. Many researchers are studying chatbots and the machine-learning techniques used to build them, and implementing, training, and testing such models, which span natural language processing and computer vision, requires huge amounts of data and compute, and therefore money. To address this cost problem, researchers at University College London and the University of Edinburgh have been examining machine-learning techniques for building models more cheaply.

The researchers are working on these cost problems as they arise on cloud platforms like AWS. The team developed a measurement-based approach and compared standard training setups against the newly proposed cost-saving alternatives. The comparison yielded a cost-saving approach that was quite good but also had disadvantages: the cost-saving methods tended to produce only minimal results. The researchers then broke the problem down into three main categories of approach.

The first approach the researchers examined was batch selection, which involves a large number of examples stacked together and processed in a specific order; it is one of the cheaper approaches used to date but still has deficits. The second approach, layer stacking, involves stacking multiple neural networks together to build the model, and sentiment analysis also plays a major role in the layer-stacking process. The third approach was based on efficient optimizers, designed to waste less computation and accelerate the search; it proved the most effective, providing solutions with excellent accuracy, and the optimizers used were reported to be twice as fast as the Adam optimizer. A generic sketch of loss-based batch selection appears below.
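As a rough, generic illustration of loss-based batch selection (not the exact algorithms evaluated in the paper), a training step might score a full batch cheaply and then update only on the highest-loss examples:

import torch
import torch.nn.functional as F

def selective_batch_step(model, optimizer, x, y, keep_fraction=0.5):
    # Score every example with a cheap, gradient-free forward pass.
    with torch.no_grad():
        per_example_loss = F.cross_entropy(model(x), y, reduction="none")
    k = max(1, int(keep_fraction * x.shape[0]))
    idx = per_example_loss.topk(k).indices      # keep only the hardest examples
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x[idx]), y[idx])
    loss.backward()
    optimizer.step()
    return loss.item()

The savings come from backpropagating through only a fraction of the batch, which is the kind of trade-off the study measures against plain training.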

Using all the data simultaneously while discarding the uninformative parts does not produce proper output. Of the three approaches, layer stacking was the only one that showed even minimal training and validation gains. Such processes are improving on a large scale, and many researchers are working on them; the team also developed an optimization technique that used less computing power than before. The overall verdict of "no train, no gain" held once the research project was completed.

Check out the Paper and GitHub. All Credit For This Research Goes To the Researchers on This Project.

The post Comparing Machine Learning Methods: Traditional vs. Cost-saving Alternatives – What Really Works? appeared first on MarkTechPost.

Amazon Translate enhances its custom terminology to improve translation accuracy and fluency

Amazon Translate is a neural machine translation service that delivers fast, high-quality, affordable, and customizable language translation. When you translate from one language to another, you want your machine translation to be accurate, fluent, and, most importantly, contextual. Domain-specific and language-specific customizable terminology is a key requirement for many government and commercial organizations.
Custom terminology enables you to customize your translation output such that your domain and organization-specific vocabulary, such as brand names, character names, model names, and other unique content (named entities), are translated exactly the way you need. To use the custom terminology feature, you should create a terminology file (CSV or TMX file format) and specify the custom terminology as a parameter in an Amazon Translate real-time translation or asynchronous batch processing request. Refer to Customize Amazon Translate output to meet your domain and organization specific vocabulary to get started on custom terminology.
In this post, we explore key enhancements to custom terminology: instead of a simple match and replace, it now performs a context-sensitive match and replace that preserves the sentence construct. This enhancement aims to create contextually appropriate versions of matching target terms to generate translations of higher quality and fluency.
Solution overview
We use the following custom terminology file to explore the enhanced custom terminology features. For instructions on creating a custom terminology, refer to Customize Amazon Translate output to meet your domain and organization specific vocabulary.

en,fr,es
tutor,éducateur,tutor
sheep,agneau,oveja
walking,promenant,para caminar
burger,sandwich,hamburguesa
action-specific,spécifique à l’action,especifico de acción
order,commande,commande

Exploring the custom terminology feature
Let’s translate the sentence “she was a great tutor” with Amazon Translate. Complete the following steps:

On the Amazon Translate console, choose Real-time translation in the navigation pane.
Choose the Text tab.
For Target language, choose French.
Enter the text “she was a great tutor.”

As shown in the following screenshot, the translation in French is “elle était une excellente tutrice.”

Under Additional settings, select Custom terminology and choose your custom terminology file.

The translation in French is changed to “elle était une excellente éducatrice.”

In the custom terminology file, we have specified the translation for “tutor” as “éducateur.” “Éducateur” is masculine in French, whereas “tutor” in English is gender neutral. Custom terminology did not perform a simple match and replace here; instead, it used the target word and applied the correct gender based on the context.
Now let’s test the feature with the source sentence “he has 10 sheep.” The translation in French is “il a 10 agneaux.” We provided custom terminology for “sheep” as “agneau.” “Agneau” in French means “baby sheep” and is singular. In this case, the target word is changed to inflect plural.
The source sentence “walking in the evening is precious to me” is translated to “me promener le soir est précieux pour moi.” The custom terminology target word “promenant” is changed to “promener” to inflect the correct verb tense.
The source sentence “I like burger” will be translated to “J’aime les sandwichs” to inflect the correct noun based on the context.
Now let’s test sentences with the target language as Spanish.
The source sentence “any action-specific parameters are listed in the topic for that action” is translated to “todos los parámetros especificos de acción aparecen en el tema de esa acción” to inflect the correct adjective.
The source sentence “in order for us to help you, please share your name” will be translated to “pour que nous puissions vous aider, veuillez partager votre nom.”
Some words may have entirely different meanings based on context. For example, the word “order” in English can be a sequence (as is in the source sentence) or a command or instruction (as in “I order books”). It’s difficult to know which meaning is intended without explicit information. In this case, “order” should not be translated as “commande” because it means “command” or “instruct” in French.
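The same behavior is available programmatically. The following boto3 sketch imports the CSV file shown earlier and attaches it to a real-time translation request; the terminology name and file path are illustrative placeholders.

import boto3

translate = boto3.client("translate")

# One-time setup: import the CSV terminology shown earlier (name is illustrative).
with open("my-custom-terminology.csv", "rb") as f:
    translate.import_terminology(
        Name="my-custom-terminology",
        MergeStrategy="OVERWRITE",
        TerminologyData={"File": f.read(), "Format": "CSV"},
    )

# Real-time translation with the custom terminology applied.
response = translate.translate_text(
    Text="she was a great tutor",
    SourceLanguageCode="en",
    TargetLanguageCode="fr",
    TerminologyNames=["my-custom-terminology"],
)
print(response["TranslatedText"])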
Conclusion
The custom terminology feature in Amazon Translate can help you customize translations based on your domain or language constructs. Recent enhancements to the custom terminology feature create contextually appropriate versions of matching terms to generate translations of higher quality. This enhancement improves the translation accuracy and fluency. There is no change required for existing customers to use the enhanced feature.
For more information about Amazon Translate, visit Amazon Translate resources to find video resources and blog posts, and refer to AWS Translate FAQs.

About the Authors
Sathya Balakrishnan is a Senior Consultant in the Professional Services team at AWS, specializing in data and ML solutions. He works with US federal financial clients. He is passionate about building pragmatic solutions to solve customers’ business problems. In his spare time, he enjoys watching movies and hiking with his family.
Sid Padgaonkar is the Senior Product Manager for Amazon Translate, AWS’s natural language processing service. On weekends, you will find him playing squash and exploring the food scene in the Pacific Northwest.

Zero-shot text classification with Amazon SageMaker JumpStart

Natural language processing (NLP) is the field in machine learning (ML) concerned with giving computers the ability to understand text and spoken words in the same way human beings can. Recently, state-of-the-art architectures like the transformer architecture have been used to achieve near-human performance on NLP downstream tasks like text summarization, text classification, entity recognition, and more.
Large language models (LLMs) are transformer-based models trained on a large amount of unlabeled text with hundreds of millions (BERT) to over a trillion parameters (MiCS), and whose size makes single-GPU training impractical. Due to their inherent complexity, training an LLM from scratch is a very challenging task that very few organizations can afford. A common practice for NLP downstream tasks is to take a pre-trained LLM and fine-tune it. For more information about fine-tuning, refer to Domain-adaptation Fine-tuning of Foundation Models in Amazon SageMaker JumpStart on Financial data and Fine-tune transformer language models for linguistic diversity with Hugging Face on Amazon SageMaker.
Zero-shot learning in NLP allows a pre-trained LLM to generate responses to tasks that it hasn’t been explicitly trained for (even without fine-tuning). Specifically speaking about text classification, zero-shot text classification is a task in natural language processing where an NLP model is used to classify text from unseen classes, in contrast to supervised classification, where NLP models can only classify text that belong to classes in the training data.
We recently launched zero-shot classification model support in Amazon SageMaker JumpStart. SageMaker JumpStart is the ML hub of Amazon SageMaker that provides access to pre-trained foundation models (FMs), LLMs, built-in algorithms, and solution templates to help you quickly get started with ML. In this post, we show how you can perform zero-shot classification using pre-trained models in SageMaker JumpStart. You will learn how to use the SageMaker JumpStart UI and SageMaker Python SDK to deploy the solution and run inference using the available models.
Zero-shot learning
Zero-shot classification is a paradigm where a model can classify new, unseen examples that belong to classes that were not present in the training data. For example, a language model that has been trained to understand human language can be used to classify New Year’s resolution tweets into multiple classes like career, health, and finance, without the language model being explicitly trained on the text classification task. This is in contrast to fine-tuning the model, since the latter implies re-training the model (through transfer learning) while zero-shot learning doesn’t require additional training.
The following diagram illustrates the differences between transfer learning (left) vs. zero-shot learning (right).

Yin et al. proposed a framework for creating zero-shot classifiers using natural language inference (NLI). The framework works by posing the sequence to be classified as an NLI premise and constructing a hypothesis from each candidate label. For example, if we want to evaluate whether a sequence belongs to the class politics, we could construct a hypothesis of “This text is about politics.” The probabilities for entailment and contradiction are then converted to label probabilities. As a quick review, NLI considers two sentences: a premise and a hypothesis. The task is to determine whether the hypothesis is true (entailment) or false (contradiction) given the premise. The following examples illustrate the task.

Premise: A man inspects the uniform of a figure in some East Asian country.
Hypothesis: The man is sleeping.
Label: Contradiction

Premise: An older and younger man smiling.
Hypothesis: Two men are smiling and laughing at the cats playing on the floor.
Label: Neutral

Premise: A soccer game with multiple males playing.
Hypothesis: Some men are playing a sport.
Label: Entailment
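This NLI mechanism can be tried locally with the Hugging Face transformers pipeline and the same facebook/bart-large-mnli model that the JumpStart examples below deploy. The snippet is a minimal illustration and is not part of the JumpStart solution code.

from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "I will read more books this year.",
    candidate_labels=["Personal Growth", "Health", "Finance"],
    hypothesis_template="This text is about {}.",  # each label becomes an NLI hypothesis
)
print(result["labels"], result["scores"])  # labels sorted by entailment-derived probability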

Solution overview
In this post, we discuss the following:

How to deploy pre-trained zero-shot text classification models using the SageMaker JumpStart UI and run inference on the deployed model using short text data
How to use the SageMaker Python SDK to access the pre-trained zero-shot text classification models in SageMaker JumpStart and use the inference script to deploy the model to a SageMaker endpoint for a real-time text classification use case
How to use the SageMaker Python SDK to access pre-trained zero-shot text classification models and use SageMaker batch transform for a batch text classification use case

SageMaker JumpStart provides one-click fine-tuning and deployment for a wide variety of pre-trained models across popular ML tasks, as well as a selection of end-to-end solutions that solve common business problems. These features remove the heavy lifting from each step of the ML process, simplifying the development of high-quality models and reducing time to deployment. The JumpStart APIs allow you to programmatically deploy and fine-tune a vast selection of pre-trained models on your own datasets.
The JumpStart model hub provides access to a large number of NLP models that enable transfer learning and fine-tuning on custom datasets. As of this writing, the JumpStart model hub contains over 300 text models, spanning a variety of popular model families such as Stable Diffusion, Flan T5, Alexa TM, Bloom, and more.
Note that by following the steps in this section, you will deploy infrastructure to your AWS account that may incur costs.
Deploy a standalone zero-shot text classification model
In this section, we demonstrate how to deploy a zero-shot classification model using SageMaker JumpStart. You can access pre-trained models through the JumpStart landing page in Amazon SageMaker Studio. Complete the following steps:

In SageMaker Studio, open the JumpStart landing page. Refer to Open and use JumpStart for more details on how to navigate to SageMaker JumpStart.
In the Text Models carousel, locate the “Zero-Shot Text Classification” model card.
Choose View model to access the facebook-bart-large-mnli model. Alternatively, you can search for the zero-shot classification model in the search bar and get to the model in SageMaker JumpStart.
Specify a deployment configuration, SageMaker hosting instance type, endpoint name, Amazon Simple Storage Service (Amazon S3) bucket name, and other required parameters.
Optionally, you can specify security configurations like AWS Identity and Access Management (IAM) role, VPC settings, and AWS Key Management Service (AWS KMS) encryption keys.
Choose Deploy to create a SageMaker endpoint.

This step takes a couple of minutes to complete. When it’s complete, you can run inference against the SageMaker endpoint that hosts the zero-shot classification model.

Use JumpStart programmatically with the SageMaker SDK
In the SageMaker JumpStart section of SageMaker Studio, under Quick start solutions, you can find the solution templates. SageMaker JumpStart solution templates are one-click, end-to-end solutions for many common ML use cases. As of this writing, over 20 solutions are available for multiple use cases, such as demand forecasting, fraud detection, and personalized recommendations, to name a few.
The “Zero Shot Text Classification with Hugging Face” solution provides a way to classify text without the need to train a model for specific labels (zero-shot classification) by using a pre-trained text classifier. The default zero-shot classification model for this solution is the facebook-bart-large-mnli (BART) model. For this solution, we use the 2015 New Year’s Resolutions dataset to classify resolutions. A subset of the original dataset containing only the Resolution_Category (ground truth label) and the text columns is included in the solution’s assets.

The input data includes text strings, a list of desired categories for classification, and whether the classification is multi-label or not for synchronous (real-time) inference. For asynchronous (batch) inference, we provide a list of text strings, the list of categories for each string, and whether the classification is multi-label or not in a JSON lines formatted text file.
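For example, a batch input file might be prepared as follows. This is a hedged sketch: the record fields mirror the request format shown later in this post, and the file name is arbitrary.

import json

records = [
    {"inputs": "learn to play the guitar",
     "parameters": {"candidate_labels": ["Leisure", "Career", "Health"], "multi_label": False}},
    {"inputs": "pay off my credit card debt",
     "parameters": {"candidate_labels": ["Finance", "Health", "Humor"], "multi_label": False}},
]

# JSON lines format: one JSON object per line.
with open("zero_shot_batch_input.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")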

The result of the inference is a JSON object like the example shown later in this post.

We have the original text in the sequence field, the labels used for the text classification in the labels field, and the probability assigned to each label (in the same order of appearance) in the field scores.
To deploy the Zero Shot Text Classification with Hugging Face solution, complete the following steps:

On the SageMaker JumpStart landing page, choose Models, notebooks, solutions in the navigation pane.
In the Solutions section, choose Explore All Solutions.
On the Solutions page, choose the Zero Shot Text Classification with Hugging Face model card.
Review the deployment details and if you agree, choose Launch.

The deployment will provision a SageMaker real-time endpoint for real-time inference and an S3 bucket for storing the batch transformation results.
The following diagram illustrates the architecture of this method.

Perform real-time inference using a zero-shot classification model
In this section, we review how to use the Python SDK to run zero-shot text classification (using any of the available models) in real time using a SageMaker endpoint.

First, we configure the inference payload request to the model. This is model dependent, but for the BART model, the input is a JSON object with the following structure:

{
    "inputs": # The text to be classified
    "parameters": {
        "candidate_labels": # A list of the labels we want to use for the text classification
        "multi_label": True | False
    }
}

Note that the BART model is not explicitly trained on the candidate_labels. We will use the zero-shot classification technique to classify the text sequence to unseen classes. The following code is an example using text from the New Year’s resolutions dataset and the defined classes:

classification_categories = ["Health", "Humor", "Personal Growth", "Philanthropy", "Leisure", "Career", "Finance", "Education", "Time Management"]
data_zero_shot = {
    "inputs": "#newyearsresolution :: read more books, no scrolling fb/checking email b4 breakfast, stay dedicated to pt/yoga to squash my achin' back!",
    "parameters": {
        "candidate_labels": classification_categories,
        "multi_label": False
    }
}

Next, you can invoke a SageMaker endpoint with the zero-shot payload. The SageMaker endpoint is deployed as part of the SageMaker JumpStart solution.

import json, boto3

runtime = boto3.client("sagemaker-runtime")  # client used to invoke the deployed endpoint

response = runtime.invoke_endpoint(EndpointName=sagemaker_endpoint_name,
                                   ContentType="application/json",
                                   Body=json.dumps(data_zero_shot))

parsed_response = json.loads(response["Body"].read())

The inference response object contains the original sequence, the labels sorted by score from max to min, and the scores per label:

{'sequence': "#newyearsresolution :: read more books, no scrolling fb/checking email b4 breakfast, stay dedicated to pt/yoga to squash my achin' back!",
 'labels': ['Personal Growth',
  'Health',
  'Time Management',
  'Leisure',
  'Education',
  'Humor',
  'Career',
  'Philanthropy',
  'Finance'],
 'scores': [0.4198768436908722,
  0.2169460505247116,
  0.16591140627861023,
  0.09742163866758347,
  0.031757451593875885,
  0.027988269925117493,
  0.015974704176187515,
  0.015464971773326397,
  0.008658630773425102]}

Run a SageMaker batch transform job using the Python SDK
This section describes how to run batch transform inference with the zero-shot classification facebook-bart-large-mnli model using the SageMaker Python SDK. Complete the following steps:

Format the input data in JSON lines format and upload the file to Amazon S3. SageMaker batch transform will perform inference on the data points uploaded in the S3 file.
Set up the model deployment artifacts with the following parameters:

model_id – Use huggingface-zstc-facebook-bart-large-mnli.
deploy_image_uri – Use the image_uris Python SDK function to get the pre-built SageMaker Docker image for the model_id. The function returns the Amazon Elastic Container Registry (Amazon ECR) URI.
deploy_source_uri – Use the script_uris utility API to retrieve the S3 URI that contains scripts to run pre-trained model inference. We specify the script_scope as inference.
model_uri – Use model_uri to get the model artifacts from Amazon S3 for the specified model_id.

# Imports
from sagemaker import image_uris, model_uris, script_uris, hyperparameters

# Set the model ID and version
model_id, model_version = (
    "huggingface-zstc-facebook-bart-large-mnli",
    "*",
)

# Retrieve the inference Docker container URI. This is the base Hugging Face container image for the default model above.
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,  # Automatically inferred from model_id
    image_scope="inference",
    model_id=model_id,
    model_version=model_version,
    instance_type="ml.g4dn.xlarge",
)

# Retrieve the inference script URI. This includes all dependencies and scripts for model loading, inference handling, and more.
deploy_source_uri = script_uris.retrieve(model_id=model_id, model_version=model_version, script_scope="inference")

# Retrieve the model URI. This includes the pre-trained model and parameters.
model_uri = model_uris.retrieve(model_id=model_id, model_version=model_version, model_scope="inference")

Use HF_TASK to define the task for the Hugging Face transformers pipeline and HF_MODEL_ID to define the model used to classify the text:

# Hub model configuration <https://huggingface.co/models>
hub = {
    'HF_MODEL_ID': 'facebook/bart-large-mnli',   # The model_id from the Hugging Face Hub
    'HF_TASK': 'zero-shot-classification'        # The NLP task that you want to use for predictions
}
For a complete list of tasks, see Pipelines in the Hugging Face documentation.
Create a Hugging Face model object to be deployed with the SageMaker batch transform job:

# Create the HuggingFaceModel class
from sagemaker.huggingface import HuggingFaceModel

huggingface_model_zero_shot = HuggingFaceModel(
    model_data=model_uri,          # path to your trained SageMaker model
    env=hub,                       # configuration for loading the model from the Hub
    role=role,                     # IAM role with permissions to create an endpoint
    transformers_version="4.17",   # Transformers version used
    pytorch_version="1.10",        # PyTorch version used
    py_version="py38",             # Python version used
)

Create a transform to run a batch job:

# Create a transformer to run a batch job
from sagemaker.s3 import s3_path_join

batch_job = huggingface_model_zero_shot.transformer(
    instance_count=1,
    instance_type="ml.m5.xlarge",
    strategy="SingleRecord",
    assemble_with="Line",
    output_path=s3_path_join("s3://", sagemaker_config["S3Bucket"], "zero_shot_text_clf", "results"),  # we use the same S3 path to save the output with the input
)

Start a batch transform job and use S3 data as input:

batch_job.transform(
    data=data_upload_path,
    content_type="application/json",
    split_type="Line",
    logs=False,
    wait=True
)

You can monitor your batch processing job on the SageMaker console (choose Batch transform jobs under Inference in the navigation pane). When the job is complete, you can check the model prediction output in the S3 file specified in output_path.
For a list of all the available pre-trained models in SageMaker JumpStart, refer to Built-in Algorithms with pre-trained Model Table. Use the keyword “zstc” (short for zero-shot text classification) in the search bar to locate all the models capable of doing zero-shot text classification.
Clean up
After you’re done running the notebook, make sure to delete all resources created in the process to ensure that the costs incurred by the assets deployed in this guide are stopped. The code to clean up the deployed resources is provided in the notebooks associated with the zero-shot text classification solution and model.
Default security configurations
The SageMaker JumpStart models are deployed using the following default security configurations:

The models are deployed with a default SageMaker execution role. You can specify your own role or use an existing one. For more information, refer to SageMaker Roles.
The model will not connect to a VPC and no VPC will be provisioned for your model. You can specify VPC configuration to connect to your model from within the security options. For more information, see Give SageMaker Hosted Endpoints Access to Resources in Your Amazon VPC.
Default KMS keys will be used to encrypt your model’s artifacts. You can specify your own KMS keys or use an existing one. For more information, refer to Using server-side encryption with AWS KMS keys (SSE-KMS).

To learn more about SageMaker security-related topics, check out Configure security in Amazon SageMaker.
Conclusion
In this post, we showed you how to deploy a zero-shot classification model using the SageMaker JumpStart UI and perform inference using the deployed endpoint. We used the SageMaker JumpStart New Year’s resolutions solution to show how you can use the SageMaker Python SDK to build an end-to-end solution and implement zero-shot classification application. SageMaker JumpStart provides access to hundreds of pre-trained models and solutions for tasks like computer vision, natural language processing, recommendation systems, and more. Try out the solution on your own and let us know your thoughts.

About the authors
David Laredo is a Prototyping Architect at AWS Envision Engineering in LATAM, where he has helped develop multiple machine learning prototypes. Previously, he has worked as a Machine Learning Engineer and has been doing machine learning for over 5 years. His areas of interest are NLP, time series, and end-to-end ML.
Vikram Elango is an AI/ML Specialist Solutions Architect at Amazon Web Services, based in Virginia, US. Vikram helps financial and insurance industry customers with design and thought leadership to build and deploy machine learning applications at scale. He is currently focused on natural language processing, responsible AI, inference optimization, and scaling ML across the enterprise. In his spare time, he enjoys traveling, hiking, cooking, and camping with his family.
Dr. Vivek Madan is an Applied Scientist with the Amazon SageMaker JumpStart team. He got his PhD from University of Illinois at Urbana-Champaign and was a Post Doctoral Researcher at Georgia Tech. He is an active researcher in machine learning and algorithm design and has published papers in EMNLP, ICLR, COLT, FOCS, and SODA conferences.

Breakthrough in the Intersection of Vision-Language: Presenting the All-Seeing Project

Powering the meteoric rise of AI chatbots, LLMs are the talk of the town. They show mind-blowing capabilities in user-tailored natural language processing but seem to lack the ability to understand the visual world. To bridge the gap between the vision and language worlds, researchers have presented the All-Seeing (AS) project.

The AS Project is for open-world panoptic visual recognition and understanding, driven by the goal of creating a vision system that mimics human cognition. The term “panoptic” refers to including everything visible in one view.

The AS Project consists of:

The All-Seeing 1B (AS-1B) dataset covers a wide range of 3.5 million common and rare concepts in the real world and has 132.2 billion tokens that describe the concepts and their attributes.

The All-Seeing model (ASM) is a unified location-aware image-text foundation model. The model consists of two key components: a location-aware image tokenizer and an LLM-based decoder.

The dataset comprises over 1 billion region annotations in various formats, such as semantic tags, locations, question-answering pairs, and captions. Compared with previous visual recognition datasets like ImageNet and COCO and visual understanding datasets like Visual Genome and Laion-5B, the AS-1B dataset stands out due to its rich and diverse instance-level location annotations and the corresponding detailed object concepts and descriptions.

The architecture of the AS model is a unified framework operating at varying levels: it supports contrastive and generative image-text tasks at both the image level and the region level. By leveraging pre-trained LLMs and powerful vision foundation models (VFMs), the model demonstrates promising performance in discriminative tasks like image-text retrieval and zero-shot classification, as well as generative tasks such as visual question answering (VQA), visual reasoning, image captioning, and region captioning/VQA. Additionally, the researchers claim to see potential in grounding tasks like phrase grounding and referring expression comprehension with the assistance of a class-agnostic detector.

The All-Seeing Model (ASM) comprises three key designs:

A location-aware image tokenizer extracts features at the image and region levels based on the input image and bounding box, respectively.

A trainable task prompt is incorporated at the beginning of the vision and text tokens to guide the model in distinguishing between discriminative and generative tasks.

An LLM-based decoder is utilized to extract vision and text features for discriminative tasks and auto-regressively generate response tokens in generative tasks. 

Extensive data analysis in terms of quality, scale, and diversity was conducted, and experiments compared the proposed ASM with a CLIP-based baseline model (which displays zero-shot capabilities similar to GPT-2 and GPT-3) and leading vision-language large language models (VLLMs) on representative vision tasks, including zero-shot region recognition, image-level captioning, and region-level captioning. The findings highlighted the strong region-level text generation capabilities of the model while also showcasing its ability to comprehend the entire image. Human evaluation indicated that captions generated by ASM are preferred over those from MiniGPT4 and LLaVA.

The model is trained with open-ended language prompts and locations, which allows it to generalize to various vision and language tasks with remarkable zero-shot performance, including region-text retrieval, region recognition, captioning, and question answering. According to the researchers, this gives LLMs an “all-seeing eye” and revolutionizes the intersection of vision and language.

Check out the Paper and GitHub. All Credit For This Research Goes To the Researchers on This Project.

The post Breakthrough in the Intersection of Vision-Language: Presenting the All-Seeing Project appeared first on MarkTechPost.

Tailoring the Fabric of Generative AI: FABRIC is an AI Approach That Personalizes Diffusion Models with Iterative Feedback

Generative AI is a term we are all familiar with nowadays. It has advanced considerably in recent years and has become a key tool in multiple applications.

The stars of the generative AI show are diffusion models. They have emerged as a powerful class of generative models, revolutionizing image synthesis and related tasks. These models have shown remarkable performance in generating high-quality and diverse images. Unlike traditional generative models such as GANs and VAEs, diffusion models work by iteratively refining a noise source, allowing for stable and coherent image generation.

Diffusion models have gained significant traction due to their ability to generate high-fidelity images with enhanced stability and reduced mode collapse during training. This has led to their widespread adoption and application across diverse domains, including image synthesis, inpainting, and style transfer.

However, they are not perfect. Despite their impressive capabilities, one of the challenges with diffusion models lies in effectively steering the model toward specific desired outputs based on textual descriptions. It is often frustrating to describe preferences precisely through text prompts; sometimes they are simply not enough, or the model insists on ignoring them. So, you usually need to refine the generated image to make it usable.

But you know what you wanted the model to draw, so in theory you are the best person to evaluate the quality of the generated image and how closely it resembles your imagination. What if we could integrate this feedback into the image generation pipeline so the model could understand what we want to see? Time to meet FABRIC.

FABRIC (Feedback via Attention-Based Reference Image Conditioning) is a novel approach to enable the integration of iterative feedback into the generative process of diffusion models.

FABRIC works based on user feedback. Source: https://arxiv.org/pdf/2307.10159.pdf

FABRIC utilizes positive and negative feedback images gathered from previous generations or human input. This enables it to leverage reference image-conditioning to refine future results. This iterative workflow facilitates the fine-tuning of generated images based on user preferences, providing a more controllable and interactive text-to-image generation process. 

FABRIC is inspired by ControlNet, which introduced the ability to generate new images similar to reference images. FABRIC leverages the self-attention module in the U-Net, allowing it to “pay attention” to other pixels in the image and inject additional information from a reference image. The keys and values for reference injection are computed by passing the noised reference image through the U-Net of Stable Diffusion. These keys and values are stored in the self-attention layers of the U-Net, allowing the denoising process to attend to the reference image and incorporate semantic information.

Overview of FABRIC. Source: https://arxiv.org/pdf/2307.10159.pdf

Moreover, FABRIC is extended to incorporate multi-round positive and negative feedback: separate U-Net passes are performed for each liked and disliked image, and the attention scores are reweighted based on the feedback. The feedback process can be scheduled across the denoising steps, allowing for iterative refinement of the generated images.
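Conceptually, the reference conditioning can be pictured as extending self-attention with cached reference keys and values and reweighting them by feedback strength. The following PyTorch-style snippet is only an illustration of that idea, with assumed shapes and names, not the authors' implementation.

import math
import torch
import torch.nn.functional as F

def reference_conditioned_attention(q, k, v, ref_k, ref_v, feedback_weight=1.0):
    # q, k, v: (batch, tokens, dim) from the current denoising pass.
    # ref_k, ref_v: keys/values cached from a U-Net pass over the noised reference image.
    k_all = torch.cat([k, ref_k], dim=1)  # let self-attention also look at reference tokens
    v_all = torch.cat([v, ref_v], dim=1)
    scores = q @ k_all.transpose(-2, -1) / math.sqrt(q.shape[-1])
    # Reweight attention toward (positive feedback) or away from (negative feedback) the reference.
    scores = torch.cat([scores[..., : k.shape[1]],
                        scores[..., k.shape[1]:] + math.log(feedback_weight)], dim=-1)
    attn = F.softmax(scores, dim=-1)
    return attn @ v_all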

Check out the Paper and GitHub. All Credit For This Research Goes To the Researchers on This Project.

The post Tailoring the Fabric of Generative AI: FABRIC is an AI Approach That Personalizes Diffusion Models with Iterative Feedback appeared first on MarkTechPost.

Stability AI Announces the Release of StableCode: Its Very First LLM Generative AI Product for Coding

Stability AI has just introduced a game-changing product named StableCode, marking its debut in AI-powered coding assistance. Designed to aid both experienced programmers and newcomers looking to upskill, StableCode brings together practical utility and learning support in a unique blend.

The heart of StableCode lies in its three distinct models, set to reshape the coding landscape. The journey begins with the base model, which underwent rigorous training on a diverse set of programming languages from the BigCode stack dataset (v1.2). This foundation was then reinforced with further training on popular languages like Python, Go, Java, JavaScript, C, Markdown, and C++, creating a well-rounded reservoir of programming knowledge. The training process was no small feat, encompassing a staggering 560 billion code tokens and powered by a high-performance computing (HPC) cluster.

However, the innovation didn’t stop there. The instruction model, the next layer in the StableCode framework, was meticulously calibrated to cater to specific programming challenges: around 120,000 instruction/response pairs in the Alpaca format were used to tune the base model, leading to a specialized solution capable of tackling complex programming tasks with finesse.

The real gem in StableCode’s offering is the long-context-window model, designed to redefine autocomplete suggestions. With a context window of 16,000 tokens, it can accommodate 2-4 times more code than its predecessors, so programmers can effortlessly manage the equivalent of multiple average-sized Python files in one go. This expanded capability is a boon for beginners seeking to explore more intricate coding challenges.

StableCode’s performance stacks up impressively against similarly scaled models. Evaluated against a well-established HumanEval benchmark using pass@1 and pass@10 metrics, StableCode shines, proving its mettle in real-world scenarios.

Benchmark scores of StableCode

HumanEval Benchmark Comparison with models of similar size (3B).
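For readers who want to experiment, the released checkpoints can be loaded with Hugging Face transformers roughly as follows. The Hub model ID is taken from the release announcement and should be treated as an assumption, and a GPU is advisable for reasonable speed.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/stablecode-completion-alpha-3b"  # assumed Hub identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=48)  # complete the function body
print(tokenizer.decode(outputs[0], skip_special_tokens=True))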

Stability AI’s vision is firmly rooted in making technology accessible to all, and StableCode is a significant stride in that direction. By democratizing AI-powered coding assistance, Stability AI opens the door for individuals from various backgrounds to harness the power of technology in problem-solving through coding. This approach could level the global technology playing field, offering equitable access to coding resources.

In a world increasingly intertwined with technology, StableCode emerges as a tool of simplicity and empowerment. By fusing cutting-edge AI capabilities with a commitment to accessibility, Stability AI paves the way for the next generation of software developers. These developers won’t just learn to code; they’ll contribute to a future where technology knows no bounds.

Check out the Reference Article. All Credit For This Research Goes To the Researchers on This Project.

The post Stability AI Announces the Release of StableCode: Its Very First LLM Generative AI Product for Coding appeared first on MarkTechPost.

Build a centralized monitoring and reporting solution for Amazon SageMaker

Amazon SageMaker is a fully managed machine learning (ML) platform that offers a comprehensive set of services that serve end-to-end ML workloads. As recommended by AWS as a best practice, customers have used separate accounts to simplify policy management for users and isolate resources by workloads and account. However, when more users and teams are using the ML platform in the cloud, monitoring the large ML workloads in a scaling multi-account environment becomes more challenging. For better observability, customers are looking for solutions to monitor the cross-account resource usage and track activities, such as job launch and running status, which is essential for their ML governance and management requirements.
SageMaker services, such as Processing, Training, and Hosting, collect metrics and logs from the running instances and push them to Amazon CloudWatch in the users’ accounts. To view the details of these jobs in different accounts, you need to log in to each account, find the corresponding jobs, and look into the status. There is no single pane of glass that can easily show this cross-account and multi-job information. Furthermore, the cloud admin team needs to provide individuals with access to different SageMaker workload accounts, which adds additional management overhead for the cloud platform team.
In this post, we present a cross-account observability dashboard that provides a centralized view for monitoring SageMaker user activities and resources across multiple accounts. It allows the end-users and cloud management team to efficiently monitor what ML workloads are running, view the status of these workloads, and trace back different account activities at certain points of time. With this dashboard, you don’t need to navigate from the SageMaker console and click into each job to find the details of the job logs. Instead, you can easily view the running jobs and job status, troubleshoot job issues, and set up alerts when issues are identified in shared accounts, such as job failure, underutilized resources, and more. You can also control access to this centralized monitoring dashboard or share the dashboard with relevant authorities for auditing and management requirements.
Overview of solution
This solution is designed to enable centralized monitoring of SageMaker jobs and activities across a multi-account environment. The solution is designed to have no dependency on AWS Organizations, but can be adopted easily in an Organizations or AWS Control Tower environment. This solution can help the operation team have a high-level view of all SageMaker workloads spread across multiple workload accounts from a single pane of glass. It also has an option to enable CloudWatch cross-account observability across SageMaker workload accounts to provide access to monitoring telemetries such as metrics, logs, and traces from the centralized monitoring account. An example dashboard is shown in the following screenshot.

The following diagram shows the architecture of this centralized dashboard solution.

SageMaker has native integration with Amazon EventBridge, which monitors status change events in SageMaker. EventBridge enables you to automate SageMaker and respond automatically to events such as a training job status change or endpoint status change. Events from SageMaker are delivered to EventBridge in near-real time. For more information about SageMaker events monitored by EventBridge, refer to Automating Amazon SageMaker with Amazon EventBridge. In addition to the SageMaker native events, AWS CloudTrail publishes events when you make API calls, which also stream to EventBridge so that they can be utilized by many downstream automation or monitoring use cases. In our solution, we use EventBridge rules in the workload accounts to stream SageMaker service events and API events to the monitoring account’s event bus for centralized monitoring; a minimal example of such a rule follows.
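As a hedged illustration of what such a workload-account rule might look like when created with boto3 (the rule name, account IDs, bus name, and role ARN below are placeholders, not values taken from the solution's code):

import json
import boto3

events = boto3.client("events")

# Match SageMaker state-change events in this workload account.
events.put_rule(
    Name="forward-sagemaker-events",
    EventPattern=json.dumps({
        "source": ["aws.sagemaker"],
        "detail-type": ["SageMaker Training Job State Change",
                        "SageMaker Processing Job State Change",
                        "SageMaker Endpoint State Change"],
    }),
    State="ENABLED",
)

# Forward matched events to the central monitoring account's event bus.
events.put_targets(
    Rule="forward-sagemaker-events",
    Targets=[{
        "Id": "central-monitoring-bus",
        "Arn": "arn:aws:events:ap-southeast-2:111122223333:event-bus/central-monitoring",  # placeholder ARN
        "RoleArn": "arn:aws:iam::444455556666:role/EventBridgeCrossAccountRole",           # placeholder role
    }],
)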
In the centralized monitoring account, the events are captured by an EventBridge rule and further processed into different targets:

A CloudWatch log group, to use for the following:

Auditing and archive purposes. For more information, refer to the Amazon CloudWatch Logs User Guide.
Analyzing log data with CloudWatch Log Insights queries. CloudWatch Logs Insights enables you to interactively search and analyze your log data in CloudWatch Logs. You can perform queries to help you more efficiently and effectively respond to operational issues. If an issue occurs, you can use CloudWatch Logs Insights to identify potential causes and validate deployed fixes.
Support for the CloudWatch Metrics Insights query widget for high-level operations in the CloudWatch dashboard, adding CloudWatch Insights Query to dashboards, and exporting query results.

An AWS Lambda function to complete the following tasks:

Perform custom logic to augment SageMaker service events. One example is performing a metric query on the SageMaker job host’s utilization metrics when a job completion event is received.
Convert event information into metrics using the CloudWatch embedded metric format (EMF), ingested as EMF log entries (see the sketch after this list). For more information, refer to Embedding metrics within logs.
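The following is a minimal, hypothetical sketch of such a Lambda handler emitting an EMF log line for a training-job state-change event; the metric namespace and dimension names are assumptions, not the solution's actual schema.

import json
import time

def handler(event, context):
    detail = event.get("detail", {})
    emf_record = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),
            "CloudWatchMetrics": [{
                "Namespace": "CentralSageMakerMonitoring",      # assumed namespace
                "Dimensions": [["SourceAccount", "JobStatus"]],
                "Metrics": [{"Name": "TrainingJobCount", "Unit": "Count"}],
            }],
        },
        "SourceAccount": event.get("account", "unknown"),
        "JobStatus": detail.get("TrainingJobStatus", "unknown"),
        "TrainingJobCount": 1,
        "TrainingJobName": detail.get("TrainingJobName", ""),
    }
    # Anything printed to stdout lands in the Lambda's CloudWatch log group;
    # CloudWatch extracts the EMF structure into metrics automatically.
    print(json.dumps(emf_record))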

The example in this post is supported by the native CloudWatch cross-account observability feature to achieve cross-account metrics, logs, and trace access. As shown at the bottom of the architecture diagram, it integrates with this feature to enable cross-account metrics and logs. To enable this, necessary permissions and resources need to be created in both the monitoring accounts and source workload accounts.
You can use this solution for either AWS accounts managed by Organizations or standalone accounts. The following sections explain the steps for each scenario. Note that within each scenario, steps are performed in different AWS accounts. For your convenience, the account type to perform the step is highlighted at the beginning of each step.
Prerequisites
Before starting this procedure, clone our source code from the GitHub repo in your local environment or AWS Cloud9. Additionally, you need the following:

Node.js 14.15.0 (or later) and npm installed
The AWS Command Line Interface (AWS CLI) version 2 installed
The AWS CDK Toolkit
Docker Engine installed (in running state when performing the deployment procedures)

Deploy the solution in an Organizations environment
If the monitoring account and all SageMaker workload accounts are all in the same organization, the required infrastructure in the source workload accounts is created automatically via an AWS CloudFormation StackSet from the organization’s management account. Therefore, no manual infrastructure deployment into the source workload accounts is required. When a new account is created or an existing account is moved into a target organizational unit (OU), the source workload infrastructure stack will be automatically deployed and included in the scope of centralized monitoring.
Set up monitoring account resources
We need to collect the following AWS account information to set up the monitoring account resources, which we use as the inputs for the setup script later on.

Home Region: The Region where the workloads run. Example: ap-southeast-2
Monitoring account AWS CLI profile name: You can find the profile name from ~/.aws/config. This is optional; if not provided, the default AWS credentials from the chain are used.
SageMaker workload OU path: The OU path that has the SageMaker workload accounts. Keep the / at the end of the path. Example: o-1a2b3c4d5e/r-saaa/ou-saaa-1a2b3c4d/

To retrieve the OU path, you can go to the Organizations console, and under AWS accounts, find the information to construct the OU path. For the following example, the corresponding OU path is o-ye3wn3kyh6/r-taql/ou-taql-wu7296by/.

After you retrieve this information, run the following command to deploy the required resources on the monitoring account:

./scripts/organization-deployment/deploy-monitoring-account.sh

You can get the following outputs from the deployment. Keep a note of the outputs to use in the next step when deploying the management account stack.

Set up management account resources
We need to collect the following AWS account information to set up the management account resources, which we use as the inputs for the setup script later on.

Input
Description
Example

Home Region
The Region where the workloads run. This should be the same as the monitoring stack.
ap-southeast-2

Management account AWS CLI profile name
You can find the profile name from ~/.aws/config. This is optional. If not provided, it uses the default AWS credentials from the chain.
.

SageMaker workload OU ID
Here we use just the OU ID, not the path.
ou-saaa-1a2b3c4d

Monitoring account ID
The account ID where the monitoring stack is deployed to.
.

Monitoring account role name
The output for MonitoringAccountRoleName from the previous step.
.

Monitoring account event bus ARN
The output for MonitoringAccountEventbusARN from the previous step.
.

Monitoring account sink identifier
The output from MonitoringAccountSinkIdentifier from the previous step.
.

You can deploy the management account resources by running the following command:

./scripts/organization-deployment/deploy-management-account.sh

Deploy the solution in a non-Organizations environment
If your environment doesn’t use Organizations, the monitoring account infrastructure stack is deployed in a similar manner but with a few changes. However, the workload infrastructure stack needs to be deployed manually into each workload account. Therefore, this method is suitable for an environment with a limited number of accounts. For a large environment, it’s recommended to consider using Organizations.
Set up monitoring account resources
We need to collect the following AWS account information to set up the monitoring account resources, which we use as the inputs for the setup script later on.

Input
Description
Example

Home Region
The Region where the workloads run.
ap-southeast-2

SageMaker workload account list
A list of accounts that run the SageMaker workload and stream events to the monitoring account, separated by commas.
111111111111,222222222222

Monitoring account AWS CLI profile name
You can find the profile name from ~/.aws/config. This is optional. If not provided, it uses the default AWS credentials from the chain.
.

After you collect the necessary information, you can deploy the monitoring account resources by running the following command:

./scripts/individual-deployment/deploy-monitoring-account.sh

We get the following outputs when the deployment is complete. Keep a note of the outputs to use in the next step when setting up the workload account monitoring infrastructure.
Set up workload account monitoring infrastructure
We need to collect the following AWS account information to set up the workload account monitoring infrastructure, which we use as the inputs for the setup script later on.

Input
Description
Example

Home Region
The Region where the workloads run. This should be the same as the monitoring stack.
ap-southeast-2

Monitoring account ID
The account ID where the monitoring stack is deployed to.
.

Monitoring account role name
The output for MonitoringAccountRoleName from the previous step.
.

Monitoring account event bus ARN
The output for MonitoringAccountEventbusARN from the previous step.
.

Monitoring account sink identifier
The output from MonitoringAccountSinkIdentifier from the previous step.
.

Workload account AWS CLI profile name
You can find the profile name from ~/.aws/config. This is optional. If not provided, it uses the default AWS credentials from the chain.
.

You can deploy the workload account monitoring infrastructure by running the following command:

./scripts/individual-deployment/deploy-workload-account.sh

Visualize ML tasks on the CloudWatch dashboard
To check that the solution works, we need to run multiple SageMaker processing jobs and SageMaker training jobs in the workload accounts that we used in the previous sections. The CloudWatch dashboard is customizable based on your own scenarios. Our sample dashboard consists of widgets for visualizing SageMaker Processing jobs and SageMaker Training jobs. All jobs from the monitored workload accounts are displayed on this dashboard. For each type of job, we show three widgets: the total number of jobs, the number of failed jobs, and the details of each job. In our example, we have two workload accounts. Through this dashboard, we can easily see that one workload account has both processing jobs and training jobs, while the other workload account only has training jobs. As with other CloudWatch dashboard widgets, we can set the refresh interval, specify the graph type, and zoom in or out, or we can run actions such as downloading query results as a CSV file.
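To generate some activity for the dashboard, you can run the notebook provided in the GitHub repo, or launch a simple job yourself; the following is a minimal sketch that starts a SageMaker Processing job from a workload account (preprocess.py is a placeholder script you would supply).

import sagemaker
from sagemaker.sklearn.processing import SKLearnProcessor

role = sagemaker.get_execution_role()

# A small scikit-learn Processing job; its state changes are emitted to EventBridge
# and surface on the centralized dashboard.
processor = SKLearnProcessor(
    framework_version="1.0-1",
    role=role,
    instance_type="ml.m5.xlarge",
    instance_count=1,
)
processor.run(code="preprocess.py")
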
Customize your dashboard
The solution provided in the GitHub repo includes monitoring for both SageMaker Training jobs and SageMaker Processing jobs. If you want to add more dashboards to monitor other SageMaker jobs, such as batch transform jobs, you can follow the instructions in this section to customize your dashboard (a hypothetical sketch for batch transform jobs follows the Lambda code below). By modifying the index.py file, you can customize the fields that you want to display on the dashboard. You can access all details that are captured by CloudWatch through EventBridge. In the Lambda function, you can choose the fields that you want to display on the dashboard. See the following code:

@metric_scope
def lambda_handler(event, context, metrics):

    try:
        event_type = None
        try:
            event_type = SAGEMAKER_STAGE_CHANGE_EVENT(event["detail-type"])
        except ValueError as e:
            print("Unexpected event received")

        if event_type:
            account = event["account"]
            detail = event["detail"]

            job_detail = {
                "DashboardQuery": "True"
            }
            job_detail["Account"] = account
            job_detail["JobType"] = event_type.name

            metrics.set_dimensions({"account": account, "jobType": event_type.name}, use_default=False)
            metrics.set_property("JobType", event_type.value)

            if event_type == SAGEMAKER_STAGE_CHANGE_EVENT.PROCESSING_JOB:
                job_status = detail.get("ProcessingJobStatus")

                metrics.set_property("JobName", detail.get("ProcessingJobName"))
                metrics.set_property("ProcessingJobArn", detail.get("ProcessingJobArn"))

                job_detail["JobName"] = detail.get("ProcessingJobName")
                job_detail["ProcessingJobArn"] = detail.get("ProcessingJobArn")
                job_detail["Status"] = job_status
                job_detail["StartTime"] = detail.get("ProcessingStartTime")
                job_detail["InstanceType"] = detail.get("ProcessingResources").get("ClusterConfig").get("InstanceType")
                job_detail["InstanceCount"] = detail.get("ProcessingResources").get("ClusterConfig").get("InstanceCount")
                if detail.get("FailureReason"):
                    job_detail["FailureReason"] = detail.get("FailureReason")
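
Following the same pattern, a hypothetical helper for batch transform jobs might look like the following; the enum member and event field names are assumptions based on the SageMaker Transform Job State Change event, and the helper is a sketch rather than part of the original solution.

def extract_transform_job_detail(event):
    # Build a job_detail-style dictionary for a "SageMaker Transform Job State Change" event.
    detail = event["detail"]
    return {
        "DashboardQuery": "True",
        "Account": event["account"],
        "JobType": "TRANSFORM_JOB",
        "JobName": detail.get("TransformJobName"),
        "TransformJobArn": detail.get("TransformJobArn"),
        "Status": detail.get("TransformJobStatus"),
        "InstanceType": detail.get("TransformResources", {}).get("InstanceType"),
        "InstanceCount": detail.get("TransformResources", {}).get("InstanceCount"),
        "FailureReason": detail.get("FailureReason"),
    }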

To customize the dashboard or widgets, you can modify the source code in the monitoring-account-infra-stack.ts file. Note that the field names you use in this file should be the same as those (the keys of job_detail) defined in the Lambda file:

// CloudWatch Dashboard
const sagemakerMonitoringDashboard = new cloudwatch.Dashboard(
  this, 'sagemakerMonitoringDashboard',
  {
    dashboardName: Parameters.DASHBOARD_NAME,
    widgets: []
  }
);

// Processing Job
const processingJobCountWidget = new cloudwatch.GraphWidget({
  title: "Total Processing Job Count",
  stacked: false,
  width: 12,
  height: 6,
  left: [
    new cloudwatch.MathExpression({
      expression: `SEARCH('{${AWS_EMF_NAMESPACE},account,jobType} jobType="PROCESSING_JOB" MetricName="ProcessingJobCount_Total"', 'Sum', 300)`,
      searchRegion: this.region,
      label: "${PROP('Dim.account')}",
    })
  ]
});
processingJobCountWidget.position(0, 0);

const processingJobFailedWidget = new cloudwatch.GraphWidget({
  title: "Failed Processing Job Count",
  stacked: false,
  width: 12,
  height: 6,
  right: [
    new cloudwatch.MathExpression({
      expression: `SEARCH('{${AWS_EMF_NAMESPACE},account,jobType} jobType="PROCESSING_JOB" MetricName="ProcessingJobCount_Failed"', 'Sum', 300)`,
      searchRegion: this.region,
      label: "${PROP('Dim.account')}",
    })
  ]
});
processingJobFailedWidget.position(12, 0);

const processingJobInsightsQueryWidget = new cloudwatch.LogQueryWidget(
  {
    title: 'SageMaker Processing Job History',
    logGroupNames: [ingesterLambda.logGroup.logGroupName],
    view: cloudwatch.LogQueryVisualizationType.TABLE,
    queryLines: [
      'sort @timestamp desc',
      'filter DashboardQuery == "True"',
      'filter JobType == "PROCESSING_JOB"',
      'fields Account, JobName, Status, Duration, InstanceCount, InstanceType, Host, fromMillis(StartTime) as StartTime, FailureReason',
      'fields Metrics.CPUUtilization as CPUUtil, Metrics.DiskUtilization as DiskUtil, Metrics.MemoryUtilization as MemoryUtil',
      'fields Metrics.GPUMemoryUtilization as GPUMemoryUtil, Metrics.GPUUtilization as GPUUtil',
    ],
    width: 24,
    height: 6,
  }
);
processingJobInsightsQueryWidget.position(0, 6);
sagemakerMonitoringDashboard.addWidgets(processingJobCountWidget);
sagemakerMonitoringDashboard.addWidgets(processingJobFailedWidget);
sagemakerMonitoringDashboard.addWidgets(processingJobInsightsQueryWidget);

After you modify the dashboard, you need to redeploy this solution from scratch. You can run the Jupyter notebook provided in the GitHub repo to rerun the SageMaker pipeline, which will launch the SageMaker Processing jobs again. When the jobs are finished, you can go to the CloudWatch console, and under Dashboards in the navigation pane, choose Custom Dashboards. You can find the dashboard named SageMaker-Monitoring-Dashboard.
Clean up
If you no longer need this custom dashboard, you can clean up the resources. To delete all the resources created, use the code in this section. The cleanup is slightly different for an Organizations environment vs. a non-Organizations environment.
For an Organizations environment, use the following code:

make destroy-management-stackset # Execute against the management account
make destroy-monitoring-account-infra # Execute against the monitoring account

For a non-Organizations environment, use the following code:

make destroy-workload-account-infra # Execute against each workload account
make destroy-monitoring-account-infra # Execute against the monitoring account

Alternatively, you can log in to the monitoring account, workload account, and management account to delete the stacks from the CloudFormation console.
Conclusion
In this post, we discussed the implementation of a centralized monitoring and reporting solution for SageMaker using CloudWatch. By following the step-by-step instructions outlined in this post, you can create a multi-account monitoring dashboard that displays key metrics and consolidates logs related to your various SageMaker jobs from different accounts in near-real time. With this centralized monitoring dashboard, you can have better visibility into the activities of SageMaker jobs across multiple accounts, troubleshoot issues more quickly, and make informed decisions based on real-time data. Overall, the implementation of a centralized monitoring and reporting solution using CloudWatch offers an efficient way for organizations to manage their cloud-based ML infrastructure and resource utilization.
Try out the solution and send us your feedback, either in the AWS forum for Amazon SageMaker or through your usual AWS contacts.
To learn more about the cross-account observability feature, refer to the blog post Amazon CloudWatch Cross-Account Observability.

About the Authors
Jie Dong is an AWS Cloud Architect based in Sydney, Australia. Jie is passionate about automation and loves to develop solutions that help customers improve productivity. Event-driven systems and serverless frameworks are his expertise. In his own time, Jie loves to work on building smart home systems and exploring new smart home gadgets.
Melanie Li, PhD, is a Senior AI/ML Specialist TAM at AWS based in Sydney, Australia. She helps enterprise customers build solutions using state-of-the-art AI/ML tools on AWS and provides guidance on architecting and implementing ML solutions with best practices. In her spare time, she loves to explore nature and spend time with family and friends.
Gordon Wang is a Senior AI/ML Specialist TAM at AWS. He supports strategic customers with AI/ML best practices across many industries. He is passionate about computer vision, NLP, generative AI, and MLOps. In his spare time, he loves running and hiking.

This AI Research Introduces LISA: Large Language Instructed Segmentati …

Imagine you want to have coffee, and you instruct a robot to make it. Your instruction is simply “Make a cup of coffee,” not step-by-step directions such as “Go to the kitchen, find the coffee machine, and switch it on.” Existing systems rely on explicit human instructions to identify a targeted object; they lack the ability to reason about and actively comprehend the user’s intentions. To tackle this, researchers at Microsoft Research, the University of Hong Kong, and SmartMore propose a new task called reasoning segmentation. This self-reasoning ability is crucial for developing next-generation intelligent perception systems.

Reasoning segmentation involves producing a segmentation mask for a complex and implicit query text. The researchers also create a benchmark comprising over a thousand image-instruction pairs that require reasoning and world knowledge for evaluation. They built an assistant, similar to Google Assistant and Siri, called Language Instructed Segmentation Assistant (LISA). It inherits the language generation capabilities of a multi-modal large language model while possessing the ability to produce segmentation masks.

LISA can handle complex reasoning, world knowledge, explanatory answers, and multi-turn conversations. The researchers say their model demonstrates robust zero-shot ability when trained on reasoning-free datasets. Fine-tuning the model with just 239 reasoning segmentation image-instruction pairs further improved its performance.

The reasoning segmentation task differs from previous referring segmentation in that it requires the model to possess reasoning ability or access world knowledge. Only by completely understanding the query can the model perform the task well. The researchers say their method unlocks new reasoning segmentation capabilities that prove effective on both complex and standard queries.

The researchers used a training dataset that does not include any reasoning segmentation samples; it contained only instances where the target objects were explicitly indicated in the query text. Even without a complex reasoning training dataset, they found that LISA demonstrated impressive zero-shot ability on ReasonSeg (the benchmark).

The researchers find that LISA accomplishes complex reasoning tasks with more than a 20% gIoU performance boost, where gIoU is the average of all per-image Intersection-over-Unions (IoUs). They also find that LISA-13B outperforms the 7B variant in long-query scenarios, which implies that a stronger multi-modal LLM might lead to even better performance. The researchers also show that their model is competent at vanilla referring segmentation tasks.
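
As a concrete illustration, gIoU as described here can be computed as the mean of per-image IoUs between predicted and ground-truth binary masks; the following is a minimal sketch, not the authors’ evaluation code.

import numpy as np

def giou(pred_masks, gt_masks, eps=1e-7):
    # Average the per-image intersection-over-union of binary masks.
    ious = []
    for pred, gt in zip(pred_masks, gt_masks):
        pred, gt = pred.astype(bool), gt.astype(bool)
        intersection = np.logical_and(pred, gt).sum()
        union = np.logical_or(pred, gt).sum()
        ious.append(intersection / (union + eps))
    return float(np.mean(ious))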

Their future work will place more emphasis on self-reasoning ability, which is crucial for building a genuinely intelligent perception system, and they note that establishing a benchmark is essential for evaluation and encourages the community to develop new techniques.

Check out the Paper and GitHub.


SalesForce Releases Einstein Studio And The Ability To Bring Your Own …

As part of its Data Cloud service, Salesforce unveiled a new AI and generative AI model training tool called Einstein Studio. To get the most out of their AI and data investments, businesses can now leverage Einstein Studio, a new, user-friendly “bring your own model” (BYOM) solution that allows enterprises to deploy their own unique AI models to power any sales, service, marketing, commerce, and IT application within Salesforce.

With Einstein Studio, data scientists and engineering teams can efficiently and cheaply maintain and deploy AI models. Salesforce’s Data Cloud makes it simple for businesses to train models with their private data using its curated AI model ecosystem, including AWS’s Amazon SageMaker, Google Cloud’s Vertex AI, and other AI services.

Data Cloud is the first real-time data platform for customer relationship management (CRM), and Einstein Studio uses it to train artificial intelligence models. To speed up the delivery of complete AI, this BYOM solution will allow users to combine their unique AI models with readymade LLMs provided by Einstein GPT.

Incorporating trustworthy, open, and real-time AI experiences into every application and process, Einstein Studio makes it simple to run and deploy enterprise-ready AI across the whole business.

How does it work?

By importing client information from Data Cloud into Einstein Studio, businesses can train AI models tailored to their unique problems using data they own. With Einstein Studio’s BYOM solution, companies can train their preferred AI model with Data Cloud, which unifies customer data from all sources into a unified profile that dynamically responds to each customer’s actions in real-time.  

By providing a pre-built, zero-ETL connection, Einstein Studio shortens the time taken to train AI models by simplifying data transfer between systems. Using the “point and click” interface of Einstein Studio, technical teams can easily access their data in Data Cloud and create and train their AI models for use in all Salesforce applications. This method generates up-to-date and useful client information for AI forecasting and content automation.

Data scientists and engineers can now manage their data access to AI platforms for training with the help of Einstein Studio’s centralized management interface. 

The zero-ETL framework in Einstein Studio eliminates the requirement for laborious system-to-system data integration, allowing businesses to run their own unique AI models. Clients can save time and money by expediting AI deployment and avoiding the extract, transform, and load (ETL) procedure when connecting Data Cloud to other AI tools.

For example, Einstein Studio may integrate with LLMs to automatically send maintenance reminder emails that help prevent costly breakdowns. By linking to a graph database built from data within Salesforce, Salesforce hopes to decrease hallucinations, which occur when the model makes things up because it doesn’t have a solid answer. In this way, the LLM has access to a comprehensive view of the relevant data for a given customer, empowering the model to produce a more personalized email based on the facts it finds.

Analysts predict that the built-in capabilities of the technology, such as zero-ETL (extract, transform, and load), will help businesses save money, time, and effort while accelerating their time to market.

The company claims that other elements included in Einstein Studio can help businesses serve models and monitor them for anomalies. The new tool can also help businesses connect data to artificial intelligence or large language models built on other platforms, such as Amazon SageMaker and Google Vertex AI.

Advantages

Increase sales, reduce customer churn, and deliver remarkable service. Using the customer’s data, a brand can train an AI model to provide individualized service across all brand channels.

Using the selected technologies to construct machine learning models can boost the productivity of the data team. Through Data Cloud, customers who have already built models with Amazon SageMaker or Google Cloud’s Vertex AI can leverage Salesforce data to train those models. 

Extract additional value from the data without costly integration overhauls. Companies may leverage their customer data to train stronger machine learning models if they can easily access Salesforce data with their existing AI technologies. 

Leverage the knowledge and resources that you already have in IT and AI. Teams can use their unique AI models stored in Data Cloud and invoke them using Einstein Studio. The Salesforce Platform allows customers to transform the results of artificial intelligence models into actionable insights that can be used to direct flow automation, activate Apex code, or provide salespeople and contact center workers with information via AI outputs surfaced across the Lightning experience.

Financial institutions can use real-time customer engagement data to develop bespoke cross-selling models to assist advisers in suggesting complementary goods and services.

The customer’s demographics, buying history, and other criteria can be used to categorize them into distinct groups, which retailers can target with tailored product recommendations, pricing, and other services. 

Automakers can anticipate when a vehicle will require service, identify false insurance claims, and tailor their marketing strategies to each potential customer.

Salesforce said it would provide a dashboard for data scientists and engineers to manage data flow to their preferred AI training platforms. The service, which several companies have piloted, is now available to all Salesforce Data Cloud customers.

Check out the SF Release and Model.


15+ AI Tools For Developers (August 2023)

Otter AI

Using artificial intelligence, Otter.AI empowers users with real-time transcriptions of meeting notes that are shareable, searchable, accessible, and secure. Get a meeting assistant that records audio, writes notes, automatically captures slides, and generates summaries.

Notion AI

Inside the Notion workspace, the Notion AI assistant can help with various writing-related tasks, including creative writing, revision, and summarization. It improves the speed and quality of writing things like emails, job descriptions, and blog posts. Notion AI can be used to automate a wide variety of writing tasks, from blogs and lists to brainstorming sessions and creative writing. The AI-generated content in Notion can be easily reorganized and transformed using the tool’s drag-and-drop text editor.

Gretel.ai

Gretel AI is a platform for creating synthetic data that mimics real data but keeps users’ privacy intact. Gretel.ai’s APIs make it simple for programmers to create anonymous, encrypted synthetic data, which boosts innovation while preserving private data. The platform has all the features needed to quickly and easily train AI models, validate use cases, and generate data as needed. Sample notebooks or user-friendly online apps that cater to technical and non-technical users allow developers to explore synthetic data. Gretel AI will enable programmers to use synthetic data while meeting all necessary privacy requirements.

Pieces for Developers

Pieces for Developers is an AI-powered snippet manager that can save, create, enrich, reuse, and distribute code across your development process. The desktop software and suite of integrations with existing developer tools increase your efficiency when conducting research in a web browser, working with a team, and writing code in an integrated development environment (IDE). In one potent, centralized tool, you can produce code tailored to your specific repository, extract code from screenshots, and automatically add inline comments to your code. Save time and effort when you code with their free resources.

LangChain

The LangChain framework was created to reduce the complexity of working with large language models in software applications. It simplifies working with language models by providing modular abstractions and implementations for the various components. Developers can also quickly create and tweak apps for niche uses like document analysis, chatbots, and code analysis with the help of LangChain’s use-case-specific chains. In sum, LangChain equips programmers with the tools to use language models efficiently and create cutting-edge software.

YOU

You.com is an AI-powered search engine that protects users’ privacy and offers a personalized search experience. It’s a whole suite of applications with many useful AI-powered capabilities and functions. You can use artificial intelligence to create blog posts, emails, and social media updates with YOUwrite, and discover and make gorgeous AI-generated photos. A code-mode AI chat allows you to write code and receive assistance during development, and a study-mode chat gives you access to materials from around the web so you can study or acquire new abilities.

AgentGPT

AgentGPT is a web-based system that facilitates the development and distribution of user-created autonomous AI agents. Agents created by users will strive to accomplish the aim the user specifies after being given a name and an objective. The agents reach their goal iteratively using a cascade of language models for reasoning, carrying out actions, assessing results, and creating fresh assignments. AgentGPT provides developers a potent instrument for building individualized AI agents to achieve various objectives.

Jam

Thousands of teams rely on Jam.dev because of its user-friendly nature. Bugs can be reported quickly without interfering with development processes, and detailed reports can be generated that include browser and operating system details, console logs, user actions, network logs, and related services. It can enhance bug reporting on any preferred platform by seamlessly integrating with common issue trackers and tools. In addition, Jam includes JamGPT, an AI debugging helper that can quickly evaluate bug reports, find correlations, and offer solutions. JamGPT is a free add-on for Jam users and a ChatGPT-style assistant that launches instantly, is available only on macOS, and can be opened with a keyboard shortcut.

Decktopus AI

With Decktopus, engineers and product managers can make professional-looking presentations quickly, easily, and without the need for any prior experience with design. It helps people focus on what matters by freeing up time and energy. Decktopus helps developers efficiently communicate with various audiences by creating engaging presentations for project updates, technical documentation, and product demos. Incorporating a wide range of themes, layouts, and design options allows product managers to create compelling presentations for product roadmaps, market research, and customer feedback. Forms, voice recording, custom domain connection, webhook capability, and multimedia embedding are just some of the built-in capabilities that Decktopus provides to improve any presentation.

ChatPDF

ChatPDF is an artificial intelligence platform that reads and interprets PDFs like a human conversation partner. It’s like ChatGPT, but it’s tailored to writing academic papers. Submit a PDF to their website to start using ChatPDF. It’s a chat room tailored solely to real-time PDF communication. Users may now quickly and easily execute tasks and extract information from massive PDF files with the help of this program. It can read PDFs written in any language and communicate with users in any language.

Durable

Using AI, Durable can help you create a website in less than a minute. Within seconds, its AI-powered website generator can produce a fully functional website with graphics and text. If you own a small business and don’t know how to code, this is the tool for you. A basic editor allows for site updates, and a new design can be generated with just a few lines of AI-written instructions. No complicated procedure is required to get a website, CRM, analytics, and supplemental invoicing. Durable makes it easy for developers to set up a website for their work in a matter of seconds, so you can write less code and build more.

Leap AI

Developers can access Leap AI’s AI APIs, which cover many different types of artificial intelligence, such as image recognition, text analysis, and NLP. The intuitive design of Leap AI’s APIs makes it possible for programmers without AI expertise to use them effectively, and requests to these APIs can be scaled to meet your specific requirements. You can count on Leap AI’s APIs to work reliably and be accessible whenever needed. Leap AI is a great option if you need a supplier with a wide range of services, simple APIs, and scalability, and it integrates with 5,000+ other programs without touching a line of code.

AssemblyAI

Regarding artificial intelligence models for speech transcription and understanding, AssemblyAI is the gold standard platform. Their simple API gives you access to state-of-the-art AI models that can summarize speeches and identify their speakers. AssemblyAI, built on state-of-the-art AI research, provides trustworthy and scalable models via a private API that a wide range of businesses and organizations rely on worldwide. AssemblyAI provides developers with extensive resources, such as tutorials and documentation, making it simple to connect the API and create novel solutions that utilize voice recognition and understanding. To effectively transcribe and comprehend speech data in their projects, developers can leverage AssemblyAI’s cutting-edge AI models.

Microsoft Designer

Signs, invitations, logos, social media posts, and website banners are some of the many things that can be made with Microsoft Designer. Thanks to its AI features, you can quickly start designing with your own images or AI-generated alternatives. It helps you from the moment of inspiration to the moment of completion in your creative process. Powered by artificial intelligence, it can create eye-catching graphics and visuals based on your input, in addition to offering writing help and auto-suggesting layouts. Using AI-generated graphic design, it can assist you in spreading the word about your apps and products.

SuperAGI

SuperAGI is an accessible open-source system for creating and deploying intelligent agents. Easy AI agent development and management is enabled via a graphical user interface, an action console, concurrent agents, and several database configuration choices. SuperAGI is an autonomous AI framework that aims to make programming these agents easier for programmers. Recently, it introduced SuperCoder, a SuperAGI agent template for developing basic apps by predefined requirements.

Replicate

Replicate is a service that helps programmers work more efficiently with machine learning. Open-source models can be executed with its scalable API without requiring in-depth familiarity with machine learning. Replicate provides a Python library that developers can use, or they can issue API queries with other tools. Experts in many different areas of machine learning share their models on this platform, covering everything from language processing to video creation. Together with other technologies like Next.js and Vercel, Replicate allows developers to implement their ideas quickly and showcase their work on sites like Hacker News. Replicate also makes it easier to deploy models by integrating the open-source tool Cog, which containerizes models for use in production. Overall, Replicate facilitates quick and painless machine learning integration.

Hugging Face

You can create, train, and deploy state-of-the-art models with Hugging Face since it is an AI community driving the future of machine learning. Hugging Face is a community of over 5,000 businesses working together to solve problems in audio, vision, and language using artificial intelligence. Several machine learning models, including Flair, Asteroid, ESPnet, and Pyannote, are supported by their open-source natural language processing framework, Transformers. For advanced language modeling, Hugging Face provides an Inference API for streamlined model deployment and the creation of novel technologies like T0 Multitask Prompted Training, DistilBERT, HMTL, and Dynamic Language Models.
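
As a quick taste of the Transformers library mentioned above, the following minimal sketch runs an off-the-shelf model through the pipeline API; the task and input text are illustrative only.

from transformers import pipeline

# Downloads a default sentiment-analysis model on first use.
classifier = pipeline("sentiment-analysis")
print(classifier("Hugging Face makes it easy to try state-of-the-art models."))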

Pinecone

Pinecone’s scalability and user-friendliness make it ideal for creating high-performance vector search apps. Its low latency and minimal overhead facilitate the research-to-production pipeline without requiring DevOps. Launching, utilizing, and scaling your AI solution is a breeze with Pinecone, and there’s no need to worry about infrastructure upkeep or algorithm problems.

Midjourney

Midjourney is an artificial intelligence (AI)-driven program that creates breathtaking photographs with cutting-edge algorithms and hardware. It’s a helpful resource for programmers since it lets them make engaging visuals for their websites, apps, and games. In addition, developers can use Midjourney to experiment with AI and ML methods and incorporate them into their work. Midjourney is a potent tool for developers since it allows them to experiment with state-of-the-art AI approaches while also improving the visual appeal of their work.



Generate creative advertising using generative AI deployed on Amazon S …

Creative advertising has the potential to be revolutionized by generative AI (GenAI). You can now create a wide variation of novel images, such as product shots, by retraining a GenAI model and providing a few inputs into the model, such as textual prompts (sentences describing the scene and objects to be produced by the model). This technique has shown promising results starting in 2022 with the explosion of a new class of foundation models (FMs) called latent diffusion models such as Stable Diffusion, Midjourney, and Dall-E-2. However, to use these models in production, the generation process requires constant refining to generate consistent outputs. This often means creating a large number of sample images of the product and clever prompt engineering, which makes the task difficult at scale.
In this post, we explore how this transformative technology can be harnessed to generate captivating and innovative advertisements at scale, especially when dealing with large catalogs of images. By using the power of GenAI, specifically through the technique of inpainting, we can seamlessly create image backgrounds, resulting in visually stunning and engaging content and reducing unwanted image artifacts (termed model hallucinations). We also delve into the practical implementation of this technique by utilizing Amazon SageMaker endpoints, which enable efficient deployment of the GenAI models driving this creative process.
We use inpainting as the key technique within GenAI-based image generation because it offers a powerful solution for replacing missing elements in images. However, this presents certain challenges. For instance, precise control over the positioning of objects within the image can be limited, leading to potential issues such as image artifacts, floating objects, or unblended boundaries, as shown in the following example images.
  
To overcome this, we propose in this post to strike a balance between creative freedom and efficient production by generating a multitude of realistic images using minimal supervision. To scale the proposed solution for production and streamline the deployment of AI models in the AWS environment, we demonstrate it using SageMaker endpoints.
In particular, we propose to split the inpainting process into a set of layers, each one potentially with a different set of prompts. The process can be summarized as the following steps:

First, we prompt for a general scene (for example, “park with trees in the back”) and randomly place the object on that background.
Next, we add a layer in the lower mid-section of the object by prompting where the object lies (for example, “picnic on grass, or wooden table”).
Finally, we add a layer similar to the background layer on the upper mid-section of the object using the same prompt as the background.

The benefit of this process is improved realism of the object, because it’s perceived with better scaling and positioning relative to the background environment, which matches human expectations. The following figure shows the steps of the proposed solution.

Solution overview
To accomplish these tasks, the following data flow is considered:

Segment Anything Model (SAM) and Stable Diffusion Inpainting models are hosted in SageMaker endpoints.
A background prompt is used to create a generated background image using the Stable Diffusion model.
A base product image is passed through SAM to generate a mask. The inverse of the mask is called the anti-mask.
The generated background image and the mask, along with foreground prompts and negative prompts, are used as input to the Stable Diffusion Inpainting model to generate an intermediate background image.
Similarly, the generated background image and the anti-mask, along with foreground prompts and negative prompts, are used as input to the Stable Diffusion Inpainting model to generate an intermediate foreground image.
The final generated product image is obtained by combining the intermediate foreground image and the intermediate background image (a minimal sketch of the anti-mask and compositing steps follows this list).
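
The following is a minimal sketch of those two steps (mask inversion and final compositing) with NumPy and Pillow; the intermediate file names are placeholders, and the endpoint code later in this post performs the equivalent logic internally.

import numpy as np
from PIL import Image

# The mask is a single-channel image where white marks the product.
mask = Image.open("images/speaker_mask.png").convert("L")
anti_mask = Image.fromarray(255 - np.array(mask))  # inverse of the mask

# Placeholder paths for the two intermediate images produced by the inpainting model.
foreground = Image.open("images/intermediate_foreground.png").convert("RGB")
background = Image.open("images/intermediate_background.png").convert("RGB")

# Keep the product (white mask region) from the foreground image and fill the rest
# from the generated background; all images are assumed to share the same size.
binary_mask = mask.point(lambda p: 255 if p > 127 else 0)
final_image = Image.composite(foreground, background, binary_mask)
final_image.save("images/final_product_image.png")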

Prerequisites
We have developed an AWS CloudFormation template that will create the SageMaker notebooks used to deploy the endpoints and run inference.
You will need an AWS account with AWS Identity and Access Management (IAM) roles that provide access to the following:

AWS CloudFormation
SageMaker

Although SageMaker endpoints provide instances to run ML models, to run heavy workloads like generative AI models we use GPU-enabled SageMaker endpoints. Refer to Amazon SageMaker Pricing for more information about pricing.
We use the NVIDIA A10G-enabled instance ml.g5.2xlarge to host the models.

Amazon Simple Storage Service (Amazon S3)

For more details, check out the GitHub repository and the CloudFormation template.
Mask the area of interest of the product
In general, we need to provide an image of the object that we want to place and a mask delineating the contour of the object. This can be done using tools such as Amazon SageMaker Ground Truth. Alternatively, we can automatically segment the object using an AI tool such as the Segment Anything Model (SAM), assuming that the object is in the center of the image.
Use SAM to generate a mask
With SAM, an advanced generative AI technique, we can effortlessly generate high-quality masks for various objects within images. SAM uses deep learning models trained on extensive datasets to accurately identify and segment objects of interest, providing precise boundaries and pixel-level masks. This breakthrough technology revolutionizes image processing workflows by automating the time-consuming and labor-intensive task of manually creating masks. With SAM, businesses and individuals can now rapidly generate masks for object recognition, image editing, computer vision tasks, and more, unlocking a world of possibilities for visual analysis and manipulation.
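Outside of SageMaker, the same kind of mask can be produced locally with the open-source segment-anything package; the following minimal sketch assumes you have downloaded the ViT-H checkpoint and, as above, that the product sits near the center of the image. The endpoint code that follows wraps similar logic behind a SageMaker endpoint.

import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamPredictor

# Load the ViT-H SAM checkpoint (path is an assumption; download it from the SAM repository).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = np.array(Image.open("images/speaker.png").convert("RGB"))
predictor.set_image(image)

# Prompt SAM with a single point at the image center, assuming the product is centered.
h, w = image.shape[:2]
masks, scores, _ = predictor.predict(
    point_coords=np.array([[w // 2, h // 2]]),
    point_labels=np.array([1]),
    multimask_output=True,
)
best_mask = masks[int(np.argmax(scores))]
Image.fromarray((best_mask * 255).astype(np.uint8)).save("images/speaker_mask.png")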
Host the SAM model on a SageMaker endpoint
We use the notebook 1_HostGenAIModels.ipynb to create SageMaker endpoints and host the SAM model.
We use the inference code in inference_sam.py and package that into a code.tar.gz file, which we use to create the SageMaker endpoint. The code downloads the SAM model, hosts it on an endpoint, and provides an entry point to run inference and generate output:
from datetime import datetime
from sagemaker import s3
from sagemaker.pytorch import PyTorchModel
from sagemaker.deserializers import JSONDeserializer

SAM_ENDPOINT_NAME = 'sam-pytorch-' + str(datetime.utcnow().strftime('%Y-%m-%d-%H-%M-%S-%f'))
prefix_sam = "SAM/demo-custom-endpoint"
model_data_sam = s3.S3Uploader.upload("code.tar.gz", f's3://{bucket}/{prefix_sam}')
model_sam = PyTorchModel(entry_point='inference_sam.py',
                         model_data=model_data_sam,
                         framework_version='1.12',
                         py_version='py38',
                         role=role,
                         env={'TS_MAX_RESPONSE_SIZE': '2000000000', 'SAGEMAKER_MODEL_SERVER_TIMEOUT': '300'},
                         sagemaker_session=sess,
                         name='model-' + SAM_ENDPOINT_NAME)
predictor_sam = model_sam.deploy(initial_instance_count=1,
                                 instance_type=INSTANCE_TYPE,
                                 deserializer=JSONDeserializer(),
                                 endpoint_name=SAM_ENDPOINT_NAME)
Invoke the SAM model and generate a mask
The following code is part of the 2_GenerateInPaintingImages.ipynb notebook, which is used to run the endpoints and generate results:
import numpy as np
from PIL import Image
from sagemaker.pytorch import PyTorchPredictor
from sagemaker.deserializers import JSONDeserializer

raw_image = Image.open("images/speaker.png").convert("RGB")
predictor_sam = PyTorchPredictor(endpoint_name=SAM_ENDPOINT_NAME,
                                 deserializer=JSONDeserializer())
output_array = predictor_sam.predict(raw_image, initial_args={'Accept': 'application/json'})
mask_image = Image.fromarray(np.array(output_array).astype(np.uint8))
# save the mask image using PIL Image
mask_image.save('images/speaker_mask.png')
The following figure shows the resulting mask obtained from the product image.

Use inpainting to create a generated image
By combining the power of inpainting with the mask generated by SAM and the user’s prompt, we can create remarkable generated images. Inpainting utilizes advanced generative AI techniques to intelligently fill in the missing or masked regions of an image, seamlessly blending them with the surrounding content. With the SAM-generated mask as guidance and the user’s prompt as a creative input, inpainting algorithms can generate visually coherent and contextually appropriate content, resulting in stunning and personalized images. This fusion of technologies opens up endless creative possibilities, allowing users to transform their visions into vivid, captivating visual narratives.
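To make the mechanics concrete, the following is a minimal local sketch of Stable Diffusion inpainting with the Hugging Face diffusers library; the model ID, prompts, and file names are example values, and the SageMaker endpoint used in this post wraps comparable logic in inference_inpainting.py.

import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

image = Image.open("images/speaker.png").convert("RGB").resize((512, 512))
product_mask = Image.open("images/speaker_mask.png").convert("L").resize((512, 512))

# diffusers repaints the white region of mask_image, so invert the product mask
# to regenerate the background while keeping the product untouched.
background_mask = Image.fromarray(255 - np.array(product_mask))

result = pipe(
    prompt="window and couch, table",
    negative_prompt="lowres, bad anatomy, worst quality",
    image=image,
    mask_image=background_mask,
).images[0]
result.save("images/speaker_background_inpainted.png")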
Host a Stable Diffusion Inpainting model on a SageMaker endpoint
Similar to the SAM model, we use the notebook 1_HostGenAIModels.ipynb to create SageMaker endpoints and host the Stable Diffusion Inpainting model.
We use the inference code in inference_inpainting.py and package that into a code.tar.gz file, which we use to create the SageMaker endpoint. The code downloads the Stable Diffusion Inpainting model, hosts it on an endpoint, and provides an entry point to run inference and generate output:
from sagemaker.serializers import JSONSerializer

INPAINTING_ENDPOINT_NAME = 'inpainting-pytorch-' + str(datetime.utcnow().strftime('%Y-%m-%d-%H-%M-%S-%f'))
prefix_inpainting = "InPainting/demo-custom-endpoint"
model_data_inpainting = s3.S3Uploader.upload("code.tar.gz", f"s3://{bucket}/{prefix_inpainting}")

model_inpainting = PyTorchModel(entry_point='inference_inpainting.py',
                                model_data=model_data_inpainting,
                                framework_version='1.12',
                                py_version='py38',
                                role=role,
                                env={'TS_MAX_RESPONSE_SIZE': '2000000000', 'SAGEMAKER_MODEL_SERVER_TIMEOUT': '300'},
                                sagemaker_session=sess,
                                name='model-' + INPAINTING_ENDPOINT_NAME)

predictor_inpainting = model_inpainting.deploy(initial_instance_count=1,
                                               instance_type=INSTANCE_TYPE,
                                               serializer=JSONSerializer(),
                                               deserializer=JSONDeserializer(),
                                               endpoint_name=INPAINTING_ENDPOINT_NAME,
                                               volume_size=128)
Invoke the Stable Diffusion Inpainting model and generate a new image
Similarly to the step to invoke the SAM model, the notebook 2_GenerateInPaintingImages.ipynb is used to run the inference on the endpoints and generate results:
raw_image = Image.open("images/speaker.png").convert("RGB")
mask_image = Image.open('images/speaker_mask.png').convert('RGB')
prompt_fr = "table and chair with books"
prompt_bg = "window and couch, table"
negative_prompt = "longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, letters"

inputs = {}
inputs["image"] = np.array(raw_image)
inputs["mask"] = np.array(mask_image)
inputs["prompt_fr"] = prompt_fr
inputs["prompt_bg"] = prompt_bg
inputs["negative_prompt"] = negative_prompt

predictor_inpainting = PyTorchPredictor(endpoint_name=INPAINTING_ENDPOINT_NAME,
                                        serializer=JSONSerializer(),
                                        deserializer=JSONDeserializer())

output_array = predictor_inpainting.predict(inputs, initial_args={'Accept': 'application/json'})
gai_image = Image.fromarray(np.array(output_array[0]).astype(np.uint8))
gai_background = Image.fromarray(np.array(output_array[1]).astype(np.uint8))
gai_mask = Image.fromarray(np.array(output_array[2]).astype(np.uint8))
post_image = Image.fromarray(np.array(output_array[3]).astype(np.uint8))

# save the generated image using PIL Image
post_image.save('images/speaker_generated.png')
The following figure shows the refined mask, generated background, generated product image, and postprocessed image.

The generated product image uses the following prompts:

Background generation – “chair, couch, window, indoor”
Inpainting – “besides books”

Clean up
In this post, we use two GPU-enabled SageMaker endpoints, which contribute the majority of the cost. These endpoints should be turned off to avoid extra cost when they are not being used. We have provided a notebook, 3_CleanUp.ipynb, which can assist in cleaning up the endpoints. We also use a SageMaker notebook to host the models and run inference; therefore, it’s good practice to stop the notebook instance if it’s not being used.
Conclusion
Generative AI models are generally large-scale ML models that require specific resources to run efficiently. In this post, we demonstrated, using an advertising use case, how SageMaker endpoints offer a scalable and managed environment for hosting generative AI models such as the text-to-image foundation model Stable Diffusion. We demonstrated how two models can be hosted and run as needed, and multiple models can also be hosted from a single endpoint. This eliminates the complexities associated with infrastructure provisioning, scalability, and monitoring, enabling organizations to focus solely on deploying their models and serving predictions to solve their business challenges. With SageMaker endpoints, organizations can efficiently deploy and manage multiple models within a unified infrastructure, achieving optimal resource utilization and reducing operational overhead.
The detailed code is available on GitHub. The code demonstrates the use of AWS CloudFormation and the AWS Cloud Development Kit (AWS CDK) to automate the process of creating SageMaker notebooks and other required resources.

About the authors
Fabian Benitez-Quiroz is an IoT Edge Data Scientist in AWS Professional Services. He holds a PhD in Computer Vision and Pattern Recognition from The Ohio State University. Fabian is involved in helping customers run their machine learning models with low latency on IoT devices and in the cloud across various industries.
Romil Shah is a Sr. Data Scientist at AWS Professional Services. Romil has more than 6 years of industry experience in computer vision, machine learning, and IoT edge devices. He is involved in helping customers optimize and deploy their machine learning models for edge devices and on the cloud. He works with customers to create strategies for optimizing and deploying foundation models.
Han Man is a Senior Data Science & Machine Learning Manager with AWS Professional Services based in San Diego, CA. He has a PhD in Engineering from Northwestern University and has several years of experience as a management consultant advising clients in manufacturing, financial services, and energy. Today, he is passionately working with key customers from a variety of industry verticals to develop and implement ML and GenAI solutions on AWS.

Meet AnyLoc: The Latest Universal Method For Visual Place Recognition …

As the field of Artificial Intelligence constantly progresses, it has paved its way into a number of use cases, including robotics. Visual Place Recognition (VPR) is a critical skill for estimating robot state and is widely used in a variety of robotic systems, such as wearable technology, drones, autonomous vehicles, and ground-based robots. Using visual data, VPR enables robots to recognize and comprehend their current location within their surroundings.

It has been difficult to achieve universal applicability for VPR across a variety of contexts. Although modern VPR methods perform well when applied to contexts comparable to those in which they were trained, such as urban driving scenarios, these techniques display a significant decline in effectiveness in other settings, such as aquatic or aerial environments. Efforts have been put into designing a universal VPR solution that can operate without error in any environment (including aerial, underwater, and subterranean), at any time (being resilient to changes like day-night or seasonal variations), and from any viewpoint (remaining unaffected by variations in perspective, including diametrically opposite views).

To address the limitations, a group of researchers has introduced a new baseline VPR method called AnyLoc. The team has examined the visual feature representations taken from large-scale pretrained models, which they refer to as foundation models, as an alternative to merely relying on VPR-specific training. Although these models are not initially trained for VPR, they do store a wealth of visual features that may one day form the cornerstone of an all-encompassing VPR solution.

In the AnyLoc technique, the best foundation models and visual features with the required invariance attributes are carefully chosen, where invariance refers to the capacity of the model to maintain specific visual qualities despite changes in the surroundings or point of view. The chosen features are then merged with the local-aggregation methods that are frequently utilized in the VPR literature. Making more informed decisions about location recognition requires consolidating data from different areas of the visual input using local aggregation techniques.

AnyLoc works by fusing the foundation models’ rich visual elements with local aggregation techniques, making the AnyLoc-equipped robot extremely adaptable and useful in various settings. It can conduct visual location recognition in a wide range of environments, at various times of the day or year, and from varied perspectives. The team has summarized the findings as follows.

Universal VPR Solution: AnyLoc has been proposed as a new baseline for VPR, which works seamlessly across 12 diverse datasets encompassing place, time, and perspective variations.

Feature-Method Synergy: Combining self-supervised features like DINOv2 with unsupervised aggregation like VLAD or GeM yields significant performance gains over the direct use of per-image features from off-the-shelf models.
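
To make the aggregation idea concrete, here is a minimal GeM (generalized mean) pooling sketch in PyTorch; it only illustrates the operation over dense per-patch features and is not the authors’ implementation.

import torch
import torch.nn.functional as F

def gem_pool(features, p=3.0, eps=1e-6):
    # features: (batch, channels, height, width) local descriptors -> (batch, channels) global descriptor
    pooled = features.clamp(min=eps).pow(p).mean(dim=(-2, -1)).pow(1.0 / p)
    return F.normalize(pooled, dim=-1)

# Example: pool DINOv2-style patch features for two images.
descriptors = gem_pool(torch.randn(2, 1024, 16, 16))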

Semantic Feature Characterization: Analyzing semantic properties of aggregated local features uncovers distinct domains in the latent space, enhancing VLAD vocabulary construction and boosting performance.

Robust Evaluation: The team has evaluated AnyLoc on diverse datasets in challenging VPR conditions, such as day-night variations and opposing viewpoints, setting a strong baseline for future universal VPR research.

Check out the Paper, GitHub, and Project.
