Use AWS PrivateLink to set up private access to Amazon Bedrock

Amazon Bedrock is a fully managed service provided by AWS that offers developers access to foundation models (FMs) and the tools to customize them for specific applications. It allows developers to build and scale generative AI applications using FMs through an API, without managing infrastructure. You can choose from various FMs from Amazon and leading AI startups such as AI21 Labs, Anthropic, Cohere, and Stability AI to find the model that’s best suited for your use case. With the Amazon Bedrock serverless experience, you can quickly get started, easily experiment with FMs, privately customize them with your own data, and seamlessly integrate and deploy them into your applications using AWS tools and capabilities.
Customers are building innovative generative AI applications with Amazon Bedrock APIs using their own proprietary data. When accessing Amazon Bedrock APIs, customers want a mechanism to set up a data perimeter without exposing their data to the internet, so they can mitigate potential threat vectors from internet exposure. The Amazon Bedrock VPC endpoint powered by AWS PrivateLink allows you to establish a private connection between the VPC in your account and the Amazon Bedrock service account. It enables VPC instances to communicate with service resources without the need for public IP addresses.
In this post, we demonstrate how to set up private access on your AWS account to access Amazon Bedrock APIs over VPC endpoints powered by PrivateLink to help you build generative AI applications securely with your own data.
Solution overview
You can use generative AI to develop a diverse range of applications, such as text summarization, content moderation, and other capabilities. When building such generative AI applications using FMs or base models, customers want to generate responses grounded in their proprietary data, which may reside in enterprise databases, without sending that data over the public internet.
In the following diagram, we depict an architecture to set up your infrastructure to read your proprietary data residing in Amazon Relational Database Service (Amazon RDS) and augment the Amazon Bedrock API request with product information when answering product-related queries from your generative AI application. Although we use Amazon RDS in this diagram for illustration purposes, you can test the private access of the Amazon Bedrock APIs end to end using the instructions provided in this post.

The workflow steps are as follows:

AWS Lambda running in your private VPC subnet receives the prompt request from the generative AI application.
Lambda makes a call to the proprietary RDS database, augments the prompt with query context (for example, adding product information), and invokes the Amazon Bedrock API with the augmented query request.
The API call is routed to the Amazon Bedrock VPC endpoint, which is associated with a VPC endpoint policy that has Allow permissions for Amazon Bedrock APIs.
The Amazon Bedrock service API endpoint receives the API request over PrivateLink without traversing the public internet.
You can change the Amazon Bedrock VPC endpoint policy to Deny permissions to validate that Amazon Bedrock API calls are denied.
You can also privately access Amazon Bedrock APIs over the VPC endpoint from your corporate network through an AWS Direct Connect gateway.
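Steps 1-4 can be sketched as a Lambda handler. The payload builder, event fields, and prompt-augmentation scheme below are illustrative assumptions, while the invoke_model call and the Anthropic text-completion request format follow the standard boto3 bedrock-runtime API:

```python
import json

def build_claude_payload(prompt, product_context, max_tokens=500):
    """Build an Anthropic text-completion request body.

    Prepending product rows fetched from the RDS database is an
    illustrative augmentation scheme, not the blog's exact code.
    """
    augmented = f"\n\nHuman: {product_context}\n\n{prompt}\n\nAssistant:"
    return json.dumps({"prompt": augmented, "max_tokens_to_sample": max_tokens})

def lambda_handler(event, context):
    # Because the function runs in private subnets, boto3's call to the
    # bedrock-runtime endpoint is routed through the interface VPC endpoint
    # rather than the public internet.
    import boto3  # available in the AWS Lambda Python runtime
    client = boto3.client("bedrock-runtime")
    body = build_claude_payload(event["prompt"], event.get("product_context", ""))
    response = client.invoke_model(
        modelId="anthropic.claude-instant-v1",
        body=body,
        contentType="application/json",
        accept="application/json",
    )
    return json.loads(response["body"].read())
```

The VPC endpoint requires no code changes: boto3 resolves the same bedrock-runtime service endpoint, and VPC DNS routes the request privately.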

Before you get started, make sure you have the following prerequisites:

An AWS account
An AWS Identity and Access Management (IAM) federation role with access to do the following:

Create, edit, view, and delete VPC network resources
Create, edit, view, and delete Lambda functions
Create, edit, view, and delete IAM roles and policies
List foundation models and invoke the Amazon Bedrock foundation model

For this post, we use the us-east-1 Region
Request foundation model access via the Amazon Bedrock console

Set up the private access infrastructure
In this section, we set up the infrastructure such as VPC, private subnets, security groups, and Lambda function using an AWS CloudFormation template.
Use the following template to create the infrastructure stack Bedrock-GenAI-Stack in your AWS account.
The CloudFormation template creates the following resources on your behalf:

A VPC with two private subnets in separate Availability Zones
Security groups and routing tables
IAM role and policies for use by Lambda, Amazon Bedrock, and Amazon Elastic Compute Cloud (Amazon EC2)

Set up the VPC endpoint for Amazon Bedrock
In this section, we use Amazon Virtual Private Cloud (Amazon VPC) to set up the VPC endpoint for Amazon Bedrock to facilitate private connectivity from your VPC to Amazon Bedrock.

On the Amazon VPC console, under Virtual private cloud in the navigation pane, choose Endpoints.
Choose Create endpoint.
For Name tag, enter bedrock-vpce.
Under Services, search for bedrock-runtime and select com.amazonaws.<region>.bedrock-runtime.
For VPC, specify the VPC Bedrock-GenAI-Project-vpc that you created through the CloudFormation stack in the previous section.
In the Subnets section, select the Availability Zones and choose the corresponding subnet IDs from the drop-down menu.
For Security groups, select the security group with the group name Bedrock-GenAI-Stack-VPCEndpointSecurityGroup- and description Allow TLS for VPC Endpoint.

A security group acts as a virtual firewall for your instance to control inbound and outbound traffic. Note that this VPC endpoint security group only allows traffic originating from the security group attached to your VPC private subnets, adding a layer of protection.

In the Policy section, select Custom and enter the following least privilege policy to ensure only certain actions are allowed on the specified foundation model resource, arn:aws:bedrock:*::foundation-model/anthropic.claude-instant-v1, for a given principal (such as the Lambda function IAM role):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<accountid>:role/GenAIStack-Bedrock"
            },
            "Action": [
                "bedrock:InvokeModel"
            ],
            "Resource": [
                "arn:aws:bedrock:*::foundation-model/anthropic.claude-instant-v1"
            ]
        }
    ]
}

Choose Create endpoint.

It may take up to 2 minutes until the interface endpoint is created and the status changes to Available. You can refresh the page to check the latest status.

Set up the Lambda function over private VPC subnets
Complete the following steps to configure the Lambda function:

On the Lambda console, choose Functions in the navigation pane.
Choose the function gen-ai-lambda-stack-BedrockTestLambdaFunction-XXXXXXXXXXXX.
On the Configuration tab, choose Permissions in the left pane.
Under Execution role, choose the link for the role gen-ai-lambda-stack-BedrockTestLambdaFunctionRole-XXXXXXXXXXXX.

You’re redirected to the IAM console.

In the Permissions policies section, choose Add permissions and choose Create inline policy.
On the JSON tab, modify the policy as follows:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "eniperms",
            "Effect": "Allow",
            "Action": [
                "ec2:CreateNetworkInterface",
                "ec2:DescribeNetworkInterfaces",
                "ec2:DeleteNetworkInterface"
            ],
            "Resource": "*"
        }
    ]
}

Choose Next.
For Policy name, enter enivpce-policy.
Choose Create policy.
Add the following inline policy (provide your source VPC endpoints) for restricting Lambda access to Amazon Bedrock APIs only via VPC endpoints:

{
    "Id": "lambda-bedrock-sourcevpce-access-only",
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "bedrock:InvokeModel"
            ],
            "Resource": "*",
            "Condition": {
                "ForAnyValue:StringEquals": {
                    "aws:sourceVpce": [
                        "<vpce-id>"
                    ]
                }
            }
        }
    ]
}

On the Lambda function page, on the Configuration tab, choose VPC in the left pane, then choose Edit.
For VPC, choose Bedrock-GenAI-Project-vpc.
For Subnets, choose the private subnets.
For Security groups, choose gen-ai-lambda-stack-SecurityGroup- (the security group for the Amazon Bedrock workload in private subnets).
Choose Save.

Test private access controls
Now you can test the private access controls (Amazon Bedrock APIs over VPC endpoints).

On the Lambda console, choose Functions in the navigation pane.
Choose the function gen-ai-lambda-stack-BedrockTestLambdaFunction-XXXXXXXXXXXX.
On the Code tab, choose Test.

You should see the following response from the Amazon Bedrock API call (Status: Succeeded).
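For reference, a successful call to anthropic.claude-instant-v1 returns a JSON body in the Anthropic text-completion format; a minimal parse looks like this (the completion text below is invented for the example):

```python
import json

# Illustrative InvokeModel response body for anthropic.claude-instant-v1;
# the completion value is made up.
raw_body = b'{"completion": " The product is available in sizes 7 to 13.", "stop_reason": "stop_sequence"}'

result = json.loads(raw_body)
print(result["completion"].strip())
```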

To deny access to Amazon Bedrock APIs over VPC endpoints, navigate to the Amazon VPC console.
Under Virtual private cloud in the navigation pane, choose Endpoints.
Choose your endpoint and navigate to the Policy tab.

Currently, the VPC endpoint policy is set to Allow.

To deny access, choose Edit Policy.
Change Allow to Deny and choose Save.

It may take up to 2 minutes for the policy for the VPC endpoint to update.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Principal": {
                "AWS": "arn:aws:iam::<accountid>:role/GenAIStack-Bedrock"
            },
            "Action": [
                "bedrock:InvokeModel"
            ],
            "Resource": [
                "arn:aws:bedrock:*::foundation-model/anthropic.claude-instant-v1"
            ]
        }
    ]
}

Return to the Lambda function page and on the Code tab, choose Test.

As shown in the following screenshot, the access request to Amazon Bedrock over the VPC endpoint was denied (Status: Failed).

Through this testing process, we demonstrated that traffic from your VPC to the Amazon Bedrock API endpoint traverses the PrivateLink connection rather than the public internet.
Clean up
Follow these steps to avoid incurring future charges:

Clean up the VPC endpoints.
Clean up the VPC.
Delete the CloudFormation stack.

In this post, we demonstrated how to set up and operationalize a private connection between a generative AI workload deployed on your customer VPC and Amazon Bedrock using an interface VPC endpoint powered by PrivateLink. When using the architecture discussed in this post, the traffic between your customer VPC and Amazon Bedrock will not leave the Amazon network, ensuring your data is not exposed to the public internet and thereby helping with your compliance requirements.
As a next step, try the solution out in your account and share your feedback.

About the Authors
Ram Vittal is a Principal ML Solutions Architect at AWS. He has over 3 decades of experience architecting and building distributed, hybrid, and cloud applications. He is passionate about building secure and scalable AI/ML and big data solutions to help enterprise customers with their cloud adoption and optimization journey to improve their business outcomes. In his spare time, he rides his motorcycle and walks with his 3-year-old Sheepadoodle!
Ray Khorsandi is an AI/ML specialist at AWS, supporting strategic customers with AI/ML best practices. With an M.Sc. and Ph.D. in Electrical Engineering and Computer Science, he leads enterprises to build secure, scalable AI/ML and big data solutions to optimize their cloud adoption. His passions include computer vision, NLP, generative AI, and MLOps. Ray enjoys playing soccer and spending quality time with family.
Michael Daniels is an AI/ML Specialist at AWS. His expertise lies in building and leading AI/ML and generative AI solutions for complex and challenging business problems, which is enhanced by his Ph.D. from the Univ. of Texas and his M.Sc. in Computer Science specialization in Machine Learning from the Georgia Institute of Technology. He excels in applying cutting-edge cloud technologies to innovate, inspire, and transform industry-leading organizations, while also effectively communicating with stakeholders at any level or scale. In his spare time, you can catch Michael skiing or snowboarding in the mountains.

Deploy and fine-tune foundation models in Amazon SageMaker JumpStart w …

We are excited to announce a simplified version of the Amazon SageMaker JumpStart SDK that makes it straightforward to build, train, and deploy foundation models. The code for prediction is also simplified. In this post, we demonstrate how you can use the simplified SageMaker JumpStart SDK to get started with using foundation models in just a couple of lines of code.
For more information about the simplified SageMaker JumpStart SDK for deployment and training, refer to Low-code deployment with the JumpStartModel class and Low-code fine-tuning with the JumpStartEstimator class, respectively.
Solution overview
SageMaker JumpStart provides pre-trained, open-source models for a wide range of problem types to help you get started with machine learning (ML). You can incrementally train and fine-tune these models before deployment. JumpStart also provides solution templates that set up infrastructure for common use cases, and executable example notebooks for ML with Amazon SageMaker. You can access the pre-trained models, solution templates, and examples through the SageMaker JumpStart landing page in Amazon SageMaker Studio or use the SageMaker Python SDK.
To demonstrate the new features of the SageMaker JumpStart SDK, we show you how to use the pre-trained Flan T5 XL model from Hugging Face for text generation for summarization tasks. We also showcase how, in just a few lines of code, you can fine-tune the Flan T5 XL model for summarization tasks. You can use any other model for text generation like Llama2, Falcon, or Mistral AI.
You can find the notebook for this solution using Flan T5 XL in the GitHub repo.
Deploy and invoke the model
Foundation models hosted on SageMaker JumpStart have model IDs. For the full list of model IDs, refer to Built-in Algorithms with pre-trained Model Table. For this post, we use the model ID of the Flan T5 XL text generation model. We instantiate the model object and deploy it to a SageMaker endpoint by calling its deploy method. See the following code:

from sagemaker.jumpstart.model import JumpStartModel

# Replace with larger model if needed
pretrained_model = JumpStartModel(model_id="huggingface-text2text-flan-t5-base")
pretrained_predictor = pretrained_model.deploy()

Next, we invoke the model to create a summary of the provided text using the Flan T5 XL model. The new SDK interface makes it straightforward for you to invoke the model: you just need to pass the text to the predictor and it returns the response from the model as a Python dictionary.

text = """Summarize this content – Amazon Comprehend uses natural language processing (NLP) to extract insights about the content of documents. It develops insights by recognizing the entities, key phrases, language, sentiments, and other common elements in a document. Use Amazon Comprehend to create new products based on understanding the structure of documents. For example, using Amazon Comprehend you can search social networking feeds for mentions of products or scan an entire document repository for key phrases.
You can access Amazon Comprehend document analysis capabilities using the Amazon Comprehend console or using the Amazon Comprehend APIs. You can run real-time analysis for small workloads or you can start asynchronous analysis jobs for large document sets. You can use the pre-trained models that Amazon Comprehend provides, or you can train your own custom models for classification and entity recognition. """
query_response = pretrained_predictor.predict(text)

The following is the output of the summarization task:

Understand how Amazon Comprehend works. Use Amazon Comprehend to analyze documents.

Fine-tune and deploy the model
The SageMaker JumpStart SDK provides you with a new class, JumpStartEstimator, which simplifies fine-tuning. You provide the location of the fine-tuning data and can optionally pass validation datasets as well. After you fine-tune the model, use the deploy method of the Estimator object to deploy the fine-tuned model:

from sagemaker.jumpstart.estimator import JumpStartEstimator

estimator = JumpStartEstimator(model_id=model_id)
estimator.set_hyperparameters(instruction_tuned="True", epoch="3", max_input_length="1024")
estimator.fit({"training": train_data_location})
finetuned_predictor = estimator.deploy()
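The train_data_location above points to training data in Amazon S3. As a rough sketch of preparing instruction-tuning data locally before upload (the JSON Lines schema and field names shown are an assumption; consult the chosen model's fine-tuning documentation for the exact format it expects):

```python
import json
import os
import tempfile

# Illustrative instruction-tuning examples; field names ("instruction",
# "context", "response") are assumptions, not the documented schema.
examples = [
    {"instruction": "Summarize this content",
     "context": "Amazon Comprehend uses NLP to extract insights from documents.",
     "response": "Amazon Comprehend analyzes documents with NLP."},
]

train_dir = tempfile.mkdtemp()
train_file = os.path.join(train_dir, "train.jsonl")
with open(train_file, "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
# Upload train_file to S3 and pass that S3 URI as train_data_location.
```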

Customize the new classes in the SageMaker SDK
The new SDK makes it straightforward to deploy and fine-tune JumpStart models by defaulting many parameters. You still have the option to override the defaults and customize the deployment and invocation based on your requirements. For example, you can customize input payload format type, instance type, VPC configuration, and more for your environment and use case.
The following code shows how to override the instance type while deploying your model:

finetuned_predictor = estimator.deploy(instance_type="ml.g5.2xlarge")

The SageMaker JumpStart SDK deploy function will automatically select a default content type and serializer for you. If you want to change the format type of the input payload, you can use serializers and content_types objects to introspect the options available to you by passing the model_id of the model you are working with. In the following code, we set the payload input format as JSON by setting JSONSerializer as serializer and application/json as content_type:

from sagemaker import serializers
from sagemaker import content_types

serializer_options = serializers.retrieve_options(model_id=model_id, model_version=model_version)
content_type_options = content_types.retrieve_options(model_id=model_id, model_version=model_version)

pretrained_predictor.serializer = serializers.JSONSerializer()
pretrained_predictor.content_type = "application/json"

Next, you can invoke the Flan T5 XL model for the summarization task with a payload of the JSON format. In the following code, we also pass inference parameters in the JSON payload for making responses more accurate:

from sagemaker import serializers

input_text = """Summarize this content – Amazon Comprehend uses natural language processing (NLP) to extract insights about the content of documents. It develops insights by recognizing the entities, key phrases, language, sentiments, and other common elements in a document. Use Amazon Comprehend to create new products based on understanding the structure of documents. For example, using Amazon Comprehend you can search social networking feeds for mentions of products or scan an entire document repository for key phrases.
You can access Amazon Comprehend document analysis capabilities using the Amazon Comprehend console or using the Amazon Comprehend APIs. You can run real-time analysis for small workloads or you can start asynchronous analysis jobs for large document sets. You can use the pre-trained models that Amazon Comprehend provides, or you can train your own custom models for classification and entity recognition. """

parameters = {
    "max_length": 600,
    "num_return_sequences": 1,
    "top_p": 0.01,
    "do_sample": False,
}

payload = {"text_inputs": input_text, **parameters}  # JSON input format

pretrained_predictor.serializer = serializers.JSONSerializer()
query_response = pretrained_predictor.predict(payload)

If you’re looking for more ways to customize the inputs and other options for hosting and fine-tuning, refer to the documentation for the JumpStartModel and JumpStartEstimator classes.
In this post, we showed you how you can use the simplified SageMaker JumpStart SDK for building, training, and deploying task-based and foundation models in just a few lines of code. We demonstrated the new classes like JumpStartModel and JumpStartEstimator using the Hugging Face Flan T5-XL model as an example. You can use any of the other SageMaker JumpStart foundation models for use cases such as content writing, code generation, question answering, summarization, classification, information retrieval, and more. To see the whole list of models available with SageMaker JumpStart, refer to Built-in Algorithms with pre-trained Model Table. SageMaker JumpStart also supports task-specific models for many popular problem types.
We hope the simplified interface of the SageMaker JumpStart SDK will help you get started quickly and enable you to deliver faster. We look forward to hearing how you use the simplified SageMaker JumpStart SDK to create exciting applications!

About the authors
Evan Kravitz is a software engineer at Amazon Web Services, working on SageMaker JumpStart. He is interested in the confluence of machine learning with cloud computing. Evan received his undergraduate degree from Cornell University and master’s degree from the University of California, Berkeley. In 2021, he presented a paper on adversarial neural networks at the ICLR conference. In his free time, Evan enjoys cooking, traveling, and going on runs in New York City.
Rachna Chadha is a Principal Solution Architect AI/ML in Strategic Accounts at AWS. Rachna is an optimist who believes that ethical and responsible use of AI can improve society in the future and bring economic and social prosperity. In her spare time, Rachna likes spending time with her family, hiking, and listening to music.
Jonathan Guinegagne is a Senior Software Engineer with Amazon SageMaker JumpStart at AWS. He got his master’s degree from Columbia University. His interests span machine learning, distributed systems, and cloud computing, as well as democratizing the use of AI. Jonathan is originally from France and now lives in Brooklyn, NY.
Dr. Ashish Khetan is a Senior Applied Scientist with Amazon SageMaker built-in algorithms and helps develop machine learning algorithms. He got his PhD from University of Illinois Urbana-Champaign. He is an active researcher in machine learning and statistical inference, and has published many papers in NeurIPS, ICML, ICLR, JMLR, ACL, and EMNLP conferences.

Meet GROOT: A Robust Imitation Learning Framework for Vision-Based Man …

With the increase in the popularity and use cases of artificial intelligence, imitation learning (IL) has proven to be a successful technique for teaching neural network-based visuomotor policies to perform intricate manipulation tasks. The problem of building robots that can perform a wide variety of manipulation tasks has long challenged the robotics community. In real-world circumstances, robots face a variety of environmental factors, including shifting camera views, changing backgrounds, and the appearance of new object instances. These perceptual differences have frequently proven to be obstacles for conventional robotics methods.

Improving the robustness and adaptability of IL algorithms to environmental variables is critical in order to utilise their capabilities. Previous research has shown that even small visual changes in the environment, such as backdrop colour changes, camera viewpoint alterations, or the addition of new object instances, can degrade end-to-end learning policies. As a result, IL policies are usually assessed in controlled circumstances, using correctly calibrated cameras and fixed backgrounds.

Recently, a team of researchers from The University of Texas at Austin and Sony AI has introduced GROOT, a unique imitation learning technique that builds strong policies for manipulation tasks involving vision. It tackles the problem of allowing robots to function well in real-world settings, where there are frequent changes in background, camera viewpoint, and object introduction, among other perceptual alterations. In order to overcome these obstacles, GROOT focuses on building object-centric 3D representations and reasoning over them using a transformer-based strategy, and it also proposes a correspondence model for segmentation, which allows policies to generalise to new objects at test time.

The development of object-centric 3D representations is the core of GROOT’s innovation. The purpose of these representations is to direct the robot’s perception, help it concentrate on task-relevant elements, and help it block out visual distractions. GROOT gives the robot a strong framework for decision-making by thinking in three dimensions, which provides it with a more intuitive grasp of the environment. GROOT uses a transformer-based approach to reason over these object-centric 3D representations. It is able to efficiently analyse the 3D representations and make judgements and is a significant step towards giving robots more sophisticated cognitive capabilities.

GROOT has the ability to generalise outside of the initial training settings and is good at adjusting to various backgrounds, camera angles, and the presence of items that haven’t been observed before, whereas many robotic learning techniques are inflexible and have trouble in such settings. GROOT is an exceptional solution to the intricate problems that robots encounter in the actual world because of its exceptional generalisation potential.

GROOT has been tested by the team through a number of extensive studies. These tests thoroughly assess GROOT’s capabilities in both simulated and real-world settings. It has been shown to perform exceptionally well in simulated situations, especially when perceptual differences are present. It outperforms the most recent techniques, such as object proposal-based tactics and end-to-end learning methodologies.

In conclusion, in the area of robotic vision and learning, GROOT is a major advancement. Its emphasis on robustness, adaptability, and generalisation in real-world scenarios can make numerous applications possible. GROOT has addressed the problems of robust robotic manipulation in a dynamic world and has led to robots functioning well and seamlessly in complicated and dynamic environments.

Check out the Paper, Github, and Project. All Credit For This Research Goes To the Researchers on This Project.
The post Meet GROOT: A Robust Imitation Learning Framework for Vision-Based Manipulation with Object-Centric 3D Priors and Adaptive Policy Generalization appeared first on MarkTechPost.

Optimizing Computational Costs with AutoMix: An AI Strategic Approach …

AutoMix is an innovative approach that optimises the allocation of queries to larger language models (LLMs) by assessing the approximate correctness of responses from a smaller LM. It incorporates a few-shot self-verification process and a meta-verifier to enhance accuracy. AutoMix showcases its efficiency in balancing computational cost and performance in language processing tasks.

When it comes to verifying information, AutoMix takes a different approach than other methods. Rather than solely relying on LLM knowledge, it uses context to ensure accuracy. Its unique few-shot self-verification mechanism and meta-verifier assess the reliability of its output without requiring any training. This emphasis on context and robust self-verification aligns with conformal prediction. Unlike other approaches that require verifier training or architectural modifications, AutoMix provides flexibility between models and only requires black-box access to APIs.

AutoMix's iterative model-switching method queries models of different sizes and capabilities, with feedback-based verification at each step to decide whether to accept the output or switch to a more capable model. The approach doesn't need separately trained verifiers or access to model weights and gradients, because it uses black-box language model APIs. Few-shot learning and self-verification make the process of solution generation, verification, and model switching more efficient and effective.
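The switching loop can be sketched as follows (the function names and the thresholding heuristic are our illustration of the idea, not the paper's implementation):

```python
def automix_answer(query, small_lm, large_lm, verify, threshold=0.5):
    """Route a query: try the cheap model first, escalate on weak verification.

    small_lm / large_lm: callables mapping a query string to an answer.
    verify: few-shot self-verification step returning a score in [0, 1].
    """
    draft = small_lm(query)
    confidence = verify(query, draft)
    if confidence >= threshold:
        return draft, "small"        # accept the cheap model's output
    return large_lm(query), "large"  # switch to the more capable model
```

Because both models are called only through plain callables, this scheme needs nothing beyond black-box API access, matching the constraint described above.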

AutoMix employs a few-shot self-verification process to assess its output reliability without training. It enhances accuracy with a meta-verifier. Queries are categorised into Simple, Complex, or Unsolvable using a Partially Observable Markov Decision Process (POMDP) framework. AutoMix intelligently routes queries to larger language models based on approximate output correctness from smaller models. The Incremental Benefit Per Unit Cost (IBC) metric quantifies the efficiency of combining smaller and larger language models, optimising computational cost and performance in language processing tasks.
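Under one reading of the IBC metric (the paper's exact normalization may differ), a mixed policy's benefit is the performance it gains over always using the small model, per extra unit of cost:

```python
def incremental_benefit_per_cost(perf_mix, cost_mix, perf_small, cost_small):
    # Performance gained per additional unit of cost, relative to always
    # using the small model. Positive values mean the extra spend pays off.
    if cost_mix <= cost_small:
        raise ValueError("mixed policy must cost more than the small model")
    return (perf_mix - perf_small) / (cost_mix - cost_small)
```

Comparing this quantity across routing strategies at matched budgets is what lets the authors rank AutoMix variants against baselines.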

Through context-grounded reasoning, AutoMix significantly enhances IBC performance, outperforming baseline methods by up to 89% across five datasets. Its meta-verifier consistently shows superior IBC performance, particularly in the LLAMA2-13B/70B setting. AutoMix-POMDP is the top performer in three of the five datasets, offering significant improvements in most of them, and it maintains a positive IBC across all evaluated costs, indicating consistent enhancements. The POMDP-based meta-verifier has also been shown to outperform Verifier-Self-Consistency by up to 42% across all datasets.

In conclusion, AutoMix is a promising framework that effectively combines black-box LLM APIs in a multi-step problem-solving approach. Its self-verification and context-grounded few-shot verification demonstrate a good balance between performance and computational cost, making it suitable for various scenarios. Furthermore, integrating a POMDP in AutoMix enhances the accuracy of the few-shot verifier, highlighting its potential to improve the performance of LLM during inference. Overall, AutoMix shows promising capabilities for language processing tasks.

Future research can explore AutoMix’s application in various domains and tasks to assess its versatility. Evaluating AutoMix’s performance with diverse language model combinations is crucial, ensuring scalability to larger models. Refinement of the few-shot self-verification mechanism, potentially incorporating contextual or external information, is needed for improved accuracy. Alternative meta-verifiers or verification techniques can be investigated to enhance AutoMix. User studies are essential to evaluate AutoMix’s practical usability and user satisfaction in real-world scenarios.

Check out the Paper. All Credit For This Research Goes To the Researchers on This Project.
The post Optimizing Computational Costs with AutoMix: An AI Strategic Approach to Leveraging Large Language Models from the Cloud appeared first on MarkTechPost.

NYU Researchers have Created a Neural Network for Genomics that can Ex …

In the world of biological research, machine-learning models are making significant strides in advancing our understanding of complex processes, with a particular focus on RNA splicing. However, a common limitation of many machine learning models in this field is their lack of interpretability – they can predict outcomes accurately but struggle to explain how they arrived at those predictions.

To address this issue, NYU researchers have introduced an “interpretable-by-design” approach that not only ensures accurate predictive outcomes but also provides insights into the underlying biological processes, specifically RNA splicing. This innovative model has the potential to significantly enhance our understanding of this fundamental process.

Machine learning models like neural networks have been instrumental in advancing scientific discovery and experimental design in biological sciences. However, their non-interpretability has been a persistent challenge. Despite their high accuracy, they often cannot shed light on the reasoning behind their predictions.

The new “interpretable-by-design” approach overcomes this limitation by creating a neural network model explicitly designed to be interpretable while maintaining predictive accuracy on par with state-of-the-art models. This approach is a game-changer in the field, as it bridges the gap between accuracy and interpretability, ensuring that researchers not only have the right answers but also understand how those answers were derived.

The model was meticulously trained with an emphasis on interpretability, using Python 3.8 and TensorFlow 2.6. Various hyperparameters were tuned, and the training process incorporated progressive steps to gradually introduce learnable parameters. The model’s interpretability was further enhanced through the introduction of regularization terms, ensuring that the learned features were concise and comprehensible.

One remarkable aspect of this model is its ability to generalize and make accurate predictions on various datasets from different sources, highlighting its robustness and its potential to capture essential aspects of splicing regulatory logic. This means that it can be applied to diverse biological contexts, providing valuable insights across different RNA splicing scenarios.

The model’s architecture includes sequence and structure filters, which are instrumental in understanding RNA splicing. Importantly, it assigns quantitative strengths to these filters, shedding light on the magnitude of their influence on splicing outcomes. Through a visualization tool called the “balance plot,” researchers can explore and quantify how multiple RNA features contribute to the splicing outcomes of individual exons. This tool simplifies the understanding of the complex interplay of various features in the splicing process.
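
As a toy illustration of this decomposition (not the authors' actual network; the feature names and numbers below are invented), each filter's activation can be scaled by its learned signed strength, and the contributions summed into a splicing score. These are the quantities a balance plot visualizes per exon:

```python
import numpy as np

# Toy sketch of a "balance plot" style decomposition: each named RNA
# feature (filter) has an activation on a given exon and a learned signed
# strength. Per-feature contributions sum to a logit for the splicing
# outcome. Feature names and values are illustrative, not from the paper.

def splicing_contributions(activations, strengths):
    """Return per-feature contributions and the total splicing logit."""
    contributions = {f: activations[f] * strengths[f] for f in activations}
    return contributions, sum(contributions.values())

activations = {"stem_loop": 0.8, "g_poor": 0.5, "exonic_enhancer": 0.9}
strengths = {"stem_loop": -1.2, "g_poor": -0.7, "exonic_enhancer": 2.0}

contrib, logit = splicing_contributions(activations, strengths)
prob_inclusion = 1.0 / (1.0 + np.exp(-logit))  # sigmoid of the summed logit
```

Negative contributions (here the hypothetical stem-loop and G-poor features) push toward exon skipping, positive ones toward inclusion, which is exactly the trade-off a balance plot makes visible.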

Moreover, this model has not only confirmed previously established RNA splicing features but also uncovered two uncharacterized exon-skipping features related to stem loop structures and G-poor sequences. These findings are significant and have been experimentally validated, reinforcing the model’s credibility and the biological relevance of these features.

In conclusion, the “interpretable-by-design” machine learning model represents a powerful tool in the biological sciences. It not only achieves high predictive accuracy but also provides a clear and interpretable understanding of RNA splicing processes. The model’s ability to quantify the contributions of specific features to splicing outcomes has the potential for various applications in medical and biotechnology fields, from genome editing to the development of RNA-based therapeutics. This approach is not limited to splicing but can also be applied to decipher other complex biological processes, opening new avenues for scientific discovery.

Check out the Paper and Github. All Credit For This Research Goes To the Researchers on This Project. Also, don’t forget to join our 32k+ ML SubReddit, 40k+ Facebook Community, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.

If you like our work, you will love our newsletter.

We are also on WhatsApp. Join our AI Channel on WhatsApp.
The post NYU Researchers have Created a Neural Network for Genomics that can Explain How it Reaches its Predictions appeared first on MarkTechPost.

Enhancing Engineering Design Evaluation through Comprehensive Metrics for Deep Generative Models

In engineering design, the reliance on deep generative models (DGMs) has surged in recent years. However, evaluating these models has predominantly revolved around statistical similarity, often neglecting critical aspects such as design constraints, diversity, and novelty. As a result, the need for a more comprehensive and nuanced evaluation framework has become increasingly apparent. To address this, a research team has set out to develop and propose a complete set of design-focused metrics, aiming to offer a more holistic understanding of the capabilities and limitations of DGMs in engineering design tasks.

The evaluation of deep generative models in engineering design heavily leans on statistical similarity as the primary metric. However, this approach overlooks crucial design constraints, limiting the potential for exploring diverse and novel design solutions. Recognizing these limitations, the research team has proposed a curated set of alternative evaluation metrics tailored for engineering design tasks. These metrics encompass a range of critical aspects, including constraint satisfaction, diversity, novelty, and target achievement, providing a more comprehensive and insightful assessment of the capabilities of DGMs in engineering design.

The newly introduced evaluation metrics address various facets crucial to engineering design tasks. These metrics encompass constraint satisfaction, performance, conditioning adherence, design exploration, and target achievement. Each metric is meticulously designed to capture the intricacies and complexities of engineering design, enabling a more profound understanding of the strengths and weaknesses of DGMs. By integrating these metrics into the evaluation process, researchers and practitioners can gain deeper insights into the design space, fostering the identification of novel and diverse design solutions while ensuring adherence to critical constraints.
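
Minimal sketches of three of these metric families are shown below. The exact formulations in the paper may differ, and the Euclidean-distance choice is an assumption:

```python
import numpy as np

# Sketches of design-focused metrics over generated designs, where each
# design is a feature vector. The paper's precise definitions may differ.

def constraint_satisfaction_rate(designs, is_valid):
    """Fraction of generated designs passing the constraint check."""
    return float(np.mean([is_valid(d) for d in designs]))

def diversity(designs):
    """Mean pairwise Euclidean distance among generated designs."""
    d = np.asarray(designs, dtype=float)
    dists = np.linalg.norm(d[:, None, :] - d[None, :, :], axis=-1)
    n = len(d)
    return float(dists.sum() / (n * (n - 1)))  # exclude the zero diagonal

def novelty(designs, training_set):
    """Mean distance from each design to its nearest training example."""
    d = np.asarray(designs, dtype=float)
    t = np.asarray(training_set, dtype=float)
    nearest = np.linalg.norm(d[:, None, :] - t[None, :, :], axis=-1).min(axis=1)
    return float(nearest.mean())
```

Statistical similarity alone rewards a model that memorizes the training set; novelty and constraint satisfaction penalize exactly that failure mode, which is why the combination is more informative for design tasks.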

The proposed metrics have been developed through a rigorous process that accounts for the multifaceted nature of engineering design tasks. They provide a comprehensive framework for assessing the performance and capabilities of DGMs, empowering researchers and practitioners to make informed decisions and advancements in engineering design. Integrating these metrics supports a more robust and insightful evaluation process, enabling the identification of superior design solutions that satisfy stringent constraints while offering novel and diverse perspectives.

The research highlights the critical importance of comprehensive evaluation metrics in the domain of deep generative models for engineering design. By offering a more nuanced and holistic approach to assessing the capabilities of DGMs, the proposed metrics pave the way for substantial advancements in engineering design. The comprehensive evaluation framework enables researchers and practitioners to explore the design space more thoroughly, promoting the discovery of innovative and diverse solutions while ensuring compliance with stringent design constraints. With the integration of these metrics, the field of engineering design is poised for a significant transformation, fostering a more innovative and dynamic landscape that embraces novel design possibilities.

Check out the Paper and MIT Article. All Credit For This Research Goes To the Researchers on This Project.
The post Enhancing Engineering Design Evaluation through Comprehensive Metrics for Deep Generative Models appeared first on MarkTechPost.

A New AI Research from Fujitsu Improves Weakly-Supervised Action Segmentation For Human-Robot Interaction With Action-Union Learning

Recent developments in the field of human action recognition have enabled some amazing breakthroughs in Human-Robot Interaction (HRI). With this technology, robots have begun to understand human behavior and react accordingly. Action segmentation, which is the process of determining the labels and temporal bounds of human actions, is a crucial part of action recognition. Robots must have this skill in order to dynamically localize human behaviors and work well with people.

Conventional methods for action-segmentation model training demand a large number of labels. For thorough supervision, it is ideal to have frame-wise labels, i.e., labels applied to every frame of action, but these labels provide two significant difficulties. First of all, it can be expensive and time-consuming to annotate action labels for each frame. Second, there may be bias in the data due to inconsistent labeling from multiple annotators and unclear time boundaries between actions.

To address these challenges, in recent research, a team of researchers has proposed a new and unique learning technique during the training phase. Their method maximizes the likelihood of action union for unlabeled frames that fall between two consecutive timestamps. The probability that a given frame has a mix of actions indicated by the labels of the surrounding timestamps is known as action union. This approach improves the quality of the training process by giving more dependable learning targets for unlabeled frames by taking the action union probability into account.
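
The action-union idea can be sketched as follows. This is an illustrative simplification, not the authors' exact loss: for an unlabeled frame lying between two timestamps labeled with actions a and b, the training target rewards probability mass on the union {a, b} rather than forcing a single hard label:

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def action_union_loss(logits, left_action, right_action):
    """Negative log of the total probability the model assigns to the
    union of the two surrounding timestamp labels (toy sketch)."""
    p = softmax(logits)
    union_prob = p[left_action] + p[right_action]
    return -np.log(union_prob)

# An unlabeled frame between a timestamp labeled action 0 and one
# labeled action 2, with 4 action classes.
logits = np.array([2.0, 0.0, 1.0, -1.0])
loss = action_union_loss(logits, left_action=0, right_action=2)
```

Because the loss only requires mass on the union, it tolerates the ambiguous boundary between the two actions instead of penalizing the model for an annotation the dataset never provided.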

The team has developed a novel refining method during the inference step to provide better hard-assigned action labels from the model’s soft-assigned predictions. The action classes that are allocated to frames are made more precise and reliable through this refinement process. It considers not only the frame-by-frame predictions but also the consistency and smoothness of action labels over time in different video segments. This improves the model’s capacity to provide accurate action categorizations.

The techniques created in this research are intended to be model-agnostic, implying they can be utilized with various current action segmentation frameworks. These methods’ adaptability makes it possible to include them in various robot learning systems without having to make significant changes. These techniques’ effectiveness was assessed using three widely used action-segmentation datasets. The outcomes demonstrated that this method achieved new state-of-the-art performance levels by outperforming earlier timestamp-supervision techniques. The team also pointed out that their method produced similar outcomes with less than 1% of fully-supervised labels, which makes it an extremely economical solution that can equal or even outperform fully-supervised techniques in terms of performance. This illustrates how their suggested method might effectively advance the field of action segmentation and its applications in human-robot interaction.

The primary contributions have been summarized as follows.

Action-union optimization has been introduced into action-segmentation training, enhancing model performance. This innovative approach considers the probability of action combinations for unlabeled frames between timestamps.

A new and extremely beneficial post-processing technique has been introduced to improve the action-segmentation models’ output. The action classifications’ correctness and dependability are greatly increased by this refinement process.

The method has produced new state-of-the-art outcomes on pertinent datasets, demonstrating its potential to further Human-Robot Interaction research. 

Check out the Paper and Github. All Credit For This Research Goes To the Researchers on This Project.
The post A New AI Research from Fujitsu Improves Weakly-Supervised Action Segmentation For Human-Robot Interaction With Action-Union Learning appeared first on MarkTechPost.

Meet Eureka: A Human-Level Reward Design Algorithm Powered by Large Language Models

Large Language Models (LLMs) are great at high-level planning but struggle to master low-level tasks like pen spinning. A team of researchers from NVIDIA, UPenn, Caltech, and UT Austin has developed an algorithm called EUREKA that uses advanced LLMs, such as GPT-4, to create reward functions for complex skill acquisition through reinforcement learning. EUREKA outperforms human-engineered rewards, producing safer and higher-quality reward functions through gradient-free, in-context learning from human feedback. This breakthrough paves the way for LLM-powered skill acquisition, as demonstrated by a simulated Shadow Hand mastering pen-spinning tricks.

Reward engineering in reinforcement learning has posed challenges, and existing methods like manual trial-and-error and inverse reinforcement learning lack scalability and adaptability. EUREKA introduces an approach that utilizes LLMs to generate interpretable reward code, enhancing rewards in real time. While previous works have explored LLMs for decision-making, EUREKA is groundbreaking in its application to low-level skill-learning tasks, pioneering evolutionary algorithms with LLMs for reward design without initial candidates or few-shot prompting.

LLMs excel at high-level planning but struggle with low-level skills like pen spinning, and reward design in reinforcement learning often relies on time-consuming trial and error. The study presents EUREKA, which leverages advanced coding LLMs, such as GPT-4, to create reward functions for various tasks autonomously, outperforming human-engineered rewards in diverse environments. EUREKA also enables in-context learning from human feedback, enhancing reward quality and safety, and addresses dexterous manipulation tasks unattainable through manual reward engineering.

EUREKA, an algorithm powered by LLMs like GPT-4, autonomously generates reward functions, excelling in 29 RL environments. It employs in-context learning from human feedback (RLHF) to enhance reward quality and safety without model updates. EUREKA’s rewards enable training a simulated Shadow Hand in pen spinning and rapid pen manipulation. It pioneers evolutionary algorithms with LLMs for reward design, eliminating the need for initial candidates or few-shot prompting, marking a significant advancement in reinforcement learning.
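
The evolutionary outer loop described above can be sketched as follows. The LLM reward sampler and the RL-based evaluator are replaced here by toy stand-ins, since the real system samples reward-function code from GPT-4 and scores each candidate by training a policy in simulation:

```python
import random

# Sketch of a EUREKA-style evolutionary reward search. A "candidate" here
# is just a scalar parameter and the evaluator is a toy objective; in the
# real system candidates are reward-function code sampled from an LLM and
# scored by reinforcement-learning runs.

def eureka_search(sample_candidates, evaluate, iterations=3, pop_size=4):
    best, best_score, feedback = None, float("-inf"), ""
    for _ in range(iterations):
        candidates = sample_candidates(feedback, pop_size)
        scores = [evaluate(c) for c in candidates]
        top = max(range(pop_size), key=lambda i: scores[i])
        if scores[top] > best_score:
            best, best_score = candidates[top], scores[top]
        # "Reward reflection": textual feedback on the best candidate is
        # fed back into the next round of sampling.
        feedback = f"best candidate so far scored {best_score:.3f}"
    return best, best_score

random.seed(0)

def toy_sampler(feedback, n):
    return [random.uniform(0.0, 5.0) for _ in range(n)]

def toy_evaluator(c):
    return -(c - 3.0) ** 2  # prefers candidates near 3.0

best, score = eureka_search(toy_sampler, toy_evaluator)
```

The key design choice is that search is gradient-free: only scores and textual reflection flow back to the generator, which is what lets a frozen LLM act as the mutation operator.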

EUREKA outperforms L2R, showcasing its reward generation expressiveness. EUREKA consistently improves, with its best rewards eventually surpassing human benchmarks. It creates unique rewards weakly correlated with human ones, potentially uncovering counterintuitive design principles. Reward reflection enhances performance in higher-dimensional tasks. Together with curriculum learning, EUREKA succeeds in dexterous pen-spinning tasks using a simulated Shadow Hand.

EUREKA, a reward design algorithm driven by LLMs, attains human-level reward generation, excelling in 83% of tasks with an average of 52% improvement. Combining LLMs with evolutionary algorithms proves a versatile and scalable approach for reward design in challenging, open-ended problems. EUREKA’s success in dexterity is evident in solving complex tasks, such as dexterous pen spinning, using curriculum learning. Its adaptability and substantial performance enhancements are promising for diverse reinforcement learning and reward design applications.

Future research avenues include evaluating EUREKA’s adaptability and performance in more diverse and complex environments and with different robot designs. Assessing its real-world applicability beyond simulation is crucial. Exploring synergies with reinforcement learning techniques, like model-based methods or meta-learning, could further enhance EUREKA’s capabilities. Investigating the interpretability of EUREKA’s generated reward functions is essential for understanding its underlying decision-making processes. Enhancing human feedback integration and exploring EUREKA’s potential in various domains beyond robotics are promising directions.

Check out the Paper. All Credit For This Research Goes To the Researchers on This Project.
The post Meet Eureka: A Human-Level Reward Design Algorithm Powered by Large Language Model LLMs appeared first on MarkTechPost.

A Comprehensive Review of Video Diffusion Models in the Artificial Intelligence Generated Content (AIGC)

Artificial Intelligence is booming, and so is its sub-field, i.e., the domain of Computer Vision. From researchers and academics to scholars, it is getting a lot of attention and is making a big impact on a lot of different industries and applications, like computer graphics, art and design, medical imaging, etc. Diffusion models have been the main technique for image production among the various approaches. They have outperformed strategies based on generative adversarial networks (GANs) and auto-regressive Transformers. These diffusion-based techniques are preferred because they are controllable, can create a wide range of outputs, and can produce extremely realistic images. They have found use in a variety of computer vision tasks, including 3D generation, video synthesis, dense prediction, and image editing.

The diffusion model has been crucial to the considerable advancements in computer vision, as evidenced by the recent boom in AI-generated content (AIGC). These models are not only achieving remarkable results in image generation and editing, but they are also leading the way in video-related research. While surveys addressing diffusion models in the context of image generation have been published, few recent reviews examine their use in the video domain. A recent work provides a thorough evaluation of video diffusion models in the AIGC era to close this gap.

In a recent research paper, a team of researchers has highlighted how crucial diffusion models are in showing remarkable generative powers, surpassing alternative techniques, and exhibiting noteworthy performance in image generation and editing, as well as in the field of video-related research. The paper’s main focus is a thorough investigation of video diffusion models in the context of AIGC. It is separated into three main sections: duties related to creating, editing, and comprehending videos. The report summarises the practical contributions made by researchers, reviews the body of literature that has already been written in these fields, and organizes the work.

The paper has also shared the difficulties that researchers in this field face. It also delineates prospective avenues for future research and development in the field of video diffusion models and offers perspectives on potential future directions for the area as well as challenges that still need to be solved.

The primary contributions of the research paper are as follows.

Methodical monitoring and synthesis of current research on video diffusion models, covering a range of topics including video creation, editing, and comprehension.

Background information and pertinent data on video diffusion models have been introduced, along with datasets, assessment measures, and problem definitions.

A summary of the most influential works on the topic, focusing on common technical information, has been shared.

An in-depth examination and contrast of video-generating benchmarks and settings, addressing a critical need in the literature, has also been shared.

To sum up, this study is an invaluable tool for anyone curious about the most recent developments in video diffusion models in the context of AIGC. It also acknowledges the need for additional studies and reviews in the video domain, emphasizing the importance of diffusion models in the context of computer vision. The study provides a thorough overview of the topic by classifying and assessing previous work, highlighting potential future trends and obstacles for further investigation.

Check out the Paper and Github link. All Credit For This Research Goes To the Researchers on This Project.
The post A Comprehensive Review of Video Diffusion Models in the Artificial Intelligence Generated Content (AIGC) appeared first on MarkTechPost.

Meet Llemma: The Next-Gen Mathematical Open-Language Model Surpassing Current Benchmarks

Language models trained on diverse mixtures of text display remarkably general language understanding and generation capabilities, serving as base models that are adapted to a wide range of applications.

In this study, a team of researchers from Princeton University, EleutherAI, University of Toronto, Vector Institute, University of Cambridge, Carnegie Mellon University, and University of Washington have developed a domain-specific language model tailored for mathematics. They articulate several motivations for pursuing this endeavour. First, solving mathematical problems necessitates the ability to discern patterns within a substantial corpus of specialised prior knowledge, making it an ideal context for domain adaptation. Second, mathematical reasoning itself represents a central task within the field of artificial intelligence and continues to be a topic of contemporary research. Third, the development of language models capable of robust mathematical reasoning has broader implications for various research areas, including reward modelling, reinforcement learning for reasoning, and algorithmic reasoning.

The figure above demonstrates that continued pretraining on Proof-Pile-2 yields LLEMMA, a base model with improved mathematical capabilities. The contributions made by the authors are as follows:

They have trained and made available the LLEMMA models, comprising 7B and 34B parameter language models that are specifically tailored for mathematical tasks. These LLEMMA models represent a new state-of-the-art in the realm of publicly released base models for mathematics.

They have introduced the AlgebraicStack, a dataset encompassing 11B tokens of code that is intricately linked to mathematical contexts.

Their research showcases the LLEMMA models’ proficiency in employing computational tools for solving mathematical problems, including the Python interpreter and formal theorem provers.

In contrast to earlier mathematics language models like Minerva (Lewkowycz et al., 2022), the LLEMMA models are openly accessible, and the authors have made their training data and code open source. This decision facilitates LLEMMA’s role as a platform for advancing future research in the field of mathematical reasoning.

Their work extends the research conducted in Minerva, as outlined by Lewkowycz et al. (2022), with several notable distinctions:

(1) Their model, LLEMMA, encompasses a broader spectrum of data and tasks during both training and evaluation. This includes the incorporation of code data, such as the AlgebraicStack, utilization of various tools, and engagement in formal mathematics tasks.

(2) The authors’ approach relies solely on publicly accessible tools and data sources.

(3) They introduce new analyses that pertain to aspects such as the composition of the training data mixture, memorization patterns, and supplementary supervised fine-tuning.

(4) Importantly, all the artefacts related to their work are made openly available to the public.

The researchers anticipate that LLEMMA and Proof-Pile-2 will provide a solid groundwork for future investigations. These resources are poised to support research efforts in areas such as language model generalization, dataset composition analysis, the extension of domain-specific language models, the utilization of language models as tools for mathematicians, and the enhancement of language models’ mathematical capabilities.

Check out the Paper and Github link. All Credit For This Research Goes To the Researchers on This Project.
The post Meet Llemma: The Next-Gen Mathematical Open-Language Model Surpassing Current Benchmarks appeared first on MarkTechPost.

Meet FourCastNet: A Global Data-Driven Weather Forecasting Model Revolutionizing Weather Predictions with Fast and Accurate Deep Learning Approach

Numerical weather prediction (NWP) emerged in the 1920s. NWP forecasts are now pervasive and help with economic planning in important industries, including transportation, logistics, agriculture, and energy production. Accurate weather predictions that warned of severe catastrophes in advance have saved numerous lives, and forecast quality has improved steadily over the past few decades. Lewis Fry Richardson used a slide rule and a table of logarithms to calculate the first dynamically modelled numerical weather prediction for a single location in 1922; it took him six weeks to produce a 6-hour forecast of the atmosphere. By the 1950s, early electronic computers significantly increased forecasting speed, enabling operational predictions to be computed quickly enough to be useful.

Beyond increased computational power, improvements in weather forecasting have come from better parameterisation of small-scale phenomena through a deeper knowledge of their physics, and from better atmospheric observations. By assimilating data, the latter has led to better model initializations. Because they have orders-of-magnitude cheaper processing costs than cutting-edge NWP models, data-driven Deep Learning (DL) models are becoming more and more popular for weather forecasting. Building data-driven models for predicting the large-scale circulation of the atmosphere has been the subject of several studies. These models have been trained using climate model outputs, general circulation models (GCM), reanalysis products, or a combination of climate model outputs and reanalysis products.

By removing model biases prevalent in NWP models and enabling the production of large ensembles for probabilistic forecasting and data assimilation at low computing cost, data-driven models offer a significant potential to enhance weather forecasts. By training on the reanalysis of data or observations, data-driven models can get around constraints in NWP models, including biases in convection parameterization schemes that significantly impact precipitation forecasts. Once trained, data-driven models generate forecasts via inference orders of magnitude quicker than typical NWP models, allowing for the production of very large ensembles. In this context, researchers have demonstrated that large data-driven ensembles outperform operational NWP models that can only include a limited number of ensemble members in subseasonal-to-seasonal (S2S) forecasts. 

Additionally, a sizable ensemble supports short- and long-term forecasts with data-driven predictions of extreme weather occurrences. However, most data-driven weather models employ low-resolution data for training, often at 5.625° or 2° resolution. Forecasting some of the broad, low-resolution atmospheric variables has been successful in the past. However, the coarsening process causes the loss of important, fine-scale physical information. Data-driven models must provide forecasts with the same or better resolution as the most recent state-of-the-art numerical weather models, which run at 0.1° resolution, to be genuinely effective. For example, estimates at 5.625° spatial resolution provide a meager 32×64-pixel grid representing the world.

A prediction like this cannot distinguish features smaller than 500 km. Such imprecise projections do not consider the significant impacts of small-scale dynamics on large scales, or the influence of topographic factors like mountain ranges and lakes on small-scale dynamics. As a result, low-resolution predictions may only be used in certain situations. High-resolution data (e.g., at 0.25° resolution) can significantly improve the predictions of data-driven models for variables like low-level winds (U10 and V10) that have complex fine-scale structures, even though low-resolution forecasts may be justified for variables like the geopotential height at 500 hPa (Z500) that do not possess many small-scale structures.

Furthermore, a coarser grid would not accurately depict the creation and behaviour of high-impact severe events like tropical cyclones; high-resolution models can address these aspects. To that end, researchers from NVIDIA Corporation, Lawrence Berkeley National Laboratory, Rice University, University of Michigan, California Institute of Technology, and Purdue University created FourCastNet, a Fourier-based neural network forecasting model, to produce global data-driven forecasts of important atmospheric variables at a resolution of 0.25°, or roughly 30 km near the equator, on a global grid of 720×1440 pixels. This enables their results to be compared directly, for the first time, with those obtained by the ECMWF's high-resolution Integrated Forecasting System (IFS) model.

Figure 1 illustrates a worldwide near-surface wind speed forecast with a 96-hour lead time. They emphasize significant high-resolution features resolved and reliably tracked by their prediction, such as Super Typhoon Mangkhut and three named cyclones (Florence, Isaac, and Helene) moving towards the eastern coast of the United States.

In conclusion, FourCastNet offers four novel improvements to data-driven weather forecasting: 

1. FourCastNet accurately forecasts difficult variables like surface winds and precipitation at forecast lead periods of up to one week. Surface wind forecasting on a global scale has yet to be tried using any deep learning (DL) models. Furthermore, global DL models for precipitation have yet to be able to resolve small-scale features. Planning for wind energy resources and catastrophe mitigation are both significantly impacted by this. 

2. FourCastNet offers an eight times higher resolution than cutting-edge DL-based global weather models. Thanks to its high resolution and precision, FourCastNet resolves severe occurrences like tropical cyclones and atmospheric rivers that were poorly represented by earlier DL models due to their coarser grids.

3. At lead periods of up to three days, FourCastNet's predictions are equivalent to those of the IFS model in terms of metrics such as Root Mean Squared Error (RMSE) and Anomaly Correlation Coefficient (ACC). At lead periods of up to a week, its projections of all modelled variables lag behind IFS by a significant margin. FourCastNet models 20 variables at five vertical levels and is driven only by data, in contrast to the IFS model, which has been built over decades, comprises more than 150 variables at more than 50 vertical levels in the atmosphere, and is governed by physics. This contrast demonstrates the immense potential of data-driven modelling to someday replace and supplement NWP.

4. Compared to current NWP ensembles, which have at most about 50 members due to their high computational cost, FourCastNet’s reliable, quick, and computationally affordable forecasts enable the generation of very large ensembles, allowing estimation of well-calibrated and constrained uncertainties in extremes with higher confidence. What is achievable in probabilistic weather forecasting is drastically altered by the quick development of 1,000-member ensembles, improving the accuracy of early warnings of extreme weather occurrences and making it possible to evaluate their effects rapidly.
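
The RMSE and ACC metrics mentioned in point 3 are standard forecast-verification measures. A minimal sketch is below (unweighted for brevity; operational scoring typically weights grid points by latitude):

```python
import numpy as np

def rmse(forecast, truth):
    """Root Mean Squared Error over all grid points."""
    return float(np.sqrt(np.mean((forecast - truth) ** 2)))

def acc(forecast, truth, climatology):
    """Anomaly Correlation Coefficient: correlation between forecast and
    observed anomalies, each measured relative to climatology."""
    fa = forecast - climatology
    ta = truth - climatology
    return float((fa * ta).sum() / np.sqrt((fa ** 2).sum() * (ta ** 2).sum()))
```

ACC is preferred for skill comparisons because subtracting climatology stops a model from scoring well merely by predicting the seasonal mean.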

Check out the Paper. All Credit For This Research Goes To the Researchers on This Project.
The post Meet FourCastNet: A Global Data-Driven Weather Forecasting Model Revolutionizing Weather Predictions with Fast and Accurate Deep Learning Approach appeared first on MarkTechPost.

Elevate your marketing solutions with Amazon Personalize and generative AI

Generative artificial intelligence is transforming how enterprises do business. Organizations are using AI to improve data-driven decisions, enhance omnichannel experiences, and drive next-generation product development. Enterprises are using generative AI specifically to power their marketing efforts through emails, push notifications, and other outbound communication channels. Gartner predicts that “by 2025, 30% of outbound marketing messages from large organizations will be synthetically generated.” However, generative AI alone isn’t enough to deliver engaging customer communication. Research shows that the most impactful communication is personalized—showing the right message to the right user at the right time. According to McKinsey, “71% of consumers expect companies to deliver personalized interactions.” Customers can use Amazon Personalize and generative AI to curate concise, personalized content for marketing campaigns, increase ad engagement, and enhance conversational chatbots.
Developers can use Amazon Personalize to build applications powered by the same type of machine learning (ML) technology used for real-time personalized recommendations. With Amazon Personalize, developers can improve user engagement through personalized product and content recommendations with no ML expertise required. Using recipes (algorithms prepared to support specific use cases) provided by Amazon Personalize, customers can deliver a wide array of personalization, including specific product or content recommendations, personalized ranking, and user segmentation. Additionally, as a fully managed artificial intelligence service, Amazon Personalize accelerates customers’ digital transformations with ML, making it easier to integrate personalized recommendations into existing websites, applications, email marketing systems, and so on.
In this post, we illustrate how you can elevate your marketing campaigns using Amazon Personalize and generative AI with Amazon Bedrock. Together, Amazon Personalize and generative AI help you tailor your marketing to individual consumer preferences.
How exactly do Amazon Personalize and Amazon Bedrock work together to achieve this? Imagine as a marketer that you want to send tailored emails to users recommending movies they would enjoy based on their interactions across your platform. Or perhaps you want to send targeted emails to a segment of users promoting a new shoe they might be interested in. The following use cases use generative AI to enhance two common marketing emails.
Use Case 1: Use generative AI to deliver targeted one-to-one personalized emails
With Amazon Personalize and Amazon Bedrock, you can generate personalized recommendations and create outbound messages with a personal touch tailored to each of your users.
The following diagram illustrates the architecture and workflow for delivering targeted personalized emails powered by generative AI.

First, import your dataset of users’ interactions into Amazon Personalize for training. Amazon Personalize automatically trains a model using the Top Picks for You recipe. As an output, Amazon Personalize provides recommendations that align with the users’ preferences.
You can use the following code to identify recommended items for users:

import boto3

personalize_runtime = boto3.client("personalize-runtime")
get_recommendations_response = personalize_runtime.get_recommendations(
    recommenderArn=workshop_recommender_top_picks_arn,
    userId=str(user_id),
    numResults=number_of_movies_to_recommend)

For more information, see the Amazon Personalize API reference.
The recommendations output is then passed to Amazon Bedrock in a prompt that includes your user preferences, demographics, and the items recommended by Amazon Personalize.
For example, a marketer who wants to create a personalized email that is charming and fun for a user might use the following prompt:

Create a personalized email which is charming and fun so that the user is engaged. The user has recently watched family-friendly films. These are the recommended items – The Little Mermaid, Encanto, Spider-Man: Into the Spider-Verse.

By invoking one of the foundation models (FMs) provided in Amazon Bedrock, such as Claude 2, with the prompt and the sample code that follows, you can create a personalized email for a user:

import boto3, json

bedrock_client = boto3.client("bedrock-runtime")
personalized_email_response = bedrock_client.invoke_model(
    body=json.dumps({"prompt": f"\n\nHuman: {prompt}\n\nAssistant:", "max_tokens_to_sample": 500}),
    modelId=identifier_of_the_model)

For more information, see the Amazon Bedrock API reference.
Amazon Bedrock returns a personalized email for the user:

Subject: Fall in love with this recommended selection for movie night!
Dear <user name>,
Desiring the cozy feel of fall? No problem! Check our top three recommendations for movies that will have you cozy on the couch with your loved ones:
1. The Little Mermaid: This classic Disney movie is all about a mermaid princess named Ariel, who dreams of the human world. Because of her fascination, she makes a deal with the sea witch Ursula and learns a major lesson.
2. Encanto: This Disney movie is about the Madrigals, a Colombian family who lives in a magical house. Each member of the family has a unique gift, except for young Maribel who must help save her family.
3. Spider-Man: Into the Spider-Verse: This animated superhero movie is a must-see action movie. Spider-man, a Brooklyn teen named Miles Morales, teams up with other spider-powered people to save the multiverse.
With lovable characters, catchy tunes, and moving stories, you really can’t go wrong with any of these three. Grab the popcorn because you’re in for a treat!

Use Case 2: Use generative AI to elevate one-to-many marketing campaigns
When it comes to one-to-many email marketing, generic content can result in low engagement (that is, low open rates and unsubscribes). One way companies circumvent this outcome is to manually craft variations of outbound messages with compelling subjects. This can lead to inefficient use of time. By integrating Amazon Personalize and Amazon Bedrock into your workflow, you can quickly identify the interested segment of users and create variations of email content with greater relevance and engagement.
The following diagram illustrates the architecture and workflow for elevating marketing campaigns powered by generative AI.

To compose one-to-many emails, first import your dataset of users’ interactions into Amazon Personalize for training. Amazon Personalize trains the model using the user segmentation recipe. With the user segmentation recipe, Amazon Personalize automatically identifies the individual users that demonstrate a propensity for the chosen items as the target audience.
To identify the target audience and retrieve metadata for an item, you can use the following sample code:

import boto3

personalize = boto3.client("personalize")
create_batch_segment_response = personalize.create_batch_segment_job(
    jobName=job_name,
    solutionVersionArn=solution_version_arn,
    numResults=number_of_users_to_recommend,
    jobInput={
        "s3DataSource": {
            "path": batch_input_path
        }
    },
    jobOutput={
        "s3DataDestination": {
            "path": batch_output_path
        }
    },
    roleArn=role_arn)  # IAM role that lets Personalize read/write the S3 paths

For more information, see the Amazon Personalize API reference.
Amazon Personalize delivers a list of recommended users to target for each item to batch_output_path. You can then pass each user segment to one of the FMs in Amazon Bedrock along with your prompt.
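Parsing that output can be sketched as follows. The helper and file name are hypothetical; the JSON-lines shape, with an input item and an output users list per line, follows the batch segment job output format:

```python
import json

# Parse a downloaded batch segment output file (JSON lines) into a
# {item_id: [user_ids]} mapping. Each line pairs one input item with
# the users Amazon Personalize identified as most likely to engage.
def parse_segment_output(path):
    segments = {}
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            item = record["input"]["itemId"]
            users = record["output"]["usersList"]
            segments[item] = users
    return segments
```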
For this use case, you might want to market a newly released sneaker through email. An example prompt might include the following:

For the user segment “sneaker heads”, create a catchy email that promotes the latest sneaker “Ultra Fame II”. Provide users with discount code FAME10 to save 10%.

Similar to the first use case, you’ll use the following code in Amazon Bedrock:

import boto3, json

bedrock_client = boto3.client("bedrock-runtime")
personalized_email_response = bedrock_client.invoke_model(
    body=json.dumps({"prompt": f"\n\nHuman: {prompt}\n\nAssistant:", "max_tokens_to_sample": 500}),
    modelId=identifier_of_the_model)

For more information, see the Amazon Bedrock API reference.
Amazon Bedrock returns a personalized email based on the items chosen for each user as shown:

Subject: <<name>>, your ticket to the Hall of Fame awaits
Hey <<name>>,
The wait is over. Check out the new Ultra Fame II! It’s the most innovative and comfortable Ultra Fame shoe yet. Its new design will have you turning heads with every step. Plus, you’ll get a mix of comfort, support, and style that’s just enough to get you into the Hall of Fame.
Don’t wait until it’s too late. Use the code FAME10 to save 10% on your next pair.

To test and determine the email that leads to the highest engagement, you can use Amazon Bedrock to generate a variation of catchy subject lines and content in a fraction of the time it would take to manually produce test content.
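As a sketch of that testing workflow, a hypothetical helper (not part of any AWS API) could expand one campaign brief into several prompt variations, each of which would then go through a separate invoke_model call:

```python
# Hypothetical helper: expands one campaign brief into several prompt
# variations so subject lines and tone can be A/B tested. The tone
# list and wording are illustrative.
def build_variation_prompts(segment_name, product, discount_code, n_variations=3):
    base = (f'For the user segment "{segment_name}", create a catchy email '
            f'that promotes the latest sneaker "{product}". '
            f'Provide users with discount code {discount_code} to save 10%.')
    tones = ["playful", "urgent", "minimalist", "storytelling", "bold"]
    return [f"{base} Write it in a {tone} tone." for tone in tones[:n_variations]]
```

Each returned string would be wrapped in the model-specific request body shown earlier before invoking the FM.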
By integrating Amazon Personalize and Amazon Bedrock, you can deliver personalized promotional content to the right audience.
Generative AI powered by FMs is changing how businesses build hyper-personalized experiences for consumers. AWS AI services, such as Amazon Personalize and Amazon Bedrock, can help recommend and deliver products, content, and compelling marketing messages personalized to your users. For more information on working with generative AI on AWS, see Announcing New Tools for Building with Generative AI on AWS.

About the Authors
Ba’Carri Johnson is a Sr. Technical Product Manager working with AWS AI/ML on the Amazon Personalize team. With a background in computer science and strategy, she is passionate about product innovation. In her spare time, she enjoys traveling and exploring the great outdoors.
Ragini Prasad is a Software Development Manager with the Amazon Personalize team focused on building AI-powered recommender systems at scale. In her spare time, she enjoys art and travel.
Jingwen Hu is a Sr. Technical Product Manager working with AWS AI/ML on the Amazon Personalize team. In her spare time, she enjoys traveling and exploring local food.
Anna Grüebler is a Specialist Solutions Architect at AWS focusing on artificial intelligence. She has more than 10 years of experience helping customers develop and deploy machine learning applications. Her passion is taking new technologies and putting them in the hands of everyone and solving difficult problems by taking advantage of using AI in the cloud.
Tim Wu Kunpeng is a Sr. AI Specialist Solutions Architect with extensive experience in end-to-end personalization solutions. He is a recognized industry expert in e-commerce and media and entertainment, with expertise in generative AI, data engineering, deep learning, recommendation systems, responsible AI, and public speaking.

Meta AI Introduces Habitat 3.0, Habitat Synthetic Scenes Dataset, and …

Facebook AI Research (FAIR) is dedicated to advancing the field of socially intelligent robotics. The primary objective is to develop robots capable of assisting with everyday tasks while adapting to the unique preferences of their human partners. The work involves delving deep into embodied systems to establish the foundation for the next generation of AR and VR experiences. The goal is to make robotics an integral part of our lives, reducing the burden of routine chores and improving the quality of life for individuals. FAIR’s multifaceted approach emphasizes the importance of merging AI, AR, VR, and robotics to create a future where technology seamlessly augments our daily experiences and empowers us in previously unimagined ways.

FAIR has made three significant advancements to address scalability and safety challenges in training and testing AI agents in physical environments:

Habitat 3.0 is a high-quality simulator for robots and avatars, facilitating human-robot collaboration in a home-like setting.

The Habitat Synthetic Scenes Dataset (HSSD-200) is a 3D dataset designed by artists to provide exceptional generalization when training navigation agents.

The HomeRobot platform offers an affordable home robot assistant for open vocabulary tasks in simulated and physical-world environments, thereby accelerating the development of AI agents that can assist humans.

Habitat 3.0 is a simulator designed to facilitate robotics research by enabling quick and safe testing of algorithms in virtual environments before deploying them on physical robots. It allows for collaboration between humans and robots while performing daily tasks and includes realistic humanoid avatars to enable AI training in diverse home-like settings. Habitat 3.0 offers benchmark tasks that promote collaborative robot-human behaviors in real indoor scenarios, such as cleaning and navigation, thereby introducing new avenues to explore socially embodied AI.

HSSD-200 is a synthetic 3D scene dataset that provides a more realistic and compact option for training robots in simulated environments. It comprises 211 high-quality 3D scenes replicating physical interiors and contains 18,656 models from 466 semantic categories. Despite its smaller scale, ObjectGoal navigation agents trained on HSSD-200 perform comparably to those trained on much larger datasets. In some cases, training on just 122 HSSD-200 scenes outperforms agents trained on 10,000 scenes from prior datasets, demonstrating its efficiency in generalizing to physical-world scenarios.

In the field of robotics research, having a shared platform is crucial. HomeRobot seeks to address this need by defining motivating tasks, providing versatile software interfaces, and fostering community engagement. Open-vocabulary mobile manipulation serves as the motivating task, challenging robots to manipulate objects in diverse environments. The HomeRobot library supports navigation and manipulation for Hello Robot’s Stretch and Boston Dynamics’ Spot, both in simulated and physical-world settings, thus promoting replication of experiments. The platform emphasizes transferability, modularity, and baseline agents, with a benchmark showcasing a 20% success rate in physical-world tests.

The field of embodied AI research is constantly evolving to cater to dynamic environments that involve human-robot interactions. Facebook AI’s vision for developing socially intelligent robots is not limited to static scenarios. Instead, the focus is on collaboration, communication, and predicting future states in dynamic settings. To achieve this, researchers are using Habitat 3.0 and HSSD-200 as tools to train AI models in simulation. The aim is to assist and adapt to human preferences while deploying these trained models in the physical world to assess their real-world performance and capabilities.

Check out the Reference Page.

The post Meta AI Introduces Habitat 3.0, Habitat Synthetic Scenes Dataset, and HomeRobot: 3 Major Advancements in the Development of Social Embodied AI Agents appeared first on MarkTechPost.

Meet FreeU: A Novel AI Technique To Enhance Generative Quality Without …

Probabilistic diffusion models, a cutting-edge category of generative models, have become a focal point of the research landscape, particularly for tasks related to computer vision. Distinct from other classes of generative models, such as variational autoencoders (VAEs), generative adversarial networks (GANs), and vector-quantized approaches, diffusion models introduce a novel generative paradigm. These models employ a fixed Markov chain to map the latent space, facilitating intricate mappings that capture latent structural complexities within a dataset. Recently, their impressive generative capabilities, ranging from the high level of detail to the diversity of the generated examples, have pushed groundbreaking advancements in various computer vision applications such as image synthesis, image editing, image-to-image translation, and text-to-video generation.

The diffusion models consist of two primary components: the diffusion process and the denoising process. During the diffusion process, Gaussian noise is progressively incorporated into the input data, gradually transforming it into nearly pure Gaussian noise. In contrast, the denoising process aims to recover the original input data from its noisy state using a sequence of learned inverse diffusion operations. Typically, a U-Net is employed to predict the noise removal iteratively at each denoising step. Existing research predominantly focuses on the use of pre-trained diffusion U-Nets for downstream applications, with limited exploration of the internal characteristics of the diffusion U-Net.
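The diffusion process described above has a well-known closed form: a noised sample at step t can be drawn directly from the clean input using the cumulative noise schedule, without iterating through every intermediate step. A minimal NumPy sketch, with illustrative variable names:

```python
import numpy as np

# Closed-form forward diffusion: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps,
# where a_bar (alpha_bar_t) is the cumulative product of the noise schedule
# and eps ~ N(0, I). As a_bar -> 0, x_t approaches pure Gaussian noise;
# at a_bar = 1, x_t is the clean input itself.
def diffuse(x0, alpha_bar_t, rng):
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))        # stand-in for an image
x_late = diffuse(x0, 0.01, rng)         # late step: nearly pure noise
```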

A joint study from S-Lab, Nanyang Technological University, departs from the conventional application of diffusion models by investigating the effectiveness of the diffusion U-Net in the denoising process. To gain a deeper understanding of that process, the researchers shift their analysis to the Fourier domain to observe the generation process of diffusion models, a relatively unexplored research area.

The figure above illustrates the progressive denoising process in the top row, showcasing the generated images at successive iterations. In contrast, the following two rows present the associated low-frequency and high-frequency spatial domain information after the inverse Fourier Transform, corresponding to each respective step. This figure reveals a gradual modulation of low-frequency components, indicating a subdued rate of change, whereas high-frequency components exhibit more pronounced dynamics throughout the denoising process. These findings can be intuitively explained: low-frequency components inherently represent an image’s global structure and characteristics, encompassing global layouts and smooth colors. Drastic alterations to these components are generally unsuitable in denoising processes as they can fundamentally reshape the image’s essence. On the other hand, high-frequency components capture rapid changes in the images, such as edges and textures, and are highly sensitive to noise. Denoising processes must remove noise while preserving these intricate details.
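The low/high-frequency decomposition underlying this analysis can be sketched with a centered FFT mask; the cutoff radius below is an arbitrary illustrative choice, not a value from the paper:

```python
import numpy as np

# Split a 2D array into low- and high-frequency components: shift the
# FFT so low frequencies sit at the center, keep everything inside a
# circular cutoff as "low", and the complement as "high". For real
# input, the two components sum back to the original exactly.
def split_frequencies(img, radius=4):
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    low = np.fft.ifft2(np.fft.ifftshift(F * mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(F * (~mask))).real
    return low, high
```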

Considering these observations regarding low-frequency and high-frequency components during denoising, the investigation extends to determine the specific contributions of the U-Net architecture within the diffusion framework. At each stage of the U-Net decoder, skip features from the skip connections and backbone features are combined. The study reveals that the primary backbone of the U-Net plays a significant role in denoising, while the skip connections introduce high-frequency features into the decoder module, aiding in the recovery of fine-grained semantic information. However, this propagation of high-frequency features can inadvertently weaken the inherent denoising capabilities of the backbone during the inference phase, potentially leading to the generation of abnormal image details, as depicted in the first row of Figure 1.

In light of this discovery, the researchers propose a new approach referred to as “FreeU,” which can enhance the quality of generated samples without requiring additional computational overhead from training or fine-tuning. The overview of the framework is reported below.

During the inference phase, two specialized modulation factors are introduced to balance the contributions of features from the primary backbone and skip connections of the U-Net architecture. The first factor, known as “backbone feature factors,” is designed to amplify the feature maps of the primary backbone, thereby strengthening the denoising process. However, it is observed that the inclusion of backbone feature scaling factors, while yielding significant improvements, can occasionally result in undesired over-smoothing of textures. To address this concern, the second factor, “skip feature scaling factors,” is introduced to mitigate the problem of texture over-smoothing.
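A heavily simplified sketch of the idea, not the paper's exact implementation (which modulates features more selectively, e.g. spectrally), scales backbone and skip features before the decoder's usual channel concatenation; the factor values are illustrative:

```python
import numpy as np

# FreeU-style merge at one U-Net decoder stage: amplify backbone
# feature maps by b > 1 (strengthening denoising) and attenuate skip
# features by s < 1 (countering texture over-smoothing), then
# concatenate along the channel axis (axis=1 for NCHW layout).
def freeu_merge(backbone_feat, skip_feat, b=1.2, s=0.9):
    return np.concatenate([backbone_feat * b, skip_feat * s], axis=1)
```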

The FreeU framework demonstrates seamless adaptability when integrated with existing diffusion models, including applications like text-to-image generation and text-to-video generation. A comprehensive experimental evaluation of this approach is conducted using foundational models such as Stable Diffusion, DreamBooth, ReVersion, ModelScope, and Rerender for benchmark comparisons. When FreeU is applied during the inference phase, these models show a noticeable enhancement in the quality of the generated outputs. The visual representation in the illustration below provides evidence of FreeU’s effectiveness in significantly improving both intricate details and the overall visual fidelity of the generated images.

This was the summary of FreeU, a novel AI technique that enhances generative models’ output quality without additional training or fine-tuning. If you are interested and want to learn more about it, please feel free to refer to the links cited below. 

Check out the Paper and Project Page.

The post Meet FreeU: A Novel AI Technique To Enhance Generative Quality Without Additional Training Or Fine-tuning appeared first on MarkTechPost.

Meet Gradio-lite: A JavaScript Library Elevating Interactive Machine L …

Gradio is an open-source Python library that simplifies the creation of user interfaces for machine learning models. It allows developers and data scientists to build interactive web applications without extensive web development knowledge, and it supports a wide range of machine learning models, making it an ideal tool for improving how users interact with those models.

Gradio provides a high-level interface for defining input and output components, making it easy to create customizable interfaces for tasks such as image classification, text generation, and more. It supports various input types, including text, images, audio, and video, making it a versatile tool for showcasing and deploying machine learning models with user-friendly interfaces. 

Gradio-Lite is a JavaScript library that enables the execution of Gradio applications directly within web browsers. It achieves this by utilizing Pyodide, a Python runtime for WebAssembly. Pyodide allows Python code to run in the browser environment, which makes it possible for developers to use regular Python code for their Gradio applications. It eliminates the need for server-side infrastructure and ensures seamless execution of Gradio applications in web browsers.
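A minimal page embedding a Gradio app with Gradio-Lite looks roughly like the following. The CDN URLs and the <gradio-lite> tag follow the Gradio-Lite documentation at the time of writing; treat this as a sketch rather than a pinned recipe:

```html
<html>
  <head>
    <script type="module" src="https://cdn.jsdelivr.net/npm/@gradio/lite/dist/lite.js"></script>
    <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/@gradio/lite/dist/lite.css" />
  </head>
  <body>
    <!-- Regular Python inside the tag; Pyodide runs it in the browser -->
    <gradio-lite>
      import gradio as gr

      def greet(name):
          return "Hello, " + name + "!"

      gr.Interface(fn=greet, inputs="textbox", outputs="textbox").launch()
    </gradio-lite>
  </body>
</html>
```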

Gradio-Lite presents numerous advantages, such as serverless deployment, which eliminates the need for server infrastructure, simplifies deployment, and reduces costs. It also ensures low-latency interactions by running within the browser, providing faster responses and a smoother user experience. Moreover, Gradio-Lite enhances privacy and security since all processing occurs within the user’s browser. It ensures that user data remains on their device, thus instilling confidence in data handling.

Gradio-Lite has a significant limitation: it may take longer for Gradio apps to load in the browser initially due to the need to load the Pyodide runtime before rendering Python code. Additionally, Pyodide doesn’t support all Python packages. While popular packages like Gradio, NumPy, Scikit-learn, and Transformers-js can be used, apps with many dependencies should check if those dependencies are available in Pyodide or can be installed using micropip.

Gradio is a Python library for user-friendly machine learning interfaces, while Gradio-Lite is a JavaScript library that runs Gradio applications directly in web browsers. It offers serverless deployment for cost savings, low-latency interactions for a better user experience, and improved privacy and security. However, it may have longer initial load times and limited support for Python packages, potentially requiring adaptations for some applications.

Check out the Reference Page.

The post Meet Gradio-lite: A JavaScript Library Elevating Interactive Machine Learning-Based Library (Gradio) to the Browser with Pyodide appeared first on MarkTechPost.