Alibaba Cloud Unveils Tongyi Wanxiang: An AI Image Generation Model to Help Businesses to Unleash Creativity and Productivity

Tongyi Wanxiang (literally “tens of thousands of images”) is the latest AI image generation model announced by Alibaba Cloud, the digital technology and intelligence backbone of the Alibaba Group, during the World Artificial Intelligence Conference 2023. Enterprise customers in China can now participate in a beta test of the state-of-the-art generative AI model.

In addition, the market leader in cloud computing has introduced ModelScopeGPT, a flexible framework for using several AI models on ModelScope to complete advanced and specialized AI tasks in the language, vision, and voice domains. ModelScope is a Model-as-a-Service (MaaS) platform that Alibaba Cloud released as open source last year and that now hosts over 900 AI models.

The generative AI model is versatile and can produce rich images in a wide variety of styles in response to text prompts in both Chinese and English. These styles range from watercolors and oil paintings to sketches, flat illustrations, and 3D cartoons. In addition, the model can take any image and generate a similar-looking new image through “style transfer,” which keeps the original image’s content intact while giving it the visual style of another image. The model takes advantage of multilingual resources for better training, and it is powered by the cutting-edge technologies of Alibaba Cloud in knowledge arrangement, visual AI, and natural language processing (NLP). It has powerful semantic understanding capabilities, which lead to improved image quality and contextual relevance.

In addition, the model’s ability to generate high-contrast, visually appealing images with clean backgrounds is improved by optimizing the high-resolution diffusion process based on the signal-to-noise ratio. Tongyi Wanxiang was built with Composer, a large proprietary model on Alibaba Cloud that allows finer-grained control over image synthesis outcomes, including spatial arrangement and palette, without sacrificing synthesis quality or originality.

Alibaba Cloud also introduced ModelScopeGPT, a robust framework built to make the most of the platform’s large language models (LLMs). In ModelScopeGPT, an LLM acts as the controller that orchestrates the vast network of domain-specific expert models from the ModelScope open-source community. ModelScopeGPT was developed using the several Model-as-a-Service options available on Alibaba Cloud. It is a free resource that businesses and developers can use to access and run the most appropriate models for completing complex AI tasks per user request, such as creating multilingual films.

In April, Alibaba Cloud introduced a new LLM called Tongyi Qianwen to enhance the user experience across all of Alibaba’s operations. Customers and developers can use the model to build bespoke AI features at low cost. There have been over 300,000 requests for beta testing since the model’s release, from businesses across industries such as banking, electronics, transportation, apparel, and dairy.

With the addition of Tongyi Qianwen, Alibaba Cloud’s intelligent assistant, Tingwu, can better understand and analyze multimedia information. Since its release, the AI-driven assistant has been used by over 360,000 people.

In addition to holding its first-ever AI Hackathon in China, ModelScope offered cash prizes and investment opportunities from top venture capital firms to encourage developing and implementing AI models in the industry. Almost 300 teams entered, but only 56 advanced to the championship. The final competition for the top prize ran on two tracks. The first required teams to improve upon a large language model to address a practical problem. The second involved applying already-trained models to a specific task, such as text-to-image generation, or constructing an LLM-powered autonomous agent that knows which models to use for which jobs.

After the success of OpenAI’s ChatGPT chatbot, many major Chinese tech firms are moving toward introducing their artificial intelligence products and services. Tongyi Qianwen, Alibaba’s artificial intelligence chatbot, was released in April to compete with ChatGPT.

Check out the Project and Reference Article.



Best AI GIF Generators (2023)

GIFs are a fantastic choice if you’re searching for a fun and original way to spice up your web content. With the development of AI GIF generators, making professional-grade animations is possible with little effort.

This article takes a close look at several of the top AI GIF makers.

Simplified

Simplified is an AI-powered tool that can create GIFs for a variety of purposes. Whether you want to start with a pre-made design or create something fresh, Simplified’s GIF generator gives you access to a huge content library for inspiration. The design editor is intuitive, with various fonts, photos, videos, and sounds available. After finishing your GIF, save it as an MP4 and publish it on your preferred social media sites. The best part is that you can start making gorgeous GIFs right away, without spending a dime, using Simplified’s AI GIF generator.

DALL-E 2

OpenAI, an AI research organization, has created a wonderful AI GIF generation tool called DALL-E 2. With DALL-E 2’s paintbrush tool, you can add depth and dimension to your GIFs with the help of shadows and highlights. If you want your artwork to look like an expert created it without going through a lot of hassle, this is the tool for you. DALL-E 2 is remarkable because it combines artificial intelligence with user input to produce a stunning final product. In addition to editing, it can make new GIFs. It’s not surprising that developers and users are raving about it.

Gfycat 

You can easily make professional-quality GIFs with the help of Gfycat, an elegant and intuitive AI-powered GIF generator. Gfycat’s powerful editing tools give you complete freedom to express yourself through animated GIFs: it simplifies the process of making GIFs that reflect your creativity by letting you easily add text, filters, and effects. That’s not all, though. Gfycat also features a comprehensive library of content from which to draw ideas and a search option that lets you locate GIFs on virtually any subject. Anyone with access to the program can make original and visually appealing GIFs.

Meta Make-A-Video

The new artificial intelligence tool from Meta, called Make-A-Video, is special software capable of producing short videos in response to text prompts. Make-A-Video’s technology might be adapted to generate GIFs using AI, even though this was not its original purpose. Meta could modify the Make-A-Video system to generate GIFs from text prompts or image data, similar to how the system creates short videos in response to word prompts. Meta might also use Make-A-Video’s deep learning methods for analyzing image data to build a system that generates unique and interesting GIFs for each user. If this were done, it would significantly advance current methods for making GIFs.

Imgflip 

Imgflip is an AI-powered GIF generator that analyzes videos to isolate the most interesting clips. The resulting GIF is a beautiful visual representation of the events it depicts. In addition to being simple to use, the program lets you personalize your GIFs by adding captions. As a bonus, it allows you to change the animation speed, crop videos, and more.

Mage Space

Mage Space is an easy-to-navigate AI GIF maker that doesn’t cost a dime. Over 60 bespoke AI models covering various topics, styles, and genres are available as you create animated GIFs from text. In addition, you can use the enhancement function to make your images look better by instantly boosting the resolution to 2048 x 2048. Everyone can try out Mage Space’s essential features with the free plan; more creative options are available with premium subscriptions.

Artbreeder

Artbreeder is an original and cutting-edge GIF generation platform powered by AI art that uses a credit system. You can use this program to make portraits, landscapes, animals, anime, abstract pictures, and videos. It also allows you to combine and morph various elements, such as photos, styles, and genes, to produce novel iterations. In addition, Artbreeder is a platform that encourages users to work together and explore their creative potential by allowing them to find and share the works of other users. You can interact with the community by looking at, liking, commenting on, and remixing the work of other users or by participating in group photo projects and challenges.

Picsart

You can make animated GIFs with Picsart by following simple prompts from the app’s artificial intelligence. Enter any word or phrase related to the GIF you want, and Picsart will return a unique, original GIF generated just for you. It also provides a variety of artistic options for personalizing your GIFs. Picsart’s user-friendly design means virtually anyone can use it. With just a few clicks, you’ll have access to all the tools and options, and you’ll be able to preview your GIFs before saving or sharing them.




Google AI Proposes ‘Thought Experiments’ to Enhance Moral Reasoning in Language Models

Language models have made significant strides in natural language processing tasks. However, deploying large language models (LLMs) in real-world applications requires addressing their deficit in moral reasoning capabilities. To tackle this challenge, a Google research team introduces a groundbreaking framework called “Thought Experiments,” which utilizes counterfactuals to improve a language model’s moral reasoning. This innovative approach has demonstrated an impressive 9-16% increase in accuracy in the Moral Scenarios task.

The Thought Experiments Framework

The Thought Experiments framework is a multi-step prompting approach that iteratively refines the model’s responses. The researchers summarize the framework’s steps as follows (a minimal sketch of the loop appears after the list):

1. Pose counterfactual questions: The model is presented with Moral Scenarios questions without answer options.

2. Answer counterfactual questions: Questions generated in the previous step are presented to the model, which is prompted to answer them.

3. Summarize: The model is asked to summarize its thoughts using the counterfactual questions and answers.

4. Choose: Multiple decodes from the previous step are provided, and the model selects the best one. This step is necessary due to the multiple ways of considering a situation morally.

5. Answer: The chosen summary and original answer choices are presented to the model, allowing it to provide a final zero-shot answer.
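To make the flow concrete, here is a minimal sketch of how such a multi-step prompting loop could be wired up. The `complete` function, prompt wording, and number of sampled decodes are illustrative placeholders, not taken from the paper:

```python
# Illustrative sketch of the Thought Experiments prompting loop.
# `complete` is a placeholder for any text-completion API; prompts are paraphrased.

def complete(prompt: str, n: int = 1) -> list[str]:
    """Placeholder LLM call; replace with a real API client."""
    raise NotImplementedError

def thought_experiments(scenario: str, answer_options: list[str]) -> str:
    # 1. Pose counterfactual questions about the scenario (no answer options shown).
    questions = complete(
        f"Scenario: {scenario}\nPose counterfactual questions that probe the moral situation."
    )[0]

    # 2. Answer the counterfactual questions.
    answers = complete(
        f"Scenario: {scenario}\nQuestions:\n{questions}\nAnswer each question."
    )[0]

    # 3. Summarize the moral considerations, sampling several candidate summaries (decodes).
    summaries = complete(
        f"Scenario: {scenario}\nQ&A:\n{answers}\nSummarize the moral considerations.", n=5
    )

    # 4. Ask the model to choose the best summary among the decodes.
    numbered = "\n".join(f"({i}) {s}" for i, s in enumerate(summaries))
    choice = complete(
        f"Scenario: {scenario}\nCandidate summaries:\n{numbered}\n"
        "Which summary best captures the moral considerations? Reply with its number."
    )[0]
    best = summaries[int(choice.strip().strip("()"))]

    # 5. Produce a final zero-shot answer given the chosen summary and the answer options.
    options = "\n".join(f"- {o}" for o in answer_options)
    return complete(
        f"Scenario: {scenario}\nSummary: {best}\nOptions:\n{options}\nGive the final answer."
    )[0]
```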

To evaluate the effectiveness of the Thought Experiments framework, the research team conducted experiments on the Moral Scenarios subtask within the MMLU benchmark. They compared their framework against four zero-shot baselines: direct zero-shot and zero-shot Chain-of-Thought (CoT), each with and without self-consistency.

The results were promising. The zero-shot Thought Experiments framework achieved an accuracy of 66.15% and 66.26% without and with self-consistency, respectively. This marks a significant improvement of 9.06% and 12.29% over the direct zero-shot baseline, as well as 12.97% and 16.26% over the CoT baseline.

The research showcases the effectiveness of the Thought Experiments prompting framework in enhancing moral reasoning within the Moral Scenarios task. It emphasizes the potential for future work to explore open-ended generations for addressing more ambiguous cases, such as moral dilemmas.

In summary, the Google research team’s innovative Thought Experiments framework presents a promising solution to augment the moral reasoning capabilities of language models. By incorporating counterfactuals and a multi-step prompting approach, this framework demonstrates significant improvements in accuracy. As the development of language models continues, it is crucial to prioritize responsible and ethical AI implementations, ensuring their alignment with human moral values.

Check out the Paper.



The AI-Makeup Artist that Covers Your Identity: CLIP2Protect is an AI Model That Uses Text-Guided Makeup to Protect Facial Privacy

The 90s sci-fi movies are full of computers that show a rotating profile of a person and display all kinds of information about them. Face-recognition technology was expected to become so advanced that no data about you could stay hidden from Big Brother.

We cannot claim they were wrong, unfortunately. Face recognition technology has witnessed significant advancements with the advent of deep learning-based systems, revolutionizing various applications and industries. Whether this revolution was something good or bad is a topic for another post, but the reality is that our faces can be linked to so much data about us in our world. In this case, privacy plays a crucial role.

In response to these concerns, the research community has been actively exploring methods and techniques to develop facial privacy protection algorithms that can safeguard individuals against the potential risks associated with face recognition systems.

The goal of facial privacy protection algorithms is to find a balance between preserving an individual’s privacy and maintaining the usability of their facial images. While the primary objective is to protect individuals from unauthorized identification or tracking, it is equally important to ensure that the protected images retain visual fidelity and resemblance to the original faces so that the system cannot be tricked with a fake face. 

Achieving this balance is challenging, particularly when using noise-based methods that overlay adversarial artifacts on the original face image. Several approaches have been proposed to generate unrestricted adversarial examples, with adversarial makeup-based methods being the most popular ones for their ability to embed adversarial modifications in a more natural manner. However, existing techniques suffer from limitations such as makeup artifacts, dependence on reference images, the need for retraining for each target identity, and a focus on impersonation rather than privacy preservation.

So, there is a need for a reliable method to protect facial privacy, but existing ones suffer from obvious shortcomings. How can we solve this? Time to meet CLIP2Protect.

CLIP2Protect is a novel approach for protecting user facial privacy on online platforms. It searches for adversarial latent codes in a low-dimensional manifold learned by a generative model. These latent codes can be used to generate high-quality face images that maintain a realistic face identity while deceiving black-box face recognition (FR) systems.

A key component of CLIP2Protect is using textual prompts to facilitate adversarial makeup transfer, allowing the traversal of the generative model’s latent manifold to find transferable adversarial latent codes. This technique effectively hides attack information within the desired makeup style without requiring large makeup datasets or retraining for different target identities. CLIP2Protect also introduces an identity-preserving regularization technique to ensure the protected face images visually resemble the original faces.

Overview of CLIP2Protect. Source: https://fahadshamshad.github.io/Clip2Protect/

To ensure the naturalness and fidelity of the protected images, the search for adversarial faces is constrained to stay close to the clean image manifold learned by the generative model. This restriction helps mitigate the generation of artifacts or unrealistic features that could be easily detected by human observers or automated systems. Additionally, CLIP2Protect focuses on optimizing only the identity-preserving latent codes in the latent space, ensuring that the protected faces retain the human-perceived identity of the individual.

To introduce privacy-enhancing perturbations, CLIP2Protect utilizes text prompts as guidance for generating makeup-like transformations. This approach offers greater flexibility to the user than reference image-based methods, as it allows for the specification of desired makeup styles and attributes through textual descriptions. By leveraging these textual prompts, the method can effectively embed privacy protection information in the makeup style without needing a large makeup dataset or retraining for different target identities.
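To make the pipeline more concrete, here is a heavily simplified PyTorch sketch of this kind of text-guided latent optimization. The generator, CLIP image encoder, surrogate face recognition model, loss weights, and target identity embedding are all placeholders supplied by the caller; CLIP2Protect’s exact losses, layer selection, and schedule differ.

```python
import torch
import torch.nn.functional as F

def protect_face(latent_clean, generator, clip_image_encoder, text_embedding,
                 fr_model, target_embedding, steps=50, lr=0.01,
                 lam_makeup=1.0, lam_adv=1.0, lam_reg=0.1):
    """Simplified text-guided adversarial makeup search in a generator's latent space.

    latent_clean:     latent code of the clean face (e.g., a StyleGAN-like W+ code)
    text_embedding:   normalized CLIP embedding of a makeup prompt such as "face with red lipstick"
    fr_model:         surrogate face recognition model returning identity embeddings
    target_embedding: normalized identity embedding the protected face should match instead
    """
    delta = torch.zeros_like(latent_clean, requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=lr)

    for _ in range(steps):
        image = generator(latent_clean + delta)          # candidate protected face

        # Makeup loss: pull the image's CLIP embedding toward the text prompt.
        image_emb = F.normalize(clip_image_encoder(image), dim=-1)
        loss_makeup = 1.0 - (image_emb * text_embedding).sum(dim=-1).mean()

        # Adversarial loss: make the FR embedding match the target identity,
        # so the protected face no longer matches the true identity.
        fr_emb = F.normalize(fr_model(image), dim=-1)
        loss_adv = 1.0 - (fr_emb * target_embedding).sum(dim=-1).mean()

        # Regularization: stay close to the clean latent code, which keeps the image
        # on the natural-face manifold and helps preserve the perceived identity.
        loss_reg = delta.pow(2).mean()

        loss = lam_makeup * loss_makeup + lam_adv * loss_adv + lam_reg * loss_reg
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    with torch.no_grad():
        return generator(latent_clean + delta)
```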

Sample face protection results. Source: https://arxiv.org/pdf/2306.10008.pdf

Extensive experiments are conducted to evaluate the effectiveness of CLIP2Protect in face verification and identification scenarios. The results demonstrate its efficacy against black-box FR models and online commercial facial recognition APIs.

Check out the Paper and Project Page.



Top Cloud Storage Services (2023)

If you are still trying to decide what to look for, picking the right cloud storage provider for your needs might take time and effort. Because of this, we’ve included a primer on cloud storage and why you would want to use it to back up your data.

You can choose from various pricing structures, as the best cloud storage services typically provide free, personal, premium, and business tiers. You can avoid much stress if you lose or delete your files using a cloud storage and backup service.

We’ll discuss paid plans, such as the top business cloud storage. However, there are excellent free options as well. Even if they don’t run a business, individuals can benefit greatly from a premium plan’s additional features and enhanced security.   

Google Drive

Google’s cloud storage integrates tightly with Android and Google Workspace. If these platforms are part of your workflow, Google Drive offers a feature-rich, affordable native backup-and-sync option. We loved Google Drive’s easy-to-use interface, but it goes beyond cloud storage: the web-based interface can create, edit, store, view, and sync files using Google Workspace’s Google Photos, Docs, Sheets, and Slides. New users of Google Drive receive 15GB of free storage. This plan is yours indefinitely, though you can upgrade to one with more storage. Premium Google One plans start at $1.99 per month for 100GB, and you can get 30TB of additional storage for $299.99 per month.

Dropbox 

Dropbox is a fantastic cloud storage option. It offers a great user experience, integrates with various other services, and is cost-effective. Whether or not the other person has Dropbox, sending and receiving large files is a breeze. Unfortunately, external and network drives cannot be backed up with Dropbox since it only keeps data transferred to your device’s Dropbox folder; that’s what sets it apart from rivals like IDrive. All files are encrypted at rest and in transit, but there is no end-to-end encryption. Account safety is ensured with two-factor authentication. There are two individual plans plus a free 2GB option. The Plus plan costs $9.99 per month and provides 2TB of storage and 2GB file transfers. The $16.99 Family plan allows six people to store and transmit data. Dropbox remains a viable option for cloud storage.

Sync 

Sync is a cheap cloud storage service that is easy to use, quick, and safe. You can increase your free cloud storage space from 5GB to 27GB by inviting friends, making new folders, and accomplishing other tasks. Sync’s paid subscriptions are billed annually, but you can cancel within the first 30 days if you change your mind. For $8 a month, the Solo Basic plan provides 2TB of secure space and sophisticated sharing features. For $20 a month, Sync’s Solo Professional plan boosts your storage capacity to 6TB and adds custom branding and advanced sharing options. Note that Sync only synchronizes a single folder’s worth of data between your local PCs and the cloud. This restriction might put some people off, but others could find the service’s lack of bells and whistles appealing. You can use the Vault function to keep files in the cloud that you don’t wish to share across devices.

Zoolz 

Zoolz provides reliable cloud storage backed by the AWS cloud platform. It complies with several regulatory frameworks, including the Health Insurance Portability and Accountability Act (HIPAA) and the General Data Protection Regulation (GDPR). It offers top-notch security features, including full 256-bit AES (Advanced Encryption Standard) encryption. You can test out the service risk-free with a fully functional 50GB trial. The cost of 1TB of storage is $14.99 per month, while 50TB is $679.99 per month; annual payments qualify for considerable discounts. Unlimited backups on the server and external drive connections are included in all business plans. Users also get first-rate live support around the clock and a detailed knowledge base.

Microsoft OneDrive

For users of Windows 10/11, Microsoft 365, macOS, Android, and iOS, Microsoft OneDrive is the finest cloud storage option, and it lets Xbox players safely save their game data to the cloud. The platform offers ample storage space, straightforward interfaces, seamless Microsoft 365 integration, and robust safety measures. After initial configuration, it runs without further input from the user. Files stored in the cloud can be edited without downloading them first. OneDrive is compatible with both in-house and external applications like Outlook and AutoCAD. Using two-factor authentication and BitLocker encryption, Personal Vault keeps sensitive data safe. Microsoft 365 users get 1TB of OneDrive storage included; if you’re not a subscriber, you can try it out with a 5GB capacity. You can add 100GB to your plan for only $1.99 per month.

pCloud 

pCloud is one of the few cloud storage services to offer a lifetime subscription, with 500GB for $175 or 2TB for $350. Prices for the annual plans range from $47.18 to $95.88. A free 10GB plan is available, and additional storage can be purchased by the month. There’s also a family plan that accommodates four people. All plans allow for file sharing, collaboration, and data backup from several external sources. All data is protected using TLS/SSL and 256-bit AES encryption. Robust Android and iOS apps handle mobile management and automatic social media uploads. Extensions for Opera, Chrome, and Firefox are offered, and Android, iOS, Windows, Linux, and Mac are supported. However, it would benefit from a better UI, a document editor, and the ability to collaborate with others.

Icedrive 

Icedrive’s affordable prices and innovative features have garnered media attention. Instead of the industry-standard AES 256-bit encryption, Icedrive uses Twofish, a finalist in the late-1990s Advanced Encryption Standard competition. In premium accounts, data is encrypted so that even the provider cannot decrypt it. Unlike other zero-knowledge providers, Icedrive can decrypt files as they are streamed to your computer. Icedrive’s privacy policies and 2FA security measures are among the most stringent in the industry, and since Icedrive is a UK company, it must comply with the General Data Protection Regulation. One of Icedrive’s biggest draws is its affordable price: 1TB costs $4.17 per month at the annual rate. If you need more space, the next tier offers 5TB of cloud file storage for $15 per month (for annual subscribers). That’s a hefty jump, and Icedrive also sells lifetime plans if you prefer a one-time payment.

IDrive

IDrive is unlike other cloud storage options because it also serves as an online backup service. It supports file syncing and sharing, disk imaging, and courier data retrieval. Any folder on your device can be designated as a sync folder and relocated if needed. If you only want to sync smaller files, you can use selective sync to minimize time and data transfer. File sharing is simple, and you can set permissions on share links. While IDrive primarily focuses on online backup, the service also excels at cloud storage. IDrive is a fantastic option for automatic device backups because you won’t need to manually move everything you want to safeguard into a sync folder. IDrive provides a large amount of space at an affordable price: its two individual plans, priced at $6.63 and $8.29 per month respectively, offer 5TB and 10TB of storage.

Box 

Box may be a business-oriented cloud storage company, but it offers individual customers two options (free and premium). Box’s intended use in the corporate world means it’s packed with useful extras, including robust collaboration options, two-factor authentication, note-taking, and project management. Private key management, the primary requirement for zero-knowledge privacy, is not provided by default. Box’s pricing for this premium function is not public; you’ll need to contact the firm to find out how much it costs. The free plan grants you access to Box’s core functions and 10GB of storage space. The premium plan is a poor deal if you only need some extra space: for about $14/month or $120/year, you get 100GB of storage.

Koofr 

Koofr’s unique features make it stand out from the competition. It offers a free tier with 10GB of storage as well as several paid tiers. Its standout feature is the ability to connect your other cloud services: you can integrate your Dropbox, OneDrive, and Google Drive accounts into one convenient location. Once you’ve connected your accounts, you can search for files across all platforms, move and copy files between your linked accounts, and more, and the data you keep in those other services doesn’t count against your Koofr storage. Prices are listed in euros, so be aware that your actual cost may vary slightly from month to month.




Meet KITE: An AI Framework for Semantic Manipulation Using Keypoints as a Representation for Visual Grounding and Precise Action Inference

With the growing advancement of Artificial Intelligence, AI is increasingly being combined with robotics. From computer vision and natural language processing to edge computing, AI is being integrated with robots to develop meaningful and effective solutions. AI robots are machines that act in the real world, so it is natural to consider language as a means of communication between people and robots. However, two main issues prevent modern robots from efficiently handling free-form language inputs. The first challenge is enabling a robot to reason about what it needs to manipulate based on the instructions provided. The second concerns fine-grained pick-and-place tasks that require careful discernment, such as picking up a teddy bear by its ear rather than its leg, or a soap bottle by its dispenser rather than its side.

To perform semantic manipulation, robots must extract scene and object semantics from input instructions and plan accurate low-level actions accordingly. To overcome these challenges, researchers from Stanford University have introduced KITE (Keypoints + Instructions to Execution), a two-step framework for semantic manipulation. KITE takes both scene semantics and object semantics into account: scene semantics involves discriminating between the various objects in a visual scene, while object semantics precisely localizes different parts within an object instance.

KITE’s first phase grounds an input instruction in the visual scene using 2D image keypoints, which provides a highly precise, object-centric bias for subsequent action inference. By mapping the command to keypoints in the scene, the robot develops a precise understanding of the relevant objects and their pertinent features. The second step of KITE executes a learned keypoint-conditioned skill based on the RGB-D scene observation. The robot uses these parameterized skills to carry out the provided instruction. Keypoints and parameterized skills work together to enable fine-grained manipulation and generalization to variations in scenes and objects.

For evaluation, the team assessed KITE’s performance in three real-world environments: high-precision coffee-making, semantic grasping, and long-horizon 6-DoF tabletop manipulation. KITE achieved a success rate of 71% on coffee-making, 70% on semantic grasping, and 75% on instruction-following in the tabletop manipulation setting. KITE, which grounds instructions with keypoints rather than pre-trained vision-language models, outperformed frameworks that rely on the latter, and it also performed better than frameworks that emphasize end-to-end visuomotor control over the use of skills.

KITE accomplished these results despite being trained with an equal or smaller number of demonstrations, demonstrating its effectiveness and efficiency. To map an image and a language phrase to a saliency heatmap and produce a keypoint, KITE employs a CLIPort-style technique. To output skill waypoints, the skill architecture modifies PointNet++ to accept a multi-view point cloud annotated with a keypoint. 2D keypoints enable KITE to attend precisely to visual features, while 3D point clouds provide the 6-DoF context needed for planning.
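The two-stage structure can be summarized in a few lines of Python; the module names, signatures, and robot interface below are hypothetical stand-ins rather than KITE’s actual API:

```python
# Illustrative sketch of KITE's two-stage inference loop.
# `grounding_model`, `skill_policies`, and `robot` are hypothetical stand-ins.

def kite_step(instruction, rgb_image, multiview_pointcloud,
              grounding_model, skill_policies, robot):
    # Stage 1: ground the instruction as a 2D keypoint plus a skill label.
    # A CLIPort-style model maps (image, language) to a saliency heatmap; its argmax is the keypoint.
    heatmap, skill_name = grounding_model(rgb_image, instruction)
    keypoint_uv = heatmap.argmax_2d()          # pixel coordinates of the grounded keypoint

    # Stage 2: execute a keypoint-conditioned skill.
    # A PointNet++-style policy consumes the point cloud annotated with the keypoint
    # and outputs 6-DoF waypoints that parameterize the skill.
    annotated_cloud = multiview_pointcloud.annotate(keypoint_uv)
    waypoints = skill_policies[skill_name](annotated_cloud)

    for pose in waypoints:                     # each waypoint is a 6-DoF end-effector pose
        robot.move_to(pose)
```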

In conclusion, the KITE framework presents a promising solution to the longstanding challenge of enabling robots to interpret and follow natural language commands in the context of manipulation. It achieves fine-grained semantic manipulation with high precision and generalization by utilizing the power of key points and instruction grounding.

Check out the Paper and Project.



Storybird lets anyone make visual stories in seconds with the power of AI

StoryBird.AI lets anyone make visual stories in seconds with the power of AI. Its Stories plugin is among the most popular plugins in OpenAI’s ChatGPT plugin store. Using either the plugin or the website, anyone can craft compelling stories and books with the help of artificial intelligence, and the platform is incredibly user-friendly. Exciting, isn’t it?

The stories are nothing short of spectacular, and you can explore a vast library of examples on Storybird.ai.

Using StoryBird.ai, you can write, edit, publish, and even earn money from the books you sell. It’s an AI solution that’s unparalleled in its simplicity and effectiveness.

The team at Storybird has figured out how to harness LLMs and GANs to make it seamless.

Key Features:

Generative editing: This allows you to edit the story using generative techniques.

Speed: The process is lightning-fast, taking only a few seconds.

Personalization & Customization: The platform allows you to tailor the story by editing the generated content on each page. Even better, you can regenerate the associated image or illustration based on your edits. It’s like magic, and the story becomes uniquely yours.

Impressive Results: The stories and illustrations are genuinely impressive.

Stories ChatGPT plugin:

It is easy to add: just search for “stories”.

Storybird.ai provided some handy tips for creating captivating stories:

Begin with a short description of your story, ranging from 20 to 1000 characters.

Include the character’s name, if applicable.

Provide details about the character (e.g., a girl with brown hair) and the setting for optimal results.

In ChatGPT, you can easily initiate the process with a short prompt and receive results within seconds.

Here is an example with the following initial prompt:

“Write a story about a 12-year-old girl named Olivia who wakes up early every morning to practice soccer, dreaming of becoming a professional player one day.”

We wanted to change the backpack to red, and that was easily done; we then regenerated the illustration.

Who’s it for?

StoryBird AI is a tool that parents, educators, and authors can use to create personalized stories.

Parents can use StoryBird AI to create stories that engage and entertain their children.

Educators can use StoryBird AI to create interactive storytelling activities for students.

Authors can use StoryBird AI to experiment with personalized storytelling.

StoryBird AI is a great way to nurture a love for reading and imagination in children. It is also a novel way to engage young readers and create interactive experiences that leave a lasting impression.

Bonus!

Storybird.ai offers an alpha feature that allows you to publish on Amazon. By clicking “Publish on Amazon,” you agree to release the rights to the book’s IP. While there’s no guarantee that your book will be accepted by Amazon, Storybird.ai is exploring the possibility of a revenue-sharing program.

Text to cartoon… yes, Storybird is working on this too.

Give it a try today!

Note: Thanks to the Storybird AI team for the thought leadership/educational article above. This article is supported by Storybird AI.


This AI Research Explains the Synthetic Personality Traits in Large Language Models (LLMs)

An individual’s personality consists of a unique combination of qualities, characteristics, and ways of thinking. It shapes our most fundamental social interactions and preferences due to our shared biological and environmental histories. Due to their extensive exposure to human-generated data during training, LLMs can convincingly portray human-like personas in their outputs and, in effect, demonstrate a synthetic personality.

Recent research has attempted to identify unintended consequences of LLMs’ enhanced abilities, such as the tendency to produce violent language and the production of deceptive and manipulative language in experiments. Conversations, explanations, and knowledge extraction from LLMs are not always reliable.

Understanding the personality trait-related properties of the language created by these models is vital as LLMs become the dominant human-computer interaction (HCI) interface, as is learning how to safely, appropriately, and effectively engineer personality profiles generated by LLMs. Researchers have studied methods including few-shot prompting to lessen the impact of negative and severe personality traits in LLM results. Even though LLMs have very variable outputs and are hypersensitive to prompting, no work has yet addressed how to scientifically and systematically quantify their personality.

Researchers from Google DeepMind, the University of Cambridge, Google Research, Keio University, and the University of California, Berkeley propose rigorous, verified psychometric approaches to characterize and mold LLM-based personality syntheses. 

The team first creates a methodology for using existing psychometric tests to establish the construct validity of personality characterizations in LLM-generated text. They present a novel approach that simulates population variance in LLM responses through controlled prompting, in order to test the statistical correlations between personality and its external correlates as they exist in human social science data. Finally, they contribute a personality-shaping method that operates independently of the specific LLM and produces observable changes in trait levels.
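As a rough illustration of what administering a psychometric test to an LLM can look like in practice, the toy snippet below scores a handful of Likert-style items by prompting a generic model. The items, scale wording, persona prefixes, and the `complete` function are placeholders for this sketch and do not reproduce the paper’s instruments or prompts.

```python
# Illustrative only: administer Likert-style personality items to an LLM.
# `complete(prompt)` is a placeholder for any chat/completion API.

ITEMS = {
    "extraversion": ["I am the life of the party.", "I start conversations."],
    "agreeableness": ["I sympathize with others' feelings.", "I take time out for others."],
}
SCALE = "Rate the statement from 1 (very inaccurate) to 5 (very accurate). Reply with a single digit."

def complete(prompt: str) -> str:
    raise NotImplementedError  # plug in a real model client here

def score_trait(trait: str, persona_prefix: str = "") -> float:
    ratings = []
    for item in ITEMS[trait]:
        prompt = f"{persona_prefix}{SCALE}\nStatement: {item}\nRating:"
        reply = complete(prompt)
        digits = [c for c in reply if c.isdigit()]
        ratings.append(int(digits[0]) if digits else 3)  # fall back to the scale midpoint
    return sum(ratings) / len(ratings)

# Controlled prompting: varying `persona_prefix` across many calls simulates population
# variance, and prefixes like "You are extremely outgoing. " can be used to shape trait levels.
```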

The researchers test the approach on LLMs ranging in size and training methods in two natural interaction settings: MCQA and long-form text generation. The findings show the following observations:

LLMs can reliably and validly simulate personality in their outputs (under certain prompting configurations).

Evidence of LLM-simulated personality’s reliability and validity is stronger for larger, instruction-fine-tuned models.

Personality in LLM outputs can be shaped along desired dimensions to mimic specific personality profiles.

Check out the Paper.



Meet JourneyDB: A Large Scale Dataset with 4 Million Diverse and High-Quality Generated Images Curated for Multimodal Visual Understanding

With the advancement of large generative models like ChatGPT and DALL-E and the rise in popularity of generative Artificial Intelligence, generating content like a human is no longer a dream. Everything is now feasible, including question answering, code completion, generating content from textual descriptions, and creating images from both text and images. Recently, AI-generated content has begun to rival human ingenuity. The well-known chatbot developed by OpenAI, ChatGPT, is based on the GPT-3.5 transformer architecture and is used by almost everyone. The latest version, GPT-4, is multimodal, unlike GPT-3.5, which only lets ChatGPT take textual inputs.

The quality of generative content has significantly increased as a result of the development of diffusion models. Because of these developments, Artificial Intelligence Generated Content (AIGC) platforms such as DALL-E, Stability AI, Runway, and Midjourney have become increasingly popular, as these systems let users create high-quality images from natural-language text prompts. Despite advances in multimodal understanding, vision-language models still have difficulty understanding generated visuals. Compared to real data, synthetic images display a larger degree of content and style variability, making it far more challenging for models to understand them properly.

To address these issues, a team of researchers has introduced JourneyDB, a large-scale dataset specifically curated for multimodal visual understanding of generated images. JourneyDB contains 4 million unique, high-quality generated images created from a wide range of text prompts. The dataset covers both content and style interpretation and seeks to offer a complete resource for training and assessing models’ abilities to comprehend generated images.

The four tasks included in the suggested benchmark are as follows. 

Prompt inversion – Prompt inversion aims to recover the text prompt the user used to generate an image. This tests the model’s comprehension of both the content and the style of the generated images.

Style retrieval – The team has focused on style retrieval so that the model identifies and retrieves similar generative images based on their stylistic attributes. This assesses the model’s proficiency in discerning stylistic nuances within generative images.

Image captioning – In image captioning, the model is tasked with generating descriptive captions that accurately represent the content of the generative image, which thus evaluates the model’s capability to comprehend and express the visual elements of the generated content effectively in natural language.

Visual Question Answering – Through Visual Question Answering (VQA), the model provides accurate answers to questions related to the generative image. The model must comprehend the visual and stylistic content and provide relevant answers to the given questions.

The team gathered 4,692,751 image-text prompt pairs and divided them into three sets: a training set, a validation set, and a test set. For evaluation, the team conducted extensive experiments using the benchmark dataset. The results showed that current state-of-the-art multimodal models do not perform as well as they do on real datasets, but adapting them to the proposed dataset greatly improved their performance.
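For illustration, a prompt-inversion evaluation over such a split could look like the simple loop below. The record format, the `captioner` callable, and the token-overlap metric are assumptions for this sketch, not JourneyDB’s official evaluation code.

```python
# Illustrative prompt-inversion evaluation loop over JourneyDB-style records.
# The record format {"image_path": ..., "prompt": ...}, the `captioner` callable,
# and the token-overlap metric are assumptions for this sketch.

def token_f1(pred: str, ref: str) -> float:
    p, r = set(pred.lower().split()), set(ref.lower().split())
    if not p or not r:
        return 0.0
    overlap = len(p & r)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(r)
    return 2 * precision * recall / (precision + recall)

def evaluate_prompt_inversion(records, captioner):
    """records: iterable of dicts with 'image_path' and the ground-truth 'prompt'."""
    scores = []
    for rec in records:
        predicted_prompt = captioner(rec["image_path"])   # model under evaluation
        scores.append(token_f1(predicted_prompt, rec["prompt"]))
    return sum(scores) / max(len(scores), 1)
```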

Check out the Paper, Code, and Project.



This AI paper Introduces DreamDiffusion: A Thoughts-to-Image Model for Generating High-Quality Images Directly from Brain EEG Signals

The ability to generate images from brain activity has witnessed significant advancements in recent years, particularly with text-to-image generation breakthroughs. However, translating thoughts directly into images using brain electroencephalogram (EEG) signals remains an intriguing challenge. DreamDiffusion aims to bridge this gap by harnessing pre-trained text-to-image diffusion models to generate realistic, high-quality images solely from EEG signals. The method explores the temporal aspects of EEG signals, addresses noise and limited data challenges, and aligns EEG, text, and image spaces. DreamDiffusion opens up possibilities for efficient, artistic creation, dream visualization, and potential therapeutic applications for individuals with autism or language disabilities.

Previous research has explored the generation of images from brain activity, utilizing techniques like functional Magnetic Resonance Imaging (fMRI) and EEG signals. While fMRI-based methods require expensive and non-portable equipment, EEG signals provide a more accessible and low-cost alternative. DreamDiffusion builds upon existing fMRI-based approaches, such as MinD-Vis, by leveraging the power of pre-trained text-to-image diffusion models. DreamDiffusion overcomes challenges specific to EEG signals, employing masked signal modeling for pre-training the EEG encoder and utilizing the CLIP image encoder to align EEG, text, and image spaces.

The DreamDiffusion method comprises three main components: masked signal pre-training, fine-tuning with limited EEG-image pairs using pre-trained Stable Diffusion, and alignment of EEG, text, and image spaces using CLIP encoders. Masked signal modeling is employed to pre-train the EEG encoder, enabling effective and robust EEG representations by reconstructing masked tokens based on contextual cues. The CLIP image encoder is incorporated to refine EEG embeddings further and align them with CLIP text and image embeddings. The resulting EEG embeddings are then used for image generation with improved quality.
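The masked-signal pre-training step is conceptually similar to masked autoencoding on token sequences. A stripped-down PyTorch sketch of that idea is shown below; the patching scheme, dimensions, and toy transformer are arbitrary choices for illustration and are not DreamDiffusion’s actual encoder architecture.

```python
import torch
import torch.nn as nn

class TinyEEGMAE(nn.Module):
    """Toy masked-signal model: split EEG into patches, mask some, reconstruct them.
    Dimensions and the encoder are arbitrary; DreamDiffusion's EEG encoder differs."""

    def __init__(self, patch_len=16, dim=128, n_channels=128):
        super().__init__()
        self.patch_len = patch_len
        self.embed = nn.Linear(patch_len * n_channels, dim)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True), num_layers=2
        )
        self.decode = nn.Linear(dim, patch_len * n_channels)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))

    def forward(self, eeg, mask_ratio=0.75):
        # eeg: (batch, channels, time) -> patches of shape (batch, n_patches, channels*patch_len)
        b, c, t = eeg.shape
        patches = eeg.unfold(2, self.patch_len, self.patch_len)        # (b, c, n, patch_len)
        patches = patches.permute(0, 2, 1, 3).reshape(b, -1, c * self.patch_len)
        tokens = self.embed(patches)

        # Randomly replace a fraction of tokens with a learned mask token.
        mask = torch.rand(b, tokens.shape[1], device=eeg.device) < mask_ratio
        tokens = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(tokens), tokens)

        # Reconstruct the original patches; the loss is computed only on masked positions.
        recon = self.decode(self.encoder(tokens))
        loss = ((recon - patches) ** 2)[mask].mean()
        return loss
```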

Limitations of DreamDiffusion

DreamDiffusion, despite its remarkable achievements, has certain limitations that need to be acknowledged. One major limitation is that EEG data provide only coarse-grained information at the category level. Some failure cases showed instances where certain categories were mapped to others with similar shapes or colors. This discrepancy may be attributed to the human brain’s consideration of shape and color as crucial factors in object recognition. 

Despite these limitations, DreamDiffusion holds significant potential for various applications in neuroscience, psychology, and human-computer interaction. The ability to generate high-quality images directly from EEG signals opens up new avenues for research and practical implementations in these fields. With further advancements, DreamDiffusion can overcome its limitations and contribute to a wide range of interdisciplinary areas. Researchers and enthusiasts can access the DreamDiffusion source code on GitHub, facilitating further exploration and development in this exciting field.

Check out the Paper and GitHub.



Playing Where’s Waldo? in 3D: OpenMask3D is an AI Model That Can Segment Instances in 3D with Open-Vocabulary Queries

Image segmentation has come a long way in the last decade, thanks to advances in neural networks. It is now possible to segment multiple objects in complex scenes in a matter of milliseconds with fairly accurate results. In 3D, however, instance segmentation is a different story, and there is still a way to go before it catches up with 2D image segmentation performance.

3D instance segmentation has emerged as a critical task with significant applications in fields such as robotics and augmented reality. The objective of 3D instance segmentation is to predict object instance masks and their corresponding categories in a 3D scene. While notable progress has been made in this field, existing methods predominantly operate under a closed-set paradigm, where the set of object categories is limited and closely tied to the datasets used for training. 

This limitation poses two fundamental problems. First, closed-vocabulary approaches struggle to understand scenes beyond the object categories encountered during training, leading to potential difficulties in recognizing novel objects or misclassifying them. Second, these methods are inherently limited in their capacity to handle free-form queries, impeding their effectiveness in scenarios that require understanding and acting upon specific object properties or descriptions.

Open-vocabulary approaches are proposed to tackle these challenges. These approaches can handle free-form queries and enable zero-shot learning of object categories not present in the training data. By adopting a more flexible and expansive approach, open-vocabulary methods offer several advantages in tasks such as scene understanding, robotics, augmented reality, and 3D visual search. 

Enabling open-vocabulary 3D instance segmentation can significantly enhance the flexibility and practicality of applications that rely on understanding and manipulating complex 3D scenes. Let’s meet OpenMask3D, the promising 3D instance segmentation model.

OpenMask3D can segment instances of objects. Source: https://arxiv.org/pdf/2306.13631.pdf

OpenMask3D aims to overcome the limitations of closed-vocabulary approaches. It tackles the task of predicting 3D object instance masks and computing mask-feature representations while reasoning beyond a predefined set of concepts. OpenMask3D operates on RGB-D sequences and leverages the corresponding 3D reconstructed geometry to achieve its objectives. 

It uses a two-stage pipeline consisting of a class-agnostic mask proposal head and a mask-feature aggregation module. OpenMask3D identifies the frames in which each instance is clearly visible and extracts CLIP features from the best views of each mask. The resulting feature representation is aggregated across multiple views and associated with each 3D instance mask. This instance-based feature computation equips OpenMask3D with the capability to retrieve object instance masks based on their similarity to any given text query, enabling open-vocabulary 3D instance segmentation and surpassing the limitations of closed-vocabulary paradigms.

Overview of OpenMask3D. Source: https://arxiv.org/pdf/2306.13631.pdf

By computing a mask feature per object instance, OpenMask3D can retrieve object instance masks based on similarity to any given query, making it capable of performing open-vocabulary 3D instance segmentation. Moreover, OpenMask3D preserves information about the novel and long-tail objects better than trained or fine-tuned counterparts. It also surpasses the limitations of a closed-vocabulary paradigm, enabling the segmentation of object instances based on free-form queries related to object properties such as semantics, geometry, affordances, and material properties.
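Once a per-mask CLIP feature has been computed, open-vocabulary retrieval reduces to a similarity search. Below is a minimal sketch of that final step, assuming the mask features are already precomputed and that OpenAI’s `clip` package (or any CLIP text encoder with matching dimensions) is available; this is not OpenMask3D’s official code.

```python
import torch
import clip  # https://github.com/openai/CLIP; any matching CLIP text encoder works

def retrieve_instances(query: str, mask_features: torch.Tensor, top_k: int = 3):
    """Rank precomputed per-mask CLIP features against a free-form text query.

    mask_features: (num_masks, d) tensor, one aggregated CLIP feature per 3D instance mask.
    Returns the indices of the top_k masks most similar to the query.
    """
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, _ = clip.load("ViT-L/14", device=device)   # must match the image encoder used for masks
    with torch.no_grad():
        tokens = clip.tokenize([query]).to(device)
        text_feat = model.encode_text(tokens).float().cpu()

    text_feat = torch.nn.functional.normalize(text_feat, dim=-1)
    mask_feat = torch.nn.functional.normalize(mask_features.float(), dim=-1)
    similarity = (mask_feat @ text_feat.T).squeeze(-1)  # cosine similarity per mask
    return similarity.topk(min(top_k, similarity.numel())).indices

# e.g., retrieve_instances("a wooden chair with armrests", mask_features)
```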

Check out the Paper and Project.



Integrate SaaS platforms with Amazon SageMaker to enable ML-powered applications

Amazon SageMaker is an end-to-end machine learning (ML) platform with wide-ranging features to ingest, transform, and measure bias in data, and train, deploy, and manage models in production with best-in-class compute and services such as Amazon SageMaker Data Wrangler, Amazon SageMaker Studio, Amazon SageMaker Canvas, Amazon SageMaker Model Registry, Amazon SageMaker Feature Store, Amazon SageMaker Pipelines, Amazon SageMaker Model Monitor, and Amazon SageMaker Clarify. Many organizations choose SageMaker as their ML platform because it provides a common set of tools for developers and data scientists. A number of AWS independent software vendor (ISV) partners have already built integrations for users of their software as a service (SaaS) platforms to utilize SageMaker and its various features, including training, deployment, and the model registry.
In this post, we cover the benefits for SaaS platforms to integrate with SageMaker, the range of possible integrations, and the process for developing these integrations. We also deep dive into the most common architectures and AWS resources to facilitate these integrations. This is intended to accelerate time-to-market for ISV partners and other SaaS providers building similar integrations and inspire customers who are users of SaaS platforms to partner with SaaS providers on these integrations.
Benefits of integrating with SageMaker
There are a number of benefits for SaaS providers to integrate their SaaS platforms with SageMaker:

Users of the SaaS platform can take advantage of a comprehensive ML platform in SageMaker
Users can build ML models with data that is in or outside of the SaaS platform and exploit these ML models
It provides users with a seamless experience between the SaaS platform and SageMaker
Users can utilize foundation models available in Amazon SageMaker JumpStart to build generative AI applications
Organizations can standardize on SageMaker
SaaS providers can focus on their core functionality and offer SageMaker for ML model development
It equips SaaS providers with a basis to build joint solutions and go to market with AWS

SageMaker overview and integration options
SageMaker has tools for every step of the ML lifecycle. SaaS platforms can integrate with SageMaker across the ML lifecycle from data labeling and preparation to model training, hosting, monitoring, and managing models with various components, as shown in the following figure. Depending on the needs, any and all parts of the ML lifecycle can be run in either the customer AWS account or SaaS AWS account, and data and models can be shared across accounts using AWS Identity and Access Management (IAM) policies or third-party user-based access tools. This flexibility in the integration makes SageMaker an ideal platform for customers and SaaS providers to standardize on.

Integration process and architectures
In this section, we break the integration process into four main stages and cover the common architectures. Note that there can be other integration points in addition to these, but those are less common.

Data access – How data that is in the SaaS platform is accessed from SageMaker
Model training – How the model is trained
Model deployment and artifacts – Where the model is deployed and what artifacts are produced
Model inference – How the inference happens in the SaaS platform

The diagrams in the following sections assume SageMaker is running in the customer AWS account. Most of the options explained are also applicable if SageMaker is running in the SaaS AWS account. In some cases, an ISV may deploy their software in the customer AWS account. This is usually in a dedicated customer AWS account, meaning there still needs to be cross-account access to the customer AWS account where SageMaker is running.
There are a few different ways in which authentication across AWS accounts can be achieved when data in the SaaS platform is accessed from SageMaker and when the ML model is invoked from the SaaS platform. The recommended method is to use IAM roles. An alternative is to use AWS access keys consisting of an access key ID and secret access key.
Data access
There are multiple options for how data in the SaaS platform can be accessed from SageMaker. Data can be accessed from a SageMaker notebook, from SageMaker Data Wrangler (where users can prepare data for ML), or from SageMaker Canvas. The most common data access options are listed below, followed by a minimal example of one of them:

SageMaker Data Wrangler built-in connector – The SageMaker Data Wrangler connector enables data to be imported from a SaaS platform to be prepared for ML model training. The connector is developed jointly by AWS and the SaaS provider. Current SaaS platform connectors include Databricks and Snowflake.
Amazon Athena Federated Query for the SaaS platform – Federated queries enable users to query the platform from a SageMaker notebook via Amazon Athena, using a custom connector developed by the SaaS provider (see the sketch after this list).
Amazon AppFlow – With Amazon AppFlow, you can use a custom connector to extract data into Amazon Simple Storage Service (Amazon S3) which subsequently can be accessed from SageMaker. The connector for a SaaS platform can be developed by AWS or the SaaS provider. The open-source Custom Connector SDK enables the development of a private, shared, or public connector using Python or Java.
SaaS platform SDK – If the SaaS platform has an SDK (Software Development Kit), such as a Python SDK, this can be used to access data directly from a SageMaker notebook.
Other options – In addition to these, there can be other options depending on whether the SaaS provider exposes their data via APIs, files or an agent. The agent can be installed on Amazon Elastic Compute Cloud (Amazon EC2) or AWS Lambda. Alternatively, a service such as AWS Glue or a third-party extract, transform, and load (ETL) tool can be used for data transfer.
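As a minimal sketch of the Athena federated query option, a SageMaker notebook can submit SQL against the SaaS platform's federated catalog with boto3. The catalog, database, table, and result bucket names below are hypothetical and depend on how the connector was deployed.

```python
import time
import boto3

athena = boto3.client("athena")

# The data catalog name corresponds to the SaaS provider's federated
# connector; catalog, database, and table names are placeholders.
query = athena.start_query_execution(
    QueryString="SELECT customer_id, churn_label FROM customers LIMIT 1000",
    QueryExecutionContext={"Catalog": "saas_platform_catalog", "Database": "analytics"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results-bucket/queries/"},
)

# Poll until the query finishes, then read the results from Amazon S3
# or via athena.get_query_results().
query_id = query["QueryExecutionId"]
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)
print(f"Query {query_id} finished with state {state}")
```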

The following diagram illustrates the architecture for data access options.

Model training
The model can be trained in SageMaker Studio by a data scientist, using Amazon SageMaker Autopilot by a non-data scientist, or in SageMaker Canvas by a business analyst. SageMaker Autopilot takes away the heavy lifting of building ML models, including feature engineering, algorithm selection, and hyperparameter settings, and it is also relatively straightforward to integrate directly into a SaaS platform. SageMaker Canvas provides a no-code visual interface for training ML models.
In addition, data scientists can use pre-trained models available in SageMaker JumpStart, including foundation models from sources such as Alexa, AI21 Labs, Hugging Face, and Stability AI, and fine-tune them for their own generative AI use cases.
Alternatively, the model can be trained in a third-party or partner-provided tool, service, and infrastructure, including on-premises resources, provided the model artifacts are accessible and readable.
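For the SDK-driven path a data scientist might follow from SageMaker Studio, the following is a minimal sketch of a training job using the built-in XGBoost algorithm; the IAM role ARN, S3 paths, and hyperparameters are placeholders.

```python
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = "arn:aws:iam::111122223333:role/SageMakerExecutionRole"  # placeholder

# Built-in XGBoost container image for the current region.
image_uri = sagemaker.image_uris.retrieve("xgboost", session.boto_region_name, version="1.5-1")

estimator = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-ml-bucket/models/",
    hyperparameters={"objective": "binary:logistic", "num_round": 100},
    sagemaker_session=session,
)

# Train on data previously exported or queried from the SaaS platform.
estimator.fit({
    "train": TrainingInput("s3://my-ml-bucket/train/", content_type="text/csv"),
    "validation": TrainingInput("s3://my-ml-bucket/validation/", content_type="text/csv"),
})
```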
The following diagram illustrates these options.

Model deployment and artifacts
After you have trained and tested the model, you can either deploy it to a SageMaker model endpoint in the customer account, or export it from SageMaker and import it into the SaaS platform storage. The model can be stored and imported in standard formats supported by the common ML frameworks, such as pickle, joblib, and ONNX (Open Neural Network Exchange).
If the ML model is deployed to a SageMaker model endpoint, additional model metadata can be stored in the SageMaker Model Registry, SageMaker Model Cards, or in a file in an S3 bucket. This can be the model version, model inputs and outputs, model metrics, model creation date, inference specification, data lineage information, and more. Where there isn’t a property available in the model package, the data can be stored as custom metadata or in an S3 file.
Creating such metadata can help SaaS providers manage the end-to-end lifecycle of the ML model more effectively. This information can be synced to the model log in the SaaS platform and used to track changes and updates to the ML model. Subsequently, this log can be used to determine whether to refresh downstream data and applications that use that ML model in the SaaS platform.
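As a sketch of registering a trained model along with such metadata, the CreateModelPackage API accepts free-form customer metadata that can be synced to the SaaS platform's model log. The group name, container image, artifact location, and metadata keys below are placeholders.

```python
import boto3

sm = boto3.client("sagemaker")

# Register the trained model in SageMaker Model Registry with custom
# metadata (all values below are illustrative).
response = sm.create_model_package(
    ModelPackageGroupName="saas-churn-models",
    ModelPackageDescription="Churn model trained from SaaS platform data",
    InferenceSpecification={
        "Containers": [{
            "Image": "<inference-container-image-uri>",
            "ModelDataUrl": "s3://my-ml-bucket/models/model.tar.gz",
        }],
        "SupportedContentTypes": ["text/csv"],
        "SupportedResponseMIMETypes": ["text/csv"],
    },
    ModelApprovalStatus="PendingManualApproval",
    CustomerMetadataProperties={  # free-form metadata synced to the SaaS model log
        "model_version": "1.3.0",
        "training_date": "2023-07-01",
        "auc": "0.91",
    },
)
print(response["ModelPackageArn"])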
The following diagram illustrates this architecture.

Model inference
SageMaker offers four options for ML model inference: real-time inference, serverless inference, asynchronous inference, and batch transform. For the first three, the model is deployed to a SageMaker model endpoint and the SaaS platform invokes the model using the AWS SDKs. The recommended option is to use the Python SDK. The inference pattern for each of these is similar in that the predictor’s predict() or predict_async() methods are used. Cross-account access can be achieved using role-based access.
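As a minimal sketch of real-time inference with the SageMaker Python SDK, the SaaS platform can invoke a deployed endpoint through a Predictor; the endpoint name and payload are placeholders, and in a cross-account setup the session would come from an assumed-role boto3 session like the one shown earlier.

```python
from sagemaker.predictor import Predictor
from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import JSONDeserializer

# Endpoint name is a placeholder for a deployed SageMaker endpoint.
predictor = Predictor(
    endpoint_name="saas-churn-endpoint",
    serializer=CSVSerializer(),
    deserializer=JSONDeserializer(),
)

# Real-time inference on a single record exported from the SaaS platform.
result = predictor.predict("34,0,119.5,3,1")
print(result)
```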
It’s also possible to front the endpoint with Amazon API Gateway, which invokes it through a Lambda function running in a protected private network.
For batch transform, data from the SaaS platform first needs to be exported in batch to an S3 bucket in the customer AWS account, and then the inference is run on this data in batch. The inference is done by first creating a transformer object and then calling the transform() method with the S3 location of the data. Results are imported back into the SaaS platform in batch as a dataset and joined to other datasets in the platform as part of a batch pipeline job.
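The following is a minimal sketch of that batch transform flow with the SageMaker Python SDK; the model name, bucket paths, and instance type are placeholders.

```python
from sagemaker.transformer import Transformer

# Batch inference on data exported from the SaaS platform to Amazon S3.
transformer = Transformer(
    model_name="saas-churn-model",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-ml-bucket/batch-output/",
    strategy="MultiRecord",
)

transformer.transform(
    data="s3://my-ml-bucket/batch-input/",  # exported SaaS dataset
    content_type="text/csv",
    split_type="Line",
)
transformer.wait()
# Results land in output_path and can be imported back into the SaaS platform.
```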
Another option for inference is to do it directly in the SaaS account compute cluster. This would be the case when the model has been imported into the SaaS platform. In this case, SaaS providers can choose from a range of EC2 instances that are optimized for ML inference.
The following diagram illustrates these options.

Example integrations
Several ISVs have built integrations between their SaaS platforms and SageMaker. To learn more about some example integrations, refer to the following:

Enabling Data-Centric Artificial Intelligence Through Snowflake and Amazon SageMaker
Machine Learning for Everyone with Amazon SageMaker Autopilot and Domo
How to architect end-to-end development, monitoring, and maintenance of your models in AWS and Domino Data Lab

Conclusion
In this post, we explained why and how SaaS providers should integrate SageMaker with their SaaS platforms by breaking the process into four parts and covering the common integration architectures. SaaS providers looking to build an integration with SageMaker can utilize these architectures. If there are any custom requirements beyond what has been covered in this post, including with other SageMaker components, get in touch with your AWS account teams. Once the integration has been built and validated, ISV partners can join the AWS Service Ready Program for SageMaker and unlock a variety of benefits.
We also ask customers who are users of SaaS platforms to register their interest in an integration with Amazon SageMaker with their AWS account teams, as this can help inspire and progress the development for SaaS providers.

About the Authors
Mehmet Bakkaloglu is a Principal Solutions Architect at AWS, focusing on Data Analytics, AI/ML and ISV partners.
Raj Kadiyala is a Principal AI/ML Evangelist at AWS.

Researchers from the University of Wisconsin and ByteDance Introduce P …

In computer vision and graphics, photo-realistic portrait image synthesis has long been emphasized, with a wide range of downstream applications in virtual avatars, telepresence, immersive gaming, and many other areas. Recent developments in Generative Adversarial Networks (GANs) have demonstrated image synthesis quality that is nearly indistinguishable from genuine photographs. Contemporary generative methods, however, don’t model the underlying 3D scene; instead, they operate on 2D convolutional networks. As a result, it is impossible to properly ensure 3D consistency when synthesizing head images in different poses. Traditional methods call for a parametric textured mesh model learned from extensive 3D scan collections to produce 3D heads with various shapes and appearances.

The produced images, however, lack fine details and have limited expressiveness and perceptual quality. To produce more realistic 3D-aware face images, conditional generative models have been created with the advent of differentiable rendering and implicit neural representations. These methods, however, frequently depend on multi-view image or 3D scan supervision, which is challenging to obtain and has a constrained appearance distribution because it is normally captured in controlled environments. Recent developments in implicit neural representations for 3D scene modeling and in generative adversarial networks (GANs) for image synthesis have accelerated the development of 3D-aware generative models.

Figure 1 from the paper shows how PanoHead enables high-fidelity geometry and 360° view-consistent, photo-realistic full-head image synthesis, creating realistic 3D portraits from a single view.

One of these, the pioneering 3D GAN EG3D, achieves impressive quality in view-consistent image synthesis and was trained using single-view image sets found in the wild. These 3D GAN methods, though, can only synthesize near-frontal views. Researchers from ByteDance and the University of Wisconsin-Madison propose PanoHead, a novel 3D-aware GAN trained solely on in-the-wild unstructured photos, enabling high-quality full 3D head synthesis in 360°. Numerous immersive interaction scenarios, including telepresence and digital avatars, benefit from the model’s ability to synthesize consistent 3D heads that can be viewed from all angles. They believe their methodology is the first 3D GAN approach to fully realize 360° 3D head synthesis.

There are several major technical obstacles to full 3D head synthesis with 3D GAN frameworks like EG3D. Many 3D GANs can’t distinguish between foreground and background, leading to 2.5D head geometry: large poses cannot be rendered because the background, typically modeled as a wall-like structure, gets entangled with the generated head in 3D. The authors develop a foreground-aware tri-discriminator that, using prior information from 2D image segmentation, jointly learns the decomposition of the foreground head in 3D space. Additionally, hybrid 3D scene representations such as the tri-plane, despite their efficiency and compactness, suffer significant projection ambiguity for 360-degree camera poses, resulting in a “mirrored face” on the back of the head.

They provide a novel 3D tri-grid volume representation that separates the frontal features from the back of the head while preserving the efficiency of tri-plane representations. Finally, obtaining accurate camera extrinsics for in-the-wild back-head images is quite challenging for 3D GAN training, and there is a discrepancy in image alignment between these and frontal photos with discernible facial landmarks. The alignment gap results in unattractive head geometry and a noisy appearance, so the authors propose a two-stage alignment method that reliably aligns images from all viewpoints and considerably eases 3D GAN training.
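The paper's implementation is not reproduced here, but the following toy PyTorch snippet illustrates the general idea of a tri-grid lookup: each tri-plane feature map gains a small depth axis, so a 3D query point is sampled with trilinear interpolation rather than a purely planar lookup. All shapes and names are illustrative.

```python
import torch
import torch.nn.functional as F

# Toy illustration (not the paper's code) of a tri-grid feature query.
C, D, H, W = 32, 4, 64, 64          # feature channels, depth bins, spatial size
tri_grids = [torch.randn(1, C, D, H, W) for _ in range(3)]  # XY, XZ, YZ "grids"

def sample_trigrid(points):
    """points: (N, 3) coordinates in [-1, 1]^3; returns (N, C) features."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    feats = 0
    # For each grid, two coordinates index the plane and the remaining one
    # indexes the extra depth dimension.
    for grid, coords in zip(tri_grids, [(x, y, z), (x, z, y), (y, z, x)]):
        u, v, w = coords
        # For 5D input, grid_sample expects grid[..., (x, y, z)] -> (W, H, D)
        g = torch.stack([u, v, w], dim=-1).view(1, -1, 1, 1, 3)
        sampled = F.grid_sample(grid, g, align_corners=True)  # (1, C, N, 1, 1)
        feats = feats + sampled.view(C, -1).t()
    return feats

features = sample_trigrid(torch.rand(8, 3) * 2 - 1)  # 8 random query points
print(features.shape)  # torch.Size([8, 32])
```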

Specifically, they propose a camera self-adaptation module that dynamically adjusts the rendering camera positions to account for alignment drift in the back-head images. As seen in Figure 1, this significantly improves the 3D GAN’s ability to adapt to in-the-wild whole-head photos from arbitrary viewpoints. The resulting 3D GAN generates high-fidelity 360° RGB images and geometry and outperforms state-of-the-art techniques on quantitative metrics. With this model, they demonstrate how to create a 3D portrait with ease by reconstructing a whole head in 3D from a single monocular view.

The following is a summary of their principal contributions: 

• The first 3D GAN framework capable of view-consistent, high-fidelity 360-degree full-head image synthesis. They demonstrate the methodology with high-quality monocular 3D head reconstruction from in-the-wild photos. 

• A novel tri-grid formulation for representing 360-degree 3D head scenes that balances efficiency and expressiveness. 

• A tri-discriminator that disentangles 2D background synthesis from 3D foreground head modeling. 

• A novel two-stage image alignment scheme that adaptively accommodates imperfect camera poses and misaligned image cropping, enabling 3D GANs to be trained on in-the-wild images with a wide range of camera poses.

Check out the Paper, GitHub repo, and Project for more details.

The post Researchers from the University of Wisconsin and ByteDance Introduce PanoHead: The First 3D GAN Framework that Synthesizes View-Consistent Full Head Images with only Single-View Images appeared first on MarkTechPost.

From GPT-1 to GPT-4: A Comprehensive Analysis and Comparison of OpenAI …

OpenAI provides a wide selection of models, each with its own capabilities and cost structure, to meet the needs of various applications. The models are regularly updated to reflect the most recent advances, and users can also fine-tune them to better suit their own applications. OpenAI’s GPT models have enabled major advances in natural language processing (NLP).

Simply put, what is GPT?

The Generative Pre-trained Transformer (GPT) is a family of machine learning models for NLP applications. These models are pre-trained on large volumes of text, such as books and websites, to produce natural-sounding, well-structured text.

More simply, GPTs are models that can generate text that looks and reads as if a human wrote it, without being explicitly programmed for each task. That makes them adaptable to NLP applications such as question answering, translation, and text summarization. GPTs are a major step forward for natural language processing because they enable machines to comprehend and generate language with unprecedented fluency and accuracy. The four GPT models, from the original to the most recent GPT-4, are discussed below, along with an analysis of their strengths and weaknesses.

GPT-1

In 2018, OpenAI unveiled GPT-1, the first iteration of a language model built on the Transformer architecture. Its 117 million parameters were a huge leap forward from even the most advanced language models of the time.

GPT-1’s capacity to produce natural, intelligible text in response to a prompt or context was one of its many capabilities. The model was trained on the Common Crawl, a vast dataset of web pages containing billions of words, and the BookCorpus dataset, a collection of more than 11,000 books on various topics. These varied datasets helped GPT-1 hone its language-modeling skills.

GPT-2

OpenAI published GPT-2 in 2019 as the successor to GPT-1. It was significantly larger, with 1.5 billion parameters. By combining Common Crawl with WebText, OpenAI assembled a considerably larger and more varied dataset to train the model.

GPT-2’s capacity to construct logical and plausible text sequences was one of its strengths. Its ability to produce human-like responses also made it a useful resource for various natural language processing applications, including content generation and translation.

GPT-2 did have certain drawbacks, however. It struggled with complex reasoning and contextual understanding, and although it performed well on shorter passages, it had difficulty keeping longer passages coherent and on topic.

GPT-3

The release of GPT-3 in 2020 ushered in a period of exponential growth for natural language processing models. At 175 billion parameters, GPT-3 is more than a hundred times the size of GPT-2 and roughly 1,500 times the size of GPT-1.

BookCorpus, Common Crawl, and Wikipedia are just a few of the sources used to train GPT-3. Trained on roughly a trillion words across these datasets, GPT-3 can produce high-quality results on various NLP tasks with little to no task-specific training data.

GPT-3’s capacity to compose meaningful prose, write computer code, and produce creative work is a major advancement over earlier models. Unlike its predecessors, GPT-3 can interpret the context of a text and come up with relevant responses. The ability to generate natural-sounding text could greatly benefit many applications, such as chatbots, original content generation, and language translation.

GPT-3’s capabilities also raised concerns about the ethical implications and potential misuse of such powerful language models. Many professionals worry that the model could be misused to create harmful content such as hoaxes, phishing emails, and malware; there have already been reports of criminals using ChatGPT to develop malware.

GPT-4

The fourth-generation GPT was released on March 14, 2023. It’s a substantial improvement over GPT-3, which was itself revolutionary. Although the model’s architecture and training data have yet to be made public, it clearly improves over GPT-3 in key respects and addresses some of the shortcomings of the prior iteration.

ChatGPT Plus subscribers have access to GPT-4, although usage is capped. Joining the GPT-4 API waitlist is another option, although it could be a while before you get access. Nonetheless, Microsoft Bing Chat is the quickest access point for GPT-4, with no cost or waitlist to participate.

GPT-4’s multimodal capability is a defining characteristic: the model can take an image as input and process it alongside a text prompt.

Modeling in OpenAI

OpenAI’s GPT-3 models are a set of AI systems built to comprehend and produce natural language. Although the more advanced GPT-3.5 generation has superseded them, the original GPT-3 base models (Davinci, Curie, Ada, and Babbage) are still available for fine-tuning. Each model’s strengths make it best suited to a certain set of applications.

Davinci: Davinci is the most advanced model in the GPT-3 family and can perform any task its siblings can. It was built for demanding jobs that require an in-depth grasp of context and complexity, but this capability comes at a higher computational cost than the other models.

Curie: This model is highly capable while costing less and running significantly faster than Davinci. It is a good option for many jobs since it strikes a balance between power and efficiency.

Ada: Ada was created for very simple tasks. It’s the most affordable and fastest of the GPT-3 models and can be cost-effective when the job doesn’t need extensive contextual understanding.

Babbage: Babbage handles straightforward tasks. It’s very quick and cheap, much like Ada, and it excels in jobs where speed and efficiency are prioritized over in-depth comprehension.

These models were trained on data through October 2019, and their maximum token capacity is 2,049. The task’s complexity, desired output quality, and available computational resources all play a role in determining which model to use.
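As a minimal sketch using the 2023-era (pre-1.0) openai Python library, the snippet below shows how switching the base model on the legacy Completions endpoint trades capability for speed and cost; the API key and prompt are placeholders.

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

# Legacy Completions endpoint with the GPT-3 base models; swapping the
# model name is how you trade capability for cost and speed.
for model in ["davinci", "curie", "babbage", "ada"]:
    response = openai.Completion.create(
        model=model,
        prompt="Summarize in one sentence: GPT models are pre-trained transformers.",
        max_tokens=40,
        temperature=0.2,
    )
    print(model, "->", response["choices"][0]["text"].strip())
```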

So why do we need so many variants?

A selection of models allows us to meet the requirements of a diverse set of customers and scenarios. Using a more capable model than necessary can incur unnecessary computing costs, and not all activities necessitate the highest capacity level. OpenAI provides a variety of models to its customers, each with its own set of strengths and weaknesses, as well as its price tag.

Utilization and storage of data

Data privacy is important to OpenAI. Unless users opt-in, the OpenAI API will no longer use user data for model training or improvement as of March 1, 2023. Except for cases where the law mandates retention, API data will be erased after 30 days at the latest. Zero data retention might be an option for high-trust consumers who use particularly sensitive applications.

OpenAI’s Present Models

OpenAI’s models are varied, each built for a particular purpose. Some of the models are briefly described below.

GPT-4 (limited beta) is an improvement over the GPT-3.5 series that can understand and generate both natural language and computer code. It’s still in a limited beta phase, and only select users have access for now.

The GPT-3.5 series of models can interpret and produce both natural language and code. The gpt-3.5-turbo model is this family’s most capable and cost-effective member; it excels at conversation while still performing well on more conventional completion tasks.
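A minimal sketch of calling gpt-3.5-turbo with the pre-1.0 openai library follows; unlike the base models, it uses the chat-style endpoint with role-tagged messages. The API key and prompt are placeholders.

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

# gpt-3.5-turbo uses the chat endpoint: a list of role-tagged messages
# rather than a single prompt string.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Explain the difference between GPT-3 and GPT-3.5 in two sentences."},
    ],
    temperature=0.3,
)
print(response["choices"][0]["message"]["content"])
```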

DALL·E (beta): This model combines visual creativity with language comprehension to generate and edit images in response to a natural language prompt.

Whisper (beta) is a speech recognition model that can transcribe spoken audio into text. Because it was trained on a large and varied dataset, it supports multilingual speech recognition, translation, and language identification.
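A brief sketch of transcription with the pre-1.0 openai library follows; the audio file path and API key are placeholders.

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

# Transcribe an audio file with the whisper-1 model.
with open("meeting.mp3", "rb") as audio_file:
    transcript = openai.Audio.transcribe("whisper-1", audio_file)
print(transcript["text"])

# Translation into English works the same way via openai.Audio.translate().
```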

Embedding models translate text into a numerical representation for tasks such as search, clustering, recommendation, anomaly detection, and classification. A separate moderation model, trained to identify potentially problematic text, helps keep applications safe and courteous.
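A minimal sketch of both endpoints with the pre-1.0 openai library follows; the API key and input strings are placeholders.

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

# Turn text into a vector for search, clustering, or recommendation.
emb = openai.Embedding.create(
    model="text-embedding-ada-002",
    input="How do I reset my password?",
)
vector = emb["data"][0]["embedding"]  # a list of floats (1,536 dimensions)
print(len(vector))

# The separate moderation endpoint flags potentially harmful text.
mod = openai.Moderation.create(input="Some user-generated text to screen.")
print(mod["results"][0]["flagged"])
```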

GPT-3: This series of models is capable of both comprehending and producing natural language. Although the more powerful GPT-3.5 models have superseded them, the original GPT-3 base models are still available for fine-tuning.

OpenAI promises regular updates to its models; some, like gpt-3.5-turbo, have received consistent updates recently. Once a new version of a model is released, the previous version remains supported for at least three months to accommodate developers who want stability. With its extensive library of models, regular updates, and emphasis on data protection, OpenAI offers a versatile platform, with models that can detect sensitive information, convert audio to text, and generate natural language.

References:

https://www.makeuseof.com/gpt-models-explained-and-compared/

https://www.geeky-gadgets.com/openai-models/

The post From GPT-1 to GPT-4: A Comprehensive Analysis and Comparison of OpenAI’s Evolving Language Models appeared first on MarkTechPost.

Princeton Researchers Introduce InterCode: A Revolutionary Lightweight …

ChatGPT, the latest chatbot developed by OpenAI, has been in the headlines ever since its release. Based on the GPT transformer architecture, the model answers questions accurately like a human, generates content for blogs, social media, and research, translates languages, summarizes long passages while retaining the key points, and even generates code samples. Large Language Models like GPT, BERT, PaLM, and LLaMa have successfully contributed to advancements in the field of Artificial Intelligence, effectively harnessing natural language processing and natural language understanding. 

In recent times, the development of models that can automatically produce code from natural language specifications has gained popularity. Although these models have demonstrated impressive performance on static benchmarks thanks to extensive pre-training over thousands of codebases, they have certain limitations: generated code can contain simple mistakes such as typos, there is a gap between writing the code and executing it, and there is limited room for human involvement or iterative correction. 

To address these challenges, researchers from the Department of Computer Science at Princeton University have proposed a lightweight and flexible framework called InterCode that frames interactive coding as a standard reinforcement learning (RL) environment. In InterCode, code is treated as actions, and execution feedback is treated as observations. This RL-based approach makes coding more iterative, and because the framework is designed to be language- and platform-agnostic, it can be used with many programming languages and environments. 

InterCode also uses independent Docker environments to guarantee safe and repeatable execution. It has been designed to be compatible with conventional sequence-to-sequence (seq2seq) coding techniques, making it simple to adopt and incorporate current methods, and it enables the development of new approaches specifically tailored to interactive code generation. 
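InterCode's actual API is not reproduced here, but the following self-contained toy sketch illustrates the interactive loop the framework describes, in which code is the action and execution output is the observation. The class and method names are hypothetical, and a plain subprocess stands in for the paper's isolated Docker containers.

```python
import subprocess

# Hypothetical gym-style environment: code is the action, execution
# feedback is the observation, and the reward compares against a gold output.
class BashCodingEnv:
    def __init__(self, instruction, check_cmd, expected_output):
        self.instruction = instruction          # natural-language task
        self.check_cmd = check_cmd              # command used to verify the result
        self.expected_output = expected_output  # gold output for the reward

    def reset(self):
        return self.instruction                 # initial observation

    def step(self, action):
        # Execute the agent-issued command and return its output as the observation.
        result = subprocess.run(action, shell=True, capture_output=True, text=True)
        observation = result.stdout + result.stderr
        check = subprocess.run(self.check_cmd, shell=True, capture_output=True, text=True)
        reward = 1.0 if check.stdout.strip() == self.expected_output else 0.0
        return observation, reward, reward == 1.0


env = BashCodingEnv(
    instruction="Create a file named hello.txt containing the word hello",
    check_cmd="cat hello.txt",
    expected_output="hello",
)
obs = env.reset()
for action in ["echo hello > hello.txt", "cat hello.txt"]:  # stand-in for an LLM agent
    obs, reward, done = env.step(action)
    if done:
        break
print("reward:", reward)
```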

For evaluation, the team constructed two interactive code environments using Bash and SQL as the action spaces to illustrate the utility of InterCode. They evaluated several strong language models equipped with various prompting strategies, such as ReAct and Plan & Solve, using data from the static Spider and NL2Bash datasets. The InterCode experiments demonstrated the advantages of interactive code generation while highlighting its potential as a challenging benchmark for improving code understanding and generation capabilities.

The team has summarized the key contributions as follows – 

InterCode, a new and universal framework for interactive code generation, has been introduced, which provides ease of use, extensibility, and safety. It is user-friendly and accessible, allowing researchers to utilize it in their experiments easily.

Several state-of-the-art models have been evaluated using InterCode, and a number of potential enhancements have been identified.

The InterCode benchmark serves as a standardized evaluation platform for interactive code generation tasks, allowing researchers to compare the performance of different models within a common framework. It can also transform new static code datasets into interactive tasks.

In conclusion, InterCode is a promising approach and a great addition to the developments in the field of Artificial Intelligence. It greatly advances interactive code generation, thus providing a standardized evaluation platform and encouraging further research and development in this area.

Check out the Paper, Code, and Project for more details.

The post Princeton Researchers Introduce InterCode: A Revolutionary Lightweight Framework Streamlining Language Model Interaction for Human-Like Language-to-Code Generation appeared first on MarkTechPost.