Meet FraudGPT: The Dark Side Twin of ChatGPT

ChatGPT has become hugely popular, influencing how people work and what they can find online. Even people who have never tried it are intrigued by the potential of AI chatbots. The spread of generative AI models has also changed the threat landscape: cybercriminals have been investigating ways to profit from the trend, and recent threads on dark web forums now show evidence of FraudGPT's emergence.

Researchers at Netenrich have uncovered a troubling new artificial intelligence tool called "FraudGPT." The bot was built specifically for malicious activities, including writing spear-phishing emails, developing cracking tools, and carding. It is sold on several dark web marketplaces and through the Telegram app.

What is FraudGPT?

FraudGPT works much like ChatGPT but adds the ability to generate content for use in cyberattacks; it can be purchased on the dark web and through Telegram. Members of the Netenrich threat research team first noticed it being advertised in July 2023. One of FraudGPT's selling points is that it lacks the safeguards and restrictions that make ChatGPT refuse questionable queries.

According to the advertisement, the tool is updated every week or two and uses several different types of artificial intelligence under the hood. FraudGPT is sold primarily by subscription: $200 per month or $1,700 per year.

How does it work?

The Netenrich team purchased and tested FraudGPT. Its layout is very similar to ChatGPT's, with a history of the user's requests in the left sidebar and the chat window taking up most of the screen. To get a response, users simply type their question into the box provided and press Enter.

One of the test cases was a phishing email relating to a bank. User input was minimal; simply including the bank's name in the prompt was all FraudGPT needed to complete the job. It even indicated where a malicious link could be placed in the text. FraudGPT can also produce scam landing pages that actively solicit personal information from visitors.

FraudGPT was also prompted to name the most frequently visited or exploited online resources, information that could help attackers plan future assaults. An online ad for the software boasted that it could generate harmful code, assemble hard-to-detect malware, search for vulnerabilities, and locate targets.

The Netenrich group also discovered that the supplier of FraudGPT had previously advertised hacking services for hire. They also connected the same person to an analogous program named WormGPT.

The FraudGPT investigation underscores the importance of vigilance. Whether hackers have already used these technologies to develop novel threats remains an open question. Nevertheless, FraudGPT and similar malicious programs can help attackers save time: phishing emails and landing pages can be written or built in seconds.

Consumers must therefore remain wary of any requests for their personal information and adhere to other cybersecurity best practices. Cybersecurity professionals would be wise to keep their threat-detection tools up to date, especially because malicious actors may employ programs like FraudGPT to directly target and penetrate critical computer networks.

The analysis of FraudGPT is a pointed reminder that hackers will keep adapting their methods over time, and that open-source software carries security flaws of its own. Anyone who uses the internet, or whose job it is to secure online infrastructure, must keep up with emerging technologies and the threats they pose. The trick is to keep the risks in mind while using programs like ChatGPT.


Meet CipherChat: An AI Framework to Systematically Examine the Generalizability of Safety Alignment to Non-Natural Languages, Specifically Ciphers

Artificial intelligence (AI) systems have advanced significantly with the introduction of Large Language Models (LLMs). Leading LLMs such as OpenAI's ChatGPT, Google's Bard, and Llama-2 have demonstrated remarkable abilities across innovative applications, from assisting with tool use and enhancing human evaluations to simulating human interactive behaviors. Their extraordinary competencies have enabled extensive deployment, but this comes with the significant challenge of ensuring the safety and dependability of their responses.

Recent research by a team examines LLMs in the context of non-natural languages, specifically ciphers, and introduces several important contributions aimed at improving the dependability and safety of LLM interactions in this linguistic setting.

The team has introduced CipherChat, which is a framework created expressly to evaluate the applicability of safety alignment methods from the domain of natural languages to that of non-natural languages. In CipherChat, humans interact with LLMs through cipher-based prompts, detailed system role assignments, and succinct enciphered demonstrations. This architecture ensures that the LLMs’ understanding of ciphers, participation in the conversation, and sensitivity to inappropriate content are thoroughly examined.
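To make the idea concrete, here is a minimal Python sketch of how a cipher-based prompt in the spirit of CipherChat might be assembled. The Caesar shift, the system role text, and the demonstration are illustrative assumptions for this sketch, not the paper's exact prompts.

# Minimal sketch of a CipherChat-style prompt (illustrative; not the paper's exact prompts).

def caesar_encipher(text, shift=3):
    """Encode letters with a simple Caesar cipher, leaving other characters untouched."""
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord("a") if ch.islower() else ord("A")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)

def build_cipher_chat_prompt(query):
    """Assemble a system role assignment, an enciphered demonstration, and the enciphered query."""
    system_role = (
        "You are an expert in the Caesar cipher (shift 3). "
        "All of our messages are written in this cipher; reply only in the same cipher."
    )
    demonstration = "Example input: " + caesar_encipher("How do I bake bread?")
    return [
        {"role": "system", "content": system_role},
        {"role": "user", "content": demonstration},
        {"role": "user", "content": caesar_encipher(query)},
    ]

if __name__ == "__main__":
    for message in build_cipher_chat_prompt("Summarize today's news."):
        print(message["role"], ":", message["content"])

The resulting message list can then be sent to any chat-style LLM API, and the model's enciphered reply deciphered with the same shift, mirroring how CipherChat probes whether safety behavior survives the change of "language."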

The study highlights the critical need to develop safety alignment methods for non-natural languages, such as ciphers, that match the capabilities of the underlying LLMs. While LLMs have shown extraordinary skill in understanding and producing human languages, the research finds that they also demonstrate unexpected prowess in comprehending non-natural languages. This underscores the importance of safety measures that cover these non-traditional forms of communication as well as those within the purview of traditional linguistics.

To assess how well CipherChat performs, the team ran experiments with a variety of realistic human ciphers on modern LLMs such as ChatGPT and GPT-4, covering 11 safety topics in both Chinese and English. The findings reveal a startling pattern: certain ciphers can bypass GPT-4's safety alignment procedures with nearly 100% success rates in several safety domains. This empirical result emphasizes the urgent need for customized safety alignment mechanisms for non-natural languages, like ciphers, to guarantee the robustness and dependability of LLMs' responses across linguistic circumstances.

The team also reports that the research uncovers the phenomenon of a "secret cipher" within LLMs. Drawing parallels to the secret languages observed in other language models, the team hypothesizes that LLMs might possess a latent ability to decipher certain encoded inputs, suggesting the existence of a unique cipher-related capability.

Building on this observation, a unique and effective framework known as SelfCipher has been introduced, which relies solely on role-play scenarios and a limited number of demonstrations in natural language to tap into and activate the latent secret cipher capability within LLMs. The efficacy of SelfCipher demonstrates the potential of harnessing these hidden abilities to enhance LLM performance in deciphering encoded inputs and generating meaningful responses.


Beyond the Pen: AI’s Artistry in Handwritten Text Generation from Visual Archetypes

The emerging field of Styled Handwritten Text Generation (HTG) seeks to create handwritten text images that replicate the unique calligraphic style of individual writers. This research area has diverse practical applications, from generating high-quality training data for personalized Handwritten Text Recognition (HTR) models to automatically generating handwritten notes for individuals with physical impairments. Additionally, the distinct style representations acquired from models designed for this purpose can find utility in other tasks like writer identification, signature verification, and manipulation of handwriting styles.

When delving into styled handwriting generation, relying only on style transfer proves limiting. Emulating the calligraphy of a particular writer goes beyond texture considerations, such as the color and texture of the background and ink; it encompasses intricate details like stroke thickness, slant, skew, roundness, individual character shapes, and ligatures. Precise handling of these visual elements is crucial to prevent artifacts that could inadvertently alter the content, such as introducing small extra or missing strokes.

In response to this, specialized methodologies have been devised for HTG. One approach involves treating handwriting as a trajectory composed of individual strokes. Alternatively, it can be approached as an image that captures its visual characteristics.

The former set of techniques employs online HTG strategies, where the prediction of pen trajectory is carried out point by point. On the other hand, the latter set constitutes offline HTG models that directly generate complete textual images. The work presented in this article focuses on the offline HTG paradigm due to its advantageous attributes. Unlike the online approach, it does not necessitate expensive pen-recording training data. As a result, it can be applied even in scenarios where information about an author’s online handwriting is unavailable, such as historical data. Moreover, the offline paradigm is easier to train, as it avoids issues like vanishing gradients and allows for parallelization.

The architecture employed in this study, known as VATr (Visual Archetypes-based Transformer), introduces a novel and innovative approach to Few-Shot-styled offline Handwritten Text Generation (HTG). An overview of the proposed technique is presented in the figure below.

(Figure: overview of the proposed VATr approach; source: https://arxiv.org/abs/2303.15269)

The approach stands out by representing characters as continuous variables and using them as query content vectors within a Transformer decoder. The process begins with character representation: characters are transformed into continuous variables, which then serve as queries for the Transformer decoder, the component responsible for generating stylized text images based on the provided content.

A notable advantage of this methodology is its ability to facilitate the generation of characters that are less frequently encountered in the training data, such as numbers, capital letters, and punctuation marks. This is achieved by capitalizing on the proximity in the latent space between rare symbols and more commonly occurring ones.

The architecture employs the GNU Unifont font to render characters as 16×16 binary images, effectively capturing the visual essence of each character. A dense encoding of these character images is then learned and incorporated into the Transformer decoder as queries. These queries guide the decoder’s attention to the style vectors, which are extracted by a pre-trained Transformer encoder.
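As a rough illustration of this query-and-style mechanism, the following PyTorch sketch embeds 16x16 character bitmaps into query vectors and feeds them to a standard Transformer decoder that attends to a sequence of style vectors. The dimensions, module choices, and tensor shapes are assumptions made for this sketch, not the actual VATr implementation.

# Rough sketch of the query/style mechanism described above (not the actual VATr code).
import torch
import torch.nn as nn

class CharQueryDecoder(nn.Module):
    def __init__(self, d_model=256, nhead=8, num_layers=3):
        super().__init__()
        # Dense encoding of 16x16 binary character images into query vectors.
        self.char_encoder = nn.Linear(16 * 16, d_model)
        layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)

    def forward(self, char_bitmaps, style_vectors):
        # char_bitmaps: (batch, num_chars, 16, 16) binary renderings of the target text
        # style_vectors: (batch, num_style_tokens, d_model) from a pre-trained style encoder
        queries = self.char_encoder(char_bitmaps.flatten(2))   # content queries
        return self.decoder(tgt=queries, memory=style_vectors)  # style-conditioned features

# Example: 5 characters of a word, 49 style tokens extracted from reference handwriting.
model = CharQueryDecoder()
chars = (torch.rand(1, 5, 16, 16) > 0.5).float()
style = torch.randn(1, 49, 256)
print(model(chars, style).shape)  # torch.Size([1, 5, 256]), later rendered into an image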

Furthermore, the approach benefits from a pre-trained backbone, which has been initially trained on an extensive synthetic dataset tailored to emphasize calligraphic style attributes. While this technique is often disregarded in the context of HTG, its effectiveness is demonstrated in yielding robust style representations, particularly for styles that have not been seen before.

The VATr architecture is validated through extensive experimental comparisons against recent state-of-the-art generative methods. Some outcomes and comparisons with state-of-the-art approaches are reported here below.

(Figure: outcomes and comparisons with state-of-the-art approaches; source: https://arxiv.org/abs/2303.15269)

This was the summary of VATr, a novel AI framework for handwritten text generation from visual archetypes. If you are interested and want to learn more about it, please feel free to refer to the links cited below.


CMU Researchers Developed a Simple Distance Learning AI Method to Transfer Visual Priors to Robotics Tasks: Improving Policy Learning by 20% Over Baselines

A significant barrier to progress in robot learning is the dearth of sufficient, large-scale data sets. Data sets in robotics have issues with being (a) hard to scale, (b) collected in sterile, non-realistic surroundings (such as a robotics lab), and (c) too homogeneous (such as toy items with preset backgrounds and lighting). Vision data sets, on the other hand, include a wide variety of tasks, objects, and environments. Therefore, modern methods have investigated the feasibility of bringing priors developed for use with massive vision datasets into robotics applications.

Previous work that makes use of vision datasets employs pre-trained representations to encode image observations as state vectors. This visual representation is then simply fed into a controller trained on data collected from robots. Since the latent space of pre-trained networks already incorporates semantic, task-level information, the team suggests that these representations can do more than just represent states.

New work by a research team from Carnegie Mellon University (CMU) shows that neural image representations can be more than merely state representations: they can be used to infer robot movements with a simple metric defined within the embedding space. The researchers use this insight to learn a distance function and a dynamics function with very little inexpensive human data. These modules define a robotic planner that has been tested on four typical manipulation tasks.

This is accomplished by splitting a pre-trained representation into two distinct modules: (a) a one-step dynamics module, which predicts the robot’s next state based on its current state/action, and (b) a “functional distance module,” which determines how close the robot is to attaining its goal in the current state. Using a contrastive learning objective, the distance function is learned with only a small amount of data from human demonstrations. 
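A stripped-down sketch of this two-module idea, with names, dimensions, and the sampling-based planner chosen for illustration rather than taken from the paper, might look like the following: a frozen encoder supplies embeddings, a one-step dynamics model predicts the next embedding, and a learned distance function scores candidate actions so the planner can pick the one predicted to land closest to the goal.

# Illustrative sketch of the distance + dynamics planner described above (not the authors' code).
import torch
import torch.nn as nn

EMB, ACT = 512, 7  # assumed embedding and action dimensions

class OneStepDynamics(nn.Module):
    """Predicts the next state embedding from the current embedding and an action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(EMB + ACT, 256), nn.ReLU(), nn.Linear(256, EMB))

    def forward(self, z, a):
        return self.net(torch.cat([z, a], dim=-1))

class DistanceFunction(nn.Module):
    """Scores how functionally far a state embedding is from the goal embedding."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * EMB, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, z, z_goal):
        return self.net(torch.cat([z, z_goal], dim=-1)).squeeze(-1)

def plan_action(dynamics, distance, z_now, z_goal, num_samples=256):
    """Sample candidate actions; keep the one whose predicted next state is closest to the goal."""
    actions = torch.rand(num_samples, ACT) * 2 - 1           # candidate actions in [-1, 1]
    z_next = dynamics(z_now.expand(num_samples, -1), actions)
    scores = distance(z_next, z_goal.expand(num_samples, -1))
    return actions[scores.argmin()]

dynamics, distance = OneStepDynamics(), DistanceFunction()
best = plan_action(dynamics, distance, torch.randn(1, EMB), torch.randn(1, EMB))
print(best.shape)  # torch.Size([7])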

Despite its apparent simplicity, the proposed system has been shown to outperform both traditional imitation learning and offline RL approaches to robot learning. Compared to a standard behavior cloning (BC) baseline, the technique performs significantly better when dealing with multi-modal action distributions. The ablation study shows that better representations lead to better control performance and that dynamical grounding is necessary for the system to be effective in the real world.

Since the pre-trained representation itself does the heavy lifting (owing to its structure) and completely avoids the difficulty of multi-modal, sequential action prediction, the findings show that this method outperforms policy learning (through behavior cloning). Additionally, the learned distance function is stable and straightforward to train, making it highly scalable and generalizable.

The team hopes that their work will spark new research in robotics and representation learning. Future research should refine visual representations for robotics even further by better capturing the granular interactions between the gripper/hand and the objects being handled. This has the potential to improve performance on activities like knob turning, where the pre-trained R3M encoder has trouble detecting subtle changes in grip position around the knob. The team also hopes that future studies will use their approach to learn entirely in the absence of action labels. Finally, despite the domain gap, it would be valuable if the data gathered with their inexpensive stick could be employed with a stronger, more dependable (commercial) gripper.


Amazon Introduces AI-Generated Review Summaries

In 1995, Amazon revolutionized the e-commerce landscape by introducing the concept of customer reviews—a platform where shoppers could openly voice their opinions on products. Despite skepticism, this novel idea gained popularity as customers embraced the opportunity to share their genuine feedback. Over time, Amazon’s customer review system evolved into a cornerstone of online shopping, offering invaluable insights for potential buyers.

Innovation has been a hallmark of Amazon’s review system journey. It started with basic features like review titles and later expanded to multimedia elements such as photos and videos. A significant milestone arrived in 2019 when Amazon enabled customers to leave quick star ratings, streamlining the review process. As Amazon’s product range grew, efforts were made to diversify reviewers and transcend geographical boundaries, making reviews from different regions accessible to global customers.

The advent of generative AI presented a new avenue for enhancement. Amazon harnessed the power of AI to create succinct review highlights that capture prevalent themes and sentiments across customer reviews. These AI-generated summaries provide an at-a-glance understanding of product attributes and customer opinions. Currently available to a subset of U.S. mobile shoppers, this feature is poised to expand based on user feedback and preferences.

Underpinning this evolution is Amazon’s unwavering commitment to review authenticity. To maintain the integrity of its review ecosystem, Amazon relies on advanced machine learning models and expert investigators to identify and prevent fake reviews. This multifaceted approach analyzes diverse data points, including user behavior patterns and historical review activity. The AI-generated review highlights draw exclusively from validated purchases, bolstering their credibility.

Amazon's review system extends beyond mere convenience; it aims to empower customers with genuine insights. The company's dedication to transparency and trust has paved the way for a dynamic review system that is both user-driven and technologically innovative. As Amazon continues to refine its review submission processes and harness AI advancements, customers can anticipate an increasingly seamless and reliable shopping experience fortified by the wisdom of fellow consumers.


15 AI-Powered Audio Editing Tools

Sound engineers and music producers employ AI audio tools, which are intelligent software programs, to enhance many areas of the creative process, such as generating melodies and harmonies and improving sound quality. Due to their ability to analyze large datasets and detect complex patterns, these technologies have gained popularity among music producers and sound designers seeking to optimize their workflow.

LANDR 

LANDR has produced 20 million mastered recordings and offers an easy-to-use interface that streamlines the process. It is an AI mastering technology used by top studio engineers who have mastered songs by Lady Gaga, Gwen Stefani, Snoop Dogg, Seal, Post Malone, and many up-and-coming musicians. The tool's intuitive drag-and-drop interface makes audio mastering easier for audio professionals and musicians. LANDR's mastering chain stands out because it was carefully constructed to maximize creative output, and unlike any other service, LANDR offers custom-built mastering plugins for each supported platform.

Studio Sound

Descript is a cloud-based video creation platform that uses artificial intelligence to speed up the content creation process, and one of the tools it offers is called Studio Sound. Studio Sound eliminates background noise and echoes to focus on the speakers’ voices. Using a regenerative algorithm, the technology enhances voice clarity and removes background noise. It’s a terrific tool for podcasters, YouTubers, and other content creators who want to save time on audio editing. It improves the user’s voice and eliminates distracting sounds like background noise and room echo from audio, videos, and screen recordings.

Splitter 

Audio engineers can use Splitter, a machine learning program, to separate instruments from a track. The software provides four different models to enhance the efficiency of audio professionals. The 5-stem model illustrates this concept excellently because it can accurately isolate vocals, drums, piano, bass, and other instruments/effects, including guitar and synthesizers. The 2-stem paradigm does the same thing by decoupling the vocal and instrumentation stems. Musicians, DJs, artists, forensics experts, audio engineers, karaoke fans, police, and scientists are some of the groups the company aims to serve with its products. The founder of Splitter is a renowned music producer and audio engineer with vast experience in science and technology and the music industry. 

Sonible’s smart:EQ 3

smart:EQ 3 is an intelligent equalizer that employs an AI filter to fix tonal imbalances automatically. A balanced sound is achieved by eliminating harsh resonances and notches. Using its intelligent cross-channel processing, users can effortlessly organize up to six channels, ensuring that each track is treated appropriately. Algorithms analyze spectral data from the grouped channels to determine how much room to give each track in the mix, and users can set the aural hierarchy according to their artistic goals.

Orb Producer Suite 3

Orb Producer Suite 3 is a creation of Hexachords, a Barcelona-based firm that specializes in AI-powered tools for artists, composers, and music producers. The suite includes four plugins. Orb Chords can generate endless new chord progressions by adjusting parameters like chord color and dissonance. The Melody Maker plugin offers an inexhaustible supply of melodic inspiration and flexible controls for dialing in the ideal tone. The Bass module can evaluate the harmony and make intelligent suggestions for bass lines, while the Arpeggio module provides easy access to various arpeggio patterns that can be further personalized through multiple settings.

Playbeat 

Playbeat can rapidly produce and offer various beat combinations using its proprietary SMART™ AI algorithm. The program uses complex audio analysis algorithms to generate completely original, one-of-a-kind beats. In addition, users can "train" the application by regularly providing examples of their preferred audio inputs. Through adaptive learning, the app can generate user-specific patterns that more closely match the user's chosen style, better satisfying their creative requirements.

Lalal.ai

Lalal.ai is a stem splitter and voice cleaner developed by experts in AI, machine learning, mathematical optimization, and digital signal processing. Voice Cleaner is an AI-powered tool for removing background music and canceling out noise. At the same time, Stem Splitter lets users isolate vocals, instrumental accompaniment, and other instruments from any audio or video file. Both of these applications run on proprietary artificial intelligence models. In 2020, the group used 20TB of training data to create a novel neural network called Rocknet, which could separate songs’ vocal and instrumental components. A year later, they developed Cassiopeia, a more advanced model than Rocknet, providing far cleaner splits with fewer artifacts.

Audo Studio

Audo Studio's array of features, powered by advanced AI algorithms, simplifies workflows and improves audio quality. With Audo Studio, audio professionals, podcasters, musicians, and producers of all stripes can achieve polished audio, through noise reduction, automatic equalization, and advanced vocal processing, without the difficulties of typical post-production procedures. The AI algorithms in Audo Studio analyze the audio and intelligently eliminate background noise, improving the sound quality. It also automatically adjusts the pitch of voices and other instruments to improve the sound.

iZotope’s RX 10

Powered by artificial intelligence and machine learning, iZotope RX 10 is a cutting-edge audio restoration tool that can fix problems like noise, clipping, and distortion. This flexible tool includes an abundance of options. The recently added Text Navigation feature is particularly useful because it analyzes conversations and displays text transcriptions in sync with the spectrogram; as a result, users can edit audio files with greater precision by simply typing in the location of the relevant phrases. The application also has a useful Multiple Speaker Detection function, which allows for easy segmentation and categorization of speech associated with specific voices; this is especially beneficial when individual speakers require specific processing. New users may benefit from the Repair Assistant plugin, which swiftly detects and fixes audio problems within the DAW using machine learning. It singles out the source of the issue and then suggests a repair sequence that the user can tweak to their liking.

Krisp

Krisp's AI technology, based on deep neural networks, improves the quality and intelligibility of audio by filtering out background noise, allowing for more focused and productive discussions. AI drives Krisp's Voice Assistant, which has a clever noise-cancellation feature that works in both directions, so it can also identify and filter out noises and conversations coming from other callers. The Echo Cancellation function removes the acoustic echo caused by an overly sensitive microphone by canceling out echoes reflected off nearby hard surfaces.

Overdub 

Descript's Overdub is a cutting-edge app that lets users record their voice into a text-to-speech model or select from a library of pre-recorded voices. With the help of Lyrebird AI, Overdub provides state-of-the-art speech synthesis and a natural interface. Descript's pro accounts offer an unlimited vocabulary in addition to free Overdub. Users can only clone their own voices, a safeguard against misuse. Overdubs fit naturally into live recordings, taking on the same tonal qualities as the original and allowing for pauses and transitions in the middle of sentences. With a variety of voices available, users can find the perfect one for any situation, and Overdub lets you share your voice securely with collaborators. Overdub makes fixing mistakes in audio recordings as easy as typing, saving time and money by avoiding trips back to the studio. Descript also offers a wide variety of professionally recorded stock voices for use in videos or audio projects. Overdub stands out from the crowd as the only 44.1 kHz broadcast-quality speech synthesizer.

Adobe Podcast

The popularity of podcasts over the past several years has increased the demand for high-quality podcast production tools. Adobe Podcast AI is an AI-powered cloud service that streamlines and simplifies podcasting. Transcripts, captions, keywords, summaries, and more can all be generated with the help of this program. Adobe Podcast AI enables users to edit their podcasts with features such as transcription, effects, and background-noise removal. Project templates and the Mic Check AI ensure a proper microphone setup. Create high-quality podcasts with minimal time and effort using Adobe Podcast AI, which integrates with Adobe Audition, Adobe Premiere Pro, Adobe Spark, and more.

Timebolt.io

Timebolt.io is another strong tool for removing silences, speeding up scenes, and quickly cutting down commentary in video and audio productions, though it may be best suited to podcasts and other audio-led projects. Its silence-removal functions are particularly helpful because they identify and eliminate extended pauses you may have accidentally captured. Timebolt.io provides several editing tools and options, including a silence detector, fast forward, markers, punch-ins, transitions, controls for background audio, and an "um-check" to remove filler words.

AudioStrip 

To remove or isolate vocals from music, you can use AudioStrip, an online application that employs artificial intelligence and deep learning. The resource is free, has no learning curve, and ensures only the best algorithms are used. Users get instant feedback by filling out the website's blue form and uploading a track. More features are being added, but the program can already batch-isolate numerous tracks simultaneously. AudioStrip is designed for music producers and artists who want to demonstrate their ideas with high-quality acapellas from a source track while waiting for official stems. Isolating, batching, transcribing, and mastering tracks are all options on the website's menu. Professionals in the music industry, such as SadBois and Illegal, have recommended AudioStrip because it lets them express their creativity in previously impossible ways.

Clip.audio

Clip.audio is an AI-powered audio search engine that lets you find, make, and remix tracks using only natural language queries and instructions. It is continually updated with new sound-generating capabilities and can access over two million sounds from the internet. The search tool is great for music producers, sound designers, and audio engineers, since it allows them to quickly find clips from various sources and genres. Users can rely on the platform's robust search system to zero in on the ideal sound effects for their productions. The audio search engine is also compatible with many different audio formats, making it simple to locate relevant audio samples. As a bonus, Clip.audio's user interface is straightforward and designed with the end user in mind. Finally, the platform is driven by MetaVoice technology, guaranteeing the ongoing safety and dependability of the audio search engine.


Beyond Photoshop: How Inst-Inpaint is Shaking Up Object Removal with Diffusion Models

Image inpainting is an ancient art. It is the process of removing unwanted objects and filling missing pixels in an image so that the completed image is realistic-looking and follows the original context. The applications of image inpainting are diverse, including tasks like enhancing aesthetics or privacy by eliminating undesired objects from images, improving the quality and clarity of old or damaged photos, completing missing information by filling gaps or holes in images, and expressing creativity or mood through the generation of artistic effects.

Inst-Inpaint, or instructional image inpainting, is a newly introduced method that takes an image and a textual instruction as input and automatically removes the specified unwanted object. Sample results show the input image alongside the output produced by Inst-Inpaint. This is achieved using state-of-the-art diffusion models. Diffusion models are a class of probabilistic generative models that turn noise into a representative data sample and have been widely used in computer vision to obtain high-quality images in generative AI.

The researchers first built GQA-Inpaint, a real-world image dataset, to train and test models for the proposed instructional image inpainting task. To create input/output pairs, they utilized the images and their scene graphs in the GQA dataset. The proposed method proceeds in the following steps:

Selecting an object of interest (the object to be removed).

Performing instance segmentation to locate the object in the image.

Applying a state-of-the-art image inpainting method to erase the object.

Creating a template-based textual prompt to describe the removal operation.

As a result, the GQA-Inpaint dataset includes 147,165 unique images and 41,407 different instructions. Trained on this dataset, the Inst-Inpaint model is a text-based image inpainting method based on a conditioned Latent Diffusion Model; it does not require any user-specified binary mask and performs object removal in a single step without predicting a mask.

One detail to note is that the image is divided into three equal sections along the x-axis, named "left," "center," and "right," and natural location phrases such as "on the table" are used to identify objects in the image. To compare experimental outcomes, the researchers used numerous measures, including a novel CLIP-based inpainting score, to evaluate GAN- and diffusion-based baselines, demonstrating significant quantitative and qualitative improvements.
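As a small illustration of that location convention, the snippet below assigns a "left," "center," or "right" label from an object's bounding-box center. The thresholds simply split the image width into thirds, an assumption consistent with the description above rather than the authors' exact code.

# Illustrative helper for the left/center/right naming convention described above.

def horizontal_location(bbox, image_width):
    """Return 'left', 'center', or 'right' based on the bounding-box center along the x-axis."""
    x_min, _, x_max, _ = bbox
    center_x = (x_min + x_max) / 2.0
    if center_x < image_width / 3.0:
        return "left"
    if center_x < 2.0 * image_width / 3.0:
        return "center"
    return "right"

# Example: an object centered at x = 420 in a 512-pixel-wide image falls in the right section.
print(horizontal_location((400, 100, 440, 180), image_width=512))  # right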

In a rapidly evolving digital landscape, where the boundaries between human creativity and artificial intelligence are constantly blurring, Inst-Inpaint is a testament to AI’s transformative power in image manipulation. It has opened up numerous avenues for using textual instructions to image inpainting and once again brings AI closer to the human brain.


Build ML features at scale with Amazon SageMaker Feature Store using data from Amazon Redshift

Amazon Redshift is the most popular cloud data warehouse that is used by tens of thousands of customers to analyze exabytes of data every day. Many practitioners are extending these Redshift datasets at scale for machine learning (ML) using Amazon SageMaker, a fully managed ML service, with requirements to develop features offline in a code way or low-code/no-code way, store featured data from Amazon Redshift, and make this happen at scale in a production environment.
In this post, we show you three options to prepare Redshift source data at scale in SageMaker, including loading data from Amazon Redshift, performing feature engineering, and ingesting features into Amazon SageMaker Feature Store:

Option A – Use an AWS Glue interactive session on Amazon SageMaker Studio (in a dev environment) and an AWS Glue job (in a prod environment) with Spark
Option B – Use an Amazon SageMaker Processing job with a Redshift dataset definition, or use SageMaker Feature Processing in SageMaker Feature Store, which runs SageMaker training jobs
Option C – Use Amazon SageMaker Data Wrangler in a low-code/no-code way

If you’re an AWS Glue user and would like to do the process interactively, consider option A. If you’re familiar with SageMaker and writing Spark code, option B could be your choice. If you want to do the process in a low-code/no-code way, you can follow option C.
Amazon Redshift uses SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, using AWS-designed hardware and ML to deliver the best price-performance at any scale.
SageMaker Studio is the first fully integrated development environment (IDE) for ML. It provides a single web-based visual interface where you can perform all ML development steps, including preparing data and building, training, and deploying models.
AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, ML, and application development. AWS Glue enables you to seamlessly collect, transform, cleanse, and prepare data for storage in your data lakes and data pipelines using a variety of capabilities, including built-in transforms.
Solution overview
The following diagram illustrates the solution architecture for each option.

Prerequisites
To continue with the examples in this post, you need to create the required AWS resources. To do this, we provide an AWS CloudFormation template to create a stack that contains the resources. When you create the stack, AWS creates a number of resources in your account:

A SageMaker domain, which includes an associated Amazon Elastic File System (Amazon EFS) volume
A list of authorized users and a variety of security, application, policy, and Amazon Virtual Private Cloud (Amazon VPC) configurations
A Redshift cluster
A Redshift secret
An AWS Glue connection for Amazon Redshift
An AWS Lambda function to set up required resources, execution roles and policies

Make sure that you don’t already have two SageMaker Studio domains in the Region where you’re running the CloudFormation template. This is the maximum allowed number of domains in each supported Region.
Deploy the CloudFormation template
Complete the following steps to deploy the CloudFormation template:

Save the CloudFormation template sm-redshift-demo-vpc-cfn-v1.yaml locally.
On the AWS CloudFormation console, choose Create stack.
For Prepare template, select Template is ready.
For Template source, select Upload a template file.
Choose Choose File and navigate to the location on your computer where the CloudFormation template was downloaded and choose the file.
Enter a stack name, such as Demo-Redshift.
On the Configure stack options page, leave everything as default and choose Next.
On the Review page, select I acknowledge that AWS CloudFormation might create IAM resources with custom names and choose Create stack.

You should see a new CloudFormation stack with the name Demo-Redshift being created. Wait for the status of the stack to be CREATE_COMPLETE (approximately 7 minutes) before moving on. You can navigate to the stack’s Resources tab to check what AWS resources were created.
Launch SageMaker Studio
Complete the following steps to launch your SageMaker Studio domain:

On the SageMaker console, choose Domains in the navigation pane.
Choose the domain you created as part of the CloudFormation stack (SageMakerDemoDomain).
Choose Launch, then choose Studio.

This page can take 1–2 minutes to load when you access SageMaker Studio for the first time, after which you’ll be redirected to a Home tab.
Download the GitHub repository
Complete the following steps to download the GitHub repo:

In the SageMaker notebook, on the File menu, choose New and Terminal.
In the terminal, enter the following command:

git clone https://github.com/aws-samples/amazon-sagemaker-featurestore-redshift-integration.git

You can now see the amazon-sagemaker-featurestore-redshift-integration folder in the navigation pane of SageMaker Studio.
Set up batch ingestion with the Spark connector
Complete the following steps to set up batch ingestion:

In SageMaker Studio, open the notebook 1-uploadJar.ipynb under amazon-sagemaker-featurestore-redshift-integration.
If you are prompted to choose a kernel, choose Data Science as the image and Python 3 as the kernel, then choose Select.
For the following notebooks, choose the same image and kernel except the AWS Glue Interactive Sessions notebook (4a).
Run the cells by pressing Shift+Enter in each of the cells.

While the code runs, an asterisk (*) appears between the square brackets. When the code is finished running, the asterisk is replaced with a number. The same applies to all the other notebooks.
Set up the schema and load data to Amazon Redshift
The next step is to set up the schema and load data from Amazon Simple Storage Service (Amazon S3) to Amazon Redshift. To do so, run the notebook 2-loadredshiftdata.ipynb.
Create feature stores in SageMaker Feature Store
To create your feature stores, run the notebook 3-createFeatureStore.ipynb.
Perform feature engineering and ingest features into SageMaker Feature Store
In this section, we present the steps for all three options to perform feature engineering and ingest processed features into SageMaker Feature Store.
Option A: Use SageMaker Studio with a serverless AWS Glue interactive session
Complete the following steps for option A:

In SageMaker Studio, open the notebook 4a-glue-int-session.ipynb.
If you are prompted to choose a kernel, choose SparkAnalytics 2.0 as the image and Glue Python [PySpark and Ray] as the kernel, then choose Select.

The environment preparation process may take some time to complete.

Option B: Use a SageMaker Processing job with Spark
In this option, we use a SageMaker Processing job with a Spark script to load the original dataset from Amazon Redshift, perform feature engineering, and ingest the data into SageMaker Feature Store. To do so, open the notebook 4b-processing-rs-to-fs.ipynb in your SageMaker Studio environment.
Here we use RedshiftDatasetDefinition to retrieve the dataset from the Redshift cluster. RedshiftDatasetDefinition is one type of input of the processing job, which provides a simple interface for practitioners to configure Redshift connection-related parameters such as identifier, database, table, query string, and more. You can easily establish your Redshift connection using RedshiftDatasetDefinition without maintaining a connection full time. We also use the SageMaker Feature Store Spark connector library in the processing job to connect to SageMaker Feature Store in a distributed environment. With this Spark connector, you can easily ingest data to the feature group’s online and offline store from a Spark DataFrame. Also, this connector contains the functionality to automatically load feature definitions to help with creating feature groups. Above all, this solution offers you a native Spark way to implement an end-to-end data pipeline from Amazon Redshift to SageMaker. You can perform any feature engineering in a Spark context and ingest final features into SageMaker Feature Store in just one Spark project.
To use the SageMaker Feature Store Spark connector, we extend a pre-built SageMaker Spark container with sagemaker-feature-store-pyspark installed. In the Spark script, use the system executable command to run pip install, install this library in your local environment, and get the local path of the JAR file dependency. In the processing job API, provide this path to the parameter of submit_jars to the node of the Spark cluster that the processing job creates.
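The following is a hedged sketch of one way to wire this up from the notebook side; the package name, role, instance settings, and script path are placeholders rather than values from this post. It installs the connector, collects its bundled JAR paths with feature_store_pyspark.classpath_jars(), and passes them to the processing job through submit_jars.

# Hedged sketch of passing the Feature Store Spark connector JARs to a PySparkProcessor job.
# Package name, role, instance settings, and script path are illustrative placeholders.
import subprocess
import sys

from sagemaker.spark.processing import PySparkProcessor

# Install the connector locally so its bundled JARs can be located.
subprocess.check_call([sys.executable, "-m", "pip", "install", "sagemaker-feature-store-pyspark-3.1"])

import feature_store_pyspark  # imported after installation

spark_processor = PySparkProcessor(
    base_job_name="fs-ingest",           # placeholder job name
    framework_version="3.1",
    role=execution_role,                  # assumed to be defined earlier in the notebook
    instance_count=2,
    instance_type="ml.m5.xlarge",
)

spark_processor.run(
    submit_app="./scripts/feature_engineering.py",      # placeholder Spark script path
    submit_jars=feature_store_pyspark.classpath_jars(),  # local JAR dependencies of the connector
    inputs=[rdd_input],  # the ProcessingInput with a RedshiftDatasetDefinition shown below
)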
In the Spark script for the processing job, we first read the original dataset files from Amazon S3, which temporarily stores the unloaded dataset from Amazon Redshift as a medium. Then we perform feature engineering in a Spark way and use feature_store_pyspark to ingest data into the offline feature store.
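Inside the Spark script itself, the ingestion step might look roughly like the sketch below. The S3 input path, feature group ARN, and column transformations are placeholders, and the FeatureStoreManager usage follows the connector's documented pattern rather than the exact code from this post.

# Hedged sketch of the Spark-script side: read the unloaded Redshift data from Amazon S3,
# derive features, and ingest them into SageMaker Feature Store with the Spark connector.
# Paths, ARN, and transformations are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from feature_store_pyspark.FeatureStoreManager import FeatureStoreManager

spark = SparkSession.builder.appName("redshift-to-feature-store").getOrCreate()

# The processing job materializes the Redshift UNLOAD output under this local input path.
raw_df = spark.read.parquet("/opt/ml/processing/input/rdd")

# Example feature engineering step: scale a rating column and add an event time.
features_df = (
    raw_df
    .withColumn("rating_scaled", F.col("rating") / F.lit(5.0))        # assumes a 'rating' column
    .withColumn("event_time", F.current_timestamp().cast("string"))
)

# Ingest into the feature group's offline store in a distributed fashion.
FeatureStoreManager().ingest_data(
    input_data_frame=features_df,
    feature_group_arn="arn:aws:sagemaker:<region>:<account-id>:feature-group/<name>",  # placeholder
    target_stores=["OfflineStore"],
)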
For the processing job, we provide a ProcessingInput with a redshift_dataset_definition. Here we build a structure according to the interface, providing Redshift connection-related configurations. You can use query_string to filter your dataset by SQL and unload it to Amazon S3. See the following code:

rdd_input = ProcessingInput(
    input_name="redshift_dataset_definition",
    app_managed=True,
    dataset_definition=DatasetDefinition(
        local_path="/opt/ml/processing/input/rdd",
        data_distribution_type="FullyReplicated",
        input_mode="File",
        redshift_dataset_definition=RedshiftDatasetDefinition(
            cluster_id=_cluster_id,
            database=_dbname,
            db_user=_username,
            query_string=_query_string,
            cluster_role_arn=_redshift_role_arn,
            output_s3_uri=_s3_rdd_output,
            output_format="PARQUET",
        ),
    ),
)

You need to wait 6–7 minutes for each processing job; separate jobs handle the USER, PLACE, and RATING datasets.
For more details about SageMaker Processing jobs, refer to Process data.
For a SageMaker-native solution for feature processing from Amazon Redshift, you can also use Feature Processing in SageMaker Feature Store, which manages the underlying infrastructure, including provisioning the compute environments and creating and maintaining SageMaker pipelines to load and ingest data. You only need to focus on your feature processor definitions, which include the transformation functions, the Amazon Redshift source, and the SageMaker Feature Store sink. The scheduling, job management, and other production workloads are managed by SageMaker. Feature Processor pipelines are SageMaker pipelines, so the standard monitoring mechanisms and integrations are available.
Option C: Use SageMaker Data Wrangler
SageMaker Data Wrangler allows you to import data from various data sources including Amazon Redshift for a low-code/no-code way to prepare, transform, and featurize your data. After you finish data preparation, you can use SageMaker Data Wrangler to export features to SageMaker Feature Store.
There are some AWS Identity and Access Management (IAM) settings that allow SageMaker Data Wrangler to connect to Amazon Redshift. First, create an IAM role (for example, redshift-s3-dw-connect) that includes an Amazon S3 access policy. For this post, we attached the AmazonS3FullAccess policy to the IAM role. If you have restrictions of accessing a specified S3 bucket, you can define it in the Amazon S3 access policy. We attached the IAM role to the Redshift cluster that we created earlier. Next, create a policy for SageMaker to access Amazon Redshift by getting its cluster credentials, and attach the policy to the SageMaker IAM role. The policy looks like the following code:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": "redshift:getclustercredentials",
            "Effect": "Allow",
            "Resource": [
                "*"
            ]
        }
    ]
}

After this setup, SageMaker Data Wrangler allows you to query Amazon Redshift and output the results into an S3 bucket. For instructions to connect to a Redshift cluster and query and import data from Amazon Redshift to SageMaker Data Wrangler, refer to Import data from Amazon Redshift.

SageMaker Data Wrangler offers a selection of over 300 pre-built data transformations for common use cases such as deleting duplicate rows, imputing missing data, one-hot encoding, and handling time series data. You can also add custom transformations in pandas or PySpark. In our example, we applied some transformations such as drop column, data type enforcement, and ordinal encoding to the data.

When your data flow is complete, you can export it to SageMaker Feature Store. At this point, you need to create a feature group: give the feature group a name, select both online and offline storage, provide the name of an S3 bucket to use for the offline store, and provide a role that has SageMaker Feature Store access. Finally, you can create a job, which launches a SageMaker Processing job that runs the SageMaker Data Wrangler flow to ingest features from the Redshift data source into your feature group.

Here is one end-to-end data flow in the scenario of PLACE feature engineering.
Use SageMaker Feature Store for model training and prediction
To use SageMaker Feature store for model training and prediction, open the notebook 5-classification-using-feature-groups.ipynb.
After the Redshift data is transformed into features and ingested into SageMaker Feature Store, the features are available for search and discovery across teams of data scientists responsible for many independent ML models and use cases. These teams can use the features for modeling without having to rebuild or rerun feature engineering pipelines. Feature groups are managed and scaled independently, and can be reused and joined together regardless of the upstream data source.
The next step is to build ML models using features selected from one or multiple feature groups. You decide which feature groups to use for your models. There are two options to create an ML dataset from feature groups, both utilizing the SageMaker Python SDK:

Use the SageMaker Feature Store DatasetBuilder API – The SageMaker Feature Store DatasetBuilder API allows data scientists to create ML datasets from one or more feature groups in the offline store. You can use the API to create a dataset from a single feature group or multiple feature groups, and output it as a CSV file or a pandas DataFrame. See the following example code:

from sagemaker.feature_store.dataset_builder import DatasetBuilder

fact_rating_dataset = DatasetBuilder(
    sagemaker_session=sagemaker_session,
    base=fact_rating_feature_group,
    output_path=f"s3://{s3_bucket_name}/{prefix}",
    record_identifier_feature_name="ratingid",
    event_time_identifier_feature_name="timestamp",
).to_dataframe()[0]

Run SQL queries using the athena_query function in the FeatureGroup API – Another option is to use the auto-built AWS Glue Data Catalog through the FeatureGroup API. The FeatureGroup API includes an athena_query function that creates an AthenaQuery instance to run user-defined SQL query strings. You then run the Athena query and organize the query result into a pandas DataFrame. This option allows you to specify more complicated SQL queries to extract information from a feature group. See the following example code:

dim_user_query = dim_user_feature_group.athena_query()
dim_user_table = dim_user_query.table_name

dim_user_query_string = (
    'SELECT * FROM "'
    + dim_user_table
    + '"'
)

dim_user_query.run(
    query_string=dim_user_query_string,
    output_location=f"s3://{s3_bucket_name}/{prefix}",
)

dim_user_query.wait()
dim_user_dataset = dim_user_query.as_dataframe()

Next, we can merge the queried data from different feature groups into our final dataset for model training and testing. For this post, we use batch transform for model inference. Batch transform allows you to run model inference on a large volume of data in Amazon S3, with the inference results stored in Amazon S3 as well. For details on model training and inference, refer to the notebook 5-classification-using-feature-groups.ipynb.
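For reference, a batch transform call with the SageMaker Python SDK typically looks like the hedged sketch below; the model object, instance type, and S3 URIs are placeholders, and the actual values are set in the notebook.

# Hedged sketch of batch inference with SageMaker batch transform (placeholder names and URIs).
transformer = model.transformer(      # 'model' is assumed to be the trained SageMaker model from the notebook
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path=f"s3://{s3_bucket_name}/{prefix}/batch-output",
    accept="text/csv",
)

transformer.transform(
    data=f"s3://{s3_bucket_name}/{prefix}/test-data.csv",  # placeholder input dataset in Amazon S3
    content_type="text/csv",
    split_type="Line",                                      # send one CSV row per inference request
)
transformer.wait()  # results land under output_path as .out files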
Run a join query on prediction results in Amazon Redshift
Lastly, we query the inference result and join it with the original user profiles in Amazon Redshift. To do this, we use Amazon Redshift Spectrum to join the batch prediction results in Amazon S3 with the original Redshift data. For details, refer to the notebook 6-read-results-in-redshift.ipynb.
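Conceptually, the Spectrum join amounts to registering the S3 prediction output as an external table and joining it with the warehouse table. The following sketch runs that SQL through the redshift_connector Python driver; the schema, table, column, bucket, and IAM role names are placeholders, and it assumes the predictions were written as CSV, which may differ from the notebook's exact setup.

# Hedged sketch of joining batch predictions in Amazon S3 with Redshift data via Redshift Spectrum.
# Schema, table, column, bucket, and IAM role names are placeholders.
import redshift_connector

conn = redshift_connector.connect(
    host="<redshift-cluster-endpoint>", database="dev", user="awsuser", password="<password>"
)
conn.autocommit = True  # external DDL statements cannot run inside a transaction block
cur = conn.cursor()

# Expose the Data Catalog database that points at the S3 prediction files as an external schema.
cur.execute("""
    CREATE EXTERNAL SCHEMA IF NOT EXISTS spectrum_preds
    FROM DATA CATALOG DATABASE 'prediction_catalog_db'
    IAM_ROLE '<redshift-spectrum-role-arn>'
    CREATE EXTERNAL DATABASE IF NOT EXISTS;
""")

# Register the batch transform output (assumed CSV: userid, predicted_label) as an external table.
cur.execute("""
    CREATE EXTERNAL TABLE IF NOT EXISTS spectrum_preds.rating_predictions (
        userid INTEGER,
        predicted_label INTEGER
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    LOCATION 's3://<bucket>/<prefix>/batch-output/';
""")

# Join predictions back to the original user profiles stored in Redshift (placeholder table/columns).
cur.execute("""
    SELECT u.userid, u.age, p.predicted_label
    FROM dim_users u
    JOIN spectrum_preds.rating_predictions p ON u.userid = p.userid;
""")
print(cur.fetchmany(5))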
Clean up
In this section, we provide the steps to clean up the resources created as part of this post to avoid ongoing charges.
Shut down SageMaker Apps
Complete the following steps to shut down your resources:

In SageMaker Studio, on the File menu, choose Shut Down.
In the Shutdown confirmation dialog, choose Shutdown All to proceed.

After you get the “Server stopped” message, you can close this tab.

Delete the apps
Complete the following steps to delete your apps:

On the SageMaker console, in the navigation pane, choose Domains.
On the Domains page, choose SageMakerDemoDomain.
On the domain details page, under User profiles, choose the user sagemakerdemouser.
In the Apps section, in the Action column, choose Delete app for any active apps.
Ensure that the Status column says Deleted for all the apps.

Delete the EFS storage volume associated with your SageMaker domain
Locate your EFS volume on the SageMaker console and delete it. For instructions, refer to Manage Your Amazon EFS Storage Volume in SageMaker Studio.
Delete default S3 buckets for SageMaker
Delete the default S3 buckets (sagemaker-<region-code>-<acct-id>) for SageMaker if you are not using SageMaker in that Region.
Delete the CloudFormation stack
Delete the CloudFormation stack in your AWS account to clean up all related resources.
Conclusion
In this post, we demonstrated an end-to-end data and ML flow from a Redshift data warehouse to SageMaker. You can easily use AWS native integration of purpose-built engines to go through the data journey seamlessly. Check out the AWS Blog for more practices about building ML features from a modern data warehouse.

About the Authors
Akhilesh Dube, a Senior Analytics Solutions Architect at AWS, possesses more than two decades of expertise in working with databases and analytics products. His primary role involves collaborating with enterprise clients to design robust data analytics solutions while offering comprehensive technical guidance on a wide range of AWS Analytics and AI/ML services.
Ren Guo is a Senior Data Specialist Solutions Architect in the domains of generative AI, analytics, and traditional AI/ML at AWS, Greater China Region.
Sherry Ding is a Senior AI/ML Specialist Solutions Architect. She has extensive experience in machine learning with a PhD degree in Computer Science. She mainly works with Public Sector customers on various AI/ML-related business challenges, helping them accelerate their machine learning journey on the AWS Cloud. When not helping customers, she enjoys outdoor activities.
Mark Roy is a Principal Machine Learning Architect for AWS, helping customers design and build AI/ML solutions. Mark’s work covers a wide range of ML use cases, with a primary interest in computer vision, deep learning, and scaling ML across the enterprise. He has helped companies in many industries, including insurance, financial services, media and entertainment, healthcare, utilities, and manufacturing. Mark holds six AWS Certifications, including the ML Specialty Certification. Prior to joining AWS, Mark was an architect, developer, and technology leader for over 25 years, including 19 years in financial services.

Top AI Image-to-Video Generators 2023

Genmo 

Genmo is an AI-driven video generator that takes text beyond the two dimensions of a page. Algorithms from natural language processing, image recognition, and machine learning are used to turn written information into visual form. It can convert text, pictures, symbols, and emoji into moving images. Background colors, characters, music, and other elements are just some of the ways the videos can be personalized. The video will include the text and any accompanying images that you provide, and finished videos can be shared on many online channels such as YouTube, Facebook, and Twitter. Videos made by Genmo's AI can be used for advertising, instruction, explanation, and more, making it a fantastic resource for companies, groups, and individuals who need to produce engaging videos quickly and cheaply.

D-ID

D-ID is an AI-powered video-making platform that makes producing professional-quality videos from text simple and quick. Using Stable Diffusion and GPT-3, the company's Creative Reality™ Studio can effortlessly create videos in over a hundred languages. D-ID's Live Portrait function makes short videos out of still images, and the Speaking Portrait function gives a voice to written or spoken text. Its API has been refined with the help of tens of thousands of videos, allowing it to generate high-quality visuals. Digiday, SXSW, and TechCrunch have all recognized D-ID for its ability to help users create high-quality videos at a fraction of the cost of traditional approaches.

LeiaPix Converter

The LeiaPix Converter is a free, web-based service that turns regular photographs into 3D Lightfield images. It uses AI to transform your pictures into lifelike, immersive 3D scenes. To use it, upload your picture and select the desired output format; the converted file can be exported as Leia Image Format, Side-by-Side 3D, Depth Map, or Lightfield Animation, with consistently good results. It's a simple way to give your pictures a new feel and create unusual visual compositions. Depending on the size of the image, the conversion can take a while, and the quality of the original photograph affects the final result. Because the LeiaPix Converter is still in beta, it may have bugs or functionality restrictions.

InstaVerse 

instaVerse is a new open-source framework that makes it easy to build your own dynamic 3D environments. Backgrounds can be generated in response to AI prompts, and players can then create avatars to explore them. The first step in making a world in instaVerse is picking a premade template; forests, cities, and even spaceships are among the many options available. After selecting a starter template, an AI assistant guides you through the customization process. A forest with towering trees and a flowing river is just one of the many landscapes instaVerse can create on command. You can also generate characters for your universe, including humans, animals, and robots, and direct their actions with the keyboard or mouse. While instaVerse is still in its early stages, it shows promise as a platform for developing interactive 3D content: it's simple to pick up and lets you build your own universes.

Sketch 

Sketch is a web app for turning drawings into GIF animations. It's a fun and easy way to make unique stickers and illustrations to share on social media or use in other projects. Using Sketch is as simple as uploading your drawing and then using the drawing tools to bring your work to life with animation: objects can be repositioned, recolored, and given custom sound effects. Once you're satisfied, you can save the finished animation as a GIF and share it or use it elsewhere. Sketch suits both beginners and experienced users; it makes it easy to create appealing animations even if you have no prior experience with the medium, while its many tools let you design elaborate and intricate ones. It's also a good way to show off your imagination while learning the basics of animation.

NeROIC 

NeROIC is an AI technique that can reconstruct 3D models from photographs. It has the potential to radically transform how we perceive and interact with three-dimensional objects. Given suitable images, NeROIC can create a 3D model of the intended subject. Its video-to-3D capability is comparable to its image-to-3D capability, meaning a user can create an interactive 3D scene from a single video. As a result, creating 3D scenes is faster and easier than ever.

DPT Depth

Creating 3D models from 2D photographs is a quickly advancing area of computer science. Deep learning-based techniques can train point clouds and 3D meshes to better depict real-world scenes. One promising method, DPT Depth Estimation, uses a deep convolutional network to infer depth from a picture and generate a point cloud model of the 3D object. DPT Depth Estimation feeds monocular photos into a deep convolutional network pre-trained on data from various scenes and objects, and the network then uses this information to create a point cloud from which 3D models can be built. Compared with conventional techniques like stereo matching and photometric stereo, DPT can produce more accurate depth estimates, and its fast inference time makes it a promising candidate for real-time 3D scene reconstruction.

RODIN 

RODIN is quickly becoming a go-to 2D-to-3D generator in artificial intelligence. Thanks to this breakthrough, creating 3D digital avatars is drastically easier and faster than before. Producing a convincing 3D character from a person's likeness has always been difficult; RODIN is an AI-driven technology that can generate convincing 3D avatars from private data such as a client's photograph. Customers are immersed in the result by viewing these generated avatars in 360-degree views.



Meet Embroid: An AI Method for Stitching Together an LLM with Embedding Information from Multiple Smaller Models Allowing to Automatically Correct LLM Predictions without Supervision

Imagine you programmed a language model (LM) to perform basic data analysis on drug and medical histories. You would need labeled data to train your machine learning model, drawn from a wide variety of patient histories. Building a large labeled dataset is difficult: it requires manual labeling by domain experts, which is cost-prohibitive. How can such models be trained without that labeling effort?

Researchers at Stanford University, Anthropic, and the University of Wisconsin-Madison tackle this by having language models learn annotation tasks in context, replacing manual labeling at scale. An LM's in-context learning capability lets the model pick up a task from the description in the prompt. Because language models are sensitive to even small changes in prompt wording and can produce erroneous predictions, the researchers modify the predictions of a prompt rather than the prompt itself.

The researchers' approach is based on the intuition that accurate predictions should also be consistent: similar samples under some feature representation should receive the same prediction. They propose a method called Embroid, which computes multiple representations of a dataset under different embedding functions and uses the consistency between LM predictions for neighboring samples to identify likely mispredictions. Using these neighborhoods, Embroid then creates additional predictions for each sample, which are combined with a simple graphical model to determine the final corrected prediction.
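The exact formulation in the paper is more involved, but the following minimal sketch illustrates the neighborhood-smoothing idea. It assumes you already have binary LM predictions and a list of embedding functions; the function names, the value of k, and the final majority vote (which stands in for the paper's graphical-model combination) are illustrative choices, not the authors' implementation.

# Minimal sketch of Embroid-style neighborhood smoothing (illustrative, not the authors' code).
# Assumes: `texts` is a list of unlabeled examples, `lm_preds` holds the LM's binary
# predictions, and `embed_fns` is a list of embedding functions (for example, wrappers
# around different sentence-embedding models).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def neighborhood_votes(embeddings, lm_preds, k=10):
    """For each sample, return the majority LM prediction among its k nearest neighbors."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(embeddings)
    _, idx = nn.kneighbors(embeddings)        # idx[:, 0] is the sample itself
    neighbor_preds = lm_preds[idx[:, 1:]]     # drop the self-match
    return (neighbor_preds.mean(axis=1) >= 0.5).astype(int)

def embroid_smooth(texts, lm_preds, embed_fns, k=10):
    lm_preds = np.asarray(lm_preds)
    # One auxiliary "vote" per embedding space, plus the original LM prediction
    votes = [neighborhood_votes(np.asarray(fn(texts)), lm_preds, k) for fn in embed_fns]
    votes.append(lm_preds)
    # Simple majority vote stands in for the paper's graphical-model combination
    return (np.stack(votes).mean(axis=0) >= 0.5).astype(int)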

A natural question is how Embroid's performance improvement changes with the size of the dataset. Because Embroid relies on nearest neighbors in different embedding spaces, the researchers expected performance to suffer when the annotated dataset is small. They also compared how performance varies with the domain specificity and the quality of the embedding space, and found that Embroid outperforms the plain LM prompts in both cases.

Embroid also uses statistical techniques developed for weak supervision, where the objective is to generate probabilistic labels for unlabeled data by combining the predictions of multiple noisy sources. Here, Embroid uses embeddings to construct additional synthetic predictions, which are then combined with the original predictions.

The researchers compare Embroid across six LMs and up to 95 different tasks. For each LM, they selected three combinations of in-context demonstrations, generated predictions for each prompt, and applied Embroid independently to each prompt's predictions. They found that this improved performance over the original prompt by an average of 7.3 points per task on GPT-JT and 4.9 points per task on GPT-3.5.

Check out the Paper and Blog for more details on this research.

As author Neel Guha summarized on August 14, 2023: “We’re excited to share Embroid: a method for ‘stitching’ together an LLM with embedding information from multiple smaller models (e.g., BERT), allowing us to automatically correct LLM predictions without supervision.”


How is AI transforming Personal Knowledge Management?

As AI startups strive to transform the way businesses organize and access their knowledge bases, individuals rely on a plethora of tools that lag behind their B2B counterparts.

“What’s your Personal Knowledge Management Tool?” is not a question the general public often hears.

Instead, we answer questions like these:

“What app do you use to keep your notes?”

“What’s your main recommendation for a first camera?”

“How do you keep track of your personal projects?”

These questions are use-case driven, and their answers don’t vary that much. We tend to find the same usual suspects (for notes: Google Keep, Apple Notes, Evernote; for photos: Google Photos, Instagram; for recommendations: your inbox or Whatsapp).

But there’s not one single app/software to rule them all.

And with the advent of AI, there’s probably some room for improvement in the world of Personal Knowledge Management.

What’s Personal Knowledge Management?

Personal Knowledge Management (PKM) is an approach where individuals capture, organize, process, and share their own knowledge. Knowledge here means what a person knows through experience, information, and association. PKM involves all the strategies and tools used to effectively manage and use this vast amount of information in personal and professional contexts. Its goals are to enhance an individual's learning, problem-solving, and decision-making by creating a structured and interconnected repository of insights, notes, reflections, and resources. When we talk about PKM, we usually refer to journaling, note-taking, information tagging, and continuous self-directed learning.

How can AI help you build your own Personal Knowledge Management? 

Beyond Search…

As you collect notes, posts, articles, and messages, machine learning algorithms can help you navigate your knowledge. Several startups are already exploring this segment. Some offer to index everything stored on your physical and digital devices (your laptop and smartphone) to make it easier to find a piece of information. Built for professional use, these apps do an impressive job of collating meeting recaps, emails, or OneNote drafts you've saved locally to help you accomplish a task. You can then share Slack messages, notes, or documents enriched with your own input.

There's one flaw, though: the absence of interaction with your audience. This is where AI comes into play. It promises more effective ways to move beyond the limits of the search experience, opening the door to more:

Availability: AI can make targeted content available around the clock. What if it could do the same with Personal Knowledge Bases? Instead of waiting for people to share their knowledge when they feel like it, experiences, skills, and stories would be searchable at all times.

Connection: Querying a database has become a common action in the world of web products. What if AI could allow us to query Personal Knowledge Base and get instant answers in a user-friendly way?

Personalization: Artificial General Intelligence is booming. Now that Personal Knowledge Bases have the technology to become smart and useful, it's time to put personalization in the hands of individuals themselves.

AI has brought some capabilities that we couldn’t even dream of a few years ago. Why not enter this new realm?

Personal AI to Revolutionize Personal Knowledge Management

Personal AI is a category of AI that provides you with the tools to create, train, and grow your own AI. But instead of relying on trillions of data points, your Personal AI learns from your data: your skills, experiences, and memories. A personal AI is a true extension of your mind online, drawing from your memory bank to give life to a new writing project, exploring new business ideas from your professional background, and connecting you with the people who will unlock your creativity.

Personal AIs intend to provide users with the perfect tools to store, retrieve, and create from personal knowledge bases. AI startups are either focusing on storing and retrieving or creating:

Storing and retrieving: Many AI startups in B2C and B2B facilitate the storage of all the knowledge you have compiled in your notes, emails, and messages. Once stored, that knowledge can be retrieved to save you time when writing emails, recapping notes, or drafting a presentation for your pitch. Why is AI needed here? Because building a workflow that categorizes your knowledge in a logical structure and determines the best way to retrieve it is a hard task that would otherwise require a lot of manual work.

Creating: A handful of Artificial General Intelligence companies, such as OpenAI, Inflection AI, and Midjourney, currently dominate the industry. Whether you need help writing a book, crafting a new diet, or kickstarting a business, ChatGPT and Pi will offer the most relevant generic answers to your personalized questions, drawing from trillions of data points. This is fine for most needs, unless you want answers grounded in the specific data points that pertain to you.

We know many startups are working on solutions to bring the best of both worlds together, which is exciting for the future of Personal AI. This market is as huge as the problem is complicated.

The future of Personal Knowledge Management is already here

The AI-driven revolution in Personal Knowledge Management is under way. By using the capabilities of a personal AI to transform the knowledge management industry, you can store your unique knowledge indefinitely and turn it into a Second Brain that your audience can explore around the clock. That is the best use case for AI: unlocking your own creativity.


Unlocking efficiency: Harnessing the power of Selective Execution in Amazon SageMaker Pipelines

MLOps is a key discipline that often oversees the path to productionizing machine learning (ML) models. It’s natural to focus on a single model that you want to train and deploy. However, in reality, you’ll likely work with dozens or even hundreds of models, and the process may involve multiple complex steps. Therefore, it’s important to have the infrastructure in place to track, train, deploy, and monitor models with varying complexities at scale. This is where MLOps tooling comes in. MLOps tooling helps you repeatably and reliably build and simplify these processes into a workflow that is tailored for ML.
Amazon SageMaker Pipelines, a feature of Amazon SageMaker, is a purpose-built workflow orchestration service for ML that helps you automate end-to-end ML workflows at scale. It simplifies the development and maintenance of ML models by providing a centralized platform to orchestrate tasks such as data preparation, model training, tuning and validation. SageMaker Pipelines can help you streamline workflow management, accelerate experimentation and retrain models more easily.
In this post, we spotlight an exciting new feature of SageMaker Pipelines known as Selective Execution. This new feature empowers you to selectively run specific portions of your ML workflow, resulting in significant time and compute resource savings by limiting the run to pipeline steps in scope and eliminating the need to run steps out of scope. Furthermore, we explore various use cases where the advantages of utilizing Selective Execution become evident, further solidifying its value proposition.
Solution overview
SageMaker Pipelines continues to innovate its developer experience with the release of Selective Execution. ML builders now have the ability to choose specific steps to run within a pipeline, eliminating the need to rerun the entire pipeline. This feature enables you to rerun specific sections of the pipeline while modifying the runtime parameters associated with the selected steps.
It’s important to note that the selected steps may rely on the results of non-selected steps. In such cases, the outputs of these non-selected steps are reused from a reference run of the current pipeline version. This means that the reference run must have already completed. The default reference run is the latest run of the current pipeline version, but you can also choose to use a different run of the current pipeline version as a reference.
The overall state of the reference run must be Succeeded, Failed, or Stopped. It cannot be Running when Selective Execution attempts to use its outputs. When using Selective Execution, you can choose any number of steps to run, as long as they form a contiguous portion of the pipeline.
The following diagram illustrates the pipeline behavior with a full run.

The following diagram illustrates the pipeline behavior using Selective Execution.

In the following sections, we show how to use Selective Execution for various scenarios, including complex workflows in pipeline Direct Acyclic Graphs (DAGs).
Prerequisites
To start experimenting with Selective Execution, we need to first set up the following components of your SageMaker environment:

SageMaker Python SDK – Ensure that you have an updated SageMaker Python SDK installed in your Python environment. You can run the following command from your notebook or terminal to install or upgrade the SageMaker Python SDK to version 2.162.0 or higher: python3 -m pip install "sagemaker>=2.162.0" or pip3 install "sagemaker>=2.162.0" (the quotation marks keep the shell from interpreting >= as a redirection).
Access to SageMaker Studio (optional) – Amazon SageMaker Studio can be helpful for visualizing pipeline runs and interacting with preexisting pipeline ARNs visually. If you don’t have access to SageMaker Studio or are using on-demand notebooks or other IDEs, you can still follow this post and interact with your pipeline ARNs using the Python SDK.

The sample code for a full end-to-end walkthrough is available in the GitHub repo.
Setup
With the sagemaker>=2.162.0 Python SDK, we introduced the SelectiveExecutionConfig class as part of the sagemaker.workflow.selective_execution_config module. The Selective Execution feature relies on a pipeline ARN that has been previously marked as Succeeded, Failed, or Stopped. The following code snippet demonstrates how to import the SelectiveExecutionConfig class, retrieve the reference pipeline ARN, and gather the associated pipeline steps and runtime parameters governing the pipeline run:
import boto3
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.selective_execution_config import SelectiveExecutionConfig

sm_client = boto3.client('sagemaker')
# reference the name of your sample pipeline
pipeline_name = "AbalonePipeline"
# filter for previous successful pipeline execution arns
pipeline_executions = [_exec
    for _exec in Pipeline(name=pipeline_name).list_executions()['PipelineExecutionSummaries']
    if _exec['PipelineExecutionStatus'] == "Succeeded"
]
# get the last successful execution
latest_pipeline_arn = pipeline_executions[0]['PipelineExecutionArn']
print(latest_pipeline_arn)
>>> arn:aws:sagemaker:us-east-1:123123123123:pipeline/AbalonePipeline/execution/x62pbar3gs6h

# list all steps of your sample pipeline
execution_steps = sm_client.list_pipeline_execution_steps(
    PipelineExecutionArn=latest_pipeline_arn
)['PipelineExecutionSteps']
print(execution_steps)
>>>
[{'StepName': 'Abalone-Preprocess',
  'StartTime': datetime.datetime(2023, 6, 27, 4, 41, 30, 519000, tzinfo=tzlocal()),
  'EndTime': datetime.datetime(2023, 6, 27, 4, 41, 30, 986000, tzinfo=tzlocal()),
  'StepStatus': 'Succeeded',
  'AttemptCount': 0,
  'Metadata': {'ProcessingJob': {'Arn': 'arn:aws:sagemaker:us-east-1:123123123123:processing-job/pipelines-fvsmu7m7ki3q-Abalone-Preprocess-d68CecvHLU'}},
  'SelectiveExecutionResult': {'SourcePipelineExecutionArn': 'arn:aws:sagemaker:us-east-1:123123123123:pipeline/AbalonePipeline/execution/ksm2mjwut6oz'}},
 {'StepName': 'Abalone-Train',
  'StartTime': datetime.datetime(2023, 6, 27, 4, 41, 31, 320000, tzinfo=tzlocal()),
  'EndTime': datetime.datetime(2023, 6, 27, 4, 43, 58, 224000, tzinfo=tzlocal()),
  'StepStatus': 'Succeeded',
  'AttemptCount': 0,
  'Metadata': {'TrainingJob': {'Arn': 'arn:aws:sagemaker:us-east-1:123123123123:training-job/pipelines-x62pbar3gs6h-Abalone-Train-PKhAc1Q6lx'}}},
 {'StepName': 'Abalone-Evaluate',
  'StartTime': datetime.datetime(2023, 6, 27, 4, 43, 59, 40000, tzinfo=tzlocal()),
  'EndTime': datetime.datetime(2023, 6, 27, 4, 57, 43, 76000, tzinfo=tzlocal()),
  'StepStatus': 'Succeeded',
  'AttemptCount': 0,
  'Metadata': {'ProcessingJob': {'Arn': 'arn:aws:sagemaker:us-east-1:123123123123:processing-job/pipelines-x62pbar3gs6h-Abalone-Evaluate-vmkZDKDwhk'}}},
 {'StepName': 'Abalone-MSECheck',
  'StartTime': datetime.datetime(2023, 6, 27, 4, 57, 43, 821000, tzinfo=tzlocal()),
  'EndTime': datetime.datetime(2023, 6, 27, 4, 57, 44, 124000, tzinfo=tzlocal()),
  'StepStatus': 'Succeeded',
  'AttemptCount': 0,
  'Metadata': {'Condition': {'Outcome': 'True'}}}]

# list all configurable pipeline parameters
# params can be altered during selective execution
parameters = sm_client.list_pipeline_parameters_for_execution(
    PipelineExecutionArn=latest_pipeline_arn
)['PipelineParameters']
print(parameters)
>>>
[{'Name': 'XGBNumRounds', 'Value': '120'},
 {'Name': 'XGBSubSample', 'Value': '0.9'},
 {'Name': 'XGBGamma', 'Value': '2'},
 {'Name': 'TrainingInstanceCount', 'Value': '1'},
 {'Name': 'XGBMinChildWeight', 'Value': '4'},
 {'Name': 'XGBETA', 'Value': '0.25'},
 {'Name': 'ApprovalStatus', 'Value': 'PendingManualApproval'},
 {'Name': 'ProcessingInstanceCount', 'Value': '1'},
 {'Name': 'ProcessingInstanceType', 'Value': 'ml.t3.medium'},
 {'Name': 'MseThreshold', 'Value': '6'},
 {'Name': 'ModelPath',
  'Value': 's3://sagemaker-us-east-1-123123123123/Abalone/models/'},
 {'Name': 'XGBMaxDepth', 'Value': '12'},
 {'Name': 'TrainingInstanceType', 'Value': 'ml.c5.xlarge'},
 {'Name': 'InputData',
  'Value': 's3://sagemaker-us-east-1-123123123123/sample-dataset/abalone/abalone.csv'}]
Use cases
In this section, we present a few scenarios where Selective Execution can potentially save time and resources. We use a typical pipeline flow, which includes steps such as data extraction, training, evaluation, model registration and deployment, as a reference to demonstrate the advantages of Selective Execution.
SageMaker Pipelines allows you to define runtime parameters for your pipeline run using pipeline parameters. When a new run is triggered, it typically runs the entire pipeline from start to finish. However, if step caching is enabled, SageMaker Pipelines will attempt to find a previous run of the current pipeline step with the same attribute values. If a match is found, SageMaker Pipelines will use the outputs from the previous run instead of recomputing the step. Note that even with step caching enabled, SageMaker Pipelines will still run the entire workflow to the end by default.
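For reference, step caching is opted into per step at pipeline definition time. The following minimal sketch shows the pattern; the step name mirrors the example pipeline above, while the processor variable and script name are placeholders for objects defined elsewhere in your pipeline code:

# Illustrative only: enabling step caching on a SageMaker Pipelines step.
from sagemaker.workflow.steps import CacheConfig, ProcessingStep

cache_config = CacheConfig(enable_caching=True, expire_after="P30D")  # reuse cached results for 30 days

preprocess_step = ProcessingStep(
    name="Abalone-Preprocess",
    processor=sklearn_processor,   # an existing processor object, defined elsewhere
    code="preprocess.py",          # illustrative script name
    cache_config=cache_config,
)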
With the release of the Selective Execution feature, you can now rerun an entire pipeline workflow or selectively run a subset of steps using a prior pipeline ARN. This can be done even without step caching enabled. The following use cases illustrate the various ways you can use Selective Execution.
Use case 1: Run a single step
Data scientists often focus on the training stage of an MLOps pipeline and don't want to worry about the preprocessing or deployment steps. Selective Execution allows data scientists to focus on just the training step and modify training parameters or hyperparameters on the fly to improve the model. This can save time and reduce cost because compute resources are only utilized for running user-selected pipeline steps. See the following code:
# select a reference pipeline arn and subset step to execute
selective_execution_config = SelectiveExecutionConfig(
    source_pipeline_execution_arn="arn:aws:sagemaker:us-east-1:123123123123:pipeline/AbalonePipeline/execution/9e3ljoql7s0n",
    selected_steps=["Abalone-Train"]
)

# start execution of pipeline subset
select_execution = pipeline.start(
    selective_execution_config=selective_execution_config,
    parameters={
        "XGBNumRounds": 120,
        "XGBSubSample": 0.9,
        "XGBGamma": 2,
        "XGBMinChildWeight": 4,
        "XGBETA": 0.25,
        "XGBMaxDepth": 12
    }
)
The following figures illustrate the pipeline with one step in progress and then complete.

Use case 2: Run multiple contiguous pipeline steps
Continuing with the previous use case, a data scientist wants to train a new model and evaluate its performance against a golden test dataset. This evaluation is crucial to ensure that the model meets rigorous guidelines for user acceptance testing (UAT) or production deployment. However, the data scientist doesn’t want to run the entire pipeline workflow or deploy the model. They can use Selective Execution to focus solely on the training and evaluation steps, saving time and resources while still getting the validation results they need:
# select a reference pipeline arn and subset steps to execute
selective_execution_config = SelectiveExecutionConfig(
    source_pipeline_execution_arn="arn:aws:sagemaker:us-east-1:123123123123:pipeline/AbalonePipeline/execution/9e3ljoql7s0n",
    selected_steps=["Abalone-Train", "Abalone-Evaluate"]
)

# start execution of pipeline subset
select_execution = pipeline.start(
    selective_execution_config=selective_execution_config,
    parameters={
        "ProcessingInstanceType": "ml.t3.medium",
        "XGBNumRounds": 120,
        "XGBSubSample": 0.9,
        "XGBGamma": 2,
        "XGBMinChildWeight": 4,
        "XGBETA": 0.25,
        "XGBMaxDepth": 12
    }
)
Use case 3: Update and rerun failed pipeline steps
You can use Selective Execution to rerun failed steps within a pipeline or resume the run of a pipeline from a failed step onwards. This can be useful for troubleshooting and debugging failed steps because it allows developers to focus on the specific issues that need to be addressed. This can lead to more efficient problem-solving and faster iteration times. The following example illustrates how you can choose to rerun just the failed step of a pipeline.

# select a previously failed pipeline arn
selective_execution_config = SelectiveExecutionConfig(
    source_pipeline_execution_arn="arn:aws:sagemaker:us-east-1:123123123123:pipeline/AbalonePipeline/execution/fvsmu7m7ki3q",
    selected_steps=["Abalone-Evaluate"]
)

# start execution of failed pipeline subset
select_execution = pipeline.start(
    selective_execution_config=selective_execution_config
)

Alternatively, a data scientist can resume a pipeline from a failed step to the end of the workflow by specifying the failed step and all the steps that follow it in the SelectiveExecutionConfig.
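A minimal sketch of that pattern, reusing the failed run ARN and the step names from the earlier listing, might look like the following:

# rerun the failed evaluation step and every step after it
selective_execution_config = SelectiveExecutionConfig(
    source_pipeline_execution_arn="arn:aws:sagemaker:us-east-1:123123123123:pipeline/AbalonePipeline/execution/fvsmu7m7ki3q",
    selected_steps=["Abalone-Evaluate", "Abalone-MSECheck"]
)

select_execution = pipeline.start(
    selective_execution_config=selective_execution_config
)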
Use case 4: Pipeline coverage
In some pipelines, certain branches are less frequently run than others. For example, there might be a branch that only runs when a specific condition fails. It’s important to test these branches thoroughly to ensure that they work as expected when a failure does occur. By testing these less frequently run branches, developers can verify that their pipeline is robust and that error-handling mechanisms effectively maintain the desired workflow and produce reliable results.

selective_execution_config = SelectiveExecutionConfig(
    source_pipeline_execution_arn="arn:aws:sagemaker:us-east-1:123123123123:pipeline/AbalonePipeline/execution/9e3ljoql7s0n",
    selected_steps=["Abalone-Train", "Abalone-Evaluate", "Abalone-MSECheck", "Abalone-FailNotify"]
)

Conclusion
In this post, we discussed the Selective Execution feature of SageMaker Pipelines, which empowers you to selectively run specific steps of your ML workflows. This capability leads to significant time and computational resource savings. We provided some sample code in the GitHub repo that demonstrates how to use Selective Execution and presented various scenarios where it can be advantageous for users. If you would like to learn more about Selective Execution, refer to our Developer Guide and API Reference Guide.
To explore the available steps within the SageMaker Pipelines workflow in more detail, refer to Amazon SageMaker Model Building Pipeline and SageMaker Workflows. Additionally, you can find more examples showcasing different use cases and implementation approaches using SageMaker Pipelines in the AWS SageMaker Examples GitHub repository. These resources can further enhance your understanding and help you take advantage of the full potential of SageMaker Pipelines and Selective Execution in your current and future ML projects.

About the Authors
Pranav Murthy is an AI/ML Specialist Solutions Architect at AWS. He focuses on helping customers build, train, deploy and migrate machine learning (ML) workloads to SageMaker. He previously worked in the semiconductor industry developing large computer vision (CV) and natural language processing (NLP) models to improve semiconductor processes. In his free time, he enjoys playing chess and traveling.
Akhil Numarsu is a Sr.Product Manager-Technical focused on helping teams accelerate ML outcomes through efficient tools and services in the cloud. He enjoys playing Table Tennis and is a sports fan.
Nishant Krishnamoorthy is a Sr. Software Development Engineer with Amazon Stores. He holds a master's degree in Computer Science and currently focuses on accelerating ML adoption in different orgs within Amazon by building and operationalizing ML solutions on SageMaker.

Train self-supervised vision transformers on overhead imagery with Amazon SageMaker

This is a guest blog post co-written with Ben Veasey, Jeremy Anderson, Jordan Knight, and June Li from Travelers.
Satellite and aerial images provide insight into a wide range of problems, including precision agriculture, insurance risk assessment, urban development, and disaster response. Training machine learning (ML) models to interpret this data, however, is bottlenecked by costly and time-consuming human annotation efforts. One way to overcome this challenge is through self-supervised learning (SSL). By training on large amounts of unlabeled image data, self-supervised models learn image representations that can be transferred to downstream tasks, such as image classification or segmentation. This approach produces image representations that generalize well to unseen data and reduces the amount of labeled data required to build performant downstream models.
In this post, we demonstrate how to train self-supervised vision transformers on overhead imagery using Amazon SageMaker. Travelers collaborated with the Amazon Machine Learning Solutions Lab (now known as the Generative AI Innovation Center) to develop this framework to support and enhance aerial imagery model use cases. Our solution is based on the DINO algorithm and uses the SageMaker distributed data parallel library (SMDDP) to split the data over multiple GPU instances. When pre-training is complete, the DINO image representations can be transferred to a variety of downstream tasks. This initiative led to improved model performances within the Travelers Data & Analytics space.
Overview of solution
The two-step process for pre-training vision transformers and transferring them to supervised downstream tasks is shown in the following diagram.

In the following sections, we provide a walkthrough of the solution using satellite images from the BigEarthNet-S2 dataset. We build on the code provided in the DINO repository.
Prerequisites
Before getting started, you need access to a SageMaker notebook instance and an Amazon Simple Storage Service (Amazon S3) bucket.
Prepare the BigEarthNet-S2 dataset
BigEarthNet-S2 is a benchmark archive that contains 590,325 multispectral images collected by the Sentinel-2 satellite. The images document the land cover, or physical surface features, of ten European countries between June 2017 and May 2018. The types of land cover in each image, such as pastures or forests, are annotated according to 19 labels. The following are a few example RGB images and their labels.

The first step in our workflow is to prepare the BigEarthNet-S2 dataset for DINO training and evaluation. We start by downloading the dataset from the terminal of our SageMaker notebook instance:

wget https://bigearth.net/downloads/BigEarthNet-S2-v1.0.tar.gz
tar -xvf BigEarthNet-S2-v1.0.tar.gz

The dataset has a size of about 109 GB. Each image is stored in its own folder and contains 12 spectral channels. Three bands with 60m spatial resolution (60-meter pixel height/width) are designed to identify aerosols (B01), water vapor (B09), and clouds (B10). Six bands with 20m spatial resolution are used to identify vegetation (B05, B06, B07, B8A) and distinguish between snow, ice, and clouds (B11, B12). The bands with 10m spatial resolution (B02, B03, B04, B08) capture visible and near-infrared light. Additionally, each folder contains a JSON file with the image metadata. A detailed description of the data is provided in the BigEarthNet Guide.
To perform statistical analyses of the data and load images during DINO training, we process the individual metadata files into a common geopandas Parquet file. This can be done using the BigEarthNet Common and the BigEarthNet GDF Builder helper packages:

python -m bigearthnet_gdf_builder.builder build-recommended-s2-parquet BigEarthNet-v1.0/

The resulting metadata file contains the recommended image set, which excludes 71,042 images that are fully covered by seasonal snow, clouds, and cloud shadows. It also contains information on the acquisition date, location, land cover, and train, validation, and test split for each image.
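As a quick sanity check (illustrative; the column names match those used by the dataset class later in this post), you can load the Parquet file and inspect the splits:

# Quick look at the metadata produced by the builder (column names as used later in this post).
import pandas as pd

metadata = pd.read_parquet("final_ben_s2.parquet")
print(metadata["original_split"].value_counts())   # train / validation / test sizes
print(metadata[["name", "new_labels"]].head())     # patch names and their land cover labels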
We store the BigEarthNet-S2 images and metadata file in an S3 bucket. Because we use true color images during DINO training, we only upload the red (B04), green (B03), and blue (B02) bands:

aws s3 cp final_ben_s2.parquet s3://bigearthnet-s2-dataset/metadata/
aws s3 cp BigEarthNet-v1.0/ s3://bigearthnet-s2-dataset/data_rgb/ \
    --recursive \
    --exclude "*" \
    --include "*_B02.tif" \
    --include "*_B03.tif" \
    --include "*_B04.tif"

The dataset is approximately 48 GB in size and has the following structure:

bigearthnet-s2-dataset/                                    Amazon S3 bucket
├── metadata/
│   └── final_ben_s2.parquet
└── data_rgb/
    └── S2A_MSIL2A_20170613T101031_0_45/
        ├── S2A_MSIL2A_20170613T101031_0_45_B02.tif        Blue channel
        ├── S2A_MSIL2A_20170613T101031_0_45_B03.tif        Green channel
        └── S2A_MSIL2A_20170613T101031_0_45_B04.tif        Red channel

Train DINO models with SageMaker
Now that our dataset has been uploaded to Amazon S3, we move on to training DINO models on BigEarthNet-S2. As shown in the following figure, the DINO algorithm passes different global and local crops of an input image to student and teacher networks. The student network is taught to match the output of the teacher network by minimizing the cross-entropy loss. The student and teacher weights are connected by an exponential moving average (EMA).
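Conceptually, each training iteration combines a cross-entropy-style loss between softened student and teacher outputs with an EMA update of the teacher weights. The following heavily simplified sketch illustrates that flow; it is not the repository's code, and it omits the centering term, the temperature and momentum schedules, and the multi-crop bookkeeping found in the real implementation:

# Simplified DINO-style update for one batch (illustrative only).
import torch
import torch.nn.functional as F

def dino_step(student, teacher, crops, optimizer, momentum=0.996,
              student_temp=0.1, teacher_temp=0.04):
    # Teacher only sees the two global crops; student sees every crop
    with torch.no_grad():
        teacher_out = [F.softmax(teacher(c) / teacher_temp, dim=-1) for c in crops[:2]]
    student_out = [F.log_softmax(student(c) / student_temp, dim=-1) for c in crops]

    # Cross-entropy between every teacher/student pair of different views
    loss, n_terms = 0.0, 0
    for ti, t in enumerate(teacher_out):
        for si, s in enumerate(student_out):
            if si == ti:          # skip identical views
                continue
            loss = loss + torch.sum(-t * s, dim=-1).mean()
            n_terms += 1
    loss = loss / n_terms

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Teacher weights track the student through an exponential moving average
    with torch.no_grad():
        for p_s, p_t in zip(student.parameters(), teacher.parameters()):
            p_t.mul_(momentum).add_((1 - momentum) * p_s.detach())
    return loss.item()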

We make two modifications to the original DINO code. First, we create a custom PyTorch dataset class to load the BigEarthNet-S2 images. The code was initially written to process ImageNet data and expects images to be stored by class. BigEarthNet-S2, however, is a multi-label dataset where each image resides in its own subfolder. Our dataset class loads each image using the file path stored in the metadata:

import os

import numpy as np
import pandas as pd
import rasterio
from PIL import Image
import torch
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms, utils

OPTICAL_MAX_VALUE = 2000

LAND_COVER_LABELS = [
    "Urban fabric",
    "Industrial or commercial units",
    "Arable land",
    "Permanent crops",
    "Pastures",
    "Complex cultivation patterns",
    "Land principally occupied by agriculture, with significant areas of natural vegetation",
    "Agro-forestry areas",
    "Broad-leaved forest",
    "Coniferous forest",
    "Mixed forest",
    "Natural grassland and sparsely vegetated areas",
    "Moors, heathland and sclerophyllous vegetation",
    "Transitional woodland, shrub",
    "Beaches, dunes, sands",
    "Inland wetlands",
    "Coastal wetlands",
    "Inland waters",
    "Marine waters",
]

class BigEarthNetDataset(Dataset):
    """
    PyTorch dataset class that loads the BigEarthNet-S2 images from a metadata file.

    Args:
        metadata_file: path to metadata file
        data_dir: directory where BigEarthNet-S2 data is located
        split: train, validation, or test split
        transform: transformations applied to the input image
    """
    def __init__(self, metadata_file, data_dir, split="train", transform=None):
        # image file paths from metadata
        metadata = pd.read_parquet(metadata_file)
        self.metadata_split = metadata[metadata["original_split"] == split]
        self.data_dir = data_dir
        self.patch_names = self.metadata_split["name"].tolist()

        # one-hot-encode land cover labels
        multiclass_labels = self.metadata_split.new_labels.tolist()
        self.labels = self.get_multi_onehot_labels(multiclass_labels)

        # transforms
        self.transform = transform

    def __len__(self):
        """Return length of dataset."""
        return len(self.metadata_split)

    def __getitem__(self, index):
        """Returns the image and label for a given index."""
        patch_name = self.patch_names[index]
        file_path = os.path.join(self.data_dir, patch_name)

        # generate RGB image
        r_channel = rasterio.open(os.path.join(file_path, patch_name + "_B04.tif")).read(1)
        g_channel = rasterio.open(os.path.join(file_path, patch_name + "_B03.tif")).read(1)
        b_channel = rasterio.open(os.path.join(file_path, patch_name + "_B02.tif")).read(1)

        image = np.stack([r_channel, g_channel, b_channel], axis=2)
        image = image / OPTICAL_MAX_VALUE * 255
        image = np.clip(image, 0, 255).astype(np.uint8)  # clip to the valid 8-bit range

        # apply image transforms
        image = Image.fromarray(image, mode="RGB")
        if self.transform is not None:
            image = self.transform(image)

        # load label
        label = self.labels[index]

        return image, label

    def get_multi_onehot_labels(self, multiclass_labels):
        """Convert BEN-19 labels to one-hot encoded vector."""
        targets = torch.zeros([len(multiclass_labels), len(LAND_COVER_LABELS)])
        for index, img_labels in enumerate(multiclass_labels):
            for label in img_labels:
                index_hot = LAND_COVER_LABELS.index(label)
                targets[index, index_hot] = 1.
        return targets

This dataset class is called in main_dino.py during training. Although the code includes a function to one-hot encode the land cover labels, these labels are not used by the DINO algorithm.
The second change we make to the DINO code is to add support for SMDDP. We add the following code to the init_distributed_mode function in the util.py file:

# json, os, and torch.distributed (imported as dist) are assumed to be available in util.py
def init_distributed_mode(args):
    if json.loads(
        os.environ.get('SM_FRAMEWORK_PARAMS', '{}')
    ).get('sagemaker_distributed_dataparallel_enabled', False):
        # launch training with SMDDP
        dist.init_process_group(backend='smddp')
        args.world_size = dist.get_world_size()
        args.gpu = int(os.environ['LOCAL_RANK'])

With these adjustments, we are ready to train DINO models on BigEarthNet-S2 using SageMaker. To train on multiple GPUs or instances, we create a SageMaker PyTorch Estimator that ingests the DINO training script, the image and metadata file paths, and the training hyperparameters:

import time
from sagemaker.pytorch import PyTorch

# output bucket where final model artifacts are uploaded
DINO_OUTPUT_BUCKET = 'dino-models'

# paths on training instance
sm_metadata_path = '/opt/ml/input/data/metadata'
sm_data_path = '/opt/ml/input/data/train'
sm_output_path = '/opt/ml/output/data'
sm_checkpoint_path = '/opt/ml/checkpoints'

# training job name
dino_base_job_name = f'dino-model-{int(time.time())}'

# create SageMaker Estimator
estimator = PyTorch(
    base_job_name=dino_base_job_name,
    source_dir='path/to/aerial_featurizer',
    entry_point='main_dino.py',
    role=role,
    framework_version="1.12",
    py_version="py38",
    instance_count=1,
    instance_type="ml.p3.16xlarge",
    distribution={'smdistributed': {'dataparallel': {'enabled': True}}},
    volume_size=100,
    sagemaker_session=sagemaker_session,
    hyperparameters={
        # hyperparameters passed to entry point script
        'arch': 'vit_small',
        'patch_size': 16,
        'metadata_dir': sm_metadata_path,
        'data_dir': sm_data_path,
        'output_dir': sm_output_path,
        'checkpoint_dir': sm_checkpoint_path,
        'epochs': 100,
        'saveckp_freq': 20,
    },
    max_run=24*60*60,
    checkpoint_local_path=sm_checkpoint_path,
    checkpoint_s3_uri=f's3://{DINO_OUTPUT_BUCKET}/checkpoints/{dino_base_job_name}',
    debugger_hook_config=False,
)

This code specifies that we will train a small vision transformer model (21 million parameters) with a patch size of 16 for 100 epochs. It is best practice to create a new checkpoint_s3_uri for each training job in order to reduce the initial data download time. Because we are using SMDDP, we must train on an ml.p3.16xlarge, ml.p3dn.24xlarge, or ml.p4d.24xlarge instance. This is because SMDDP is only enabled for the largest multi-GPU instances. To train on smaller instance types without SMDDP, you will need to remove the distribution and debugger_hook_config arguments from the estimator.
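For example, a single-GPU variant of the same Estimator might look like the following (illustrative; the instance type is one possible choice, and everything else mirrors the block above):

# Illustrative single-GPU estimator: same script and hyperparameters, no SMDDP settings
estimator_single_gpu = PyTorch(
    base_job_name=dino_base_job_name,
    source_dir='path/to/aerial_featurizer',
    entry_point='main_dino.py',
    role=role,
    framework_version="1.12",
    py_version="py38",
    instance_count=1,
    instance_type="ml.p3.2xlarge",  # single V100 GPU
    volume_size=100,
    sagemaker_session=sagemaker_session,
    hyperparameters={
        'arch': 'vit_small',
        'patch_size': 16,
        'metadata_dir': sm_metadata_path,
        'data_dir': sm_data_path,
        'output_dir': sm_output_path,
        'checkpoint_dir': sm_checkpoint_path,
        'epochs': 100,
        'saveckp_freq': 20,
    },
    max_run=24*60*60,
    checkpoint_local_path=sm_checkpoint_path,
    checkpoint_s3_uri=f's3://{DINO_OUTPUT_BUCKET}/checkpoints/{dino_base_job_name}',
)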
After we have created the SageMaker PyTorch Estimator, we launch the training job by calling the fit method. We specify the input training data using the Amazon S3 URIs for the BigEarthNet-S2 metadata and images:

# call fit to begin training
estimator.fit(
    inputs={
        'metadata': 's3://bigearthnet-s2-dataset/metadata/',
        'train': 's3://bigearthnet-s2-dataset/data_rgb/',
    },
    wait=False
)

SageMaker spins up the instance, copies the training script and dependencies, and begins DINO training. We can monitor the progress of the training job from our Jupyter notebook using the following commands:

# monitor training
training_job_name = estimator.latest_training_job.name
attached_estimator = PyTorch.attach(training_job_name)
attached_estimator.logs()

We can also monitor instance metrics and view log files on the SageMaker console under Training jobs. In the following figures, we plot the GPU utilization and loss function for a DINO model trained on an ml.p3.16xlarge instance with a batch size of 128.

During training, the GPU utilization is 83% of the ml.p3.16xlarge capacity (8 NVIDIA Tesla V100 GPUs) and the VRAM usage is 85%. The loss function steadily decreases with each epoch, indicating that the outputs of the student and teacher networks are becoming more similar. In total, training takes about 11 hours.
Transfer learning to downstream tasks
Our trained DINO model can be transferred to downstream tasks like image classification or segmentation. In this section, we use the pre-trained DINO features to predict the land cover classes for images in the BigEarthNet-S2 dataset. As depicted in the following diagram, we train a multi-label linear classifier on top of frozen DINO features. In this example, the input image is associated with arable land and pasture land covers.

Most of the code for the linear classifier is already in place in the original DINO repository. We make a few adjustments for our specific task. As before, we use the custom BigEarthNet dataset to load images during training and evaluation. The labels for the images are one-hot encoded as 19-dimensional binary vectors. We use binary cross-entropy as the loss function and compute the average precision to evaluate the performance of the model.
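Conceptually, the classifier is just a linear layer trained with binary cross-entropy on top of frozen DINO features. The following minimal sketch illustrates the idea (it is not eval_linear.py; the feature dimension, optimizer settings, and helper names are illustrative):

# Illustrative multi-label linear probe on frozen DINO features (not the actual eval_linear.py).
import torch
import torch.nn as nn
from sklearn.metrics import average_precision_score

NUM_LABELS = 19   # BigEarthNet-19 land cover classes
EMBED_DIM = 384   # ViT-S/16 feature dimension

classifier = nn.Linear(EMBED_DIM, NUM_LABELS)
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(classifier.parameters(), lr=0.001, momentum=0.9)

def train_batch(dino_backbone, images, labels):
    with torch.no_grad():            # the DINO backbone stays frozen
        features = dino_backbone(images)
    logits = classifier(features)
    loss = criterion(logits, labels.float())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def evaluate_batch(dino_backbone, images, labels):
    with torch.no_grad():
        scores = torch.sigmoid(classifier(dino_backbone(images)))
    return average_precision_score(labels.numpy(), scores.numpy(), average="micro")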
To train the classifier, we create a SageMaker PyTorch Estimator that runs the training script, eval_linear.py. The training hyperparameters include the details of the DINO model architecture and the file path for the model checkpoint:

# output bucket where final model artifacts are uploaded
CLASSIFIER_OUTPUT_BUCKET = 'land-cover-classification'

# DINO checkpoint name
checkpoint = 'checkpoint.pth'

# paths on training instance
sm_dino_path = '/opt/ml/input/data/dino_checkpoint'
sm_dino_checkpoint = f'{sm_dino_path}/{checkpoint}'

# training job name
classifier_base_job_name = f'linear-classifier-{int(time.time())}'

# create Estimator
estimator = PyTorch(
    base_job_name=classifier_base_job_name,
    source_dir='path/to/aerial_featurizer',
    entry_point='eval_linear.py',
    role=role,
    framework_version='1.12',
    py_version='py38',
    instance_count=1,
    instance_type='ml.p3.2xlarge',
    sagemaker_session=sagemaker_session,
    hyperparameters={
        # hyperparameters passed to entry point script
        'arch': 'vit_small',
        'pretrained_weights': sm_dino_checkpoint,
        'epochs': 50,
        'data_dir': sm_data_path,
        'metadata_dir': sm_metadata_path,
        'output_dir': sm_checkpoint_path,
        'num_labels': 19,
    },
    max_run=1*60*60,
    checkpoint_local_path=sm_checkpoint_path,
    checkpoint_s3_uri=f's3://{CLASSIFIER_OUTPUT_BUCKET}/checkpoints/{classifier_base_job_name}',
)

We start the training job using the fit method, supplying the Amazon S3 locations of the BigEarthNet-S2 metadata and training images and the DINO model checkpoint:

# call fit to begin training
estimator.fit(
    inputs={
        'metadata': 's3://bigearthnet-s2-dataset/metadata/',
        'dataset': 's3://bigearthnet-s2-dataset/data_rgb/',
        'dino_checkpoint': f's3://bigearthnet-s2-dataset/dino-models/checkpoints/{dino_base_job_name}',
    },
    wait=False
)

When training is complete, we can perform inference on the BigEarthNet-S2 test set using SageMaker batch transform or SageMaker Processing. In the following table, we compare the average precision of the linear model on test set images using two different DINO image representations. The first model, ViT-S/16 (ImageNet), is the small vision transformer checkpoint included in the DINO repository that was pre-trained using front-facing images in the ImageNet dataset. The second model, ViT-S/16 (BigEarthNet-S2), is the model we produced by pre-training on overhead imagery.

Model                        Average precision
ViT-S/16 (ImageNet)          0.685
ViT-S/16 (BigEarthNet-S2)    0.732

We find that the DINO model pre-trained on BigEarthNet-S2 transfers better to the land cover classification task than the DINO model pre-trained on ImageNet, resulting in a 6.7% increase in the average precision.
Clean up
After completing DINO training and transfer learning, we can clean up our resources to avoid incurring charges. We stop or delete our notebook instance and remove any unwanted data or model artifacts from Amazon S3.
Conclusion
This post demonstrated how to train DINO models on overhead imagery using SageMaker. We used SageMaker PyTorch Estimators and SMDDP in order to generate representations of BigEarthNet-S2 images without the need for explicit labels. We then transferred the DINO features to a downstream image classification task, which involved predicting the land cover class of BigEarthNet-S2 images. For this task, pre-training on satellite imagery yielded a 6.7% increase in average precision relative to pre-training on ImageNet.
You can use this solution as a template for training DINO models on large-scale, unlabeled aerial and satellite imagery datasets. To learn more about DINO and building models on SageMaker, check out the following resources:

Emerging Properties in Self-Supervised Vision Transformers
Use PyTorch with Amazon SageMaker
SageMaker’s Data Parallelism Library

About the Authors
Ben Veasey is a Senior Associate Data Scientist at Travelers, working within the AI & Automation Accelerator team. With a deep understanding of innovative AI technologies, including computer vision, natural language processing, and generative AI, Ben is dedicated to accelerating the adoption of these technologies to optimize business processes and drive efficiency at Travelers.
Jeremy Anderson is a Director & Data Scientist at Travelers on the AI & Automation Accelerator team. He is interested in solving business problems with the latest AI and deep learning techniques including large language models, foundational imagery models, and generative AI. Prior to Travelers, Jeremy earned a PhD in Molecular Biophysics from the Johns Hopkins University and also studied evolutionary biochemistry. Outside of work you can find him running, woodworking, or rewilding his yard.
Jordan Knight is a Senior Data Scientist working for Travelers in the Business Insurance Analytics & Research Department. His passion is for solving challenging real-world computer vision problems and exploring new state-of-the-art methods to do so. He has a particular interest in the social impact of ML models and how we can continue to improve modeling processes to develop ML solutions that are equitable for all. Jordan graduated from MIT with a Master’s in Business Analytics. In his free time you can find him either rock climbing, hiking, or continuing to develop his somewhat rudimentary cooking skills.
June Li is a data scientist at Travelers’s Business Insurance’s Artificial Intelligence team, where she leads and coordinates work in the AI imagery portfolio. She is passionate about implementing innovative AI solutions that bring substantial value to the business partners and stakeholders. Her work has been integral in transforming complex business challenges into opportunities by leveraging cutting-edge AI technologies.
Sourav Bhabesh is a Senior Applied Scientist at the AWS Titan Labs, where he builds Foundational Model (FM) capabilities and features. His specialty is Natural Language Processing (NLP) and is passionate about deep learning. Outside of work he enjoys reading books and traveling.
Laura Kulowski is an Applied Scientist at Amazon’s Generative AI Innovation Center, where she works closely with customers to build generative AI solutions. In her free time, Laura enjoys exploring new places by bike.
Andrew Ang is a Sr. Machine Learning Engineer at AWS. In addition to helping customers build AI/ML solutions, he enjoys water sports, squash and watching travel & food vlogs.
Mehdi Noori is an Applied Science Manager at the Generative AI Innovation Center. With a passion for bridging technology and innovation, he assists AWS customers in unlocking the potential of generative AI, turning potential challenges into opportunities for rapid experimentation and innovation by focusing on scalable, measurable, and impactful uses of advanced AI technologies, and streamlining the path to production.

How Thomson Reuters developed Open Arena, an enterprise-grade large language model playground, in under 6 weeks

This post is cowritten by Shirsha Ray Chaudhuri, Harpreet Singh Baath, Rashmi B Pawar, and Palvika Bansal from Thomson Reuters.
Thomson Reuters (TR), a global content and technology-driven company, has been using artificial intelligence (AI) and machine learning (ML) in its professional information products for decades. Thomson Reuters Labs, the company’s dedicated innovation team, has been integral to its pioneering work in AI and natural language processing (NLP). A key milestone was the launch of Westlaw Is Natural (WIN) in 1992. This technology was one of the first of its kind, using NLP for more efficient and natural legal research. Fast forward to 2023, and Thomson Reuters continues to define the future of professionals through rapid innovation, creative solutions, and powerful technology.
The introduction of generative AI provides another opportunity for Thomson Reuters to work with customers and once again advance how they do their work, helping professionals draw insights and automate workflows, enabling them to focus their time where it matters most. While Thomson Reuters pushes the boundaries of what generative AI and other technologies could do for the modern professional, how is it using the power of this technology for its own teams?
Thomson Reuters is highly focused on driving awareness and understanding of AI among colleagues in every team and every business area. Starting from foundational principles such as what AI is and how ML works, it delivers a rolling program of company-wide AI awareness sessions, including webinars, training materials, and panel discussions. During these sessions, ideas started to surface as colleagues considered how they could use AI for their day-to-day tasks as well as to serve their customers.
In this post, we discuss how Thomson Reuters Labs created Open Arena, Thomson Reuters’s enterprise-wide large language model (LLM) playground that was developed in collaboration with AWS. The original concept came out of an AI/ML Hackathon supported by Simone Zucchet (AWS Solutions Architect) and Tim Precious (AWS Account Manager) and was developed into production using AWS services in under 6 weeks with support from AWS. AWS-managed services such as AWS Lambda, Amazon DynamoDB, and Amazon SageMaker, as well as the pre-built Hugging Face Deep Learning Containers (DLCs), contributed to the pace of innovation. Open Arena has helped unlock company-wide experimentation with generative AI in a safe and controlled environment.
Diving deeper, Open Arena is a web-based playground that allows users to experiment with a growing set of tools enabled with LLMs. This provides non-programmatic access for Thomson Reuters employees who don’t have a background in coding but want to explore the art of the possible with generative AI at TR. Open Arena has been developed to get quick answers from several sets of corpora, such as for customer support agents, solutions to get quick answers from websites, solutions to summarize and verify points in a document, and much more. The capabilities of Open Arena continue to grow as the experiences from employees across Thomson Reuters spur new ideas and as new trends emerge in the field of generative AI. This is all facilitated by the modular serverless AWS architecture that underpins the solution.
Envisioning the Open Arena
Thomson Reuters’s objective was clear: to build a safe, secure, user-friendly platform—an “open arena”—as an enterprise-wide playground. Here, internal teams could not only explore and test the various LLMs developed in-house and those from the open-source community such as with the AWS and Hugging Face partnership, but also discover unique use cases by merging the capabilities of LLMs with Thomson Reuters’s extensive company data. This kind of platform would enhance the ability of teams to generate innovative solutions, improving the products and services that Thomson Reuters could offer its clients.
The envisioned Open Arena platform would serve the diverse teams within Thomson Reuters globally, providing them with a playground to freely interact with LLMs. The ability to have this interaction in a controlled environment would allow teams to uncover new applications and methodologies that might not have been apparent in a less direct engagement with these complex models.
Building the Open Arena
Building the Open Arena was a multi-faceted process. We aimed to harness the capabilities of AWS’s serverless and ML services to craft a solution that would seamlessly enable Thomson Reuters employees to experiment with the latest LLMs. We saw the potential of these services not only to provide scalability and manageability but also to ensure cost-effectiveness.
Solution overview
From creating a robust environment for model deployment and fine-tuning to ensuring meticulous data management and providing a seamless user experience, TR needed each aspect to integrate with several AWS services. Open Arena’s architecture was designed to be comprehensive yet intuitive, balancing complexity with ease of use. The following diagram illustrates this architecture.

SageMaker served as the backbone, facilitating model deployment as SageMaker endpoints and providing a robust environment for fine-tuning the models. We capitalized on the Hugging Face on SageMaker DLC offered by AWS to enhance our deployment process. In addition, we used the SageMaker Hugging Face Inference Toolkit and the Accelerate library to accelerate the inference process and effectively handle the demands of running complex and resource-intensive models. These comprehensive tools were instrumental in ensuring the fast and seamless deployment of our LLMs. Lambda functions, triggered by Amazon API Gateway, managed the APIs, ensuring meticulous preprocessing and postprocessing of the data.
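A stripped-down sketch of such a handler is shown below. It is illustrative only: the endpoint name and payload schema are placeholders rather than TR's actual implementation, with the request format matching what the Hugging Face inference containers typically expect.

# Illustrative Lambda handler: light preprocessing, endpoint call, light postprocessing.
import json
import boto3

runtime = boto3.client("sagemaker-runtime")
ENDPOINT_NAME = "open-arena-flan-t5-xl"   # placeholder endpoint name

def lambda_handler(event, context):
    body = json.loads(event.get("body", "{}"))
    prompt = body.get("prompt", "").strip()   # minimal preprocessing

    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=json.dumps({"inputs": prompt, "parameters": {"max_new_tokens": 256}}),
    )
    prediction = json.loads(response["Body"].read())

    return {
        "statusCode": 200,
        "body": json.dumps({"answer": prediction}),   # minimal postprocessing
    }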
In our quest to deliver a seamless user experience, we adopted a secure API Gateway to connect the front end hosted in Amazon Simple Storage Service (Amazon S3) to the Lambda backend. We deployed the front end as a static site on an S3 bucket, ensuring user authentication with the help of Amazon CloudFront and our company’s single sign-on mechanism.
Open Arena has been designed to integrate seamlessly with multiple LLMs through REST APIs. This ensured that the platform was flexible enough to react and integrate quickly as new state-of-the-art models were developed and released in the fast-paced generative AI space. From its inception, Open Arena was architected to provide a safe and secure enterprise AI/ML playground, so Thomson Reuters employees can experiment with state-of-the-art LLMs as soon as they are released. Using Hugging Face models on SageMaker allowed the team to fine-tune models in a secure environment, because all data is encrypted and doesn't leave the virtual private cloud (VPC), ensuring that data remains private and confidential.
DynamoDB, our chosen NoSQL database service, efficiently stored and managed a wide variety of data, including user queries, responses, response times, and user data. To streamline the development and deployment process, we employed AWS CodeBuild and AWS CodePipeline for continuous integration and continuous delivery (CI/CD). Monitoring the infrastructure and ensuring its optimal functioning was made possible with Amazon CloudWatch, which provided custom dashboards and comprehensive logging capabilities.
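To make the request flow concrete, here is a minimal sketch (not TR's actual handler) of a Lambda function behind API Gateway that preprocesses a request, invokes a SageMaker endpoint, postprocesses the output, and records the query, response, and response time in DynamoDB. The endpoint name, table name, and payload fields are hypothetical.

```python
# Minimal sketch of the API Gateway -> Lambda -> SageMaker -> DynamoDB pattern described above.
# Endpoint, table, and field names are hypothetical placeholders.
import json
import time
import uuid

import boto3

sagemaker_runtime = boto3.client("sagemaker-runtime")
table = boto3.resource("dynamodb").Table("open-arena-interactions")  # hypothetical table

ENDPOINT_NAME = "open-arena-flan-t5-xl"  # hypothetical endpoint name


def lambda_handler(event, context):
    body = json.loads(event.get("body") or "{}")
    query = body.get("query", "").strip()          # preprocessing: basic validation
    if not query:
        return {"statusCode": 400, "body": json.dumps({"error": "empty query"})}

    start = time.time()
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=json.dumps({"inputs": query}),
    )
    prediction = json.loads(response["Body"].read())
    answer = prediction[0].get("generated_text", "")  # postprocessing: unwrap the toolkit output

    table.put_item(Item={                              # store query, response, and latency
        "id": str(uuid.uuid4()),
        "user": body.get("user", "unknown"),
        "query": query,
        "response": answer,
        "response_time_ms": int((time.time() - start) * 1000),
    })

    return {"statusCode": 200, "body": json.dumps({"response": answer})}
```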
Model development and integration
The heart of Open Arena is its diverse assortment of LLMs, which comprise both open-source and in-house developed models. These models have been fine-tuned to provide responses following specific user prompts.
We have experimented with different LLMs for different use cases in Open Arena, including Flan-T5-XL, Open Assistant, MPT, Falcon, and Flan-T5-XL fine-tuned on available open-source datasets using parameter-efficient fine-tuning (PEFT) techniques. We used the bitsandbytes integration from Hugging Face to experiment with various quantization techniques, which allowed us to optimize our LLMs for better performance and efficiency (a minimal sketch of this setup appears after the list below). When selecting a model to serve as the backend for these use cases, we considered how the models perform on the NLP tasks most relevant to Thomson Reuters. Furthermore, we needed to consider engineering aspects, such as the following:

Increased efficiency when building applications with LLMs – Quickly integrating and deploying state-of-the-art LLMs into our applications and workloads that run on AWS, using familiar controls and integrations with the depth and breadth of AWS
Secure customization – Ensuring that all data used to fine-tune LLMs remains encrypted and does not leave the VPC
Flexibility – The ability to choose from a wide selection of AWS native and open-source LLMs to find the right model for our varied use cases

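As referenced above, the following is a minimal sketch of the quantized, parameter-efficient fine-tuning setup: loading Flan-T5-XL with 8-bit bitsandbytes quantization and attaching a LoRA adapter via the PEFT library. The hyperparameters are illustrative, not the values used for Open Arena.

```python
# Minimal sketch: 8-bit loading with bitsandbytes plus a LoRA adapter from PEFT.
# Hyperparameters and module names are illustrative for a T5-style model.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "google/flan-t5-xl"
tokenizer = AutoTokenizer.from_pretrained(model_id)

model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # bitsandbytes quantization
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)  # freeze base weights, prepare for k-bit training

lora_config = LoraConfig(
    r=16,                      # rank of the low-rank update matrices
    lora_alpha=32,
    target_modules=["q", "v"], # attention projections targeted in T5-style models
    lora_dropout=0.05,
    task_type="SEQ_2_SEQ_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights are trained
```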
We've been asking questions such as: Is the higher cost of larger models justified by significant performance gains? Can these models handle long documents?
The following diagram illustrates our model architecture.

We have been evaluating these models along the preceding dimensions, using open-source legal datasets and Thomson Reuters internal datasets to assess them for specific use cases.
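As one hedged example of how such an evaluation can be scripted, the sketch below scores candidate model outputs against reference texts with the Hugging Face evaluate library. The metric choice and the placeholder examples are assumptions, not TR's actual evaluation protocol or data.

```python
# Minimal sketch: score model outputs against references with a standard metric.
# The examples are placeholders; a real run would iterate over legal or internal datasets.
import evaluate

rouge = evaluate.load("rouge")

references = [
    "The court granted the motion to dismiss for lack of jurisdiction.",
]
predictions = [
    "The motion to dismiss was granted because the court lacked jurisdiction.",
]

scores = rouge.compute(predictions=predictions, references=references)
print(scores)  # e.g. {'rouge1': ..., 'rouge2': ..., 'rougeL': ..., 'rougeLsum': ...}
```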
For content-based use cases (experiences that call for answers from a specific corpus), we have a retrieval augmented generation (RAG) pipeline in place that fetches the content most relevant to the query. In such pipelines, documents are split into chunks, and then embeddings are created and stored in OpenSearch. To find the best-matching documents or chunks, we use a retriever/re-ranker approach based on bi-encoder and cross-encoder models. The retrieved best matches are then passed as input to the LLM along with the query to generate the response.
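The following is a minimal sketch of this retriever/re-ranker pattern, using sentence-transformers bi-encoder and cross-encoder models with an OpenSearch k-NN index. The model choices, index name, host, and field names are illustrative assumptions (the index is assumed to already have a knn_vector mapping on the embedding field); this is not the production pipeline.

```python
# Minimal sketch of chunk indexing, bi-encoder retrieval, and cross-encoder re-ranking.
# Assumes an OpenSearch index with a knn_vector mapping on the "embedding" field.
from opensearchpy import OpenSearch
from sentence_transformers import SentenceTransformer, CrossEncoder

bi_encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])  # placeholder endpoint

def index_chunks(chunks, index="open-arena-docs"):
    """Embed document chunks and store them in the k-NN enabled index."""
    for i, chunk in enumerate(chunks):
        client.index(index=index, id=i, body={
            "text": chunk,
            "embedding": bi_encoder.encode(chunk).tolist(),
        })

def retrieve(query, index="open-arena-docs", k=20, top_n=3):
    """Bi-encoder k-NN retrieval followed by cross-encoder re-ranking."""
    hits = client.search(index=index, body={
        "size": k,
        "query": {"knn": {"embedding": {"vector": bi_encoder.encode(query).tolist(), "k": k}}},
    })["hits"]["hits"]
    candidates = [h["_source"]["text"] for h in hits]
    scores = cross_encoder.predict([(query, c) for c in candidates])
    ranked = [c for _, c in sorted(zip(scores, candidates), reverse=True)]
    return ranked[:top_n]

def build_prompt(query, index="open-arena-docs"):
    """Concatenate the best-matching chunks with the query for the LLM."""
    context = "\n\n".join(retrieve(query, index))
    return f"Answer the question using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
```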
The integration of Thomson Reuters's internal content with the LLM experience has been instrumental in enabling users to extract more relevant and insightful results from these models. More importantly, it sparked ideas across teams about the possibilities of adopting AI-enabled solutions in their business workflows.
Open Arena tiles: Facilitating user interaction
Open Arena adopts a user-friendly interface, designed with pre-set enabling tiles for each experience, as shown in the following screenshot. These tiles serve as pre-set interactions that cater to the specific requirements of the users.

For instance, the Experiment with Open Source LLM tile opens a chat-like interaction channel with open-source LLMs.

The Ask your Document tile allows users to upload documents and ask specific questions related to the content from the LLMs. The Experiment with Summarization tile enables users to distil large volumes of text into concise summaries, as shown in the following screenshot.

These tiles simplify the user consumption of AI-enabled work solutions and the navigation process within the platform, igniting creativity and fostering the discovery of innovative use cases.
The impact of the Open Arena
The launch of the Open Arena marked a significant milestone in Thomson Reuters’s journey towards fostering a culture of innovation and collaboration. The platform’s success was undeniable, with its benefits becoming rapidly evident across the company.
The Open Arena’s intuitive, chat-based design required no significant technical knowledge, making it accessible to different teams and different job roles across the globe. This ease of use boosted engagement levels, encouraging more users to explore the platform and unveiling innovative use cases.
In under a month, the Open Arena catered to over 1,000 monthly internal users from TR’s global footprint, averaging an interaction time of 5 minutes per user. With a goal to foster internal TR LLM experimentation and crowdsource creation of LLM use cases, Open Arena’s launch led to an influx of new use cases, effectively harnessing the power of LLMs combined with Thomson Reuters’s vast data resources.
Here’s what some of our users had to say about the Open Arena:

“Open Arena gives employees from all parts of the company a chance to experiment with LLMs in a practical, hands-on way. It’s one thing to read about AI tools, and another to use them yourself. This platform turbo-charges our AI learning efforts across Thomson Reuters.”
– Abby Pinto, Talent Development Solutions Lead, People Function
“OA (Open Arena) has enabled me to experiment with tricky news translation problems for the German Language Service of Reuters that conventional translation software can’t handle, and to do so in a safe environment where I can use our actual stories without fear of data leaks. The team behind OA has been incredibly responsive to suggestions for new features, which is the sort of service you can only dream of with other software.”
– Scot W. Stevenson, Senior Breaking News Correspondent for the German Language Service, Berlin, Germany
“When I used Open Arena, I got the idea to build a similar interface for our teams of customer support agents. This playground helped us reimagine the possibilities with GenAI.”
– Marcel Batista, Gerente de Servicos, Operations Customer Service & Support
“Open Arena powered by AWS serverless services, Amazon SageMaker, and Hugging Face helped us to quickly expose cutting-edge LLMs and generative AI tooling to our colleagues, which helped drive enterprise-wide innovation.”
– Shirsha Ray Chaudhuri, Director, Research Engineering, Thomson Reuters Labs

On a broader scale, the introduction of the Open Arena had a profound impact on the company. It not only increased AI awareness among employees but also stimulated a spirit of innovation and collaboration. The platform brought teams together to explore, experiment, and generate ideas, fostering an environment where groundbreaking concepts could be turned into reality.
Furthermore, the Open Arena has had a positive influence on Thomson Reuters AI services and products. The platform has served as a sandbox for AI, allowing teams to identify and refine AI applications before incorporating them into our offerings. Consequently, this has accelerated the development and enhancement of Thomson Reuters AI services, providing customers with solutions that are ever evolving and at the forefront of technological advancement.
Conclusion
In the fast-paced world of AI, it is crucial to continue advancing, and Thomson Reuters is committed to doing just that. The team behind the Open Arena is constantly working to add more features and enhance the platform's capabilities, using AWS services like Amazon Bedrock and Amazon SageMaker JumpStart, ensuring that it remains a valuable resource for our teams. As we move forward, we aim to keep pace with the rapidly evolving landscape of generative AI and LLMs, and AWS provides the services TR needs to do so.
In addition to the ongoing development of the Open Arena platform, we are actively working on productionizing the multitude of use cases generated by the platform. This will allow us to provide our customers with even more advanced and efficient AI solutions, tailored to their specific needs. Furthermore, we will continue to foster a culture of innovation and collaboration, enabling our teams to explore new ideas and applications for AI technology.
As we embark on this exciting journey, we are confident that the Open Arena will play a pivotal role in driving innovation and collaboration across Thomson Reuters. By staying at the forefront of AI advancements, we will ensure that our products and services continue to evolve and meet the ever-changing demands of our customers.

About the Authors
Shirsha Ray Chaudhuri (Director, Research Engineering) heads the ML Engineering team in Bangalore for Thomson Reuters Labs, where she leads the development and deployment of well-architected solutions in AWS and other cloud platforms for ML projects that drive efficiency and value for AI-driven features in Thomson Reuters products, platforms, and business systems. She works with communities on AI for good and societal impact projects, and in the tech for D&I space. She loves to network with people who are using AI and modern tech to build a better world that is more inclusive, more digital, and, together, a better tomorrow.
Harpreet Singh Baath is a Senior Cloud and DevOps Engineer at Thomson Reuters Labs, where he helps research engineers and scientists develop machine learning solutions on cloud platforms. With over 6 years of experience, Harpreet’s expertise spans across cloud architectures, automation, containerization, enabling DevOps practices, and cost optimization. He is passionate about efficiency and cost-effectiveness, ensuring that cloud resources are utilized optimally.
Rashmi B Pawar is a Machine Learning Engineer at Thomson Reuters. She possesses considerable experience in productionizing models, establishing inference, and creating training pipelines tailored for various machine learning applications. Furthermore, she has significant expertise in incorporating machine learning workflows into existing systems and products.
Palvika Bansal is an Associate Applied Research Scientist at Thomson Reuters. She has worked on projects across diverse sectors to solve business problems for customers using AI/ML. She is highly passionate about her work and enthusiastic about taking on new challenges. Outside of work, she enjoys traveling, cooking, and reading.
Simone Zucchet is a Senior Solutions Architect at AWS. With close to a decade’s experience as a Cloud Architect, Simone enjoys working on innovative projects that help transform the way organizations approach business problems. He helps support large enterprise customers at AWS and is part of the Machine Learning TFC. Outside of his professional life, he enjoys working on cars and photography.
Heiko Hotz is a Senior Solutions Architect for AI & Machine Learning with a special focus on natural language processing, large language models, and generative AI. Prior to this role, he was the Head of Data Science for Amazon’s EU Customer Service. Heiko helps our customers be successful in their AI/ML journey on AWS and has worked with organizations in many industries, including insurance, financial services, media and entertainment, healthcare, utilities, and manufacturing. In his spare time, Heiko travels as much as possible.
João Moura is an AI/ML Specialist Solutions Architect at AWS, based in Spain. He helps customers with deep learning model training and inference optimization, and more broadly building large-scale ML platforms on AWS. He is also an active proponent of ML-specialized hardware and low-code ML solutions.
Georgios Schinas is a Specialist Solutions Architect for AI/ML in the EMEA region. He is based in London and works closely with customers in the UK and Ireland. Georgios helps customers design and deploy machine learning applications in production on AWS, with a particular interest in MLOps practices and enabling customers to perform machine learning at scale. In his spare time, he enjoys traveling, cooking, and spending time with friends and family.

Best AI Tools For Music Production 2023

From the invention of new music to the design of album (or magazine) covers, AI has already begun to have a profound impact on how artists' works are developed and promoted. New AI production tools can greatly assist solo musicians, opening up new avenues of exploration and cutting down on production time. AI music technologies can generate new music by analyzing large bodies of existing work and recognizing the patterns of track compositions through multiple neural networks. When given enough data, these technologies can learn to emulate individual practices in music and assist creators in producing original works.

Here are some of the best AI-based tools for music production:

MAGENTA STUDIO (V1.0)

Google has released a free AI music creator called Magenta Studio. It's a collection of music production utilities that works on Windows and Mac computers and as a plugin for Ableton Live. With this toolkit, you can use Magenta's neural networks to turn a simple melody or riff into a full-fledged instrumental piece in whatever genre you like. The website's layout is aesthetically pleasing and straightforward compared to competing designs. After downloading, you'll choose from five programs: Continue extends an existing MIDI file with new material; Drumify generates drum fills from a melody or bass line; Generate works like a random-number generator for music, thanks to its training on millions of songs; Groove examines a drum recording and adjusts its rhythm, creating a more natural, 'human' sound; and Interpolate creates a transition that joins two MIDI melody tracks together.

WavTool 

WavTool is an AI-powered program that gives music producers a free, web-based environment to work in. The tool aims to provide high-quality music production with features including side-chain compression, sophisticated synthesis, and adaptable signal routing. WavTool also has Conductor, a function that offers plain-English instructions for novices and provides assistance in the form of chord suggestions, beat creation, and melody generation. Because the AI can grasp concepts and offer suggestions, creating music becomes far easier. WavTool evolves alongside its users, adding capabilities like plugin modification and signal routing to provide a full suite of tools for creating music. It is a comprehensive program that allows its users to record, create, produce, mix, master, and export audio without downloading, installing, or updating any additional software. It provides a straightforward interface, requires no setup or waiting, and includes everything needed to start making music immediately. Everyone has free, unlimited access to the tool, and there are social media options for getting help and sharing ideas.

BOOMY

Boomy is an artificial intelligence-powered generative music platform where users can make their own songs and share them via streaming services. The service empowers users by giving them a platform to gain knowledge and teach others, as well as the opportunity to profit from their musical compositions once they have been shared with the world. The service uses complex AI algorithms to produce and modify music in various formats. Preset genres include Electronic Dance, Rap Beats, Lo-Fi, and Global Grove for users to select from. The digital audio workstation can adapt to the user's tastes and provide unique sounds. Seamless integration with a popular streaming platform means that even amateur musicians can gain exposure to a global audience and perhaps monetize their work using the distribution option. Because Boomy's AI is trained with a 'bottom-up' method to produce organic compositions from scratch rather than being taught copyrighted music, it largely avoids the key legal hurdle.

AIVA 

AIVA is an artificial intelligence–driven music composer that can create custom scores from scratch. It's made for people who like to think outside the box, whether they are seasoned musicians or newcomers in the video game industry. By harnessing the power of AI-generated music, AIVA shortens the time it takes songwriters to develop engaging themes for their projects. The program offers songwriters a wide range of musical styles, including modern cinematic, electronic, pop, ambient, rock, fantasy, jazz, sea shanty, 20th-century cinematic, tango, and Chinese-influenced composition. AIVA offers three pricing tiers, allowing users to choose the best plan based on whether they are an individual, a school, or a business. Users on the Pro plan are allotted 300 monthly downloads and can listen to tracks up to 5 minutes in length. With a Pro subscription, authors have unrestricted rights to distribute and profit from their works however they see fit. In sum, AIVA provides a clever and time-saving answer to the problem of original, personalized composition for content creators and composers.

Orb Producer 3

Producing high-quality musical patterns and loops has never been easier than with Orb Producer 3, a suite of plugins powered by artificial intelligence. Orb Melody, Orb Bass, Orb Arpeggio, and Orb Synth are the four plugins in the suite. The Orb Melody plugin gives you access to many melodic possibilities and a wealth of controls and customization options to help you find the ideal melody for your track. The Orb Bass module considers the song's overall harmony and recommends the most effective bass lines. The Orb Arpeggio module is a powerful tool for creating unique arpeggios with various customization options. Orb Synth offers tremendous reverb, delay, and a crunching drive, as well as two oscillators, two LFOs, an amplifier, and an envelope. The Polyrhythms engine allows you to make intricate rhythms, and Lyrical Melodies lets you add an anacrusis or pick-up note to a bar. Chaining blocks enables you to make one long musical composition, and Keep Bass for Chords ensures the chord's bass note stays at the bottom. Dozens of presets, advanced AI features, and a selection of tonalities and keys are also included in the suite. It's compatible with a wide range of DAWs and comes with a 30-day money-back guarantee.

Soundful 

Soundful is an artificial intelligence (AI) music generation platform where producers and musicians can make as many songs as they like and get paid for them. It offers a novel and completely free way to create music tracks at the push of a button. The platform uses machine learning to automatically create high-quality music in various styles, including EDM, Hip Hop, Latin, Pop, R&B, and Reggaeton. Soundful also provides a library of premade loops, samples, and song structures for users to draw on as inspiration when making their own music. It lets people make money from their music on services like Apple Music, YouTube, TikTok, Spotify, and Amazon Music. Users can also generate unique licenses for their music and download stems. Soundful's pricing is flexible, with a free forever plan and a premium subscription that unlocks extra capabilities for individual and business use.

Loudly 

Loudly's AI Music Generator is a program that uses AI technology to let users make their own music. The AI system can produce a new song in seconds, with the user only having to choose the genre and the desired length. The application's stated goal is to facilitate and improve creative processes by automating music creation so that users can give more attention to other areas of video production or content development. In addition to the music generator, Loudly's music library has many songs that can be licensed for one-time use in various media with a single purchase. The site has a straightforward search bar and easy navigation options to help visitors locate the right music for their tasks. In addition to the premade playlists, you can now use Loudly's AI Recommender to get personalized music recommendations. The catalog is viewable on the website or via a mobile app available from the App Store or Google Play. There is a free trial period, but after that, customers must pay a monthly fee to access all functions.

Ecrett Music

Ecrett Music is an AI-based music composition program that streamlines the process of making original music for media producers. It has a simple interface that requires no musical expertise on the user's part. Ecrett Music offers its users over 500,000 unique melodic patterns each month. In addition, users can upload videos and adjust the music's instrumentation and structure to see if it's a good fit. There are three pricing tiers: free, individual, and enterprise. The individual plan is billed annually and allows subscribers to download an unlimited amount of royalty-free music for use in commercial projects (such as monetized YouTube videos). The annual fee for the business plan includes licensing for the business. If you want to ensure you're using Ecrett Music properly, check out the Dos and Don'ts section: the music may be used for hobbies, advertisements, weddings, paid content, gaming, and similar projects, but it is forbidden to distribute music made with Ecrett as a standalone audio file (even for free) or via a download link, or to use the music in any way that could be construed as dangerous, sexual, or hostile. The developers of Ecrett Music are a diverse crew that includes musicians, composers, dancers, designers, and engineers.

AI Studio

HookSounds' AI Studio is a program that lets anyone make unique soundtracks for their videos. The program's automated solution can identify the ideal accompaniment for a video with just a few clicks, greatly streamlining the creative process. Subscribers to AI Studio have access to the cutting edge of AI-enabled musical innovation. Users can upload their videos to AI Studio, which automatically generates an original score. Remember that AI Studio is still in beta and may have bugs or functionality restrictions as a result. Users are urged to offer feedback to improve the application. The studio aims to improve the creative process by allowing people to make original music for their videos quickly and easily. Using AI Studio, users can save time and energy when searching for the ideal soundtrack for their videos. Given the tool's current beta status, however, its restrictions should be considered. In conclusion, video editors who want a quick and easy way to develop original music tracks for their projects can benefit greatly from AI Studio by HookSounds.

TuneFlow 

TuneFlow uses AI to provide a sophisticated environment for music creation. It's made to help people of all skill levels make better music with less hassle. Users of TuneFlow have access to a suite of advanced AI tools for use at all stages of the music production process. Smart Composer provides users with pre-designed melodies and accompaniment tracks to jumpstart their music ideas; Smart Drummer is an AI-powered tool that automatically fills drum clips with preferred beat styles; Ultra-Clean Source Separator splits mixed audio tracks into individual stems, such as vocals and instruments; and Voice Clone allows users to select and clone voices or generate their own. TuneFlow also provides top-notch audio transcription, which can convert recordings of vocals or instruments to MIDI notation. Users can quickly and easily create lo-fi hip-hop tracks with the One-Click Lo-Fi plugin. TuneFlow also has an active community of AI musicians who continually develop and share new AI models through the platform's plugin market. TuneFlow Desktop supports VST, VST3, and AU plugins and provides access to the platform from any device with cloud-syncing capabilities.

