Introducing an image-to-speech Generative AI application using Amazon …

Vision loss comes in various forms. For some, it's present from birth; for others, it's a slow descent over time that comes with many expiration dates: the day you can't see pictures, recognize yourself or loved ones' faces, or even read your mail. In our previous blog post, Enable the Visually Impaired to Hear Documents using Amazon Textract and Amazon Polly, we showed you our text-to-speech application called "Read for Me." Accessibility has come a long way, but what about images?
At the 2022 AWS re:Invent conference in Las Vegas, we demonstrated "Describe for Me" at the AWS Builders' Fair, a website that helps the visually impaired understand images through image captioning, facial recognition, and text-to-speech, a technology we refer to as "Image to Speech." Through the use of multiple AI/ML services, "Describe For Me" generates a caption for an input image and reads it back in a clear, natural-sounding voice in a variety of languages and dialects.
In this blog post, we walk you through the solution architecture behind "Describe For Me" and the design considerations of our solution.
Solution Overview
The following reference architecture shows the workflow of a user taking a picture with a phone and playing back an MP3 audio file that captions the image.

The workflow includes the following steps:

AWS Amplify distributes the DescribeForMe web app consisting of HTML, JavaScript, and CSS to end users’ mobile devices.
The Amazon Cognito Identity pool grants temporary access to the Amazon S3 bucket.
The user uploads an image file to the Amazon S3 bucket using AWS SDK through the web app.
The DescribeForMe web app invokes the backend AI services by sending the Amazon S3 object key in the payload to Amazon API Gateway.
Amazon API Gateway instantiates an AWS Step Functions workflow. The state machine orchestrates the artificial intelligence/machine learning (AI/ML) services Amazon Rekognition, Amazon SageMaker, Amazon Textract, Amazon Translate, and Amazon Polly using AWS Lambda functions.
The AWS Step Functions workflow creates an audio file as output and stores it in Amazon S3 in MP3 format.
A pre-signed URL with the location of the audio file stored in Amazon S3 is sent back to the user’s browser through Amazon API Gateway. The user’s mobile device plays the audio file using the pre-signed URL.
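As an illustration of this last step, a Lambda function behind API Gateway could generate the pre-signed URL with the AWS SDK for Python (Boto3). This is a minimal sketch; the bucket name, object key field, and handler shape below are placeholders rather than the exact ones used by Describe For Me:

```python
import json
import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # Placeholder bucket/key; the real application stores the MP3 produced by
    # the Step Functions workflow under its own naming scheme.
    bucket = "describe-for-me-output"
    key = event["audio_key"]  # e.g. "audio/<image-id>.mp3"

    # A pre-signed URL lets the browser fetch the private MP3 directly from S3.
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=3600,  # valid for one hour
    )
    return {"statusCode": 200, "body": json.dumps({"audio_url": url})}
```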

Solution Walkthrough
In this section, we focus on the design considerations behind why we chose:

parallel processing within an AWS Step Functions workflow
a unified sequence-to-sequence pre-trained machine learning model, OFA (One For All) from Hugging Face, deployed to Amazon SageMaker for image captioning
Amazon Rekognition for facial recognition

For a more detailed overview of why we chose a serverless architecture, synchronous workflow, Express Step Functions workflow, and headless architecture, and the benefits gained, please read our previous blog post Enable the Visually Impaired to Hear Documents using Amazon Textract and Amazon Polly.
Parallel Processing
Using parallel processing within the Step Functions workflow reduced compute time by up to 48%. Once the user uploads the image to the S3 bucket, Amazon API Gateway instantiates an AWS Step Functions workflow. Then the following three Lambda functions process the image in parallel within the Step Functions workflow.

The first Lambda function, called describe_image, analyzes the image using the OFA_IMAGE_CAPTION model hosted on a SageMaker real-time endpoint to provide an image caption.
The second Lambda function, called describe_faces, first checks whether there are faces using Amazon Rekognition's Detect Faces API, and if so, it calls the Compare Faces API. We do this because Compare Faces throws an error if no faces are found in the image, and calling Detect Faces first is faster than simply running Compare Faces and handling the error, so images without faces are processed more quickly.
The third Lambda function, called extract_text, handles text extraction using Amazon Textract and Amazon Comprehend.
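For illustration, the fan-out to these three functions can be declared with a Parallel state in Amazon States Language (ASL), shown here as a Python dictionary. This is a sketch only: the resource ARNs, state names, and the trailing speech-synthesis step are assumptions, not the project's actual definition.

```python
import json

# Illustrative state machine with a Parallel state fanning out to the three
# Lambda functions described above; ARNs and downstream states are placeholders.
state_machine_definition = {
    "StartAt": "ProcessImage",
    "States": {
        "ProcessImage": {
            "Type": "Parallel",
            "Branches": [
                {"StartAt": "DescribeImage", "States": {"DescribeImage": {
                    "Type": "Task",
                    "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:describe_image",
                    "End": True}}},
                {"StartAt": "DescribeFaces", "States": {"DescribeFaces": {
                    "Type": "Task",
                    "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:describe_faces",
                    "End": True}}},
                {"StartAt": "ExtractText", "States": {"ExtractText": {
                    "Type": "Task",
                    "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:extract_text",
                    "End": True}}},
            ],
            "Next": "SynthesizeSpeech",
        },
        # Hypothetical final step that combines the branch outputs and produces the MP3.
        "SynthesizeSpeech": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:synthesize_speech",
            "End": True,
        },
    },
}

print(json.dumps(state_machine_definition, indent=2))
```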

Executing the Lambda functions in succession would work, but the faster, more efficient way is parallel processing. The following table shows the compute time saved for three sample images.

| People in Image | Sequential Time | Parallel Time | Time Savings (%) | Caption |
| --- | --- | --- | --- | --- |
| 0 | 1869 ms | 1702 ms | 8% | A tabby cat curled up in a fluffy white bed. |
| 1 | 4277 ms | 2197 ms | 48% | A woman in a green blouse and black cardigan smiles at the camera. I recognize one person: Kanbo. |
| 4 | 6603 ms | 3904 ms | 40% | People standing in front of the Amazon Spheres. I recognize 3 people: Kanbo, Jack, and Ayman. |

Image Caption
Hugging Face is an open-source community and data science platform that allows users to share, build, train, and deploy machine learning models. After exploring the models available on the Hugging Face Model Hub, we chose to use the OFA model because, as described by the authors, it is "a task-agnostic and modality-agnostic framework that supports Task Comprehensiveness."
OFA is a step toward "One For All," as it is a unified multimodal pre-trained model that can transfer effectively to a number of downstream tasks. While the OFA model supports many tasks, including visual grounding, language understanding, and image generation, we used it for image captioning in the Describe For Me project to perform the image-to-text portion of the application. Check out the official OFA repository and the paper (ICML 2022) to learn how OFA unifies architectures, tasks, and modalities through a simple sequence-to-sequence learning framework.
To integrate OFA into our application, we cloned the repo from Hugging Face and containerized the model to deploy it to a SageMaker endpoint. The notebook in the repo is an excellent guide to deploying the OFA large model in a Jupyter notebook in SageMaker. After containerizing your inference script, the model is ready to be deployed behind a SageMaker endpoint as described in the SageMaker documentation. Once the model is deployed, create an HTTPS endpoint that can be integrated with the describe_image Lambda function, which analyzes the image to create the image caption. We deployed the OFA tiny model because it is smaller and can be deployed in a shorter period of time while achieving similar performance.
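For illustration, a captioning Lambda function could call the SageMaker real-time endpoint with Boto3 roughly as follows. The endpoint name, content type, and response shape are assumptions; the actual inference contract depends on the custom inference script packaged with the OFA container.

```python
import json
import boto3

sagemaker_runtime = boto3.client("sagemaker-runtime")

def get_caption(image_bytes: bytes) -> str:
    # "ofa-image-caption" is a placeholder endpoint name, and the request/response
    # formats depend on how the container's inference script is written.
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName="ofa-image-caption",
        ContentType="application/x-image",
        Body=image_bytes,
    )
    result = json.loads(response["Body"].read())
    return result["caption"]
```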
Examples of image-to-speech content generated by "Describe For Me" are shown below:

The aurora borealis, or northern lights, fill the night sky above a silhouette of a house.

A dog sleeps on a red blanket on a hardwood floor, next to an open suitcase filled with toys.

A tabby cat curled up in a fluffy white bed.

Facial recognition
Amazon Rekognition Image provides the DetectFaces operation, which looks for key facial features such as eyes, nose, and mouth to detect faces in an input image. In our solution, we leverage this functionality to detect any people in the input image. If a person is detected, we then use the CompareFaces operation to compare the face in the input image with the faces that "Describe For Me" has been trained with and describe the person by name. We chose Rekognition for facial detection because of its high accuracy and how simple it was to integrate into our application with its out-of-the-box capabilities.
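A rough sketch of this detect-then-compare logic with Boto3 follows. The set of reference photos, bucket names, and similarity threshold are assumptions for illustration; the project's actual implementation of "compare against known faces" may be organized differently.

```python
import boto3

rekognition = boto3.client("rekognition")

# Hypothetical mapping of known people to reference photos stored in S3.
KNOWN_FACES = {"Kanbo": "references/kanbo.jpg", "Jack": "references/jack.jpg"}
REFERENCE_BUCKET = "describe-for-me-references"  # placeholder bucket name

def describe_faces(bucket: str, key: str) -> list[str]:
    uploaded_image = {"S3Object": {"Bucket": bucket, "Name": key}}

    # DetectFaces first: CompareFaces errors out when the source image has no
    # face, and skipping the comparison keeps face-free images fast.
    detected = rekognition.detect_faces(Image=uploaded_image)
    if not detected["FaceDetails"]:
        return []

    recognized = []
    for name, ref_key in KNOWN_FACES.items():
        result = rekognition.compare_faces(
            SourceImage=uploaded_image,
            TargetImage={"S3Object": {"Bucket": REFERENCE_BUCKET, "Name": ref_key}},
            SimilarityThreshold=90,
        )
        if result["FaceMatches"]:
            recognized.append(name)
    return recognized
```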

A group of people posing for a picture in a room. I recognize 4 people: Jack, Kanbo, Alak, and Trac. There was text found in the image as well. It reads: AWS re: Invent

Potential Use Cases
Alternate Text Generation for web images
All images on a website are required to have alternative text so that screen readers can speak them to the visually impaired. It's also good for search engine optimization (SEO). Creating alt captions can be time consuming because a copywriter is tasked with providing them within a design document. The Describe For Me API could automatically generate alt text for images. It could also be utilized as a browser plugin to automatically add image captions to images missing alt text on any website.
Audio Description for Video
Audio description provides a narration track for video content to help the visually impaired follow along with movies. As image captioning becomes more robust and accurate, a workflow involving the creation of an audio track based on descriptions of key parts of a scene could be possible. Amazon Rekognition can already detect scene changes, logos, credit sequences, and celebrities. A future version of Describe For Me would allow for automating this key feature for films and videos.
Conclusion
In this post, we discussed how to use AWS services, including AI and serverless services, to aid the visually impaired to see images. You can learn more about the Describe For Me project and use it by visiting describeforme.com. Learn more about the unique features of Amazon SageMaker, Amazon Rekognition and the AWS partnership with Hugging Face.
Third Party ML Model Disclaimer for Guidance
This guidance is for informational purposes only. You should still perform your own independent assessment, and take measures to ensure that you comply with your own specific quality control practices and standards, and the local rules, laws, regulations, licenses and terms of use that apply to you, your content, and the third-party Machine Learning model referenced in this guidance. AWS has no control or authority over the third-party Machine Learning model referenced in this guidance, and does not make any representations or warranties that the third-party Machine Learning model is secure, virus-free, operational, or compatible with your production environment and standards. AWS does not make any representations, warranties or guarantees that any information in this guidance will result in a particular outcome or result.

About the Authors
Jack Marchetti is a Senior Solutions Architect at AWS focused on helping customers modernize and implement serverless, event-driven architectures. Jack is legally blind and resides in Chicago with his wife Erin and cat Minou. He is also a screenwriter and director with a primary focus on Christmas movies and horror. View Jack's filmography at his IMDb page.
Alak Eswaradass is a Senior Solutions Architect at AWS based in Chicago, Illinois. She is passionate about helping customers design cloud architectures utilizing AWS services to solve business challenges. Alak is enthusiastic about using SageMaker to solve a variety of ML use cases for AWS customers. When she’s not working, Alak enjoys spending time with her daughters and exploring the outdoors with her dogs.
Kandyce Bohannon is a Senior Solutions Architect based out of Minneapolis, MN. In this role, Kandyce works as a technical advisor to AWS customers as they modernize technology strategies especially related to data and DevOps to implement best practices in AWS. Additionally, Kandyce is passionate about mentoring future generations of technologists and showcasing women in technology through the AWS She Builds Tech Skills program.
Trac Do is a Solutions Architect at AWS. In his role, Trac works with enterprise customers to support their cloud migrations and application modernization initiatives. He is passionate about learning customers’ challenges and solving them with robust and scalable solutions using AWS services. Trac currently lives in Chicago with his wife and 3 boys. He is a big aviation enthusiast and in the process of completing his Private Pilot License.

Stability AI Unveils Stable Animation SDK: A Powerful Text-To-Animatio …

Stability AI, the world’s leading open-source artificial intelligence company, has released an exciting new tool for artists and developers to take their animations to the next level. The Stable Animation SDK allows users to create stunning animations using the most advanced Stable Diffusion models. With Stable Animation SDK, artists and developers can create animations in various ways, including prompts without images, source images, and source videos. The animation endpoint offered by Stability AI gives artists access to all the Stable Diffusion models, including the highly acclaimed Stable Diffusion 2.0 and Stable Diffusion XL.

Stable Animation SDK offers three ways to create animations. The first is text-to-animation, where users input a text prompt and tweak various parameters to produce an animation. The second method combines text input with an initial image input, where users provide an initial image that acts as the starting point of their animation. Finally, users can provide an initial video to base their animation on and arrive at a final output animation by tweaking various parameters and using a text prompt.

To install the Stable Animation SDK UI, the user needs to visit the installation page on the Stability AI website. After choosing their operating system, they can follow the instructions on the page to download the SDK. Once the SDK is downloaded, the user must set up the environment variables per their system’s requirements and install the necessary dependencies. 

Using the Stable Animation SDK UI is a straightforward process. Once the user has installed the SDK, they can open the interface and select their preferred method for creating animations: text to animation, text input plus initial image input, or input video plus text input. Next, the user can input the animation’s desired parameters and prompts. To further customize their animation, the user can adjust various settings, such as frame rate, output resolution, and length. After finalizing their settings, users can click the “Generate” button to create their animation. The output will be displayed on the interface, and the user can save it as a video file. 

The Stable Animation SDK offers a range of parameters that can be adjusted to create customized animations. These include the prompt, image or video input, frame rate, output resolution, length, temperature, top P, and top K. By adjusting these parameters, users can create animations tailored to their specific needs and preferences. The Stable Animation SDK offers a wide range of options to ensure that users can easily create stunning and unique animations.
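To give a feel for how such a text-to-animation call might be shaped in code, here is a heavily hedged sketch. The module, class, and attribute names (api.Context, AnimationArgs, Animator, max_frames, fps) are assumptions based on the SDK's documented animation module and may differ from the actual interface; treat this as illustrative rather than the SDK's real API.

```python
# Hypothetical usage shape; actual stability_sdk class and parameter names may differ.
from stability_sdk import api
from stability_sdk.animation import AnimationArgs, Animator

# Placeholder host and API key.
context = api.Context("grpc.stability.ai:443", "YOUR-STABILITY-API-KEY")

args = AnimationArgs()
args.max_frames = 72                 # animation length in frames (assumed attribute)
args.fps = 12                        # frame rate (assumed attribute)

# Text prompts keyed by frame number drive the text-to-animation mode.
animation_prompts = {0: "a watercolor fox running through a snowy forest"}

animator = Animator(api_context=context,
                    animation_prompts=animation_prompts,
                    args=args)

for idx, frame in enumerate(animator.render()):
    frame.save(f"frame_{idx:05d}.png")   # save each rendered frame to disk
```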

The Stable Animation SDK promises to revolutionize the animation industry, allowing artists and developers to create cutting-edge animations with ease and flexibility.

Check out the Developer Platform and Reference Article.

The post Stability AI Unveils Stable Animation SDK: A Powerful Text-To-Animation Tool For Developers appeared first on MarkTechPost.

Microsoft AI Releases Guidance: A Next-Gen Language For Prompt Program …

Microsoft recently introduced a groundbreaking language called “Guidance,” revolutionizing the landscape of prompt programming. With this innovative language, developers now have the power to generate natural language responses in various formats, creating simple yet sophisticated rules. 

Similar to established programming languages like Java or Python, "Guidance" enables developers to focus on high-level design patterns and utilize specific structures to describe generation tasks, such as dialogue and JSON. One of its most remarkable features is the ability to generate accurate text output while efficiently encapsulating generation parameters on the fly. The language comes equipped with an intuitive editor that simplifies the process of coding rules and defining language model functionality. It allows users to set up steps like value validation and integration with external services, providing a high degree of customization.

Furthermore, the comprehensive vocabulary and libraries in the language offer a wide range of possibilities for software engineers, enhancing their creative potential. For instance, the jsonformer module facilitates on-the-fly correction of generated output and ensures proper formatting. Additionally, integration with NVIDIA's Guardrails project empowers developers to create chatbots using template-like prompts. With "Guidance," developers can now leverage their preferred language models, such as LLaMA and Vicuna, and tailor the direction of generation according to their needs.
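As a minimal sketch of the templating style, the following is based on the early (2023) guidance API; specific names such as guidance.llms.OpenAI and the gen keyword arguments reflect that release and may have changed since. It assumes an OPENAI_API_KEY environment variable.

```python
import guidance

# Point Guidance at a backing model; the 2023 release exposed hosted models like
# this and local models (e.g. LLaMA, Vicuna) via guidance.llms.Transformers.
guidance.llm = guidance.llms.OpenAI("text-davinci-003")

# A handlebars-style template: the fixed structure is written literally and
# each {{gen ...}} slot is filled in by the language model at run time.
program = guidance(
    """Tweet about {{topic}}:
{{gen 'tweet' temperature=0.7 max_tokens=60}}

One-sentence summary of the tweet:
{{gen 'summary' max_tokens=30}}"""
)

result = program(topic="prompt programming")
print(result["tweet"])
print(result["summary"])
```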

The advent of “Guidance” marks a significant milestone in language model programming. By optimizing generation cycles and enabling the creation of elaborate structures with just a few lines of code, users can establish a solid foundation for their language models while working with increased autonomy. Moreover, the language eliminates numerous development bottlenecks, allowing developers to swiftly execute their instructions without grappling with complex hardware issues.

Microsoft’s “Guidance” release represents a significant language model programming breakthrough. It not only showcases the potential of modern machine learning tools but also leads to more efficient and cost-effective solutions. Consequently, this language is poised to have a lasting impact on the prompt programming field, fostering waves of innovation and inspiring novel applications.

Previously, developers faced limitations regarding how much they could program manually or with preprogrammed templates. By harnessing the capabilities of “Guidance,” they can turbocharge their workflow and utilize one-line snippets to create intricate systems with remarkable efficiency.

What sets “Guidance” apart from other programming languages is its emphasis on code comprehension and idea generation. Through the implementation of self-generated statements and artificial intelligence techniques, developers are guided through the process of filling in the gaps in their code, making it easier to understand and generate new ideas. The potential applications of this technology are vast, and it has already been leveraged to create personalized content, automate personal assistant tasks, and construct advanced chatbots for customer interactions.

Microsoft has partnered with several businesses, including NUIX, to facilitate this technology's seamless and rapid deployment and to bring the language to a broader audience that may lack the necessary background or resources to learn programming from scratch.

At its core, “Guidance” eliminates the traditionally laborious aspects of programming, enabling developers to transition smoothly from concept to completion. By streamlining the code-writing process, developers can create more targeted applications and significantly reduce product development time.

Microsoft remains committed to refining this groundbreaking language to enhance productivity and reduce development costs further. Therefore, for those seeking to enhance programming workflows, save time, and boost efficiency, exploring Microsoft’s “Guidance” is highly recommended, as it has the potential to become the ultimate coding companion.

Check out the GitHub link.

The post Microsoft AI Releases Guidance: A Next-Gen Language For Prompt Programming appeared first on MarkTechPost.

Researchers From China Propose a Generate-and-Edit Approach that Utili …

Researchers draw inspiration from the process of human programming to help LLMs do better on competitive programming tasks. Competitive programming has recently been posed as a task for large language models. This work requires accurately implementing solutions that can span hundreds of lines and comprehending a sophisticated natural language description of a problem with example test cases. Executing solutions on concealed test cases allows for solution evaluation. However, current LLMs' accuracy and pass rates on this task leave much room for improvement. For instance, on APPS, a widely used competitive programming benchmark, even the very powerful GPT-3 model scores only 7% accuracy.

Programmers solving competitive programming problems often develop an initial program, run a few sample test cases, and then make changes to the code in response to the test findings. During this step, the programmer may use important information from the test results to troubleshoot the software. The researchers implement this concept with a comparable workflow built around a neural-based editor. They examined the code produced by a pre-trained LLM and discovered that many of the generated programs could be improved with small adjustments.

They observe that the error message identifies the coding fault, allowing the problem to be corrected rapidly. This encourages them to look into editing methods and enhance the quality of code produced by LLMs with the aid of execution outcomes. In this study, researchers from Peking University suggest a unique generate-and-edit approach to improve LLMs at competitive programming tasks. Their method uses the capability of LLMs in three phases to emulate the behavior of the human programmers mentioned above:

Generation utilizing LLMs. They create the program based on the problem description, using large language models as black-box generators.

Execution. They run the created code on the sample test cases to obtain the execution results. They also template the execution results into supplementary comments to include more useful data for the editing step.

Edit. They create a fault-aware neural code editor that improves the code, taking the produced code and the supplementary comments as input. Their code editor aims to raise the quality and precision of LLM-based code generation.
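The three phases can be sketched as the following pipeline. The function names are placeholders for components the paper describes (the black-box LLM, a test runner, and the trained fault-aware editor), not an actual released implementation.

```python
def generate_with_llm(problem_description: str) -> str:
    raise NotImplementedError("black-box LLM generation goes here")

def run_sample_tests(program: str, sample_tests: list) -> str:
    raise NotImplementedError("execute the program on the example tests")

def fault_aware_edit(program: str, supplementary_comment: str) -> str:
    raise NotImplementedError("the trained neural code editor goes here")

def generate_and_edit(problem_description: str, sample_tests: list) -> str:
    # Phase 1: generation -- the LLM is used as a black-box generator.
    program = generate_with_llm(problem_description)

    # Phase 2: execution -- run on the example tests and template the results
    # (including any error messages) into a supplementary comment.
    supplementary_comment = run_sample_tests(program, sample_tests)

    # Phase 3: edit -- the fault-aware editor refines the program from the
    # generated code plus the supplementary comment.
    return fault_aware_edit(program, supplementary_comment)
```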

They conduct in-depth research on the APPS and HumanEval public competitive programming benchmarks. To demonstrate generality, they apply their methodology to 9 well-known LLMs with parameter counts ranging from 110M to 175B. Their strategy dramatically raises LLM performance. In particular, their method raises the average pass@1 on APPS-dev and APPS-test by 89% and 31%, respectively. Their small editor model can increase pass@1 from 26.6% to 32.4% on APPS-dev, even for the largest language model used, GPT-3-175B. They demonstrate the transferability of their method on an out-of-distribution benchmark by improving average pass@1 by 48% on a different kind of dataset, HumanEval. Various methods for post-processing programs created by LLMs have recently been presented.

These methods do extensive LLM sampling, rerank the sampled programs, and produce the final program. Their strategy, in contrast, provides two benefits: it keeps the sample budget constant and drastically lowers the computational burden on LLMs, and their editor alters the programs directly and outperforms these reranking-based techniques, particularly with a constrained sample budget like pass@1. They are the first, as far as they are aware, to use an editing-based post-processing technique for programming competitions.

The following is a list of the contributions: 

• To produce high-quality code for challenging programming tasks, they suggest a generate-and-edit method for large language models. 

• They create a fault-aware neural code editor that takes error messages and the generated code as input to improve the code's precision and quality. 

• They run experiments using two well-known datasets and nine LLMs to show the effectiveness and applicability of their strategy.

Check out the Paper.

The post Researchers From China Propose a Generate-and-Edit Approach that Utilizes Execution Results of the Generated Code from LLMs to Improve the Code Quality in the Competitive Programming Task appeared first on MarkTechPost.

Announcing the updated Microsoft SharePoint connector (V2.0) for Amazo …

Amazon Kendra is a highly accurate and simple-to-use intelligent search service powered by machine learning (ML). Amazon Kendra offers a suite of data source connectors to simplify the process of ingesting and indexing your content, wherever it resides.
Valuable data in organizations is stored in both structured and unstructured repositories. Amazon Kendra can pull together data across several structured and unstructured knowledge base repositories to index and search on.
One such knowledge base repository is Microsoft SharePoint, and we are excited to announce that we have updated the SharePoint connector for Amazon Kendra to add even more capabilities. In this new version (V2.0), we have added support for SharePoint Subscription Edition and multiple authentication and sync modes to index contents based on new, modified, or deleted contents.
You can now also choose OAuth 2.0 to authenticate with SharePoint Online. Multiple synchronization options are available to update your index when your data source content changes. You can filter the search results based on the user and group information to ensure your search results are only shown based on user access rights.
In this post, we demonstrate how to index content from SharePoint using the Amazon Kendra SharePoint connector V2.0.
Solution overview
You can use Amazon Kendra as a central location to index the content provided by various data sources for intelligent search. In the following sections, we go through the steps to create an index, add the SharePoint connector, and test the solution.
Prerequisites
To get started, you need the following:

A SharePoint (Server or Online) user with owner rights.
An AWS account with privileges to create AWS Identity and Access Management (IAM) roles and policies. For more information, see Overview of access management: Permissions and policies.
Basic knowledge of AWS.

Create an Amazon Kendra Index
To create an Amazon Kendra index, complete the following steps:

On the Amazon Kendra console, choose Create an index.
For Index name, enter a name for the index (for example, my-sharepoint-index).
Enter an optional description.
Choose Create a new role.
For Role name, enter an IAM role name.
Configure optional encryption settings and tags.
Choose Next.
For Access control settings, choose Yes.
For Token configuration, set Token type to JSON and leave the default values for Username and Groups.
For User-group expansion, leave the defaults.
Choose Next.
For Specify provisioning, select Developer edition, which is suited for building a proof of concept and experimentation, and choose Create.

Add a SharePoint data source to your Amazon Kendra index
One of the advantages of implementing Amazon Kendra is that you can use a set of pre-built connectors for data sources such as Amazon Simple Storage Service (Amazon S3), Amazon Relational Database Service (Amazon RDS), SharePoint Online, and Salesforce.
To add a SharePoint data source to your index, complete the following steps:

On the Amazon Kendra console, navigate to the index that you created.
Choose Data sources in the navigation pane.
Under SharePoint Connector V2.0, choose Add connector.
For Data source name, enter a name (for example, my-sharepoint-data-source).
Enter an optional description.
Choose English (en) for Default language.
Enter optional tags.
Choose Next.

Depending on the hosting option your SharePoint application is using, pick the appropriate hosting method. The required attributes for the connector configuration appear based on the hosting method you choose.

If you select SharePoint Online, complete the following steps:

Enter the URL for your SharePoint Online repository.
Choose your authentication option (these authentication details will be used by the SharePoint connector to integrate with your SharePoint application).
Enter the tenant ID of your SharePoint Online application.
For AWS Secrets Manager secret, pick the secret that has SharePoint Online application credentials or create a new secret and add the connection details (for example, AmazonKendra-SharePoint-my-sharepoint-online-secret).

To learn more about AWS Secrets Manger, refer to Getting started with Secrets Manager.
The SharePoint connector uses the clientId, clientSecret, userName, and password information to authenticate with the SharePoint Online application. These details can be accessed on the App registrations page on the Azure portal, if the SharePoint Online application is already registered.

If you select SharePoint Server, complete the following steps:

Choose your SharePoint version (for example, we use SharePoint 2019 for this post).
Enter the site URL for your SharePoint Server repository.
For SSL certificate location, enter the path to the S3 bucket file where the SharePoint Server SSL certificate is located.
Enter the web proxy host name and the port number details if the SharePoint server requires a proxy connection.

For this post, no web proxy is used because the SharePoint application used for this example is a public-facing application.

Select the authorization option for the Access Control List (ACL) configuration.

These authentication details will be used by the SharePoint connector to integrate with your SharePoint instance.

For AWS Secrets Manager secret, choose the secret that has SharePoint Server credentials or create a new secret and add the connection details (for example, AmazonKendra-my-sharepoint-server-secret).

The SharePoint connector uses the user name and password information to authenticate with the SharePoint Server application. If you use an email ID with domain from the IdP as the ACL setting, the LDAP server endpoint, search base, LDAP user name, and LDAP password are also required.
To achieve a granular level of control over the searchable and displayable content, identity crawler functionality is introduced in the SharePoint connector V2.0.

Enable the identity crawler and select Crawl Local Group Mapping and Crawl AD Group Mapping.
For Virtual Private Cloud (VPC), choose the VPC through which the SharePoint application is reachable from your SharePoint connector.

For this post, we choose No VPC because the SharePoint application used for this example is a public-facing application deployed on Amazon Elastic Compute Cloud (Amazon EC2) instances.

Choose Create a new role (Recommended) and provide a role name, such as AmazonKendra-sharepoint-v2.
Choose Next.
Select entities that you would like to include for indexing. You can choose All or specific entities based on your use case. For this post, we choose All.

You can also include or exclude documents by using regular expressions. You can define patterns that Amazon Kendra either uses to exclude certain documents from indexing or include only documents with that pattern. For more information, refer to SharePoint Configuration.

Select your sync mode to update the index when your data source content changes.

You can sync and index all contents in all entities, regardless of the previous sync process by selecting Full sync, or only sync new, modified, or deleted content, or only sync new or modified content. For this post, we select Full sync.

Choose a frequency to run the sync schedule, such as Run on demand.
Choose Next.

In this next step, you can create field mappings to add an extra layer of metadata to your documents. This enables you to improve accuracy through manual tuning, filtering, and faceting.

Review the default field mappings information and choose Next.
As a last step, review the configuration details and choose Add data source to create the SharePoint connector data source for the Amazon Kendra index.

Test the solution
Now you’re ready to prepare and test the Amazon Kendra search features using the SharePoint connector.
For this post, AWS getting started documents are added to the SharePoint data source. The sample dataset used for this post can be downloaded from AWS_Whitepapers.zip. This dataset has PDF documents categorized into multiple directories based on the type of documents (for example, documents related to AWS database options, security, and ML).
Also, sample dataset directories in SharePoint are configured with user email IDs and group details so that only the users and groups with permissions can access specific directories or individual files.
To achieve granular-level control over the search results, the SharePoint connector crawls the local or Active Directory (AD) group mapping in the SharePoint data source in addition to the content when the identity crawler is enabled with the local and AD group mapping options selected. With this capability, Amazon Kendra indexed content is searchable and displayable based on the access control permissions of the users and groups.
To sync our index with SharePoint content, complete the following steps:

On the Amazon Kendra console, navigate to the index you created.
Choose Data sources in the navigation pane and select the SharePoint data source.
Choose Sync now to start the process to index the content from the SharePoint application and wait for the process to complete.

If you encounter any sync issues, refer to Troubleshooting data sources for more information.
When the sync process is successful, the value for Last sync status will be set to Successful – service is operating normally. The content from the SharePoint application is now indexed and ready for queries.

Choose Search indexed content (under Data management) in the navigation pane.
Enter a test query in the search field and press Enter.

A test query such as “What is the durability of S3?” provides the following Amazon Kendra suggested answers. Note that the results for this query are from all the indexed content. This is because there is no context of user name or group information for this query.

To test the access-controlled search, expand Test query with username or groups and choose Apply user name or groups to add a user name (email ID) or group information.

When an Experience Builder app is used, it includes the user context, and therefore you don’t need to add user or group IDs explicitly.

For this post, access to the Databases directory in the SharePoint site is provided to the database-specialists group only.
Enter a new test query and press Enter.

In this example, only the content in the Databases directory is searched and the results are displayed. This is because the database-specialists group only has access to the Databases directory.
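Outside the console, the same access-controlled query can be issued with the AWS SDK for Python (Boto3) by passing the user and group context. The index ID, email ID, and group name below are placeholders for your own values:

```python
import boto3

kendra = boto3.client("kendra")

response = kendra.query(
    IndexId="index-id-placeholder",          # your Amazon Kendra index ID
    QueryText="What is the durability of S3?",
    UserContext={
        "UserId": "user@example.com",        # placeholder email ID
        "Groups": ["database-specialists"],  # placeholder group name
    },
)

# Print the result type and document title for each returned item.
for item in response["ResultItems"]:
    print(item["Type"], item.get("DocumentTitle", {}).get("Text"))
```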
Congratulations! You have successfully used Amazon Kendra to surface answers and insights based on the content indexed from your SharePoint application.
Amazon Kendra Experience Builder
You can build and deploy an Amazon Kendra search application without the need for any front-end code. Amazon Kendra Experience Builder helps you build and deploy a fully functional search application in a few clicks so that you can start searching right away.
Refer to Building a search experience with no code for more information.
Clean up
To avoid incurring future costs, clean up the resources you created as part of this solution. If you created a new Amazon Kendra index while testing this solution, delete it if you no longer need it. If you only added a new data source using the Amazon Kendra connector for SharePoint, delete that data source after your solution review is completed.
Refer to Deleting an index and data source for more information.
Conclusion
In this post, we showed how to ingest documents from your SharePoint application into your Amazon Kendra index. We also reviewed some of the new features that are introduced in the new version of the SharePoint connector.
To learn more about the Amazon Kendra connector for SharePoint, refer to Microsoft SharePoint connector V2.0.
Finally, don’t forget to check out the other blog posts about Amazon Kendra!

About the Author
Udaya Jaladi is a Solutions Architect at Amazon Web Services (AWS), specializing in assisting Independent Software Vendor (ISV) customers. With expertise in cloud strategies, AI/ML technologies, and operations, Udaya serves as a trusted advisor to executives and engineers, offering personalized guidance on maximizing the cloud’s potential and driving innovative product development. Leveraging his background as an Enterprise Architect (EA) across diverse business domains, Udaya excels in architecting scalable cloud solutions tailored to meet the specific needs of ISV customers.

Top Identity Verification Platforms (2023)

In 2023, the digital transformation of the global economy has completely altered how people purchase, use financial services, interact with the government, and even enjoy entertainment. The benefits of a digital economy must be weighed carefully, since doing business online brings its own set of dangers and difficulties. The best method to protect your company's data from cybercriminals is to implement an efficient identity verification system.

Identity verification tools are crucial in the modern online environment. Most people now do everything from banking to learning to grocery shopping online. The benefits and drawbacks of technology’s presence in our lives are continually being explored and mastered. Identity verification software is a safe and private way to confirm someone’s identity online. This program protects the user’s anonymity while preventing fraudulent activity in online financial dealings.

Rules and regulations are evolving with this new environment. Businesses may comply with ever-evolving laws and standards with identity verification systems. The most effective identity verification programs guarantee the safety of financial transactions while keeping businesses up to date. 

Here are some top Identity verification platforms

ComplyCube

ComplyCube is an automated identity verification tool that helps organizations with AML and KYC regulations. It has functions that let companies confirm a person’s identity via official papers, biometrics, and public records. SaaS helps companies accomplish worldwide AML/CTF compliance, convert more consumers, and prevent fraud by combining trustworthy data sources with skilled human reviewers. ComplyCube’s outreach feature allows firms to perform several customer verification procedures in a single session. The system calculates what information is needed and then provides a client with a secure link to click on to initiate a Know Your Customer (KYC) session. AML screenings, PEP checks, address verification, ID document authentication, and ID verification are some available checks.

ID.me

ID.me streamlines the process of establishing and sharing digital identification. Over 98 million people are part of the ID.me secure digital identification network, which also includes relationships with 30 states, 10 federal agencies, and over 500 well-known shops. The startup helps businesses of all kinds confirm their employees’ identities and memberships in certain communities. The Kantara Initiative has recognized the firm as a NIST 800-63-3 IAL2 / AAL2 conformant credential service provider, proving that the company’s solution satisfies federal requirements for consumer authentication. ID.me has received a FedRAMP Moderate Authorization to Operate (ATO) for its Identity Gateway. ID.me’s “No Identity Left Behind” mission and integrated video chat make it the only supplier, ensuring everyone can establish and maintain a safe online persona.

Trustmatic 

Trustmatic’s goal is to streamline client confidence in enterprises. Trustmatic streamlines verification processes while keeping private information safe, assisting businesses in meeting requirements for local and international Know Your Customer (KYC) regulations. The service has a 98% first-time success rate in its onboarding procedure. Trustmatic is compatible with 138 languages and various identification papers from 248 nations and territories. The firm uses the PAD algorithm, validated at the iBeta Level 2 standard, for passive liveness detection.

Middesk 

Middesk is a business intelligence (BI) platform serving the onboarding needs of companies in the financial technology (fintech), consumer credit reporting, and business marketplaces industries. Middesk states that it provides engineering resources to compliance teams that would not have them otherwise. Middesk is not just an underwriting and verification platform for businesses but also an identity platform. The company’s platform also alerts service providers to any changes in its customer base, giving them a complete view of the company’s clientele so they can better deliver the essential items those clients need to launch, run, and manage their enterprises.

Pipl 

Pipl is an identity trust firm that offers identity resolution by mining the extensive web of associations between various pieces of personal data. Pipl is a global database that examines the connections between multiple identifiers, such as email addresses, mobile phone numbers, and social media profiles. Identity records from all over the internet and many proprietary sources are continually collected, cross-referenced, and connected via its identity resolution engine. More than 3.6 billion phone numbers and 1.7 billion email addresses are included in the resulting searchable database, which spans over 150 nations. Merchants can reduce chargebacks and fraud risk with the Pipl API and manual review solutions while providing a more pleasant customer experience.

DocuSign 

DocuSign Identify integrates identification and identity verification into the eSignature process. Users are aided in safeguarding vital agreements and the overall client experience. The solution is useful for complying with KYC, AML, and other local, national, and international mandates. Each Certificate of Completion has a verification status. Users can confirm a signer’s identity in more ways than only by having them click a link in an email. DocuSign Identify facilitates the incorporation of identity verification and authentication into the eSignature process. This safeguards vital contracts and the satisfaction of your customers. Digitally verifying a signer’s identification using a government-issued ID, KBA questions, or an electronic ID is made possible with ID Verification. It’s promoted as being accessible everywhere in the globe and on any mobile device.

Ekata 

Ekata Inc., a Mastercard subsidiary, works with companies all around the globe to eliminate friction and prevent fraud. The Ekata Identity Engine underpins their identity verification products, allowing companies to make quick and precise risk assessments regarding their clients. With the aid of Ekata, companies can verify their customers’ identities and evaluate potential threats without compromising their privacy. Their products equip over 2,000 companies and partners to prevent cybercrime and provide a welcoming, hassle-free service to clients in more than 230 regions worldwide.

TruNarrative

LexisNexis Risk Solutions’ TruNarrative is designed to streamline security for online transactions. Multi-jurisdictional client onboarding, financial crime detection, risk, and regulatory compliance are all made easier with the TruNarrative Platform. Identity Verification, Fraud, eKYC, AML, and Account Monitoring are all supported by a common decision API. The platform centralizes these inspections in a tunable setting and then employs cutting-edge methods like machine learning and process automation to boost productivity. The TruNarrative no-code interface integrates user data with that from 40+ third-party suppliers, allowing for real-time rule and decision adaptation without IT support. This reduces TCO and facilitates adoption and integration.

KILT 

Self-governing, anonymous, and independently verified credentials may be issued with the help of the KILT system, a blockchain-based decentralized identity system. The protocol facilitates identity-based business models and responds to the need for secure and auditable digital identity systems. KILT is an identity representation system that ensures personal information remains secure and under its owner’s control, just as it would be with a physical passport, driver’s license, or certificate. With KILT, users may make claims about themselves, have those claims certified by other parties, and then keep them as digital credentials to present to others. A hash of the credential is recorded on the blockchain, but the actual data never leaves the owner’s possession.

HealthVerity 

HealthVerity Census is a tool that facilitates real-time data mastering and linking to aid in resolving patient identification on demand. Using Identity, Privacy, Governance, and Exchange as its cornerstones, the HealthVerity IPGE platform makes it possible to find RWD across the largest healthcare data ecosystem, construct more thorough and accurate patient journeys, and power analytics and applications. Life science companies can create a single source of truth for patient identification and properly develop and distribute the patient journey across partner datasets by de-identifying sensitive health information into universal HealthVerity IDs.

Bureau ID 

Bureau is a no-code identification and risk orchestration platform that promises to assist users in generating fraud-free customer interactions across the customer lifecycle. Bureau provides a software application programming interface (API) that lets customers build their risk policies using a drag-and-drop interface. The bureau handles all types of risk, from account creation and user onboarding to knowing your customer checks and transaction monitoring. By combining pre-integrated device, persona, phone, email, social, alternate data, and behavior intelligence from many data sources, the Bureau Platform helps the user create a risk profile of every user and transaction to perform accurate identity verification and prevent fraud before it occurs.

Mortgage Credit Link (MCL)

Mortgage Credit Link (MCL) is an easy-to-use, cloud-based platform for processing online orders. Its user-friendly online interface and built-in fulfillment facilities streamline placing an order for a product. Its application programming interfaces (APIs) may integrate its products and services with any program to improve usability, remove the potential for human mistakes, and reduce overhead. Services like credit data trends and analytics are part of the solution set. Ellie Mae’s Encompass and Fannie Mae’s Desktop Underwriter are only two examples of loan origination and automated underwriting systems compatible with this software.

Verified by Clari5

To ensure the security of card, digital banking, and UPI transactions, users may connect with their financial institutions using Verified by Clari5™. In addition to supporting WhatsApp for transaction verification, Verified by Clari5™ is an omnichannel tool that supports SMS and phone calls for those consumers who want them. To mitigate the growing concern about losing money in online transactions, Verified by Clari5™ provides customer-initiated verification.

INEO 

INEO (formerly Lending Solution) is a financial industry CRM with anti-fraud and digital onboarding features. With an omnichannel approach and administration of customers, INEO systems offer both in-person and remote onboarding, combining biometric identification, Know Your Customer (KYC) and Anti-Money Laundering (AML) controls, digital signature, and customer relationship management (CRM).

Token of Trust

With Token of Trust, you may check someone’s credentials and see what others think of them. Token of Trust allows businesses to do due diligence on users, customers, and transactions in over 130 countries. The Token of Trust platform will enable companies to check the security of transactions and adhere to regulations while giving consumers more agency over their verified identities.

ABBYY 

ABBYY Proof of Identity is an all-in-one service that verifies and authenticates your identity through documents. The system is based on ABBYY’s mobile capture and intelligent document processing technology, prioritizing user-friendliness and data security. Customer onboarding, account opening, claims, and enrollment procedures need proof of identification. At this stage, the consumer presents identification (such as a driver’s license or passport) and other documentation to the company to verify their identity. Customers prefer to interact through mobile and internet channels while looking for loans, making claims, registering for benefits, or signing up for a new healthcare provider. To decrease customer abandonment and churn, businesses may use ABBYY’s Proof of Identity solution to meet their consumers where they already are.

SEON 

SEON provides full-service passive identity verification and anti-fraud solutions. With SEON, organizations can use white-box machine learning tailored to their needs, an advanced scoring engine, and real-time data enrichment. Open-source intelligence (OSINT) and data enrichment come together in SEON’s complete digital profiling offering to display data from 50+ online networks and platforms, such as user photographs, biographies, and profile information. 

Jumio 

Jumio provides a global database of 500 million identities from over 200 countries, which it uses to power an all-encompassing identity verification solution and fraud protection platform. Jumio’s offerings conform to Know Your Customer and Anti-Money Laundering laws. Machine learning for fraud detection and live video, biometric face recognition, barcode and NFC scanning, and so on are all a part of its identity verification capabilities. Jumio is relentlessly pursuing better data protection and information security measures. Jumio abides by the rules and regulations in every country it operates. They are GDPR compliant and certified to ISO/IEC 27001:2013, PCI DSS, and SOC2 Type 2.

Onfido 

Real Identity by Onfido is an automated, end-to-end, AI-powered identity verification platform that accepts 2,500 unique forms of identification from 195 different countries. Onfido's solutions provide a streamlined client identification procedure through a flexible, intuitive interface with instantaneous response. Challenger banks that want to enroll new customers rapidly and safely like its identity verification package, since it helps them comply with worldwide Know Your Customer and Anti-Money Laundering requirements. Onfido is an award-winning identity verification technology that uses AI to automate onboarding. To guarantee a one-of-a-kind, rapid, and accurate micro-model approach to identity verification, it is built using data from around the world.

Veriff

Veriff is a cloud-based identity verification SaaS product that is highly automated and AI-powered and covers over 11,000 unique government-issued identification documents in over 230 nations and territories and 45 languages. Veriff is widely used in the financial industry and is the solution of choice in the gambling and gaming sectors for maintaining Know Your Customer (KYC) and Anti-Money Laundering (AML) compliance. Veriff's solution was among the first to employ video analysis to aid in identity verification and to help screen out fraudsters throughout the transaction process by nearly instantaneously identifying whether a user is genuine.


The post Top Identity Verification Platforms (2023) appeared first on MarkTechPost.

Stanford Researchers Introduce FrugalGPT: A New AI Framework For LLM A …

Many businesses (OpenAI, AI21, Cohere, etc.) are providing LLMs as a service, given their attractive potential in commercial, scientific, and financial contexts. While GPT-4 and other LLMs have demonstrated record-breaking performance on tasks like question answering, their use in high-throughput applications can be prohibitively expensive. For instance, using GPT-4 to assist with customer service can cost a small business over $21,000 monthly, and ChatGPT is predicted to cost over $700,000 daily to run. Use of the largest LLMs carries a high monetary price tag and has serious negative effects on the environment and society.

Studies show that many LLMs are accessible via APIs at a wide range of price points. There are normally three parts to the cost of using an LLM API:

The prompt cost (which scales with the length of the prompt)

The generation cost (which scales with the length of the generation)

A fixed cost per question.
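Put together, a back-of-the-envelope estimate of a single call's cost looks like the sketch below. Token counts and per-token prices come from the provider's pricing page, and the fixed per-request fee applies only to some providers; the numbers in the example are purely illustrative.

```python
def llm_api_cost(prompt_tokens: int, output_tokens: int,
                 prompt_price: float, generation_price: float,
                 fixed_cost: float = 0.0) -> float:
    """Approximate cost of one LLM API call: prompt cost + generation cost + fixed fee."""
    return prompt_tokens * prompt_price + output_tokens * generation_price + fixed_cost

# Example: 1,000 prompt tokens and 200 output tokens at hypothetical per-token prices.
print(llm_api_cost(1000, 200, prompt_price=0.00003, generation_price=0.00006))
```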

Given the wide range in price and quality, it can be difficult for practitioners to decide how to use all available LLM tools best. Furthermore, relying on a single API provider is not dependable if service is interrupted, as could happen in the event of unexpectedly high demand.

The limitations of LLMs are not addressed by current model ensemble paradigms like model cascade and FrugalML, which were developed for prediction tasks with a fixed set of labels. 

Recent research by Stanford University proposes a budget-friendly framework called FrugalGPT that takes advantage of LLM APIs to handle natural language queries.

Prompt adaptation, LLM approximation, and LLM cascade are the three primary approaches to cost reduction. To save expenses, prompt adaptation investigates how to identify the most efficient prompts. LLM approximation develops simpler and more cost-effective alternatives that perform as well as a complex, high-priced LLM. The key idea of the LLM cascade is to dynamically select the appropriate LLM APIs for different queries. 
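A toy sketch of the cascade idea (not the paper's implementation) follows: each query walks up a list of increasingly capable and expensive APIs, stopping as soon as a learned scorer deems the answer reliable enough. The client, scorer, model names, and thresholds are placeholders; in FrugalGPT the scorer is a small learned model with thresholds tuned per task.

```python
# Placeholder client and scorer; swap in your provider's SDK and a trained scorer.
def call_llm_api(model_name: str, query: str) -> str:
    raise NotImplementedError("plug in your provider's client here")

def score_answer(query: str, answer: str) -> float:
    raise NotImplementedError("plug in a learned reliability scorer here")

# Ordered from cheapest to most expensive; names and thresholds are illustrative.
CASCADE = [("small-cheap-model", 0.9), ("mid-tier-model", 0.8), ("largest-model", 0.0)]

def answer_with_cascade(query: str) -> str:
    answer = ""
    for model_name, threshold in CASCADE:
        answer = call_llm_api(model_name, query)
        # Accept the first answer the scorer deems reliable enough; the last
        # model's threshold is 0, so its answer is always accepted.
        if score_answer(query, answer) >= threshold:
            return answer
    return answer
```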

A basic version of FrugalGPT built on the LLM cascade is implemented and evaluated to show the potential of these ideas. FrugalGPT learns, for each dataset and task, how to adaptively triage questions from the dataset to various combinations of LLMs, such as ChatGPT, GPT-3, and GPT-4. Compared to the best individual LLM API, FrugalGPT saves up to 98% of the inference cost while maintaining the same performance on the downstream task. Alternatively, FrugalGPT can yield a performance boost of up to 4% for the same price. 

FrugalGPT’s LLM cascade technique requires labeled examples to be trained. In addition, the training and test examples should have the same or a similar distribution for the cascade to be effective. In addition, time and energy are needed to master the LLM cascade.

FrugalGPT seeks a balance between performance and cost, but other factors, including latency, fairness, privacy, and environmental impact, are more important in practice. The team believes that future studies should focus on including these features in optimization approaches without sacrificing performance or cost-effectiveness. The uncertainty of LLM-generated results also needs to be carefully quantified for use in risk-critical applications. 

Check out the Paper.

The post Stanford Researchers Introduce FrugalGPT: A New AI Framework For LLM APIs To Handle Natural Language Queries appeared first on MarkTechPost.

Researchers Introduce SPFlowNet: An End-To-End Self-Supervised Approac …

The field of scene flow estimation, which seeks to estimate motion between two successive frames of point clouds, is integral to a myriad of applications, from estimating the motion of objects around a vehicle in autonomous driving to analyzing sports movements. The development of 3D sensors, such as Lidar or stereo-vision cameras, has stimulated research into the topic of 3D scene flow estimation. In a paper from Nanjing University of Science and Technology, researchers in China have introduced a novel end-to-end self-supervised approach for scene flow estimation.

Traditionally, this task is performed in two steps: identifying points or clusters of interest (which could be moving) within a point cloud and then estimating the flow based on the calculated point displacement. The estimation of point cloud clusters typically relies on hand-crafted algorithms, which can yield inaccurate results for complex scenes. Once the clusters are generated by these algorithms, they remain fixed during the flow estimation step, leading to potential error propagation over time and imprecise flow estimation. This can occur when points with different underlying flow patterns – for example, points associated with two objects moving at different speeds – are assigned to the same superpoint. Recent approaches have explored the use of supervised methods employing deep neural networks to estimate the flow from point clouds directly, but the scarcity of labeled ground-truth data for flows makes the training of these models challenging. To address this issue, self-supervised learning methods have recently emerged as a promising framework for end-to-end scene flow learning from point clouds.

In their paper, the authors propose SPFlowNet (Super Points Flow guided scene estimation), an end-to-end self-supervised approach for scene flow estimation that builds on the superpoint segmentation work of SPNet. SPFlowNet takes as input two successive point clouds, P and Q (each containing 3-dimensional points), and estimates the flow in both directions (from P to Q and from Q to P). What sets this approach apart from others is its flow refinement process, which allows superpoints and flows to be updated dynamically. This process is an iterative loop that estimates pairs of flows F_t. The method can be summarized as follows:

At the outset (t=0), a feature encoder is applied to point clouds P and Q, which calculates an initial guess of the flow pair, F₀. Both point clouds and the flow estimate are then fed into an algorithm called farthest point sampling (FPS), which assigns superpoints to each point cloud.

For t>0, the estimated flows F_t and superpoints are iteratively updated as depicted in the image below. The flow refinement process uses the latest superpoint estimate to compute F_t, which is subsequently used to calculate the pair of superpoint clouds, SP_t. Both processes involve learnable operators.

The training of the neural network involves a specific loss function, L, which includes a regularized Chamfer loss with a penalty on the flow’s smoothness and consistency. The Chamfer loss is given by the following equation:

Here, points of P’ refer to points of the cloud P, moved by the estimated flow F_t.
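
The equation image did not carry over from the source. As a hedged reconstruction, the standard bidirectional Chamfer term between the warped cloud P′ and the target cloud Q (to which the paper adds its smoothness and consistency regularizers) reads:

L_Chamfer(P', Q) = \sum_{p \in P'} \min_{q \in Q} \| p - q \|_2^2 + \sum_{q \in Q} \min_{p \in P'} \| p - q \|_2^2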

The overall framework can be considered self-supervised because the loss function does not require any ground-truth flow. Notably, this approach achieves state-of-the-art results by a significant margin on the considered benchmark while being trained on modest hardware. However, as discussed in the paper, some parameters remain hand-tuned, including the unsupervised loss function, the number of iterations T, and the number of superpoint centers K.

In conclusion, the SPFlowNet presents a significant stride forward in 3D scene flow estimation, offering state-of-the-art results with modest hardware. Its dynamic refinement of flows and superpoints addresses crucial accuracy issues in current methodologies. This work showcases the potential of self-supervised learning for advancing applications where precise motion capture is important. 

[1] Learning representations for rigid motion estimation from point clouds. In CVPR, 2019.

[2] 3D scene flow estimation on pseudo-LiDAR: Bridging the gap on estimating point motion.

[3] Superpoint network for point cloud over-segmentation. In ICCV, 2021.

Check out the Paper.

The post Researchers Introduce SPFlowNet: An End-To-End Self-Supervised Approach For 3D Scene Flow Estimation appeared first on MarkTechPost.

Build a serverless meeting summarization backend with large language m …

AWS delivers services that meet customers' artificial intelligence (AI) and machine learning (ML) needs, ranging from custom hardware like AWS Trainium and AWS Inferentia to generative AI foundation models (FMs) on Amazon Bedrock. In February 2022, AWS and Hugging Face announced a collaboration to make generative AI more accessible and cost efficient.
Generative AI has grown at an accelerating rate from the largest pre-trained model in 2019 having 330 million parameters to more than 500 billion parameters today. The performance and quality of the models also improved drastically with the number of parameters. These models span tasks like text-to-text, text-to-image, text-to-embedding, and more. You can use large language models (LLMs), more specifically, for tasks including summarization, metadata extraction, and question answering.
Amazon SageMaker JumpStart is an ML hub that can help you accelerate your ML journey. With JumpStart, you can access pre-trained models and foundation models from the Foundation Models hub to perform tasks like article summarization and image generation. Pre-trained models are fully customizable for your use cases and can be easily deployed into production with the user interface or SDK. Most importantly, none of your data is used to train the underlying models. Because all data is encrypted and doesn't leave the virtual private cloud (VPC), you can trust that your data will remain private and confidential.
This post focuses on building a serverless meeting summarization backend that uses Amazon Transcribe to transcribe meeting audio and the Flan-T5-XL model from Hugging Face (available on JumpStart) for summarization.
Solution overview
The Meeting Notes Generator Solution creates an automated serverless pipeline using AWS Lambda for transcribing and summarizing audio and video recordings of meetings. The solution can be deployed with other FMs available on JumpStart.
The solution includes the following components:

A shell script for creating a custom Lambda layer
A configurable AWS CloudFormation template for deploying the solution
Lambda function code for starting Amazon Transcribe transcription jobs
Lambda function code for invoking a SageMaker real-time endpoint hosting the Flan T5 XL model

The following diagram illustrates this architecture.

As shown in the architecture diagram, the meeting recordings, transcripts, and notes are stored in respective Amazon Simple Storage Service (Amazon S3) buckets. The solution takes an event-driven approach to transcribe and summarize upon S3 upload events. The events trigger Lambda functions to make API calls to Amazon Transcribe and invoke the real-time endpoint hosting the Flan T5 XL model.
The CloudFormation template and instructions for deploying the solution can be found in the GitHub repository.
Real-time inference with SageMaker
Real-time inference on SageMaker is designed for workloads with low latency requirements. SageMaker endpoints are fully managed and support multiple hosting options and auto scaling. Once created, the endpoint can be invoked with the InvokeEndpoint API. The provided CloudFormation template creates a real-time endpoint with the default instance count of 1, but it can be adjusted based on expected load on the endpoint and as the service quota for the instance type permits. You can request service quota increases on the Service Quotas page of the AWS Management Console.
The following snippet of the CloudFormation template defines the SageMaker model, endpoint configuration, and endpoint using the ModelData and ImageURI of the Flan T5 XL from JumpStart. You can explore more FMs on Getting started with Amazon SageMaker JumpStart. To deploy the solution with a different model, replace the ModelData and ImageURI parameters in the CloudFormation template with the desired model S3 artifact and container image URI, respectively. Check out the sample notebook on GitHub for sample code on how to retrieve the latest JumpStart model artifact on Amazon S3 and the corresponding public container image provided by SageMaker.

# SageMaker Model
SageMakerModel:
  Type: AWS::SageMaker::Model
  Properties:
    ModelName: !Sub ${AWS::StackName}-SageMakerModel
    Containers:
      - Image: !Ref ImageURI
        ModelDataUrl: !Ref ModelData
        Mode: SingleModel
        Environment: {
          "MODEL_CACHE_ROOT": "/opt/ml/model",
          "SAGEMAKER_ENV": "1",
          "SAGEMAKER_MODEL_SERVER_TIMEOUT": "3600",
          "SAGEMAKER_MODEL_SERVER_WORKERS": "1",
          "SAGEMAKER_PROGRAM": "inference.py",
          "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code/",
          "TS_DEFAULT_WORKERS_PER_MODEL": 1
        }
    EnableNetworkIsolation: true
    ExecutionRoleArn: !GetAtt SageMakerExecutionRole.Arn

# SageMaker Endpoint Config
SageMakerEndpointConfig:
  Type: AWS::SageMaker::EndpointConfig
  Properties:
    EndpointConfigName: !Sub ${AWS::StackName}-SageMakerEndpointConfig
    ProductionVariants:
      - ModelName: !GetAtt SageMakerModel.ModelName
        VariantName: !Sub ${SageMakerModel.ModelName}-1
        InitialInstanceCount: !Ref InstanceCount
        InstanceType: !Ref InstanceType
        InitialVariantWeight: 1.0
        VolumeSizeInGB: 40

# SageMaker Endpoint
SageMakerEndpoint:
  Type: AWS::SageMaker::Endpoint
  Properties:
    EndpointName: !Sub ${AWS::StackName}-SageMakerEndpoint
    EndpointConfigName: !GetAtt SageMakerEndpointConfig.EndpointConfigName
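
For reference, the following is a minimal sketch of how JumpStart artifact locations are typically retrieved with the SageMaker Python SDK. The model ID and instance type here are assumptions, so take the exact values to plug into the ModelData and ImageURI parameters from the sample notebook in the repository:

# Sketch: look up the container image and model artifact for a JumpStart model
# (model ID and instance type are assumptions; verify against the sample notebook).
from sagemaker import image_uris, model_uris

model_id, model_version = "huggingface-text2text-flan-t5-xl", "*"
inference_instance_type = "ml.g5.2xlarge"

# Container image URI to use for the ImageURI CloudFormation parameter
image_uri = image_uris.retrieve(
    region=None,
    framework=None,
    image_scope="inference",
    model_id=model_id,
    model_version=model_version,
    instance_type=inference_instance_type,
)

# S3 location of the model artifact to use for the ModelData CloudFormation parameter
model_data = model_uris.retrieve(
    model_id=model_id,
    model_version=model_version,
    model_scope="inference",
)

print(image_uri)
print(model_data)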

Deploy the solution
For detailed steps on deploying the solution, follow the Deployment with CloudFormation section of the GitHub repository.
If you want to use a different instance type or more instances for the endpoint, submit a quota increase request for the desired instance type on the AWS Service Quotas Dashboard.
To use a different FM for the endpoint, replace the ImageURI and ModelData parameters in the CloudFormation template for the corresponding FM.
Test the solution
After you deploy the solution using the Lambda layer creation script and the CloudFormation template, you can test the architecture by uploading an audio or video meeting recording in any of the media formats supported by Amazon Transcribe. Complete the following steps:

On the Amazon S3 console, choose Buckets in the navigation pane.
From the list of S3 buckets, choose the S3 bucket created by the CloudFormation template named meeting-note-generator-demo-bucket-<aws-account-id>.
Choose Create folder.
For Folder name, enter the S3 prefix specified in the S3RecordingsPrefix parameter of the CloudFormation template (recordings by default).
Choose Create folder.
In the newly created folder, choose Upload.
Choose Add files and choose the meeting recording file to upload.
Choose Upload.

Now we can check for a successful transcription.

On the Amazon Transcribe console, choose Transcription jobs in the navigation pane.
Check that a transcription job with a corresponding name to the uploaded meeting recording has the status In progress or Complete.
When the status is Complete, return to the Amazon S3 console and open the demo bucket.
In the S3 bucket, open the transcripts/ folder.
Download the generated text file to view the transcription.

We can also check the generated summary.

In the S3 bucket, open the notes/ folder.
Download the generated text file to view the generated summary.

Prompt engineering
Even though LLMs have improved in the last few years, they can only accept inputs of limited length; inserting an entire meeting transcript may therefore exceed the model's limit and cause the invocation to fail. To design around this challenge, we break the context into manageable chunks by limiting the number of tokens per invocation. In this sample solution, the transcript is split into smaller chunks with a maximum number of tokens per chunk, each chunk is summarized using the Flan-T5-XL model, and the chunk summaries are then combined to form the context for the final combined summary, as shown in the following diagram.

The following code from the GenerateMeetingNotes Lambda function uses the Natural Language Toolkit (NLTK) library to tokenize the transcript, then it chunks the transcript into sections, each containing up to a certain number of tokens:

# Chunk transcript into chunks
# (CHUNK_LENGTH and `contents`, the parsed Amazon Transcribe output, are defined
# earlier in the Lambda function; the imports are shown here for clarity)
import math
from nltk.tokenize import word_tokenize
from nltk.tokenize.treebank import TreebankWordDetokenizer

transcript = contents['results']['transcripts'][0]['transcript']
transcript_tokens = word_tokenize(transcript)

num_chunks = int(math.ceil(len(transcript_tokens) / CHUNK_LENGTH))
transcript_chunks = []
for i in range(num_chunks):
    if i == num_chunks - 1:
        chunk = TreebankWordDetokenizer().detokenize(transcript_tokens[CHUNK_LENGTH * i:])
    else:
        chunk = TreebankWordDetokenizer().detokenize(transcript_tokens[CHUNK_LENGTH * i:CHUNK_LENGTH * (i + 1)])
    transcript_chunks.append(chunk)
After the transcript is broken up into smaller chunks, the following code invokes the SageMaker real-time inference endpoint to get summaries of each transcript chunk:
# Summarize each chunk
chunk_summaries = []
for i in range(len(transcript_chunks)):
    text_input = '{}\n{}'.format(transcript_chunks[i], instruction)
    payload = {
        "text_inputs": text_input,
        "max_length": 100,
        "num_return_sequences": 1,
        "top_k": 50,
        "top_p": 0.95,
        "do_sample": True
    }
    query_response = query_endpoint_with_json_payload(json.dumps(payload).encode('utf-8'))
    generated_texts = parse_response_multiple_texts(query_response)
    chunk_summaries.append(generated_texts[0])
    print(generated_texts[0])

Finally, the following code snippet combines the chunk summaries as the context to generate a final summary:

# Create a combined summary
text_input = '{}\n{}'.format(' '.join(chunk_summaries), instruction)
payload = {
    "text_inputs": text_input,
    "max_length": 100,
    "num_return_sequences": 1,
    "top_k": 50,
    "top_p": 0.95,
    "do_sample": True
}
query_response = query_endpoint_with_json_payload(json.dumps(payload).encode('utf-8'))
generated_texts = parse_response_multiple_texts(query_response)

results = {
    "summary": generated_texts,
    "chunk_summaries": chunk_summaries
}

The full GenerateMeetingNotes Lambda function can be found in the GitHub repository.
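For orientation, here is a minimal sketch of what the two helper functions used above might look like. The environment variable name and the "generated_texts" response key are assumptions based on common JumpStart Flan-T5 examples, so check the repository for the actual implementation:

import json
import os

import boto3

# Assumption: the endpoint name is passed to the Lambda function as an environment variable.
ENDPOINT_NAME = os.environ.get("SAGEMAKER_ENDPOINT_NAME", "meeting-notes-flan-t5-xl")
sagemaker_runtime = boto3.client("sagemaker-runtime")

def query_endpoint_with_json_payload(encoded_json):
    """Invoke the SageMaker real-time endpoint with a JSON payload."""
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=encoded_json,
    )
    return response

def parse_response_multiple_texts(query_response):
    """Parse the generated texts out of the endpoint response body."""
    model_predictions = json.loads(query_response["Body"].read())
    # JumpStart text2text models typically return their outputs under "generated_texts".
    return model_predictions["generated_texts"]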
Clean up
To clean up the solution, complete the following steps:

Delete all objects in the demo S3 bucket and the logs S3 bucket.
Delete the CloudFormation stack.
Delete the Lambda layer.

Conclusion
This post demonstrated how to use FMs on JumpStart to quickly build a serverless meeting notes generator architecture with AWS CloudFormation. Combined with AWS AI services like Amazon Transcribe and serverless technologies like Lambda, you can use FMs on JumpStart and Amazon Bedrock to build applications for various generative AI use cases.
For additional posts on ML at AWS, visit the AWS ML Blog.

About the author
Eric Kim is a Solutions Architect (SA) at Amazon Web Services. He works with game developers and publishers to build scalable games and supporting services on AWS. He primarily focuses on applications of artificial intelligence and machine learning.

Prepare training and validation dataset for facies classification usin …

This post is co-written with Thatcher Thornberry from bpx energy. 
Facies classification is the process of segmenting lithologic formations from geologic data at the wellbore location. During drilling, wireline logs are obtained, which have depth-dependent geologic information. Geologists are deployed to analyze this log data and determine depth ranges for potential facies of interest from the different types of log data. Accurately classifying these regions is critical for the drilling processes that follow.
Facies classification using AI and machine learning (ML) has become an increasingly popular area of investigation for many oil majors. Many data scientists and business analysts at large oil companies don’t have the necessary skillset to run advanced ML experiments on important tasks such as facies classification. To address this, we show you how to easily prepare and train a best-in-class ML classification model on this problem.
In this post, aimed primarily at those who are already using Snowflake, we explain how you can import both training and validation data for a facies classification task from Snowflake into Amazon SageMaker Canvas and subsequently train a 3+ category prediction model.
Solution overview
Our solution consists of the following steps:

Upload facies CSV data from your local machine to Snowflake. For this post, we use data from the following open-source GitHub repo.
Configure AWS Identity and Access Management (IAM) roles for Snowflake and create a Snowflake integration.
Create a secret for Snowflake credentials (optional, but advised).
Import Snowflake directly into Canvas.
Build a facies classification model.
Analyze the model.
Run batch and single predictions using the multi-class model.
Share the trained model to Amazon SageMaker Studio.

Prerequisites
Prerequisites for this post include the following:

An AWS account.
Canvas set up, with an Amazon SageMaker user profile associated with it.
A Snowflake account. For steps to create a Snowflake account, refer to How to: Create a Snowflake Free Trial Account
The Snowflake CLI (SnowSQL). For steps to install the CLI and connect to Snowflake with it, refer to SnowSQL, the command line interface for connecting to Snowflake.
An existing database within Snowflake.

Upload facies CSV data to Snowflake
In this section, we take two open-source datasets and upload them directly from our local machine to a Snowflake database. From there, we set up an integration layer between Snowflake and Canvas.

Download the training_data.csv and validation_data_nofacies.csv files to your local machine. Make note of where you saved them.
Ensure that you have the correct Snowflake credentials and have installed the SnowSQL CLI, then log in. For more information, refer to Log into SnowSQL.
Select the appropriate Snowflake warehouse to work within, which in our case is COMPUTE_WH:

USE WAREHOUSE COMPUTE_WH;

Choose a database to use for the remainder of the walkthrough:

use demo_db;

Create a named file format that will describe a set of staged data to access or load into Snowflake tables.

This can be run either in the Snowflake CLI or in a Snowflake worksheet on the web application. For this post, we run a SnowSQL query in the web application. See Getting Started With Worksheets for instructions to create a worksheet on the Snowflake web application.
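
The exact statement from the original post isn't reproduced here; a minimal sketch, assuming a comma-delimited CSV with a header row, looks like the following (adjust the options to match your files):

CREATE OR REPLACE FILE FORMAT my_csv_format
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1
  FIELD_OPTIONALLY_ENCLOSED_BY = '"';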

Create a table in Snowflake using the CREATE statement.

The following statement creates a new table in the current or specified schema (or replaces an existing table).
It’s important that the data types and the order in which they appear are correct, and align with what is found in the CSV files that we previously downloaded. If they’re inconsistent, we’ll run into issues later when we try to copy the data across.
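
The CREATE statement itself is not included in this excerpt. As a hedged sketch based on the columns used later in this post and the open-source facies dataset, the training table might look like this (verify the types and order against your CSV):

CREATE OR REPLACE TABLE TRAINING_DATA (
  FACIES NUMBER,
  FORMATION VARCHAR,
  WELL_NAME VARCHAR,
  DEPTH FLOAT,
  GR FLOAT,
  ILD_LOG10 FLOAT,
  DELTAPHI FLOAT,
  PHIND FLOAT,
  PE FLOAT,
  NM_M NUMBER,
  RELPOS FLOAT
);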

Do the same for the validation database.

Note that the schema is a little different from the training data. Again, ensure that the data types and column or feature orders are correct.
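
As above, a hedged sketch of the validation table, which, per the SELECT statement used later in this post, omits the FACIES column:

CREATE OR REPLACE TABLE VALIDATION_DATA (
  FORMATION VARCHAR,
  WELL_NAME VARCHAR,
  DEPTH FLOAT,
  GR FLOAT,
  ILD_LOG10 FLOAT,
  DELTAPHI FLOAT,
  PHIND FLOAT,
  PE FLOAT,
  NM_M NUMBER,
  RELPOS FLOAT
);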

Load the CSV data file from your local system into the Snowflake staging environment:

The following is the syntax of the statement for Windows OS:

put file://D:\path-to-file.csv @DB_Name.PUBLIC.%table_name;

The following is the syntax of the statement for Mac OS:

put file:///path-to-file.csv @DB_NAME.PUBLIC.%table_name;

The following screenshot shows an example command and output from within the SnowSQL CLI.

Copy the data into the target Snowflake table.

Here, we load the training CSV data to the target table, which we created earlier. Note that you have to do this for both the training and validation CSV files, copying them into the training and validation tables, respectively.
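
The COPY statement is not shown in this excerpt. A minimal sketch, assuming the files were staged to the table stages with the PUT commands above and use the my_csv_format file format, is:

COPY INTO TRAINING_DATA
  FROM @%TRAINING_DATA
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');

COPY INTO VALIDATION_DATA
  FROM @%VALIDATION_DATA
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');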

Verify that the data has been loaded into the target table by running a SELECT query (you can do this for both the training and validation data):

select * from TRAINING_DATA

Configure Snowflake IAM roles and create the Snowflake integration
As a prerequisite for this section, please follow the official Snowflake documentation on how to configure a Snowflake Storage Integration to Access Amazon S3.
Retrieve the IAM user for your Snowflake account
Once you have successfully configured your Snowflake storage integration, run the following DESCRIBE INTEGRATION command to retrieve the ARN for the IAM user that was created automatically for your Snowflake account:

DESC INTEGRATION SAGEMAKER_CANVAS_INTEGRATION;

Record the following values from the output:

STORAGE_AWS_IAM_USER_ARN – The IAM user created for your Snowflake account
STORAGE_AWS_EXTERNAL_ID – The external ID needed to establish a trust relationship

Update the IAM role trust policy
Now we update the trust policy:

On the IAM console, choose Roles in the navigation pane.
Choose the role you created.
On the Trust relationship tab, choose Edit trust relationship.
Modify the policy document as shown in the following code with the DESC STORAGE INTEGRATION output values you recorded in the previous step.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "<snowflake_user_arn>"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<snowflake_external_id>"
        }
      }
    }
  ]
}

Choose Update trust policy.

Create an external stage in Snowflake
We use an external stage within Snowflake for loading data from an S3 bucket in your own account into Snowflake. In this step, we create an external (Amazon S3) stage that references the storage integration you created. For more information, see Creating an S3 Stage.
This requires a role that has the CREATE STAGE privilege on the schema as well as the USAGE privilege on the storage integration. You can grant these privileges to the role as shown in the code in the next step.
Create the stage using the CREATE STAGE command, with placeholders for the external stage name and for the S3 bucket and prefix. The stage also references a named file format object called my_csv_format:

grant create stage on schema public to role <iam_role>;
grant usage on integration SAGEMAKER_CANVAS_INTEGRATION to role <iam_role>;

create stage <external_stage>
  storage_integration = SAGEMAKER_CANVAS_INTEGRATION
  url = '<s3_bucket>/<prefix>'
  file_format = my_csv_format;

Create a secret for Snowflake credentials
Canvas allows you to use the ARN of an AWS Secrets Manager secret or a Snowflake account name, user name, and password to access Snowflake. If you intend to use the Snowflake account name, user name, and password option, skip to the next section, which covers adding the data source.
To create a Secrets Manager secret manually, complete the following steps:

On the Secrets Manager console, choose Store a new secret.
For Select secret type, select Other types of secrets.
Specify the details of your secret as key-value pairs.

The key names are case-sensitive and must be lowercase.
If you prefer, you can use the plaintext option and enter the secret values as JSON:

{
  "username": "<snowflake username>",
  "password": "<snowflake password>",
  "accountid": "<snowflake account id>"
}

Choose Next.
For Secret name, add the prefix AmazonSageMaker (for example, our secret is AmazonSageMaker-CanvasSnowflakeCreds).
In the Tags section, add a tag with the key SageMaker and value true.

Choose Next.
The rest of the fields are optional; choose Next until you have the option to choose Store to store the secret.
After you store the secret, you’re returned to the Secrets Manager console.
Choose the secret you just created, then retrieve the secret ARN.
Store this in your preferred text editor for use later when you create the Canvas data source.

Import Snowflake directly into Canvas
To import your facies dataset directly into Canvas, complete the following steps:

On the SageMaker console, choose Amazon SageMaker Canvas in the navigation pane.
Choose your user profile and choose Open Canvas.
On the Canvas landing page, choose Datasets in the navigation pane.
Choose Import.

Choose Snowflake as the data source.
Enter the ARN of the Snowflake secret that we previously created, the storage integration name (SAGEMAKER_CANVAS_INTEGRATION), and a unique connection name of your choosing.
Choose Add connection.

If all the entries are valid, you should see all the databases associated with the connection in the navigation pane (see the following example for NICK_FACIES).

Choose the TRAINING_DATA table, then choose Preview dataset.

If you’re happy with the data, you can edit the custom SQL in the data visualizer.

Choose Edit in SQL.
Run the following SQL command before importing into Canvas. (This assumes that the database is called NICK_FACIES. Replace this value with your database name.)

SELECT "FACIES", "FORMATION", "WELL_NAME", "DEPTH", "GR", "ILD_LOG10", "DELTAPHI", "PHIND", "PE", "NM_M", "RELPOS" FROM "NICK_FACIES"."PUBLIC"."TRAINING_DATA";

Something similar to the following screenshot should appear in the Import preview section.

If you’re happy with the preview, choose Import data.

Choose an appropriate data name, ensuring that it’s unique and fewer than 32 characters long.
Use the following command to import the validation dataset, using the same method as earlier:

SELECT "FORMATION", "WELL_NAME", "DEPTH", "GR", "ILD_LOG10", "DELTAPHI", "PHIND", "PE", "NM_M", "RELPOS" FROM "NICK_FACIES"."PUBLIC"."VALIDATION_DATA";

Build a facies classification model
To build your facies classification model, complete the following steps:

Choose Models in the navigation pane, then choose New Model.
Give your model a suitable name.
On the Select tab, choose the recently imported training dataset, then choose Select dataset.
On the Build tab, drop the WELL_NAME column.

We do this because the well names themselves aren't useful information for the ML model; they are merely arbitrary labels used to distinguish between the wells, and the name given to a particular well is irrelevant to the model.

Choose FACIES as the target column.
Leave Model type as 3+ category prediction.
Validate the data.
Choose Standard build.

Your page should look similar to the following screenshot just before building your model.

After you choose Standard build, the model enters the analyze stage. You’re provided an expected build time. You can now close this window, log out of Canvas (in order to avoid charges), and return to Canvas at a later time.
Analyze the facies classification model
To analyze the model, complete the following steps:

Federate back into Canvas.
Locate your previously created model, choose View, then choose Analyze.
On the Overview tab, you can see the impact that individual features are having on the model output.
In the right pane, you can visualize the impact that a given feature (X axis) is having on the prediction of each facies class (Y axis).

These visualizations will change accordingly depending on the feature you select. We encourage you to explore this page by cycling through all 9 classes and 10 features.

On the Scoring tab, we can see the predicted vs. actual facies classification.
Choose Advanced metrics to view F1 scores, average accuracy, precision, recall, and AUC.
Again, we encourage viewing all the different classes.
Choose Download to download an image to your local machine.

In the following image, we can see a number of different advanced metrics, such as the F1 score. In statistical analysis, the F1 score conveys the balance between the precision and the recall of a classification model, and is computed using the following equation: 2*((Precision * Recall)/ (Precision + Recall)).

Run batch and single prediction using the multi-class facies classification model
To run a prediction, complete the following steps:

Choose Single prediction to modify the feature values as needed, and get a facies classification returned on the right of the page.

You can then copy the prediction chart image to your clipboard, and also download the predictions into a CSV file.

Choose Batch prediction and then choose Select dataset to choose the validation dataset you previously imported.
Choose Generate predictions.

You’re redirected to the Predict page, where the Status will read Generating predictions for a few seconds.
After the predictions are returned, you can preview, download, or delete the predictions by choosing the options menu (three vertical dots) next to the predictions.

The following is an example of a predictions preview.

Share a trained model in Studio
You can now share the latest version of the model with another Studio user. This allows data scientists to review the model in detail, test it, make any changes that may improve accuracy, and share the updated model back with you.
The ability to share your work with a more technical user within Studio is a key feature of Canvas, given the distinct workflows of different ML personas. Note the strong focus here on collaboration between cross-functional teams with differing technical abilities.

Choose Share to share the model.

Choose which model version to share.
Enter the Studio user to share the model with.
Add an optional note.
Choose Share.

Conclusion
In this post, we showed how with just a few clicks in Amazon SageMaker Canvas you can prepare and import your data from Snowflake, join your datasets, analyze estimated accuracy, verify which columns are impactful, train the best performing model, and generate new individual or batch predictions. We’re excited to hear your feedback and help you solve even more business problems with ML. To build your own models, see Getting started with using Amazon SageMaker Canvas.

About the Authors
Nick McCarthy is a Machine Learning Engineer in the AWS Professional Services team. He has worked with AWS clients across various industries including healthcare, finance, sports, telecoms and energy to accelerate their business outcomes through the use of AI/ML. Working with the bpx data science team, Nick recently finished building bpx’s Machine Learning platform on Amazon SageMaker.
Thatcher Thornberry is a Machine Learning Engineer at bpx Energy. He supports bpx’s data scientists by developing and maintaining the company’s core Data Science platform in Amazon SageMaker. In his free time he loves to hack on personal coding projects and spend time outdoors with his wife.

A Distributed Inference Serving System For Large Language Models LLMs

Large language model (LLM) improvements create opportunities in various fields and inspire a new wave of interactive AI applications. The most noteworthy one is ChatGPT, which enables people to communicate informally with an AI agent to resolve problems ranging from software engineering to language translation. ChatGPT is one of the fastest-growing programs in history, thanks to its remarkable capabilities. Many companies follow the trend of releasing LLMs and ChatGPT-like products, including Microsoft’s New Bing, Google’s Bard, Meta’s LLaMa, Stanford’s Alpaca, Databricks’ Dolly, and UC Berkeley’s Vicuna. 

LLM inference differs from the inference of other deep neural network (DNN) models, such as ResNet, because it has special traits. Interactive AI applications built on LLMs must provide inference to function, and their interactive design requires quick job completion times (JCT) to deliver engaging user experiences. For instance, consumers expect an immediate response when they submit a prompt to ChatGPT. However, the inference serving infrastructure is under great strain due to the number and complexity of LLMs, and businesses set up pricey clusters with accelerators like GPUs and TPUs to handle LLM inference operations.

DNN inference jobs are often deterministic and highly predictable, i.e., the model and the hardware largely determine the inference job's execution time. For instance, the execution time for different input images varies only a little when using the same ResNet model on a given GPU. LLM inference jobs, in contrast, follow a unique autoregressive pattern. An LLM inference job goes through several iterations; each iteration produces one output token, which is appended to the input to generate the next token in the following iteration. The output length, which is unknown at the outset, affects both the execution time and the input length. Most existing inference serving systems, such as Clockwork and Shepherd, cater to deterministic model inference tasks like those of ResNet.

They base their scheduling decisions on precise execution time profiling, which is ineffective for LLM inference with variable execution times. The most advanced method for LLM inference is Orca. It proposes iteration-level scheduling, allowing new jobs to be added to, or completed jobs removed from, the current processing batch after each iteration. However, it processes inference jobs first-come, first-served (FCFS): a scheduled job runs continuously until it is completed. The processing batch cannot be grown with an arbitrary number of incoming jobs because of the limited GPU memory capacity and the tight JCT requirements of inference jobs, and run-to-completion processing is well known to suffer from head-of-line blocking.

Because LLMs are vast in size and take a long time to execute in absolute terms, the issue is particularly severe for LLM inference operations. Long LLM inference jobs, especially those with lengthy outputs, take a long time to complete and block subsequent short jobs. Researchers from Peking University developed a distributed inference serving solution for LLMs called FastServe. To enable preemption at the granularity of each output token, FastServe exploits iteration-level scheduling and the autoregressive pattern of LLM inference. FastServe can choose whether to continue a scheduled job after it has generated an output token or to preempt it with another job in the queue. This allows FastServe to reduce JCT and head-of-line blocking via preemptive scheduling.

A unique skip-join Multi-Level Feedback Queue (MLFQ) scheduler serves as the foundation of FastServe. MLFQ is a well-known method for minimizing average JCT in information-free settings: each job starts in the highest-priority queue and, if it doesn't finish within a certain time, is demoted to the next priority queue. LLM inference is semi-information agnostic, meaning that while the output length is not known a priori, the input length is. This is the main distinction between LLM inference and the conventional setting. The input length determines the execution time needed to generate the first output token, which can take much longer than that of the following tokens because of the autoregressive pattern of LLM inference.

The first output token's execution time takes up most of the work when the input is long and the output is short. They use this property to add skip-join to the traditional MLFQ. Rather than always entering the highest-priority queue, each arriving job joins an appropriate queue by comparing the execution time of its first output token with the demotion thresholds of the queues; the queues with higher priority than the joined queue are skipped to minimize demotions. Preemptive scheduling with MLFQ adds extra memory overhead to keep started but incomplete jobs in an interim state. LLMs maintain a key-value cache for each Transformer layer to store this intermediate state. Under FCFS, the cache only needs to store the intermediate states of the scheduled jobs, as long as the batch size is not exceeded. With MLFQ, however, additional jobs may have started but been relegated to lower-priority queues, and the cache must hold the interim state of every started but incomplete job. Given the size of LLMs and the limited memory of GPUs, the cache may overflow. When the cache is full, the scheduler could naively delay initiating new jobs, but this again creates head-of-line blocking.
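
A hedged sketch of the skip-join queue assignment as described above (not the paper's code): an arriving job skips queues whose demotion threshold, or quantum, is shorter than the estimated time to generate its first output token, which depends on the input length.

# Skip-join queue assignment sketch: join the highest-priority queue whose quantum
# can accommodate the estimated first-token execution time.
def assign_queue(first_token_time, quantums):
    """quantums: per-queue time slices, ordered from highest priority (smallest) to lowest."""
    for level, quantum in enumerate(quantums):
        if first_token_time <= quantum:
            return level          # join this queue and skip all higher-priority ones
    return len(quantums) - 1      # very long first iterations go straight to the lowest queue

# Example: thresholds of 10 ms, 20 ms, 40 ms, 80 ms; a prompt whose first token is
# estimated at 35 ms skips the first two queues and enters level 2.
print(assign_queue(0.035, [0.010, 0.020, 0.040, 0.080]))  # -> 2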

Instead, they develop an efficient GPU memory management mechanism that proactively uploads the state of jobs in low-priority queues when they are about to be scheduled and offloads state when the cache is almost full. To increase efficiency, they employ pipelining and asynchronous memory operations. FastServe also uses parallelization techniques such as tensor and pipeline parallelism to provide distributed inference serving across many GPUs for huge models that do not fit on a single GPU. To reduce pipeline bubbles, the scheduler runs multiple batches of jobs concurrently. A distributed key-value cache manager organizes the key-value cache and coordinates memory swapping between GPU and host memory. The authors implemented a FastServe prototype based on NVIDIA FasterTransformer. The results show that FastServe improves average and tail JCT by up to 5.1x and 6.4x, respectively, compared with the state-of-the-art solution Orca.

Check out the Paper.

The post Peking University Researchers Introduce FastServe: A Distributed Inference Serving System For Large Language Models LLMs appeared first on MarkTechPost.

9+ Use Cases of ChatGPT that Went on Steroids

With internet access now available to it, ChatGPT has effectively been put on steroids. Here are some examples of how you can put the latest version of ChatGPT to work:

Rapid creation of content based on current events, trends, and analytics.  

Example: “Generate a caption & prompt for Midjourney to create a post for Instagram [Username] using yesterday’s news for [Topic]. Finally, utilizing the last week’s engagement statistics, recommend a posting time.”

Job hunting 

Example: “Find a list of remote [Field] jobs that fit my profile, with a pay range of $70k-90k. Using that list, edit my resume & cover letter for each job listing. And lastly, submit each application and use Excel sheets to keep track of each application.”

Plan your trip with up-to-the-minute information.  

Example: “Find me the quickest flight from [Location] to [Location] within 27 hours. Once landed, book a hotel under $200 in [Location] that also has a hot tub, 2 beds, and is near the city.”

Find the most recent trends  

Example: “Give me a list of trends forming within the last week in the [Topic]. Using that list, create multiple visuals and a presentation for [company] on Google Slides about how [Topic] is the next big thing”

Discover the best discounts on the web. 

Example: “Search for the three lowest priced [items] across all e-commerce websites. Show me the best options with over 4.5 ratings, that ship to [Location].”

Updates on the stock market in real-time 

Example: “What’s the current price of all the stocks in my [portfolio]? How has its value changed in the last 7 days?”

Summarization

Example: “Summarize the following [Article] in a paragraph. Then, send that summary to my boss on his [email]”

Explore the most current literature, media, and online resources.  

Example: “Conduct market research on [Website #1] and compare it to our [Website #2].   Give me a list of ways to improve our website and a step-by-step plan to achieve it.”

Create new promotional plans every day. 

Example: “Generate an SEO strategy for [company] around the [Topic].”

Personal Portfolio Manager

Example: “Using data from the latest Snowflake, Google, and Amazon earnings, generate graphs predicting each company’s growth within [Topic].  Then, provide me a thesis on each stock and how they fit within my portfolio.”

This article is based on this tweet thread.
The post 9+ Use Cases of ChatGPT that Went on Steroids appeared first on MarkTechPost.

A Consistency Model-Based Method For Speech Synthesis That Achieves Fa …

With the growing human-machine interaction and entertainment applications, text-to-speech (TTS) and singing voice synthesis (SVS) tasks have been widely included in speech synthesis, which strives to generate realistic audio of people. Deep neural network (DNN)-based methods have largely taken over the field of speech synthesis. Typically, a two-stage pipeline is used, with the acoustic model converting text and other controlling information into acoustic features (such as mel-spectrograms) before the vocoder further converts the acoustic features into audible waveforms. 

The two-stage pipeline has succeeded because it acts as a “relay” to solve the dimension-explosion problem of translating short texts into long audio with a high sampling frequency. Acoustic characteristics are described frame by frame, and the acoustic feature produced by the acoustic model, often a mel-spectrogram, significantly impacts the quality of the synthesized speech. Convolutional neural networks (CNNs) and Transformers are frequently employed in industry-standard methods like Tacotron, DurIAN, and FastSpeech to predict the mel-spectrogram from the controlling input. The ability of diffusion models to generate high-quality samples has gained a lot of interest. A diffusion model, also known as a score-based model, consists of two processes: a diffusion process that gradually perturbs data into noise and a reverse process that gradually transforms noise back into data. The diffusion model's need for many iterations during generation is a serious drawback. Several diffusion-based techniques have been proposed for acoustic modeling in speech synthesis, but slow generation speed remains an issue in most of these works.

Grad-TTS formulates the noise-to-mel-spectrogram transformation as a stochastic differential equation (SDE) and solves the corresponding reverse SDE. Despite producing great audio quality, its inference speed is slow because the reverse process requires many iterations (10 to 1,000). ProDiff was later developed with progressive distillation to reduce the number of sampling steps. DiffGAN-TTS (Liu et al.) uses an adversarially trained model to approximate the denoising function for efficient speech synthesis. ResGrad (Chen et al.) uses a diffusion model to estimate the residual between the output of a pre-trained FastSpeech2 and the ground truth.

From the description above, it is clear that speech synthesis has three goals: 

• Excellent audio quality: The generative model should faithfully capture the subtleties of the speaking voice that add to the expressiveness and naturalness of the synthesized audio. Recent research has focused on voices with more intricate changes in pitch, timing, and emotion in addition to the distinctive speaking voice. Diffsinger, for instance, demonstrates how a well-designed diffusion model may provide a synthesized singing voice of good quality after 100 iterations. Additionally, it’s important to prevent artifacts and distortions in the created audio.

• Quick inference: Fast audio synthesis is necessary for real-time applications, including communication, interactive speech, and music systems. Being merely faster than real time is insufficient when time must also be left for other algorithms in an integrated system. 

• Beyond speaking: Beyond the ordinary speaking voice, more intricate voice modeling, such as the singing voice, is needed in terms of pitch, emotion, rhythm, breath control, and timbre. 

Although numerous attempts have been made, the trade-off between synthesized audio quality, model capability, and inference speed persists in TTS, and it is even more pronounced in SVS because of the sampling mechanism of the denoising diffusion process. Existing approaches often aim to mitigate rather than completely resolve the slow-inference problem, and even then they remain slower than traditional non-diffusion approaches such as FastSpeech2. 

The consistency model has recently been developed; it produces high-quality images with just one sampling step by expressing the stochastic differential equation (SDE) describing the sampling process as an ordinary differential equation (ODE) and further enforcing the consistency property of the model along the ODE trajectory. Despite this success in image synthesis, there is currently no known speech synthesis model based on the consistency model. This suggests that it is possible to develop a consistency-model-based speech synthesis technique that combines high-quality synthesis with fast inference. 

In this study, researchers from Hong Kong Baptist University, the Hong Kong University of Science and Technology, Microsoft Research Asia, and the Hong Kong Institute of Science & Innovation offer CoMoSpeech, a fast, high-quality speech synthesis approach based on consistency models. CoMoSpeech is distilled from a teacher model that has already been trained. More specifically, the teacher model uses an SDE to learn the corresponding score function and to smoothly transform the mel-spectrogram distribution into a Gaussian noise distribution. After training, they build the teacher denoiser function using the associated numerical ODE solver, which is then used for consistency distillation. The distilled CoMoSpeech inherits the consistency property and can ultimately generate high-quality audio in a single sampling step. 

The findings of their TTS and SVS experiments demonstrate that CoMoSpeech can synthesize speech in a single sampling step, more than 150 times faster than real time. The evaluation of audio quality also shows that CoMoSpeech delivers audio quality that is better than or on par with other diffusion-model techniques requiring tens to hundreds of iterations, making diffusion-model-based speech synthesis practical, the authors argue, for the first time. Several audio examples are available on their project website.

Check out the Paper and Project.

The post Meet CoMoSpeech: A Consistency Model-Based Method For Speech Synthesis That Achieves Fast And High-Quality Audio Generation appeared first on MarkTechPost.

A better way to study ocean currents

To study ocean currents, scientists release GPS-tagged buoys in the ocean and record their velocities to reconstruct the currents that transport them. These buoy data are also used to identify “divergences,” which are areas where water rises up from below the surface or sinks beneath it.

By accurately predicting currents and pinpointing divergences, scientists can more precisely forecast the weather, approximate how oil will spread after a spill, or measure energy transfer in the ocean. A new model that incorporates machine learning makes more accurate predictions than conventional models do, a new study reports.

A multidisciplinary research team including computer scientists at MIT and oceanographers has found that a standard statistical model typically used on buoy data can struggle to accurately reconstruct currents or identify divergences because it makes unrealistic assumptions about the behavior of water.

The researchers developed a new model that incorporates knowledge from fluid dynamics to better reflect the physics at work in ocean currents. They show that their method, which only requires a small amount of additional computational expense, is more accurate at predicting currents and identifying divergences than the traditional model.

This new model could help oceanographers make more accurate estimates from buoy data, which would enable them to more effectively monitor the transportation of biomass (such as Sargassum seaweed), carbon, plastics, oil, and nutrients in the ocean. This information is also important for understanding and tracking climate change.

“Our method captures the physical assumptions more appropriately and more accurately. In this case, we know a lot of the physics already. We are giving the model a little bit of that information so it can focus on learning the things that are important to us, like what are the currents away from the buoys, or what is this divergence and where is it happening?” says senior author Tamara Broderick, an associate professor in MIT’s Department of Electrical Engineering and Computer Science (EECS) and a member of the Laboratory for Information and Decision Systems and the Institute for Data, Systems, and Society.

Broderick’s co-authors include lead author Renato Berlinghieri, an electrical engineering and computer science graduate student; Brian L. Trippe, a postdoc at Columbia University; David R. Burt and Ryan Giordano, MIT postdocs; Kaushik Srinivasan, an assistant researcher in atmospheric and ocean sciences at the University of California at Los Angeles; Tamay Özgökmen, professor in the Department of Ocean Sciences at the University of Miami; and Junfei Xia, a graduate student at the University of Miami. The research will be presented at the International Conference on Machine Learning.

Diving into the data

Oceanographers use data on buoy velocity to predict ocean currents and identify “divergences” where water rises to the surface or sinks deeper.

To estimate currents and find divergences, oceanographers have used a machine-learning technique known as a Gaussian process, which can make predictions even when data are sparse. To work well in this case, the Gaussian process must make assumptions about the data to generate a prediction.

A standard way of applying a Gaussian process to oceans data assumes the latitude and longitude components of the current are unrelated. But this assumption isn’t physically accurate. For instance, this existing model implies that a current’s divergence and its vorticity (a whirling motion of fluid) operate on the same magnitude and length scales. Ocean scientists know this is not true, Broderick says. The previous model also assumes the frame of reference matters, which means fluid would behave differently in the latitude versus the longitude direction.

“We were thinking we could address these problems with a model that incorporates the physics,” she says.

They built a new model that uses what is known as a Helmholtz decomposition to accurately represent the principles of fluid dynamics. This method models an ocean current by breaking it down into a vorticity component (which captures the whirling motion) and a divergence component (which captures water rising or sinking).
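
In two dimensions, the Helmholtz decomposition they describe can be written (in a standard textbook form, not notation taken from the paper) as a sum of a divergent part derived from a scalar potential and a rotational part derived from a stream function:

F(x, y) = \nabla \Phi(x, y) + \mathrm{rot}\, \Psi(x, y), \qquad \nabla \Phi = \Big( \tfrac{\partial \Phi}{\partial x}, \tfrac{\partial \Phi}{\partial y} \Big), \quad \mathrm{rot}\, \Psi = \Big( \tfrac{\partial \Psi}{\partial y}, -\tfrac{\partial \Psi}{\partial x} \Big)

The divergence of F then depends only on the potential function \Phi, and its vorticity only on the stream function \Psi, which is exactly the separation into a divergence component and a vorticity component described above.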

In this way, they give the model some basic physics knowledge that it uses to make more accurate predictions.

This new model utilizes the same data as the old model. And while their method can be more computationally intensive, the researchers show that the additional cost is relatively small.

Buoyant performance

They evaluated the new model using synthetic and real ocean buoy data. Because the synthetic data were fabricated by the researchers, they could compare the model’s predictions to ground-truth currents and divergences. But simulation involves assumptions that may not reflect real life, so the researchers also tested their model using data captured by real buoys released in the Gulf of Mexico.

In each case, their method demonstrated superior performance for both tasks, predicting currents and identifying divergences, when compared to the standard Gaussian process and another machine-learning approach that used a neural network. For example, in one simulation that included a vortex adjacent to an ocean current, the new method correctly predicted no divergence while the previous Gaussian process method and the neural network method both predicted a divergence with very high confidence.

The technique is also good at identifying vortices from a small set of buoys, Broderick adds.

Now that they have demonstrated the effectiveness of using a Helmholtz decomposition, the researchers want to incorporate a time element into their model, since currents can vary over time as well as space. In addition, they want to better capture how noise impacts the data, such as winds that sometimes affect buoy velocity. Separating that noise from the data could make their approach more accurate.

“Our hope is to take this noisily observed field of velocities from the buoys, and then say what is the actual divergence and actual vorticity, and predict away from those buoys, and we think that our new technique will be helpful for this,” she says.

“The authors cleverly integrate known behaviors from fluid dynamics to model ocean currents in a flexible model,” says Massimiliano Russo, an associate biostatistician at Brigham and Women’s Hospital and instructor at Harvard Medical School, who was not involved with this work. “The resulting approach retains the flexibility to model the nonlinearity in the currents but can also characterize phenomena such as vortices and connected currents that would only be noticed if the fluid dynamic structure is integrated into the model. This is an excellent example of where a flexible model can be substantially improved with a well thought and scientifically sound specification.”

This research is supported, in part, by the Office of Naval Research, a National Science Foundation (NSF) CAREER Award, and the Rosenstiel School of Marine, Atmospheric, and Earth Science at the University of Miami.

An AI challenge only humans can solve

The Dark Ages were not entirely dark. Advances in agriculture and building technology increased Medieval wealth and led to a wave of cathedral construction in Europe. However, it was a time of profound inequality. Elites captured virtually all economic gains. In Britain, as Canterbury Cathedral soared upward, peasants had no net increase in wealth between 1100 and 1300. Life expectancy hovered around 25 years. Chronic malnutrition was rampant.

“We’ve been struggling to share prosperity for a long time,” says MIT Professor Simon Johnson. “Every cathedral that your parents dragged you to see in Europe is a symbol of despair and expropriation, made possible by higher productivity.”

At a glance, this might not seem relevant to life in 2023. But Johnson and his MIT colleague Daron Acemoglu, both economists, think it is. Technology drives economic progress. As innovations take hold, one perpetual question is: Who benefits?

This applies, the scholars believe, to automation and artificial intelligence, which is the focus of a new book by Acemoglu and Johnson, “Power and Progress: Our 1000-Year Struggle Over Technology and Prosperity,” published this week by PublicAffairs. In it, they examine who reaped the rewards from past innovations and who may gain from AI today, economically and politically.

“The book is about the choices we make with technology,” Johnson says. “That’s a very MIT type of theme. But a lot of people feel technology just descends on you, and you have to live with it.”

AI could develop as a beneficial force, Johnson says. However, he adds, “Many algorithms are being designed to try to replace humans as much as possible. We think that’s entirely wrong. The way we make progress with technology is by making machines useful to people, not displacing them. In the past we have had automation, but with new tasks for people to do and sufficient countervailing power in society.”

Today, AI is a tool of social control for some governments that also creates riches for a small number of people, according to Acemoglu and Johnson. “The current path of AI is neither good for the economy nor for democracy, and these two problems, unfortunately, reinforce each other,” they write.

A return to shared prosperity?

Acemoglu and Johnson have collaborated before; in the early 2000s, with political scientist James Robinson, they produced influential papers about politics and economic progress. Acemoglu, an Institute Professor at MIT, also co-authored with Robinson the books “Why Nations Fail” (2012), about political institutions and growth, and “The Narrow Corridor” (2019), which casts liberty as the never-assured outcome of social struggle.

Johnson, the Ronald A. Kurtz Professor of Entrepreneurship at the MIT Sloan School of Management, wrote “13 Bankers” (2010), about finance reform, and, with MIT economist Jonathan Gruber, “Jump-Starting America” (2019), a call for more investment in scientific research.

In “Power and Progress,” the authors emphasize that technology has created remarkable long-term benefits. As they write, “we are greatly better off than our ancestors,” and “scientific and technological progress is a vital part of that story.”

Still, a lot of suffering and oppression has occurred while the long term is unfolding, and not just during Medieval times.  

“It was a 100-year struggle during the Industrial Revolution for workers to get any cut of these massive productivity gains in textiles and railways,” Johnson observes. Broader progress has come through increased labor power and electoral government; when the U.S. economy grew spectacularly for three decades after World War II, gains were widely distributed, though that has not been the case recently.

“We’re suggesting we can get back onto that path of shared prosperity, reharness technology for everybody, and get productivity gains,” Johnson says. “We had all that in the postwar period. We can get it back, but not with the current form of our machine intelligence obsession. That, we think, is undermining prosperity in the U.S. and around the world.”

A call for “machine usefulness,” not “so-so automation”

What do Acemoglu and Johnson think is deficient about AI? For one thing, they believe the development of AI is too focused on mimicking human intelligence. The scholars are skeptical that AI truly mirrors human thinking, even in systems like the chess program AlphaZero, which they regard more as a specialized set of instructions.

Image recognition programs, for instance (is that a husky or a wolf?), use large data sets of past human decisions to build predictive models. But these models are often correlation-dependent (a husky is more likely to be in front of your house), and they cannot replicate the cues humans rely on. Researchers know this, of course, and keep refining their tools. Still, Acemoglu and Johnson contend that many AI programs are less agile than the human mind and are suboptimal replacements for it, even as AI is designed to replace human work.
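The husky-or-wolf problem the authors describe is, in spirit, a shortcut-learning problem. The following minimal sketch (our illustration, not code from the book or its authors) uses synthetic data and scikit-learn to show how a classifier trained where the label happens to correlate with an irrelevant “background” cue leans on that shortcut, then degrades once the correlation disappears; all feature names and numbers here are made up for illustration.

```python
# Hypothetical illustration of shortcut learning (not from "Power and Progress"):
# a classifier trained where the label correlates with an irrelevant background
# cue learns the shortcut and fails when that correlation is removed.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n, background_correlation):
    # Label: 0 = husky, 1 = wolf.
    y = rng.integers(0, 2, size=n)
    # "Animal" feature: weakly informative about the true class.
    animal = y + rng.normal(0, 1.5, size=n)
    # "Snow background" feature: matches the label with a given probability.
    background = np.where(rng.random(n) < background_correlation, y, 1 - y)
    X = np.column_stack([animal, background.astype(float)])
    return X, y

# Training set: wolves are almost always photographed on snow (spurious cue).
X_train, y_train = make_data(5000, background_correlation=0.95)
# Test set: the background no longer tells you anything about the animal.
X_test, y_test = make_data(5000, background_correlation=0.5)

clf = LogisticRegression().fit(X_train, y_train)
print("weight on animal feature:     %.2f" % clf.coef_[0][0])
print("weight on background feature: %.2f" % clf.coef_[0][1])
print("train accuracy: %.2f" % clf.score(X_train, y_train))
print("test accuracy:  %.2f" % clf.score(X_test, y_test))
```

Run as written, the model puts most of its weight on the background feature and looks accurate in training, but its test accuracy falls back toward what the weak animal feature alone can support, which is the gap between correlation-driven prediction and the cues a human would use.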

Acemoglu, who has published many papers on automation and robots, calls these replacement tools “so-so technologies.” A supermarket self-checkout machine does not add meaningful economic productivity; it just transfers work to customers and wealth to shareholders. Among more sophisticated AI tools, a customer service line run by AI that fails to resolve a caller’s problem can frustrate people, leading them to vent once they finally reach a human and making the whole process less efficient.

All told, Acemoglu and Johnson write, “neither traditional digital technologies nor AI can perform essential tasks that involve social interaction, adaptation, flexibility, and communication.”

Growth-minded economists instead prefer technologies that create “marginal productivity” gains, which compel firms to hire more workers. Rather than aiming to eliminate medical specialists like radiologists, a much-forecast AI development that has not occurred, Acemoglu and Johnson suggest AI tools could expand what home health care workers can do and make their services more valuable, without reducing the number of workers in the sector.

“We think there is a fork in the road, and it’s not too late — AI is a very good opportunity to reassert machine usefulness as a philosophy of design,” Johnson says. “And to look for ways to put tools in the hands of workers, including lower-wage workers.”

Defining the discussion

Another set of AI issues that concern Acemoglu and Johnson extends directly into politics: surveillance technologies, facial-recognition tools, intensive data collection, and AI-spread misinformation.

China deploys AI to create “social credit” scores for citizens, along with heavy surveillance, while tightly restricting freedom of expression. Elsewhere, social media platforms use algorithms to influence what users see; by emphasizing “engagement” above other priorities, they can spread harmful misinformation.

Indeed, throughout “Power and Progress,” Acemoglu and Johnson emphasize that the use of AI can set up self-reinforcing dynamics in which those who benefit economically can gain political influence and power at the expense of wider democratic participation.

To alter this trajectory, Acemoglu and Johnson advocate for an extensive menu of policy responses, including data ownership for internet users (an idea of technologist Jaron Lanier); tax reform that rewards employment more than automation; government support for a diversity of high-tech research directions; repealing Section 230 of the 1996 Communications Decency Act, which shields online platforms from legal liability for the content they host; and a digital advertising tax (aimed at limiting the profitability of algorithm-driven misinformation).

Johnson believes people of all ideologies have incentives to support such measures: “The point we’re making is not a partisan point,” he says.

Other scholars have praised “Power and Progress.” Michael Sandel, the Anne T. and Robert M. Bass Professor of Government at Harvard University, has called it a “humane and hopeful book” that “shows how we can steer technology to promote the public good,” and is “required reading for everyone who cares about the fate of democracy in a digital age.”

For their part, Acemoglu and Johnson want to broaden the public discussion of AI beyond industry leaders, discard notions of AI inevitability, and think again about human agency, social priorities, and economic possibilities.

“Debates on new technology ought to center not just on the brilliance of new products and algorithms but on whether they are working for the people or against the people,” they write.

“We need these discussions,” Johnson says. “There’s nothing inherent in technology. It’s within our control. Even if you think we can’t say no to new technology, you can channel it, and get better outcomes from it, if you talk about it.”