Nvidia AI Released Llama-Minitron 3.1 4B: A New Language Model Built by Pruning and Distilling Llama 3.1 8B

Nvidia has announced a new release in language models, and this time it is a small one: the Llama-3.1-Minitron 4B model. The release marks a significant step in the ongoing evolution of language models, bringing much of the capability of large-scale models to a far smaller footprint through cutting-edge techniques such as pruning and knowledge distillation.

The Llama-3.1-Minitron 4B model is a distilled and pruned version of its larger sibling, Llama-3.1 8B. To create the smaller model, Nvidia applied structured pruning in both the depth and width directions. Pruning is a technique that removes less important layers or neurons from a network to reduce model size and complexity while retaining performance. For depth pruning, Nvidia removed 16 layers from the model, shrinking it from 8B to 4B parameters; for width pruning, it trimmed the embedding dimensions and the MLP intermediate dimensions.
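To make the depth-pruning idea concrete, the following sketch drops half of the decoder layers from a Llama-style checkpoint using Hugging Face transformers. It is a simplified illustration only: the layer selection here is arbitrary, whereas Nvidia ranks layers by importance and also prunes width (embedding and MLP dimensions).

```python
# Minimal sketch of depth pruning: drop a subset of decoder layers from a
# Llama-style model. The model name and layer choice are illustrative only.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B", torch_dtype=torch.bfloat16
)

# Keep every other decoder layer (16 of 32) as a stand-in for an
# importance-ranked selection of layers to remove.
keep = [i for i in range(len(model.model.layers)) if i % 2 == 0]
model.model.layers = torch.nn.ModuleList([model.model.layers[i] for i in keep])
model.config.num_hidden_layers = len(model.model.layers)

print(f"Pruned model now has {model.config.num_hidden_layers} layers")
```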

Besides pruning, Nvidia applied classical knowledge distillation to improve the efficiency of Llama-3.1-Minitron 4B. Knowledge distillation is a process in which a smaller model, the student, is trained to mimic the behavior of a larger, more complex model, the teacher. In this way, much of the predictive power of the original model is preserved in the smaller model, which is faster and more frugal with resources. By combining distillation with pruning, Nvidia ensured that the retrained 4B model performs strongly while retaining much of the capability of the larger model.
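The distillation objective can be sketched with a textbook logit-distillation loss in PyTorch, in which the student matches the teacher's softened output distribution via KL divergence. This is a generic formulation for illustration, not Nvidia's exact training recipe.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions."""
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # Scale by t^2 to keep gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)

def train_step(student, teacher, batch, optimizer):
    """One illustrative step: the frozen teacher guides the student on the same batch."""
    with torch.no_grad():
        teacher_logits = teacher(**batch).logits
    student_logits = student(**batch).logits
    loss = distillation_loss(student_logits, teacher_logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```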


The Llama-3.1-Minitron 4B model excels across a range of benchmarks, delivering competitive performance against larger state-of-the-art open-source models. It outperforms many other small language models, such as the earlier Minitron 4B, Phi-2 2.7B, Gemma2 2.6B, and Qwen2-1.5B, in most domains. Extensive benchmarking demonstrates the model’s accuracy and efficiency in reasoning, coding, and math.

One of the biggest advantages of the Llama-3.1-Minitron 4B model is that it delivers competitive performance while remaining resource-efficient. It requires only a fraction of the training tokens needed to train a comparable model from scratch, up to 40 times fewer, which translates into considerable compute cost savings. This makes it a very appealing option for deployment in scenarios where computational resources are too limited for large-scale language models.


Nvidia has further optimized the Llama-3.1-Minitron 4B model for deployment with its TensorRT-LLM toolkit, which enhances inference performance. For instance, the model’s throughput in FP8 precision reaches up to 2.7x that of the original Llama 3.1 8B model across various workloads. These additional optimizations make the model both powerful and efficient and readily applicable across many domains.
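Independently of TensorRT-LLM, the released checkpoint can be loaded with Hugging Face transformers for quick experimentation. A minimal sketch follows; the repository ID is an assumption and should be checked against the official model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository ID is assumed for illustration; verify it on the model card.
model_id = "nvidia/Llama-3.1-Minitron-4B-Width-Base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer(
    "Explain knowledge distillation in one sentence:", return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```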


In conclusion, Nvidia’s release of the Llama-3.1-Minitron 4B model is a significant step forward in the development of LLMs. The model achieves strong performance while remaining resource-efficient, making it useful for a wide range of NLP tasks. Llama-3.1-Minitron 4B is part of Nvidia’s Hugging Face collection and adds to the growing landscape of powerful, freely available AI models.

Check out the Model Card and Details. All credit for this research goes to the researchers of this project.


Perform generative AI-powered data prep and no-code ML over any size of data using Amazon SageMaker Canvas

Amazon SageMaker Canvas now empowers enterprises to harness the full potential of their data by enabling support of petabyte-scale datasets. Starting today, you can interactively prepare large datasets, create end-to-end data flows, and invoke automated machine learning (AutoML) experiments on petabytes of data—a substantial leap from the previous 5 GB limit. With over 50 connectors, an intuitive Chat for data prep interface, and petabyte support, SageMaker Canvas provides a scalable, low-code/no-code (LCNC) ML solution for handling real-world, enterprise use cases.
Organizations often struggle to extract meaningful insights and value from their ever-growing volume of data. You need data engineering expertise and time to develop the proper scripts and pipelines to wrangle, clean, and transform data. Then you must experiment with numerous models and hyperparameters requiring domain expertise. Afterward, you need to manage complex clusters to process and train your ML models over these large-scale datasets.
Starting today, you can prepare your petabyte-scale data and explore many ML models with AutoML by chat and with a few clicks. In this post, we show you how you can complete all these steps with the new integration in SageMaker Canvas with Amazon EMR Serverless without writing code.
Solution overview
For this post, we use a sample dataset of a 33 GB CSV file containing flight purchase transactions from Expedia between April 16, 2022, and October 5, 2022. We use the features to predict the base fare of a ticket based on the flight date, distance, seat type, and others.
In the following sections, we demonstrate how to import and prepare the data, optionally export the data, create a model, and run inference, all in SageMaker Canvas.
Prerequisites
You can follow along by completing the following prerequisites:

Set up SageMaker Canvas.
Download the dataset from Kaggle and upload it to an Amazon Simple Storage Service (Amazon S3) bucket (see the boto3 sketch after this list).
Enable Amazon EMR Serverless for large data processing in your SageMaker user profile and/or SageMaker domain in the AWS Management Console. You can read more about the detailed steps to enable large data processing in the documentation.
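If you prefer to script the dataset upload, a minimal boto3 sketch is shown below; the file name, bucket, and key are placeholders for your own values.

```python
import boto3

# Upload the Kaggle flight-prices CSV to your own S3 bucket.
# File name, bucket name, and object key are placeholders.
s3 = boto3.client("s3")
s3.upload_file(
    Filename="flight_prices.csv",        # local file downloaded from Kaggle
    Bucket="my-canvas-datasets-bucket",  # replace with your bucket name
    Key="flight-prices/flight_prices.csv",
)
print("Upload complete")
```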

Import data in SageMaker Canvas
We start by importing the data from Amazon S3 using Amazon SageMaker Data Wrangler in SageMaker Canvas. Complete the following steps:

In SageMaker Canvas, choose Data Wrangler in the navigation pane.
On the Data flows tab, choose Tabular on the Import and prepare dropdown menu.
Enter the S3 URI for the file and choose Go, then choose Next.
Give your dataset a name, choose Random for Sampling method, then choose Import.

Importing data from the SageMaker Data Wrangler flow allows you to interact with a sample of the data before scaling the data preparation flow to the full dataset. This saves time and improves performance because you don’t need to work with the entirety of the data during preparation. You can later use EMR Serverless to handle the heavy lifting. When SageMaker Data Wrangler finishes importing, you can start transforming the dataset.
After you import the dataset, you can first look at the Data Quality Insights Report to see recommendations from SageMaker Canvas on how to improve the data quality and therefore improve the model’s performance.

In the flow, choose the options menu (three dots) for the node, then choose Get data insights.
Give your analysis a name, select Regression for Problem type, choose baseFare for Target column, select Sampled dataset for Data Size, then choose Create.

Assessing the data quality and analyzing the report’s findings is often the first step because it can guide the subsequent data preparation steps. Within the report, you will find dataset statistics, high-priority warnings about target leakage, skewness, and anomalies, and a feature summary.
Prepare the data with SageMaker Canvas
Now that you understand your dataset characteristics and potential issues, you can use the Chat for data prep feature in SageMaker Canvas to simplify data preparation with natural language prompts. This generative artificial intelligence (AI)-powered capability reduces the time, effort, and expertise required for the often complex tasks of data preparation.

Choose the .flow file on the top banner to go back to your flow canvas.
Choose the options menu for the node, then choose Chat for data prep.

For our first example, converting searchDate and flightDate to datetime format might help us perform date manipulations and extract useful features such as year, month, day, and the difference in days between searchDate and flightDate. These features can find temporal patterns in the data that can influence the baseFare.

Provide a prompt like “Convert searchDate and flightDate to datetime format” to view the code and choose Add to steps.

In addition to data preparation using the chat UI, you can use LCNC transforms with the SageMaker Data Wrangler UI to transform your data. For example, we use one-hot encoding as a technique to convert categorical data into numerical format using the LCNC interface.

Add the transform Encode categorical.
Choose One-hot encode for Transform and add the following columns: startingAirport, destinationAirport, fareBasisCode, segmentsArrivalAirportCode, segmentsDepartureAirportCode, segmentsAirlineName, segmentsAirlineCode, segmentsEquipmentDescription, and segmentsCabinCode.

You can use the advanced search and filter option in SageMaker Canvas to select columns that are of String data type to simplify the process.

Refer to the SageMaker Canvas blog for other examples using SageMaker Data Wrangler. For this post, we simplify our efforts with these two steps, but we encourage you to use both chat and transforms to add data preparation steps on your own. In our testing, we successfully ran all our data preparation steps through the chat using the following prompts as an example (a rough pandas and scikit-learn equivalent of these steps is sketched after the list):

“Add another step that extracts relevant features such as year, month, day, and day of the week which can enhance temporality to our dataset”
“Have Canvas convert the travelDuration, segmentsDurationInSeconds, and segmentsDistance column from string to numeric”
“Handle missing values by imputing the mean for the totalTravelDistance column, and replacing missing values as ‘Unknown’ for the segmentsEquipmentDescription column”
“Convert boolean columns isBasicEconomy, isRefundable, and isNonStop to integer format (0 and 1)”
“Scale numerical features like totalFare, seatsRemaining, totalTravelDistance using Standard Scaler from scikit-learn”
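For reference, the chat-driven steps above correspond roughly to the following pandas and scikit-learn operations. This is a hand-written approximation of what Chat for data prep generates, not the exact code produced by SageMaker Canvas; the file path is a placeholder, and the column names come from the flight-prices dataset.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Load the sampled data (path is a placeholder).
df = pd.read_csv("flight_prices_sample.csv")

# Convert searchDate and flightDate to datetime and derive temporal features.
df["searchDate"] = pd.to_datetime(df["searchDate"])
df["flightDate"] = pd.to_datetime(df["flightDate"])
df["flightYear"] = df["flightDate"].dt.year
df["flightMonth"] = df["flightDate"].dt.month
df["flightDayOfWeek"] = df["flightDate"].dt.dayofweek
df["daysToFlight"] = (df["flightDate"] - df["searchDate"]).dt.days

# Convert string columns to numeric (real columns may need extra parsing,
# for example ISO 8601 durations in travelDuration).
df["segmentsDurationInSeconds"] = pd.to_numeric(
    df["segmentsDurationInSeconds"], errors="coerce"
)

# Impute missing values.
df["totalTravelDistance"] = df["totalTravelDistance"].fillna(
    df["totalTravelDistance"].mean()
)
df["segmentsEquipmentDescription"] = df["segmentsEquipmentDescription"].fillna("Unknown")

# Convert boolean flags to integers (0 and 1).
for col in ["isBasicEconomy", "isRefundable", "isNonStop"]:
    df[col] = df[col].astype(int)

# Scale selected numerical features.
num_cols = ["totalFare", "seatsRemaining", "totalTravelDistance"]
df[num_cols] = StandardScaler().fit_transform(df[num_cols])
```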

When these steps are complete, you can move to the next step of processing the full dataset and creating a model.
(Optional) Export your data in Amazon S3 using an EMR Serverless job
You can process the entire 33 GB dataset by running the data flow using EMR Serverless for the data preparation job without worrying about the infrastructure.

From the last node in the flow diagram, choose Export and Export data to Amazon S3.
Provide a dataset name and output location.
It is recommended to keep Auto job configuration selected unless you want to change any of the Amazon EMR or SageMaker Processing configurations. (If your data is larger than 5 GB, data processing will run in EMR Serverless; otherwise, it will run within the SageMaker Canvas workspace.)
Under EMR Serverless, provide a job name and choose Export.

You can view the job status in SageMaker Canvas on the Data Wrangler page on the Jobs tab.

You can also view the job status on the Amazon EMR Studio console by choosing Applications under Serverless in the navigation pane.

Create a model
You can also create a model at the end of your flow.

Choose Create model from the node options, and SageMaker Canvas will create a dataset and then navigate you to create a model.
Provide a dataset and model name, select Predictive analysis for Problem type, choose baseFare as the target column, then choose Export and create model.

The model creation process will take a couple of minutes to complete.

Choose My Models in the navigation pane.
Choose the model you just exported and navigate to version 1.
Under Model type, choose Configure model.
Select Numeric model type, then choose Save.
On the dropdown menu, choose Quick Build to start the build process.

When the build is complete, on the Analyze page, you can view the following tabs:

Overview – This gives you a general overview of the model’s performance, depending on the model type.
Scoring – This shows visualizations that you can use to get more insights into your model’s performance beyond the overall accuracy metrics.
Advanced metrics – This contains your model’s scores for advanced metrics and additional information that can give you a deeper understanding of your model’s performance. You can also view information such as the column impacts.

Run inference
In this section, we walk through the steps to run batch predictions against the generated dataset.

On the Analyze page, choose Predict.
To generate predictions on your test dataset, choose Manual.
Select the test dataset you created and choose Generate predictions.
When the predictions are ready, either choose View in the pop-up message at the bottom of the page or navigate to the Status column to choose Preview on the options menu (three dots).

You’re now able to review the predictions.

You have now used the generative AI data preparation capabilities in SageMaker Canvas to prepare a large dataset, trained a model using AutoML techniques, and run batch predictions at scale. All of this was done with a few clicks and using a natural language interface.
Clean up
To avoid incurring future session charges, log out of SageMaker Canvas. To log out, choose Log out in the navigation pane of the SageMaker Canvas application.
When you log out of SageMaker Canvas, your models and datasets aren’t affected, but SageMaker Canvas cancels any Quick build tasks. If you log out of SageMaker Canvas while running a Quick build, your build might be interrupted until you relaunch the application. When you relaunch, SageMaker Canvas automatically restarts the build. Standard builds continue even if you log out.
Conclusion
The introduction of petabyte-scale AutoML support within SageMaker Canvas marks a significant milestone in the democratization of ML. By combining the power of generative AI, AutoML, and the scalability of EMR Serverless, we’re empowering organizations of all sizes to unlock insights and drive business value from even the largest and most complex datasets.
The benefits of ML are no longer confined to the domain of highly specialized experts. SageMaker Canvas is revolutionizing the way businesses approach data and AI, putting the power of predictive analytics and data-driven decision-making into the hands of everyone. Explore the future of no-code ML with SageMaker Canvas today.

About the authors
Bret Pontillo is a Sr. Solutions Architect at AWS. He works closely with enterprise customers building data lakes and analytical applications on the AWS platform. In his free time, Bret enjoys traveling, watching sports, and trying new restaurants.
Polaris Jhandi is a Cloud Application Architect with AWS Professional Services. He has a background in AI/ML & big data. He is currently working with customers to migrate their legacy Mainframe applications to the Cloud.
Peter Chung is a Solutions Architect serving enterprise customers at AWS. He loves to help customers use technology to solve business problems on various topics like cutting costs and leveraging artificial intelligence. He wrote a book on AWS FinOps, and enjoys reading and building solutions.

InfinityMath: A Scalable Instruction Tuning Dataset for Programmatic Mathematical Reasoning

A primary driver of artificial intelligence research in mathematical reasoning is the prospect of improving model understanding and problem-solving on complex mathematical problems. Such capabilities matter in education, finance, and technology, fields that depend on accurate solutions delivered quickly. Improvements here also transfer to AI performance on specialized tasks and on logical reasoning more generally.

One of the most important challenges in this area is that large-scale, high-quality datasets for mathematical reasoning are expensive to build. Traditional construction methods often require substantial computational resources and large amounts of seed data, making them hard to scale. This limits models’ ability to handle a wide variety of math problems and leads to errors, especially when the numerical values in a problem are varied. It also raises the issue of logical consistency: models make incorrect adjustments to their reasoning in response to such variations, which reduces their reliability.

State-of-the-art techniques for improving mathematical reasoning in AI, such as Chain-of-Thought and Program-of-Thought, either have models reason through a problem step by step or embed computation directly into their reasoning. Many of these methods, however, depend on large datasets and heavy computation, which limits their scalability. They also fail to fully address a major challenge: the inconsistencies that arise when a change in a problem’s numerical values leads to incorrect deductions.

A research team from the Beijing Academy of Artificial Intelligence and the China University of Mining & Technology has proposed InfinityMath, a scalable dataset for programmatic mathematical reasoning. InfinityMath is designed to decouple numerical values from the mathematical problems themselves, so that a large, diverse dataset can be created with a manageable amount of computation. The dataset was built from seven high-quality math sources and contains over 101,380 data points, making it a comprehensive tool for enhancing the reasoning ability of AI models.

The InfinityMath methodology is a multistep pipeline designed for scalability and logical consistency. Masking the numerical values in math problems produces generic templates, which serve as the basis for generating problem-solving programs that do not refer to specific numbers and therefore follow the same reasoning procedure for every numerical variation. This allows the data to be scaled efficiently and improves the resilience of AI models across different mathematical challenges. The programs themselves can be generated with sophisticated language models such as GPT-4 to reduce errors and improve overall quality.
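A toy example, our own simplification rather than code from the paper, illustrates the decoupling: concrete numbers in a problem are replaced by placeholders, and the solution is written as a program over those placeholders so the same reasoning applies to any numeric instantiation.

```python
# Toy illustration of number-masked templates: the problem text and the
# solution program are both parameterized by the placeholders {a} and {b}.
problem_template = (
    "A shop sells apples at {a} dollars each. How much do {b} apples cost?"
)

def solution_program(a: float, b: int) -> float:
    """Value-agnostic reasoning: total cost = unit price * quantity."""
    return a * b

# The same template and program cover any numeric variation of the problem.
for a, b in [(2, 3), (1.5, 10), (7, 0)]:
    question = problem_template.format(a=a, b=b)
    print(question, "->", solution_program(a, b))
```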

Models fine-tuned with the InfinityMath dataset performed well across several benchmarks. For example, the Llama2 model fine-tuned on InfinityMath showed dramatic relative accuracy improvements of 316.44% on GSM8K and 1067.6% on MATH. CodeLlama, another model fine-tuned on the dataset, showed similarly large gains: 120.58% on SVAMP and 1118.09% on SimulEq. These results indicate that InfinityMath increases the accuracy and robustness of AI models and improves their reliability across a variety of mathematical problems. The fine-tuned models also maintained more consistent logical outcomes under numerical variations, an area where models trained on traditional datasets often falter.

The impact of InfinityMath extends beyond raw numerical accuracy to the most fundamental aspect of mathematical reasoning: consistency. The authors performed stricter evaluations with augmented test sets, GSM8K+ and MATH+, which differ from the originals only in their numerical values. Models trained on InfinityMath showed greater logical consistency on these sets than models trained on other datasets, alongside strong accuracy. This success underlines InfinityMath’s role in pushing the frontiers of mathematical reasoning and in making a scalable, effective solution available to a very large class of AI models.

In short, InfinityMath is a major advance in mathematical reasoning that addresses two central challenges: scalability and logical consistency. Curated by researchers from the Beijing Academy of Artificial Intelligence and the China University of Mining & Technology, the dataset offers a robust, highly extensible foundation for AI models tackling complex mathematical problems. By separating numerical values from the solving process, InfinityMath makes it more efficient to construct a large, diverse dataset that enhances the accuracy and reliability of AI models, gains that are reflected in performance across multiple benchmarks. The dataset could therefore further improve AI and its applications across many fields.

Check out the Paper and Dataset. All credit for this research goes to the researchers of this project.


Prompt Caching is Now Available on the Anthropic API for Specific Claude Models

As AI models grow more sophisticated, they often require extensive prompts with detailed context, leading to increased costs and latency in processing. This problem is especially pertinent for use cases like conversational agents, coding assistants, and large document processing, where the context needs to be repeatedly referenced across multiple interactions. Anthropic addresses the challenge of efficiently managing and reusing large prompt contexts, particularly in scenarios where similar contextual information recurs across calls.

Traditional methods involve sending the entire prompt context with each API call, which can be costly and time-consuming, especially with long prompts. These methods are not optimized for prompts where the same or similar context is used repeatedly. The Anthropic API introduces a new feature called “prompt caching,” available for specific Claude models. Prompt caching allows developers to store frequently used prompt contexts and reuse them across multiple API calls, significantly reducing the cost and latency of sending large prompts repeatedly. The feature is currently in public beta for Claude 3.5 Sonnet and Claude 3 Haiku, with support for Claude 3 Opus forthcoming.

Prompt caching works by enabling developers to cache a large prompt context once and then reuse that cached context in subsequent API calls. This method is particularly effective in scenarios such as extended conversations, coding assistance, large document processing, and agentic search, where a significant amount of contextual information needs to be maintained throughout multiple interactions. The cached content can include detailed instructions, codebase summaries, long-form documents, and other extensive contextual information. The pricing model for prompt caching is structured to be cost-effective: writing to the cache incurs a 25% increase in input token price while reading from the cache costs only 10% of the base input token price. Early users of prompt caching have reported substantial improvements in both cost efficiency and processing speed, making it a valuable tool for optimizing AI-driven applications.
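A minimal sketch of how this looks with the Anthropic Python SDK is shown below: a long system context is marked with a cache_control block so it is written to the cache on the first call and read from it on subsequent calls. The exact beta header and field names may change across SDK versions, so treat this as an approximation and consult the official documentation.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

long_context = open("codebase_summary.txt").read()  # large, reusable context (placeholder file)

def ask(question: str) -> str:
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=512,
        # Beta header name is an assumption based on the public beta at launch;
        # newer SDK versions may not require it.
        extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
        system=[
            {
                "type": "text",
                "text": long_context,
                # Mark this block as cacheable: written once (+25% input price),
                # then reused on later calls (~10% of the base input price).
                "cache_control": {"type": "ephemeral"},
            }
        ],
        messages=[{"role": "user", "content": question}],
    )
    return response.content[0].text

print(ask("Summarize the main modules in this codebase."))
print(ask("Which module handles authentication?"))  # reuses the cached context
```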

In conclusion, prompt caching addresses a critical need for reducing costs and latency in AI models that require extensive prompt contexts. By allowing developers to store and reuse contextual information, this feature enhances the efficiency of various applications, from conversational agents to large document processing. The implementation of prompt caching on the Anthropic API offers a promising solution to the challenges posed by large prompt contexts, making it a significant advancement in the field of LLMs.

Check out the Details. All credit for this research goes to the researchers of this project.


xAI Released Grok-2 Beta: An AI Model with Unparalleled Reasoning, Benchmark-Topping Performance, and Advanced Capabilities

The release of Grok-2, a highly advanced language model, marks a quantum leap in AI reasoning and performance benchmarks. This beta release contains Grok-2 and a distilled version called Grok-2 mini, both major improvements over Grok-1.5. The release is part of xAI’s broader strategy to lead the AI landscape with models that excel in chat, coding, and complex reasoning tasks.

Introduction of Grok-2 and Grok-2 Mini

Grok-2 is an all-rounder, delivering state-of-the-art text and vision understanding. Beta versions of the models are available to users on the 𝕏 platform, and the full release of the enterprise API is slated for later this month. The release also introduces Grok-2 mini, a small but highly capable variant that balances computational efficiency and output quality, well suited to situations where speed and resource usage are of the essence.

Benchmark Performance: Outrunning Competition

Grok-2 has already been evaluated on many highly competitive benchmarks and exceeds expectations. An early variant of Grok-2, “sus-column-r,” was tested in the LMSYS Chatbot Arena, arguably the best-known benchmark for language models, where it outperformed Claude 3.5 Sonnet and prominent models such as GPT-4-Turbo. Its overall Elo score placed it at the top of the leaderboard, establishing cutting-edge reasoning and response generation capabilities.

Key Benchmark Scores:

Graduate-Level Science Knowledge (GPQA): Grok-2 achieved a score of 56.0%, outperforming GPT-4 Turbo (48.0%) and Claude 3.5 Sonnet (50.4%).

General Knowledge (MMLU): Grok-2 scored 87.5%, slightly ahead of GPT-4 Turbo at 86.5% and significantly better than Claude 3.5 Sonnet at 85.7%.

Math Competition Problems (MATH): Grok-2 excelled with a score of 76.1%, surpassing GPT-4 Turbo (72.6%) and far outpacing Claude 3.5 Sonnet at 60.1%.

Visual Math Reasoning (MathVista): Grok-2 achieved 69.0%, establishing itself as a leader in this critical area ahead of both GPT-4 Turbo (58.1%) and Claude 3.5 Sonnet (50.5%).

Document-Based Question Answering (DocVQA): Grok-2 reached 93.6%, outperforming GPT-4 Turbo at 87.2% and Claude 3.5 Sonnet at 89.3%.


Advanced Evaluation and Capabilities

Internally, xAI conducted rigorous testing of Grok-2’s abilities. AI Tutors evaluated the model on many real-world tasks, comparing responses under strict guidelines in two areas: instruction following and factual accuracy. Grok-2 showed significant improvements in reasoning over retrieved content and in advanced tool use. On graduate-level reasoning assessments, it performed well at finding missing information, working through complex sequences of events, and filtering out irrelevant data, skills that are critical for tasks requiring deep comprehension and accurate execution.

Expanded Capabilities and User Experience

The release of Grok-2 is about more than performance enhancements; it also brings a richer user experience on the 𝕏 platform. Over the past few months, xAI has continuously improved the platform, and Grok-2’s release marks the introduction of a redesigned interface and new features. Premium and Premium+ users now have access to Grok-2 and Grok-2 mini, which integrate real-time information to provide more dynamic and accurate responses.

Grok-2 is more than just a model for text-based tasks; it also excels in vision-based applications. For example, its performance on MathVista, a benchmark for visual math reasoning, and on DocVQA, a document-based question-answering task, demonstrates its ability to handle multimodal data effectively. These capabilities make Grok-2 a versatile tool for applications ranging from academic research to complex problem-solving.


Enterprise API and Future Developments

For developers, xAI is launching Grok-2 and Grok-2 mini through a new enterprise API platform, which will become available later this month. The API is built on a bespoke tech stack that supports multi-region inference deployments, ensuring low-latency global access. The infrastructure is designed to meet enterprise requirements, with enhanced security features, including mandatory multi-factor authentication (e.g., YubiKey, Apple Touch ID, TOTP), and advanced analytics tools for traffic and billing management.

Looking ahead, xAI has ambitious plans to expand Grok-2’s capabilities further. The company is preparing to introduce multimodal understanding as a core feature of the Grok experience, both on the 𝕏 platform and through the API. This will allow Grok-2 to handle a wider range of data types and deliver even more sophisticated responses.

Conclusion

The release of Grok-2 is a significant step forward for xAI, placing the company at the forefront of artificial intelligence. Advanced reasoning combined with strong performance across a wide array of benchmarks puts Grok-2 among the leading tools in the AI landscape, and Grok-2 mini adds versatility by giving users a model that balances speed and quality. The rapid progress made by xAI’s small, highly talented team underscores its commitment to impactful innovation. As Grok-2 continues to mature, it is positioned to become a fundamental tool for both casual and technical users, providing a strong understanding of text and vision.

Check out the Details. All credit for this research goes to the researchers of this project.


Delight your customers with great conversational experiences via QnABot on AWS

QnABot on AWS (an AWS Solution) now provides access to Amazon Bedrock foundational models (FMs) and Knowledge Bases for Amazon Bedrock, a fully managed end-to-end Retrieval Augmented Generation (RAG) workflow. You can now provide contextual information from your private data sources that can be used to create rich, contextual, conversational experiences.
The advent of generative artificial intelligence (AI) provides organizations unique opportunities to digitally transform customer experiences. Enterprises with contact center operations are looking to improve customer satisfaction by providing self-service, conversational, interactive chat bots that have natural language understanding (NLU). Enterprises want to automate frequently asked transactional questions, provide a friendly conversational interface, and improve operational efficiency. In turn, customers can ask a variety of questions and receive accurate answers powered by generative AI.
In this post, we discuss how to use QnABot on AWS to deploy a fully functional chatbot integrated with other AWS services, and delight your customers with human agent like conversational experiences.
Solution overview
QnABot on AWS is an AWS Solution that enterprises can use to enable a multi-channel, multi-language chatbot with NLU to improve end customer experiences. QnABot provides a flexible, tiered conversational interface empowering enterprises to meet customers where they are and provide accurate responses. Some responses need to be exact (for example, regulated industries like healthcare or capital markets), some responses need to be searched from large, indexed data sources and cited, and some answers need to be generated on the fly, conversationally, based on semantic context. With QnABot on AWS, you can achieve all of the above by deploying the solution using an AWS CloudFormation template, with no coding required. The solution is extensible, uses AWS AI and machine learning (ML) services, and integrates with multiple channels such as voice, web, and text (SMS).
QnABot on AWS provides access to multiple FMs through Amazon Bedrock, so you can create conversational interfaces based on your customers’ language needs (such as Spanish, English, or French), sophistication of questions, and accuracy of responses based on user intent. You now have the capability to access various large language models (LLMs) from leading AI enterprises (such as Amazon Titan, Anthropic Claude 3, Cohere Command, Meta Llama 3, Mistral AI Large Model, and others on Amazon Bedrock) to find a model best suited for your use case. Additionally, native integration with Knowledge Bases for Amazon Bedrock allows you to retrieve specific, relevant data from your data sources via pre-built data source connectors (Amazon Simple Storage Service – S3, Confluence, Microsoft SharePoint, Salesforce, or web crawlers), which is automatically converted to text embeddings stored in a vector database of your choice. You can then retrieve your company-specific information with source attribution (such as citations) to improve transparency and minimize hallucinations. Lastly, if you don’t want to set up custom integrations with large data sources, you can simply upload your documents and support multi-turn conversations. With prompt engineering, managed RAG workflows, and access to multiple FMs, you can provide your customers rich, human agent-like experiences with precise answers.
Deploying the QnABot solution builds the following environment in the AWS Cloud.

Figure 1: QnABot Architecture Diagram

The high-level process flow for the solution components deployed with the CloudFormation template is as follows:

The admin deploys the solution into their AWS account, opens the Content Designer UI or Amazon Lex web client, and uses Amazon Cognito to authenticate.
After authentication, Amazon API Gateway and Amazon S3 deliver the contents of the Content Designer UI.
The admin configures questions and answers in the Content Designer and the UI sends requests to API Gateway to save the questions and answers.
The Content Designer AWS Lambda function saves the input in Amazon OpenSearch Service in a questions bank index. If using text embeddings, these requests first pass through an embeddings LLM hosted on Amazon Bedrock or Amazon SageMaker to generate embeddings before being saved into the question bank on OpenSearch Service.
Users of the chatbot interact with Amazon Lex through the web client UI, Amazon Alexa, or Amazon Connect.
Amazon Lex forwards requests to the Bot Fulfillment Lambda function. Users can also send requests to this Lambda function through Amazon Alexa devices.
The user and chat information is stored in Amazon DynamoDB to disambiguate follow-up questions from previous question and answer context.
The Bot Fulfillment Lambda function takes the user’s input and uses Amazon Comprehend and Amazon Translate (if necessary) to translate non-native language requests to the native language selected by the user during the deployment, and then looks up the answer in OpenSearch Service. If using LLM features such as text generation and text embeddings, these requests first pass through various LLM models hosted on Amazon Bedrock or SageMaker to generate the search query and embeddings to compare with those saved in the question bank on OpenSearch Service.
If no match is returned from the OpenSearch Service question bank, then the Bot Fulfillment Lambda function forwards the request as follows:

If an Amazon Kendra index is configured for fallback, then the Bot Fulfillment Lambda function forwards the request to Amazon Kendra if no match is returned from the OpenSearch Service question bank. The text generation LLM can optionally be used to create the search query and synthesize a response from the returned document excerpts.
If a knowledge base ID is configured, the Bot Fulfillment Lambda function forwards the request to the knowledge base. The Bot Fulfillment Lambda function uses the RetrieveAndGenerate API to fetch the relevant results for a user query, augment the FM’s prompt, and return the response (see the boto3 sketch after this list).

User interactions with the Bot Fulfillment function generate logs and metrics data, which is sent to Amazon Kinesis Data Firehose and then to Amazon S3 for later data analysis.
OpenSearch Dashboards can be used to view usage history, logged utterances, no hits utterances, positive user feedback, and negative user feedback, and also provides the ability to create custom reports.
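For context, the RetrieveAndGenerate call that the Bot Fulfillment function makes against a knowledge base looks roughly like the following boto3 sketch. QnABot performs this call for you; the knowledge base ID and model ARN here are placeholders, shown only to illustrate the API involved.

```python
import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")

response = bedrock_agent_runtime.retrieve_and_generate(
    input={"text": "What services are available in AWS for container orchestration?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "EXAMPLEKBID",  # placeholder knowledge base ID
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/"
                        "anthropic.claude-3-haiku-20240307-v1:0",
        },
    },
)

print(response["output"]["text"])                 # generated answer
for citation in response.get("citations", []):    # source attribution
    for ref in citation.get("retrievedReferences", []):
        print("Source:", ref.get("location"))
```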

Prerequisites
To get started, you need the following:

An AWS account
An active deployment of QnABot on AWS (version 6.0.0 or later)
Amazon Bedrock model access (required) for all embeddings and LLM models that will be used in QnABot

Figure 2: Request Access to Bedrock Foundational Models (FMs)

The Amazon Bedrock knowledge base ID (create a knowledge base in Amazon Bedrock if you want to chat with your documents)

In the following sections, we explore some of QnABot’s generative AI features.
Semantic question matching using an embeddings LLM
QnABot on AWS can use text embeddings to provide semantic search capabilities by using LLMs. The goal of this feature is to improve question matching accuracy while reducing the amount of tuning required when compared to the default OpenSearch Service keyword-based matching.
Some of the benefits include:

Improved FAQ accuracy from semantic matching vs. keyword matching (comparing the meaning vs. comparing individual words)
Fewer training utterances required to match a diverse set of queries
Better multi-language support, because translated utterances only need to match the meaning of the stored text, not the wording

Configure Amazon Bedrock to enable semantic question matching
To enable these expanded semantic search capabilities, QnABot uses an Amazon Bedrock FM to generate text embeddings, specified using the EmbeddingsBedrockModelId CloudFormation stack parameter. These models provide the best performance and operate on a pay-per-request model. At the time of writing, the following embeddings models are supported by QnABot on AWS (a minimal sketch of calling such a model directly follows the list):

Amazon Titan Embeddings G1
Cohere English
Cohere Multilingual
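To give a sense of what embedding generation through Amazon Bedrock involves under the hood (QnABot handles this for you), here is a minimal boto3 sketch for the Amazon Titan embeddings model. The request and response shapes shown are specific to Titan and differ for the Cohere models.

```python
import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

def embed(text: str) -> list[float]:
    """Return the Titan embedding vector for a piece of text."""
    response = bedrock_runtime.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
        contentType="application/json",
        accept="application/json",
    )
    return json.loads(response["body"].read())["embedding"]

vector = embed("Where does the President live?")
print(len(vector))  # Titan G1 text embeddings are 1536-dimensional
```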

For the CloudFormation stack, set the following parameters (a boto3 sketch for updating an existing stack follows the list):

Set EmbeddingsAPI to BEDROCK
Set EmbeddingsBedrockModelId to one of the available options
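If you prefer to update an already-deployed stack programmatically rather than through the console, a boto3 sketch follows. The stack name is a placeholder, the parameter names should be verified against your template, and the same pattern applies to the LLMApi and LLMBedrockModelId parameters discussed later in this post.

```python
import boto3

cfn = boto3.client("cloudformation")

# Update only the embeddings-related parameters of an existing QnABot stack.
# Stack name is a placeholder; check parameter names and casing in your template.
cfn.update_stack(
    StackName="QnABot",
    UsePreviousTemplate=True,
    Capabilities=["CAPABILITY_NAMED_IAM", "CAPABILITY_AUTO_EXPAND"],
    Parameters=[
        {"ParameterKey": "EmbeddingsApi", "ParameterValue": "BEDROCK"},
        {
            "ParameterKey": "EmbeddingsBedrockModelId",
            "ParameterValue": "amazon.titan-embed-text-v1",
        },
        # In a complete script, every other template parameter should be listed
        # here with UsePreviousValue=True so its current value is preserved.
    ],
)
```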

For example, with semantic matching enabled, the question “What’s the address of the White House?” matches to “Where does the President live?” This example doesn’t match using keywords because they don’t share any of the same words.

Figure 3: Semantic matching in QnABot

In the UI designer, you can set ENABLE_DEBUG_RESPONSE to true to see the user input, source, or any errors of the answer, as illustrated in the preceding screenshot.
You can also evaluate the matching score on the TEST tab in the content designer UI. In this example, we add a match on “qna item question” with the question “Where does the President live?”

Figure 4: Test and evaluate answers in QnABot

Similarly, you can try a match on “item text passage” with the question “Where did Humpty Dumpty sit?”

Figure 5: Match items or text passages in QnABot

Recommendations for tuning with an embeddings LLM
When using embeddings in QnABot, we recommend generalizing questions because more user utterances will match a general statement. For example, the embeddings LLM model will cluster “checking” and “savings” with “account,” so if you want to match both account types, use “account” in your questions.
Similarly, for the question and utterance of “transfer to an agent,” consider using “transfer to someone” because it will better match with “agent,” “representative,” “human,” “person,” and so on.
In addition, we recommend tuning EMBEDDINGS_SCORE_THRESHOLD, EMBEDDINGS_SCORE_ANSWER_THRESHOLD, and EMBEDDINGS_TEXT_PASSAGE_SCORE_THRESHOLD based on the scores. The default values are generalized to work across multiple models, but you might need to adjust them based on your embeddings model and your experiments.
Text generation and query disambiguation using a text LLM
QnABot on AWS can use LLMs to provide a richer, more conversational chat experience. The goal of these features is to minimize the amount of individually curated answers administrators are required to maintain, improve question matching accuracy by providing query disambiguation, and enable the solution to provide more concise answers to users, especially when using a knowledge base in Amazon Bedrock or the Amazon Kendra fallback feature.
Configure an Amazon Bedrock FM with AWS CloudFormation
To enable these capabilities, QnABot uses one of the Amazon Bedrock FMs for text generation, specified using the LLMBedrockModelId CloudFormation stack parameter. These models provide the best performance and operate on a pay-per-request model.
For the CloudFormation stack, set the following parameters:

Set LLMApi to BEDROCK
Set LLMBedrockModelId to one of the available LLM options

Figure 6: Setup QnABot to use Bedrock FMs

Query disambiguation (LLM-generated query)
By using an LLM, QnABot can take the user’s chat history and generate a standalone question for the current utterance. This enables users to ask follow-up questions that on their own may not be answerable without the context of the conversation. The new disambiguated, or standalone, question can then be used as a search query to retrieve the best FAQ, passage, or Amazon Kendra match.
In QnABot’s Content Designer, you can further customize the prompt and model listed in the Query Matching section:

LLM_GENERATE_QUERY_PROMPT_TEMPLATE – The prompt template used to construct a prompt for the LLM to disambiguate a follow-up question. The template may use the following placeholders:

history – A placeholder for the last LLM_CHAT_HISTORY_MAX_MESSAGES messages in the conversational history, to provide conversational context.
input – A placeholder for the current user utterance or question.

LLM_GENERATE_QUERY_MODEL_PARAMS – The parameters sent to the LLM model when disambiguating follow-up questions. Refer to the relevant model documentation for additional values that the model provider accepts.

The following screenshot shows an example with the new LLM disambiguation feature enabled, given the chat history context after answering “Who was Little Bo Peep” and the follow-up question “Did she find them again?”

Figure 7: LLM query disambiguation feature enabled

QnABot rewrites that question to provide all the context required to search for the relevant FAQ or passage: “Did Little Bo Peep find her lost sheep again?”

Figure 8: With query disambiguation with LLMs, context is maintained

Answer text generation using QnABot
You can now generate answers to questions from context provided by knowledge base search results, or from text passages created or imported directly into QnABot. This allows you to generate answers that reduce the number of FAQs you have to maintain, because you can now synthesize concise answers from your existing documents in a knowledge base, Amazon Kendra index, or document passages stored in QnABot as text items. Additionally, your generated answers can be concise and therefore suitable for voice or contact center chatbots, website bots, and SMS bots. Lastly, these generated answers are compatible with the solution’s multi-language support—customers can interact in their chosen languages and receive generated answers in the same language.
With QnABot, you can use two different data sources to generate responses: text passages or a knowledge base in Amazon Bedrock.
Generate answers to questions from text passages
In the content designer web interface, administrators can store full text passages for QnABot on AWS to use. When a question gets asked that matches against this passage, the solution can use LLMs to answer the user’s question based on information found within the passage. We highly recommend you use this option with semantic question matching using Amazon Bedrock text embedding. In QnABot content designer, you can further customize the prompt and model listed under Text Generation using the General Settings section.
Let’s look at a text passage example:

In the Content Designer, choose Add.
Select the text, enter an item ID and a passage, and choose Create.

You can also import your passages from a JSON file using the Content Designer Import feature. On the tools menu, choose Import, open Examples/Extensions, and choose LOAD next to TextPassage-NurseryRhymeExamples to import two nursery rhyme text items.
The following example shows QnABot generating an answer using a text passage item that contains the nursery rhyme, in response to the question “Where did Humpty Dumpty sit?”

Figure 9: Generate answers from text passages

You can also use query disambiguation and text generation together, by asking “Who tried to fix Humpty Dumpty?” and the follow-up question “Did they succeed?”

Figure 10: Text generation with query disambiguation to maintain context

You can also modify LLM_QA_PROMPT_TEMPLATE in the Content Designer to answer in different languages. In the prompt, you can specify that answers be returned in different languages (for example, French or Spanish).

Figure 11: Answer in different languages

You can also specify answers in two languages with bulleted points.

Figure 12: Answer in multiple languages

RAG using an Amazon Bedrock knowledge base
By integrating with a knowledge base, QnABot on AWS can generate concise answers to users’ questions from configured data sources. This prevents the need for users to sift through larger text passages to find the answer. You can also create your own knowledge base from files stored in an S3 bucket. Amazon Bedrock knowledge bases with QnABot don’t require EmbeddingsApi and LLMApi because the embeddings and generative response are already provided by the knowledge base. To enable this option, create an Amazon Bedrock knowledge base and use your knowledge base ID for the CloudFormation stack parameter BedrockKnowledgeBaseId.
To configure QnABot to use the knowledge base, refer to Create a knowledge base. The following is a quick setup guide to get started:

Provide your knowledge base details.

Figure 13: Setup Amazon Bedrock Knowledge Base for RAG use cases

Configure your data source based on the available options. For this example, we use Amazon S3 as the data source; note that the bucket name has to be prefixed with qna or QNA.

Figure 14: Setup your RAG data sources for Amazon Knowledge Base

Upload your documents to Amazon S3. For this example, we uploaded the aws-overview.pdf whitepaper to test integration.
Create or choose your vector database store to allow Amazon Bedrock to store, update, and manage embeddings.
Sync the data source and use your knowledge base ID for the CloudFormation stack parameter BedrockKnowledgeBaseId.

Figure 15: Complete setting up Amazon Bedrock Knowledge Base for your RAG use cases

In the QnABot Content Designer, you can customize additional settings listed under Text Generation using RAG with the Amazon Bedrock knowledge base.
QnABot on AWS can now answer questions from the AWS whitepapers, such as “What services are available in AWS for container orchestration?” and “Are there any upfront fees with ECS?”

Figure 16: Generate answers from your Amazon Bedrock Knowledge Base (RAG)

Conclusion
Customers expect quick and efficient service from enterprises in today’s fast-paced world. But providing an excellent customer experience can be significantly challenging when the volume of inquiries outpaces the human resources employed to address them. Companies of all sizes can use QnABot on AWS with built-in Amazon Bedrock integrations to access many market-leading FMs, meet specialized lookup needs using RAG to reduce hallucinations, and provide a friendly AI conversational experience. With QnABot on AWS, you can provide high-quality natural text conversations, content management, and multi-turn dialogues. The solution comes with one-click deployment for custom implementation, a content designer for Q&A management, and rich reporting. You can also integrate with contact center systems like Amazon Connect and Genesys Cloud CX. Get started with QnABot on AWS.

About the Author
Ajay Swamy is the Product Leader for Data, ML and Generative AI AWS Solutions. He specializes in building AWS Solutions (production-ready software packages) that deliver compelling value to customers by solving for their unique business needs. Other than QnABot on AWS, he manages Generative AI Application Builder, Enhanced Document Understanding, Discovering Hot Topics using Machine Learning and other AWS Solutions. He lives with his wife and dog (Figaro), in New York, NY.
Abhishek Patil is a Software Development Engineer at Amazon Web Services (AWS) based in Atlanta, GA, USA. With over 7 years of experience in the tech industry, he specializes in building distributed software systems, with a primary focus on Generative AI and Machine Learning. Abhishek is a primary builder on AI solution QnABot on AWS and has contributed to other AWS Solutions including Discovering Hot Topics using Machine Learning and OSDU® Data Platform. Outside of work, Abhishek enjoys spending time outdoors, reading, resistance training, and practicing yoga.

The AI Scientist: The World’s First AI System for Automating Scientific Research and Open-Ended Discovery

Artificial intelligence (AI) has evolved into a powerful tool beyond simple automation, becoming a critical asset in scientific research. Integrating AI in scientific discovery is reshaping the landscape by enabling machines to perform tasks that traditionally require human intelligence. This evolution marks a shift towards a future where AI assists and autonomously drives scientific innovation. The goal is to develop AI systems that can independently generate hypotheses, conduct experiments, and produce scientific knowledge, ultimately accelerating the pace of discovery in various fields.

A significant challenge in this evolution is the limited capacity of current AI systems to carry out the full spectrum of scientific research autonomously. While AI has made strides in specific tasks like data analysis and experiment execution, these systems are generally constrained by human-defined parameters and require substantial human oversight. This limitation hinders the potential of AI to engage in open-ended exploration and to generate new, groundbreaking knowledge autonomously. The bottleneck lies in the inability of AI to fully integrate and automate the entire research process from ideation to publication without human intervention.

Traditional methods in AI-assisted research have focused on optimizing individual components of the scientific process. For example, hyperparameter tuning and algorithm discovery are often automated, but these efforts still fall short of end-to-end automation. AI systems typically perform well-defined tasks within narrowly scoped research problems, such as improving specific machine learning models or analyzing predefined datasets. However, these systems lack the holistic approach needed to independently drive the research process from start to finish, limiting their contributions to incremental improvements rather than pioneering new avenues of scientific inquiry.

Researchers from Sakana AI, FLAIR, the University of Oxford, the University of British Columbia, Vector Institute, and Canada CIFAR have developed “The AI Scientist,” a groundbreaking framework that aims to fully automate scientific discovery. This innovative system leverages large language models (LLMs) to autonomously generate research ideas, conduct experiments, and produce scientific manuscripts. The AI Scientist represents a significant advancement in the quest for fully autonomous research, integrating all aspects of the scientific process into a single, seamless workflow. This approach enhances efficiency and democratizes access to scientific research, making it possible for cutting-edge studies to be conducted at a fraction of the traditional cost.

The AI Scientist operates through three phases: idea generation, experimental iteration, and paper write-up. The system begins by generating diverse research ideas using LLMs inspired by evolutionary computation principles. These ideas are then filtered through a literature review and novelty assessment to ensure their originality and feasibility. Once an idea is selected, the AI Scientist uses a coding assistant named Aider to implement the necessary code modifications and execute the experiments. Aider executes the code and iteratively refines it based on experimental results, enhancing the robustness and reliability of the research process. Finally, the AI Scientist compiles the results into a scientific paper using LaTeX, incorporating real experimental data and citations to ensure accuracy and relevance.

The AI Scientist has demonstrated impressive performance, generating research papers that meet or exceed the quality standards of top machine learning conferences. For instance, the system produced a full scientific manuscript at an estimated cost of just $15 per paper. In evaluating these papers, the AI Scientist’s automated reviewer, based on the GPT-4o model, achieved a balanced accuracy of 70% when assessing the quality of generated research, closely aligning with human reviewers who scored 73%. The system’s ability to generate hundreds of medium-quality papers within a week underscores its potential to accelerate the research process significantly. For example, one highlighted result showed a 12.8% reduction in KL divergence in a diffusion modeling experiment, a key metric for evaluating the quality of generated data. Furthermore, the AI Scientist’s framework allowed for the continuous iteration of ideas, improving each subsequent research output based on feedback from previous experiments.

To conclude, the development of the AI Scientist marks a crucial step forward in automating scientific research. By addressing the limitations of traditional AI systems, this framework opens new possibilities for innovation across various scientific disciplines. While the current iteration of the AI Scientist shows great promise, ongoing refinements will be necessary to enhance its performance, especially in handling more complex, real-world problems. Nonetheless, the AI Scientist represents a pioneering journey towards fully autonomous, AI-driven research, offering a glimpse into a future where machines could independently drive scientific progress on a global scale.

Check out the Paper. All credit for this research goes to the researchers of this project.


Mobius Labs Introduces Aana SDK: Open-Source SDK Empowering Seamless D …

The rapid advancement of AI and machine learning has transformed industries, yet deploying complex models at scale remains challenging. This is particularly true for multimodal applications integrating diverse data types like vision, audio, and language. As AI applications grow more sophisticated, transitioning from prototypes to production-ready systems becomes increasingly complex. There is a pressing need for efficient, scalable, and user-friendly frameworks to facilitate this transition and streamline the development of advanced AI applications in real-world scenarios.

Multimodal AI processes various data types simultaneously, enabling complex scene analysis, object recognition, speech transcription, and context understanding. This technology facilitates advanced applications previously deemed science fiction. Mobius Labs introduces Aana SDK, an open-source toolkit addressing challenges in multimodal AI development. It manages diverse inputs, scales Generative AI applications, and ensures extensibility. The SDK forms the core infrastructure for Mobius Labs’ AI solutions.

Aana SDK bridges cutting-edge AI research and practical, enterprise-grade applications. It simplifies the integration of multiple AI models, manages various data types, and scales applications efficiently, addressing key challenges in handling multimodal inputs, scaling Generative AI, and ensuring extensibility. Its design philosophy prioritizes reliability, scalability, efficiency, and ease of use, offering fault tolerance, distributed computing capabilities, optimized resource utilization, and accessibility for developers of all skill levels.

Aana SDK is a framework for multimodal applications, enabling large-scale deployment of machine learning models for vision, audio, and language. It supports Retrieval-Augmented Generation (RAG) systems and facilitates advanced applications such as search engines and recommendation systems. Built on the Ray distributed computing framework, it offers fault tolerance and easy scaling. The SDK remains in active development, with ongoing improvements and openness to feedback.
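Because the SDK is built on Ray, the underlying serving pattern is worth seeing. The following is a minimal Ray Serve deployment of the kind Aana builds on; it uses the standard Ray Serve API rather than Aana’s own abstractions, and the model call is a stub.

# A plain Ray Serve deployment: not Aana's API, just the serving pattern it builds on.
from ray import serve
from starlette.requests import Request

@serve.deployment(num_replicas=2)  # scale out across replicas on the Ray cluster
class CaptionDeployment:
    def __init__(self):
        # A real deployment would load a vision/audio/language model here.
        self.model = lambda payload: {"caption": f"described {payload.get('image', 'input')}"}

    async def __call__(self, request: Request) -> dict:
        payload = await request.json()
        return self.model(payload)

app = CaptionDeployment.bind()
# serve.run(app)  # starts an HTTP endpoint backed by the replicas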

Aana SDK simplifies the deployment and integration of machine learning models into real-world applications at scale. Key features include model deployment, automatic API and documentation generation, predefined data types, streaming support, and task queue functionality. It offers integrations with various ML models and libraries. Installation options include PyPI and GitHub, with recommendations for optimal PyTorch and Flash Attention library installations for enhanced performance.

The Aana SDK offers a GitHub template and example applications for machine learning projects. It features three core components: deployments, endpoints, and the AanaSDK class. With comprehensive documentation, Apache 2.0 licensing, and Docker support, it’s a versatile tool for developers. The SDK welcomes community contributions and adheres to the Contributor Covenant. Future development focuses on multimodal capabilities, agentic workflows, embodied intelligence, and on-device AI, aiming to create efficient, scalable applications across various domains with minimal computational overhead.

In conclusion, Aana SDK presents a robust framework for developing and deploying multimodal machine-learning applications at scale. It addresses the complex challenges of implementing advanced AI systems in real-world scenarios by combining ease of use with powerful features such as automated API generation, flexible model deployment, and integration with various ML libraries. The framework’s design principles of reliability, scalability, and efficiency, along with its extensive documentation and open-source nature, position it as a valuable tool for developers and researchers in applied machine learning. As Aana SDK continues to evolve, it promises to significantly streamline the process of transitioning sophisticated AI models from experimentation to production environments.

Check out the Blog and GitHub. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. If you like our work, you will love our newsletter.

The post Mobius Labs Introduces Aana SDK: Open-Source SDK Empowering Seamless Deployment of Advanced Machine Learning Applications appeared first on MarkTechPost.

Sarvam AI Releases Samvaad-Hi-v1 Dataset and Sarvam-2B: A 2 Billion Parameter Language Model with 4 Trillion Tokens Focused on 10 Indic Languages for Enhanced NLP

Sarvam AI has recently unveiled its cutting-edge language model, Sarvam-2B. This powerful model, with 2 billion parameters, represents a significant stride in Indic language processing. With a focus on inclusivity and cultural representation, Sarvam-2B is pre-trained from scratch on a massive dataset of 4 trillion high-quality tokens, an impressive 50% of which is dedicated to Indic languages. This is particularly significant for the model’s ability to understand and generate text in languages that have historically been underrepresented in AI research.

They have also introduced the Samvaad-Hi-v1 dataset, a meticulously curated collection of 100,000 high-quality English, Hindi, and Hinglish conversations. This dataset is uniquely designed with an Indic context, making it an invaluable resource for researchers and developers working on multilingual and culturally relevant AI models. Samvaad-Hi-v1 is poised to enhance the training of conversational AI systems that can understand and engage with users more naturally, and in a contextually appropriate manner, across the different languages and dialects prevalent in India.

The Vision Behind Sarvam-2B

Sarvam AI’s vision with Sarvam-2B is clear: to create a robust and versatile language model that excels in English and champions Indic languages. This is especially important in a country like India, where linguistic diversity is vast, and the need for AI models that can effectively process and generate text in multiple languages is paramount.

The model supports 10 Indic languages, including Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, and Telugu. This broad language support ensures the model is accessible to many users across different linguistic backgrounds. The model’s architecture and training process have been meticulously designed to ensure it performs well across all supported languages, making it a versatile tool for developers and researchers.
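For developers who want to experiment with the model, the following is a hedged sketch of loading Sarvam-2B with the Hugging Face transformers library. The repository ID is an assumption and should be checked against Sarvam AI’s Hugging Face page; the prompt is simply an illustrative Hindi phrase.

# Hedged sketch: the model ID below is assumed, not confirmed; verify it on Hugging Face.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sarvamai/sarvam-2b-v0.5"  # assumed repository ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "भारत की राजधानी"  # "The capital of India" in Hindi
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))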

Technical Excellence and Implementation

Sarvam-2B has been trained on a balanced mix of English and Indic language data, each contributing 2 trillion tokens to the training process. This careful balance ensures that the model is equally proficient in English and the supported Indic languages. The training process involved sophisticated techniques to enhance the model’s understanding and generation capabilities, making it one of the most advanced models in its category.

Expanding the Horizon: Complementary Models

In addition to Sarvam-2B, Sarvam AI has also introduced three other remarkable models that complement its capabilities:

Bulbul 1.0: A Text-to-Speech (TTS) model that supports combinations of 10 languages and six voices. This model generates natural-sounding speech, making it a valuable tool for applications requiring multilingual voice output.

Saaras 1.0: A Speech-to-Text (STT) model that supports the same ten languages and includes automatic language identification. This model is particularly useful for transcribing spoken language into text, with the added advantage of detecting the language automatically.

Mayura 1.0: A translation API designed to handle the complexities of translating between Indian languages and English. This model is tailored to address the nuances and unique challenges associated with Indian languages, providing more accurate and culturally relevant translations.

Conclusion

Sarvam AI’s launch of Sarvam-2B is a notable development, particularly in the context of language models designed for Indic languages. By dedicating half of its training data to these languages, Sarvam-2B stands out as a model that actively promotes linguistic diversity. The model’s versatility, combined with the complementary capabilities of Bulbul 1.0, Saaras 1.0, and Mayura 1.0, positions Sarvam AI as a leader in developing inclusive, innovative, and forward-thinking AI technologies.

Check out the Model Card and Dataset. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. If you like our work, you will love our newsletter.

The post Sarvam AI Releases Samvaad-Hi-v1 Dataset and Sarvam-2B: A 2 Billion Parameter Language Model with 4 Trillion Tokens Focused on 10 Indic Languages for Enhanced NLP appeared first on MarkTechPost.

Introducing document-level sync reports: Enhanced data sync visibility …

Amazon Q Business is a fully managed, generative artificial intelligence (AI)-powered assistant that helps enterprises unlock the value of their data and knowledge. With Amazon Q, you can quickly find answers to questions, generate summaries and content, and complete tasks by using the information and expertise stored across your company’s various data sources and enterprise systems. At the core of this capability are native data source connectors that seamlessly integrate and index content from multiple repositories into a unified index. This enables the Amazon Q large language model (LLM) to provide accurate, well-written answers by drawing from the consolidated data and information. The data source connectors act as a bridge, synchronizing content from disparate systems like Salesforce, Jira, and SharePoint into a centralized index that powers the natural language understanding and generative abilities of Amazon Q.
Customers appreciate that Amazon Q Business securely connects to over 40 data sources. While using these data sources, they want better visibility into the document processing lifecycle during data source sync jobs. They want to know the status of each document they attempted to crawl and index, and they want the ability to troubleshoot why certain documents were not returned with the expected answers. Additionally, they want access to metadata, timestamps, and access control lists (ACLs) for the indexed documents.
We are pleased to announce a new feature now available in Amazon Q Business that significantly improves visibility into data source sync operations. The latest release introduces a comprehensive document-level report incorporated into the sync history, providing administrators with granular indexing status, metadata, and ACL details for every document processed during a data source sync job. This enhancement to sync job observability enables administrators to quickly investigate and resolve ingestion or access issues encountered while setting up an Amazon Q Business application. The detailed document reports are persisted in the new SYNC_RUN_HISTORY_REPORT log stream under the Amazon Q Business application log group, so critical sync job details are available on-demand when troubleshooting.
Lifecycle of a document in a data source sync run job
In this section, we examine the lifecycle of a document within a data source sync in Amazon Q Business. This provides valuable insight into the sync process. The data source sync comprises three key stages: crawling, syncing, and indexing. Crawling involves the connector connecting to the data source and extracting documents meeting the defined sync scope according to the data source configuration. These documents are then synced to Amazon Q Business during the syncing phase. Finally, indexing makes the synced documents searchable within the Amazon Q Business environment.
The following diagram shows a flowchart of a sync run job.

Crawling stage
The first stage is the crawling stage, where the connector crawls all documents and their metadata from the data source. During this stage, the connector also compares the checksum of the document against the Amazon Q index to figure out if a particular document needs to be added, modified, or deleted from the index. This operation corresponds to the CrawlAction field in the sync run history report.
If the document is unmodified, it is marked as UNMODIFIED and skipped in the remaining stages. If a document fails in the crawling stage, for example due to throttling errors, broken content, or an oversized document, it is marked as failed in the sync run history report with the CrawlStatus as FAILED. If a document was skipped due to validation errors, its CrawlStatus is marked as SKIPPED. Failed and skipped documents are not sent forward to the next stage; all successful documents are marked as SUCCESS and are sent forward.
We also capture the ACLs and metadata on each document in this stage so they can be added to the sync run history report.
Syncing stage
During the syncing stage, the document is sent to Amazon Q Business ingestion service APIs like BatchPutDocument and BatchDeleteDocument. After a document is submitted to these APIs, Amazon Q Business runs validation checks on the submitted documents. If any document fails these checks, its SyncStatus is marked as FAILED. If there is an irrecoverable error for a particular document, it is marked as SKIPPED and other documents are sent forward.
Indexing stage
In this step, Amazon Q Business parses the document, processes it according to its content type, and persists it in the index. If the document fails to be persisted, its IndexStatus is marked as FAILED; otherwise, it is marked as SUCCESS.
After the statuses of all the stages have been captured, we emit these statuses as an Amazon CloudWatch event to the customer’s AWS account.
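If you manage syncs programmatically, the following is a hedged boto3 sketch that starts an on-demand sync run for a data source. Verify the operation and response field names against the current Amazon Q Business API reference for your boto3 version; the IDs are placeholders.

# Hedged sketch: start an on-demand data source sync run with boto3.
import boto3

qbusiness = boto3.client("qbusiness")

response = qbusiness.start_data_source_sync_job(
    applicationId="your-application-id",   # placeholder
    indexId="your-index-id",               # placeholder
    dataSourceId="your-data-source-id",    # placeholder
)
# The returned execution ID identifies this sync run in the sync history.
print("Started sync run:", response.get("executionId"))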
Key features and benefits of document-level reports
The following are the key features and benefits of the new document-level report in Amazon Q Business applications:

Enhanced sync run history page – A new Actions column has been added to the sync run history page, providing access to the document-level report for each sync run.
Dedicated log stream – A new log stream named SYNC_RUN_HISTORY_REPORT has been created in the Amazon Q Business CloudWatch log group, containing the document-level report.
Comprehensive document information – The document-level report includes the following information for each document (a hypothetical example record is shown after this list).
Document ID – This is the document ID that is inherited directly from the data source or mapped by the customer in the data source field mappings.
Document title – The title of the document is taken from the data source or mapped by the customer in the data source field mappings.
Consolidated document status (SUCCESS, FAILED, or SKIPPED) – This is the final consolidated status of the document. It can have a value of SUCCESS, FAILED, or SKIPPED. If the document was successfully processed in all stages, then the value is SUCCESS. If the document has failed or was skipped in any of the stages, then the value of this field will be FAILED or SKIPPED.
Error message (if the document failed) – This field contains the error message with which a document failed. If a document was skipped due to throttling errors, or any internal errors, this will be shown in the error message field.
Crawl status – This field denotes whether the document was crawled successfully from the data source. This status correlates to the syncing-crawling state in the data source sync.
Sync status – This field denotes whether the document was sent for syncing successfully. This correlates to the syncing-indexing state in the data source sync.
Index status – This field denotes whether the document was successfully persisted in the index.
ACLs – This field contains a list of document-level permissions that were crawled from the data source. The details of each element in the list are:

Global name: This is the email or username of the user, and it is mapped across multiple data sources. For example, if a user has three data sources – Confluence, SharePoint, and Gmail – with the local user IDs confluence_user, sharepoint_user, and gmail_user respectively, and the email address user@email.com is the globalName in the ACL for all of them, then Amazon Q Business understands that all of these local user IDs map to the same global name.
Name: This is the local unique ID of the user, which is assigned by the data source.
Type: This field indicates the principal type. This can be either USER or GROUP.
Is Federated: This is a Boolean flag indicating whether the group is defined at the INDEX level (true) or the DATASOURCE level (false).
Access: This field indicates whether the user is explicitly allowed or denied access. Values can be either ALLOWED or DENIED.
Data source ID: This is the data source ID. For federated groups (INDEX level), this field will be null.

Metadata – This field contains the metadata fields (other than ACL) that were pulled from the data source. This list also includes the metadata fields mapped by the customer in the data source field mappings as well as extra metadata fields added by the connector.
Hashed document ID (for troubleshooting assistance) – To safeguard your data privacy, we present a secure, one-way hash of the document identifier. This encrypted value enables the Amazon Q Business team to efficiently locate and analyze the specific document within our logs, should you encounter any issue that requires further investigation and resolution.
Timestamp – The timestamp indicates when the document status was logged in CloudWatch.
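To make the preceding fields concrete, the following is a hypothetical example of a single SYNC_RUN_HISTORY_REPORT record, written as a Python dictionary. The field names mirror those used in the queries later in this post; the values are invented placeholders, not output from a real sync run.

# Hypothetical example record; values are placeholders, not real sync output.
example_record = {
    "DocumentId": "doc-12345",
    "DocumentTitle": "Multi-Asset Fund Fact Sheet",
    "ConnectorDocumentStatus": {"Status": "SUCCESS"},  # consolidated status
    "ErrorMsg": None,                                  # populated only on failure
    "CrawlStatus": "SUCCESS",
    "SyncStatus": "SUCCESS",
    "IndexStatus": "SUCCESS",
    "Acl": [
        {
            "globalName": "user@email.com",
            "name": "confluence_user",
            "type": "USER",
            "isFederated": False,
            "access": "ALLOWED",
            "dataSourceId": "your-data-source-id",
        }
    ],
    "Metadata": [
        {"key": "_last_updated_at", "value": {"dateValue": "2024-08-01T00:00:00Z"}}
    ],
    "HashedDocumentId": "3f7a0c",          # one-way hash, truncated placeholder
    "Timestamp": "2024-08-15T12:34:56Z",
}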

In the following sections, we explore different use cases for the logging feature.
Troubleshoot “Sorry, I could not find relevant information” with the new logging feature
The new document-level logging feature in Amazon Q Business can help troubleshoot common issues related to the “Sorry, I could not find relevant information to complete your request” response.
Let’s explore an example scenario. A mutual funds manager uses Amazon Q Business chat for knowledge retrieval and insights extraction across their enterprise data stores. When the fund manager asks, “What is the CAGR of the multi-asset fund?” in the Amazon Q chat, they receive the “Sorry, I could not find relevant information to complete your request” response.
As the administrator managing their Amazon Q Business application, you can troubleshoot the issue using the following approach with the new logging feature. First, you want to determine whether the multi-asset fund document was successfully indexed in the Amazon Q Business application. Next, you need to verify if the fund manager’s user account has the required permission to read the information from the multi-asset fund document. Amazon Q Business enforces the document permissions configured in its data source, and you can use this new feature to verify that the document ACL settings are synced in the Amazon Q Business application index.
You can use the following CloudWatch query string to check the document ACL settings:

filter @logStream like 'SYNC_RUN_HISTORY_REPORT/'
and DocumentTitle = "your-document-title"
| fields DocumentTitle, ConnectorDocumentStatus.Status, Acl
| sort @timestamp desc
| limit 1

This query filter uses the per-document-level logging stream SYNC_RUN_HISTORY_REPORT, and displays the document title and its associated ACL settings. By verifying the document indexing and permissions, you can identify and resolve potential issues that may be causing the “Sorry, I could not find relevant information” response.
The following screenshot shows an example result.
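You can also run the same query programmatically through the CloudWatch Logs Insights API, as in the following hedged boto3 sketch. The log group name is a placeholder; substitute the log group of your Amazon Q Business application.

# Hedged sketch: run the ACL-check query above via the CloudWatch Logs Insights API.
import time
import boto3

logs = boto3.client("logs")

query = """
filter @logStream like 'SYNC_RUN_HISTORY_REPORT/'
and DocumentTitle = "your-document-title"
| fields DocumentTitle, ConnectorDocumentStatus.Status, Acl
| sort @timestamp desc
| limit 1
"""

start = logs.start_query(
    logGroupName="your-amazon-q-business-application-log-group",  # placeholder
    startTime=int(time.time()) - 7 * 24 * 3600,  # last 7 days
    endTime=int(time.time()),
    queryString=query,
)

result = logs.get_query_results(queryId=start["queryId"])
while result["status"] in ("Scheduled", "Running"):
    time.sleep(1)
    result = logs.get_query_results(queryId=start["queryId"])

for row in result["results"]:
    print({field["field"]: field["value"] for field in row})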

Determine the optimal boosting duration for recent documents using document-level reporting
When it comes to generating accurate answers, you may want to fine-tune the way Amazon Q prioritizes its content. For instance, you may prefer to boost recent documents over older ones to make sure the most up-to-date passages are used to generate an answer. To achieve this, you can use the relevance tuning feature in Amazon Q Business to boost documents based on the last update date attribute, with a specified boosting duration. However, determining the optimal boosting period can be challenging when dealing with a large number of frequently changing documents.
You can now use the per-document-level report to obtain the _last_updated_at metadata field information for your documents, which can help you determine the appropriate boosting period. For this, you use the following CloudWatch Logs Insights query to retrieve the _last_updated_at metadata attribute for machine learning documents from the SYNC_RUN_HISTORY_REPORT log stream:

filter @logStream like 'SYNC_RUN_HISTORY_REPORT/'
and Metadata like 'Machine Learning'
| parse Metadata '{"key":"_last_updated_at","value":{"dateValue":"*"}}' as @last_updated_at
| sort @last_updated_at desc, @timestamp desc
| dedup DocumentTitle

With the preceding query, you can gain insights into the last updated timestamps of your documents, enabling you to make informed decisions about the optimal boosting period. This approach makes sure your chat responses are generated using the most recent and relevant information, enhancing the overall accuracy and effectiveness of your Amazon Q Business implementation.
The following screenshot shows an example result.

Common document indexing observability and troubleshooting methods
In this section, we explore some common admin tasks for observing and troubleshooting document indexing using the new document-level reporting feature.
List all successfully indexed documents from a data source
To retrieve a list of all documents that have been successfully indexed from a specific data source, you can use the following CloudWatch query:

fields DocumentTitle, DocumentId, @timestamp
| filter @logStream like 'SYNC_RUN_HISTORY_REPORT/your-data-source-id/'
and ConnectorDocumentStatus.Status = "SUCCESS"
| sort @timestamp desc | dedup DocumentTitle, DocumentId

The following screenshot shows an example result. 

List all successfully indexed documents from a data source sync job
To retrieve a list of all documents that have been successfully indexed during a specific sync job, you can use the following CloudWatch query:

fields DocumentTitle, DocumentId, ConnectorDocumentStatus.Status AS IndexStatus, @timestamp
| filter @logStream like 'SYNC_RUN_HISTORY_REPORT/your-data-source-id/run-id'
and ConnectorDocumentStatus.Status = "SUCCESS"
| sort DocumentTitle

The following screenshot shows an example result.

List all documents that failed to index from a data source sync job
To retrieve a list of all documents that failed to index during a specific sync job, along with the error messages, you can use the following CloudWatch query:

fields DocumentTitle, DocumentId, ConnectorDocumentStatus.Status AS IndexStatus, ErrorMsg, @timestamp
| filter @logStream like 'SYNC_RUN_HISTORY_REPORT/your-data-source-id/run-id'
and ConnectorDocumentStatus.Status = "FAILED"
| sort @timestamp desc

The following screenshot shows an example result.

List all documents that contain a particular user’s ACL permission from an Amazon Q Business application
To retrieve a list of documents that have a specific user’s ACL permission, you can use the following CloudWatch Logs Insights query:

filter @logStream like 'SYNC_RUN_HISTORY_REPORT/'
and Acl like 'aneesh@mydemoaws.onmicrosoft.com'
| display DocumentTitle, SourceUri

The following screenshot shows an example result.

List the ACL of an indexed document from a data source sync job
To retrieve the ACL information for a specific indexed document from a sync job, you can use the following CloudWatch Logs Insights query:

filter @logStream like 'SYNC_RUN_HISTORY_REPORT/data-source-id/run-id'
and DocumentTitle = "your-document-title"
| display DocumentTitle, Acl

The following screenshot shows an example result.

List metadata of an indexed document from a data source sync job
To retrieve the metadata information for a specific indexed document from a sync job, you can use the following CloudWatch Logs Insights query:

filter @logStream like 'SYNC_RUN_HISTORY_REPORT/data-source-id/run-id'
and DocumentTitle = "your-document-title"
| display DocumentTitle, Metadata

The following screenshot shows an example result.

Conclusion
The newly introduced document-level report in Amazon Q Business provides enhanced visibility and observability into the document processing lifecycle during data source sync jobs. This feature addresses a critical need expressed by customers for better troubleshooting capabilities and access to detailed information about the indexing status, metadata, and ACLs of individual documents.
The document-level report is stored in a dedicated log stream named SYNC_RUN_HISTORY_REPORT within the Amazon Q Business application CloudWatch log group. This report contains comprehensive information for each document, including the document ID, title, overall sync status, error messages (if any), ACLs, and metadata retrieved from the data source. The data source sync run history page now includes an Actions column, providing access to the document-level report for each sync run. This feature significantly improves the ability to troubleshoot issues related to document ingestion, access control, and metadata relevance, and it provides better visibility into the documents synced with an Amazon Q index.
To get started with Amazon Q Business, explore the Getting started guide. To learn more about data source connectors and best practices, see Configuring Amazon Q Business data source connectors.

About the authors
Aneesh Mohan is a Senior Solutions Architect at Amazon Web Services (AWS), bringing two decades of experience in creating impactful solutions for business-critical workloads. He is passionate about technology and loves working with customers to build well-architected solutions, focusing on the financial services industry, AI/ML, security, and data technologies.
Ashwin Shukla is a Software Development Engineer II on the Amazon Q for Business and Amazon Kendra engineering team, with 6 years of experience in developing enterprise software. In this role, he works on designing and developing foundational features for Amazon Q for Business.

Derive generative AI-powered insights from ServiceNow with Amazon Q Business

Effective customer support, project management, and knowledge management are critical aspects of providing efficient customer relationship management. ServiceNow is a platform for incident tracking, knowledge management, and project management functions for software projects, and it has become an indispensable part of many organizations’ workflows for ensuring the success of the customer and the product. However, extracting valuable insights from the vast amount of data stored in ServiceNow often requires manual effort and building specialized tooling. Users such as support engineers, project managers, and product managers need to be able to ask questions about an incident or a customer, or get answers from knowledge articles, in order to provide excellent customer support. Organizations use ServiceNow to manage workflows such as IT services, ticketing systems, configuration management, and infrastructure changes across IT systems. Generative artificial intelligence (AI) provides the ability to take relevant information from a data source such as ServiceNow and provide well-constructed answers back to the user.
Building a generative AI-based conversational application integrated with relevant data sources requires an enterprise to invest time, money, and people. First, you need to build connectors to the data sources. Next, you need to index this data to make it available for a Retrieval Augmented Generation (RAG) approach, where relevant passages are delivered with high accuracy to a large language model (LLM). To do this, you need to select an index that provides the capabilities to index the content for semantic and vector search, build the infrastructure to retrieve and rank the answers, and build a feature-rich web application. Additionally, you need to hire and staff a large team to build, maintain, and manage such a system.
Amazon Q Business is a fully managed generative AI-powered assistant that can answer questions, provide summaries, generate content, and securely complete tasks based on data and information in your enterprise systems. Amazon Q Business can help you get fast, relevant answers to pressing questions, solve problems, generate content, and take action using the data and expertise found in your company’s information repositories, code, and enterprise systems (such as ServiceNow, among others). Amazon Q provides out-of-the-box native data source connectors that can index content into a built-in retriever and uses an LLM to provide accurate, well-written answers. A data source connector is a component of Amazon Q that helps integrate and synchronize data from multiple repositories into one index.
Amazon Q Business offers multiple prebuilt connectors to a large number of data sources, including ServiceNow, Atlassian Confluence, Amazon Simple Storage Service (Amazon S3), Microsoft SharePoint, Salesforce, and many more, and helps you create your generative AI solution with minimal configuration. For a full list of Amazon Q business supported data source connectors, see Amazon Q Business connectors.
You can use the Amazon Q Business ServiceNow Online data source connector to connect to the ServiceNow Online platform and index ServiceNow entities such as knowledge articles, Service Catalogs, and incident entries, along with the metadata and document access control lists (ACLs).
This post shows how to configure the Amazon Q ServiceNow connector to index your ServiceNow platform and take advantage of generative AI searches in Amazon Q. We use an illustrative ServiceNow instance, populated with technical content related to AWS services, as the example.
Find accurate answers from content in ServiceNow using Amazon Q Business
After you integrate Amazon Q Business with ServiceNow, you can ask questions whose answers are drawn from your ServiceNow content, such as:

How do I troubleshoot an invalid IP configuration on a network router? – This could be derived from an internal knowledge article on that topic
Which form do I use to request a new email account? – This could be derived from an internal Service Catalog entry
Is there a previous incident on the topic of resetting cloud root user password? – This could be derived from an internal incident entry

Overview of the ServiceNow connector
A data source connector is a mechanism for integrating and synchronizing data from multiple repositories into one container index. Amazon Q Business offers multiple data source connectors that can connect to your data sources and help you create your generative AI solution with minimal configuration.
To crawl and index contents in ServiceNow, we configure Amazon Q Business ServiceNow connector as a data source in your Amazon Q business application.
When you connect Amazon Q Business to a data source and initiate the data synchronization process, Amazon Q Business crawls and adds documents from the data source to its index.
Types of documents
Let’s look at what is considered a document in the context of the Amazon Q Business ServiceNow connector.
The Amazon Q Business ServiceNow connector supports crawling of the following entities in ServiceNow:

Knowledge articles – Each article is considered a single document
Knowledge article attachments – Each attachment is considered a single document
Service Catalog – Each catalog item is considered a single document
Service Catalog attachments – Each catalog attachment is considered a single document
Incidents – Each incident is considered a single document
Incident attachments – Each incident attachment is considered a single document

Although not all metadata is available at the time of writing, you can also configure field mappings. Field mappings allow you to map ServiceNow field names to Amazon Q index field names. This includes both default field mappings created automatically by Amazon Q, as well as custom field mappings that you can create and edit. Refer to ServiceNow data source connector field mappings documentation for more information.
Authentication
The Amazon Q Business ServiceNow connector supports two types of authentication methods:

Basic authentication – ServiceNow host URL, user name, and password
OAuth 2.0 authentication with Resource Owner Password Flow – ServiceNow host URL, user name, password, client ID, and client secret

Supported ServiceNow versions
ServiceNow typically names platform versions after cities, which makes it easier to differentiate between versions and their associated features. At the time of writing, the following versions are natively supported in the Amazon Q Business ServiceNow connector:

San Diego
Tokyo
Rome
Vancouver
Others

ACL crawling
To maintain a secure environment, Amazon Q Business now requires ACL and identity crawling for all connected data sources. When preparing to connect Amazon Q Business applications to AWS IAM Identity Center, you need to enable ACL indexing and identity crawling and re-synchronize your connector.
Amazon Q Business enforces data security by supporting the crawling of ACLs and identity information from connected data sources. Indexing documents with ACLs is crucial for maintaining data security, because documents without ACLs are considered public.
If you need to index documents without ACLs, make sure they’re explicitly marked as public in your data source. When connecting a ServiceNow data source, Amazon Q Business crawls ACL information, including user and group information, from your ServiceNow instance. With ACL crawling, you can filter chat responses based on the end-user’s document access level, making sure users only see information they’re authorized to access.
In ServiceNow, user IDs are mapped from user emails and exist on files with set access permissions. This mapping allows Amazon Q Business to effectively enforce access controls based on the user’s identity and permissions within the ServiceNow environment.
Refer to How Amazon Q Business connector crawls ServiceNow ACLs for more information.
Overview of solution
Amazon Q is a generative AI-powered assistant that helps customers answer questions, provide summaries, generate content, and complete tasks based on data in their company repository. It also serves as a learning tool for AWS users who want to ask questions about services and best practices in the cloud. You can use the Amazon Q connector for ServiceNow Online to crawl your ServiceNow domain and index service tickets, guides, and community posts to discover answers to your questions faster.
Amazon Q understands and respects your existing identities, roles, and permissions and uses this information to personalize its interactions. If a user doesn’t have permission to access data without Amazon Q, they can’t access it using Amazon Q either. The following list outlines which documents each user is authorized to access for our use case. For a complete list of ServiceNow roles, refer to the ServiceNow documentation. The documents used in this example are a subset of AWS public documents from re:Post, pre-loaded into ServiceNow with access restrictions.

1. John Stiles – Document types authorized for access: Knowledge Articles, Service Catalog, and Incidents – ServiceNow roles: knowledge, catalog, incident_manager

2. Mary Major – Document types authorized for access: Knowledge Articles and Service Catalog – ServiceNow roles: knowledge, catalog

3. Mateo Jackson – Document types authorized for access: Incidents – ServiceNow roles: incident_manager

In this post, we show how to use the Amazon Q Business ServiceNow connector to index data from your ServiceNow platform for intelligent search.
Prerequisites
For this walkthrough, you should have the following prerequisites:

An AWS account
Administrator-level access to your ServiceNow platform
Privileges to create an Amazon Q Business application, AWS resources, and AWS Identity and Access Management (IAM) roles and policies
Basic knowledge of AWS services and working knowledge of ServiceNow
IAM Identity Center set up for user management

Configure your ServiceNow connection
In your ServiceNow platform, complete the following steps to create an OAuth2 secret that can be consumed by your Amazon Q application:

In ServiceNow, on the All menu, expand System OAuth and choose Application Registry.

Choose New.

Choose Create an OAuth API endpoint for external clients.

For Name, enter a unique name.
Fill out the remaining parameters according to your requirements and choose Submit.

Note down the client ID and client secret to use in later steps.
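If you prefer to store these credentials in AWS Secrets Manager yourself (the Amazon Q console can also create the secret for you in a later step), the following is a hedged boto3 sketch. The JSON key names inside the secret are illustrative assumptions; check the ServiceNow connector documentation for the exact schema it expects.

# Hedged sketch: pre-create a secret for the ServiceNow connector; key names are assumptions.
import json
import boto3

secrets = boto3.client("secretsmanager")

secrets.create_secret(
    Name="QBusiness-ServiceNow-secret",  # any name you choose
    SecretString=json.dumps({
        "hostUrl": "your-instance.service-now.com",               # assumed key name
        "username": "your-servicenow-username",                   # assumed key name
        "password": "your-servicenow-password",                   # assumed key name
        "clientId": "client-id-from-the-previous-step",           # assumed key name
        "clientSecret": "client-secret-from-the-previous-step",   # assumed key name
    }),
)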

Create an Amazon Q Business application
Complete the following steps to create an Amazon Q Business application:

On the Amazon Q console, choose Getting started in the navigation pane.
Under Amazon Q Business Pro, choose Q Business to subscribe.

On the Amazon Q Business console, choose Get started.

On the Applications page, choose Create application.

On the Create application page, provide your application details.
Choose Create.

Make sure the Amazon Q Business application is connected to IAM Identity Center. For more information, see Setting up Amazon Q Business with IAM Identity Center as identity provider.

On the Select retriever page, select Use native retriever for your retriever and select Starter for the index provisioning type.
Choose Next.

On the Connect data sources page, choose Next without connecting to any data source (we do that in the next section).

On the Add groups and users page, choose Add groups and users.

Add any groups and users to access the application.

For more details, refer to Adding users and subscriptions to an Amazon Q Business application.

Choose Create application.

Configure the data source using the Amazon Q ServiceNow Online connector
Now let’s configure the ServiceNow Online data source connector with the Amazon Q application that we created in the previous section.

On the Amazon Q console, navigate to the Applications page and choose the application you just created.

In the Data sources section, choose Add data source.

Search for and choose the ServiceNow Online connector.

Provide the name, ServiceNow host, and version information.

If your ServiceNow version isn’t on the dropdown menu, choose Others.

Choose Create and add new secret to create a new secret to connect with the ServiceNow platform account.

Provide the connection information based on the OAuth2 endpoint created in ServiceNow previously, then choose Save.

Leave the defaults for the VPC and identity crawler settings.
For IAM role, choose Create a new service role (Recommended) and keep the default role name.

Choose entities that you want to bring over from ServiceNow.

This example shows knowledge articles, Service Catalog items, and incidents. The Filter query option helps curate the list of items that you want to bring into Amazon Q. When you use a query, you can specify multiple knowledge bases, including private knowledge bases. For more details on how to build ServiceNow filters, refer to Filters. For additional query building resources, see Specifying documents to index with a query.

For Sync mode, select Full sync.
For Sync run schedule, choose Run on demand.

Leave the remaining options as default and choose Add data source.

When the data source status shows as Active, initiate data synchronization by choosing Sync now.

Wait until the synchronization status changes to Completed before continuing to the next steps.

For information about common issues encountered and related troubleshooting steps, refer to Troubleshooting data source connectors.
Run queries with the Amazon Q web experience
Now that the data synchronization is complete, you can start exploring insights from Amazon Q. You have three users for testing: John with access to all document types, Mary with access to knowledge articles and the service catalog, and Mateo with access only to incidents. In the following steps, you will sign in as each user and ask various questions to see what responses Amazon Q provides based on the permitted document types for their respective groups. You will also test edge cases where users try to access information from restricted sources to validate the access control functionality.

On the details page of the new Amazon Q application, navigate to the Web experience settings tab and choose the link under Deployed URL. This will open a new tab with a preview of the UI and options to customize according to your needs.

Log in to the application as John Stiles first, using the credentials for the user that you added to the Amazon Q application.

After the login is successful, choose the application that you just created.

From there, you’ll be redirected to the Amazon Q assistant UI, where you can start asking questions using natural language and get insights from your ServiceNow platform.

Let’s run some queries to see how Amazon Q can answer questions related to synchronized data. John has access to all ServiceNow document types. When asked “How do I upgrade my EKS cluster to the latest version”, Amazon Q will provide a summary pulling information from the related knowledge article, highlighting the sources at the end of each excerpt.

Still logged in as John, when asked “What is Amazon QLDB?”, Amazon Q will provide a summary pulling information from the related ServiceNow incident.

Sign out as user John. Start a new incognito browser session or use a different browser. Copy the web experience URL and sign in as user Mary. Repeat these steps each time you need to sign in as a different user. Mary only has access to knowledge articles and service catalog with no incident access. When asked “How do I perform vector search with Amazon Redshift”, Amazon Q will provide a summary pulling information from the related knowledge article, highlighting the source.

However, when asked “What is Amazon QLDB?”, Amazon Q responds that it could not find relevant information. This is because Mary does not have access to ServiceNow incidents, which are the only place where the answer to that question can be found.

Sign out as user Mary. Start a new incognito browser session or use a different browser. Copy the web experience URL and sign in as user Mateo. Mateo only has access to incidents with no knowledge article or service catalog access. When asked “What is Amazon QLDB?”, Amazon Q will provide a summary pulling information from the related incident, highlighting the source.

However, when asked “How do I perform vector search with Amazon Redshift?”, Amazon Q responds that it could not find relevant information. This is because Mateo does not have access to ServiceNow knowledge articles, which are the only place where the answer to this question can be found.

Try out the assistant with additional queries, such as:

How do you set up a new BlackBerry device?
How do I set up S3 object replication?
How do I resolve empty log issues in CloudWatch?
How do I troubleshoot 403 Access Denied errors from Amazon S3?

Frequently asked questions
In this section, we provide guidance to frequently asked questions.
Amazon Q Business is unable to answer your questions
If you get the response “Sorry, I could not find relevant information to complete your request,” this may be due to a few reasons:

No permissions – ACLs applied to your account don’t allow you to query certain data sources. If this is the case, reach out to your application administrator to make sure your ACLs are configured to access the data sources.
Email ID doesn’t match user ID – In rare scenarios, a user may have a different email ID associated with Amazon Q in IAM Identity Center than what is associated in the ServiceNow user profile. In such cases, make sure the Amazon Q user profile is updated to recognize the ServiceNow email ID through the update-user command in the AWS Command Line Interface (AWS CLI) or the related API call.
Data connector sync failed – Your data connector may have failed to sync information from the source to the Amazon Q Business application. Verify the data connector’s sync run schedule and sync history to confirm the sync is successful.
Empty or private ServiceNow projects – Private or empty projects aren’t crawled during the sync run.

If none of these reasons apply to your use case, open a support case and work with your technical account manager to get this resolved.
How to generate responses from authoritative data sources
If you want Amazon Q Business to only generate responses from authoritative data sources, you can configure this using the Amazon Q Business application global controls under Admin controls and guardrails.

Log in to the Amazon Q Business console as an Amazon Q Business application administrator.
Navigate to the application and choose Admin controls and guardrails in the navigation pane.
Choose Edit in the Global controls section to set these options.

For more information, refer to Admin controls and guardrails in Amazon Q Business.

Amazon Q Business responds using old (stale) data even though your data source is updated
Each Amazon Q Business data connector can be configured with a unique sync run schedule frequency. Verifying the sync status and sync schedule frequency for your data connector reveals when the last sync ran successfully. It could be that your data connector’s sync run schedule is either set to sync at a scheduled time of day, week, or month. If it’s set to run on demand, the sync has to be manually invoked. When the sync run is complete, verify the sync history to make sure the run has successfully synced all new issues. Refer to Sync run schedule for more information about each option.
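To check this programmatically rather than in the console, the following is a hedged boto3 sketch that reads a data source’s configured sync schedule and the status of its most recent sync jobs. Verify the operation and field names against the current Amazon Q Business API reference; the IDs are placeholders.

# Hedged sketch: inspect a data source's sync schedule and recent sync jobs with boto3.
import boto3

qbusiness = boto3.client("qbusiness")
ids = dict(
    applicationId="your-application-id",   # placeholder
    indexId="your-index-id",               # placeholder
    dataSourceId="your-data-source-id",    # placeholder
)

data_source = qbusiness.get_data_source(**ids)
print("Configured sync schedule:", data_source.get("syncSchedule") or "on demand")

for job in qbusiness.list_data_source_sync_jobs(**ids).get("history", []):
    print(job.get("executionId"), job.get("status"), job.get("endTime"))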
Clean up
To avoid incurring future charges, clean up any resources created as part of this solution. Delete the Amazon Q ServiceNow Online connector data source, OAuth API endpoint created in ServiceNow, and the Q Business application. Also, delete the user management setup in IAM Identity Center.
Conclusion
In this post, we discussed how to configure the Amazon Q ServiceNow Online connector to crawl and index service tickets, community posts, and knowledge guides. We showed how generative AI-based search in Amazon Q enables your business leaders and agents to discover insights from your ServiceNow content quicker. This is all available through a user-friendly interface with Amazon Q Business doing the undifferentiated heavy lifting.
To learn more about the Amazon Q Business connector for ServiceNow Online, refer to Connecting ServiceNow Online to Amazon Q Business.

About the Authors
Prabhakar Chandrasekaran is a Senior Technical Account Manager with AWS Enterprise Support. Prabhakar enjoys helping customers build cutting-edge AI/ML solutions on the cloud. He also works with enterprise customers providing proactive guidance and operational assistance, helping them improve the value of their solutions when using AWS. Prabhakar holds six AWS and seven other professional certifications. With over 20 years of professional experience, Prabhakar was a data engineer and a program leader in the financial services space prior to joining AWS.
Lakshmi Dogiparti is a Software Development Engineer at Amazon Web Services. She works on the Amazon Q and Amazon Kendra connector design, development, integration, and test operations.
Vijai Gandikota is a Principal Product Manager in the Amazon Q and Amazon Kendra organization of Amazon Web Services. He is responsible for the Amazon Q and Amazon Kendra connectors, ingestion, security, and other aspects of the Amazon Q and Amazon Kendra services.

LessonPlanner: A Tool for Enhancing Novice Teachers’ Effectiveness by Integrating Large Language Models with Structured Pedagogical Strategies to Improve Lesson Planning Quality

Integrating advanced educational technology has opened new avenues for enhancing teaching effectiveness, particularly using large language models (LLMs). These models are increasingly being explored as tools to assist educators, especially those new to teaching, in developing detailed and effective lesson plans. The application of LLMs in education is driven by the potential to generate tailored instructional content that adapts to various teaching scenarios, providing substantial support to educators who may lack experience or resources.

Novice teachers frequently encounter significant challenges in creating well-structured lesson plans. The core of the problem lies in their limited experience organizing instructional materials and applying pedagogical strategies effectively. This inexperience often results in inconsistent lesson quality and inefficiencies in lesson delivery. Curating relevant teaching resources is time-consuming and prone to producing suboptimal results, as teachers must sift through vast amounts of material to find content that aligns with their specific teaching objectives. These challenges can hinder the educational experience, making it difficult for teachers to achieve their instructional goals.

Traditional lesson planning methods typically involve referring to textbooks, consulting existing lesson plans, and searching online for relevant materials. However, these approaches are far from efficient. While they provide a foundation for lesson content, they often fail to address the needs of individual classes or the unique teaching styles of different educators. Teachers using generic LLMs, such as ChatGPT, often find that while these tools can generate useful content, the outputs are fragmented and require significant refinement before they can be used in a classroom setting.

Researchers from Sun Yat-sen University and Cornell University introduced LessonPlanner, a tool designed to assist teachers, particularly novices, in constructing detailed and effective lesson plans. LessonPlanner is an interactive system that integrates pedagogy-driven content generation with the structured guidance of Gagne’s Nine Events of Instruction. This tool simplifies the lesson planning process and ensures that the generated content is pedagogically sound and tailored to the specific needs of the teacher and their students.

LessonPlanner allows users to input basic information about their course, such as the course name, lesson topic, and the students’ academic level. The system then generates a detailed lesson plan outline based on these inputs, guided by well-established educational theories. For example, Gagne’s Nine Events of Instruction, which include strategies such as gaining attention, providing learner guidance, and eliciting performance, are integral to the plan’s structure. Teachers can then interact with the system to extend and customize the outline with specific activities and teaching materials. The tool also offers options to generate content, request explanations, and obtain suggestions for effectively delivering the lesson.
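As an illustration of how pedagogy-driven generation can be structured (this is not LessonPlanner’s actual code), the following sketch assembles a lesson-plan prompt around Gagne’s Nine Events of Instruction from the inputs described above; the llm argument stands in for any chat-completion call.

# Illustrative sketch only; not the LessonPlanner implementation.
GAGNE_EVENTS = [
    "Gain attention",
    "Inform learners of objectives",
    "Stimulate recall of prior learning",
    "Present the content",
    "Provide learner guidance",
    "Elicit performance",
    "Provide feedback",
    "Assess performance",
    "Enhance retention and transfer",
]

def build_outline_prompt(course: str, topic: str, level: str) -> str:
    events = "\n".join(f"{i + 1}. {event}" for i, event in enumerate(GAGNE_EVENTS))
    return (
        f"Create a lesson plan outline for the course '{course}' on the topic '{topic}' "
        f"for {level} students. Organize it around Gagne's Nine Events of Instruction:\n"
        f"{events}\nFor each event, suggest one concrete classroom activity."
    )

def generate_outline(llm, course, topic, level):
    """Send the structured prompt to any LLM callable and return its outline."""
    return llm(build_outline_prompt(course, topic, level))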

Image Source

The performance of LessonPlanner was evaluated through a within-subjects study involving twelve participants and expert interviews with six novice teachers. The study revealed that LessonPlanner significantly improved the quality of lesson plans. Specifically, the system reduced the workload associated with lesson planning. The participants, graduate students or senior undergraduates, reported that using LessonPlanner made the lesson planning process more efficient and less stressful. On average, the quality of the lesson plans improved by over 30% when using LessonPlanner compared to traditional methods. The expert interviews underscored the tool’s usefulness, with novice teachers highlighting its ability to provide well-organized outlines and inspiring content.

In conclusion, LessonPlanner presents a promising solution to the challenges faced by novice teachers in creating effective lesson plans. By integrating large language models with pedagogical frameworks, the tool simplifies the lesson-planning process and enhances the quality of the resulting instructional content. The system’s adaptability to individual needs and its user-friendly interface make it valuable for educators striving to improve their teaching effectiveness. With a notable increase in lesson quality and a significant reduction in preparation workload, LessonPlanner is poised to become an essential tool in the modern educator’s toolkit, particularly for those new to the profession.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. If you like our work, you will love our newsletter.

The post LessonPlanner: A Tool for Enhancing Novice Teachers’ Effectiveness by Integrating Large Language Models with Structured Pedagogical Strategies to Improve Lesson Planning Quality appeared first on MarkTechPost.

Self-play muTuAl Reasoning (rStar): A Novel AI Approach that Boosts Small Language Models’ (SLMs) Reasoning Capability during Inference without Fine-Tuning

Large language models (LLMs) have made significant strides in various applications, but they continue to face substantial challenges in complex reasoning tasks. For instance, even advanced models like Mistral-7B can only achieve 36.5% accuracy on the GSM8K dataset, despite employing techniques such as Chain-of-Thought (CoT). While fine-tuning has shown promise in improving reasoning capabilities, most LLMs rely on data distilled or synthesized by superior models like GPT-4. This dependency on more advanced models has led researchers to explore alternative approaches to enhance reasoning without relying on a superior teacher LLM. However, this endeavour presents its own challenges, particularly for small language models (SLMs), which struggle with effective solution-space exploration and with assessing the quality of their reasoning steps.

Researchers have made various attempts to enhance the reasoning capabilities of language models. Prompting-based methods, such as Chain-of-Thought, focus on designing instructions and pipelines to improve performance during inference. These approaches include planning, problem decomposition, abstraction, and programming techniques. Also, self-improvement methods have gained traction, with fine-tuning approaches utilizing pre-trained LLMs to synthesise data and enhance performance progressively. Advanced prompting techniques like self-verification and RAP aim to improve performance through iterative self-exploration. Sampling diverse reasoning paths has shown promise in mathematical reasoning tasks, with methods like Self-Consistency and tree-search approaches breaking down tasks into simpler steps. For answer verification, majority voting is widely used, while some researchers have explored training value or reward models, though these require additional annotations and risk overfitting.

Researchers from Microsoft Research Asia and Harvard University introduced the Self-play muTuAl Reasoning (rStar) approach, a robust solution to enhance SLMs’ reasoning capabilities during inference, without relying on fine-tuning or superior models. rStar tackles the challenges faced by SLMs through a unique self-play mutual generation-discrimination process. This method employs conventional Monte Carlo Tree Search (MCTS) for self-generating reasoning steps but expands the set of reasoning actions to simulate human reasoning behaviors. These actions include decomposing problems, searching for specific reasoning steps, proposing new sub-questions, and rephrasing given questions. To guide the exploration of generated reasoning trajectories effectively, rStar introduces a discrimination process called mutual consistency, which employs a second SLM as a discriminator to provide unsupervised feedback on candidate reasoning trajectories.

The rStar approach employs a unique architecture to enhance SLMs’ reasoning capabilities. At its core, rStar augments the target SLM with an MCTS algorithm so that it can self-generate multi-step reasoning solutions. The method introduces a rich set of five human-like reasoning actions, including proposing one-step thoughts, generating the remaining thought steps, proposing and answering sub-questions, re-answering sub-questions, and rephrasing questions. This diverse action space allows for thorough exploration across various reasoning tasks.

rStar implements a carefully designed reward function that evaluates each action’s value without relying on self-rewarding techniques or external supervision. The MCTS rollout process uses the Upper Confidence Bounds applied to Trees (UCT) algorithm to balance exploration and exploitation during tree expansion. To verify the generated reasoning trajectories, rStar introduces a second SLM as a discriminator, employing a mutual consistency approach. This process involves masking part of a candidate trajectory and asking the discriminator SLM to complete it, then comparing the results for consistency.
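For reference, the UCT rule mentioned above can be written out in a few lines. The following sketch shows only the standard selection formula, not rStar’s full MCTS implementation; the Node fields are illustrative.

# Standard UCT selection rule; illustrative only, not rStar's implementation.
import math

class Node:
    def __init__(self):
        self.children = []   # candidate reasoning actions expanded from this state
        self.visits = 0      # how often this node was explored
        self.value = 0.0     # accumulated reward from rollouts through this node

def uct_select(parent: Node, c: float = 1.4) -> Node:
    """Pick the child balancing exploitation (mean reward) and exploration."""
    def uct(child: Node) -> float:
        if child.visits == 0:
            return float("inf")  # always try unvisited actions first
        exploit = child.value / child.visits
        explore = c * math.sqrt(math.log(max(parent.visits, 1)) / child.visits)
        return exploit + explore
    return max(parent.children, key=uct)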

The results demonstrate the effectiveness of rStar across various reasoning benchmarks and language models:

1. Performance on diverse reasoning tasks:

 rStar significantly improved SLMs’ problem-solving abilities. For example, LLaMA2-7B’s accuracy on GSM8K increased from 12.51% with few-shot CoT to 63.91% with rStar, nearly matching fine-tuned performance.

 rStar consistently improved reasoning accuracy across different SLMs and tasks to state-of-the-art levels, outperforming other baseline approaches.

Even without the discriminator, rStar’s generator outperformed existing multi-round inference baselines like RAP, ToT, and Self-Consistency on GSM8K.

2. Efficiency:

rStar showed significant improvements in reasoning accuracy with just 2 rollouts on the GSM8K dataset.

3. Performance on challenging mathematical datasets:

 On GSM-Hard and MATH-500, rStar improved SLMs’ reasoning accuracy significantly, with improvements of up to 12.9% and 9.14% respectively compared to state-of-the-art baselines.

4. Ablation studies:

 The MCTS generator in rStar outperformed other approaches like RAP and Self-Consistency across different models and tasks.

 rStar’s discriminator consistently outperformed other verification methods, including majority voting and self-verification, across different generators.

5. Model comparisons:

 Different models were tested as discriminators, with GPT-4 achieving the highest accuracy (92.57%) on GSM8K, followed by Phi3-Mini-Instruct (91.13%).

These results highlight rStar’s effectiveness in enhancing SLMs’ reasoning capabilities across various tasks and models, outperforming existing methods in both accuracy and efficiency.

The rStar approach introduces a robust generator-discriminator self-play method that significantly enhances the reasoning capabilities of SLMs during inference. This research reveals that SLMs like LLaMA2-7B possess strong inherent reasoning abilities even before domain-specific supervised fine-tuning. rStar demonstrates state-of-the-art performance across five different SLMs and five diverse reasoning tasks, substantially outperforming existing multi-round prompting and self-improvement techniques. The extensive ablation studies and analysis conducted in this research contribute valuable insights to the field, paving the way for more advanced self-improved reasoning techniques in SLMs. These findings highlight the potential of rStar in unlocking the latent reasoning capabilities of language models without the need for extensive fine-tuning or reliance on larger models.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. If you like our work, you will love our newsletter.


The post Self-play muTuAl Reasoning (rStar): A Novel AI Approach that Boosts Small Language Models SLMs’ Reasoning Capability during Inference without Fine-Tuning appeared first on MarkTechPost.

Transformer Explainer: An Innovative Web-Based Tool for Interactive Le …

Transformers are a groundbreaking innovation in AI, particularly in natural language processing and machine learning. Despite their pervasive use, the internal mechanics of Transformers remain a mystery to many, especially those who lack a deep technical background in machine learning. Understanding how these models work is crucial for anyone looking to engage with AI on a meaningful level, yet the complexity of the technology presents a significant barrier to entry.

The problem is that while Transformers are becoming more embedded in various applications, the steep learning curve of understanding their inner workings leaves many potential learners alienated. Existing educational resources, such as detailed blog posts and video tutorials, often delve into the mathematical underpinnings of these models, which can be overwhelming for beginners. These resources typically focus on the intricate details of neuron interactions and layer operations within the models, which are not easily digestible for those new to the field.

Existing methods and tools designed to educate users about Transformers tend either to oversimplify the concepts or, conversely, to be so technical that they demand significant computational resources. For instance, while visualization tools that aim to demystify the workings of AI models are available, they often require installing specialized software or using advanced hardware, which limits their accessibility, and they generally lack interactivity. This disconnect between the complexity of the models and the simplicity required for effective learning has created a significant gap in the educational resources available to those interested in AI.

Georgia Tech and IBM Research researchers have introduced a novel tool called Transformer Explainer. This tool is designed to make learning about Transformers more intuitive and accessible. Transformer Explainer is an open-source, web-based platform allowing users to interact directly with a live GPT-2 model in their web browsers. By eliminating the need for additional software or specialized hardware, the tool lowers the barriers to entry for those interested in understanding AI. The tool’s design focuses on enabling users to explore and visualize the internal processes of the Transformer model in real-time.

Transformer Explainer offers a detailed breakdown of how text is processed within a Transformer model. The tool uses a Sankey diagram to visualize the flow of information through the model’s various components. This visualization helps users understand how input text is transformed step by step until the model predicts the next token. One of the key features of Transformer Explainer is its ability to adjust parameters, such as temperature, which controls the probability distribution of the predicted tokens. The tool’s ability to operate entirely within the browser, utilizing frameworks like Svelte and D3, ensures a seamless and accessible user experience.
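As a rough sketch of what that temperature control does (illustrative Python, not the tool’s in-browser implementation), the logits are divided by the temperature before the softmax, so lower values sharpen the next-token distribution and higher values flatten it:

import numpy as np

def next_token_probs(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Convert raw logits into a probability distribution over the vocabulary."""
    scaled = logits / max(temperature, 1e-6)  # guard against division by zero
    scaled -= scaled.max()                    # subtract the max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.5, -1.0])          # toy logits for a 4-token vocabulary
print(next_token_probs(logits, temperature=0.5))  # peaked: most mass on the top token
print(next_token_probs(logits, temperature=2.0))  # flatter: more diverse sampling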

In terms of performance, Transformer Explainer integrates a live GPT-2 model that runs locally in the user’s browser, offering real-time feedback on user interactions. This immediate response allows users to see the effects of their adjustments in real time, which is crucial for understanding how different aspects of the model interact. The tool’s design also incorporates multiple levels of abstraction, enabling users to begin with a high-level overview and gradually delve into more detailed aspects of the model as needed. 

In conclusion, Transformer Explainer successfully bridges the gap between the complexity of Transformer models and the need for accessible educational tools. By allowing users to interact with a live GPT-2 model and visualize its processes in real time, the tool makes it easier for non-experts to understand how these powerful AI systems work. Exploring model parameters and seeing their effects immediately is a valuable feature that enhances learning and engagement.

Check out the Paper and Details. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. If you like our work, you will love our newsletter.


The post Transformer Explainer: An Innovative Web-Based Tool for Interactive Learning and Visualization of Complex AI Models for Non-Experts appeared first on MarkTechPost.

Intelligent healthcare forms analysis with Amazon Bedrock

Generative artificial intelligence (AI) provides an opportunity for improvements in healthcare by combining and analyzing structured and unstructured data across previously disconnected silos. Generative AI can help raise the bar on efficiency and effectiveness across the full scope of healthcare delivery.
The healthcare industry generates and collects a significant amount of unstructured textual data, including clinical documentation such as patient information, medical history, and test results, as well as non-clinical documentation like administrative records. This unstructured data can impact the efficiency and productivity of clinical services, because it’s often found in various paper-based forms that can be difficult to manage and process. Streamlining the handling of this information is crucial for healthcare providers to improve patient care and optimize their operations.
Handling large volumes of data, extracting unstructured data from multiple paper forms or images, and comparing it with the standard or reference forms can be a long and arduous process, prone to errors and inefficiencies. However, advancements in generative AI solutions have introduced automated approaches that offer a more efficient and reliable solution for comparing multiple documents.
Amazon Bedrock is a fully managed service that makes foundation models (FMs) from leading AI startups and Amazon available through an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case. Amazon Bedrock offers a serverless experience, so you can get started quickly, privately customize FMs with your own data, and quickly integrate and deploy them into your applications using the AWS tools without having to manage the infrastructure.
In this post, we explore using Anthropic Claude 3 on Amazon Bedrock. Amazon Bedrock provides access to several large language models (LLMs), such as Anthropic Claude 3, which can be used to generate semi-structured data relevant to the healthcare industry. This can be particularly useful for creating various healthcare-related forms, such as patient intake forms, insurance claim forms, or medical history questionnaires.
Solution overview
To provide a high-level understanding of how the solution works before diving deeper into the specific elements and the services used, we discuss the architectural steps required to build our solution on AWS. We illustrate the key elements of the solution, giving you an overview of the various components and their interactions.
We then examine each of the key elements in more detail, exploring the specific AWS services that are used to build the solution, and discuss how these services work together to achieve the desired functionality. This provides a solid foundation for further exploration and implementation of the solution.
Part 1: Standard forms: Data extraction and storage
The following diagram highlights the key elements of a solution for data extraction and storage with standard forms.

Figure 1: Architecture – Standard Form – Data Extraction & Storage.
The standard form processing steps are as follows:

1. A user uploads images of paper forms (PDF, PNG, or JPEG) to Amazon Simple Storage Service (Amazon S3), a highly scalable and durable object storage service.
2. Amazon Simple Queue Service (Amazon SQS) is used as the message queue. Whenever a new form is uploaded, an event notification is sent to Amazon SQS.

If an S3 object is not processed, then after two tries it will be moved to the SQS dead-letter queue (DLQ), which can be configured further with an Amazon Simple Notification Service (Amazon SNS) topic to notify the user through email.

3. The SQS message invokes an AWS Lambda function. The Lambda function is responsible for processing the new form data.
4. The Lambda function reads the new S3 object and passes it to the Amazon Textract API to process the unstructured data and generate a hierarchical, structured output. Amazon Textract is an AWS service that can extract text, handwriting, and data from scanned documents and images. This approach allows for the efficient and scalable processing of complex documents, enabling you to extract valuable insights and data from various sources.
5. The Lambda function passes the converted text to Anthropic Claude 3 on Amazon Bedrock to generate a list of questions (a simplified sketch of this handler follows the list).
6. Lastly, the Lambda function stores the question list in Amazon S3.
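The following is a simplified, hypothetical sketch of a Lambda handler wiring steps 3 through 6 together. It assumes single-page PNG or JPEG images (multi-page PDFs would require Textract’s asynchronous API) and reuses the get_response_from_claude3 helper shown later in this post; the output key prefix and prompt wording are illustrative only.

import json
import boto3

s3 = boto3.client("s3")
textract = boto3.client("textract")

def lambda_handler(event, context):
    for record in event["Records"]:                      # SQS batch of S3 event notifications
        s3_event = json.loads(record["body"])
        bucket = s3_event["Records"][0]["s3"]["bucket"]["name"]
        key = s3_event["Records"][0]["s3"]["object"]["key"]

        # Extract the raw text from the scanned form with Amazon Textract
        result = textract.detect_document_text(
            Document={"S3Object": {"Bucket": bucket, "Name": key}}
        )
        raw_text = "\n".join(
            block["Text"] for block in result["Blocks"] if block["BlockType"] == "LINE"
        )

        # Ask Anthropic Claude 3 on Amazon Bedrock for the question list
        question_list = get_response_from_claude3(
            raw_text,
            "List every question and sub question in this form, grouped by section.",
        )

        # Persist the question list for the comparison step in Part 2
        s3.put_object(
            Bucket=bucket,
            Key=f"question-lists/{key}.txt",
            Body=question_list.encode("utf-8"),
        )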

Amazon Bedrock API call to extract form details
We call an Amazon Bedrock API twice in the process for the following actions:

Extract questions from the standard or reference form – The first API call is made to extract a list of questions and sub-questions from the standard or reference form. This list serves as a baseline or reference point for comparison with other forms. By extracting the questions from the reference form, we can establish a benchmark against which other forms can be evaluated.
Extract questions from the custom form – The second API call is made to extract a list of questions and sub-questions from the custom form or the form that needs to be compared against the standard or reference form. This step is necessary because we need to analyze the custom form’s content and structure to identify its questions and sub-questions before we can compare them with the reference form.

By having the questions extracted and structured separately for both the reference and custom forms, the solution can then pass these two lists to the Amazon Bedrock API for the final comparison step. This approach maintains the following:

Accurate comparison – The API has access to the structured data from both forms, making it straightforward to identify matches, mismatches, and provide relevant reasoning
Efficient processing – Separating the extraction process for the reference and custom forms helps avoid redundant operations and optimizes the overall workflow
Observability and interoperability – Keeping the questions separate enables better visibility, analysis, and integration of the questions from different forms
Hallucination avoidance – By following a structured approach and relying on the extracted data, the solution helps avoid generating or hallucinating content, providing integrity in the comparison process

This two-step approach uses the capabilities of the Amazon Bedrock API while optimizing the workflow, enabling accurate and efficient form comparison, and promoting observability and interoperability of the questions involved.
See the following code (API Call):

import json
import boto3
from botocore.config import Config

def get_response_from_claude3(context, prompt_data):
    # Build the Anthropic Messages API request body; the system prompt frames the
    # model as an expert form analyzer
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 4096,
        "system": """You are an expert form analyzer and can understand different sections and subsections within a form and can find all the questions being asked. You can find similarities and differences at the question level between different types of forms.""",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": f"""Given the following document(s): {context} \n {prompt_data}"""},
                ],
            }
        ],
    })
    modelId = "anthropic.claude-3-sonnet-20240229-v1:0"
    # Allow long generations to finish before the client read timeout
    config = Config(read_timeout=1000)
    bedrock = boto3.client("bedrock-runtime", config=config)
    response = bedrock.invoke_model(body=body, modelId=modelId)
    response_body = json.loads(response.get("body").read())
    answer = response_body.get("content")[0].get("text")
    return answer

User prompt to extract fields and list them
We provide the following user prompt to Anthropic Claude 3 to extract the fields from the raw text and list them for comparison as shown in step 3B (of Figure 3: Data Extraction & Form Field comparison).

get_response_from_claude3(response, """Create a summary of the different sections in the form, then
for each section create a list of all questions and sub questions asked in the
whole form and group by section including signature, date, reviews and approvals.
Then concatenate all questions and return a single numbered list. Be very detailed.""")

The following figure illustrates the output from Amazon Bedrock with a list of questions from the standard or reference form.

Figure 2:  Standard Form Sample Question List
Store this question list in Amazon S3 so it can be used for comparison with other forms, as shown in Part 2 of the process below.
Part 2: Data extraction and form field comparison
The following diagram illustrates the architecture for the next step, which is data extraction and form field comparison.

Figure 3: Data Extraction & Form Field comparison
Steps 1 and 2 are similar to those in Figure 1, but are repeated for the forms to be compared against the standard or reference forms. The next steps are as follows:

3. The SQS message invokes a Lambda function. The Lambda function is responsible for processing the new form data.

3A. The raw text is extracted by Amazon Textract using a Lambda function. The extracted raw text is then passed to Step 3B for further processing and analysis.
3B. Anthropic Claude 3 generates a list of questions from the custom form that needs to be compared with the standard form. Both question lists are then passed to Amazon Bedrock, which compares the extracted raw text of the custom form with the standard or reference raw text to identify differences and anomalies, providing insights and recommendations relevant to the healthcare industry by category. It then generates the final output in JSON format for further processing and dashboarding. The Amazon Bedrock API call and user prompt from Step 5 (Figure 1: Architecture – Standard Form – Data Extraction & Storage) are reused for this step to generate a question list from the custom form.

We discuss Steps 4–6 in the next section.
The following screenshot shows the output from Amazon Bedrock with a list of questions from the custom form.

Figure 4:  Custom Form Sample Question List
Final comparison using Anthropic Claude 3 on Amazon Bedrock:
The following examples show the results from the comparison exercise using Amazon Bedrock with Anthropic Claude 3, showing one that matched and one that didn’t match with the reference or standard form.
The following is the user prompt for forms comparison:

categories = ['Personal Information', 'Work History', 'Medical History', 'Medications and Allergies',
              'Additional Questions', 'Physical Examination', 'Job Description', 'Examination Results']
forms = f"Form 1 : {reference_form_question_list}, Form 2 : {custom_form_question_list}"

The following is the first call:

match_result = get_response_from_claude3(forms, f"""Go through questions and sub questions {start}- {processed} in Form 2 return the question whether it matches with any question/sub question/field in Form 1 in terms of meaning and context and provide reasoning, or if it does not match with any question/sub question/field in Form 1 and provide reasoning. Treat each sub question as its own question and the final output should be a numbered list with the same length as the number of questions and sub questions in Form 2. Be concise.""")

The following is the second call:

get_response_from_claude3(match_result,
    f"""Go through all the questions and sub questions in the Form 2 Results and turn this into a JSON object called 'All Questions' which has the keys 'Question' with only the matched or unmatched question, 'Match' with valid values of yes or no, and 'Reason' which is the reason of match or no match, 'Category' placing the question in one of the categories in this list: {categories}. Do not omit any questions in output.""")

The following screenshot shows the questions matched with the reference form.

The following screenshot shows the questions that didn’t match with the reference form.

The steps from the preceding architecture diagram continue as follows:
4. The SQS queue invokes a Lambda function.
5. The Lambda function invokes an AWS Glue job and monitors it for completion (a minimal sketch of this function follows the list).
a. The AWS Glue job processes the final JSON output from the Amazon Bedrock model in tabular format for reporting.
6. Amazon QuickSight is used to create interactive dashboards and visualizations, allowing healthcare professionals to explore the analysis, identify trends, and make informed decisions based on the insights provided by Anthropic Claude 3.
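The following is a minimal sketch of the Lambda function in step 5, assuming a hypothetical Glue job name; a production handler would more likely react to Glue job state-change events than poll from inside Lambda.

import time
import boto3

glue = boto3.client("glue")
GLUE_JOB_NAME = "bedrock-form-comparison-to-tabular"  # placeholder job name

def lambda_handler(event, context):
    # Kick off the AWS Glue job that flattens the Bedrock JSON output into tables
    run_id = glue.start_job_run(JobName=GLUE_JOB_NAME)["JobRunId"]

    # Poll until the job reaches a terminal state
    while True:
        state = glue.get_job_run(JobName=GLUE_JOB_NAME, RunId=run_id)["JobRun"]["JobRunState"]
        if state in ("SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT"):
            break
        time.sleep(30)

    return {"glue_run_id": run_id, "state": state}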
The following screenshot shows a sample QuickSight dashboard.
       
Next steps
Many healthcare providers are investing in digital technology, such as electronic health records (EHRs) and electronic medical records (EMRs), to streamline data collection and storage and allow appropriate staff to access records for patient care. Additionally, digitized health records provide the convenience of electronic forms and remote data editing for patients. Electronic health records offer a more secure and accessible record system, reducing data loss and facilitating data accuracy. Solutions like the one described here can help capture the data from these paper forms directly into EHRs.
Conclusion
Generative AI solutions like Amazon Bedrock with Anthropic Claude 3 can significantly streamline the process of extracting and comparing unstructured data from paper forms or images. By automating the extraction of form fields and questions, and intelligently comparing them against standard or reference forms, this solution offers a more efficient and accurate approach to handling large volumes of data. The integration of AWS services like Lambda, Amazon S3, Amazon SQS, and QuickSight provides a scalable and robust architecture for deploying this solution. As healthcare organizations continue to digitize their operations, such AI-powered solutions can play a crucial role in improving data management, maintaining compliance, and ultimately enhancing patient care through better insights and decision-making.

About the Authors
Satish Sarapuri is a Sr. Data Architect, Data Lake at AWS. He helps enterprise-level customers build high-performance, highly available, cost-effective, resilient, and secure generative AI, data mesh, data lake, and analytics platform solutions on AWS, through which customers can make data-driven decisions to gain impactful outcomes for their business and help them on their digital and data transformation journey. In his spare time, he enjoys spending time with his family and playing tennis.
Harpreet Cheema is a Machine Learning Engineer at the AWS Generative AI Innovation Center. He is very passionate about machine learning and about tackling data-oriented problems. In his role, he focuses on developing and delivering machine learning-focused solutions for customers across different domains.
Deborah Devadason is a Senior Advisory Consultant in the Professional Service team at Amazon Web Services. She is a results-driven and passionate Data Strategy specialist with over 25 years of consulting experience across the globe in multiple industries. She leverages her expertise to solve complex problems and accelerate business-focused journeys, thereby creating a stronger backbone for the digital and data transformation journey.