Make-An-Agent: A Novel Policy Parameter Generator that Leverages the P …

Traditional policy learning uses sampled trajectories from a replay buffer or behavior demonstrations to learn policies or trajectory models that map from state to action. This approach models a narrow behavior distribution. However, there is a challenge to guide high-dimensional output generation using low-dimensional demonstrations. Diffusion models have shown highly competitive performance on tasks like text-to-image synthesis. This success supports work toward policy network generation as a conditional denoising diffusion process. Refining noise into structured parameters consistently, the diffusion-based generator can discover various policies with superior performance and robust policy parameter space.

Existing methods in this area include Parameter Generation and Learning to Learn for Policy Learning. Parameter Generation has been a significant research focus since the introduction of Hypernetworks, which led to various studies on predicting neural network weights. For example, Hypertransformer uses Transformers to generate weights for each layer of convolutional neural networks (CNNs) based on task samples, using supervised and semi-supervised learning. On the other hand, Learning to Learn for Policy Learning involves meta-learning, which aims to develop a policy that can adapt to any new task within a given task distribution. In the meta-training or meta-testing process, previous meta-reinforcement learning (meta-RL) methods rely on rewards for policy adaptation.

Researchers from the University of Maryland, Tsinghua University, University of California, Shanghai Qi Zhi Institute, and Shanghai AI Lab have proposed Make-An-Agent, a new method for generating policies using conditional diffusion models. In this process, an autoencoder is developed to compress policy networks into smaller latent representations based on their layer. Researchers used contrastive learning to get the connection between long-term trajectories and their outcomes or future states. Further, an effective diffusion model is utilized based on learned behavior embeddings to generate policy parameters, which are then decoded into usable policies with the pre-trained decoder.

The performance of Make-An-Agent is evaluated by testing in three continuous control domains, including various tabletop manipulation and real-world locomotion tasks. The policies were generated during the testing phase using trajectories from the replay buffer of partially trained RL agents. Generated policies outperformed those created by multi-task or meta-learning and other hypernetwork-based methods. This approach has the potential to produce diverse policy parameters and show strong performance despite environmental randomness in both simulators and real-world situations. Moreover, Make-An-Agent can produce high-performing policies even when given noisy trajectories, demonstrating the robustness of the model.

The policies generated by the Make-An-Agent in real-world scenarios are tested using a technique, walk-these-ways and trained on IsaacGym. Actor networks are generated using the proposed method based on trajectories from IsaacGym simulations and pre-trained adaptation modules. These generated policies are then deployed on real robots in environments different from the simulations. Each policy for real-world movement includes 50,956 parameters, and 1,500 policy networks are collected for each task in MetaWorld and Robosuite. These networks come from policy checkpoints during SAC training and are saved every 5,000 training steps after the test success rate hits 1.

In this paper, researchers present a new policy generation method called Make-An-Agent, based on conditional diffusion models. This method aims to generate policies in spaces with many parameters using an autoencoder to encode and reconstruct these parameters. The results, tested across various domains, show that their approach works well in multi-task settings, can handle new tasks, and is resistant to environmental randomness. However, due to a large number of parameters, more diverse policy networks are not explored, and the abilities of the parameter diffusion generator are limited by the parameter autoencoder, so, future research could look into more flexible ways of generating parameters.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter. 

Join our Telegram Channel and LinkedIn Group.

If you like our work, you will love our newsletter..

Don’t Forget to join our 46k+ ML SubReddit
The post Make-An-Agent: A Novel Policy Parameter Generator that Leverages the Power of Conditional Diffusion Models for Behavior-to-Policy Generation appeared first on MarkTechPost.

GPT-4o Mini: OpenAI’s Latest and Most Cost-Efficient Mini AI Model

OpneAI has just launched GPT-4o Mini, its most cost-efficient small AI Model. This model promises to broaden the scope of AI applications with its affordable pricing and powerful capabilities for the price.

GPT-4o mini is significantly more affordable than previous models. The GPT-4o mini is priced at 15 cents per million input tokens and 60 cents per million output tokens. This makes it an order of magnitude cheaper than its predecessors, including GPT-3.5 Turbo.

GPT-4o mini has already outperformed other small models on various benchmarks:

Reasoning Tasks: Scores 82% on MMLU, surpassing Gemini Flash and Claude Haiku in reasoning tasks involving text and vision.

Math and Coding Proficiency: Scores 87% on MGSM and 87.2% on HumanEval, leading its competitors in mathematical reasoning and coding tasks.

Multimodal Reasoning: Achieved 59.4% on MMMU, outperforming Gemini Flash and Claude Haiku at multimodal reasoning evaluation.

The model has a context window of 128K tokens and supports text and vision inputs and outputs. GPT-4o mini is highly versatile. Future updates will include image, video, and audio inputs and output support.

GPT-4o mini excels in applications that require:

Chaining or parallelizing multiple model calls.

Handling large volumes of context.

Providing fast, real-time text responses.

OpenAI has integrated strong safety measures into GPT-4o mini. The company has filtered out harmful content and applied reinforcement learning with human feedback (RLHF). The model also uses an instruction hierarchy method to resist jailbreaks and prompt injections, making it safer for large-scale applications.

GPT-4o mini is available to developers through various APIs. Free, Plus, and Team users in ChatGPT can access it immediately, with Enterprise users gaining access next week.

OpenAI seems committed to reducing costs while improving model capabilities. Fine-tuning for GPT-4o mini will be available soon, further expanding its usability.

GPT-4o mini has set a new standard in affordable and high-performing AI models. This mini-model allows for different applications and makes AI more accessible to developers and businesses. The future of AI looks more promising with this new model by OpenAI.
The post GPT-4o Mini: OpenAI’s Latest and Most Cost-Efficient Mini AI Model appeared first on MarkTechPost.

Mistral AI and NVIDIA Collaborate to Release Mistral NeMo: A 12B Open …

In collaboration with NVIDIA, the Mistral AI team has unveiled Mistral NeMo, a groundbreaking 12-billion parameter model that promises to set new standards in artificial intelligence. Released under the Apache 2.0 license, Mistral NeMo is designed to be a high-performance, multilingual model capable of handling a context window of up to 128,000 tokens. This extensive context length is a significant advancement, allowing the model to process and understand large amounts of data more efficiently than its predecessors. The team has released two variants:



Mistral NeMo stands out for its exceptional reasoning abilities, extensive world knowledge, and high coding accuracy, making it the top performer in its size category. Its architecture is based on standard designs, ensuring it can be easily integrated into any system currently using Mistral 7B. This seamless compatibility is expected to facilitate widespread adoption among researchers and enterprises seeking to leverage cutting-edge AI technology.

The Mistral AI team has released both pre-trained base and instruction-tuned checkpoints. These resources are intended to support the research community and industry professionals in their efforts to explore and implement advanced AI solutions. Mistral NeMo was developed with quantization awareness, enabling FP8 inference without any degradation in performance. This feature ensures the model operates efficiently even with lower precision data representations.

Image Source

A key component of Mistral NeMo’s success is its multilingual capability, making it a versatile tool for global applications. The model has been trained in function calling and is particularly adept in several major languages, including English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi. This broad linguistic proficiency aims to democratize access to advanced AI technologies, enabling users from diverse linguistic backgrounds to benefit from its capabilities.

Introducing Tekken, a new tokenizer, further enhances Mistral NeMo’s performance. Based on Tiktoken, Tekken was trained in over 100 languages and is significantly more efficient at compressing natural language text and source code than its predecessors. For instance, it is approximately 30% more efficient at compressing source code and several major languages, and it outperforms the Llama 3 tokenizer in compressing text for about 85% of all languages. This increased efficiency is crucial for handling the vast data required for modern AI applications.

Image Source

Mistral NeMo’s advanced instruction fine-tuning process distinguishes it from earlier models like Mistral 7B. The fine-tuning and alignment phases have significantly improved the model’s ability to follow precise instructions, reason effectively, handle multi-turn conversations, and generate accurate code. These enhancements are critical for applications requiring high interaction and accuracy, such as customer service bots, coding assistants, and interactive educational tools.

The performance of Mistral NeMo has been rigorously evaluated and compared with other leading models. It consistently demonstrates superior accuracy and efficiency, reinforcing its position as a state-of-the-art AI model. Weights for the base and instruction-tuned models are hosted on HuggingFace, making them readily available for developers and researchers. Additionally, Mistral NeMo can be accessed via Mistral Inference and adapted using Mistral Finetune, providing flexible options for various use cases.

Mistral NeMo is also integrated into NVIDIA’s NIM inference microservice, available through This integration highlights the collaborative effort between Mistral AI and NVIDIA to push the boundaries of AI technology and deliver robust, scalable solutions to the market.

In conclusion, the release of Mistral NeMo, with its advanced features, including extensive multilingual support, efficient data compression, and superior instruction-following capabilities, positions it as a powerful tool for researchers and enterprises. The collaboration between Mistral AI and NVIDIA exemplifies the potential of joint efforts in driving technological advancements and making cutting-edge AI accessible to a broader audience.

Weights are hosted on HuggingFace both for the Base and for the Instruct models. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter. 

Join our Telegram Channel and LinkedIn Group.

If you like our work, you will love our newsletter..

Don’t Forget to join our 46k+ ML SubReddit
The post Mistral AI and NVIDIA Collaborate to Release Mistral NeMo: A 12B Open Language Model Featuring 128k Context Window, Multilingual Capabilities, and Tekken Tokenizer appeared first on MarkTechPost.

Intelligent document processing using Amazon Bedrock and Anthropic Cla …

Generative artificial intelligence (AI) not only empowers innovation through ideation, content creation, and enhanced customer service, but also streamlines operations and boosts productivity across various domains. To effectively harness this transformative technology, Amazon Bedrock offers a fully managed service that integrates high-performing foundation models (FMs) from leading AI companies, such as AI21 Labs, Anthropic, Cohere, Meta, Stability AI, Mistral AI, and Amazon. By providing access to these advanced models through a single API and supporting the development of generative AI applications with an emphasis on security, privacy, and responsible AI, Amazon Bedrock enables you to use AI to explore new avenues for innovation and improve overall offerings.
Enterprise customers can unlock significant value by harnessing the power of intelligent document processing (IDP) augmented with generative AI. By infusing IDP solutions with generative AI capabilities, organizations can revolutionize their document processing workflows, achieving exceptional levels of automation and reliability. This combination enables advanced document understanding, highly effective structured data extraction, automated document classification, and seamless information retrieval from unstructured text. With these capabilities, organizations can achieve scalable, efficient, and high-value document processing that drives business transformation and competitiveness, ultimately leading to improved productivity, reduced costs, and enhanced decision-making.
In this post, we show how to develop an IDP solution using Anthropic Claude 3 Sonnet on Amazon Bedrock. We demonstrate how to extract data from a scanned document and insert it into a database.
The Anthropic Claude 3 Sonnet model is optimized for speed and efficiency, making it an excellent choice for intelligent tasks—particularly for enterprise workloads. It also possesses sophisticated vision capabilities, demonstrating a strong aptitude for understanding a wide range of visual formats, including photos, charts, graphs, and technical diagrams. Although we demonstrate this solution using the Anthropic Claude 3 Sonnet model, you can alternatively use the Haiku and Opus models if your use case requires them.
Solution overview
The proposed solution uses Amazon Bedrock and the powerful Anthropic Claude 3 Sonnet model to enable IDP capabilities. The architecture consists of several AWS services seamlessly integrated with the Amazon Bedrock, enabling efficient and accurate extraction of data from scanned documents.
The following diagram illustrates our solution architecture.

The solution consists of the following steps:

The process begins with scanned documents being uploaded and stored in an Amazon Simple Storage Service (Amazon S3) bucket, which invokes an S3 Event Notification on object upload.
This event invokes an AWS Lambda function, responsible for invoking the Anthropic Claude 3 Sonnet model on Amazon Bedrock.
The Anthropic Claude 3 Sonnet model, with its advanced multimodal capabilities, processes the scanned documents and extracts relevant data in a structured JSON format.
The extracted data from the Anthropic Claude 3 model is sent to an Amazon Simple Queue Service (Amazon SQS) queue. Amazon SQS acts as a buffer, allowing components to send and receive messages reliably without being directly coupled, providing scalability and fault tolerance in the system.
Another Lambda function consumes the messages from the SQS queue, parses the JSON data, and stores the extracted key-value pairs in an Amazon DynamoDB table for retrieval and further processing.

This serverless architecture takes advantage of the scalability and cost-effectiveness of AWS services while harnessing the cutting-edge intelligence of Anthropic Claude 3 Sonnet. By combining the robust infrastructure of AWS with Anthropic’s FMs, this solution enables organizations to streamline their document processing workflows, extract valuable insights, and enhance overall operational efficiency.
The solution uses the following services and features:

Amazon Bedrock is a fully managed service that provides access to large language models (LLMs), allowing developers to build and deploy their own customized AI applications.
The Anthropic Claude 3 family offers a versatile range of models tailored to meet diverse needs. With three options—Opus, Sonnet, and Haiku—you can choose the perfect balance of intelligence, speed, and cost. These models excel at understanding complex enterprise content, including charts, graphs, technical diagrams, and reports.
Amazon DynamoDB is a fully managed, serverless, NoSQL database service.
AWS Lambda is a serverless computing service that allows you to run code without provisioning or managing servers.
Amazon SQS is a fully managed message queuing service.
Amazon S3 is a highly scalable, durable, and secure object storage service.

In this solution, we use the generative AI capabilities in Amazon Bedrock to efficiently extract data. As of writing of this post, Anthropic Claude 3 Sonnet only accepts images as input. The supported file types are GIF, JPEG, PNG, and WebP. You can choose to save images during the scanning process or convert the PDF to images.
You can also enhance this solution by implementing human-in-the-loop and model evaluation features. The goal of this post is to demonstrate how you can build an IDP solution using Amazon Bedrock, but to use this as a production-scale solution, additional considerations should be taken into account, such as testing for edge case scenarios, better exception handling, trying additional prompting techniques, model fine-tuning, model evaluation, throughput requirements, number of concurrent requests to be supported, and carefully considering cost and latency implications.
You need the following prerequisites before you can proceed with this solution. For this post, we use the us-east-1 AWS Region. For details on available Regions, see Amazon Bedrock endpoints and quotas.

An AWS account with an AWS Identity and Access Management (IAM) user who has permissions to DynamoDB, Lambda, Amazon Bedrock, Amazon S3, Amazon SQS, Lambda, and IAM.
Access to the Anthropic Claude 3 Sonnet model in Amazon Bedrock. For instructions, see Manage model access.

Use case and dataset
For our example use case, let’s look at a state agency responsible for issuing birth certificates. The agency may receive birth certificate applications through various methods, such as online applications, forms completed at a physical location, and mailed-in completed paper applications. Today, most agencies spend a considerable amount of time and resources to manually extract the application details. The process begins with scanning the application forms, manually extracting the details, and then entering them into an application that eventually stores the data into a database. This process is time-consuming, inefficient, not scalable, and error-prone. Additionally, it adds complexity if the application form is in a different language (such as Spanish).
For this demonstration, we use sample scanned images of birth certificate application forms. These forms don’t contain any real personal data. Two examples are provided: one in English (handwritten) and another in Spanish (printed). Save these images as .jpeg files to your computer. You need them later for testing the solution.


Create an S3 bucket
On the Amazon S3 console, create a new bucket with a unique name (for example, bedrock-claude3-idp-{random characters to make it globally unique}) and leave the other settings as default. Within the bucket, create a folder named images and a sub-folder named birth_certificates.

Create an SQS queue
On the Amazon SQS console, create a queue with the Standard queue type, provide a name (for example, bedrock-idp-extracted-data), and leave the other settings as default.

Create a Lambda function to invoke the Amazon Bedrock model
On the Lambda console, create a function (for example, invoke_bedrock_claude3), choose Python 3.12 for the runtime, and leave the remaining settings as default. Later, you configure this function to be invoked every time a new image is uploaded into the S3 bucket. You can download the entire Lambda function code from Replace the contents of the file with the code from the downloaded file. Make sure to substitute {SQS URL} with the URL of the SQS queue you created earlier, then choose Deploy.
The Lambda function should perform the following actions:
s3 = boto3.client(‘s3’)
sqs = boto3.client(‘sqs’)
bedrock = boto3.client(‘bedrock-runtime’, region_name=’us-east-1′)
MODEL_ID = “anthropic.claude-3-sonnet-20240229-v1:0”

The following code gets the image from the S3 bucket using the get_object method and converts it to base64 data:
image_data = s3.get_object(Bucket=bucket_name, Key=object_key)[‘Body’].read()
base64_image = base64.b64encode(image_data).decode(‘utf-8’)

Prompt engineering is a critical factor in unlocking the full potential of generative AI applications like IDP. Crafting well-structured prompts makes sure that the AI system’s outputs are accurate, relevant, and aligned with your objectives, while mitigating potential risks.
With the Anthropic Claude 3 model integrated into the Amazon Bedrock IDP solution, you can use the model’s impressive visual understanding capabilities to effortlessly extract data from documents. Simply provide the image or document as input, and Anthropic Claude 3 will comprehend its contents, seamlessly extracting the desired information and presenting it in a human-readable format. All Anthropic Claude 3 models are capable of understanding non-English languages such as Spanish, Japanese, and French. In this particular use case, we demonstrate how to translate Spanish application forms into English by providing the appropriate prompt instructions.
However, LLMs like Anthropic Claude 3 can exhibit variability in their response formats. To achieve consistent and structured output, you can tailor your prompts to instruct the model to return the extracted data in a specific format, such as JSON with predefined keys. This approach enhances the interoperability of the model’s output with downstream applications and streamlines data processing workflows.
The following is the prompt with the specific JSON output format:
prompt = “””
This image shows a birth certificate application form.
Please precisely copy all the relevant information from the form.
Leave the field blank if there is no information in corresponding field.
If the image is not a birth certificate application form, simply return an empty JSON object.
If the application form is not filled, leave the fees attributes blank.
Translate any non-English text to English.
Organize and return the extracted data in a JSON format with the following keys:
“applicantName”: “”,
“dayPhoneNumber”: “”,
“address”: “”,
“city”: “”,
“state”: “”,
“zipCode”: “”,
“mailingAddressApplicantName”: “”,
“mailingAddress”: “”,
“mailingAddressCity”: “”,
“mailingAddressState”: “”,
“mailingAddressZipCode”: “”
“purposeOfRequest”: “”,

“nameOnBirthCertificate”: “”,
“dateOfBirth”: “”,
“sex”: “”,
“cityOfBirth”: “”,
“countyOfBirth”: “”,
“mothersMaidenName”: “”,
“fathersName”: “”,
“mothersPlaceOfBirth”: “”,
“fathersPlaceOfBirth”: “”,
“parentsMarriedAtBirth”: “”,
“numberOfChildrenBornInSCToMother”: “”,
“searchFee”: “”,
“eachAdditionalCopy”: “”,
“expediteFee”: “”,
“totalFees”: “”

Invoke the Anthropic Claude 3 Sonnet model using the Amazon Bedrock API. Pass the prompt and the base64 image data as parameters:
def invoke_claude_3_multimodal(prompt, base64_image_data):
request_body = {
“anthropic_version”: “bedrock-2023-05-31”,
“max_tokens”: 2048,
“messages”: [
“role”: “user”,
“content”: [
“type”: “text”,
“text”: prompt,
“type”: “image”,
“source”: {
“type”: “base64”,
“media_type”: “image/png”,
“data”: base64_image_data,

response = bedrock.invoke_model(modelId=MODEL_ID, body=json.dumps(request_body))
return json.loads(response[‘body’].read())
except bedrock.exceptions.ClientError as err:
print(f”Couldn’t invoke Claude 3 Sonnet. Here’s why: {err.response[‘Error’][‘Code’]}: {err.response[‘Error’][‘Message’]}”)

Send the Amazon Bedrock API response to the SQS queue using the send_message method:
def send_message_to_sqs(message_body):
sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(message_body))
except sqs.exceptions.ClientError as e:
print(f”Error sending message to SQS: {e.response[‘Error’][‘Code’]}: {e.response[‘Error’][‘Message’]}”)

Next, modify the IAM role of the Lambda function to grant the required permissions:

On the Lambda console, navigate to the function.
On the Configuration tab, choose Permissions in the left pane.
Choose the IAM role (for example, invoke_bedrock_claude3-role-{random chars}).

This will open the role on a new tab.

In the Permissions policies section, choose Add permissions and Create inline policy.
On the Create policy page, switch to the JSON tab in the policy editor.
Enter the policy from the following code block, replacing {AWS Account ID} with your AWS account ID and {S3 Bucket Name} with your S3 bucket name.
Choose Next.
Enter a name for the policy (for example, invoke_bedrock_claude3-role-policy), and choose Create policy.

“Version”: “2012-10-17”,
“Statement”: [{
“Effect”: “Allow”,
“Action”: “bedrock:InvokeModel”,
“Resource”: “arn:aws:bedrock:us-east-1::foundation-model/*”
}, {
“Effect”: “Allow”,
“Action”: “s3:GetObject”,
“Resource”: “arn:aws:s3:::{S3 Bucket Name}/*”
}, {
“Effect”: “Allow”,
“Action”: “sqs:SendMessage”,
“Resource”: “arn:aws:sqs:us-east-1:{AWS Account ID}:bedrock-idp-extracted-data”

The policy will grant the following permissions:

Invoke model access to Amazon Bedrock FMs
Retrieve objects from the bedrock-claude3-idp… S3 bucket
Send messages to the bedrock-idp-extracted-data SQS queue for processing the extracted data

Additionally, modify the Lambda function’s timeout to 2 minutes. By default, it’s set to 3 seconds.
Create an S3 Event Notification
To create an S3 Event Notification, complete the following steps:

On the Amazon S3 console, open the bedrock-claude3-idp… S3 bucket.
Navigate to Properties, and in the Event notifications section, create an event notification.
Enter a name for Event name (for example, bedrock-claude3-idp-event-notification).
Enter images/birth_certificates/ for the prefix.
For Event Type, select Put in the Object creation section.
For Destination, select Lambda function and choose invoke_bedrock_claude3.
Choose Save changes.

Create a DynamoDB table
To store the extracted data in DynamoDB, you need to create a table. On the DynamoDB console, create a table called birth_certificates with Id as the partition key, and keep the remaining settings as default.
Create a Lambda function to insert records into the DynamoDB table
On the Lambda console, create a Lambda function (for example, insert_into_dynamodb), choose Python 3.12 for the runtime, and leave the remaining settings as default. You can download the entire Lambda function code from Replace the contents of the file with the code from the downloaded file and choose Deploy.
The Lambda function should perform the following actions:
Get the message from the SQS queue that contains the response from the Anthropic Claude 3 Sonnet model:
data = json.loads(event[‘Records’][0][‘body’])[‘content’][0][‘text’]
event_id = event[‘Records’][0][‘messageId’]
data = json.loads(data)

Create objects representing DynamoDB and its table:
dynamodb = boto3.resource(‘dynamodb’)
table = dynamodb.Table(‘birth_certificates’)
Get the key objects from the JSON data:
applicant_details = data.get(‘applicantDetails’, {})
mailing_address = data.get(‘mailingAddress’, {})
relation_to_applicant = data.get(‘relationToApplicant’, [])
birth_certificate_details = data.get(‘BirthCertificateDetails’, {})
fees = data.get(‘fees’, {})

Insert the extracted data into DynamoDB table using put_item() method:
‘Id’: event_id,
‘applicantName’: applicant_details.get(‘applicantName’, ”),
‘dayPhoneNumber’: applicant_details.get(‘dayPhoneNumber’, ”),
‘address’: applicant_details.get(‘address’, ”),
‘city’: applicant_details.get(‘city’, ”),
‘state’: applicant_details.get(‘state’, ”),
‘zipCode’: applicant_details.get(‘zipCode’, ”),
’email’: applicant_details.get(’email’, ”),
‘mailingAddressApplicantName’: mailing_address.get(‘mailingAddressApplicantName’, ”),
‘mailingAddress’: mailing_address.get(‘mailingAddress’, ”),
‘mailingAddressCity’: mailing_address.get(‘mailingAddressCity’, ”),
‘mailingAddressState’: mailing_address.get(‘mailingAddressState’, ”),
‘mailingAddressZipCode’: mailing_address.get(‘mailingAddressZipCode’, ”),
‘relationToApplicant’: ‘, ‘.join(relation_to_applicant),
‘purposeOfRequest’: data.get(‘purposeOfRequest’, ”),
‘nameOnBirthCertificate’: birth_certificate_details.get(‘nameOnBirthCertificate’, ”),
‘dateOfBirth’: birth_certificate_details.get(‘dateOfBirth’, ”),
‘sex’: birth_certificate_details.get(‘sex’, ”),
‘cityOfBirth’: birth_certificate_details.get(‘cityOfBirth’, ”),
‘countyOfBirth’: birth_certificate_details.get(‘countyOfBirth’, ”),
‘mothersMaidenName’: birth_certificate_details.get(‘mothersMaidenName’, ”),
‘fathersName’: birth_certificate_details.get(‘fathersName’, ”),
‘mothersPlaceOfBirth’: birth_certificate_details.get(‘mothersPlaceOfBirth’, ”),
‘fathersPlaceOfBirth’: birth_certificate_details.get(‘fathersPlaceOfBirth’, ”),
‘parentsMarriedAtBirth’: birth_certificate_details.get(‘parentsMarriedAtBirth’, ”),
‘numberOfChildrenBornInSCToMother’: birth_certificate_details.get(‘numberOfChildrenBornInSCToMother’, ”),
‘diffNameAtBirth’: birth_certificate_details.get(‘diffNameAtBirth’, ”),
‘searchFee’: fees.get(‘searchFee’, ”),
‘eachAdditionalCopy’: fees.get(‘eachAdditionalCopy’, ”),
‘expediteFee’: fees.get(‘expediteFee’, ”),
‘totalFees’: fees.get(‘totalFees’, ”)

Next, modify the IAM role of the Lambda function to grant the required permissions. Follow the same steps you used to modify the permissions for the invoke_bedrock_claude3 Lambda function, but enter the following JSON as the inline policy:
“Version”: “2012-10-17”,
“Statement”: [
“Sid”: “VisualEditor0”,
“Effect”: “Allow”,
“Action”: “dynamodb:PutItem”,
“Resource”: “arn:aws:dynamodb:us-east-1::{AWS Account ID}:table/birth_certificates”
“Sid”: “VisualEditor1”,
“Effect”: “Allow”,
“Action”: [
“Resource”: “arn:aws:sqs:us-east-1::{AWS Account ID}:bedrock-idp-extracted-data”

Enter a policy name (for example, insert_into_dynamodb-role-policy) and choose Create policy.

The policy will grant the following permissions:

Put records into the DynamoDB table
Read and delete messages from the SQS queue

Configure the Lambda function trigger for SQS
Complete the following steps to create a trigger for the Lambda function:

On the Amazon SQS console, open the bedrock-idp-extracted-data queue.
On the Lambda triggers tab, choose Configure Lambda function trigger.
Select the insert_into_dynamodb Lambda function and choose Save.

Test the solution
Now that you have created all the necessary resources, permissions, and code, it’s time to test the solution.
In the S3 folder birth_certificates, upload the two scanned images that you downloaded earlier. Then open the DynamoDB console and explore the items in the birth_certificates table.
If everything is configured properly, you should see two items in DynamoDB in just a few seconds, as shown in the following screenshots. For the Spanish form, Anthropic Claude 3 automatically translated the keys and labels from Spanish to English based on the prompt.

If you don’t see the extracted data in the DynamoDB table, you can investigate the issue:

Check CloudWatch logs – Review the Amazon CloudWatch log streams of the Lambda functions involved in the data extraction and ingestion process. Look for any error messages or exceptions that may indicate the root cause of the issue.
Identify missing permissions – In many cases, errors can occur due to missing permissions. Confirm that the Lambda functions have the necessary permissions to access the required AWS resources, such as DynamoDB tables, S3 buckets, or other services involved in the solution.
Implement a dead-letter queue – In a production-scale solution, it is recommended to implement a dead letter queue (DLQ) to catch and handle any events or messages that fail to process or encounter errors.

Clean up
Clean up the resources created as part of this post to avoid incurring ongoing charges:

Delete all the objects from the bedrock-claude3-idp… S3 bucket, then delete the bucket.
Delete the two Lambda functions named invoke_bedrock_claude3 and insert_into_dynamodb.
Delete the SQS queue named bedrock-idp-extracted-data.
Delete the DynamoDB table named birth_certificates.

Example use cases and business value
The generative AI-powered IDP solution demonstrated in this post can benefit organizations across various industries, such as:

Government and public sector – Process and extract data from citizen applications, immigration documents, legal contracts, and other government-related forms, enabling faster turnaround times and improved service delivery
Healthcare – Extract and organize patient information, medical records, insurance claims, and other health-related documents, improving data accuracy and accessibility for better patient care
Finance and banking – Automate the extraction and processing of financial documents, loan applications, tax forms, and regulatory filings, reducing manual effort and increasing operational efficiency
Logistics and supply chain – Extract and organize data from shipping documents, invoices, purchase orders, and inventory records, streamlining operations and enhancing supply chain visibility
Retail and ecommerce – Automate the extraction and processing of customer orders, product catalogs, and marketing materials, enabling personalized experiences and efficient order fulfillment

By using the power of generative AI and Amazon Bedrock, organizations can unlock the true potential of their data, driving operational excellence, enhancing customer experiences, and fostering continuous innovation.
In this post, we demonstrated how to use Amazon Bedrock and the powerful Anthropic Claude 3 Sonnet model to develop an IDP solution. By harnessing the advanced multimodal capabilities of Anthropic Claude 3, we were able to accurately extract data from scanned documents and store it in a structured format in a DynamoDB table.
Although this solution showcases the potential of generative AI in IDP, it may not be suitable for all IDP use cases. The effectiveness of the solution may vary depending on the complexity and quality of the documents, the amount of training data available, and the specific requirements of the organization.
To further enhance the solution, consider implementing a human-in-the-loop workflow to review and validate the extracted data, especially for mission-critical or sensitive applications. This will provide data accuracy and compliance with regulatory requirements. You can also explore the model evaluation feature in Amazon Bedrock to compare model outputs, and then choose the model best suited for your downstream generative AI applications.
For further exploration and learning, we recommend checking out the following resources:

Amazon Bedrock Developer Guide
Anthropic’s Claude 3 Opus model is now available on Amazon Bedrock
Anthropic Claude 3

About the Authors
Govind Palanisamy is a Solutions Architect at AWS, where he helps government agencies migrate and modernize their workloads to increase citizen experience. He is passionate about technology and transformation, and he helps customers transform their businesses using AI/ML and generative AI-based solutions.
Bharath Gunapati is a Sr. Solutions architect at AWS, where he helps clinicians, researchers, and staff at academic medical centers to adopt and use cloud technologies. He is passionate about technology and the impact it can make on healthcare and research.

Metadata filtering for tabular data with Knowledge Bases for Amazon Be …

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading artificial intelligence (AI) companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API. To equip FMs with up-to-date and proprietary information, organizations use Retrieval Augmented Generation (RAG), a technique that fetches data from company data sources and enriches the prompt to provide more relevant and accurate responses. Knowledge Bases for Amazon Bedrock is a fully managed capability that helps you implement the entire RAG workflow, from ingestion to retrieval and prompt augmentation. However, information about one dataset can be in another dataset, called metadata. Without using metadata, your retrieval process can cause the retrieval of unrelated results, thereby decreasing FM accuracy and increasing cost in the FM prompt token.
On March 27, 2024, Amazon Bedrock announced a key new feature called metadata filtering and also changed the default engine. This change allows you to use metadata fields during the retrieval process. However, the metadata fields need to be configured during the knowledge base ingestion process. Often, you might have tabular data where details about one field are available in another field. Also, you could have a requirement to cite the exact text document or text field to prevent hallucination. In this post, we show you how to use the new metadata filtering feature with Knowledge Bases for Amazon Bedrock for such tabular data.
Solution overview
The solution consists of the following high-level steps:

Prepare data for metadata filtering.
Create and ingest data and metadata into the knowledge base.
Retrieve data from the knowledge base using metadata filtering.

Prepare data for metadata filtering
As of this writing, Knowledge Bases for Amazon Bedrock supports Amazon OpenSearch Serverless, Amazon Aurora, Pinecone, Redis Enterprise, and MongoDB Atlas as underlying vector store providers. In this post, we create and access an OpenSearch Serverless vector store using the Amazon Bedrock Boto3 SDK. For more details, see Set up a vector index for your knowledge base in a supported vector store.
For this post, we create a knowledge base using the public dataset – Recipes and Reviews. The following screenshot shows an example of the dataset.

The TotalTime is in ISO 8601 format. You can convert that to minutes using the following logic:

# Function to convert ISO 8601 duration to minutes
def convert_to_minutes(duration):
hours = 0
minutes = 0

# Find hours and minutes using regex
match = re.match(r’PT(?:(d+)H)?(?:(d+)M)?’, duration)

if match:
hours = int(
minutes = int(

# Convert total time to minutes
total_minutes = hours * 60 + minutes
return total_minutes

df[‘TotalTimeInMinutes’] = df[‘TotalTime’].apply(convert_to_minutes)

After converting some of the features like CholesterolContent, SugarContent, and RecipeInstructions, the data frame looks like the following screenshot.

To enable the FM to point to a specific menu with a link (cite the document), we split each row of the tabular data in a single text file, with each file containing RecipeInstructions as the data field and TotalTimeInMinutes, CholesterolContent, and SugarContent as metadata. The metadata should be kept in a separate JSON file with the same name as the data file and .metadata.json added to its name. For example, if the data file name is 100.txt, the metadata file name should be 100.txt.metadata.json. For more details, see Add metadata to your files to allow for filtering. Also, the content in the metadata file should be in the following format:

“metadataAttributes”: {
“${attribute1}”: “${value1}”,
“${attribute2}”: “${value2}”,


For the sake of simplicity, we only process the top 2,000 rows to create the knowledge base.

After you import the necessary libraries, create a local directory using the following Python code:

import pandas as pd
import os, json, tqdm, boto3

metafolder = ‘multi_file_recipe_data’os.mkdir(metafolder)

Iterate over the top 2,000 rows to create data and metadata files to store in the local folder:

for i in tqdm.trange(2000):
desc = str(df[‘RecipeInstructions’][i])
meta = {
“metadataAttributes”: {
“Name”: str(df[‘Name’][i]),
“TotalTimeInMinutes”: str(df[‘TotalTimeInMinutes’][i]),
“CholesterolContent”: str(df[‘CholesterolContent’][i]),
“SugarContent”: str(df[‘SugarContent’][i]),
filename = metafolder+’/’ + str(i+1)+ ‘.txt’
f = open(filename, ‘w’)
metafilename = filename+’.metadata.json’
with open( metafilename, ‘w’) as f:
json.dump(meta, f)

Create an Amazon Simple Storage Service (Amazon S3) bucket named food-kb and upload the files:

# Upload data to s3
s3_client = boto3.client(“s3”)
bucket_name = “recipe-kb”
data_root = metafolder+’/’
def uploadDirectory(path,bucket_name):
for root,dirs,files in os.walk(path):
for file in tqdm.tqdm(files):

uploadDirectory(data_root, bucket_name)

Create and ingest data and metadata into the knowledge base
When the S3 folder is ready, you can create the knowledge base on the Amazon Bedrock console using the SDK according to this example notebook.
Retrieve data from the knowledge base using metadata filtering
Now let’s retrieve some data from the knowledge base. For this post, we use Anthropic Claude Sonnet on Amazon Bedrock for our FM, but you can choose from a variety of Amazon Bedrock models. First, you need to set the following variables, where kb_id is the ID of your knowledge base. The knowledge base ID can be found programmatically, as shown in the example notebook, or from the Amazon Bedrock console by navigating to the individual knowledge base, as shown in the following screenshot.

Set the required Amazon Bedrock parameters using the following code:

import boto3
import pprint
from botocore.client import Config
import json

pp = pprint.PrettyPrinter(indent=2)
session = boto3.session.Session()
region = session.region_name
bedrock_config = Config(connect_timeout=120, read_timeout=120, retries={‘max_attempts’: 0})
bedrock_client = boto3.client(‘bedrock-runtime’, region_name = region)
bedrock_agent_client = boto3.client(“bedrock-agent-runtime”,
config=bedrock_config, region_name = region)
kb_id = “EIBBXVFDQP”
model_id = ‘anthropic.claude-3-sonnet-20240229-v1:0’

# retrieve api for fetching only the relevant context.

query = ” Tell me a recipe that I can make under 30 minutes and has cholesterol less than 10 ”

relevant_documents = bedrock_agent_runtime_client.retrieve(
retrievalQuery= {
‘text’: query
retrievalConfiguration= {
‘vectorSearchConfiguration’: {
‘numberOfResults’: 2

The following code is the output of the retrieval from the knowledge base without metadata filtering for the query “Tell me a recipe that I can make under 30 minutes and has cholesterol less than 10.” As we can see, out of the two recipes, the preparation durations are 30 and 480 minutes, respectively, and the cholesterol contents are 86 and 112.4, respectively. Therefore, the retrieval isn’t following the query accurately.

The following code demonstrates how to use the Retrieve API with the metadata filters set to a cholesterol content less than 10 and minutes of preparation less than 30 for the same query:

def retrieve(query, kbId, numberOfResults=5):
return bedrock_agent_client.retrieve(
retrievalQuery= {
‘text’: query
retrievalConfiguration= {
‘vectorSearchConfiguration’: {
‘numberOfResults’: numberOfResults,
“filter”: {
“lessThan”: {
“key”: “CholesterolContent”,
“value”: 10
“lessThan”: {
“key”: “TotalTimeInMinutes”,
“value”: 30
query = “Tell me a recipe that I can make under 30 minutes and has cholesterol less than 10″
response = retrieve(query, kb_id, 2)
retrievalResults = response[‘retrievalResults’]

As we can see in the following results, out of the two  recipes, the preparation times are 27 and 20, respectively, and the cholesterol contents are 0 and 0, respectively. With the use of metadata filtering, we get more accurate results.

The following code shows how to get accurate output using the same metadata filtering with the retrieve_and_generate API. First, we set the prompt, then we set up the API with metadata filtering:

prompt = f”””
Human: You have great knowledge about food, so provide answers to questions by using fact.
If you don’t know the answer, just say that you don’t know, don’t try to make up an answer.


def retrieve_and_generate(query, kb_id,modelId, numberOfResults=10):
return bedrock_agent_client.retrieve_and_generate(
input= {
‘text’: query,
‘knowledgeBaseConfiguration’: {
‘generationConfiguration’: {
‘promptTemplate’: {
‘textPromptTemplate’: f”{prompt} $search_results$”
‘knowledgeBaseId’: kb_id,
‘modelArn’: model_id,
‘retrievalConfiguration’: {
‘vectorSearchConfiguration’: {
‘numberOfResults’: numberOfResults,
‘overrideSearchType’: ‘HYBRID’,
“filter”: {
“lessThan”: {
“key”: “CholesterolContent”,
“value”: 10
“lessThan”: {
“key”: “TotalTimeInMinutes”,
“value”: 30

query = “Tell me a recipe that I can make under 30 minutes and has cholesterol less than 10”
response = retrieve_and_generate(query, kb_id,modelId, numberOfResults=10)

As we can see in the following output, the model returns a detailed recipe that follows the instructed metadata filtering of less than 30 minutes of preparation time and a cholesterol content less than 10.

Clean up
Make sure to comment the following section if you’re planning to use the knowledge base that you created for building your RAG application. If you only wanted to try out creating the knowledge base using the SDK, make sure to delete all the resources that were created because you will incur costs for storing documents in the OpenSearch Serverless index. See the following code:

bedrock_agent_client.delete_data_source(dataSourceId = ds[“dataSourceId”], knowledgeBaseId=kb[‘knowledgeBaseId’])
aoss_client.delete_access_policy(type=”data”, name=access_policy[‘accessPolicyDetail’][‘name’])
aoss_client.delete_security_policy(type=”network”, name=network_policy[‘securityPolicyDetail’][‘name’])
aoss_client.delete_security_policy(type=”encryption”, name=encryption_policy[‘securityPolicyDetail’][‘name’])
# Delete roles and polices

In this post, we explained how to split a large tabular dataset into rows to set up a knowledge base with metadata for each of those records, and how to then retrieve outputs with metadata filtering. We also showed how retrieving results with metadata is more accurate than retrieving results without metadata filtering. Lastly, we showed how to use the result with an FM to get accurate results.
To further explore the capabilities of Knowledge Bases for Amazon Bedrock, refer to the following resources:

Knowledge bases for Amazon Bedrock
Amazon Bedrock Knowledge Base – Samples for building RAG workflows

About the Author
Tanay Chowdhury is a Data Scientist at Generative AI Innovation Center at Amazon Web Services. He helps customers to solve their business problem using Generative AI and Machine Learning.

Secure AccountantAI Chatbot: Lili’s journey with Amazon Bedrock

This post was written in collaboration with Liran Zelkha and Eyal Solnik from Lili.
Small business proprietors tend to prioritize the operational aspects of their enterprises over administrative tasks, such as maintaining financial records and accounting. While hiring a professional accountant can provide valuable guidance and expertise, it can be cost-prohibitive for many small businesses. Moreover, the availability of accountants might not always align with the immediate needs of business owners, leaving them with unanswered questions or delayed decision-making processes.
In the rapidly evolving world of large language models (LLMs) and generative artificial intelligence (AI), Lili recognized an opportunity to use this technology to address the financial advisory needs of their small business customers. Using Anthropic’s Claude 3 Haiku on Amazon Bedrock, Lili developed an intelligent AccountantAI chatbot capable of providing on-demand accounting advice tailored to each customer’s financial history and unique business requirements. The AccountantAI chatbot serves as a virtual assistant, offering affordable and readily available financial guidance, empowering small business owners to focus on their core expertise while ensuring the financial health of their operations.
About Lili
Lili is a financial platform designed specifically for businesses, offering a combination of advanced business banking with built-in accounting and tax preparation software.
By consolidating financial tools into a user-friendly interface, Lili streamlines and simplifies managing business finances and makes it an attractive solution for business owners seeking a centralized and efficient way to manage their financial operations.
In this post, we’ll explore how Lili, a financial platform designed specifically for businesses, used Amazon Bedrock to build a secure and intelligent AccountantAI chatbot for small business owners. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like Anthropic, Meta, Mistral AI, Stability AI, Cohere, AI21 Labs, and Amazon through a single API, along with a broad set of capabilities that you need to build generative AI applications with security, privacy, and responsible AI.
Solution overview
The AccountantAI chatbot provides small business owners with accurate and relevant financial accounting advice in a secure manner. To achieve this, the solution is designed to address two key requirements:

Question validation: Implementing guardrails to ensure that the user’s input is a valid and a legitimate financial accounting question. This step helps filter out irrelevant or inappropriate queries, maintaining the integrity of the system.
Context enrichment: Augmenting the user’s question with relevant contextual data, such as up-to-date accounting information and user-specific financial data. This step ensures that the chatbot’s responses are tailored to the individual user’s business and financial situation, providing more personalized and actionable advice.

To address the two key requirements of question validation and context enrichment, the AccountantAI solution employs a two-stage architecture comprising an ingestion workflow and a retrieval workflow.
Ingestion workflow

The ingestion workflow is an offline process that prepares the system for serving customer queries. For this stage, Lili curated a comprehensive golden collection of financial accounting questions, drawing from common inquiries as well as real-world questions from their customer base over the years. This diverse and high-quality collection serves as a reference corpus, ensuring that the chatbot can handle a wide range of relevant queries. The ingestion workflow transforms these curated questions into vector embeddings using Amazon Titan Text Embeddings model API. This process occurs over AWS PrivateLink for Amazon Bedrock, a protected and private connection in your VPC. The vector embeddings are persisted in the application in-memory vector store. These vectors will help to validate user input during the retrieval workflow.
Each curated vector embedding is paired with a matching prompt template that was evaluated during testing to be the most effective.
Example prompt template

Provides context about the agent’s role as Lili’s AI assistant for financial questions and outlines the general guidelines applied to all queries.

Provides details on Lili platform.

Lists out all of Lili’s product features in detail. This section aims to explain Lili’s features in detail, ensuring that answers are aligned with the Lili platform. For instance, when addressing questions about tax reduction management, highlight the relevant features that Lili offers, which customers should be familiar with.

Outlines the required formatting for the response to ensure it meets the expected structure.

Data relevant to answering the customer’s question.

Specific accounting knowledge that is relevant to the question and the model is not familiar with, such as updated data for 2024.

Contains the user’s actual question.

Provides the core instructions on how to approach answering the question appropriately and meet expectations. It also defines the steps in providing a detailed and high-quality answer.

Important guidelines to remind the agent and make sure it follows them, such as the exact format of the answer.

Retrieval workflow

Lili’s web chatbot web interface allows users to submit queries and receive real-time responses. When a customer asks a question, it’s sent to the backend system for processing.

The system first converts the query into a vector embedding using the Amazon Titan Text Embeddings model API, which is accessed securely through PrivateLink.
Next, the system performs a similarity search on the pre-computed embeddings of the golden collection, to find the most relevant matches for the user’s query. The system evaluates the similarity scores of the search results against a predetermined threshold. If the user’s question yields matches with low similarity scores, it’s deemed malformed or unclear, and the user is prompted to rephrase or refine their query.
However, if the user’s question produces matches with high similarity scores, it’s considered a legitimate query. In this case, Lili’s backend system proceeds with further processing using the golden question that has the highest similarity score to the user’s query.
Based on the golden question with the highest similarity score, the system retrieves the corresponding prompt template.

This template is augmented with up-to-date accounting information and the customer’s specific financial data from external sources such as Amazon RDS for MySQL. The resulting contextualized prompt is sent to Anthropic’s Claude 3 Haiku on Amazon Bedrock, which generates a tailored response addressing the customer’s query within their unique business context.
Because model providers continually enhance their offerings with innovative updates, Amazon Bedrock simplifies the ability to adopt emerging advancements in generative AI across multiple model providers. This approach has demonstrated its advantages right from the initial rollout of AccountantAI. Lili transitioned from Anthropic’s Claude Instant to Claude 3 within two weeks of its official release on the Amazon Bedrock environment and three weeks after its general availability.
Lili selected Anthropic’s Claude model family for AccountantAI after reviewing industry benchmarks and conducting their own quality assessment. Anthropic Claude on Amazon Bedrock consistently outperformed other models in understanding financial concepts, generating coherent natural language, and providing accurate, tailored recommendations.
After the initial release of AcountantAI, Amazon Bedrock introduced Anthropic’s Claude 3 Haiku model, which Lili evaluated against Anthropic Claude Instant version. The Anthropic Claude 3 Haiku model demonstrated significant improvements across three key evaluation metrics:

Quality – Anthropic Claude 3 Haiku delivered higher quality outputs, providing more detailed and better-phrased responses compared to its predecessor.
Response time – Anthropic Claude 3 Haiku exhibited a 10 percent to 20 percent improvement in response times over Claude Instant, offering faster performance.
Cost – Anthropic Claude 3 Haiku on Amazon Bedrock is the most cost-effective choice. For instance, it is up to 68 percent less costly per 1,000 input/output tokens compared to Anthropic Claude Instant, while delivering higher levels of intelligence and performance. See Anthropic’s Claude 3 models on Amazon Bedrock for more information.

For customers like Lili, this underscores the importance of having access to a fully managed service like Amazon Bedrock, which offers a choice of high-performing foundation models to meet diverse enterprise AI needs. There is no “one size fits all” model, and the ability to select from a range of cutting-edge FMs is crucial for organizations seeking to use the latest advancements in generative AI effectively and cost-efficiently.
The AccountantAI feature, exclusively available to Lili customers, reduces the need for hiring a professional accountant. While professional accountants can provide valuable guidance and expertise, their services can be cost-prohibitive for many small businesses. AccountantAI has already answered thousands of questions, delivering real value to businesses and providing quality responses to financial, tax, and accounting inquiries.
Using Amazon Bedrock for easy, secure, and reliable access to high-performing foundation models from leading AI companies, Lili integrates accounting knowledge at scale with each customer’s unique data. This innovative solution offers affordable expertise on optimizing cash flow, streamlining tax planning, and enabling informed decisions to drive growth. AccountantAI bridges the gap in accounting resources, democratizing access to high-quality financial intelligence for every business.
Explore Lili’s AccountantAI feature powered by Amazon Bedrock to gain affordable and accessible financial intelligence for your business today, or use Amazon Bedrock Playgrounds to experiment with running inference on different models on your data.

About the authors
Doron Bleiberg is a senior AWS Startups Solution Architect helping Fintech customers in their cloud journey.
Liran Zelkha is the co-founder and CTO at Lili, leading our development and data efforts.
Eyal Solnik is the head of Data at Lili and leads our AccountantAI product.

NVIDIA Researchers Introduce Flextron: A Network Architecture and Post …

Large language models (LLMs) such as GPT-3 and Llama-2 have made significant strides in understanding and generating human language. These models boast billions of parameters, allowing them to perform complex tasks accurately. However, the substantial computational resources required for training and deploying these models present significant challenges, particularly in resource-limited environments. Addressing these challenges is essential to making AI technologies more accessible and broadly applicable.

The primary issue with deploying large language models is their immense size and the corresponding need for extensive computational power and memory. This limitation significantly restricts their usability in scenarios where computational resources are constrained. Traditionally, multiple versions of the same model are trained to balance efficiency and accuracy based on the available resources. For example, the Llama-2 model family includes variants with 7 billion, 13 billion, and 70 billion parameters. Each variant is designed to operate efficiently within different levels of computational power. However, this approach is resource-intensive, requiring significant effort and computational resource duplication.

Existing methods to address this issue include training several versions of a model, each tailored for different resource constraints. While effective in providing flexibility, this strategy involves considerable redundancy in the training process, consuming time and computational resources. For instance, training multiple multi-billion parameter models, like those in the Llama-2 family, demands substantial data and computational power, making the process impractical for many applications. To streamline this, researchers have been exploring more efficient alternatives.

Researchers from NVIDIA and the University of Texas at Austin introduced FLEXTRON, a novel flexible model architecture and post-training optimization framework. FLEXTRON is designed to support adaptable model deployment without requiring additional fine-tuning, thus addressing the inefficiencies of traditional methods. This architecture employs a nested elastic structure, allowing it to adjust dynamically to specific latency and accuracy targets during inference. This adaptability makes using a single pre-trained model across various deployment scenarios possible, significantly reducing the need for multiple model variants.

FLEXTRON transforms a pre-trained LLM into an elastic model through a sample-efficient training method and advanced routing algorithms. The transformation process includes ranking and grouping network components and training routers that manage sub-network selection based on user-defined constraints such as latency and accuracy. This innovative approach enables the model to automatically select the optimal sub-network during inference, ensuring efficient and accurate performance across different computational environments.

Performance evaluations of FLEXTRON demonstrated its superior efficiency and accuracy compared to multiple end-to-end trained models and other state-of-the-art elastic networks. For example, FLEXTRON performed remarkably on the GPT-3 and Llama-2 model families, requiring only 7.63% of the training tokens used in the original pre-training. This efficiency translates into significant savings in computational resources and time. The evaluation included various benchmarks, such as ARC-easy, LAMBADA, PIQA, WinoGrande, MMLU, and HellaSwag, where FLEXTRON consistently outperformed other models.

The FLEXTRON framework also includes an elastic Multi-Layer Perceptron (MLP) and elastic Multi-Head Attention (MHA) layers, enhancing its adaptability. Elastic MHA layers, which constitute a significant portion of LLM runtime and memory usage, improve overall efficiency by selecting a subset of attention heads based on the input data. This feature is particularly beneficial in scenarios with limited computational resources, as it allows more efficient use of available memory and processing power.

In conclusion, FLEXTRON, offering a flexible and adaptable architecture that optimizes resource use and performance, addresses the critical need for efficient model deployment in diverse computational environments. The introduction of this framework by researchers from NVIDIA and the University of Texas at Austin highlights the potential for innovative solutions in overcoming the challenges associated with large language models.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter. 

Join our Telegram Channel and LinkedIn Group.

If you like our work, you will love our newsletter..

Don’t Forget to join our 46k+ ML SubReddit

The post NVIDIA Researchers Introduce Flextron: A Network Architecture and Post-Training Model Optimization Framework Supporting Flexible AI Model Deployment appeared first on MarkTechPost.

Nvidia AI Releases BigVGAN v2: A State-of-the-Art Neural Vocoder Trans …

In the rapidly developing field of audio synthesis, Nvidia has recently introduced BigVGAN v2. This neural vocoder breaks previous records for audio creation speed, quality, and adaptability by converting Mel spectrograms into high-fidelity waveforms. This team has thoroughly examined the main enhancements and ideas that set BigVGAN v2 apart.

One of BigVGAN v2’s most notable features is its unique inference CUDA kernel, which combines fused upsampling and activation processes. With this breakthrough, performance has been greatly increased, with Nvidia’s A100 GPUs attaining up to three times faster inference speeds. BigVGAN v2 assures that high-quality audio may be synthesized more efficiently than ever before by streamlining the processing pipeline, which makes it an invaluable tool for real-time applications and massive audio projects.

Nvidia has also improved BigVGAN v2’s discriminator and loss algorithms significantly. The unique model uses a multi-scale Mel spectrogram loss in conjunction with a multi-scale sub-band constant-Q transform (CQT) discriminator. Improved fidelity in the synthesized waveforms results from this twofold upgrade, which makes it easier to analyze audio quality during training in a more accurate and subtle manner. BigVGAN v2 can now more accurately record and replicate the minute nuances of a wide range of audio formats, including intricate musical compositions and human speech.

The training regimen for BigVGAN v2 makes use of a large dataset that contains a variety of audio categories, such as musical instruments, speech in several languages, and ambient noises. The model has a strong capacity to generalize across various audio situations and sources with the help of a variety of training data. The end product is a universal vocoder that can be applied to a wide range of settings and is remarkably accurate in handling out-of-distribution scenarios without requiring fine-tuning.

BigVGAN v2’s pre-trained model checkpoints enable a 512x upsampling ratio and sampling speeds up to 44 kHz. In order to meet the requirements of professional audio production and research, this feature guarantees that the generated audio maintains high resolution and fidelity. BigVGAN v2 produces audio of unmatched quality, whether it is used to create realistic environmental soundscapes, lifelike synthetic voices, or sophisticated instrumental compositions.

Nvidia is opening up a wide range of applications in industries, including media and entertainment, assistive technology, and more, with the innovations in BigVGAN v2. BigVGAN v2’s improved performance and adaptability make it a priceless tool for researchers, developers, and content producers who want to push the limits of audio synthesis.

Neural vocoding technology has advanced significantly with the release of Nvidia’s BigVGAN v2. It is an effective tool for producing high-quality audio because of its sophisticated CUDA kernels, improved discriminator and loss functions, variety of training data, and high-resolution output capabilities. With its promise to transform audio synthesis and interaction in the digital age, Nvidia’s BigVGAN v2 establishes a new benchmark in the industry.

Check out the Model and Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter. 

Join our Telegram Channel and LinkedIn Group.

If you like our work, you will love our newsletter..

Don’t Forget to join our 46k+ ML SubReddit

The post Nvidia AI Releases BigVGAN v2: A State-of-the-Art Neural Vocoder Transforming Audio Synthesis appeared first on MarkTechPost.

Is 9.11 larger than 9.9? Comparison on Llama 3 vs Claude vs Gpt 4o vs …

Today, in a really interesting Reddit post, we saw someone comparing 9.9 vs 9.11 on various AI Chatbot Models (Llama 3 vs Claude vs Gpt 4o vs. Gemini). So, we tried asking these models, and we found these interesting findings

We asked Llama 3:‘Is 9.11 larger than 9.9?’The answer was ‘Yes,’ and of course that’s wrong. Please see the screenshot

We asked Claude:‘Is 9.11 larger than 9.9?’The answer was ‘No’, and of course, that’s correct. Please see the screenshot

We asked Gpt 4o:‘Is 9.11 larger than 9.9?’The answer was ‘No,’ and of course that’s correct. Please see the screenshot

We asked Gemini‘Is 9.11 larger than 9.9?’The answer was ‘No’, and of course, that’s correct. Please see the screenshot.

Date: July 17, 2024

What could be some possible reasons for Llama 3’s wrong answer to this question, ‘Is 9.11 larger than 9.9?”???

The post Is 9.11 larger than 9.9? Comparison on Llama 3 vs Claude vs Gpt 4o vs Gemini appeared first on MarkTechPost.

How Deloitte Italy built a digital payments fraud detection solution u …

As digital commerce expands, fraud detection has become critical in protecting businesses and consumers engaging in online transactions. Implementing machine learning (ML) algorithms enables real-time analysis of high-volume transactional data to rapidly identify fraudulent activity. This advanced capability helps mitigate financial risks and safeguard customer privacy within expanding digital markets.
Deloitte is a strategic global systems integrator with over 19,000 certified AWS practitioners across the globe. It continues to raise the bar through participation in the AWS Competency Program with 29 competencies, including Machine Learning.
This post demonstrates the potential for quantum computing algorithms paired with ML models to revolutionize fraud detection within digital payment platforms. We share how Deloitte built a hybrid quantum neural network solution with Amazon Braket to demonstrate the possible gains coming from this emerging technology.
The promise of quantum computing
Quantum computers harbor the potential to radically overhaul financial systems, enabling much faster and more precise solutions. Compared to classical computers, quantum computers are expected in the long run to have to advantages in the areas of simulation, optimization, and ML. Whether quantum computers can provide a meaningful speedup to ML is an active topic of research.
Quantum computing can perform efficient near real-time simulations in critical areas such as pricing and risk management. Optimization models are key activities in financial institutions, aimed at determining the best investment strategy for a portfolio of assets, allocating capital, or achieving productivity improvements. Some of these optimization problems are nearly impossible for traditional computers to tackle, so approximations are used to solve the problems in a reasonable amount of time. Quantum computers could perform faster and more accurate optimizations without using any approximations.
Despite the long-term horizon, the potentially disruptive nature of this technology means that financial institutions are looking to get an early foothold in this technology by building in-house quantum research teams, expanding their existing ML COEs to include quantum computing, or engaging with partners such as Deloitte.
At this early stage, customers seek access to a choice of different quantum hardware and simulation capabilities in order to run experiments and build expertise. Braket is a fully managed quantum computing service that lets you explore quantum computing. It provides access to quantum hardware from IonQ, OQC, Quera, Rigetti, IQM, a variety of local and on-demand simulators including GPU-enabled simulations, and infrastructure for running hybrid quantum-classical algorithms such as quantum ML. Braket is fully integrated with AWS services such as Amazon Simple Storage Service (Amazon S3) for data storage and AWS Identity and Access Management (IAM) for identity management, and customers only pay for what you use.
In this post, we demonstrate how to implement a quantum neural network-based fraud detection solution using Braket and AWS native services. Although quantum computers can’t be used in production today, our solution provides a workflow that will seamlessly adapt and function as a plug-and-play system in the future, when commercially viable quantum devices become available.
Solution overview
The goal of this post is to explore the potential of quantum ML and present a conceptual workflow that could serve as a plug-and-play system when the technology matures. Quantum ML is still in its early stages, and this post aims to showcase the art of the possible without delving into specific security considerations. As quantum ML technology advances and becomes ready for production deployments, robust security measures will be essential. However, for now, the focus is on outlining a high-level conceptual architecture that can seamlessly adapt and function in the future when the technology is ready.
The following diagram shows the solution architecture for the implementation of a neural network-based fraud detection solution using AWS services. The solution is implemented using a hybrid quantum neural network. The neural network is built using the Keras library; the quantum component is implemented using PennyLane.

The workflow includes the following key components for inference (A–F) and training (G–I):

Ingestion – Real-time financial transactions are ingested through Amazon Kinesis Data Streams
Preprocessing – AWS Glue streaming extract, transform, and load (ETL) jobs consume the stream to do preprocessing and light transforms
Storage – Amazon S3 is used to store output artifacts
Endpoint deployment – We use an Amazon SageMaker endpoint to deploy the models
Analysis – Transactions along with the model inferences are stored in Amazon Redshift
Data visualization – Amazon QuickSight is used to visualize the results of fraud detection
Training data – Amazon S3 is used to store the training data
Modeling – A Braket environment produces a model for inference
Governance – Amazon CloudWatch, IAM, and AWS CloudTrail are used for observability, governance, and auditability, respectively

For training the model, we used open source data available on Kaggle. The dataset contains transactions made by credit cards in September 2013 by European cardholders. This dataset records transactions that occurred over a span of 2 days, during which there were 492 instances of fraud detected out of a total of 284,807 transactions. The dataset exhibits a significant class imbalance, with fraudulent transactions accounting for just 0.172% of the entire dataset. Because the data is highly imbalanced, various measures have been taken during data preparation and model development.
The dataset exclusively comprises numerical input variables, which have undergone a Principal Component Analysis (PCA) transformation because of confidentiality reasons.
The data only includes numerical input features (PCA-transformed due to confidentiality) and three key fields:

Time – Time between each transaction and first transaction
Amount – Transaction amount
Class – Target variable, 1 for fraud or 0 for non-fraud

Data preparation
We split the data into training, validation, and test sets, and we define the target and the features sets, where Class is the target variable:

y_train = df_train[‘Class’]
x_train = df_ train.drop([‘Class’], axis=1)
y_validation = df_ validation [‘Class’]
x_ validation = df_ validation.drop([‘Class’], axis=1)
y_test = df_test[‘Class’]
x_test = df_test.drop([‘Class’], axis=1)

The Class field assumes values 0 and 1. To make the neural network deal with data imbalance, we perform a label encoding on the y sets:

lbl_clf = LabelEncoder()
y_train = lbl_clf.fit_transform(y_train)
y_train = tf.keras.utils.to_categorical(y_train)

The encoding applies to all the values the mapping: 0 to [1,0], and 1 to [0,1].
Finally, we apply scaling that standardizes the features by removing the mean and scaling to unit variance:

std_clf = StandardScaler()
x_train = std_clf.fit_transform(x_train)
x_validation = std_clf.fit_transform(x_validation)
x_test = std_clf.transform(x_test)

The functions LabelEncoder and StandardScaler are available in the scikit-learn Python library.
After all the transformations are applied, the dataset is ready to be the input of the neural network.
Neural network architecture
We composed the neural network architecture with the following layers based on several tests empirically:

A first dense layer with 32 nodes
A second dense layer with 9 nodes
A quantum layer as neural network output
Dropout layers with rate equals to 0.3

We apply an L2 regularization on the first layer and both L1 and L2 regularization on the second one, to avoid overfitting. We initialize all the kernels using the he_normal function. The dropout layers are meant to reduce overfitting as well.

hidden = Dense(32, activation =”relu”, kernel_initializer=’he_normal’, kernel_regularizer=tf.keras.regularizers.l2(0,01))
out_2 = Dense(9, activation =”relu”, kernel_initializer=’he_normal’, kernel_regularizer=tf.keras.regularizers.l1_l2(l1=0,001, l2=0,001))
do = Dropout(0,3)

Quantum circuit
The first step to obtain the layer is to build the quantum circuit (or the quantum node). To accomplish this task, we used the Python library PennyLane.
PennyLane is an open source library that seamlessly integrates quantum computing with ML. It allows you to create and train quantum-classical hybrid models, where quantum circuits act as layers within classical neural networks. By harnessing the power of quantum mechanics and merging it with classical ML frameworks like PyTorch, TensorFlow, and Keras, PennyLane empowers you to explore the exciting frontier of quantum ML. You can unlock new realms of possibility and push the boundaries of what’s achievable with this cutting-edge technology.
The design of the circuit is the most important part of the overall solution. The predictive power of the model depends entirely on how the circuit is built.
Qubits, the fundamental units of information in quantum computing, are entities that behave quite differently from classical bits. Unlike classical bits that can only represent 0 or 1, qubits can exist in a superposition of both states simultaneously, enabling quantum parallelism and faster calculations for certain problems.
We decide to use only three qubits, a small number but sufficient for our case.
We instantiate the qubits as follows:

num_wires = 3
dev = qml.device(‘default.qubit’, wires=num_wires)

‘default.qubit’ is the PennyLane qubits simulator. To access qubits on a real quantum computer, you can replace the second line with the following code:

device_arn = “arn:aws:braket:eu-west-2::device/qpu/ionq/Aria-1″
dev = qml.device(‘’,device_arn=device_arn, wires=num_wires)

device_ARN could be the ARN of the devices supported by Braket (for a list of supported devices, refer to Amazon Braket supported devices).
We defined the quantum node as follows:

@qml.qnode(dev, interface=”tf”, diff_method=”backprop”)
def quantum_nn(inputs, weights):
qml.RY(inputs[0], wires=0)
qml.RY(inputs[1], wires=1)
qml.RY(inputs[2], wires=2)
qml.Rot(weights[0] * inputs[3], weights[1] * inputs[4], weights[2] * inputs[5], wires=1)
qml.Rot(weights[3] * inputs[6], weights[4] * inputs[7], weights[5] * inputs[8], wires=2)
qml.CNOT(wires=[1, 2])
qml.RY(weights[6], wires=2)
qml.CNOT(wires=[0, 2])
qml.CNOT(wires=[1, 2])
return [qml.expval(qml.PauliZ(0)), qml.expval(qml.PauliZ(2))]

The inputs are the values yielded as output from the previous layer of the neural network, and the weights are the actual weights of the quantum circuit.
RY and Rot are rotation functions performed on qubits; CNOT is a controlled bitflip gate allowing us to embed the qubits.
qml.expval(qml.PauliZ(0)), qml.expval(qml.PauliZ(2)) are the measurements applied respectively to the qubits 0 and the qubits 1, and these values will be the neural network output.
Diagrammatically, the circuit can be displayed as:

0: ──RY(1.00)──────────────────────────────────────╭●────┤  <Z>

1: ──RY(2.00)──Rot(4.00,10.00,18.00)──╭●───────────│──╭●─┤

2: ──RY(3.00)──Rot(28.00,40.00,54.00)─╰X──RY(7.00)─╰X─╰X─┤  <Z>

The transformations applied to qubit 0 are fewer than the transformations applied to qbit 2. This choice is because we want to separate the states of the qubits in order to obtain different values when the measures are performed. Applying different transformations to qubits allows them to enter distinct states, resulting in varied outcomes when measurements are performed. This phenomenon stems from the principles of superposition and entanglement inherent in quantum mechanics.
After we define the quantum circuit, we define the quantum hybrid neural network:

def hybrid_model(num_layers, num_wires):
    weight_shapes = {“weights”: (7,)}
    qlayer = qml.qnn.KerasLayer(quantum_nn, weight_shapes, output_dim=2)
    hybrid_model = tf.keras.Sequential([hidden,do, out_2,do,qlayer])
    return hybrid_model

KerasLayer is the PennyLane function that turns the quantum circuit into a Keras layer.
Model training
After we have preprocessed the data and defined the model, it’s time to train the network.
A preliminary step is needed in order to deal with the unbalanced dataset. We define a weight for each class according to the inverse root rule:

class_counts = np.bincount(y_train_list)
class_frequencies = class_counts / float(len(y_train))
class_weights = 1 / np.sqrt(class_frequencies)

The weights are given by the inverse of the root of occurrences for each of the two possible target values.
We compile the model next:

model.compile(optimizer=’adam’, loss = ‘MSE’, metrics = [custom_metric])

custom_metric is a modified version of the metric precision, which is a custom subroutine to postprocess the quantum data into a form compatible with the optimizer.
For evaluating model performance on imbalanced data, precision is a more reliable metric than accuracy, so we optimize for precision. Also, in fraud detection, incorrectly predicting a fraudulent transaction as valid (false negative) can have serious financial consequences and risks. Precision evaluates the proportion of fraud alerts that are true positives, minimizing costly false negatives.
Finally, we fit the model:

history =, y_train, epochs = 30, batch_size = 200, validation_data=(x_validation, y_ validation),class_weight=class_weights,shuffle=True)

At each epoch, the weights of both the classic and quantum layer are updated in order to reach higher accuracy. At the end of the training, the network showed a loss of 0.0353 on the training set and 0.0119 on the validation set. When the fit is complete, the trained model is saved in .h5 format.
Model results and analysis
Evaluating the model is vital to gauge its capabilities and limitations, providing insights into the predictive quality and value derived from the quantum techniques.
To test the model, we make predictions on the test set:

preds = model.predict(x_test)

Because the neural network is a regression model, it yields for each record of x_test a 2-D array, where each component can assume values between 0 and 1. Because we’re essentially dealing with a binary classification problem, the outputs should be as follows:

[1,0] – No fraud
[0,1] – Fraud

To convert the continuous values into binary classification, a threshold is necessary. Predictions that are equal to or above the threshold are assigned 1, and those below the threshold are assigned 0.
To align with our goal of optimizing precision, we chose the threshold value that results in the highest precision.
The following table summarizes the mapping between various threshold values and the precision.

Threshold = 0.65
Threshold = 0.70
Threshold = 0.75

No Fraud


The model demonstrates almost flawless performance on the predominant non-fraud class, with precision and recall scores close to a perfect 1. Despite far less data, the model achieves precision of 0.87 for detecting the minority fraud class at a 0.65 threshold, underscoring performance even on sparse data. To efficiently identify fraud while minimizing incorrect fraud reports, we decide to prioritize precision over recall.
We also wanted to compare this model with a classic neural network only model to see if we are exploiting the gains coming from the quantum application. We built and trained an identical model in which the quantum layer is replaced by the following:

Dense(2,activation = “softmax”)

In the last epoch, the loss was 0.0119 and the validation loss was 0.0051.
The following table summarizes the mapping between various threshold values and the precision for the classic neural network model.

Threshold = 0.70
Threshold = 0.75

No Fraud

0. 86

Like the quantum hybrid model, the model performance is almost perfect for the majority class and very good for the minority class.
The hybrid neural network has 1,296 parameters, whereas the classic one has 1,329. When comparing precision values, we can observe how the quantum solution provides better results. The hybrid model, inheriting the properties of high-dimensional spaces exploration and a non-linearity from the quantum layer, is able to generalize the problem better using fewer parameters, resulting in better performance.
Challenges of a quantum solution
Although the adoption of quantum technology shows promise in providing organizations numerous benefits, practical implementation on large-scale, fault-tolerant quantum computers is a complex task and is an active area of research. Therefore, we should be mindful of the challenges that it poses:

Sensitivity to noise – Quantum computers are extremely sensitive to external factors (such as atmospheric temperature) and require more attention and maintenance than traditional computers, and this can drift over time. One way to minimize the effects of drift is by taking advantage of parametric compilation—the ability to compile a parametric circuit such as the one used here only one time, and feed it fresh parameters at runtime, avoiding repeated compilation steps. Braket automatically does this for you.
Dimensional complexity – The inherent nature of qubits, the fundamental units of quantum computing, introduces a higher level of intricacy compared to traditional binary bits employed in conventional computers. By harnessing the principles of superposition and entanglement, qubits possess an elevated degree of complexity in their design. This intricate architecture renders the evaluation of computational capacity a formidable challenge, because the multidimensional aspects of qubits demand a more nuanced approach to assessing their computational prowess.
Computational errors – Increased calculation errors are intrinsic to quantum computing’s probabilistic nature during the sampling phase. These errors could impact accuracy and reliability of the results obtained through quantum sampling. Techniques such as error mitigation and error suppression are actively being developed in order to minimize the effects of errors resulting from noisy qubits. To learn more about error mitigation, see Enabling state-of-the-art quantum algorithms with Qedma’s error mitigation and IonQ, using Braket Direct.

The results discussed in this post suggest that quantum computing holds substantial promise for fraud detection in the financial services industry. The hybrid quantum neural network demonstrated superior performance in accurately identifying fraudulent transactions, highlighting the potential gains offered by quantum technology. As quantum computing continues to advance, its role in revolutionizing fraud detection and other critical financial processes will become increasingly evident. You can extend the results of the simulation by using real qubits and testing various outcomes on real hardware available on Braket, such as those from IQM, IonQ, and Rigetti, all on demand, with pay-as-you-go pricing and no upfront commitments.
To prepare for the future of quantum computing, organizations must stay informed on the latest advancements in quantum technology. Adopting quantum-ready cloud solutions now is a strategic priority, allowing a smooth transition to quantum when hardware reaches commercial viability. This forward-thinking approach will provide both a technological edge and rapid adaptation to quantum computing’s transformative potential across industries. With an integrated cloud strategy, businesses can proactively get quantum-ready, primed to capitalize on quantum capabilities at the right moment. To accelerate your learning journey and earn a digital badge in quantum computing fundamentals, see Introducing the Amazon Braket Learning Plan and Digital Badge.
Connect with Deloitte to pilot this solution for your enterprise on AWS.

About the authors
Federica Marini is a Manager in Deloitte Italy AI & Data practice with a strong experience as a business advisor and technical expert in the field of AI, Gen AI, ML and Data. She addresses research and customer business needs with tailored data-driven solutions providing meaningful results. She is passionate about innovation and believes digital disruption will require a human centered approach to achieve full potential.
Matteo Capozi is a Data and AI expert in Deloitte Italy, specializing in the design and implementation of advanced AI and GenAI models and quantum computing solutions. With a strong background on cutting-edge technologies, Matteo excels in helping organizations harness the power of AI to drive innovation and solve complex problems. His expertise spans across industries, where he collaborates closely with executive stakeholders to achieve strategic goals and performance improvements.
Kasi Muthu is a senior partner solutions architect focusing on generative AI and data at AWS based out of Dallas, TX. He is passionate about helping partners and customers accelerate their cloud journey. He is a trusted advisor in this field and has plenty of experience architecting and building scalable, resilient, and performant workloads in the cloud. Outside of work, he enjoys spending time with his family.
Kuldeep Singh is a Principal Global AI/ML leader at AWS with over 20 years in tech. He skillfully combines his sales and entrepreneurship expertise with a deep understanding of AI, ML, and cybersecurity. He excels in forging strategic global partnerships, driving transformative solutions and strategies across various industries with a focus on generative AI and GSIs.

Amazon SageMaker unveils the Cohere Command R fine-tuning model

AWS announced the availability of the Cohere Command R fine-tuning model on Amazon SageMaker. This latest addition to the SageMaker suite of machine learning (ML) capabilities empowers enterprises to harness the power of large language models (LLMs) and unlock their full potential for a wide range of applications.
Cohere Command R is a scalable, frontier LLM designed to handle enterprise-grade workloads with ease. Cohere Command R is optimized for conversational interaction and long context tasks. It targets the scalable category of models that balance high performance with strong accuracy, enabling companies to move beyond proof of concept and into production. The model boasts high precision on Retrieval Augmented Generation (RAG) and tool use tasks, low latency and high throughput, a long 128,000-token context length, and strong capabilities across 10 key languages.
In this post, we explore the reasons for fine-tuning a model and the process of how to accomplish it with Cohere Command R.
Fine-tuning: Tailoring LLMs for specific use cases
Fine-tuning is an effective technique to adapt LLMs like Cohere Command R to specific domains and tasks, leading to significant performance improvements over the base model. Evaluations of fine-tuned Cohere Command R model have demonstrated improved performance by over 20% across various enterprise use cases in industries such as financial services, technology, retail, healthcare, legal, and healthcare. Because of its smaller size, a fine-tuned Cohere Command R model can be served more efficiently compared to models much larger than its class.
The recommendation is to use a dataset that contains at least 100 examples.
Cohere Command R uses a RAG approach, retrieving relevant context from an external knowledge base to improve outputs. However, fine-tuning allows you to specialize the model even further. Fine-tuning text generation models like Cohere Command R is crucial for achieving ultimate performance in several scenarios:

 Domain-specific adaptation – RAG models may not perform optimally in highly specialized domains like finance, law, or medicine. Fine-tuning allows you to adapt the model to these domains’ nuances for improved accuracy.
Data augmentation – Fine-tuning enables incorporating additional data sources or techniques, augmenting the model’s knowledge base for increased robustness, especially with sparse data.
Fine-grained control – Although RAG offers impressive general capabilities, fine-tuning permits fine-grained control over model behavior, tailoring it precisely to your desired task for ultimate precision.

The combined power of RAG and fine-tuned LLMs empowers you to tackle diverse challenges with unparalleled versatility and effectiveness. With the introduction of Cohere Command R fine-tuning on SageMaker, enterprises can now customize and optimize the model’s performance for their unique requirements. By fine-tuning on domain-specific data, businesses can enhance Cohere Command R’s accuracy, relevance, and effectiveness for their use cases, such as natural language processing, text generation, and question answering.
By combining the scalability and robustness of Cohere Command R with the ability to fine-tune its performance on SageMaker, AWS empowers enterprises to navigate the complexities of AI adoption and use its transformative power to drive innovation and growth across various industries and domains.
Customer data, including prompts, completions, custom models, and data used for fine-tuning or continued pre-training, remains private to customer AWS accounts and is never shared with third-party model providers.
Solution overview
In the following sections, we walk through the steps to fine-tune the Cohere Command R model on SageMaker. This includes preparing the data, deploying a model, preparing for fine-tuning, creating an endpoint for inference, and performing inference.
Prepare the fine-tuning data
Before you can start a fine-tuning job, you need to upload a dataset with training and (optionally) evaluation data.
First, make sure your data is in jsonl format. It should have the following structure:

 messages – This contains a list of messages of the conversation. A message consists of the following parts:
 role – This specifies the current speaker. You can pick from System, User, or Chatbot.
 content – This contains the content of the message.

The following is an example that trains a chatbot to answer questions. For the sake of readability, the document spans over multiple lines. For your dataset, make sure that each line contains one whole example.

  “messages”: [
      “role”: “System”,
      “content”: “You are a large language model trained by Cohere.”
      “role”: “User”,
      “content”: “Hi! What were Time magazines top 10 cover stories in the last 10 years?”
      “role”: “Chatbot”,
      “content”: “Time magazines top 10 cover stories in the last 10 years were:\n\n1. Volodymyr Zelenskyy\n2. Elon Musk\n3. Martin Luther King Jr.\n4. How Earth Survived\n5. Her Lasting Impact\n6. Nothing to See Here\n7. Meltdown\n8. Deal With It\n9. The Top of America\n10. Bitter Pill”
      “role”: “User”,
      “content”: “Who is Volodymyr Zelenskyy?”
      “role”: “Chatbot”,
      “content”: “Volodymyr Zelenskyy is a Ukrainian politician who was elected President of Ukraine in 2019. He previously served as the Minister of Internal Affairs in the government of Prime Minister Volodymyr Groysman.”
      “role”: “User”,
      “content”: “Thank you!”

Deploy a model
Complete the following steps to deploy the model:

On AWS Marketplace, subscribe to the Cohere Command R model

After you subscribe to the model, you can configure it and create a training job.

Choose View in Amazon SageMaker.
Follow the instructions in the UI to create a training job.

Alternatively, you can use the following example notebook to create the training job.
Prepare for fine-tuning
To fine-tune the model, you need the following:

Product ARN – This will be provided to you after you subscribe to the product.
Training dataset and evaluation dataset – Prepare your datasets for fine-tuning.
Amazon S3 location – Specify the Amazon Simple Storage Service (Amazon S3) location that stores the training and evaluation datasets.
Hyperparameters – Fine-tuning typically involves adjusting various hyperparameters like learning rate, batch size, number of epochs, and so on. You need to specify the appropriate hyperparameter ranges or values for your fine-tuning task.

Create an endpoint for inference
When the fine-tuning is complete, you can create an endpoint for inference with the fine-tuned model. To create the endpoint, use the create_endpoint method. If the endpoint already exists, you can connect to it using the connect_to_endpoint method.
Perform inference
You can now perform real-time inference using the endpoint. The following is the sample message that you use for input:

message = “Classify the following text as either very negative, negative, neutral, positive or very positive: mr. deeds is , as comedy goes , very silly — and in the best way.”
result =

The following screenshot shows the output of the fine-tuned model.
Optionally, you can also test the accuracy of the model using the evaluation data (sample_finetune_scienceQA_eval.jsonl).

Clean up
After you have completed running the notebook and experimenting with the Cohere Command R fine-tuned model, it is crucial to clean up the resources you have provisioned. Failing to do so may result in unnecessary charges accruing on your account. To prevent this, use the following code to delete the resources and stop the billing process:


Cohere Command R with fine-tuning allows you to customize your models to be performant for your business, domain, and industry. Alongside the fine-tuned model, users additionally benefit from Cohere Command R’s proficiency in the most commonly used business languages (10 languages) and RAG with citations for accurate and verified information. Cohere Command R with fine-tuning achieves high levels of performance with less resource usage on targeted use cases. Enterprises can see lower operational costs, improved latency, and increased throughput without extensive computational demands.
Start building with Cohere’s fine-tuning model in SageMaker today.

About the Authors
Shashi Raina is a Senior Partner Solutions Architect at Amazon Web Services (AWS), where he specializes in supporting generative AI (GenAI) startups. With close to 6 years of experience at AWS, Shashi has developed deep expertise across a range of domains, including DevOps, analytics, and generative AI.
James Yi is a Senior AI/ML Partner Solutions Architect in the Emerging Technologies team at Amazon Web Services. He is passionate about working with enterprise customers and partners to design, deploy and scale AI/ML applications to derive their business values. Outside of work, he enjoys playing soccer, traveling and spending time with his family.
Pradeep Prabhakaran is a Customer Solutions Architect at Cohere. In his current role at Cohere, Pradeep acts as a trusted technical advisor to customers and partners, providing guidance and strategies to help them realize the full potential of Cohere’s cutting-edge Generative AI platform. Prior to joining Cohere, Pradeep was a Principal Customer Solutions Manager at Amazon Web Services, where he led Enterprise Cloud transformation programs for large enterprises. Prior to AWS, Pradeep has held various leadership positions at consulting companies such as Slalom, Deloitte, and Wipro. Pradeep holds a Bachelor’s degree in Engineering and is based in Dallas, TX.

Derive meaningful and actionable operational insights from AWS Using A …

As a customer, you rely on Amazon Web Services (AWS) expertise to be available and understand your specific environment and operations. Today, you might implement manual processes to summarize lessons learned, obtain recommendations, or expedite the resolution of an incident. This can be time consuming, inconsistent, and not readily accessible.
This post shows how to use AWS generative artificial intelligence (AI) services, like Amazon Q Business, with AWS Support cases, AWS Trusted Advisor, and AWS Health data to derive actionable insights based on common patterns, issues, and resolutions while using the AWS recommendations and best practices enabled by support data. This post will also demonstrate how you can integrate these insights with your IT service management (ITSM) system (such as ServiceNow, Jira, and Zendesk), to allow you to implement recommendations and keep your AWS operations healthy.
Amazon Q Business is a fully managed, secure, generative-AI powered enterprise chat assistant that enables natural language interactions with your organization’s data. Ingesting data for support cases, Trusted Advisor checks, and AWS Health notifications into Amazon Q Business enables interactions through natural language conversations, sentiment analysis, and root cause analysis without needing to fully understand the underlying data models or schemas. The AI assistant provides answers along with links that point directly to the data sources. This allows you to easily identify and reference the underlying information sources that informed the AI’s response, providing more context and enabling further exploration of the topic if needed. Amazon Q Business integrates with ITSM solutions, allowing recommendations to be tracked and actioned within your existing workflows.
AWS Support offers a range of capabilities powered by technology and subject matter experts that support the success and operational health of your AWS environments. AWS Support provides you with proactive planning and communications, advisory, automation, and cloud expertise to help you achieve business outcomes with increased speed and scale in the cloud. These capabilities enable proactive planning for upcoming changes, expedited recovery from operational disruptions, and recommendations to optimize the performance and reliability of your AWS IT infrastructure.
This solution will demonstrate how to deploy Amazon Q Business and ingest data from AWS Support cases, AWS Trusted Advisor, and AWS Health using the provided code sample to generate insights based on your support data.
Overview of solution
Today, Amazon Q Business provides 43 connectors available to natively integrate with multiple data sources. In this post, we’re using the APIs for AWS Support, AWS Trusted Advisor, and AWS Health to programmatically access the support datasets and use the Amazon Q Business native Amazon Simple Storage Service (Amazon S3) connector to index support data and provide a prebuilt chatbot web experience. The AWS Support, AWS Trusted Advisor, and AWS Health APIs are available for customers with Enterprise Support, Enterprise On-Ramp, or Business support plans.
Q Support Insights (QSI) is the name of the solution provided in the code sample repository. QSI enables insights on your AWS Support datasets across your AWS accounts. The following diagram describes at a high level the QSI solution and components.

Figure 1: Overview of the QSI solution

There are two major components in the QSI solution. First, as illustrated in the Linked Accounts group in Figure 1, this solution supports datasets from linked accounts and aggregates your data using the various APIs, AWS Lambda, and Amazon EventBridge. Second, the support datasets from linked accounts are stored in a central S3 bucket that you own, as shown in the Data Collection Account group in the Figure 1. These datasets are then indexed using the Amazon Q Business S3 connector.
Under the hood, the Amazon Q Business S3 connector creates a searchable index of your AWS Support datasets, and gathers relevant important details related to keywords like case titles, descriptions, best practices, keywords, dates, and so on. The generative AI capabilities of Amazon Q Business enable it to synthesize insights and generate natural language responses available for users in the Amazon Q Business web chat experience. Amazon Q Business also supports plugins and actions so users can directly create tickets in the ITSM system without leaving the chat experience.
By default, Amazon Q Business will only produce responses using the data you’re indexing. This behavior is aligned with the use cases related to our solution. If needed, this response setting can be changed to allow Amazon Q to fallback to large language model (LLM) knowledge.
The high-level steps to deploy the solution are the following:

Create the necessary buckets to contain the support cases exports and deployment resources.
Upload the support datasets (AWS Support cases, AWS Trusted Advisor, and AWS Health) to the S3 data source bucket.
Create the Amazon Q Business application, the data source, and required components using deployment scripts.
Optionally, configure ITSM integration by using one of the available Amazon Q Business built-in plugins.
Synchronize the data source to index the data.
Test the solution through chat.

The full guidance and deployment options are available in the aws-samples Github repository. The solution can be deployed in a single account or in an AWS Organizations. In addition to the data security and protection Amazon Q Business supports, this solution integrates with your identity provider and respects access control lists (ACLs) so users get answers based on their unique permissions. This solution also provides additional controls to include or exclude specific accounts.
For this solution to work, the following prerequisites are needed:

An AWS Support plan such as Business, Enterprise On-Ramp, or Enterprise Support to access the AWS Support API.
AWS IAM Identity Center as the SAML 2.0-compliant identity provider (IdP) configured in the same AWS Region as your Amazon Q Business application. Please ensure that you have enabled an IAM Identity Center instance, provisioned at least one user, and provided each user with a valid email address. For more details, see Configure user access with the default IAM Identity Center directory.
A new or existing AWS account that will be the data collection account.
Corresponding AWS Identity and Access Management (IAM) permissions to create S3 buckets and deploy AWS CloudFormation stacks.
An S3 bucket to store the AWS Support data. You can export the support dataset to an S3 bucket following the steps provided in the GitHub repository. This bucket should be in the same Region as your Amazon Q Business index. At the time of writing this post, Amazon Q Business supports the us-west-2 or us-east-1 Region. See Creating a bucket.
An S3 bucket to store the resources for deployment.

Create the Amazon Q Business application using the deployment scripts
Using the Amazon Q Business application creation module, you can set up and configure an Amazon Q Business application, along with its crucial components, in an automated manner. These components include an Amazon S3 data source connector, required IAM roles, and Amazon Q Business web experience.
Deploy the Amazon Q Business application
As stated in the preceding prerequisites section, IAM Identity Center must be configured in the same Region (us-east-1 or us-west-2) as your Amazon Q Business application.
To deploy and use the Amazon Q Business application, follow the steps described in the Amazon Q Business application creation module. The steps can be summarized as:

Launch an AWS CloudShell in either the us-east-1 or us-west-2 Region in your data collection central account and clone the repository from GitHub.
Navigate to the repository directory and run the deployment script, providing the required inputs when prompted. As stated in the prerequisites, an S3 bucket name is required in the data collection central account.
After deployment, synchronize the data source, assign access to users and groups, and use the deployed web experience URL to interact with the Amazon Q Business application.

[Optional] Integrate your ITSM system
To integrate with your ITSM system, follow these steps:

Within the Amazon Q Business application page, choose Plugins in the navigation pane and choose Add plugin.
From the list of available plugins, select the one that matches your system. For example, Jira, ServiceNow, or Zendesk.
Enter the details on the next screen (see Figure 2) for Amazon Q Business application to make the connection. This integration will result in directly logging tickets from Amazon Q Business to your IT teams based on data within the Amazon Q Business application.

Figure 2 The Amazon Q Business plug-in creation page

Support Collector
You can use the Support Collector module to set up and configure AWS EventBridge to collect support-related data. This data includes information from AWS Support cases, AWS Trusted Advisor, and AWS Health. The collected data is then uploaded to a designated S3 bucket in the data collection account. The solution will retrieve up to 6 months of data by default, though you can change the timeframe to a maximum of 12 months.
Additionally, the Support Collector can synchronize with the latest updates on a daily basis, ensuring that your support data is always up to date. The Support Collector is configured through an AWS Lambda function and EventBridge, offering flexibility in terms of the data sources (AWS Support cases, AWS Trusted Advisor, and AWS Health) you want to include or exclude. You can choose data from one, two, or all three of these sources by configuring the appropriate scheduler.
Deploy the Support Collector
To deploy and use the Support Collector, follow the steps described in the Support Collector module.
The repository contains scripts and resources to automate the deployment of Lambda functions in designated member accounts. The deployed Lambda functions collect and upload AWS Support data (Support Cases, Health Events, and Trusted Advisor Checks) to an S3 bucket in the data collection central account. The collected data can be analyzed using Amazon Q Business.
There are two deployment options:

AWS Organizations (StackSet): Use this option if you have AWS Organizations set up and want to deploy in accounts under organizational units. It creates a CloudFormation StackSet in the central account to deploy resources (IAM roles, Lambda functions, and EventBridge) across member accounts.
Manual deployment of individual accounts (CloudFormation): Use this option if you don’t want to use AWS Organizations and want to target a few accounts. It creates a CloudFormation stack in a member account to deploy resources (IAM roles, Lambda functions, and EventBridge).

After deployment, an EventBridge scheduler periodically invokes the Lambda function to collect support data and store it in the data collection S3 bucket. Testing the Lambda function is possible with a custom payload. The deployment steps are fully automated using a shell script. The Q Support Insights (QSI) – AWS Support Collection Deployment guide, located in the src/support_collector subdirectory, outlines the steps to deploy the resources.
Amazon Q Business web experience
You can ask support-related questions using the Amazon Q Business web experience after you have the relevant support data collected in the S3 bucket and successfully indexed. For steps to configure and collect the data, see the preceding Support Collector section. Using the web experience, you can then ask questions as shown in the following demonstration.

Figure 3 Using Amazon Q Business web experience to get performance recommendations

Sample prompts
Try some of the following sample prompts:

I am having trouble with EKS add-on installation failures. It is giving ConfigurationConflict errors. Based on past support cases, please provide a resolution.
List AWS Account IDs with insufficient IPs
List health events with increased error rates
List services being deprecated this year
My Lambda function is running slow. How can I speed it up?

Clean up
After you’re done testing the solution, you can delete the resources to avoid incurring additional charges. See the Amazon Q Business pricing page for more information. Follow the instructions in the GitHub repository to delete the resources and corresponding CloudFormation templates.
In this post, you deployed a solution that indexes data from your AWS Support datasets stored in Amazon S3 and other AWS data sources like AWS Trusted Advisor and AWS Health. This demonstrates how to use new generative AI services like Amazon Q Business to find patterns across your most frequent issues, author new content such as internal documentation or an FAQ. Using support data presents a valuable opportunity to proactively address and prevent recurring issues in your AWS environment by using insights gained from past experiences. Embracing these insights enables a more resilient and optimized AWS experience tailored to your specific needs.
This solution can be expanded to use other internal data sources your company might use and use natural language to understand optimization opportunities that your teams can implement.

About the authors
Chitresh Saxena is a Sr. Technical Account Manager specializing in generative AI solutions and dedicated to helping customers successfully adopt AI/ML on AWS. He excels at understanding customer needs and provides technical guidance to build, launch, and scale AI solutions that solve complex business problems.
Jonathan Delfour is a Principal Technical Account Manager supporting Energy customers, providing top-notch support as part of the AWS Enterprise Support team. His technical guidance and unwavering commitment to excellence ensure that customers can leverage the full potential of AWS, optimizing their operations and driving success.
Krishna Atluru is an Enterprise Support Lead at AWS. He provides customers with in-depth guidance on improving security posture and operational excellence for their workloads. Outside of work, Krishna enjoys cooking, swimming and travelling.
Arish Labroo is a Principal Specialist Technical Account Manager – Builder supporting large AWS customers. He is focused on building strategic tools that help customers get the most value out of Enterprise Support.
Manik Chopra is a Principal Technical Account Manager at AWS. He helps customers adopt AWS services and provides guidance in various areas around Data Analytics and Optimization. His areas of expertise include delivering solutions using Amazon QuickSight, Amazon Athena, and various other automation techniques. Outside of work, he enjoys spending time outdoors and traveling.

Hugging Face Introduces SmolLM: Transforming On-Device AI with High-Pe …

Hugging Face has recently released SmolLM, a family of state-of-the-art small models designed to provide powerful performance in a compact form. The SmolLM models are available in three sizes: 135M, 360M, and 1.7B parameters, making them suitable for various applications while maintaining efficiency and performance. 

SmolLM is a new series of small language models developed by Hugging Face, aimed at delivering high performance with lower computational costs and improved user privacy. These models are trained on a meticulously curated high-quality dataset, SmolLM-Corpus, which includes diverse educational and synthetic data sources. The three models in the SmolLM family, 135M, 360M, and 1.7B parameters, are designed to cater to different levels of computational resources while maintaining state-of-the-art performance.

The SmolLM models are built on the SmolLM-Corpus, a dataset comprising various high-quality sources such as Cosmopedia v2, Python-Edu, and FineWeb-Edu. Cosmopedia v2, for instance, is an enhanced version of a synthetic dataset generated by Mixtral, consisting of over 30 million textbooks, blog posts, and stories. This dataset ensures a broad coverage of topics and prompts, improving the diversity and quality of the training data.

Image Source

For the 1.7B parameter model, Hugging Face used 1 trillion tokens from the SmolLM-Corpus, while the 135M and 360M parameter models were trained on 600 billion tokens. The training process employed a trapezoidal learning rate scheduler with a cooldown phase, ensuring efficient and effective model training. The smaller models incorporated Grouped-Query Attention (GQA) and prioritized depth over width in their architecture, while the larger 1.7B parameter model utilized a more traditional design.

SmolLM models were evaluated across benchmarks, testing common sense reasoning and world knowledge. The models demonstrated impressive performance, outperforming others in their respective size categories. For instance, despite being trained on fewer tokens, the SmolLM-135M model surpassed MobileLM-125M, the current best model with less than 200M parameters. Similarly, the SmolLM-360M and SmolLM-1.7B models outperformed all other models with less than 500M and 2B parameters, respectively.

Image Source

The models were also instruction-tuned using publicly available permissive instruction datasets, enhancing their performance on benchmarks like IFEval. The tuning involved training the models for one epoch on a subset of the WebInstructSub dataset, combined with StarCoder2-Self-OSS-Instruct, and performing Direct Preference Optimization (DPO) for another epoch. This process ensured that the models balanced between size and performance.

One of the significant advantages of the SmolLM models is their ability to run efficiently on various hardware configurations, including smartphones and laptops. This makes them suitable for deployment in multiple applications, from personal devices to more substantial computational setups. Hugging Face has also released WebGPU demos for the SmolLM-135M and SmolLM-360M models, showcasing their capabilities and ease of use.

In conclusion, Hugging Face has successfully demonstrated that high-performance models can be achieved with efficient training on high-quality datasets, providing a robust balance between model size and performance. The SmolLM models are set to revolutionize the landscape of small language models, offering powerful and efficient solutions for various applications.

Check out the Models and Details. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter. 

Join our Telegram Channel and LinkedIn Group.

If you like our work, you will love our newsletter..

Don’t Forget to join our 46k+ ML SubReddit

The post Hugging Face Introduces SmolLM: Transforming On-Device AI with High-Performance Small Language Models from 135M to 1.7B Parameters appeared first on MarkTechPost.

Revolutionizing Cellular Analysis: Deep Visual Proteomics Integrates A …

Deep Visual Proteomics: Integrating AI and Mass Spectrometry for Cellular Phenotyping:

Deep Visual Proteomics (DVP) revolutionizes the analysis of cellular phenotypes by combining advanced microscopy, AI, and ultra-sensitive mass spectrometry (MS). Traditional methods often target a limited subset of proteins, but DVP extends this capability by enabling comprehensive proteomic analysis within the native spatial context of cells. This approach involves high-resolution imaging for single-cell phenotyping, AI-driven cell segmentation, and automated laser microdissection to isolate cellular or subcellular regions of interest precisely. These isolated samples are subjected to ultra-high sensitivity mass spectrometry for detailed proteomic profiling.

Developed using the ‘BIAS’ (Biology Image Analysis Software), DVP facilitates seamless integration of imaging and proteomic technologies. It enables the identification of distinct cell types and states based on AI-defined features, enhancing the accuracy and efficiency of cellular phenotyping. Applications of DVP span from studying single-cell heterogeneity to characterizing proteomic differences in disease tissues like melanoma and salivary gland carcinoma. By preserving spatial information alongside molecular insights, DVP offers a powerful tool for advancing research and clinical diagnostics in cell and disease biology.

Image Processing and Single Cell Isolation Workflow in Deep Visual Proteomics:

The image processing and single-cell isolation workflow in DVP integrates cutting-edge microscopy technologies with advanced AI-driven image analysis and automated laser microdissection. Beginning with high-resolution scanning microscopy, the process involves capturing whole-slide images that are processed using the BIAS. BIAS supports various microscopy formats and utilizes deep learning algorithms to segment cellular components like nuclei and cytoplasm precisely. This includes innovative techniques like image style transfer to optimize deep learning model training for specific biological contexts. BIAS facilitates seamless interaction with laser microdissection systems such as ZEISS PALM MicroBeam and Leica LMD6 & 7, ensuring accurate transfer and automated targeted cell extraction. This integrated workflow enables rapid and precise single-cell isolation, which is crucial for in-depth proteomic analysis of cellular and tissue samples in DVP applications.

Characterizing Single Cell Heterogeneity with Deep Visual Proteomics:

DVP enables the characterization of functional differences among phenotypically distinct cells at the subcellular level. Applying this workflow to an unperturbed cancer cell line, researchers used deep learning-based segmentation to isolate and analyze individual cells and nuclei. This approach addressed the challenges of processing minute samples, allowing direct analysis from 384 wells using advanced mass spectrometry. The proteomic profiles of whole cells and isolated nuclei were distinct, with high reproducibility. Machine learning identified six classes of nuclei with significant morphological and proteomic differences. This demonstrated that visible cellular phenotypes correspond to distinct proteome profiles, offering insights into cell cycle regulation and potential cancer prognostic markers.

DVP Uncovers Cancer Tissue Heterogeneity:

DVP offers high-resolution, unbiased proteomic profiling of distinct cell classes within their spatial environments. Applied to archived salivary gland acinic cell carcinoma tissue, DVP revealed significant proteomic differences between normal and cancerous cells. Normal acinar cells showed high expression of secretory proteins, while cancer cells exhibited elevated interferon-response proteins and the proto-oncogene SRC. Extending this to melanoma, DVP differentiated central tumor cells from those at the tumor-stroma border, identifying distinct proteomic signatures linked to disease progression and prognosis. These findings underscore DVP’s potential for precise molecular disease subtyping, guiding clinical decision-making.

Image source

Outlook for DVP:

The DVP pipeline integrates high-resolution microscopy with advanced image recognition, automated laser microdissection, and ultra-sensitive MS-based proteomics. This robust system applies to diverse biological systems that can be microscopically imaged, from cell cultures to pathology samples. DVP allows the rapid scanning of slides to isolate rare cell states and study the extracellular matrix’s proteomic composition. With the potential for super-resolution microscopy, DVP can achieve precise cell state classification. By combining powerful imaging technologies with unbiased proteomics, DVP offers significant applications in basic biology and biomedicine, particularly in oncology, where it enhances digital pathology by providing a comprehensive proteomic context.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter. 

Join our Telegram Channel and LinkedIn Group.

If you like our work, you will love our newsletter..

Don’t Forget to join our 46k+ ML SubReddit

The post Revolutionizing Cellular Analysis: Deep Visual Proteomics Integrates AI and Mass Spectrometry for Advanced Phenotyping appeared first on MarkTechPost.

COCOM: An Effective Context Compression Method that Revolutionizes Con …

One of the central challenges in Retrieval-Augmented Generation (RAG) models is efficiently managing long contextual inputs. While RAG models enhance large language models (LLMs) by incorporating external information, this extension significantly increases input length, leading to longer decoding times. This issue is critical as it directly impacts user experience by prolonging response times, particularly in real-time applications such as complex question-answering systems and large-scale information retrieval tasks. Addressing this challenge is crucial for advancing AI research, as it makes LLMs more practical and efficient for real-world applications.

Current methods to address this challenge primarily involve context compression techniques, which can be divided into lexical-based and embedding-based approaches. Lexical-based methods filter out unimportant tokens or terms to reduce input size but often miss nuanced contextual information. Embedding-based methods transform the context into fewer embedding tokens, yet they suffer from limitations such as large model sizes, low effectiveness due to untuned decoder components, fixed compression rates, and inefficiencies in handling multiple context documents. These limitations restrict their performance and applicability, particularly in real-time processing scenarios.

A team of researchers from the University of Amsterdam, The University of Queensland, and  Naver Labs Europe introduce COCOM (COntext COmpression Model), a novel and effective context compression method that overcomes the limitations of existing techniques. COCOM compresses long contexts into a small number of context embeddings, significantly speeding up the generation time while maintaining high performance. This method offers various compression rates, enabling a balance between decoding time and answer quality. The innovation lies in its ability to efficiently handle multiple contexts, unlike previous methods that struggled with multi-document contexts. By using a single model for both context compression and answer generation, COCOM demonstrates substantial improvements in speed and performance, providing a more efficient and accurate solution compared to existing methods.

COCOM involves compressing contexts into a set of context embeddings, significantly reducing the input size for the LLM. The approach includes pre-training tasks such as auto-encoding and language modeling from context embeddings. The method uses the same model for both compression and answer generation, ensuring effective utilization of the compressed context embeddings by the LLM. The dataset used for training includes various QA datasets like Natural Questions, MS MARCO, HotpotQA, WikiQA, and others. Evaluation metrics focus on Exact Match (EM) and Match (M) scores to assess the effectiveness of the generated answers. Key technical aspects include parameter-efficient LoRA tuning and the use of SPLADE-v3 for retrieval.

COCOM achieves significant improvements in decoding efficiency and performance metrics. It demonstrates a speed-up of up to 5.69 times in decoding time while maintaining high performance compared to existing context compression methods. For example, COCOM achieved an Exact Match (EM) score of 0.554 on the Natural Questions dataset with a compression rate of 4, and 0.859 on TriviaQA, significantly outperforming other methods like AutoCompressor, ICAE, and xRAG. These improvements highlight COCOM’s superior ability to handle longer contexts more effectively while maintaining high answer quality, showcasing the method’s efficiency and robustness across various datasets.

In conclusion, COCOM represents a significant advancement in context compression for RAG models by reducing decoding time and maintaining high performance. Its ability to handle multiple contexts and offer adaptable compression rates makes it a critical development for enhancing the scalability and efficiency of RAG systems. This innovation has the potential to greatly improve the practical application of LLMs in real-world scenarios, overcoming critical challenges and paving the way for more efficient and responsive AI applications.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter. 

Join our Telegram Channel and LinkedIn Group.

If you like our work, you will love our newsletter..

Don’t Forget to join our 46k+ ML SubReddit

The post COCOM: An Effective Context Compression Method that Revolutionizes Context Embeddings for Efficient Answer Generation in RAG appeared first on MarkTechPost.