This AI Paper Proposes a Novel Bayesian Deep Learning Model with Kernel Dropout Designed to Enhance the Reliability of Predictions in Medical Text Classification Tasks

Integrating artificial intelligence (AI) in healthcare transforms medical practices by improving diagnostics and treatment planning accuracy and efficiency. By leveraging advanced algorithms, AI supports a range of applications, from anomaly detection in medical imaging to predicting disease progression, enhancing the overall efficacy of medical interventions.

One of the primary hurdles in deploying AI within the medical sector is ensuring the accuracy and reliability of AI-driven predictions, particularly when data is scarce. Small datasets are common in healthcare due to privacy concerns and the specialized nature of medical data, which often restricts the information available for training AI systems. This scarcity challenges the AI’s ability to learn effectively and deliver reliable results, which is critical when these outcomes directly affect patient care.

Existing research in medical AI includes transformative models like TranSQ, enhancing medical report generation through semantic query features. Advanced NLP techniques improve Electronic Health Records management, facilitating the extraction of valuable information. Clinical applications of AI, such as GPT-3, innovate in diagnosis and clinical judgments. BioBERT and BlueBERT, pre-trained on biomedical texts, significantly advance disease classification accuracy. Moreover, efforts like Deep Gaussian Processes address AI’s black-box nature, providing greater interpretability and fostering user trust in medical applications.

Researchers from esteemed institutions, including the University of Southampton, University of New South Wales, Technology Innovation Institute, UAE, and Thomson Reuters Labs, UK, have collaborated to introduce a Bayesian Monte Carlo Dropout model, enhancing the reliability of AI predictions in healthcare. Unlike conventional methods, this approach utilizes Bayesian inference and Monte Carlo techniques to effectively manage uncertainty and data scarcity. Integrating kernel functions tailors the model’s sensitivity to the unique dynamics of medical datasets, offering a significant advancement in predictive accuracy and model transparency.

The methodology integrates Bayesian inference with Monte Carlo Dropout techniques, leveraging kernel functions to handle sparse data effectively. This model was rigorously tested using the SOAP, Medical Transcription, and ROND Clinical text classification datasets, chosen for their diverse medical contexts and data challenges. The Bayesian Monte Carlo Dropout approach systematically evaluates the uncertainty of predictions by incorporating prior knowledge through Bayesian priors and assessing variability through dropout configurations. This process enhances the model’s reliability and applicability in medical diagnostics by providing a quantifiable measure of confidence in its outputs, which is crucial for high-stakes healthcare decisions. 
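The core idea can be illustrated with a short sketch (not the paper's implementation): keep dropout layers active at inference time and aggregate many stochastic forward passes to obtain both a mean prediction and an uncertainty estimate. The classifier architecture, embedding dimension, and sample count below are illustrative placeholders rather than the authors' kernel-based design.

import torch
import torch.nn as nn

class TextClassifier(nn.Module):
    # Illustrative classifier head; the paper's kernel-based architecture is not reproduced here.
    def __init__(self, input_dim=768, hidden_dim=256, num_classes=5, p_drop=0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(p_drop),  # kept active at inference for Monte Carlo Dropout
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, x):
        return self.net(x)

def mc_dropout_predict(model, x, n_samples=50):
    # Run n_samples stochastic forward passes with dropout enabled and return
    # the mean class probabilities and their standard deviation (uncertainty).
    model.train()  # keeps dropout layers stochastic
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(n_samples)])
    return probs.mean(dim=0), probs.std(dim=0)

# Usage: embeddings could come from any sentence encoder (assumed 768-dimensional here).
model = TextClassifier()
x = torch.randn(4, 768)  # 4 dummy documents
mean_probs, uncertainty = mc_dropout_predict(model, x)

A high standard deviation flags predictions the model is unsure about, which is the kind of quantifiable confidence measure the paper argues is essential for high-stakes clinical decisions.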

The Bayesian Monte Carlo Dropout model demonstrated significant improvements in prediction reliability. On the SOAP dataset, it achieved a Brier score of 0.056, indicating high prediction accuracy. Similarly, in the ROND dataset, the model outperformed traditional methods with an F1 score of 0.916 and maintained a low Brier score of 0.056, confirming its effectiveness across different settings. The Medical Transcription dataset results showed a consistent enhancement in predictive accuracy with a notable increase in model confidence, evidenced by a substantial reduction in prediction error rates compared to baseline models.

To conclude, the research introduces a novel Bayesian Monte Carlo Dropout model that significantly enhances the reliability and transparency of AI predictions in medical applications. The model demonstrates robust performance across varied medical datasets by effectively integrating Bayesian inference with Monte Carlo techniques and kernel functions. The proven capability to quantify prediction uncertainties not only offers a tangible improvement in AI-driven medical diagnostics but also holds the potential to directly impact patient care, paving the way for broader acceptance and trust in AI technologies within the healthcare sector.


Google AI Proposes MathWriting: Transforming Handwritten Mathematical Expression Recognition with Extensive Human-Written and Synthetic Dataset Integration and Enhanced Model Training

Online text recognition models have advanced significantly in recent years due to enhanced model structures and larger datasets. However, mathematical expression (ME) recognition, a more intricate task, has yet to receive comparable attention. Unlike text, MEs have a rigid two-dimensional structure where the spatial arrangement of symbols is crucial. Handwritten MEs (HMEs) pose even greater challenges due to ambiguity and the need for specialized hardware. Obtaining handwritten samples is costly as they require human input, further compounded by the necessity for dedicated devices like touchscreens or digital pens. Therefore, improving ME recognition demands tailored approaches distinct from text recognition.

Google Research has unveiled MathWriting, a dataset for online HME recognition. Comprising 230k human-written and 400k synthetic samples, it surpasses offline HME datasets like IM2LATEX-100K. MathWriting facilitates both online and offline HME recognition, aiding research by providing ample data. Compatible with other online datasets like CROHME and Detexify, MathWriting is shared in InkML format, and rasterizing the inks effectively expands offline HME datasets. This initiative introduces a new benchmark for ME recognition, featuring normalized ground truth expressions for simplified training and robust evaluation, alongside code examples on GitHub for seamless usage.

When comparing MathWriting to CROHME23, MathWriting stands out with nearly 3.9 times more samples and 4.5 times more distinct labels after normalization. Although there’s a considerable overlap of labels between the two datasets (47k), most are dataset-specific. Notably, MathWriting boasts a larger number of human-written inks compared to CROHME23. Moreover, MathWriting offers a broader array of tokens, encompassing 254 distinct ones, including Latin capitals, the majority of the Greek alphabet, and matrices—enabling representation of diverse scientific fields like quantum mechanics, differential calculus, and linear algebra.

The MathWriting dataset comprises 253k human-written expressions and 6k isolated symbols for training, validation, and testing, alongside 396k synthetic expressions. It covers 244 mathematical symbols and ten syntactic tokens. The dataset, released under Creative Commons, employs normalized LATEX notation as ground truth. The handwritten mathematical expression recognition benchmark is based on MathWriting’s test split, utilizing the character error rate (CER) metric. Various recognition models, including CTC Transformer and OCR, demonstrate the dataset’s utility. Data collection involved human contributors copying rendered expressions via an Android app, followed by minimal postprocessing and label normalization to enhance model performance.
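As a rough illustration of how the benchmark's character error rate is typically computed (edit distance between the predicted and ground-truth LaTeX strings, normalized by the reference length), here is a small sketch; it is not the official MathWriting evaluation code.

def edit_distance(a: str, b: str) -> int:
    # Classic Levenshtein distance via dynamic programming over a single row.
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,           # deletion
                                     dp[j - 1] + 1,       # insertion
                                     prev + (ca != cb))   # substitution
    return dp[-1]

def character_error_rate(reference: str, hypothesis: str) -> float:
    # CER = edit distance between predicted and ground-truth expression,
    # normalized by the length of the ground truth.
    return edit_distance(reference, hypothesis) / max(len(reference), 1)

print(character_error_rate(r"\frac{a}{b}", r"\frac{a}{d}"))  # one substitution -> ~0.09

Because the ground truth is normalized LATEX, two visually identical expressions written with different markup do not unfairly inflate the error rate.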

The MathWriting dataset offers a detailed insight into handwritten mathematical expressions compared with the CROHME23 dataset. With extensive label and ink statistics, MathWriting provides valuable information on the diversity of expressions and writing styles. It emphasizes the significance of synthetic data in enhancing model diversity and highlights challenges such as device variations and noise sources like stray strokes and incorrect ground truth. Despite inherent recognition challenges, MathWriting is a comprehensive resource for training and evaluating handwriting recognition models, offering insights into real-world recognition scenarios.

In conclusion, MathWriting's broad applications support recognition training across scientific domains and enable synthetic expression generation. Integration with datasets like CROHME23 promises enhanced model performance and diversity. Bounding box data facilitates synthetic ink generation, potentially refining LATEX's rigid structure for more natural synthesis. Additionally, it offers avenues for character segmentation in UI features. Further improvements include exploring varied label normalization and leveraging contextual information for enhanced recognition. Future research could focus on optimizing train/validation/test splits and developing language models tailored to mathematical expressions. Overall, MathWriting empowers recognition research, offering extensive data and avenues for advancement.


Accelerate ML workflows with Amazon SageMaker Studio Local Mode and Docker support

We are excited to announce two new capabilities in Amazon SageMaker Studio that will accelerate iterative development for machine learning (ML) practitioners: Local Mode and Docker support. ML model development often involves slow iteration cycles as developers switch between coding, training, and deployment. Each step requires waiting for remote compute resources to start up, which delays validating implementations and getting feedback on changes.
With Local Mode, developers can now train and test models, debug code, and validate end-to-end pipelines directly on their SageMaker Studio notebook instance without the need for spinning up remote compute resources. This reduces the iteration cycle from minutes down to seconds, boosting developer productivity. Docker support in SageMaker Studio notebooks enables developers to effortlessly build Docker containers and access pre-built containers, providing a consistent development environment across the team and avoiding time-consuming setup and dependency management.
Local Mode and Docker support offer a streamlined workflow for validating code changes and prototyping models using local containers running on a SageMaker Studio notebook instance. In this post, we guide you through setting up Local Mode in SageMaker Studio, running a sample training job, and deploying the model on an Amazon SageMaker endpoint from a SageMaker Studio notebook.
SageMaker Studio Local Mode
SageMaker Studio introduces Local Mode, enabling you to run SageMaker training, inference, batch transform, and processing jobs directly on your JupyterLab, Code Editor, or SageMaker Studio Classic notebook instances without requiring remote compute resources. Benefits of using Local Mode include:

Instant validation and testing of workflows right within integrated development environments (IDEs)
Faster iteration through local runs for smaller-scale jobs to inspect outputs and identify issues early
Improved development and debugging efficiency by eliminating the wait for remote training jobs
Immediate feedback on code changes before running full jobs in the cloud

The following figure illustrates the workflow using Local Mode on SageMaker.

To use Local Mode, set instance_type='local' when running SageMaker Python SDK jobs such as training and inference. This will run them on the instances used by your SageMaker Studio IDEs instead of provisioning cloud resources.
Although certain capabilities such as distributed training are only available in the cloud, Local Mode removes the need to switch contexts for quick iterations. When you’re ready to take advantage of the full power and scale of SageMaker, you can seamlessly run your workflow in the cloud.
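For a concrete sense of what this looks like, here is a minimal sketch using the SageMaker Python SDK; the training script name, role ARN, data path, and framework version are placeholders.

from sagemaker.local import LocalSession
from sagemaker.sklearn.estimator import SKLearn

session = LocalSession()  # routes SDK calls to local Docker containers
session.config = {"local": {"local_code": True}}

estimator = SKLearn(
    entry_point="train.py",  # placeholder training script
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role ARN
    instance_type="local",   # run in a container on the Studio instance
    instance_count=1,
    framework_version="1.2-1",
    sagemaker_session=session,
)
estimator.fit({"train": "file://./data"})  # local folder instead of S3

Switching the same workflow to the cloud is a one-line change: replace instance_type="local" with a cloud instance type and point the channels at Amazon S3.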
Docker support in SageMaker Studio
SageMaker Studio now also enables building and running Docker containers locally on your SageMaker Studio notebook instance. This new feature allows you to build and validate Docker images in SageMaker Studio before using them for SageMaker training and inference.
The following diagram illustrates the high-level Docker orchestration architecture within SageMaker Studio.

With Docker support in SageMaker Studio, you can:

Build Docker containers with integrated models and dependencies directly within SageMaker Studio
Eliminate the need for external Docker build processes to simplify image creation
Run containers locally to validate functionality before deploying models to production
Reuse local containers when deploying to SageMaker for training and hosting

Although some advanced Docker capabilities like multi-container and custom networks are not supported as of this writing, the core build and run functionality is available to accelerate developing containers for bring your own container (BYOC) workflows.
Prerequisites
To use Local Mode in SageMaker Studio applications, you must complete the following prerequisites:

For pulling images from Amazon Elastic Container Registry (Amazon ECR), the account hosting the ECR image must provide access permission to the user’s Identity and Access Management (IAM) role. The domain’s role must also allow Amazon ECR access.
To enable Local Mode and Docker capabilities, you must set the EnableDockerAccess parameter to ENABLED for the domain’s DockerSettings using the AWS Command Line Interface (AWS CLI). This allows users in the domain to use Local Mode and Docker features. By default, Local Mode and Docker are disabled in SageMaker Studio. Any existing SageMaker Studio apps will need to be restarted for the Docker service update to take effect. The following is an example AWS CLI command for updating a SageMaker Studio domain:

aws sagemaker update-domain --domain-id <DOMAIN-ID> \
    --domain-settings-for-update '{"DockerSettings": {"EnableDockerAccess": "ENABLED"}}' \
    --region <REGION>

You need to update the SageMaker IAM role in order to be able to push Docker images to Amazon ECR:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ecr:CompleteLayerUpload",
                "ecr:UploadLayerPart",
                "ecr:InitiateLayerUpload",
                "ecr:BatchCheckLayerAvailability",
                "ecr:PutImage"
            ],
            "Resource": "arn:aws:ecr:us-east-2:123456789012:repository/<repositoryname>"
        },
        {
            "Effect": "Allow",
            "Action": "ecr:GetAuthorizationToken",
            "Resource": "*"
        }
    ]
}

Run Python files in SageMaker Studio spaces using Local Mode
SageMaker Studio JupyterLab and Code Editor (based on Code-OSS, Visual Studio Code – Open Source) extend SageMaker Studio so you can write, test, debug, and run your analytics and ML code using popular lightweight IDEs. For more details on how to get started with SageMaker Studio IDEs, refer to Boost productivity on Amazon SageMaker Studio: Introducing JupyterLab Spaces and generative AI tools and New – Code Editor, based on Code-OSS VS Code Open Source now available in Amazon SageMaker Studio. Complete the following steps:

Create a new Code Editor or JupyterLab space called my-sm-code-editor-space or my-sm-jupyterlab-space, respectively.
Choose Create space. 
Choose the ml.m5.large instance and set storage to 32 GB.
Choose Run space. 
Open the JupyterLab or Code Editor space and clone the GitHub repo, using /home/sagemaker-user/ as the target folder.

Create a new terminal. 
Install the Docker CLI and Docker Compose plugin by following the instructions in the linked GitHub repo. If chained commands fail, run them one at a time.

You must update the SageMaker SDK to the latest version.

Run pip install sagemaker -Uq in the terminal.

For Code Editor only, you need to set the Python environment to run in the current terminal.

In Code Editor, on the File menu, choose Preferences and Settings.

Search for and select Terminal: Execute in File Dir.

In Code Editor or JupyterLab, open the scikit_learn_script_mode_local_training_and_serving folder and run the scikit_learn_script_mode_local_training_and_serving.py file.

You can run the script by choosing Run in Code Editor or using the CLI in a JupyterLab terminal. You will see the model being trained locally; the script then deploys the model to a local SageMaker endpoint and calculates the root mean square error (RMSE), as in the sketch that follows.
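The overall flow of that example looks roughly like the following sketch. This is simplified: the role ARN, data paths, test data, and framework version are placeholders, and the entry point script is assumed to implement training and model loading as it does in the repository.

import numpy as np
from sagemaker.sklearn.estimator import SKLearn

estimator = SKLearn(
    entry_point="scikit_learn_script_mode_local_training_and_serving.py",  # from the cloned repo
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role ARN
    instance_type="local",
    framework_version="1.2-1",
)
estimator.fit({"train": "file://./data/train"})  # trains in a local container

predictor = estimator.deploy(initial_instance_count=1, instance_type="local")
try:
    X_test = np.random.rand(10, 13)   # placeholder test features
    y_test = np.random.rand(10)       # placeholder targets
    predictions = np.asarray(predictor.predict(X_test))
    rmse = np.sqrt(np.mean((predictions - y_test) ** 2))
    print(f"RMSE: {rmse:.4f}")
finally:
    predictor.delete_endpoint()       # stops the local serving container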
Simulate training and inference in SageMaker Studio Classic using Local Mode
You can also use a notebook in SageMaker Studio Classic to run a small-scale training job on CIFAR10 using Local Mode, deploy the model locally, and perform inference.
Set up your notebook
To set up the notebook, complete the following steps:

Open SageMaker Studio Classic and clone the following GitHub repo.

Open the pytorch_local_mode_cifar10.ipynb notebook in blog/pytorch_cnn_cifar10.

For Image, choose PyTorch 2.1.0 Python 3.10 CPU Optimized.

Confirm that your notebook shows the correct instance and kernel selection.

Open a terminal by choosing Launch Terminal in the current SageMaker image.

Install the Docker CLI and Docker Compose plugin by following the instructions in the linked GitHub repo.

Because you're using Docker from SageMaker Studio Classic, remove sudo when running commands; the terminal already runs as the superuser. For SageMaker Studio Classic, the installation commands depend on the SageMaker Studio app image OS. For example, DLC-based framework images are Ubuntu based, so the preceding instructions work as-is, whereas Debian-based images like the DataScience images require the instructions in the linked GitHub repo. If chained commands fail, run them one at a time. You should see the Docker version displayed.

Leave the terminal window open, go back to the notebook, and start running it cell by cell.

Make sure to run the cell with pip install -U sagemaker so you’re using the latest version of the SageMaker Python SDK.
Local training
When you start running the local SageMaker training job, you will see the following log lines:

INFO:sagemaker.local.image:'Docker Compose' found using Docker CLI.
INFO:sagemaker.local.local_session:Starting training job

This indicates that the training was running locally using Docker.

Be patient while the pytorch-training:2.1-cpu-py310 Docker image is pulled. Due to its large size (5.2 GB), it could take a few minutes.
Docker images will be stored in the SageMaker Studio app instance’s root volume, which is not accessible to end-users. The only way to access and interact with Docker images is via the exposed Docker API operations.
From a user confidentiality standpoint, the SageMaker Studio platform never accesses or stores user-specific images.
When the training is complete, you’ll be able to see the following success log lines:

8zlz1zbfta-sagemaker-local exited with code 0
Aborting on container exit…
Container 8zlz1zbfta-sagemaker-local  Stopping
Container 8zlz1zbfta-sagemaker-local  Stopped
INFO:sagemaker.local.image:===== Job Complete =====

Local inference
Complete the following steps:

Deploy the SageMaker endpoint using SageMaker Local Mode.

Be patient while the pytorch-inference:2.1-cpu-py310 Docker image is pulled. Due to its large size (4.32 GB), it could take a few minutes.

Invoke the SageMaker endpoint deployed locally using the test images.

You will be able to see the predicted classes: frog, ship, car, and plane:

Predicted:  frog ship  car plane

Because the SageMaker Local endpoint is still up, navigate back to the open terminal window and list the running containers:

docker ps
You’ll be able to see the running pytorch-inference:2.1-cpu-py310 container backing the SageMaker endpoint.

Because you can only run one local endpoint at a time, run the cleanup code to shut down the SageMaker local endpoint and stop the running container.

To make sure the Docker container is down, you can navigate to the opened terminal window, run docker ps, and make sure there are no running containers.
If you see a container running, run docker stop <CONTAINER_ID> to stop it.

Tips for using SageMaker Local Mode
If you’re using SageMaker for the first time, refer to Train machine learning models. To learn more about deploying models for inference with SageMaker, refer to Deploy models for inference.
Keep in mind the following recommendations:

Print input and output files and folders to understand dataset and model loading
Use 1–2 epochs and small datasets for quick testing
Pre-install dependencies in a Dockerfile to optimize environment setup
Isolate serialization code in endpoints for debugging

Configure Docker installation as a Lifecycle Configuration
You can define the Docker install process as a Lifecycle Configuration (LCC) script to simplify setup each time a new SageMaker Studio space starts. LCCs are scripts that SageMaker runs during events like space creation. Refer to the JupyterLab, Code Editor, or SageMaker Studio Classic LCC setup (using docker install cli as reference) to learn more.
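As a hedged sketch, an LCC can also be registered programmatically with boto3. The script body below is only a placeholder (in practice, paste the Docker install commands referenced earlier that match your Studio image's OS), and the app type value assumes the current Studio application types.

import base64
import boto3

sm = boto3.client("sagemaker")

# Placeholder install script; replace with the Docker install CLI commands
# referenced above for your Studio image's OS.
install_script = """#!/bin/bash
set -eux
curl -fsSL https://get.docker.com -o /tmp/get-docker.sh
bash /tmp/get-docker.sh
"""

response = sm.create_studio_lifecycle_config(
    StudioLifecycleConfigName="install-docker-cli",
    StudioLifecycleConfigContent=base64.b64encode(install_script.encode()).decode(),
    StudioLifecycleConfigAppType="JupyterLab",  # or "CodeEditor" / "JupyterServer"
)
print(response["StudioLifecycleConfigArn"])

Attach the resulting LCC to your domain or space settings so the install runs automatically each time the app starts.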

Build and test custom Docker images in SageMaker Studio spaces
In this step, you install Docker inside the JupyterLab (or Code Editor) app space and use Docker to build, test, and publish custom Docker images with SageMaker Studio spaces. Spaces are used to manage the storage and resource needs of some SageMaker Studio applications. Each space has a 1:1 relationship with an instance of an application. Every supported application that is created gets its own space. To learn more about SageMaker spaces, refer to Boost productivity on Amazon SageMaker Studio: Introducing JupyterLab Spaces and generative AI tools. Make sure you provision a new space with at least 30 GB of storage to allow sufficient storage for Docker images and artifacts.
Install Docker inside a space
To install the Docker CLI and Docker Compose plugin inside a JupyterLab space, run the commands in the following GitHub repo. SageMaker Studio only supports Docker version 20.10.X.
Build Docker images
To confirm that Docker is installed and working inside your JupyterLab space, run the following code:

# to verify docker service
sagemaker-user@default:~$ docker version
Client: Docker Engine - Community
 Version:           24.0.7
 API version:       1.41 (downgraded from 1.43)
 Go version:        go1.20.10
 Git commit:        afdd53b
 Built:             Thu Oct 26 09:07:41 2023
 OS/Arch:           linux/amd64
 Context:           default

Server:
 Engine:
  Version:          20.10.25
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.20.10
  Git commit:       5df983c
  Built:            Fri Oct 13 22:46:59 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.7.2
  GitCommit:        0cae528dd6cb557f7201036e9f43420650207b58
 runc:
  Version:          1.1.7
  GitCommit:        f19387a6bec4944c770f7668ab51c4348d9c2f38
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

To build a custom Docker image inside a JupyterLab (or Code Editor) space, complete the following steps:

Create an empty Dockerfile:

touch Dockerfile

Edit the Dockerfile with the following commands, which create a simple flask web server image from the base python:3.10.13-bullseye image hosted on Docker Hub:

# Use the specified Python base image
FROM python:3.10.13-bullseye

# Create a code dir
RUN mkdir /code/

# Set the working directory in the container
WORKDIR /code

# Upgrade pip and install required packages
RUN python3 -m pip install --upgrade pip && \
    python3 -m pip install flask

# Copy the app.py file to the container
COPY app.py /code/

# Set the command to run the app
ENTRYPOINT ["python", "app.py"]

The following code shows the contents of an example flask application file app.py:

from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/')
def hello():
    return jsonify({"response": "Hello"})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=6006)

Additionally, you can update the reference Dockerfile commands to include packages and artifacts of your choice.

Build a Docker image using the reference Dockerfile:

docker build --network sagemaker --tag myflaskapp:v1 --file ./Dockerfile .
Include --network sagemaker in your docker build command, otherwise the build will fail. Containers can't be run in the Docker default bridge or in custom Docker networks; they run in the same network as the SageMaker Studio application container. Users can only use sagemaker for the network name.

When your build is complete, validate that the image exists. Re-tag the image for Amazon ECR and push it. If you run into permission issues, run the aws ecr get-login-password… command and retry the Docker push/pull:

sagemaker-user@default:~$ docker image list
REPOSITORY      TAG       IMAGE ID       CREATED          SIZE
myflaskapp      v1        d623f1538f20   27 minutes ago   489MB

sagemaker-user@default:~$ docker tag myflaskapp:v1 123456789012.dkr.ecr.us-east-2.amazonaws.com/myflaskapp:v1

sagemaker-user@default:~$ docker image list
REPOSITORY                                                  TAG       IMAGE ID       CREATED          SIZE
123456789012.dkr.ecr.us-east-2.amazonaws.com/myflaskapp     latest    d623f1538f20   27 minutes ago   489MB
myflaskapp                                                  v1        d623f1538f20   27 minutes ago   489MB

sagemaker-user@default:~$ aws ecr get-login-password --region region | docker login --username AWS --password-stdin aws_account_id.dkr.ecr.region.amazonaws.com

sagemaker-user@default:~$ docker push 123456789012.dkr.ecr.us-east-2.amazonaws.com/myflaskapp:latest

Test Docker images
Having Docker installed inside a JupyterLab (or Code Editor) SageMaker Studio space allows you to test pre-built or custom Docker images as containers (or containerized applications). In this section, we use the docker run command to provision Docker containers inside a SageMaker Studio space to test containerized workloads like REST web services and Python scripts. Complete the following steps:

Check if the image you’re testing exists on the space’s Amazon Elastic Block Store (Amazon EBS) volume:

sagemaker-user@default:~$ docker image list
REPOSITORY                                                  TAG       IMAGE ID       CREATED       SIZE

If the test image doesn’t exist, run docker pull to pull the image into your local machine:

sagemaker-user@default:~$ docker pull 123456789012.dkr.ecr.us-east-2.amazonaws.com/myflaskapp:v1

If you encounter authentication issues, run the following commands:

aws ecr get-login-password --region region | docker login --username AWS --password-stdin aws_account_id.dkr.ecr.region.amazonaws.com

Create a container to test your workload:

docker run --network sagemaker 123456789012.dkr.ecr.us-east-2.amazonaws.com/myflaskapp:v1
This spins up a new container instance and runs the application defined using Docker’s ENTRYPOINT:

sagemaker-user@default:~$ docker run --network sagemaker 905418447590.dkr.ecr.us-east-2.amazonaws.com/myflaskapp:v1
 * Serving Flask app 'app'
 * Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:6006
 * Running on http://169.255.255.2:6006

To test if your web endpoint is active, navigate to the URL https://<sagemaker-space-id>.studio.us-east-2.sagemaker.aws/jupyterlab/default/proxy/6006/.

You should see a JSON response such as {"response": "Hello"}.
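You can also check the endpoint from a terminal or notebook in the same space, assuming the container IP printed in the Flask log is reachable over the shared sagemaker network; the address below is taken from the sample log output and should be replaced with the one in your own log.

import requests

# Example address taken from the Flask startup log above; substitute the
# address printed in your own container's log.
response = requests.get("http://169.255.255.2:6006/")
print(response.status_code, response.json())  # expecting {"response": "Hello"}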

Clean up
To avoid incurring unnecessary charges, delete the resources that you created while running the examples in this post:

In your SageMaker Studio domain, choose Studio Classic in the navigation pane, then choose Stop.
In your SageMaker Studio domain, choose JupyterLab or Code Editor in the navigation pane, choose your app, and then choose Stop.

Conclusion
SageMaker Studio Local Mode and Docker support empower developers to build, test, and iterate on ML implementations faster without leaving their workspace. By providing instant access to test environments and outputs, these capabilities optimize workflows and improve productivity. Try out SageMaker Studio Local Mode and Docker support using our quick onboard feature, which allows you to spin up a new domain for single users within minutes. Share your thoughts in the comments section!

About the Authors
Shweta Singh is a Senior Product Manager in the Amazon SageMaker Machine Learning (ML) platform team at AWS, leading the SageMaker Python SDK. She has worked in several product roles in Amazon for over 5 years. She has a Bachelor of Science degree in Computer Engineering and a Master of Science in Financial Engineering, both from New York University.
Eitan Sela is a Generative AI and Machine Learning Specialist Solutions Architect at AWS. He works with AWS customers to provide guidance and technical assistance, helping them build and operate Generative AI and Machine Learning solutions on AWS. In his spare time, Eitan enjoys jogging and reading the latest machine learning articles.
Pranav Murthy is an AI/ML Specialist Solutions Architect at AWS. He focuses on helping customers build, train, deploy, and migrate machine learning (ML) workloads to SageMaker. He previously worked in the semiconductor industry developing large computer vision (CV) and natural language processing (NLP) models to improve semiconductor processes using state-of-the-art ML techniques. In his free time, he enjoys playing chess and traveling. You can find Pranav on LinkedIn.
Mufaddal Rohawala is a Software Engineer at AWS. He works on the SageMaker Python SDK library for Amazon SageMaker. In his spare time, he enjoys travel, outdoor activities and is a soccer fan.

Significant new capabilities make it easier to use Amazon Bedrock to build and scale generative AI applications

We introduced Amazon Bedrock to the world a little over a year ago, delivering an entirely new way to build generative artificial intelligence (AI) applications. With the broadest selection of first- and third-party foundation models (FMs) as well as user-friendly capabilities, Amazon Bedrock is the fastest and easiest way to build and scale secure generative AI applications. Now tens of thousands of customers are using Amazon Bedrock to build and scale impressive applications. They are innovating quickly, easily, and securely to advance their AI strategies. And we’re supporting their efforts by enhancing Amazon Bedrock with exciting new capabilities including even more model choice and features that make it easier to select the right model, customize the model for a specific use case, and safeguard and scale generative AI applications.
Customers across diverse industries, from finance to travel and hospitality to healthcare to consumer technology, are making remarkable progress. They are realizing real business value by quickly moving generative AI applications into production to improve customer experiences and increase operational efficiency. Consider the New York Stock Exchange (NYSE), the world's largest capital market, processing billions of transactions each day. NYSE is leveraging Amazon Bedrock's choice of FMs and cutting-edge generative AI capabilities across several use cases, including the processing of thousands of pages of regulations to provide answers in easy-to-understand language.

Global airline United Airlines modernized their Passenger Service System to translate legacy passenger reservation codes into plain English so that agents can provide swift and efficient customer support. LexisNexis Legal & Professional, a leading global provider of information and analytics, developed a personalized legal generative AI assistant on Lexis+ AI. LexisNexis customers receive trusted results two times faster than the nearest competing product and can save up to five hours per week for legal research and summarization. And HappyFox, an online help desk software, selected Amazon Bedrock for its security and performance, boosting the efficiency of its AI-powered automated ticket system in its customer support solution by 40% and agent productivity by 30%.
And across Amazon, we are continuing to innovate with generative AI to deliver more immersive, engaging experiences for our customers. Just last week Amazon Music announced Maestro. Maestro is an AI playlist generator powered by Amazon Bedrock that gives Amazon Music subscribers an easier, more fun way to create playlists based on prompts. Maestro is now rolling out in beta to a small number of U.S. customers on all tiers of Amazon Music.
With Amazon Bedrock, we’re focused on the key areas that customers need to build production-ready, enterprise-grade generative AI applications at the right cost and speed. Today I’m excited to share new features that we’re announcing across the areas of model choice, tools for building generative AI applications, and privacy and security.
1. Amazon Bedrock expands model choice with Llama 3 models and helps you find the best model for your needs
In these early days, customers are still learning and experimenting with different models to determine which ones to use for various purposes. They want to be able to easily try the latest models, and test which capabilities and features will give them the best results and cost characteristics for their use cases. The majority of Amazon Bedrock customers use more than one model, and Amazon Bedrock provides the broadest selection of first- and third-party large language models (LLMs) and other FMs.  This includes models from AI21 labs, Anthropic, Cohere, Meta, Mistral AI, and Stability AI, as well as our own Amazon Titan models. In fact, Joel Hron, head of AI and Thomson Reuters Labs at Thomson Reuters recently said this about their adoption of Amazon Bedrock, “Having the ability to use a diverse range of models as they come out was a key driver for us, especially given how quickly this space is evolving.” The cutting-edge models of the Mistral AI model family including Mistral 7B, Mixtral 8x7B, and Mistral Large have customers excited about their high performance in text generation, summarization, Q&A, and code generation. Since we introduced the Anthropic Claude 3 model family, thousands of customers have experienced how Claude 3 Haiku, Sonnet, and Opus have established new benchmarks across cognitive tasks with unrivaled intelligence, speed, and cost-efficiency. After the initial evaluation using Claude 3 Haiku and Opus in Amazon Bedrock, BlueOcean.ai, a brand intelligence platform, saw a cost reduction of over 50% when they were able to consolidate four separate API calls into a single, more efficient call.
Masahiro Oba, General Manager, Group Federated Governance of DX Platform at Sony Group corporation shared,

“While there are many challenges with applying generative AI to the business, Amazon Bedrock’s diverse capabilities help us to tailor generative AI applications to Sony’s business. We are able to take advantage of not only the powerful LLM capabilities of Claude 3, but also capabilities that help us safeguard applications at the enterprise-level. I’m really proud to be working with the Bedrock team to further democratize generative AI within the Sony Group.”

I recently sat down with Aaron Linsky, CTO of Artificial Investment Associate Labs at Bridgewater Associates, a premier asset management firm, where they are using generative AI to enhance their "Artificial Investment Associate," a major leap forward for their customers. It builds on their experience of giving rules-based expert advice for investment decision-making. With Amazon Bedrock, they can use the best available FMs, such as Claude 3, for different tasks, combining fundamental market understanding with the flexible reasoning capabilities of AI. Amazon Bedrock allows for seamless model experimentation, enabling Bridgewater to build a powerful, self-improving investment system that marries systematic advice with cutting-edge capabilities, creating an evolving, AI-first process.

To bring even more model choice to customers, today we are making Meta Llama 3 models available in Amazon Bedrock. The Llama 3 8B and Llama 3 70B models are designed for building, experimenting, and responsibly scaling generative AI applications. They offer significant improvements over the previous model generation, including scaled-up pretraining and refined instruction fine-tuning approaches. Llama 3 8B excels in text summarization, classification, sentiment analysis, and translation, making it ideal for limited-resource environments and edge devices. Llama 3 70B shines in content creation, conversational AI, language understanding, R&D, enterprise use, accurate summarization, nuanced classification and sentiment analysis, language modeling, dialogue systems, code generation, and instruction following. Read more about Meta Llama 3 now available in Amazon Bedrock.
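As a minimal sketch of calling one of these models through the Bedrock Runtime API with boto3: the Region is a placeholder, and the request body follows the Meta Llama format used on Amazon Bedrock.

import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")  # placeholder Region

body = {
    "prompt": "Summarize the benefits of model choice in two sentences.",
    "max_gen_len": 256,
    "temperature": 0.5,
    "top_p": 0.9,
}

response = bedrock.invoke_model(
    modelId="meta.llama3-8b-instruct-v1:0",  # Llama 3 8B Instruct on Amazon Bedrock
    body=json.dumps(body),
)
print(json.loads(response["body"].read())["generation"])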
We are also announcing support coming soon for Cohere’s Command R and Command R+ enterprise FMs. These models are highly scalable and optimized for long-context tasks like retrieval-augmented generation (RAG) with citations to mitigate hallucinations, multi-step tool use for automating complex business tasks, and support for 10 languages for global operations. Command R+ is Cohere’s most powerful model optimized for long-context tasks, while Command R is optimized for large-scale production workloads. With the Cohere models coming soon in Amazon Bedrock, businesses can build enterprise-grade generative AI applications that balance strong accuracy and efficiency for day-to-day AI operations beyond proof-of-concept.
Amazon Titan Image Generator now generally available and Amazon Titan Text Embeddings V2 coming soon
In addition to adding the most capable 3P models, Amazon Titan Image Generator is generally available today. With Amazon Titan Image Generator, customers in industries like advertising, e-commerce, media, and entertainment can efficiently generate realistic, studio-quality images in large volumes and at low cost, utilizing natural language prompts. They can edit generated or existing images using text prompts, configure image dimensions, or specify the number of image variations to guide the model. By default, every image produced by Amazon Titan Image Generator contains an invisible watermark, which aligns with AWS’s commitment to promoting responsible and ethical AI by reducing the spread of misinformation. The Watermark Detection feature identifies images created by Image Generator, and is designed to be tamper-resistant, helping increase transparency around AI-generated content. Watermark Detection helps mitigate intellectual property risks and enables content creators, news organizations, risk analysts, fraud-detection teams, and others, to better identify and mitigate dissemination of misleading AI-generated content. Read more about Watermark Detection for Titan Image Generator.

Coming soon, Amazon Titan Text Embeddings V2 efficiently delivers more relevant responses for critical enterprise use cases like search. Efficient embeddings models are crucial to performance when leveraging RAG to enrich responses with additional information. Embeddings V2 is optimized for RAG workflows and provides seamless integration with Knowledge Bases for Amazon Bedrock to deliver more informative and relevant responses efficiently. Embeddings V2 enables a deeper understanding of data relationships for complex tasks like retrieval, classification, semantic similarity search, and enhancing search relevance. Offering flexible embedding sizes of 256, 512, and 1024 dimensions, Embeddings V2 prioritizes cost reduction while retaining 97% of the accuracy for RAG use cases, outperforming other leading models. Additionally, the flexible embedding sizes cater to diverse application needs, from low-latency mobile deployments to high-accuracy asynchronous workflows.
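Once Embeddings V2 is available, a call could look like the following sketch; the model identifier and parameter names are assumptions based on the Titan Embeddings V1 API and the dimension sizes mentioned above, and the Region is a placeholder.

import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")  # placeholder Region

response = bedrock.invoke_model(
    modelId="amazon.titan-embed-text-v2:0",  # assumed identifier for Embeddings V2
    body=json.dumps({
        "inputText": "Knowledge Bases enrich responses with retrieved context.",
        "dimensions": 512,   # one of the flexible sizes: 256, 512, or 1024
        "normalize": True,
    }),
)
embedding = json.loads(response["body"].read())["embedding"]
print(len(embedding))  # expected: 512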
New Model Evaluation simplifies the process of accessing, comparing, and selecting LLMs and FMs
Choosing the appropriate model is a critical first step toward building any generative AI application. LLMs can vary drastically in performance based on the task, domain, data modalities, and other factors. For example, a biomedical model is likely to outperform general healthcare models in specific medical contexts, whereas a coding model may face challenges with natural language processing tasks. Using an excessively powerful model could lead to inefficient resource usage, while an underpowered model might fail to meet minimum performance standards – potentially providing incorrect results. And selecting an unsuitable FM at a project’s onset could undermine stakeholder confidence and trust.
With so many models to choose from, we want to make it easier for customers to pick the right one for their use case.
Amazon Bedrock’s Model Evaluation tool, now generally available, simplifies the selection process by enabling benchmarking and comparison against specific datasets and evaluation metrics, ensuring developers select the model that best aligns with their project goals. This guided experience allows developers to evaluate models across criteria tailored to each use case. Through Model Evaluation, developers select candidate models to assess – public options, imported custom models, or fine-tuned versions. They define relevant test tasks, datasets, and evaluation metrics, such as accuracy, latency, cost projections, and qualitative factors. Read more about Model Evaluation in Amazon Bedrock.

The ability to select from the top-performing FMs in Amazon Bedrock has been extremely beneficial for Elastic Security. James Spiteri, Director of Product Management at Elastic shared,

“With just a few clicks, we can assess a single prompt across multiple models simultaneously. This model evaluation functionality enables us to compare the outputs, metrics, and associated costs across different models, allowing us to make an informed decision on which model would be most suitable for what we are trying to accomplish. This has significantly streamlined our process, saving us a considerable amount of time in deploying our applications to production.”

2. Amazon Bedrock offers capabilities to tailor generative AI to your business needs
While models are incredibly important, it takes more than a model to build an application that is useful for an organization. That’s why Amazon Bedrock has capabilities to help you easily tailor generative AI solutions to specific use cases. Customers can use their own data to privately customize applications through fine-tuning or by using Knowledge Bases for a fully managed RAG experience to deliver more relevant, accurate, and customized responses. Agents for Amazon Bedrock allows developers to define specific tasks, workflows, or decision-making processes, enhancing control and automation while ensuring consistent alignment with an intended use case. Starting today, you can now use Agents with Anthropic Claude 3 Haiku and Sonnet models. We are also introducing an updated AWS console experience, supporting a simplified schema and return of control to make it easy for developers to get started. Read more about Agents for Amazon Bedrock, now faster and easier to use.

With new Custom Model Import, customers can leverage the full capabilities of Amazon Bedrock with their own models
All these features are essential to building generative AI applications, which is why we wanted to make them available to even more customers including those who have already invested significant resources in fine-tuning LLMs with their own data on different services or in training custom models from scratch. Many customers have customized models available on Amazon SageMaker, which provides the broadest array of over 250 pre-trained FMs. These FMs include cutting-edge models such as Mistral, Llama2, CodeLlama, Jurassic-2, Jamba, pplx-7B, 70B, and the impressive Falcon 180B. Amazon SageMaker helps with getting data organized and fine-tuned, building scalable and efficient training infrastructure, and then deploying models at scale in a low latency, cost-efficient manner. It has been a game changer for developers in preparing their data for AI, managing experiments, training models faster (e.g. Perplexity AI trains models 40% faster in Amazon SageMaker), lowering inference latency (e.g. Workday has reduced inference latency by 80% with Amazon SageMaker), and improving developer productivity (e.g. NatWest reduced its time-to-value for AI from 12-18 months to under seven months using Amazon SageMaker). However, operationalizing these customized models securely and integrating them into applications for specific business use cases still has challenges.
That is why today we’re introducing Amazon Bedrock Custom Model Import, which enables organizations to leverage their existing AI investments along with Amazon Bedrock’s capabilities. With Custom Model Import, customers can now import and access their own custom models built on popular open model architectures including Flan-T5, Llama, and Mistral, as a fully managed application programming interface (API) in Amazon Bedrock. Customers can take models that they customized on Amazon SageMaker, or other tools, and easily add them to Amazon Bedrock. After an automated validation, they can seamlessly access their custom model, as with any other model in Amazon Bedrock. They get all the same benefits, including seamless scalability and powerful capabilities to safeguard their applications, adherence to responsible AI principles – as well as the ability to expand a model’s knowledge base with RAG, easily create agents to complete multi-step tasks, and carry out fine tuning to keep teaching and refining models. All without needing to manage the underlying infrastructure.
With this new capability, we’re making it easy for organizations to choose a combination of Amazon Bedrock models and their own custom models while maintaining the same streamlined development experience. Today, Amazon Bedrock Custom Model Import is available in preview and supports three of the most popular open model architectures and with plans for more in the future. Read more about Custom Model Import for Amazon Bedrock.

ASAPP is a generative AI company with a 10-year history of building ML models.

“Our conversational generative AI voice and chat agent leverages these models to redefine the customer service experience. To give our customers end to end automation, we need LLM agents, knowledge base, and model selection flexibility. With Custom Model Import, we will be able to use our existing custom models in Amazon Bedrock. Bedrock will allow us to onboard our customers faster, increase our pace of innovation, and accelerate time to market for new product capabilities.”
– Priya Vijayarajendran, President, Technology.

3. Amazon Bedrock provides a secure and responsible foundation to implement safeguards easily
As generative AI capabilities progress and expand, building trust and addressing ethical concerns becomes even more important. Amazon Bedrock addresses these concerns by leveraging AWS’s secure and trustworthy infrastructure with industry-leading security measures, robust data encryption, and strict access controls.
Guardrails for Amazon Bedrock, now generally available, helps customers prevent harmful content and manage sensitive information within an application.
We also offer Guardrails for Amazon Bedrock, which is now generally available. Guardrails offers industry-leading safety protection, giving customers the ability to define content policies, set application behavior boundaries, and implement safeguards against potential risks. Guardrails for Amazon Bedrock is the only solution offered by a major cloud provider that enables customers to build and customize safety and privacy protections for their generative AI applications in a single solution. It helps customers block as much as 85% more harmful content than protection natively provided by FMs on Amazon Bedrock. Guardrails provides comprehensive support for harmful content filtering and robust personally identifiable information (PII) detection capabilities. Guardrails works with all LLMs in Amazon Bedrock as well as fine-tuned models, driving consistency in how models respond to undesirable and harmful content. You can configure thresholds to filter content across six categories – hate, insults, sexual, violence, misconduct (including criminal activity), and prompt attack (jailbreak and prompt injection). You can also define a set of topics or words that need to be blocked in your generative AI application, including harmful words, profanity, competitor names, and products. For example, a banking application can configure a guardrail to detect and block topics related to investment advice. A contact center application summarizing call center transcripts can use PII redaction to remove PII from call summaries, or a conversational chatbot can use content filters to block harmful content. Read more about Guardrails for Amazon Bedrock.
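For illustration, the banking example above could be expressed as a guardrail created through the Bedrock control-plane API; the field names below reflect the create_guardrail operation as we understand it, and all names, messages, and the Region are placeholders rather than a definitive configuration.

import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")  # placeholder Region

response = bedrock.create_guardrail(
    name="banking-assistant-guardrail",
    description="Blocks investment advice topics and filters harmful content.",
    topicPolicyConfig={
        "topicsConfig": [{
            "name": "Investment advice",
            "definition": "Recommendations about specific securities, funds, or investment strategies.",
            "examples": ["Which stocks should I buy this week?"],
            "type": "DENY",
        }]
    },
    contentPolicyConfig={
        "filtersConfig": [
            {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "PROMPT_ATTACK", "inputStrength": "HIGH", "outputStrength": "NONE"},
        ]
    },
    blockedInputMessaging="Sorry, I can't help with that request.",
    blockedOutputsMessaging="Sorry, I can't provide that information.",
)
print(response["guardrailId"], response["version"])

The returned guardrail ID and version can then be referenced at inference time so that every model call in the application passes through the same policy.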

Companies like Aha!, a software company that helps more than 1 million people bring their product strategy to life, uses Amazon Bedrock to power many of their generative AI capabilities.

“We have full control over our information through Amazon Bedrock’s data protection and privacy policies, and can block harmful content through Guardrails for Amazon Bedrock. We just built on it to help product managers discover insights by analyzing feedback submitted by their customers. This is just the beginning. We will continue to build on advanced AWS technology to help product development teams everywhere prioritize what to build next with confidence.”

With even more choice of leading FMs and features that help you evaluate models and safeguard applications as well as leverage your prior investments in AI along with the capabilities of Amazon Bedrock, today’s launches make it even easier and faster for customers to build and scale generative AI applications. This blog post highlights only a subset of the new features. You can learn more about everything we’ve launched in the resources of this post, including asking questions and summarizing data from a single document without setting up a vector database in Knowledge Bases and the general availability of support for multiple data sources with Knowledge Bases.
Early adopters leveraging Amazon Bedrock’s capabilities are gaining a crucial head start – driving productivity gains, fueling ground-breaking discoveries across domains, and delivering enhanced customer experiences that foster loyalty and engagement. I’m excited to see what our customers will do next with these new capabilities.
As my mentor Werner Vogels always says “Now Go Build” and I’ll add “…with Amazon Bedrock!”
Resources
Check out the following resources to learn more about this announcement:

Visit our community.aws site to find deep-dive technical content and to discover how our builder communities are using Amazon Bedrock in their solutions
Learn more about Generative AI on AWS
Learn more about Amazon Bedrock
Learn more about customers achieving success with Amazon Bedrock

About the author
Swami Sivasubramanian is Vice President of Data and Machine Learning at AWS. In this role, Swami oversees all AWS Database, Analytics, and AI & Machine Learning services. His team’s mission is to help organizations put their data to work with a complete, end-to-end data solution to store, access, analyze, visualize, and predict.

Building scalable, secure, and reliable RAG applications using Knowledge Bases for Amazon Bedrock

Generative artificial intelligence (AI) has gained significant momentum with organizations actively exploring its potential applications. As successful proof-of-concepts transition into production, organizations are increasingly in need of enterprise scalable solutions. However, to unlock the long-term success and viability of these AI-powered solutions, it is crucial to align them with well-established architectural principles.
The AWS Well-Architected Framework provides best practices and guidelines for designing and operating reliable, secure, efficient, and cost-effective systems in the cloud. Aligning generative AI applications with this framework is essential for several reasons, including providing scalability, maintaining security and privacy, achieving reliability, optimizing costs, and streamlining operations. Embracing these principles is critical for organizations seeking to use the power of generative AI and drive innovation.
This post explores the new enterprise-grade features for Knowledge Bases on Amazon Bedrock and how they align with the AWS Well-Architected Framework. With Knowledge Bases for Amazon Bedrock, you can quickly build applications using Retrieval Augmented Generation (RAG) for use cases like question answering, contextual chatbots, and personalized search.
Here are some of the features we will cover:

AWS CloudFormation support
Private network policies for Amazon OpenSearch Serverless
Multiple S3 buckets as data sources
Service Quotas support
Hybrid search, metadata filters, custom prompts for the RetrieveAndGenerate API, and control over the maximum number of retrievals (see the sketch following this list).
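As a brief sketch of the RetrieveAndGenerate API with some of these options: the knowledge base ID, model ARN, and Region are placeholders, and the exact field set may vary by SDK version.

import boto3

runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")  # placeholder Region

response = runtime.retrieve_and_generate(
    input={"text": "What is our parental leave policy?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB12345678",  # placeholder knowledge base ID
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
            "retrievalConfiguration": {
                "vectorSearchConfiguration": {
                    "numberOfResults": 5,            # maximum number of retrieved chunks
                    "overrideSearchType": "HYBRID",  # hybrid (keyword plus semantic) search
                    # metadata filters can be supplied in this block as well
                }
            },
        },
    },
)
print(response["output"]["text"])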

AWS Well-Architected design principles
RAG-based applications built using Knowledge Bases for Amazon Bedrock can greatly benefit from following the AWS Well-Architected Framework. This framework has six pillars that help organizations make sure their applications are secure, high-performing, resilient, efficient, cost-effective, and sustainable:

Operational Excellence – Well-Architected principles streamline operations, automate processes, and enable continuous monitoring and improvement of generative AI app performance.
Security – Implementing strong access controls, encryption, and monitoring helps secure sensitive data used in your organization’s knowledge base and prevent misuse of generative AI.
Reliability – Well-Architected principles guide the design of resilient and fault-tolerant systems, providing consistent value delivery to users.
Performance Efficiency – Choosing the appropriate resources, implementing caching strategies, and proactively monitoring performance metrics ensure that applications deliver fast and accurate responses, leading to optimal performance and an enhanced user experience.
Cost Optimization – Well-Architected guidelines assist in optimizing resource usage, using cost-saving services, and monitoring expenses, resulting in long-term viability of generative AI projects.
Sustainability – Well-Architected principles promote efficient resource utilization and minimizing carbon footprints, addressing the environmental impact of growing generative AI usage.

By aligning with the Well-Architected Framework, organizations can effectively build and manage enterprise-grade RAG applications using Knowledge Bases for Amazon Bedrock. Now, let’s dive deep into the new features launched within Knowledge Bases for Amazon Bedrock.
AWS CloudFormation support
For organizations building RAG applications, it’s important to provide efficient and effective operations and consistent infrastructure across different environments. This can be achieved by implementing practices such as automating deployment processes. To accomplish this, Knowledge Bases for Amazon Bedrock now offers support for AWS CloudFormation.
With AWS CloudFormation and the AWS Cloud Development Kit (AWS CDK), you can now create, update, and delete knowledge bases and associated data sources. Adopting AWS CloudFormation and the AWS CDK for managing knowledge bases and associated data sources not only streamlines the deployment process, but also promotes adherence to the Well-Architected principles. By performing operations (applications, infrastructure) as code, you can provide consistent and reliable deployments in multiple AWS accounts and AWS Regions, and maintain versioned and auditable infrastructure configurations.
The following is a sample CloudFormation script in JSON format for creating and updating a knowledge base in Amazon Bedrock:

{
    "Type": "AWS::Bedrock::KnowledgeBase",
    "Properties": {
        "Name": String,
        "RoleArn": String,
        "Description": String,
        "KnowledgeBaseConfiguration": {
            "Type": String,
            "VectorKnowledgeBaseConfiguration": VectorKnowledgeBaseConfiguration
        },
        "StorageConfiguration": StorageConfiguration
    }
}

Type specifies a knowledge base as a resource in a top-level template. Minimally, you must specify the following properties:

Name – Specify a name for the knowledge base.
RoleArn – Specify the Amazon Resource Name (ARN) of the AWS Identity and Access Management (IAM) role with permissions to invoke API operations on the knowledge base. For more information, see Create a service role for Knowledge bases for Amazon Bedrock.
KnowledgeBaseConfiguration – Specify the embeddings configuration of the knowledge base. The following sub-properties are required:

Type – Specify the value VECTOR.
VectorKnowledgeBaseConfiguration – Contains details about the model used to create vector embeddings for the knowledge base.

StorageConfiguration – Specify information about the vector store in which the data source is stored. The following sub-properties are required:

Type – Specify the vector store service that you are using.
You also need to select one of the vector stores supported by Knowledge Bases, such as OpenSearch Serverless, Pinecone, or Amazon Aurora PostgreSQL, and provide the configuration for the selected vector store.

For details on all the fields and providing configuration of various vector stores supported by Knowledge Bases for Amazon Bedrock, refer to AWS::Bedrock::KnowledgeBase.
As of this writing, Redis Enterprise Cloud vector stores are not supported in AWS CloudFormation. For the latest information, refer to the documentation linked above.
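If you want to automate the deployment of this template itself, you can submit it programmatically. The following is a minimal sketch using boto3; the stack name and template file name are hypothetical placeholders:

import boto3

# Hypothetical file containing the AWS::Bedrock::KnowledgeBase template shown above,
# with the placeholder values filled in for your environment
TEMPLATE_FILE = "knowledge-base-template.json"

cloudformation = boto3.client("cloudformation")

with open(TEMPLATE_FILE) as f:
    template_body = f.read()

# Create the stack that provisions the knowledge base; use update_stack for subsequent changes
response = cloudformation.create_stack(
    StackName="bedrock-knowledge-base-stack",  # hypothetical stack name
    TemplateBody=template_body,
)
print(response["StackId"])
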
After you create a knowledge base, you need to create a data source from the Amazon Simple Storage Service (Amazon S3) bucket containing the files for your knowledge base. The AWS::Bedrock::DataSource resource calls the CreateDataSource and DeleteDataSource APIs on your behalf.
The following is the sample CloudFormation script in JSON format:

{
  "Type" : "AWS::Bedrock::DataSource",
  "Properties" : {
    "KnowledgeBaseId": String,
    "Name": String,
    "RoleArn": String,
    "Description": String,
    "DataSourceConfiguration": {
      "S3Configuration" : S3DataSourceConfiguration,
      "Type" : String
    },
    "ServerSideEncryptionConfiguration": ServerSideEncryptionConfiguration,
    "VectorIngestionConfiguration": VectorIngestionConfiguration
  }
}

Type specifies a data source as a resource in a top-level template. Minimally, you must specify the following properties:

Name – Specify a name for the data source.
KnowledgeBaseId – Specify the ID of the knowledge base for the data source to belong to.
DataSourceConfiguration – Specify information about the S3 bucket containing the data source. The following sub-properties are required:

Type – Specify the value S3.
S3Configuration – Contains details about the configuration of the S3 object containing the data source.

VectorIngestionConfiguration – Contains details about how to ingest the documents in a data source. You need to provide "ChunkingConfiguration" where you can define your chunking strategy.
ServerSideEncryptionConfiguration – Contains the configuration for server-side encryption, where you can provide the Amazon Resource Name (ARN) of the AWS KMS key used to encrypt the resource.

For more information about setting up data sources in Amazon Bedrock, see Set up a data source for your knowledge base.
Note: You cannot change the chunking configuration after you create the data source.
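For illustration, the following is a minimal sketch of the equivalent direct CreateDataSource call through boto3, showing where the chunking configuration fits; the knowledge base ID, bucket ARN, and chunking values are hypothetical placeholders to adapt to your environment:

import boto3

bedrock_agent = boto3.client("bedrock-agent")

# Hypothetical identifiers -- replace with your knowledge base ID and S3 bucket ARN
response = bedrock_agent.create_data_source(
    knowledgeBaseId="KBID1234",
    name="my-s3-data-source",
    dataSourceConfiguration={
        "type": "S3",
        "s3Configuration": {"bucketArn": "arn:aws:s3:::my-knowledge-base-bucket"},
    },
    vectorIngestionConfiguration={
        "chunkingConfiguration": {
            "chunkingStrategy": "FIXED_SIZE",
            # Example values -- tune chunk size and overlap for your documents
            "fixedSizeChunkingConfiguration": {"maxTokens": 300, "overlapPercentage": 20},
        }
    },
)
print(response["dataSource"]["dataSourceId"])
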
The CloudFormation template allows you to define and manage your knowledge base resources using infrastructure as code (IaC). By automating the setup and management of the knowledge base, you can provide a consistent infrastructure across different environments. This approach aligns with the Operational Excellence pillar, which emphasizes performing operations as code. By treating your entire workload as code, you can automate processes, create consistent responses to events, and ultimately reduce human errors.
Private network policies for Amazon OpenSearch Serverless
For companies building RAG applications, it’s critical that the data remains secure and the network traffic does not go to the public internet. To support this, Knowledge Bases for Amazon Bedrock now supports private network policies for Amazon OpenSearch Serverless.
Knowledge Bases for Amazon Bedrock provides an option for using OpenSearch Serverless as a vector store. You can now access OpenSearch Serverless collections that have a private network policy, which further enhances the security posture of your RAG application. To achieve this, create an OpenSearch Serverless collection and configure it for private network access: when creating the collection, set Network access settings to Private and specify the VPC endpoint for access, then create a vector index within the collection to store the embeddings. Importantly, you can now provide private network access to OpenSearch Serverless collections specifically for Amazon Bedrock. To do this, select AWS service private access and specify bedrock.amazonaws.com as the service.
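To sketch what such a policy can look like, the following example creates a private network policy with boto3; the collection name, policy name, and VPC endpoint ID are hypothetical, and the policy schema shown should be verified against the current OpenSearch Serverless documentation:

import json
import boto3

aoss = boto3.client("opensearchserverless")

# Hypothetical names -- replace with your collection, policy name, and VPC endpoint ID
network_policy = [
    {
        "Rules": [
            {"ResourceType": "collection", "Resource": ["collection/my-kb-collection"]},
        ],
        "AllowFromPublic": False,
        "SourceVPCEs": ["vpce-0123456789abcdef0"],
        # Grants Amazon Bedrock private access to the collection
        "SourceServices": ["bedrock.amazonaws.com"],
    }
]

aoss.create_security_policy(
    name="my-kb-private-network-policy",
    type="network",
    policy=json.dumps(network_policy),
)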

This private network configuration makes sure that your embeddings are stored securely and are only accessible by Amazon Bedrock, enhancing the overall security and privacy of your knowledge bases. It aligns closely with the Security Pillar of controlling traffic at all layers, because all network traffic is kept within the AWS backbone with these settings.
So far, we have explored the automation of creating, deleting, and updating knowledge base resources and the enhanced security through private network policies for OpenSearch Serverless to store vector embeddings securely. Now, let’s understand how to build more reliable, comprehensive, and cost-optimized RAG applications.
Multiple S3 buckets as data sources
Knowledge Bases for Amazon Bedrock now supports adding multiple S3 buckets as data sources within a single knowledge base, including cross-account access. This enhancement increases the knowledge base’s comprehensiveness and accuracy by allowing users to aggregate and use information from various sources seamlessly.
The following are key features:

Multiple S3 buckets – Knowledge Bases for Amazon Bedrock can now incorporate data from multiple S3 buckets, enabling users to combine and use information from different sources effortlessly. This feature promotes data diversity and makes sure that relevant information is readily available for RAG-based applications.
Cross-account data access – Knowledge Bases for Amazon Bedrock supports the configuration of S3 buckets as data sources across different accounts. You can provide the necessary credentials to access these data sources, expanding the range of information that can be incorporated into your knowledge bases.
Efficient data management – When setting up a data source in a knowledge base, you can specify whether the data belonging to that data source should be retained or deleted if the data source is deleted. This feature ensures that your knowledge base remains up-to-date and free from obsolete or irrelevant data, maintaining the integrity and accuracy of the RAG process.

By supporting multiple S3 buckets as data sources, the need for creating multiple knowledge bases or redundant data copies is eliminated, thereby optimizing cost and promoting cloud financial management. Furthermore, the cross-account access capabilities enable the development of resilient architectures, aligning with the Reliability pillar of the AWS Well-Architected Framework, providing high availability and fault tolerance.
Other recently announced features for Knowledge Bases
To further enhance the reliability of your RAG application, Knowledge Bases for Amazon Bedrock now extends support for Service Quotas. This feature provides a single pane of glass to view applied AWS quota values and usage. For example, you now have quick access to information such as the allowed number of RetrieveAndGenerate API requests per second.
This feature allows you to effectively manage resource quotas, prevent overprovisioning, and limit API request rates to safeguard services from potential abuse.
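If you prefer to check these quotas programmatically rather than on the console, the following is a minimal sketch using the Service Quotas API through boto3; the "bedrock" service code is an assumption to confirm against your account:

import boto3

service_quotas = boto3.client("service-quotas")

# List the applied quotas for Amazon Bedrock (service code assumed to be "bedrock")
paginator = service_quotas.get_paginator("list_service_quotas")
for page in paginator.paginate(ServiceCode="bedrock"):
    for quota in page["Quotas"]:
        # Print each quota name alongside its currently applied value
        print(f'{quota["QuotaName"]}: {quota["Value"]}')
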
You can also enhance your application’s performance by using recently announced features like hybrid search, filtering based on metadata, custom prompts for the RetrieveAndGenerate API, and maximum number of retrievals. These features collectively improve the accuracy, relevance, and consistency of generated responses, and align with the Performance Efficiency pillar of the AWS Well-Architected Framework.
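As a rough illustration of how these retrieval options come together at query time, the following sketch calls the RetrieveAndGenerate API through boto3; the knowledge base ID, model ARN, query, and metadata filter are hypothetical placeholders, and the configuration keys should be checked against the current API reference:

import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")

response = bedrock_agent_runtime.retrieve_and_generate(
    input={"text": "What is our parental leave policy?"},  # example user query
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KBID1234",  # hypothetical knowledge base ID
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-v2",
            "retrievalConfiguration": {
                "vectorSearchConfiguration": {
                    "numberOfResults": 5,            # maximum number of retrieved chunks
                    "overrideSearchType": "HYBRID",  # hybrid (semantic + keyword) search
                    # Example metadata filter -- assumes documents were ingested with a "department" attribute
                    "filter": {"equals": {"key": "department", "value": "hr"}},
                }
            },
        },
    },
)
print(response["output"]["text"])
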
Knowledge Bases for Amazon Bedrock aligns with the Sustainability pillar of the AWS Well-Architected Framework by using managed services and optimizing resource utilization. As a fully managed service, Knowledge Bases for Amazon Bedrock removes the burden of provisioning, managing, and scaling the underlying infrastructure, thereby reducing the environmental impact associated with operating and maintaining these resources.
Additionally, by aligning with the AWS Well-Architected principles, organizations can design and operate their RAG applications in a sustainable manner. Practices such as automating deployments through AWS CloudFormation, implementing private network policies for secure data access, and using efficient services like OpenSearch Serverless contribute to minimizing the environmental impact of these workloads.
Overall, Knowledge Bases for Amazon Bedrock, combined with the AWS Well-Architected Framework, empowers organizations to build scalable, secure, and reliable RAG applications while prioritizing environmental sustainability through efficient resource utilization and the adoption of managed services.
Conclusion
The new enterprise-grade features, such as AWS CloudFormation support, private network policies, the ability to use multiple S3 buckets as data sources, and support for Service Quotas, make it straightforward to build scalable, secure, and reliable RAG applications with Knowledge Bases for Amazon Bedrock. Using AWS managed services and following Well-Architected best practices allows organizations to focus on delivering innovative generative AI solutions while providing operational excellence, robust security, and efficient resource utilization. As you build applications on AWS, aligning RAG applications with the AWS Well-Architected Framework provides a solid foundation for building enterprise-grade solutions that drive business value while adhering to industry standards.
For additional resources, refer to the following:

Knowledge bases for Amazon Bedrock
Use RAG to improve responses in generative AI application
Amazon Bedrock Knowledge Base – Samples for building RAG workflows

About the authors
Mani Khanuja is a Tech Lead – Generative AI Specialists, author of the book Applied Machine Learning and High Performance Computing on AWS, and a member of the Board of Directors of the Women in Manufacturing Education Foundation. She leads machine learning projects in various domains such as computer vision, natural language processing, and generative AI. She speaks at internal and external conferences such as AWS re:Invent, Women in Manufacturing West, YouTube webinars, and GHC 23. In her free time, she likes to go for long runs along the beach.
Nitin Eusebius is a Sr. Enterprise Solutions Architect at AWS, experienced in Software Engineering, Enterprise Architecture, and AI/ML. He is deeply passionate about exploring the possibilities of generative AI. He collaborates with customers to help them build well-architected applications on the AWS platform, and is dedicated to solving technology challenges and assisting with their cloud journey.
Pallavi Nargund is a Principal Solutions Architect at AWS. In her role as a cloud technology enabler, she works with customers to understand their goals and challenges, and give prescriptive guidance to achieve their objective with AWS offerings. She is passionate about women in technology and is a core member of Women in AI/ML at Amazon. She speaks at internal and external conferences such as AWS re:Invent, AWS Summits, and webinars. Outside of work she enjoys volunteering, gardening, cycling and hiking.

Tencent AI Lab Developed AlphaLLM: A Novel Machine Learning Framework …

Large Language Models (LLMs) stand out for their ability to parse and generate human-like text across various applications. These models have become integral to technologies that automate and enhance text-based tasks. Despite their advanced capabilities, modern LLMs face significant challenges in scenarios requiring intricate reasoning and strategic planning. These challenges stem from the limitations of current training methodologies, which rely heavily on vast amounts of high-quality, annotated data that are not always available or feasible to gather.

Existing research includes advanced prompting techniques like GPT-4’s Chain-of-Thought, which improves reasoning by outlining intermediate steps. Some models demonstrate the potential of fine-tuning LLMs with high-quality data, although this approach is constrained by data availability. Self-correction strategies enable LLMs to refine outputs through internal feedback. Furthermore, Monte Carlo Tree Search (MCTS), the strategic planning technique behind game-playing systems such as AlphaZero in games like Go, has been adapted to enhance decision-making in language models.

Researchers from Tencent AI lab have introduced ALPHALLM, a novel framework that integrates MCTS with LLMs to promote self-improvement without additional data annotations. This framework is distinct because it borrows strategic planning techniques from board games, applying them to the language processing domain, which allows the model to simulate and evaluate potential responses independently.

The ALPHALLM methodology is structured around three core components: the imagination component, which synthesizes new prompts to expand learning scenarios; the MCTS mechanism, which navigates through potential responses; and critic models that assess the efficacy of these responses. The framework was empirically tested using the GSM8K and MATH datasets, focusing on mathematical reasoning tasks. This method allows the LLM to enhance its problem-solving abilities by learning from simulated outcomes and internal feedback, optimizing the model’s strategic decision-making capabilities without relying on new external data.

Empirical testing of ALPHALLM demonstrated significant performance improvements in mathematical reasoning tasks. Specifically, the model’s accuracy on the GSM8K dataset increased from 57.8% to 92.0%, and on the MATH dataset, it improved from 20.7% to 51.0%. These results validate the framework’s effectiveness in enhancing LLM capabilities through its unique self-improving mechanism. By leveraging internal feedback and strategic simulations, ALPHALLM achieves substantial gains in task-specific performance without additional data annotations.

In conclusion, the research introduced ALPHALLM, a framework that integrates MCTS with LLMs for self-improvement, eliminating the need for additional data annotations. By successfully applying strategic game techniques to language processing, ALPHALLM significantly enhances LLMs’ reasoning capabilities, as evidenced by its marked performance improvements on the GSM8K and MATH datasets. This approach not only advances the autonomy of LLMs but also underscores the potential for continuous, data-independent model enhancement in complex problem-solving domains.

Check out the Paper. All credit for this research goes to the researchers of this project.

The post Tencent AI Lab Developed AlphaLLM: A Novel Machine Learning Framework for Self-Improving Language Models appeared first on MarkTechPost.

Japanese Heron-Bench: A Novel AI Benchmark for Evaluating Japanese Cap …

The rapid progression of Large Language Models (LLMs) is a pivotal milestone in the evolution of artificial intelligence. In recent years, we have witnessed a surge in the development and public accessibility of well-trained LLMs in English and other languages, including Japanese. This expansion underscores a global effort to democratize AI capabilities across linguistic and cultural boundaries.

Building upon the advancements in LLMs, novel approaches have emerged for constructing Vision Language Models (VLMs), which integrate image encoders into language models. These VLMs hold promise in their capacity to understand and generate textual descriptions of visual content. Various evaluation metrics have been proposed to gauge their effectiveness, encompassing tasks such as image captioning, similarity scoring between images and text, and visual question answering (VQA). However, it’s notable that most high-performing VLMs are trained and evaluated predominantly on English-centric datasets.

The need for robust evaluation methodologies becomes increasingly urgent as the demand for non-English models burgeons, particularly in languages like Japanese. Recognizing this imperative, a new evaluation benchmark called the Japanese Heron-Bench has been introduced. This benchmark comprises a meticulously curated dataset of images and contextually relevant questions tailored to the Japanese language and culture. Through this benchmark, the efficacy of VLMs in comprehending visual scenes and responding to queries within the Japanese context can be thoroughly scrutinized.

In tandem with establishing the Japanese Heron-Bench, efforts have been directed toward developing Japanese VLMs trained on Japanese image-text pairs using existing Japanese LLMs. This serves as a foundational step in bridging the gap between LLMs and VLMs in the Japanese linguistic landscape. Such models’ availability facilitates research and fosters innovation in diverse applications ranging from language understanding to visual comprehension.

Despite the strides made in evaluation methodologies, inherent limitations persist. For instance, the accuracy of assessments may be compromised by the performance disparities between languages in LLMs. This is particularly salient in the case of Japanese, where the language proficiency of models may differ from that of English. Additionally, concerns regarding safety aspects such as misinformation, bias, or toxicity in generated content warrant further exploration in evaluation metrics.

In conclusion, while the introduction of the Japanese Heron-Bench and Japanese VLMs represents a significant stride toward the comprehensive evaluation and utilization of VLMs in non-English contexts, challenges remain to be addressed. Future research on evaluation metrics and safety considerations will be pivotal in ensuring VLMs’ efficacy, reliability, and ethical deployment across diverse linguistic and cultural landscapes.

Check out the Paper and Github. All credit for this research goes to the researchers of this project.

The post Japanese Heron-Bench: A Novel AI Benchmark for Evaluating Japanese Capabilities of Vision Language Models VLMs appeared first on MarkTechPost.

DynGAN: A Machine Learning Framework that Detects Collapsed Samples in …

Generative adversarial networks (GANs) are a popular tool for creating realistic data, but they often struggle with a problem called mode collapse. This happens when the generated samples are not as diverse as the real ones. Researchers have had trouble figuring out why this happens and finding a solution.

A team of scientists from the University of Science and Technology of China (USTC) of the Chinese Academy of Sciences (CAS) recently investigated the reasons behind mode collapse and developed a new approach called Dynamic GAN (DynGAN). This method is designed to find and fix mode collapse in GANs.

They found that the way GANs learn from real data can itself lead to mode collapse. DynGAN detects collapse by thresholding on observable discriminator outputs to identify when the generator isn’t producing sufficiently diverse samples. The training data is then partitioned based on these thresholds, and the model trains on the partitions separately.

The team tested DynGAN using both made-up and real-world data. They found that it worked better than other GANs in solving mode collapse issues.

This new approach is a big step forward in understanding and improving GANs. By tackling mode collapse, DynGAN could help make generated data more realistic and useful for various applications.

In conclusion, mode collapse has been a tough problem for GANs, but DynGAN offers a promising solution. By detecting and addressing this issue, DynGAN could make GANs more effective in creating diverse and realistic data.

Check out the Paper and Blog. All credit for this research goes to the researchers of this project.

The post DynGAN: A Machine Learning Framework that Detects Collapsed Samples in the Generator by Thresholding on Observable Discriminator Outputs appeared first on MarkTechPost.

Integrate HyperPod clusters with Active Directory for seamless multi-u …

Amazon SageMaker HyperPod is purpose-built to accelerate foundation model (FM) training, removing the undifferentiated heavy lifting involved in managing and optimizing a large training compute cluster. With SageMaker HyperPod, you can train FMs for weeks and months without disruption.
Typically, HyperPod clusters are used by multiple users: machine learning (ML) researchers, software engineers, data scientists, and cluster administrators. They edit their own files, run their own jobs, and want to avoid impacting each other’s work. To achieve this multi-user environment, you can take advantage of Linux’s user and group mechanism and statically create multiple users on each instance through lifecycle scripts. The drawback to this approach, however, is that user and group settings are duplicated across multiple instances in the cluster, making it difficult to configure them consistently on all instances, such as when a new team member joins.
To solve this pain point, we can use Lightweight Directory Access Protocol (LDAP) and LDAP over TLS/SSL (LDAPS) to integrate with a directory service such as AWS Directory Service for Microsoft Active Directory. With the directory service, you can centrally maintain users and groups, and their permissions.
In this post, we introduce a solution to integrate HyperPod clusters with AWS Managed Microsoft AD, and explain how to achieve a seamless multi-user login environment with a centrally maintained directory.
Solution overview
The solution uses the following AWS services and resources:

SageMaker HyperPod to create a cluster
AWS Managed Microsoft AD to create a managed directory
Elastic Load Balancing (ELB) to create a Network Load Balancer (NLB) in front of the directory service
AWS Certificate Manager (ACM) to import and maintain an SSL/TLS certificate for LDAP over an SSL/TLS (LDAPS) connection
Amazon Elastic Compute Cloud (Amazon EC2) to create a Windows machine to administer users and groups in the directory

We also use AWS CloudFormation to deploy a stack to create the prerequisites for the HyperPod cluster: VPC, subnets, security group, and Amazon FSx for Lustre volume.
The following diagram illustrates the high-level solution architecture.

In this solution, HyperPod cluster instances use the LDAPS protocol to connect to the AWS Managed Microsoft AD via an NLB. We use TLS termination by installing a certificate to the NLB. To configure LDAPS in HyperPod cluster instances, the lifecycle script installs and configures System Security Services Daemon (SSSD)—an open source client software for LDAP/LDAPS.
Prerequisites
This post assumes you already know how to create a basic HyperPod cluster without SSSD. For more details on how to create HyperPod clusters, refer to Getting started with SageMaker HyperPod and the HyperPod workshop.
Also, in the setup steps, you will use a Linux machine to generate a self-signed certificate and obtain an obfuscated password for the AD reader user. If you don’t have a Linux machine, you can create an EC2 Linux instance or use AWS CloudShell.
Create a VPC, subnets, and a security group
Follow the instructions in the Own Account section of the HyperPod workshop. You will deploy a CloudFormation stack and create prerequisite resources such as VPC, subnets, security group, and FSx for Lustre volume. You need to create both a primary subnet and backup subnet when deploying the CloudFormation stack, because AWS Managed Microsoft AD requires at least two subnets with different Availability Zones.
In this post, for simplicity, we use the same VPC, subnets, and security group for both the HyperPod cluster and directory service. If you need to use different networks between the cluster and directory service, make sure the security groups and route tables are configured so that they can communicate with each other.
Create AWS Managed Microsoft AD on Directory Service
Complete the following steps to set up your directory:

On the Directory Service console, choose Directories in the navigation pane.
Choose Set up directory.
For Directory type, select AWS Managed Microsoft AD.
Choose Next.
For Edition, select Standard Edition.
For Directory DNS name, enter your preferred directory DNS name (for example, hyperpod.abc123.com).
For Admin password, set a password and save it for later use.
Choose Next.
In the Networking section, specify the VPC and two private subnets you created.
Choose Next.
Review the configuration and pricing, then choose Create directory. The directory creation starts. Wait until the status changes from Creating to Active, which can take 20–30 minutes.
When the status changes to Active, open the detail page of the directory and take note of the DNS addresses for later use.
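As an alternative to the console steps above, you can create the directory with a short boto3 script; the directory name, password, VPC ID, and subnet IDs below are hypothetical placeholders:

import boto3

ds = boto3.client("ds")

# Hypothetical values -- replace with your directory DNS name, admin password,
# and the VPC and subnets created by the CloudFormation stack
response = ds.create_microsoft_ad(
    Name="hyperpod.abc123.com",        # directory DNS name
    Password="YourAdminPassword123!",  # store this securely, for example in AWS Secrets Manager
    Edition="Standard",
    VpcSettings={
        "VpcId": "vpc-0123456789abcdef0",
        "SubnetIds": ["subnet-0123456789abcdef0", "subnet-0fedcba9876543210"],
    },
)
print(response["DirectoryId"])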

Create an NLB in front of Directory Service
To create the NLB, complete the following steps:

On the Amazon EC2 console, choose Target groups in the navigation pane.
Choose Create target groups.
Create a target group with the following parameters:

For Choose a target type, select IP addresses.
For Target group name, enter LDAP.
For Protocol: Port, choose TCP and enter 389.
For IP address type, select IPv4.
For VPC, choose SageMaker HyperPod VPC (which you created with the CloudFormation template).
For Health check protocol, choose TCP.

Choose Next.
In the Register targets section, register the directory service’s DNS addresses as the targets.
For Ports, choose Include as pending below. The addresses are added in the Review targets section with Pending status.
Choose Create target group.
On the Load Balancers console, choose Create load balancer.
Under Network Load Balancer, choose Create.
Configure an NLB with the following parameters:

For Load balancer name, enter a name (for example, nlb-ds).
For Scheme, select Internal.
For IP address type, select IPv4.
For VPC, choose SageMaker HyperPod VPC (which you created with the CloudFormation template).
Under Mappings, select the two private subnets and their CIDR ranges (which you created with the CloudFormation template).
For Security groups, choose CfStackName-SecurityGroup-XYZXYZ (which you created with the CloudFormation template).

In the Listeners and routing section, specify the following parameters:

For Protocol, choose TCP.
For Port, enter 389.
For Default action, choose the target group named LDAP.
Here, we are adding a listener for LDAP. We will add LDAPS later.
Choose Create load balancer. Wait until the status changes from Provisioning to Active, which can take 3–5 minutes.
When the status changes to Active, open the detail page of the provisioned NLB and take note of the DNS name (xyzxyz.elb.region-name.amazonaws.com) for later use.
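If you prefer to script this step instead of clicking through the console, the following is a minimal boto3 sketch that creates the target group, NLB, and LDAP listener; it assumes you already have the VPC, subnets, security group, and the directory's DNS IP addresses, and all IDs shown are hypothetical placeholders:

import boto3

elbv2 = boto3.client("elbv2")

# Hypothetical IDs -- replace with your VPC, subnets, security group, and directory DNS IPs
VPC_ID = "vpc-0123456789abcdef0"
SUBNET_IDS = ["subnet-0123456789abcdef0", "subnet-0fedcba9876543210"]
SECURITY_GROUP_ID = "sg-0123456789abcdef0"
DIRECTORY_DNS_IPS = ["10.1.1.10", "10.1.2.10"]

# Target group that forwards LDAP traffic to the directory's DNS addresses
tg = elbv2.create_target_group(
    Name="LDAP",
    Protocol="TCP",
    Port=389,
    VpcId=VPC_ID,
    TargetType="ip",
    HealthCheckProtocol="TCP",
)
tg_arn = tg["TargetGroups"][0]["TargetGroupArn"]

elbv2.register_targets(
    TargetGroupArn=tg_arn,
    Targets=[{"Id": ip, "Port": 389} for ip in DIRECTORY_DNS_IPS],
)

# Internal Network Load Balancer in front of the directory service
nlb = elbv2.create_load_balancer(
    Name="nlb-ds",
    Type="network",
    Scheme="internal",
    Subnets=SUBNET_IDS,
    SecurityGroups=[SECURITY_GROUP_ID],
)
nlb_arn = nlb["LoadBalancers"][0]["LoadBalancerArn"]

# LDAP listener on port 389 (an LDAPS listener on 636 is added after the certificate is imported)
elbv2.create_listener(
    LoadBalancerArn=nlb_arn,
    Protocol="TCP",
    Port=389,
    DefaultActions=[{"Type": "forward", "TargetGroupArn": tg_arn}],
)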

Create a self-signed certificate and import it to Certificate Manager
To create a self-signed certificate, complete the following steps:

On your Linux-based environment (local laptop, EC2 Linux instance, or CloudShell), run the following OpenSSL commands to create a self-signed certificate and private key:

$ openssl genrsa 2048 > ldaps.key

$ openssl req -new -key ldaps.key -out ldaps_server.csr

You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
—–
Country Name (2 letter code) [AU]:US
State or Province Name (full name) [Some-State]:Washington
Locality Name (eg, city) []:Bellevue
Organization Name (eg, company) [Internet Widgits Pty Ltd]:CorpName
Organizational Unit Name (eg, section) []:OrgName
Common Name (e.g., server FQDN or YOUR name) []:nlb-ds-abcd1234.elb.region.amazonaws.com
Email Address []:your@email.address.com

Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []:
An optional company name []:

$ openssl x509 -req -sha256 -days 365 -in ldaps_server.csr -signkey ldaps.key -out ldaps.crt

Certificate request self-signature ok
subject=C = US, ST = Washington, L = Bellevue, O = CorpName, OU = OrgName, CN = nlb-ds-abcd1234.elb.region.amazonaws.com, emailAddress = your@email.address.com

$ chmod 600 ldaps.key

On the Certificate Manager console, choose Import.
Enter the certificate body and private key, from the contents of ldaps.crt and ldaps.key respectively.
Choose Next.
Add any optional tags, then choose Next.
Review the configuration and choose Import.
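Alternatively, you can import the certificate with a short boto3 script, sketched below under the assumption that ldaps.crt and ldaps.key are in the current directory:

import boto3

acm = boto3.client("acm")

# Read the self-signed certificate and private key generated in the previous step
with open("ldaps.crt", "rb") as cert_file, open("ldaps.key", "rb") as key_file:
    certificate = cert_file.read()
    private_key = key_file.read()

response = acm.import_certificate(
    Certificate=certificate,
    PrivateKey=private_key,
)
print(response["CertificateArn"])  # note this ARN for the LDAPS listener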

Add an LDAPS listener
We added a listener for LDAP already in the NLB. Now we add a listener for LDAPS with the imported certificate. Complete the following steps:

On the Load Balancers console, navigate to the NLB details page.
On the Listeners tab, choose Add listener.
Configure the listener with the following parameters:

For Protocol, choose TLS.
For Port, enter 636.
For Default action, choose LDAP.
For Certificate source, select From ACM.
For Certificate, choose the certificate you imported into ACM.

Choose Add. Now the NLB listens on both LDAP and LDAPS. It is recommended to delete the LDAP listener because it transmits data without encryption, unlike LDAPS.

Create an EC2 Windows instance to administer users and groups in the AD
To create and maintain users and groups in the AD, complete the following steps:

On the Amazon EC2 console, choose Instances in the navigation pane.
Choose Launch instances.
For Name, enter a name for your instance.
For Amazon Machine Image, choose Microsoft Windows Server 2022 Base.
For Instance type, choose t2.micro.
In the Network settings section, provide the following parameters:

For VPC, choose SageMaker HyperPod VPC (which you created with the CloudFormation template).
For Subnet, choose either of two subnets you created with the CloudFormation template.
For Common security groups, choose CfStackName-SecurityGroup-XYZXYZ (which you created with the CloudFormation template).

For Configure storage, set storage to 30 GB gp2.
In the Advanced details section, for Domain join directory, choose the AD you created.
For IAM instance profile, choose an AWS Identity and Access Management (IAM) role with at least the AmazonSSMManagedEC2InstanceDefaultPolicy policy.
Review the summary and choose Launch instance.

Create users and groups in AD using the EC2 Windows instance
With Remote Desktop, connect to the EC2 Windows instance you created in the previous step. Using an RDP client is recommended over using a browser-based Remote Desktop so that you can exchange the contents of the clipboard with your local machine using copy-paste operations. For more details about connecting to EC2 Windows instances, refer to Connect to your Windows instance.
If you are prompted for a login credential, use hyperpod\Admin (where hyperpod is the first part of your directory DNS name) as the user name, and use the admin password you set for the directory service.

When the Windows desktop screen opens, choose Server Manager from the Start menu.
Choose Local Server in the navigation pane, and confirm that the domain is what you specified to the directory service.
On the Manage menu, choose Add Roles and Features.
Choose Next until you are at the Features page.
Expand the feature Remote Server Administration Tools, expand Role Administration Tools, and select AD DS and AD LDS Tools and Active Directory Rights Management Service.
Choose Next and Install. The feature installation starts.
When the installation is complete, choose Close.
Open Active Directory Users and Computers from the Start menu.
Under hyperpod.abc123.com, expand hyperpod.
Choose (right-click) hyperpod, choose New, and choose Organizational Unit.
Create an organizational unit called Groups.
Choose (right-click) Groups, choose New, and choose Group.
Create a group called ClusterAdmin.
Create a second group called ClusterDev.
Choose (right-click) Users, choose New, and choose User.
Create a new user.
Choose (right-click) the user and choose Add to a group.
Add your users to the ClusterAdmin or ClusterDev group. Users added to the ClusterAdmin group will have sudo privileges on the cluster.

Create a ReadOnly user in AD
Create a user called ReadOnly under Users. The ReadOnly user is used by the cluster to programmatically access users and groups in AD.

Take note of the password for later use.

(For SSH public key authentication) Add SSH public keys to users
By storing an SSH public key to a user in AD, you can log in without entering a password. You can use an existing key pair, or you can create a new key pair with OpenSSH’s ssh-keygen command. For more information about generating a key pair, refer to Create a key pair for your Amazon EC2 instance.

In Active Directory Users and Computers, on the View menu, enable Advanced Features.
Open the Properties dialog of the user.
On the Attribute Editor tab, choose altSecurityIdentities, then choose Edit.
For Value to add, choose Add.
For Values, add an SSH public key.
Choose OK. Confirm that the SSH public key appears as an attribute.

Get an obfuscated password for the ReadOnly user
To avoid including a plain text password in the SSSD configuration file, you obfuscate the password. For this step, you need a Linux environment (local laptop, EC2 Linux instance, or CloudShell).
Install the sssd-tools package on the Linux machine to install the Python module pysss for obfuscation:

# Ubuntu
$ sudo apt install sssd-tools

# Amazon Linux
$ sudo yum install sssd-tools

Run the following one-line Python script. Input the password of the ReadOnly user. You will get the obfuscated password.

$ python3 -c "import getpass,pysss; print(pysss.password().encrypt(getpass.getpass('AD reader user password: ').strip(), pysss.password().AES_256))"
AD reader user password: (Enter ReadOnly user password)
AAAQACK2….

Create a HyperPod cluster with an SSSD-enabled lifecycle script
Next, you create a HyperPod cluster with LDAPS/Active Directory integration.

Find the configuration file config.py in your lifecycle script directory, open it with your text editor, and edit the properties in the Config class and SssdConfig class:

Set True for enable_sssd to enable setting up SSSD.
The SssdConfig class contains configuration parameters for SSSD.
Make sure you use the obfuscated password for the ldap_default_authtok property, not a plain text password.

# Basic configuration parameters
class Config:
    :
    # Set true if you want to install SSSD for ActiveDirectory/LDAP integration.
    # You need to configure parameters in SssdConfig as well.
    enable_sssd = True

# Configuration parameters for ActiveDirectory/LDAP/SSSD
class SssdConfig:

    # Name of domain. Can be default if you are not sure.
    domain = "default"

    # Comma separated list of LDAP server URIs
    ldap_uri = "ldaps://nlb-ds-xyzxyz.elb.us-west-2.amazonaws.com"

    # The default base DN to use for performing LDAP user operations
    ldap_search_base = "dc=hyperpod,dc=abc123,dc=com"

    # The default bind DN to use for performing LDAP operations
    ldap_default_bind_dn = "CN=ReadOnly,OU=Users,OU=hyperpod,DC=hyperpod,DC=abc123,DC=com"

    # "password" or "obfuscated_password". Obfuscated password is recommended.
    ldap_default_authtok_type = "obfuscated_password"

    # You need to modify this parameter with the obfuscated password, not plain text password
    ldap_default_authtok = "placeholder"

    # SSH authentication method - "password" or "publickey"
    ssh_auth_method = "publickey"

    # Home directory. You can change it to "/home/%u" if your cluster doesn't use FSx volume.
    override_homedir = "/fsx/%u"

    # Group names to accept SSH login
    ssh_allow_groups = {
        "controller" : ["ClusterAdmin", "ubuntu"],
        "compute" : ["ClusterAdmin", "ClusterDev", "ubuntu"],
        "login" : ["ClusterAdmin", "ClusterDev", "ubuntu"],
    }

    # Group names for sudoers
    sudoers_groups = {
        "controller" : ["ClusterAdmin", "ClusterDev"],
        "compute" : ["ClusterAdmin", "ClusterDev"],
        "login" : ["ClusterAdmin", "ClusterDev"],
    }

Copy the certificate file ldaps.crt to the same directory (where config.py exists).
Upload the modified lifecycle script files to your Amazon Simple Storage Service (Amazon S3) bucket, and create a HyperPod cluster with it.
Wait until the status changes to InService.
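For the final step, the following is a heavily simplified boto3 sketch that creates a HyperPod cluster with a single controller group referencing the uploaded lifecycle scripts; the bucket, role, network IDs, and entry point script name are hypothetical, and the exact request shape should be confirmed against the SageMaker CreateCluster API reference and the HyperPod workshop:

import boto3

sagemaker = boto3.client("sagemaker")

# Hypothetical bucket, role, and network IDs -- replace with your own values
LIFECYCLE_S3_URI = "s3://my-bucket/hyperpod/lifecycle-scripts/"
EXECUTION_ROLE_ARN = "arn:aws:iam::123456789012:role/HyperPodExecutionRole"

response = sagemaker.create_cluster(
    ClusterName="my-hyperpod-cluster",
    InstanceGroups=[
        {
            "InstanceGroupName": "controller",
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 1,
            "ExecutionRole": EXECUTION_ROLE_ARN,
            # Points to the uploaded lifecycle scripts; on_create.sh is the entry point
            # name used by the HyperPod workshop samples (adjust to your scripts)
            "LifeCycleConfig": {
                "SourceS3Uri": LIFECYCLE_S3_URI,
                "OnCreate": "on_create.sh",
            },
        },
    ],
    VpcConfig={
        "SecurityGroupIds": ["sg-0123456789abcdef0"],
        "Subnets": ["subnet-0123456789abcdef0"],
    },
)
print(response["ClusterArn"])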

Verification
Let’s verify the solution by logging in to the cluster with SSH. Because the cluster was created in a private subnet, you can’t directly SSH into the cluster from your local environment. You can choose from two options to connect to the cluster.
Option 1: SSH login through AWS Systems Manager
You can use AWS Systems Manager as a proxy for the SSH connection. Add a host entry to the SSH configuration file ~/.ssh/config using the following example. For the HostName field, specify the Systems Manager target name in the format of sagemaker-cluster:[cluster-id]_[instance-group-name]-[instance-id]. For the IdentityFile field, specify the file path to the user’s SSH private key. This field is not required if you chose password authentication.

Host MyCluster-LoginNode
HostName sagemaker-cluster:abcd1234_LoginGroup-i-01234567890abcdef
User user1
IdentityFile ~/keys/my-cluster-ssh-key.pem
ProxyCommand aws --profile default --region us-west-2 ssm start-session --target %h --document-name AWS-StartSSHSession --parameters portNumber=%p

Run the ssh command using the host name you specified. Confirm you can log in to the instance with the specified user.

$ ssh MyCluster-LoginNode
:
:
(ASCII art login banner)
You're on the controller
Instance Type: ml.m5.xlarge
user1@ip-10-1-111-222:~$

At this point, users can still use the Systems Manager default shell session to log in to the cluster as ssm-user with administrative privileges. To block the default Systems Manager shell access and enforce SSH access, you can configure your IAM policy by referring to the following example:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ssm:StartSession",
        "ssm:TerminateSession"
      ],
      "Resource": [
        "arn:aws:sagemaker:us-west-2:123456789012:cluster/abcd1234efgh",
        "arn:aws:ssm:us-west-2:123456789012:document/AWS-StartSSHSession"
      ],
      "Condition": {
        "BoolIfExists": {
          "ssm:SessionDocumentAccessCheck": "true"
        }
      }
    }
  ]
}

For more details on how to enforce SSH access, refer to Start a session with a document by specifying the session documents in IAM policies.
Option 2: SSH login through bastion host
Another option to access the cluster is to use a bastion host as a proxy. You can use this option when the user doesn’t have permission to use Systems Manager sessions, or to troubleshoot when Systems Manager is not working.

Create a bastion security group that allows inbound SSH access (TCP port 22) from your local environment.
Update the security group for the cluster to allow inbound SSH access from the bastion security group.
Create an EC2 Linux instance.
For Amazon Machine Image, choose Ubuntu Server 20.04 LTS.
For Instance type, choose t3.small.
In the Network settings section, provide the following parameters:

For VPC, choose SageMaker HyperPod VPC (which you created with the CloudFormation template).
For Subnet, choose the public subnet you created with the CloudFormation template.
For Common security groups, choose the bastion security group you created.

For Configure storage, set storage to 8 GB.
Identify the public IP address of the bastion host and the private IP address of the target instance (for example, the login node of the cluster), and add two host entries in the SSH config, by referring to the following example:

Host Bastion
HostName 11.22.33.44
User ubuntu
IdentityFile ~/keys/my-bastion-ssh-key.pem

Host MyCluster-LoginNode-with-Proxy
HostName 10.1.111.222
User user1
IdentityFile ~/keys/my-cluster-ssh-key.pem
ProxyCommand ssh -q -W %h:%p Bastion

Run the ssh command using the target host name you specified earlier, and confirm you can log in to the instance with the specified user:

$ ssh MyCluster-LoginNode-with-Proxy
:
:
(ASCII art login banner)
You're on the controller
Instance Type: ml.m5.xlarge
user1@ip-10-1-111-222:~$

Clean up
Clean up the resources in the following order:

Delete the HyperPod cluster.
Delete the Network Load Balancer.
Delete the load balancing target group.
Delete the certificate imported to Certificate Manager.
Delete the EC2 Windows instance.
Delete the EC2 Linux instance for the bastion host.
Delete the AWS Managed Microsoft AD.
Delete the CloudFormation stack for the VPC, subnets, security group, and FSx for Lustre volume.

Conclusion
This post provided steps to create a HyperPod cluster integrated with Active Directory. This solution removes the hassle of user maintenance on large-scale clusters and allows you to manage users and groups centrally in one place.
For more information about HyperPod, check out the HyperPod workshop and the SageMaker HyperPod Developer Guide. Leave your feedback on this solution in the comments section.

About the Authors
Tomonori Shimomura is a Senior Solutions Architect on the Amazon SageMaker team, where he provides in-depth technical consultation to SageMaker customers and suggests product improvements to the product team. Before joining Amazon, he worked on the design and development of embedded software for video game consoles, and now he leverages his in-depth skills in Cloud side technology. In his free time, he enjoys playing video games, reading books, and writing software.
Giuseppe Angelo Porcelli is a Principal Machine Learning Specialist Solutions Architect for Amazon Web Services. With several years of software engineering experience and an ML background, he works with customers of any size to understand their business and technical needs and design AI and ML solutions that make the best use of the AWS Cloud and the Amazon Machine Learning stack. He has worked on projects in different domains, including MLOps, computer vision, and NLP, involving a broad set of AWS services. In his free time, Giuseppe enjoys playing football.
Monidipa Chakraborty currently serves as a Senior Software Development Engineer at Amazon Web Services (AWS), specifically within the SageMaker HyperPod team. She is committed to assisting customers by designing and implementing robust and scalable systems that demonstrate operational excellence. Bringing nearly a decade of software development experience, Monidipa has contributed to various sectors within Amazon, including Video, Retail, Amazon Go, and AWS SageMaker.
Satish Pasumarthi is a Software Developer at Amazon Web Services. With several years of software engineering experience and an ML background, he loves to bridge the gap between ML and systems and is passionate about building systems that make large-scale model training possible. He has worked on projects in a variety of domains, including machine learning frameworks, model benchmarking, and building the HyperPod beta, involving a broad set of AWS services. In his free time, Satish enjoys playing badminton.

The executive’s guide to generative AI for sustainability

Organizations are facing ever-increasing requirements for sustainability goals alongside environmental, social, and governance (ESG) practices. A Gartner, Inc. survey revealed that 87 percent of business leaders expect to increase their organization’s investment in sustainability in the coming years. This post serves as a starting point for any executive seeking to navigate the intersection of generative artificial intelligence (generative AI) and sustainability. It provides examples of use cases and best practices for using generative AI’s potential to accelerate sustainability and ESG initiatives, as well as insights into the main operational challenges of generative AI for sustainability. This guide can be used as a roadmap for integrating generative AI effectively within sustainability strategies while ensuring alignment with organizational objectives.
A roadmap to generative AI for sustainability
In the sections that follow, we provide a roadmap for integrating generative AI into sustainability initiatives.
1. Understand the potential of generative AI for sustainability
Generative AI has the power to transform every part of a business with its wide range of capabilities. These include the ability to analyze massive amounts of data, identify patterns, summarize documents, perform translations, correct errors, or answer questions. These capabilities can be used to add value throughout the entire value chain of your organization. Figure 1 illustrates selected examples of use cases of generative AI for sustainability across the value chain.

Figure 1: Examples of generative AI for sustainability use cases across the value chain
According to KPMG’s 2024 ESG Organization Survey, investment in ESG capabilities is another top priority for executives as organizations face increasing regulatory pressure to disclose information about ESG impacts, risks, and opportunities. Within this context, you can use generative AI to advance your organization’s ESG goals.
The typical ESG workflow consists of multiple phases, each presenting unique pain points. Generative AI offers solutions that can address these pain points throughout the process and contribute to sustainability efforts. Figure 2 provides examples illustrating how generative AI can support each phase of the ESG workflow within your organization. These examples include speeding up market trend analysis, ensuring accurate risk management and compliance, and facilitating data collection or report generation. Note that ESG workflows may vary across different verticals, organizational maturities, and legislative frameworks. Factors such as industry-specific regulations, company size, and regional policies can influence the ESG workflow steps. Therefore, prioritizing use cases according to your specific needs and context and defining a clear plan to measure success is essential for optimal effectiveness.

Figure 2: Mapping generative AI benefits across the ESG workflow
2. Recognize the operational challenges of generative AI for sustainability
Understanding and appropriately addressing the challenges of implementing generative AI is crucial for organizations aiming to use its potential to address the organization’s sustainability goals and ESG initiatives. These challenges include collecting and managing high-quality data, integrating generative AI into existing IT systems, navigating ethical concerns, filling skills gaps, and setting the organization up for success by bringing in key stakeholders, such as the chief information security officer (CISO) or chief financial officer (CFO), early so you build responsibly. Legal challenges are a huge blocker for transitioning from proof of concept (POC) to production. Therefore, it’s essential to involve legal teams early in the process to build with compliance in mind. Figure 3 provides an overview of the main operational challenges of generative AI for sustainability.

Figure 3: Operational challenges of generative AI for sustainability
3. Set the right data foundations
As a CEO aiming to use generative AI to achieve sustainability goals, remember that data is your differentiator. Companies that lack ready access to high-quality data will not be able to customize generative AI models with their own data, thus missing out on realizing the full scaling potential of generative AI and creating a competitive advantage. Invest in acquiring diverse and high-quality datasets to enrich and accelerate your ESG initiatives. You can use resources such as the Amazon Sustainability Data Initiative or the AWS Data Exchange to simplify and expedite the acquisition and analysis of comprehensive datasets. Alongside external data acquisition, prioritize internal data management to maximize the potential of generative AI and use its capabilities in analyzing your organizational data and uncovering new insights.
From an operational standpoint, you can embrace foundation model ops (FMOps) and large language model ops (LLMOps) to make sure your sustainability efforts are data-driven and scalable. This involves documenting data lineage, data versioning, automating data processing, and monitoring data management costs.
4. Identify high-impact opportunities
You can use Amazon’s working backwards principle to pinpoint opportunities within your sustainability strategy where generative AI can make a significant impact. Prioritize projects that promise immediate enhancements in key areas within your organization. While ESG remains a key aspect of sustainability, tapping into industry-specific expertise across sectors such as energy, supply chain, manufacturing, transportation, or agriculture can uncover diverse generative AI for sustainability use cases tailored to your business’s applications. Moreover, exploring alternative avenues, such as using generative AI for improving research and development, enabling customer self-service, optimizing energy usage in buildings, or slowing down deforestation, can also provide impactful opportunities for sustainable innovation.
5. Use the right tools
Failing to use the appropriate tools can add complexity, compromise security, and reduce effectiveness in using generative AI for sustainability. The right tool should offer you choice and flexibility and enable you to customize your solutions to specific needs and requirements.
Figure 4 illustrates the AWS generative AI stack as of 2023, which offers a set of capabilities that encompass choice, breadth, and depth across all layers. Moreover, it is built on a data-first approach, ensuring that every aspect of its offerings is designed with security and privacy in mind.
Examples of tools you can use to advance sustainability initiatives are:
Amazon Bedrock – a fully managed service that provides access to high-performing FMs from leading AI companies through a single API, enabling you to choose the right model for your sustainability use cases.
AWS Trainium2 – Purpose-built for high-performance training of FMs and LLMs, Trainium2 provides up to 2x better energy efficiency (performance/watt) compared to first-generation Trainium chips.
Inferentia2-based Amazon EC2 Inf2 instances – These instances offer up to 50 percent better performance/watt over comparable Amazon Elastic Compute Cloud (Amazon EC2) instances. Purpose-built to handle deep learning models at scale, Inf2 instances are indispensable for deploying ultra-large models while meeting sustainability goals through improved energy efficiency.

Figure 4: AWS generative AI stack
6. Use the right approach
Generative AI isn’t a one-size-fits-all solution. Tailoring your approach by choosing the right modality and optimization strategy is crucial for maximizing its impact on sustainability initiatives. Figure 5 offers an overview of generative AI modalities.

Figure 5: Generative AI modalities
In addition, Figure 6 outlines the main generative AI optimization strategies, including prompt engineering, Retrieval Augmented Generation, and fine-tuning or continued pre-training.

Figure 6: Generative AI optimization strategies
7. Simplify the development of your applications by using generative AI agents
Generative AI agents offer a unique opportunity to drive sustainability initiatives forward with their advanced capabilities of automating a wide range of routine and repetitive tasks, such as data entry, customer support inquiries, and content generation. Moreover, they can orchestrate complex, multistep workflows by breaking down tasks into smaller, manageable steps, coordinating various actions, and ensuring the efficient execution of processes within your organization. For example, you can use Agents for Amazon Bedrock to configure an agent that monitors and analyzes energy usage patterns across your operations and identifies opportunities for energy savings. Alternatively, you can create a specialized agent that monitors compliance with sustainability regulations in real time.
8. Build robust feedback mechanisms for evaluation
Take advantage of feedback insights for strategic improvements, whether adjusting generative AI models or redefining objectives to ensure agility and alignment with sustainability challenges. Consider the following guidelines:
Implement real-time monitoring – Set up monitoring systems to track generative AI performance against sustainability benchmarks, focusing on efficiency and environmental impact. Establish a metrics pipeline to provide insights into the sustainability contributions of your generative AI initiatives.
Engage stakeholders for human-in-the-loop evaluation – Rely on human-in-the-loop auditing and regularly collect feedback from internal teams, customers, and partners to gauge the impact of generative AI–driven processes on the organization’s sustainability benchmarks. This enhances transparency and promotes trust in your commitment to sustainability.
Use automated testing for continuous improvement – With tools such as RAGAS and LangSmith, you can use LLM-based evaluation to identify and correct inaccuracies or hallucinations, facilitating rapid optimization of generative AI models in line with sustainability goals.
9. Measure impact and maximize ROI from generative AI for sustainability
Establish clear key performance indicators (KPIs) that capture the environmental impact, such as carbon footprint reduction, alongside economic benefits, such as cost savings or enhanced business agility. This dual focus ensures that your investments not only contribute to programs focused on environmental sustainability but also reinforce the business case for sustainability while empowering you to drive innovation and competitive advantage in sustainable practices. Share success stories internally and externally to inspire others and demonstrate your organization’s commitment to sustainability leadership.
10. Minimize resource usage throughout the generative AI lifecycle
In some cases, generative AI itself can have a high energy cost. To achieve maximum impact, consider the trade-off between the benefits of using generative AI for sustainability initiatives and the energy efficiency of the technology itself. Make sure to gain a deep understanding of the iterative generative AI lifecycle and optimize each phase for environmental sustainability. Typically, the journey into generative AI begins with identifying specific application requirements. From there, you have the option to either train your model from scratch or use an existing one. In most cases, opting for an existing model and customizing it is preferred. Following this step and evaluating your system thoroughly is essential before deployment. Lastly, continuous monitoring enables ongoing refinement and adjustments. Throughout this lifecycle, implementing AWS Well-Architected Framework best practices is recommended. Refer to Figure 7 for an overview of the generative AI lifecycle.

Figure 7: The generative AI lifecycle
11. Manage risks and implement responsibly
While generative AI holds significant promise for working towards your organization’s sustainability goals, it also poses challenges such as toxicity and hallucinations. Striking the right balance between innovation and the responsible use of generative AI is fundamental for mitigating risks and enabling responsible AI innovation. This balance must account for the assessment of risk in terms of several factors such as quality, disclosures, or reporting. To achieve this, adopting specific tools and capabilities and working with your security team experts to adopt security best practices is necessary. Scaling generative AI in a safe and secure manner requires putting in place guardrails that are customized to your use cases and aligned with responsible AI policies.
12. Invest in educating and training your teams
Continuously upskill your team and empower them with the right skills to innovate and actively contribute to achieving your organization’s sustainability goals. Identify relevant resources for sustainability and generative AI to ensure your teams stay updated with the essential skills required in both areas.
Conclusion
In this post, we provided a guide for executives to integrate generative AI into their sustainability strategies, focusing on both sustainability and ESG goals. The adoption of generative AI in sustainability efforts is not just about technological innovation. It is about fostering a culture of responsibility, innovation, and continuous improvement. By prioritizing high-quality data, identifying impactful opportunities, and fostering stakeholder engagement, companies can harness the transformative power of generative AI to not only achieve but surpass their sustainability goals.
How can AWS help?
Explore the AWS Solutions Library to discover ways to build sustainability solutions on AWS.
The AWS Generative AI Innovation Center can assist you in the process with expert guidance on ideation, strategic use case identification, execution, and scaling to production.
To learn more about how Amazon is using AI to reach our climate pledge commitment of net-zero carbon by 2040, explore the 7 ways AI is helping Amazon build a more sustainable future and business.

About the Authors
Dr. Wafae Bakkali is a Data Scientist at AWS. As a generative AI expert, Wafae is driven by the mission to empower customers in solving their business challenges through the utilization of generative AI techniques, ensuring they do so with maximum efficiency and sustainability.

Dr. Mehdi Noori is a Senior Scientist at AWS Generative AI Innovation Center. With a passion for bridging technology and innovation in the sustainability field, he assists AWS customers in unlocking the potential of Generative AI, turning potential challenges into opportunities for rapid experimentation and innovation. By focusing on scalable, measurable, and impactful uses of advanced AI technologies and streamlining the path to production, he helps customers achieve their sustainability goals.

Rahul Sareen is the GM for Sustainability Solutions and GTM at AWS. Rahul leads a team of high-performing individuals, including sustainability strategists, GTM specialists, and technology architects, who create great business outcomes for customers’ sustainability goals (everything from carbon emission tracking, sustainable packaging and operations, and the circular economy to renewable energy). Rahul’s team provides technical expertise (ML, GenAI, IoT) to solve sustainability use cases.

Coix: A JAX-based AI Framework Designed for Composing Probabilistic Programs and Performing Inference on Them

In probabilistic programming, developers often face the challenge of efficiently composing and performing inference on intricate probabilistic programs. A recent release, Coix (COmbinators In jaX), has emerged as a flexible and backend-agnostic solution to address this. Coix offers a comprehensive set of program transformations known as inference combinators, enabling compositional inference with probabilistic programs.

One of Coix’s standout features is its support for multiple backends, including numpyro and oryx. This versatility allows developers to choose the backend that best fits their needs and seamlessly switch between them as necessary. Moreover, Coix comes equipped with a range of pre-implemented losses and utility functions, empowering users to effortlessly implement and execute various inference algorithms straight out of the box.

The framework comprises several main components, each serving a specific purpose. 

coix.api module: Implements program combinators, providing a high-level interface for composing probabilistic programs.

coix.core module: Provides basic program transformations to modify the behavior of stochastic programs, increasing their flexibility and adaptability.

coix.loss module: Offers common objectives for variational inference, simplifying the process of optimizing probabilistic models.

coix.algo module: This module includes example inference algorithms, serving as valuable resources for developers looking to explore and understand the framework’s capabilities.

With its modular architecture, Coix facilitates the seamless integration of additional backends via the coix.register_backend utility. This extensibility ensures that the framework remains adaptable to evolving requirements and preferences within the probabilistic programming community.
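To ground this, here is a minimal probabilistic program written against the numpyro backend (one of the backends Coix supports). It is a generic sketch of the kind of program Coix’s combinators operate on; the composition step itself is deliberately omitted, since exact combinator names and signatures should be taken from the Coix documentation.

# A small numpyro program of the kind Coix composes; running it under seed/trace handlers
# mirrors how a backend executes a program before combinators reweight or extend it.
import jax
import numpyro
import numpyro.distributions as dist


def model(obs=None):
    # Latent Gaussian mean with a Gaussian observation site.
    mu = numpyro.sample("mu", dist.Normal(0.0, 1.0))
    return numpyro.sample("x", dist.Normal(mu, 0.5), obs=obs)


seeded = numpyro.handlers.seed(model, rng_seed=jax.random.PRNGKey(0))
trace = numpyro.handlers.trace(seeded).get_trace()
print({name: site["value"] for name, site in trace.items()})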

In conclusion, Coix represents a significant advancement in probabilistic programming. It offers a user-friendly and versatile framework for composing probabilistic programs and easily performing inference. With its rich feature set, support for multiple backends, and emphasis on flexibility and extensibility, Coix is poised to become a valuable tool for researchers and practitioners alike in probabilistic modeling and inference.
The post Coix: A JAX-based AI Framework Designed for Composing Probabilistic Programs and Performing Inference on Them appeared first on MarkTechPost.

COCONut: A High-Quality, Large-Scale Dataset for Next-Gen Segmentation Models

Computer vision has advanced significantly in recent decades, thanks in large part to comprehensive benchmark datasets like COCO. However, nearly a decade after its introduction, COCO’s suitability as a benchmark for modern AI models is being questioned. Its annotations may contain biases and nuances reflecting the early stages of computer vision research. With model performance plateauing on COCO, there are concerns about overfitting to the dataset’s specific characteristics, potentially limiting real-world applicability.

To modernize COCO segmentation, the researchers behind this paper propose COCONut – a novel, large-scale universal segmentation dataset. Unlike previous attempts at creating large datasets that often compromised label accuracy for scale, COCONut features human-verified mask labels for 383K images. Imagine having to manually annotate millions of objects in images – it would take years! COCONut solves this challenge through an innovative assisted-manual annotation pipeline leveraging neural networks to augment human annotators.

The pipeline involves four key stages: machine-generated prediction, human inspection and editing, mask generation/refinement, and expert quality verification. Different neural models handle ‘thing’ (countable objects) and ‘stuff’ (amorphous regions) classes at each stage, ensuring high-quality annotations.

But how does this assisted-manual pipeline actually work? In the first stage, a bounding box detector and a mask segmenter generate initial proposals for ‘thing’ and ‘stuff’ classes, respectively. Human annotators then inspect these proposals, editing or adding new ones as needed. The refined boxes and points are fed into separate modules to generate final segmentation masks. Lastly, expert annotators verify a random sample of these masks, relabeling any that don’t meet stringent quality standards.
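A schematic sketch of that per-image flow is shown below; the function and class names are hypothetical stand-ins for illustration, not the authors’ code.

# Schematic sketch of the assisted-manual pipeline for one image; names are hypothetical.
import random


def annotate_image(image, box_detector, mask_segmenter, annotator, expert, audit_rate=0.1):
    # Stage 1: machine-generated proposals ('thing' boxes and 'stuff' masks).
    thing_proposals = box_detector(image)
    stuff_proposals = mask_segmenter(image)

    # Stage 2: human inspection -- annotators edit, delete, or add proposals.
    thing_proposals = annotator.review(image, thing_proposals)
    stuff_proposals = annotator.review(image, stuff_proposals)

    # Stage 3: refined boxes/points are turned into final segmentation masks.
    masks = mask_segmenter.refine(image, thing_proposals + stuff_proposals)

    # Stage 4: experts verify a random sample and relabel masks below the quality bar.
    if random.random() < audit_rate:
        masks = expert.verify_and_relabel(image, masks)
    return masks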

To scale up the dataset size while maintaining quality, the researchers built a data engine. It uses the annotated data to iteratively retrain the neural networks, generating improved proposals for the annotation pipeline. This positive feedback loop, coupled with additional images from other datasets, culminated in the COCONut-L split with 358K images and 4.75M masks.

The researchers conducted a thorough analysis comparing COCONut annotations to purely manual ones. Their expert annotators exhibited high agreement on both ‘thing’ and ‘stuff’ masks. Meanwhile, the assisted-manual pipeline significantly accelerated annotation speed, especially for ‘thing’ classes. COCONut is available in three sizes – COCONut-S (118K images), COCONut-B (242K images), and COCONut-L (358K images with 4.75M masks). Quantitative results showcase consistent improvements across various neural architectures as the training set size increases from COCONut-S to COCONut-L.

Interestingly, while larger pseudo-label datasets provided minimal gains, training on the fully human-annotated COCONut-B yielded the most significant performance boost. This underscores the importance of human-annotated data for training robust segmentation models.

COCONut represents a significant step forward in modernizing the COCO benchmark. With its meticulous human-verified annotations and a rigorously curated 25K image validation set (COCONut-val), it promises to be a more challenging testbed for evaluating contemporary segmentation models. The open-source release of COCONut paves the way for developing more capable and unbiased computer vision systems applicable to real-world scenarios.

Check out the Paper and Project. All credit for this research goes to the researchers of this project.
The post COCONut: A High-Quality, Large-Scale Dataset for Next-Gen Segmentation Models appeared first on MarkTechPost.

MuPT: A Series of Pre-Trained AI Models for Symbolic Music Generation that Sets the Standard for Training Open-Source Symbolic Music Foundation Models

In the ever-expanding landscape of artificial intelligence, Large Language Models (LLMs) have emerged as versatile tools, making significant strides across various domains. As they venture into multimodal realms like visual and auditory processing, their capacity to comprehend and represent complex data, from images to speech, becomes increasingly indispensable. Nevertheless, this expansion brings forth many challenges, particularly in developing efficient tokenization techniques for diverse data types, such as images, videos, and audio streams.

Among the myriad applications of LLMs, the domain of music poses unique challenges that necessitate innovative approaches. Despite achieving remarkable musical performance, these models often fall short of capturing the structural coherence crucial for aesthetically pleasing compositions. The reliance on the Musical Instrument Digital Interface (MIDI) format presents inherent limitations, hindering the readability and faithful representation of musical structure.

Addressing these challenges, a team of researchers from M-A-P, the University of Waterloo, HKUST, the University of Manchester, and other institutions has proposed integrating ABC notation, offering a promising alternative to overcome the constraints imposed by MIDI formats. Advocates for this approach highlight ABC notation’s inherent readability and structural coherence, underscoring its potential to enhance the fidelity of musical representations. By fine-tuning LLMs with ABC notation and leveraging techniques like instruction tuning, the researchers aim to elevate the models’ musical output capabilities.

Their ongoing research extends beyond mere adaptation to proposing a standardized training approach tailored explicitly for symbolic music generation tasks. By employing a transformer decoder-only architecture suitable for both single- and multi-track music generation, they aim to tackle inherent discrepancies in representing musical measures. Their proposed SMT-ABC notation facilitates a deeper understanding of each measure’s expression across multiple tracks, mitigating issues stemming from the traditional next-token-prediction paradigm.
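For readers unfamiliar with ABC notation, the toy snippet below (not taken from the paper) shows how a tune is just readable text; the character-level tokenization is a naive stand-in for illustration, not the authors’ SMT-ABC scheme.

# Toy ABC tune and a naive character-level tokenization; a decoder-only model simply
# trains with next-token prediction over sequences like this.
abc_tune = """X:1
T:Toy Scale Tune
M:4/4
L:1/8
K:C
CDEF GABc | cBAG FEDC |"""

tokens = list(abc_tune)                      # character-level tokens
vocab = sorted(set(tokens))
token_ids = [vocab.index(t) for t in tokens]
print(f"{len(tokens)} tokens, vocabulary size {len(vocab)}")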

Furthermore, their investigation reveals that additional training epochs yield tangible benefits for the ABC Notation model, indicating a positive correlation between repeated data exposure and model performance. They introduce the SMS Law to elucidate this phenomenon, which explores how scaling up training data influences model performance, particularly concerning validation loss. Their findings provide valuable insights into optimizing training strategies for symbolic music generation models, paving the way for enhanced musical fidelity and creativity in AI-generated compositions.
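As a generic illustration of this kind of data-scaling analysis (not the paper’s SMS Law itself), one can fit a saturating power law to validation loss versus training set size; the data points below are made up.

# Fit a saturating power law, loss(n) = a * n**(-b) + c, to hypothetical data points.
import numpy as np
from scipy.optimize import curve_fit

train_tokens = np.array([1e8, 3e8, 1e9, 3e9, 1e10])      # hypothetical training set sizes
val_loss = np.array([1.92, 1.74, 1.58, 1.49, 1.43])      # hypothetical validation losses


def scaling_curve(n, a, b, c):
    return a * n ** (-b) + c        # loss decays toward an irreducible floor c


params, _ = curve_fit(scaling_curve, train_tokens, val_loss, p0=[10.0, 0.1, 1.0], maxfev=10000)
print("fitted (a, b, c):", params)
print("predicted loss at 3e10 tokens:", scaling_curve(3e10, *params))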

Their research underscores the importance of continuous innovation and refinement in developing AI models for music generation. By delving into the nuances of symbolic music representation and training methodologies, they strive to push the boundaries of what is achievable in AI-generated music. Through ongoing exploration of novel tokenization techniques, such as ABC notation, and meticulous optimization of training processes, they aim to unlock new levels of structural coherence and expressive richness in AI-generated compositions. Ultimately, their efforts not only contribute to advancing the field of AI in music but also hold the promise of enhancing human-AI collaboration in creative endeavors, ushering in a new era of musical exploration and innovation.

Check out the Paper. All credit for this research goes to the researchers of this project.
The post MuPT: A Series of Pre-Trained AI Models for Symbolic Music Generation that Sets the Standard for Training Open-Source Symbolic Music Foundation Models appeared first on MarkTechPost.

3 Ways to Run Llama 3 on Your PC or Mac

Running Llama 3 locally on your PC or Mac has become more accessible thanks to various tools that leverage this powerful language model’s open-source capabilities. Below are three effective methods to install and run Llama 3, each catering to different user needs and technical expertise.

1. Using Ollama

Supported Platforms: MacOS, Ubuntu, Windows (Preview)

Steps:

Download Ollama from the official site.

To run Llama 3, use the command: ‘ollama run llama3’. This command downloads the 8B instruct model by default. You can specify a different model by adding a tag, like ‘ollama run llama3:70b-instruct’ for specific versions.

ollama run llama3                   # default: pulls and runs the 8B instruct model
ollama run llama3:70b-text          # 70B base (text) model
ollama run llama3:70b-instruct      # 70B instruct model

Note: If you want to integrate a chatbot UI similar to ChatGPT, further configurations are needed, possibly involving the OpenWebUI project.
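If you also want to call the locally running model from code rather than the terminal, Ollama exposes a local API; the sketch below assumes the optional Python client is installed (pip install ollama) and the Ollama server is running with the llama3 model pulled.

# Query the local Llama 3 model through Ollama's Python client (assumes: pip install ollama
# and a running Ollama server with the llama3 model available).
import ollama

response = ollama.chat(
    model="llama3",
    messages=[{"role": "user", "content": "Summarize what Llama 3 is in one sentence."}],
)
print(response["message"]["content"])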

2. Using LM Studio

Supported Platforms: MacOS, Ubuntu, Windows

Features: LM Studio is built on the llama.cpp project and supports various models, such as ggml-format Llama, MPT, and StarCoder models from Hugging Face.

Steps:

Download LM Studio from its website.

Install it according to the provided system requirements.

LM Studio features a built-in chat interface, enhancing user interaction.
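Beyond the built-in chat interface, LM Studio can also expose a local, OpenAI-compatible server; the sketch below assumes that server is enabled on its usual default port (1234) and uses a placeholder model identifier – check your own setup before running it.

# Call LM Studio's local OpenAI-compatible server (enable the local server in the app first;
# the port and model identifier are placeholders/defaults to verify against your setup).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")  # key is unused locally

completion = client.chat.completions.create(
    model="local-model",  # placeholder; LM Studio serves whichever model you have loaded
    messages=[{"role": "user", "content": "Give me one fun fact about llamas."}],
)
print(completion.choices[0].message.content)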

3. Using GPT4All

Supported Platforms: MacOS, Ubuntu, Windows

Utility: GPT4All offers a versatile setup to run various open-source LLMs.

Steps: Go to the GPT4All website, then download and install the desktop application for your platform.

This method typically involves more DIY setups and might require familiarity with programming environments and software dependencies.

Conclusion
Each method provides a unique approach to running Llama 3 on your PC or Mac, catering to different levels of technical expertise and user needs. By following the outlined steps and using the provided tools, you can effectively harness Llama 3’s capabilities locally.
The post 3 Ways to Run Llama 3 on Your PC or Mac appeared first on MarkTechPost.

Formal Interaction Model (FIM): A Mathematics-based Machine Learning Model that Formalizes How AI and Users Shape One Another

Machine learning has become an important domain that has contributed to developing platforms and products that are data-driven, adaptive, and intelligent. These AI systems help shape their users, and in turn, users shape the systems. A popular example, Content Recommender Systems (CRS), interact with viewers and creators and facilitate algorithmic curation and personalization. CRS interactions can affect downstream recommendations by shaping viewer preferences and the content available on the platform. Early recommender designs helped users navigate songs and videos through email lists, whereas large online platforms now rely on modern, algorithmically personalized designs.

Although these AI systems are helpful, their design and evaluation rarely make explicit how systems and users shape one another, and this problem appears across multiple learning paradigms. For example, training on a large static dataset in a supervised learning setting fails to capture how the AI system transforms the environment in which it operates. Moreover, deployed AI systems can degrade performance and harm society at scale through distribution shifts. Standard Reinforcement Learning (RL) formulations can likewise fail to capture key interactions and dynamics between the AI system and its users. The paper discussed here addresses these shortcomings.

Researchers from Cornell University, the University of California, Princeton University, and the University of Texas at Austin proposed Formal Interaction Models (FIM). This mathematical model formalizes how AI and users shape one another. FIM is a coupled dynamical system between the AI system and users that enhances the AI system’s design and evaluation. It includes four major use cases: (a) it specifies interactions for implementation, (b) it monitors interactions with the help of empirical analysis, (c) it anticipates societal impacts using counterfactual analysis, and (d) it controls societal impacts through interventions. Design axes such as style, granularity, mathematical complexity, and measurability are considered carefully during the model’s design.
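To make the idea of a coupled dynamical system concrete, here is a toy simulation (not the paper’s formalism) in which a recommender’s policy adapts to observed clicks while the user’s preferences drift toward what is shown.

# Toy coupled dynamics between a recommender policy and a user's preferences; purely
# illustrative, with made-up update rules and learning rates.
import numpy as np

rng = np.random.default_rng(0)
n_items = 5
preference = rng.random(n_items)
preference /= preference.sum()                       # user state
policy = np.full(n_items, 1.0 / n_items)             # AI system state

for t in range(50):
    item = rng.choice(n_items, p=policy)             # system recommends an item
    clicked = rng.random() < preference[item]        # user responds
    policy = policy + 0.05 * clicked * np.eye(n_items)[item]       # system adapts to clicks
    policy /= policy.sum()
    preference = 0.98 * preference + 0.02 * np.eye(n_items)[item]  # preferences drift
    preference /= preference.sum()

print("final policy:", np.round(policy, 2))
print("final preference:", np.round(preference, 2))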

FIM helps create new metrics that capture these societal impacts, which in turn improves the design of objectives. These new metrics can be optimized through supervised learning or RL-based algorithms to control societal effects. Some societal impacts can be evaluated directly from a single FIM parameter, while others arise as complex combinations of multiple parameters. For example, when proposing metrics, one should emphasize measuring value rather than raw engagement. The paper also discusses optimizing downstream user welfare and ecosystem health with tools ranging from mechanism design to recommender system design.

The researchers performed analyses addressing various limitations, focusing mostly on anticipating and controlling societal impacts. The model designs used in the analysis are fairly homogeneous within each interaction type, with a clear separation between viewer and creator interactions. Moreover, fully dynamic models are not used, since they introduce feedback loops: viewer feedback on recommended content is fed back into the recommender system and used to estimate viewer utilities.

In conclusion, researchers from four universities proposed Formal Interaction Models (FIM), a mathematical model that formalizes how AI and users shape one another. FIM is a coupled dynamical system between the AI system and users that enhances AI system design and evaluation. The paper presents four major use cases of FIM and discusses the role of model style, granularity, mathematical complexity, and measurability. The researchers use the language of dynamical systems to highlight limitations of these use cases as directions for future work.

Check out the Paper. All credit for this research goes to the researchers of this project.
The post Formal Interaction Model (FIM): A Mathematics-based Machine Learning Model that Formalizes How AI and Users Shape One Another appeared first on MarkTechPost.