September 2023 - Page 8 of 8

Unveil The Secrets Of Anatomical Segmentation With HybridGNet: An AI E …

Posted on September 5, 2023 by i-genie

Recent advancements in deep neural networks have enabled new approaches to address anatomical segmentation. For instance, state-of-the-art performance in the anatomical segmentation of biomedical images has been attained by deep convolutional neural networks (CNNs). Conventional strategies adopt standard encoder-decoder CNN architectures to predict pixel-level segmentation using annotated datasets. While this approach suits scenarios where the topology isn’t preserved across individuals, such as lesion segmentation, it might not be ideal for anatomical structures with regular topology. Deep segmentation networks are often trained to minimize pixel-level loss functions, which might not ensure anatomical plausibility due to insensitivity to global shape and topology. This can result in artifacts like fragmented structures and topological inconsistencies.

To mitigate these issues, incorporating prior knowledge and shape constraints becomes crucial, especially for downstream tasks like disease diagnosis and therapy planning. Instead of dense pixel-level masks, alternatives like statistical shape models or graph-based representations offer a more natural way to include topological constraints. Graphs, in particular, provide a means to represent landmarks, contours, and surfaces, enabling the incorporation of topological correctness. Geometric deep learning has extended CNNs to non-Euclidean domains, facilitating the development of discriminative and generative models for graph data. These advancements enable accurate predictions and the generation of realistic graph structures aligned with specific distributions.

Following the mentioned considerations, the novel HybridGNet architecture has been introduced to exploit the benefits of landmark-based segmentation in standard convolutions for image feature encoding.

The architecture overview is presented in the figure below.

HybridGNet is coupled with generative models based on graph neural networks (GCNNs) to create anatomically accurate segmented structures. It processes input images through standard convolutions and generates landmark-oriented segmentations by sampling a “bottleneck latent distribution,” a compact encoded representation containing the essential information about the image. Sampling from this distribution allows the model to create diverse and plausible segmented outputs based on the encoded image features. After sampling, follows reshaping and graph domain convolutions.

Additionally, under the hypothesis that local image features may help to produce more accurate estimates of landmark positions, an Image-to-Graph Skip Connection (IGSC) module is presented. Analogous to UNet skip connections, the IGSC module, combined with graph unpooling operations, allows feature maps to flow from encoder to decoder, thus enhancing the model’s capacity to recover fine details.

Sample outcome results selected from the study are depicted in the image below. These visuals provide a comparative overview between HybridGNet and state-of-the-art approaches.

This was the summary of HybridGNet, a novel AI encoder-decoder neural architecture that leverages standard convolutions for image feature encoding and graph convolutional neural networks (GCNNs) to decode plausible representations of anatomical structures. If you are interested and want to learn more about it, please feel free to refer to the links cited below.

Check out the Paper and Github. All Credit For This Research Goes To the Researchers on This Project. Also, don’t forget to join our 29k+ ML SubReddit, 40k+ Facebook Community, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.

If you like our work, you will love our newsletter..

The post Unveil The Secrets Of Anatomical Segmentation With HybridGNet: An AI Encoder-Decoder For Plausible Anatomical Structures Decoding appeared first on MarkTechPost.

XLang NLP Lab Researchers Propose Lemur: The State-of-the-Art Open Pre …

Posted on September 4, 2023 by i-genie

In a world increasingly driven by the intersection of language and technology, the demand for versatile and powerful language models has never been greater. Traditional large language models (LLMs) have excelled in textual comprehension or coding tasks but seldom managed to strike a harmonious balance between the two. This imbalance has left a gap in the market for models that can seamlessly navigate textual reasoning and coding proficiency. Enter Lemur and Lemur-chat, two groundbreaking contributions to the realm of open pre-trained and supervised fine-tuned LLMs that aim to bridge this gap.

Creating language models that can proficiently handle both text and code has been a long-standing challenge. Existing LLMs have typically been specialized for textual comprehension or coding tasks, but seldom both. This specialization has left developers and researchers grappling with the need to choose between models that excel in one area while falling short in the other. Consequently, a pressing need has arisen for LLMs that can offer a multifaceted skill set encompassing understanding, reasoning, planning, coding, and context grounding.

While some solutions exist in the form of traditional LLMs, their limitations have remained evident. The industry has lacked models that can truly balance the intricate demands of both textual and code-related tasks. This has created a void in the landscape of language model agents, where an integrated approach to understanding, reasoning, and coding is essential.

The Lemur project, spearheaded by XLang Lab in collaboration with Salesforce Research, seeks to address this critical gap in language model technology. Lemur and Lemur-chat represent a pioneering effort to develop open, pretrained, and supervised fine-tuned LLMs that excel in both text and code-related tasks. The cornerstone of this endeavor is the extensive pretraining of Llama 2 on a vast corpus of ~100 billion lines of code-intensive data. This pre-training phase is followed by supervised fine-tuning on ~300,000 instances of public instructional and dialog data. The result is a language model with enhanced coding and grounding abilities while retaining competitive textual reasoning and knowledge performance.

The performance metrics of Lemur and Lemur-chat are a testament to their prowess. Lemur stands out as it surpasses other open-source language models on coding benchmarks, demonstrating its coding proficiency. Simultaneously, it maintains its competitive edge in textual reasoning and knowledge-based tasks, showcasing its versatile skill set. Meanwhile, Lemur-chat significantly outperforms other open-source supervised fine-tuned models across various dimensions, indicating its exceptional capabilities in bridging the gap between text and code in conversational contexts.

The Lemur project represents a collaborative research effort with contributions from both XLang Lab and Salesforce Research, with support from generous gifts from Salesforce Research, Google Research, and Amazon AWS. While the journey towards a balanced open-source language model is ongoing, Lemur’s contributions have already begun reshaping the language model technology landscape. By offering a model that excels in both text and code-related tasks, Lemur provides a powerful tool for developers, researchers, and organizations seeking to navigate the increasingly intricate intersection of language and technology.

In conclusion, the Lemur project stands as a beacon of innovation in the world of language models. Its ability to harmoniously balance text and code-related tasks has addressed a longstanding challenge in the field. As Lemur continues to evolve and set new benchmarks, it promises to drive further research on agent models and establish a more powerful and balanced foundation for open-source language models. With Lemur, the future of language model technology is brighter and more versatile than ever before.

Check out the Github, HugginFace Page, and Reference Article. All Credit For This Research Goes To the Researchers on This Project. Also, don’t forget to join our 29k+ ML SubReddit, 40k+ Facebook Community, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.

If you like our work, you will love our newsletter..

The post XLang NLP Lab Researchers Propose Lemur: The State-of-the-Art Open Pretrained Large Language Models Balancing Text and Code Capabilities appeared first on MarkTechPost.

Researchers from Inception, MBZUAI, and Cerebras Open-Sourced ‘Jais� …

Posted on September 4, 2023 by i-genie

Large language models like GPT-3 and their impact on various aspects of society are a subject of significant interest and debate. Large language models have significantly advanced the field of NLP. They have improved the accuracy of various language-related tasks, including translation, sentiment analysis, summarization, and question-answering. Chatbots and virtual assistants powered by large language models are becoming more sophisticated and capable of handling complex conversations. They are used in customer support, online chat services, and even companionship for some users.

Building Arabic Large Language Models (LLMs) presents unique challenges due to the characteristics of the Arabic language and the diversity of its dialects. Similar to large language models in other languages, Arabic LLMs may inherit biases from the training data. Addressing these biases and ensuring the responsible use of AI in Arabic contexts is an ongoing concern.

Researchers at Inception, Cerebras, and Mohamed bin Zayed University of Artificial Intelligence ( UAE ) introduced Jais and Jais-chat, a new Arabic language-based Large Language Model. Their model is based on the GPT-3 generative pretraining architecture and uses only 13B parameters.

Their primary challenge was to obtain high-quality Arabic data for training the model. Compared to English data, which has corpora of up to two trillion tokens, they were readily available, but the Arabic corpora were significantly smaller. Corpora are large, structured collections of texts used in linguistics, natural language processing (NLP), and text analysis for research and language model training. Corpora serve as valuable resources for studying language patterns, semantics, grammar, and more.

They trained bilingual models to resolve this by augmenting the limited Arabic pretraining data with abundant English pretraining data. They pretrained Jais on 395 billion tokens, including 72 billion Arabic and 232 billion English tokens. They developed a specialized Arabic text processing pipeline that includes thorough data filtering and cleaning to produce high-quality Arabic data.

They say that their model’s pretrained and fine-tuned capabilities outperform all known open-source Arabic models and are comparable to state-of-the-art open-source English models that were trained on larger datasets. Considering the inherent safety concerns of LLMs, they further fine-tune it with safety-oriented instructions. They added extra guardrails in the form of safety prompts, keyword-based filtering, and external classifiers.

They say that Jais represents an important evolution and expansion of the NLP and AI landscape in the Middle East. It advances the Arabic language understanding and generation, empowering local players with sovereign and private deployment options and nurturing a vibrant ecosystem of applications and innovation; this work supports a broader strategic initiative of digital and AI transformation to usher in an open, more linguistically inclusive, and culturally-aware era.

Check out the Paper and Reference Article. All Credit For This Research Goes To the Researchers on This Project. Also, don’t forget to join our 29k+ ML SubReddit, 40k+ Facebook Community, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.

If you like our work, you will love our newsletter..

The post Researchers from Inception, MBZUAI, and Cerebras Open-Sourced ‘Jais’: The World’s Most Advanced Arabic Large Language Model appeared first on MarkTechPost.

Meet AnomalyGPT: A Novel IAD Approach Based on Large Vision-Language M …

Posted on September 3, 2023 by i-genie

On various Natural Language Processing (NLP) tasks, Large Language Models (LLMs) such as GPT-3.5 and LLaMA have displayed outstanding performance. The capacity of LLMs to interpret visual information has more recently been expanded by cutting-edge techniques like MiniGPT-4, BLIP-2, and PandaGPT by aligning visual aspects with text features, ushering in a huge shift in the field of artificial general intelligence (AGI). The potential of LVLMs in IAD tasks is constrained even though they have been pre-trained on large amounts of data obtained from the Internet. Additionally, their domain-specific knowledge is only moderately developed, and they need more sensitivity to local features inside objects. The IAD assignment tries to find and pinpoint abnormalities in photographs of industrial products.

Models must be trained only on normal samples to identify anomalous samples that depart from normal samples since real-world examples are uncommon and unpredictable. Most current IAD systems only offer anomaly scores for test samples and ask for manually defining criteria to tell apart normal from anomalous instances for each class of objects, making them unsuitable for actual production settings. Researchers from Chinese Academy of Sciences, University of Chinese Academy of Sciences, Objecteye Inc., and Wuhan AI Research present AnomalyGPT, a unique IAD methodology based on LVLM, as shown in Figure 1, as neither existing IAD approaches nor LVLMs can adequately handle the IAD problem. Without requiring manual threshold adjustments, AnomalyGPT can identify anomalies and their locations.

Figure 1 shows a comparison of our AnomalyGPT with existing IAD techniques and LVLMs.

Additionally, their approach may offer picture information and promote interactive interaction, allowing users to pose follow-up queries depending on their requirements and responses. With just a few normal samples, AnomalyGPT can also learn in context, allowing for quick adjustment to new objects. They optimize the LVLM using synthesized anomalous visual-textual data and incorporating IAD expertise. Direct training using IAD data, however, needs to be improved. Data scarcity is the first. Pre-trained on 160k photos with associated multi-turn conversations, including techniques like LLaVA and PandaGPT. However, the small sample sizes of the IAD datasets currently available make direct fine-tuning vulnerable to overfitting and catastrophic forgetting.

To fix this, they fine-tune the LVLM using prompt embeddings rather than parameter fine-tuning. After picture inputs, more prompt embeddings are inserted, adding additional IAD information to the LVLM. The second difficulty has to do with fine-grained semantics. They suggest a simple, visual-textual feature-matching-based decoder to get pixel-level anomaly localization findings. The decoder’s outputs are made available to the LVLM and the original test pictures through prompt embeddings. This enables the LVLM to use both the raw image and the decoder’s outputs to identify anomalies, increasing the precision of its judgments. On the MVTec-AD and VisA databases, they undertake comprehensive experiments.

They attain an accuracy of 93.3%, an image-level AUC of 97.4%, and a pixel-level AUC of 93.1% with unsupervised training on the MVTec-AD dataset. They attain an accuracy of 77.4%, an image-level AUC of 87.4%, and a pixel-level AUC of 96.2% when one shot is transferred to the VisA dataset. On the other hand, one-shot transfer to the MVTec-AD dataset following unsupervised training on the VisA dataset produced an accuracy of 86.1%, an image-level AUC of 94.1%, and a pixel-level AUC of 95.3%.

The following is a summary of their contributions:

• They present the innovative use of LVLM for handling IAD duty. Their approach facilitates multi-round discussions and detects and localizes anomalies without manually adjusting thresholds. Their work’s lightweight, visual-textual feature-matching-based decoder addresses the limitation of the LLM’s weaker discernment of fine-grained semantics. It alleviates the constraint of LLM’s restricted ability to generate text outputs. To their knowledge, they are the first to apply LVLM to industrial anomaly detection successfully.

• To preserve the LVLM’s intrinsic capabilities and enable multi-turn conversations, they train their model concurrently with the data used during LVLM pre-training and use prompt embeddings for fine-tuning.

• Their approach maintains strong transferability and can do in-context few-shot learning on new datasets, producing excellent results.

Check out the Paper and Project. All Credit For This Research Goes To the Researchers on This Project. Also, don’t forget to join our 29k+ ML SubReddit, 40k+ Facebook Community, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.

If you like our work, you will love our newsletter..

The post Meet AnomalyGPT: A Novel IAD Approach Based on Large Vision-Language Models (LVLM) to Detect Industrial Anomalies appeared first on MarkTechPost.

This AI Research Paper Presents a Comprehensive Survey of Deep Learnin …

Posted on September 3, 2023 by i-genie

If I ask you, “Where are you now?’” or “What do your surroundings look like?” you will immediately be able to answer owing to a unique ability known as multisensory perception in humans that allows you to perceive your motion and your surrounding environment ensuring you have complete spatial awareness. But think as if the same question is posed to a robot: how would it approach the challenge?

The issue is if this robot does not have a map, it cannot know where it is, and if it does not know what its surroundings look like, neither can it create a map. Essentially, making this a ‘who came first, chicken or egg?’ problem which in the machine learning world in this context is termed as a localization and mapping problem.

“Localization” is the capability to acquire internal system information related to a robot’s motion, including its position, orientation, and speed. On the other hand, “mapping” pertains to the ability to perceive external environmental conditions, encompassing aspects such as the shape of the surroundings, their visual characteristics, and semantic attributes. These functions can operate independently, with one focused on internal states and the other on external conditions, or they can work together as a single system known as Simultaneous Localization and Mapping (SLAM).

The existing challenges with algorithms such as image-based relocalization, visual odometry, and SLAM include imperfect sensor measurements, dynamic scenes, adverse lighting conditions, and real-world constraints that somewhat hinder their practical implementation. The image above demonstrates how individual modules can be integrated into a deep learning-based SLAM system. This piece of research presents a comprehensive survey on how deep learning-based approaches and traditional approaches and simultaneously answers two essential questions:

Is deep learning promising for visual localization and mapping?

Researchers believe three properties listed below could make deep learning a unique direction for a general-purpose SLAM system in the future.

First, deep learning offers powerful perception tools that can be integrated into the visual SLAM front end to extract features in challenging areas for odometry estimation or relocalization and provide dense depth for mapping.

Second, deep learning empowers robots with advanced comprehension and interaction capabilities. Neural networks excel at bridging abstract concepts with human-understandable terms, like labeling scene semantics within a mapping or SLAM systems, which are typically challenging to describe using formal mathematical methods.

Finally, learning methods allow SLAM systems or individual localization/mapping algorithms to learn from experience and actively exploit new information for self-learning.

How can deep learning be applied to solve the problem of visual localization and mapping?

Deep learning is a versatile tool for modeling various aspects of SLAM and individual localization/mapping algorithms. For instance, it can be employed to create end-to-end neural network models that directly estimate pose from images. It is particularly beneficial in handling challenging conditions like featureless areas, dynamic lighting, and motion blur, where conventional modeling methods may struggle.

Deep learning is used to solve association problems in SLAM. It aids in relocalization, semantic mapping, and loop-closure detection by connecting images to maps, labeling pixels semantically, and recognizing relevant scenes from previous visits.

Deep learning is leveraged to discover features relevant to the task of interest automatically. By exploiting prior knowledge, e.g., the geometry constraints, a self-learning framework can automatically be set up for SLAM to update parameters based on input images.

It may be pointed out that deep learning techniques rely on large, accurately labeled datasets to extract meaningful patterns but may have difficulty generalizing to unfamiliar environments. These models lack interpretability, often functioning as black boxes. Additionally, localization and mapping systems can be computationally intensive while highly parallelizable unless model compression techniques are applied.

Check out the Paper. All Credit For This Research Goes To the Researchers on This Project. Also, don’t forget to join our 29k+ ML SubReddit, 40k+ Facebook Community, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.

If you like our work, you will love our newsletter..

The post This AI Research Paper Presents a Comprehensive Survey of Deep Learning for Visual Localization and Mapping appeared first on MarkTechPost.

Unlocking the Power of Diversity in Neural Networks: How Adaptive Neur …

Posted on September 3, 2023 by i-genie

A neural network is a method in artificial intelligence that teaches computers to process data in a way inspired by the human brain. It uses interconnected nodes or neurons in a layered structure that resembles the human brain. Artificial neurons are arranged into layers to form neural networks, which are used for various tasks such as pattern recognition, classification, regression, and more. These neurons form solid connections by altering numerical weights and biases throughout training sessions.

Despite the advancements of these neural networks, they have a limitation. They are made up of a large number of neurons of similar types. The number and strength of connections between those identical neurons can change till the network learns. However, once the network is optimized, these fixed connections define its architecture and functioning, which cannot be changed.

Consequently, the researchers have developed a method that can enhance the abilities of artificial intelligence. It allows artificial intelligence to look inward at its structure and fine-tune its neural network. Studies have shown that diversifying the activation functions can overcome limitations and enable the model to work efficiently.

They tested AI on diversity. William Ditto, professor of physics at North Carolina State University and director of NC State’s Nonlinear Artificial Intelligence Laboratory (NAIL), said that they have created a test system with a non-human intelligence, an artificial intelligence(AI), to see if the AI would choose diversity over the lack of diversity and if its choice would improve the performance of the AI. Further, he said that the key was allowing the AI to look inward and learn how it learns.

Neural networks that allow neurons to learn their activation functions autonomously tend to exhibit rapid diversification and perform better than their homogeneous counterparts in tasks such as image classification and nonlinear regression. On the other hand, Ditto’s team granted their AI the ability to autonomously determine the count, configuration, and connection strengths among neurons in its neural network. This approach allowed the creation of sub-networks composed of various neuron types and connection strengths within the network as it learned.

Ditto said that they gave AI the ability to look inward and decide whether it needed to modify the composition of its neural network. Essentially, they gave it the control knob for its brain. So, it can solve the problem, look at the result, and change the type and mixture of artificial neurons until it finds the most advantageous one. He called it meta-learning for AI. Their AI could also decide between diverse or homogenous neurons. He further said that they found that the AI chose diversity in every instance to strengthen its performance.

The researchers tested the system on a standard numerical classifying task and found that the system’s accuracy increased with the increase in neurons and diversity. The researchers said the homogeneous AI achieved an accuracy rate of 57% in number identification, whereas the meta-learning, diverse AI achieved an impressive 70% accuracy.

The researchers said that in the future, they might focus on improving the performance by optimizing learned diversity by adjusting hyperparameters. Additionally, they will apply the acquired diversity to a broader spectrum of regression and classification tasks, diversify the neural networks, and evaluate their robustness and performance across various scenarios.

If you like our work, you will love our newsletter..

The post Unlocking the Power of Diversity in Neural Networks: How Adaptive Neurons Outperform Homogeneity in Image Classification and Nonlinear Regression appeared first on MarkTechPost.

Elevating the generative AI experience: Introducing streaming support …

Posted on September 2, 2023 by i-genie

We’re excited to announce the availability of response streaming through Amazon SageMaker real-time inference. Now you can continuously stream inference responses back to the client when using SageMaker real-time inference to help you build interactive experiences for generative AI applications such as chatbots, virtual assistants, and music generators. With this new feature, you can start streaming the responses immediately when they’re available instead of waiting for the entire response to be generated. This lowers the time-to-first-byte for your generative AI applications.
In this post, we’ll show how to build a streaming web application using SageMaker real-time endpoints with the new response streaming feature for an interactive chat use case. We use Streamlit for the sample demo application UI.
Solution overview
To get responses streamed back from SageMaker, you can use our new InvokeEndpointWithResponseStream API. It helps enhance customer satisfaction by delivering a faster time-to-first-response-byte. This reduction in customer-perceived latency is particularly crucial for applications built with generative AI models, where immediate processing is valued over waiting for the entire payload. Moreover, it introduces a sticky session that will enable continuity in interactions, benefiting use cases such as chatbots, to create more natural and efficient user experiences.
The implementation of response streaming in SageMaker real-time endpoints is achieved through HTTP 1.1 chunked encoding, which is a mechanism for sending multiple responses. This is a HTTP standard that supports binary content and is supported by most client/server frameworks. HTTP chunked encoding supports both text and image data streaming, which means the models hosted on SageMaker endpoints can send back streamed responses as text or image, such as Falcon, Llama 2, and Stable Diffusion models. In terms of security, both the input and output are secured using TLS using AWS Sigv4 Auth. Other streaming techniques like Server-Sent Events (SSE) are also implemented using the same HTTP chunked encoding mechanism. To take advantage of the new streaming API, you need to make sure the model container returns the streamed response as chunked encoded data.
The following diagram illustrates the high-level architecture for response streaming with a SageMaker inference endpoint.

One of the use cases that will benefit from streaming response is generative AI model-powered chatbots. Traditionally, users send a query and wait for the entire response to be generated before receiving an answer. This could take precious seconds or even longer, which can potentially degrade the performance of the application. With response streaming, the chatbot can begin sending back partial inference results as they are generated. This means that users can see the initial response almost instantaneously, even as the AI continues refining its answer in the background. This creates a seamless and engaging conversation flow, where users feel like they’re chatting with an AI that understands and responds in real time.
In this post, we showcase two container options to create a SageMaker endpoint with response streaming: using an AWS Large Model Inference (LMI) and Hugging Face Text Generation Inference (TGI) container. In the following sections, we walk you through the detailed implementation steps to deploy and test the Falcon-7B-Instruct model using both LMI and TGI containers on SageMaker. We chose Falcon 7B as an example, but any model can take advantage of this new streaming feature.
Prerequisites
You need an AWS account with an AWS Identity and Access Management (IAM) role with permissions to manage resources created as part of the solution. For details, refer to Creating an AWS account. If this is your first time working with Amazon SageMaker Studio, you first need to create a SageMaker domain. Additionally, you may need to request a service quota increase for the corresponding SageMaker hosting instances. For the Falcon-7B-Instruct model, we use an ml.g5.2xlarge SageMaker hosting instance. For hosting a Falcon-40B-Instruct model, we use an ml.g5.48xlarge SageMaker hosting instance. You can request a quota increase from the Service Quotas UI. For more information, refer to Requesting a quota increase.
Option 1: Deploy a real-time streaming endpoint using an LMI container
The LMI container is one of the Deep Learning Containers for large model inference hosted by SageMaker to facilitate hosting large language models (LLMs) on AWS infrastructure for low-latency inference use cases. The LMI container uses Deep Java Library (DJL) Serving, which is an open-source, high-level, engine-agnostic Java framework for deep learning. With these containers, you can use corresponding open-source libraries such as DeepSpeed, Accelerate, Transformers-neuronx, and FasterTransformer to partition model parameters using model parallelism techniques to use the memory of multiple GPUs or accelerators for inference. For more details on the benefits using the LMI container to deploy large models on SageMaker, refer to Deploy large models at high performance using FasterTransformer on Amazon SageMaker and Deploy large models on Amazon SageMaker using DJLServing and DeepSpeed model parallel inference. You can also find more examples of hosting open-source LLMs on SageMaker using the LMI containers in this GitHub repo.
For the LMI container, we expect the following artifacts to help set up the model for inference:

serving.properties (required) – Defines the model server settings
model.py (optional) – A Python file to define the core inference logic
requirements.txt (optional) – Any additional pip wheel will need to install

LMI containers can be used to host models without providing your own inference code. This is extremely useful when there is no custom preprocessing of the input data or postprocessing of the model’s predictions. We use the following configuration:

For this example, we host the Falcon-7B-Instruct model. We need to create a serving.properties configuration file with our desired hosting options and package it up into a tar.gz artifact. Response streaming can be enabled in DJL Serving by setting the enable_streaming option in the serving.properties file. For all the supported parameters, refer to Streaming Python configuration.
In this example, we use the default handlers in DJL Serving to stream responses, so we only care about sending requests and parsing the output response. You can also provide an entrypoint code with a custom handler in a model.py file to customize input and output handlers. For more details on the custom handler, refer to Custom model.py handler.
Because we’re hosting the Falcon-7B-Instruct model on a single GPU instance (ml.g5.2xlarge), we set option.tensor_parallel_degree to 1. If you plan to run in multiple GPUs, use this to set the number of GPUs per worker.
We use option.output_formatter to control the output content type. The default output content type is application/json, so if your application requires a different output, you can overwrite this value. For more information on the available options, refer to Configurations and settings and All DJL configuration options.

%%writefile serving.properties
engine=MPI
option.model_id=tiiuae/falcon-7b-instruct
option.trust_remote_code=true
option.tensor_parallel_degree=1
option.max_rolling_batch_size=32
option.rolling_batch=auto
option.output_formatter=jsonlines
option.paged_attention=false
option.enable_streaming=true

To create the SageMaker model, retrieve the container image URI:

image_uri = image_uris.retrieve(
framework=”djl-deepspeed”,
region=sess.boto_session.region_name,
version=”0.23.0″
)

Use the SageMaker Python SDK to create the SageMaker model and deploy it to a SageMaker real-time endpoint using the deploy method:

instance_type = “ml.g5.2xlarge”
endpoint_name = sagemaker.utils.name_from_base(“lmi-model-falcon-7b”)

model = Model(sagemaker_session=sess,
image_uri=image_uri,
model_data=code_artifact,
role=role)

model.deploy(
initial_instance_count=1,
instance_type=instance_type,
endpoint_name=endpoint_name,
container_startup_health_check_timeout=900
)

When the endpoint is in service, you can use the InvokeEndpointWithResponseStream API call to invoke the model. This API allows the model to respond as a stream of parts of the full response payload. This enables models to respond with responses of larger size and enables faster-time-to-first-byte for models where there is a significant difference between the generation of the first and last byte of the response.
The response content type shown in x-amzn-sagemaker-content-type for the LMI container is application/jsonlines as specified in the model properties configuration. Because it’s part of the common data formats supported for inference, we can use the default deserializer provided by the SageMaker Python SDK to deserialize the JSON lines data. We create a helper LineIterator class to parse the response stream received from the inference request:

class LineIterator:
“””
A helper class for parsing the byte stream input.

The output of the model will be in the following format:
“`
b'{“outputs”: [” a”]}n’
b'{“outputs”: [” challenging”]}n’
b'{“outputs”: [” problem”]}n’
…
“`

While usually each PayloadPart event from the event stream will contain a byte array
with a full json, this is not guaranteed and some of the json objects may be split across
PayloadPart events. For example:
“`
{‘PayloadPart’: {‘Bytes’: b'{“outputs”: ‘}}
{‘PayloadPart’: {‘Bytes’: b'[” problem”]}n’}}
“`

This class accounts for this by concatenating bytes written via the ‘write’ function
and then exposing a method which will return lines (ending with a ‘n’ character) within
the buffer via the ‘scan_lines’ function. It maintains the position of the last read
position to ensure that previous bytes are not exposed again.
“””

def __init__(self, stream):
self.byte_iterator = iter(stream)
self.buffer = io.BytesIO()
self.read_pos = 0

def __iter__(self):
return self

def __next__(self):
while True:
self.buffer.seek(self.read_pos)
line = self.buffer.readline()
if line and line[-1] == ord(‘n’):
self.read_pos += len(line)
return line[:-1]
try:
chunk = next(self.byte_iterator)
except StopIteration:
if self.read_pos < self.buffer.getbuffer().nbytes:
continue
raise
if ‘PayloadPart’ not in chunk:
print(‘Unknown event type:’ + chunk)
continue
self.buffer.seek(0, io.SEEK_END)
self.buffer.write(chunk[‘PayloadPart’][‘Bytes’])

With the class in the preceding code, each time a response is streamed, it will return a binary string (for example, b'{“outputs”: [” a”]}n’) that can be deserialized again into a Python dictionary using JSON package. We can use the following code to iterate through each streamed line of text and return text response:

body = {“inputs”: “what is life”, “parameters”: {“max_new_tokens”:400}}
resp = smr.invoke_endpoint_with_response_stream(EndpointName=endpoint_name, Body=json.dumps(body), ContentType=”application/json”)
event_stream = resp[‘Body’]

for line in LineIterator(event_stream):
resp = json.loads(line)
print(resp.get(“outputs”)[0], end=”)

The following screenshot shows what it would look like if you invoked the model through the SageMaker notebook using an LMI container.

Option 2: Implement a chatbot using a Hugging Face TGI container
In the previous section, you saw how to deploy the Falcon-7B-Instruct model using an LMI container. In this section, we show how to do the same using a Hugging Face Text Generation Inference (TGI) container on SageMaker. TGI is an open source, purpose-built solution for deploying LLMs. It incorporates optimizations including tensor parallelism for faster multi-GPU inference, dynamic batching to boost overall throughput, and optimized transformers code using flash-attention for popular model architectures including BLOOM, T5, GPT-NeoX, StarCoder, and LLaMa.
TGI deep learning containers support token streaming using Server-Sent Events (SSE). With token streaming, the server can start answering after the first prefill pass directly, without waiting for all the generation to be done. For extremely long queries, this means clients can start to see something happening orders of magnitude before the work is done. The following diagram shows a high-level end-to-end request/response workflow for hosting LLMs on a SageMaker endpoint using the TGI container.

To deploy the Falcon-7B-Instruct model on a SageMaker endpoint, we use the HuggingFaceModel class from the SageMaker Python SDK. We start by setting our parameters as follows:

hf_model_id = “tiiuae/falcon-7b-instruct” # model id from huggingface.co/models
number_of_gpus = 1 # number of gpus to use for inference and tensor parallelism
health_check_timeout = 300 # Increase the timeout for the health check to 5 minutes for downloading the model
instance_type = “ml.g5.2xlarge” # instance type to use for deployment

Compared to deploying regular Hugging Face models, we first need to retrieve the container URI and provide it to our HuggingFaceModel model class with image_uri pointing to the image. To retrieve the new Hugging Face LLM DLC in SageMaker, we can use the get_huggingface_llm_image_uri method provided by the SageMaker SDK. This method allows us to retrieve the URI for the desired Hugging Face LLM DLC based on the specified backend, session, Region, and version. For more details on the available versions, refer to HuggingFace Text Generation Inference Containers.

llm_image = get_huggingface_llm_image_uri(
“huggingface”,
version=”0.9.3″
)

We then create the HuggingFaceModel and deploy it to SageMaker using the deploy method:

endpoint_name = sagemaker.utils.name_from_base(“tgi-model-falcon-7b”)
llm_model = HuggingFaceModel(
role=role,
image_uri=llm_image,
env={
‘HF_MODEL_ID’: hf_model_id,
# ‘HF_MODEL_QUANTIZE’: “bitsandbytes”, # comment in to quantize
‘SM_NUM_GPUS’: str(number_of_gpus),
‘MAX_INPUT_LENGTH’: “1900”, # Max length of input text
‘MAX_TOTAL_TOKENS’: “2048”, # Max length of the generation (including input text)
}
)

llm = llm_model.deploy(
initial_instance_count=1,
instance_type=instance_type,
container_startup_health_check_timeout=health_check_timeout,
endpoint_name=endpoint_name,
)

The main difference compared to the LMI container is that you enable response streaming when you invoke the endpoint by supplying stream=true as part of the invocation request payload. The following code is an example of the payload used to invoke the TGI container with streaming:

body = {
“inputs”:”tell me one sentence”,
“parameters”:{
“max_new_tokens”:400,
“return_full_text”: False
},
“stream”: True
}

Then you can invoke the endpoint and receive a streamed response using the following command:

from sagemaker.base_deserializers import StreamDeserializer

llm.deserializer=StreamDeserializer()
resp = smr.invoke_endpoint_with_response_stream(EndpointName=llm.endpoint_name, Body=json.dumps(body), ContentType=’application/json’)

The response content type shown in x-amzn-sagemaker-content-type for the TGI container is text/event-stream. We use StreamDeserializer to deserialize the response into the EventStream class and parse the response body using the same LineIterator class as that used in the LMI container section.
Note that the streamed response from the TGI containers will return a binary string (for example, b`data:{“token”: {“text”: ” sometext”}}`), which can be deserialized again into a Python dictionary using a JSON package. We can use the following code to iterate through each streamed line of text and return a text response:

event_stream = resp[‘Body’]
start_json = b'{‘
for line in LineIterator(event_stream):
if line != b” and start_json in line:
data = json.loads(line[line.find(start_json):].decode(‘utf-8’))
if data[‘token’][‘text’] != stop_token:
print(data[‘token’][‘text’],end=”)

The following screenshot shows what it would look like if you invoked the model through the SageMaker notebook using a TGI container.

Run the chatbot app on SageMaker Studio
In this use case, we build a dynamic chatbot on SageMaker Studio using Streamlit, which invokes the Falcon-7B-Instruct model hosted on a SageMaker real-time endpoint to provide streaming responses. First, you can test that the streaming responses work in the notebook as shown in the previous section. Then, you can set up the Streamlit application in the SageMaker Studio JupyterServer terminal and access the chatbot UI from your browser by completing the following steps:

Open a system terminal in SageMaker Studio.
On the top menu of the SageMaker Studio console, choose File, then New, then Terminal.
Install the required Python packages that are specified in the requirements.txt file:

$ pip install -r requirements.txt

Set up the environment variable with the endpoint name deployed in your account:

$ export endpoint_name=<Falcon-7B-instruct endpoint name deployed in your account>

Launch the Streamlit app from the streamlit_chatbot_<LMI or TGI>.py file, which will automatically update the endpoint names in the script based on the environment variable that was set up earlier:

$ streamlit run streamlit_chatbot_LMI.py –server.port 6006

To access the Streamlit UI, copy your SageMaker Studio URL to another tab in your browser and replace lab? with proxy/[PORT NUMBER]/. Because we specified the server port to 6006, the URL should look as follows:

https://<domain ID>.studio.<region>.sagemaker.aws/jupyter/default/proxy/6006/

Replace the domain ID and Region in the preceding URL with your account and Region to access the chatbot UI. You can find some suggested prompts in the left pane to get started.
The following demo shows how response streaming revolutionizes the user experience. It can make interactions feel fluid and responsive, ultimately enhancing user satisfaction and engagement. Refer to the GitHub repo for more details of the chatbot implementation.

Clean up
When you’re done testing the models, as a best practice, delete the endpoint to save costs if the endpoint is no longer required:

# – Delete the end point
sm_client.delete_endpoint(EndpointName=endpoint_name)

Conclusion
In this post, we provided an overview of building applications with generative AI, the challenges, and how SageMaker real-time response streaming helps you address these challenges. We showcased how to build a chatbot application to deploy the Falcon-7B-Instruct model to use response streaming using both SageMaker LMI and HuggingFace TGI containers using an example available on GitHub.
Start building your own cutting-edge streaming applications with LLMs and SageMaker today! Reach out to us for expert guidance and unlock the potential of large model streaming for your projects.

About the Authors
Raghu Ramesha is a Senior ML Solutions Architect with the Amazon SageMaker Service team. He focuses on helping customers build, deploy, and migrate ML production workloads to SageMaker at scale. He specializes in machine learning, AI, and computer vision domains, and holds a master’s degree in Computer Science from UT Dallas. In his free time, he enjoys traveling and photography.
Abhi Shivaditya is a Senior Solutions Architect at AWS, working with strategic global enterprise organizations to facilitate the adoption of AWS services in areas such as artificial intelligence, distributed computing, networking, and storage. His expertise lies in deep learning in the domains of natural language processing (NLP) and computer vision. Abhi assists customers in deploying high-performance machine learning models efficiently within the AWS ecosystem.
Alan Tan is a Senior Product Manager with SageMaker, leading efforts on large model inference. He’s passionate about applying machine learning to the area of analytics. Outside of work, he enjoys the outdoors.
Melanie Li, PhD, is a Senior AI/ML Specialist TAM at AWS based in Sydney, Australia. She helps enterprise customers build solutions using state-of-the-art AI/ML tools on AWS and provides guidance on architecting and implementing ML solutions with best practices. In her spare time, she loves to explore nature and spend time with family and friends.
Sam Edwards, is a Cloud Engineer (AI/ML) at AWS Sydney specialized in machine learning and Amazon SageMaker. He is passionate about helping customers solve issues related to machine learning workflows and creating new solutions for them. Outside of work, he enjoys playing racquet sports and traveling.
James Sanders is a Senior Software Engineer at Amazon Web Services. He works on the real-time inference platform for Amazon SageMaker.

FMOps/LLMOps: Operationalize generative AI and differences with MLOps

Posted on September 2, 2023 by i-genie

Nowadays, the majority of our customers is excited about large language models (LLMs) and thinking how generative AI could transform their business. However, bringing such solutions and models to the business-as-usual operations is not an easy task. In this post, we discuss how to operationalize generative AI applications using MLOps principles leading to foundation model operations (FMOps). Furthermore, we deep dive on the most common generative AI use case of text-to-text applications and LLM operations (LLMOps), a subset of FMOps. The following figure illustrates the topics we discuss.

Specifically, we briefly introduce MLOps principles and focus on the main differentiators compared to FMOps and LLMOps regarding processes, people, model selection and evaluation, data privacy, and model deployment. This applies to customers that use them out of the box, create foundation models from scratch, or fine-tune them. Our approach applies to both open-source and proprietary models equally.
ML operationalization summary
As defined in the post MLOps foundation roadmap for enterprises with Amazon SageMaker, ML and operations (MLOps) is the combination of people, processes, and technology to productionize machine learning (ML) solutions efficiently. To achieve this, a combination of teams and personas need to collaborate, as illustrated in the following figure.

These teams are as follows:

Advanced analytics team (data lake and data mesh) – Data engineers are responsible for preparing and ingesting data from multiple sources, building ETL (extract, transform, and load) pipelines to curate and catalog the data, and prepare the necessary historical data for the ML use cases. These data owners are focused on providing access to their data to multiple business units or teams.
Data science team – Data scientists need to focus on creating the best model based on predefined key performance indicators (KPIs) working in notebooks. After the completion of the research phase, the data scientists need to collaborate with ML engineers to create automations for building (ML pipelines) and deploying models into production using CI/CD pipelines.
Business team – A product owner is responsible for defining the business case, requirements, and KPIs to be used to evaluate model performance. The ML consumers are other business stakeholders who use the inference results (predictions) to drive decisions.
Platform team – Architects are responsible for the overall cloud architecture of the business and how all the different services are connected together. Security SMEs review the architecture based on business security policies and needs. MLOps engineers are responsible for providing a secure environment for data scientists and ML engineers to productionize the ML use cases. Specifically, they are responsible for standardizing CI/CD pipelines, user and service roles and container creation, model consumption, testing, and deployment methodology based on business and security requirements.
Risk and compliance team – For more restrictive environments, auditors are responsible for assessing the data, code, and model artifacts and making sure that the business is compliant with regulations, such as data privacy.

Note that multiple personas can be covered by the same person depending on the scaling and MLOps maturity of the business.
These personas need dedicated environments to perform the different processes, as illustrated in the following figure.

The environments are as follows:

Platform administration – The platform administration environment is the place where the platform team has access to create AWS accounts and link the right users and data
Data – The data layer, often known as the data lake or data mesh, is the environment that data engineers or owners and business stakeholders use to prepare, interact, and visualize with the data
Experimentation – The data scientists use a sandbox or experimentation environment to test new libraries and ML techniques to prove that their proof of concept can solve business problems
Model build, model test, model deployment – The model build, test, and deployment environment is the layer of MLOps, where data scientists and ML engineers collaborate to automate and move the research to production
ML governance – The last piece of the puzzle is the ML governance environment, where all the model and code artifacts are stored, reviewed, and audited by the corresponding personas

The following diagram illustrates the reference architecture, which has already been discussed in MLOps foundation roadmap for enterprises with Amazon SageMaker.

Each business unit has each own set of development (automated model training and building), preproduction (automatic testing), and production (model deployment and serving) accounts to productionize ML use cases, which retrieve data from a centralized or decentralized data lake or data mesh, respectively. All the produced models and code automation are stored in a centralized tooling account using the capability of a model registry. The infrastructure code for all these accounts is versioned in a shared service account (advanced analytics governance account) that the platform team can abstract, templatize, maintain, and reuse for the onboarding to the MLOps platform of every new team.
Generative AI definitions and differences to MLOps
In classic ML, the preceding combination of people, processes, and technology can help you productize your ML use cases. However, in generative AI, the nature of the use cases requires either an extension of those capabilities or new capabilities. One of these new notions is the foundation model (FM). They are called as such because they can be used to create a wide range of other AI models, as illustrated in the following figure.

FM have been trained based on terabytes of data and have hundreds of billions of parameters to be able to predict the next best answer based on three main categories of generative AI use cases:

Text-to-text – The FMs (LLMs) have been trained based on unlabeled data (such as free text) and are able to predict the next best word or sequence of words (paragraphs or long essays). Main use cases are around human-like chatbots, summarization, or other content creation such as programming code.
Text-to-image – Labeled data, such as pairs of <text, image>, has been used to train FMs, which are able to predict the best combination of pixels. Example use cases are clothing design generation or imaginary personalized images.
Text-to-audio or video – Both labeled and unlabeled data can be used for FM training. One main generative AI use case example is music composition.

To productionize those generative AI use cases, we need to borrow and extend the MLOps domain to include the following:

FM operations (FMOps) – This can productionize generative AI solutions, including any use case type
LLM operations (LLMOps) – This is a subset of FMOps focusing on productionizing LLM-based solutions, such as text-to-text

The following figure illustrates the overlap of these use cases.

Compared to classic ML and MLOps, FMOps and LLMOps defer based on four main categories that we cover in the following sections: people and process, selection and adaptation of FM, evaluation and monitoring of FM, data privacy and model deployment, and technology needs. We will cover monitoring in a separate post.
Operationalization journey per generative AI user type
To simplify the description of the processes, we need to categorize the main generative AI user types, as shown in the following figure.

The user types are as follows:

Providers – Users who build FMs from scratch and provide them as a product to other users (fine-tuner and consumer). They have deep end-to-end ML and natural language processing (NLP) expertise and data science skills, and massive data labeler and editor teams.
Fine-tuners – Users who retrain (fine-tune) FMs from providers to fit custom requirements. They orchestrate the deployment of the model as a service for use by consumers. These users need strong end-to-end ML and data science expertise and knowledge of model deployment and inference. Strong domain knowledge for tuning, including prompt engineering, is required as well.
Consumers – Users who interact with generative AI services from providers or fine-tuners by text prompting or a visual interface to complete desired actions. No ML expertise is required but, mostly, application developers or end-users with understanding of the service capabilities. Only prompt engineering is necessary for better results.

As per the definition and the required ML expertise, MLOps is required mostly for providers and fine-tuners, while consumers can use application productionization principles, such as DevOps and AppDev to create the generative AI applications. Furthermore, we have observed a movement among the user types, where providers might become fine-tuners to support use cases based on a specific vertical (such as the financial sector) or consumers might become fine-tuners to achieve more accurate results. But let’s observe the main processes per user type.
The journey of consumers
The following figure illustrates the consumer journey.

As previously mentioned, consumers are required to select, test, and use an FM, interacting with it by providing specific inputs, otherwise known as prompts. Prompts, in the context of computer programming and AI, refer to the input that is given to a model or system to generate a response. This can be in the form of a text, command, or a question, which the system uses to process and generate an output. The output generated by the FM can then be utilized by end-users, who should also be able to rate these outputs to enhance the model’s future responses.
Beyond these fundamental processes, we’ve noticed consumers expressing a desire to fine-tune a model by harnessing the functionality offered by fine-tuners. Take, for instance, a website that generates images. Here, end-users can set up private accounts, upload personal photos, and subsequently generate content related to those images (for example, generating an image depicting the end-user on a motorbike wielding a sword or located in an exotic location). In this scenario, the generative AI application, designed by the consumer, must interact with the fine-tuner backend via APIs to deliver this functionality to the end-users.
However, before we delve into that, let’s first concentrate on the journey of model selection, testing, usage, input and output interaction, and rating, as shown in the following figure.

*15K available FM reference
Step 1. Understand top FM capabilities
There are many dimensions that need to be considered when selecting foundation models, depending on the use case, the data available, regulations, and so on. A good checklist, although not comprehensive, might be the following:

Proprietary or open-source FM – Proprietary models often come at a financial cost, but they typically offer better performance (in terms of quality of the generated text or image), often being developed and maintained by dedicated teams of model providers who ensure optimal performance and reliability. On the other hand, we also see adoption of open-source models that, other than being free, offer additional benefits of being accessible and flexible (for example, every open-source model is fine-tunable). An example of a proprietary model is Anthropic’s Claude model, and an example of a high performing open-source model is Falcon-40B, as of July 2023.
Commercial license – Licensing considerations are crucial when deciding on an FM. It’s important to note that some models are open-source but can’t be used for commercial purposes, due to licensing restrictions or conditions. The differences can be subtle: The newly released xgen-7b-8k-base model, for example, is open source and commercially usable (Apache-2.0 license), whereas the instruction fine-tuned version of the model xgen-7b-8k-inst is only released for research purposes only. When selecting an FM for a commercial application, it’s essential to verify the license agreement, understand its limitations, and ensure it aligns with the intended use of the project.
Parameters – The number of parameters, which consist of the weights and biases in the neural network, is another key factor. More parameters generally means a more complex and potentially powerful model, because it can capture more intricate patterns and correlations in the data. However, the trade-off is that it requires more computational resources and, therefore, costs more to run. Additionally, we do see a trend towards smaller models, especially in the open-source space (models ranging from 7–40 billion) that perform well, especially, when fine-tuned.
Speed – The speed of a model is influenced by its size. Larger models tend to process data slower (higher latency) due to the increased computational complexity. Therefore, it’s crucial to balance the need for a model with high predictive power (often larger models) with the practical requirements for speed, especially in applications, like chat bots, that demand real-time or near-real-time responses.
Context window size (number of tokens) – The context window, defined by the maximum number of tokens that can be input or output per prompt, is crucial in determining how much context the model can consider at a time (a token roughly translates to 0.75 words for English). Models with larger context windows can understand and generate longer sequences of text, which can be useful for tasks involving longer conversations or documents.
Training dataset – It’s also important to understand what kind of data the FM was trained on. Some models may be trained on diverse text datasets like internet data, coding scripts, instructions, or human feedback. Others may also be trained on multimodal datasets, like combinations of text and image data. This can influence the model’s suitability for different tasks. In addition, an organization might have copyright concerns depending on the exact sources a model has been trained on—therefore, it’s mandatory to inspect the training dataset closely.
Quality – The quality of an FM can vary based on its type (proprietary vs. open source), size, and what it was trained on. Quality is context-dependent, meaning what is considered high-quality for one application might not be for another. For example, a model trained on internet data might be considered high quality for generating conversational text, but less so for technical or specialized tasks.
Fine-tunable – The ability to fine-tune an FM by adjusting its model weights or layers can be a crucial factor. Fine-tuning allows for the model to better adapt to the specific context of the application, improving performance on the specific task at hand. However, fine-tuning requires additional computational resources and technical expertise, and not all models support this feature. Open-source models are (in general) always fine-tunable because the model artifacts are available for downloading and the users are able to extend and use them at will. Proprietary models might sometimes offer the option of fine-tuning.
Existing customer skills – The selection of an FM can also be influenced by the skills and familiarity of the customer or the development team. If an organization has no AI/ML experts in their team, then an API service might be better suited for them. Also, if a team has extensive experience with a specific FM, it might be more efficient to continue using it rather than investing time and resources to learn and adapt to a new one.

The following is an example of two shortlists, one for proprietary models and one for open-source models. You might compile similar tables based on your specific needs to get a quick overview of the available options. Note that the performance and parameters of those models change rapidly and might be outdated by the time of reading, while other capabilities might be important for specific customers, such as the supported language.
The following is an example of notable proprietary FMs available in AWS (July 2023).

The following is an example of notable open-source FM available in AWS (July 2023).

After you have compiled an overview of 10–20 potential candidate models, it becomes necessary to further refine this shortlist. In this section, we propose a swift mechanism that will yield two or three viable final models as candidates for the next round.
The following diagram illustrates the initial shortlisting process.

Typically, prompt engineers, who are experts in creating high-quality prompts that allow AI models to understand and process user inputs, experiment with various methods to perform the same task (such as summarization) on a model. We suggest that these prompts are not created on the fly, but are systematically extracted from a prompt catalog. This prompt catalog is a central location for storing prompts to avoid replications, enable version control, and share prompts within the team to ensure consistency between different prompt testers in the different development stages, which we introduce in the next section. This prompt catalog is analogous to a Git repository of a feature store. The generative AI developer, who could potentially be the same person as the prompt engineer, then needs to evaluate the output to determine if it would be suitable for the generative AI application they are seeking to develop.
Step 2. Test and evaluate the top FM
After the shortlist is reduced to approximately three FMs, we recommend an evaluation step to further test the FMs’ capabilities and suitability for the use case. Depending on the availability and nature of evaluation data, we suggest different methods, as illustrated in the following figure.

The method to use first depends on whether you have labeled test data or not.
If you have labeled data, you can use it to conduct a model evaluation, as we do with the traditional ML models (input some samples and compare the output with the labels). Depending on whether the test data has discrete labels (such as positive, negative, or neutral sentiment analysis) or is unstructured text (such as summarization), we propose different methods for evaluation:

Accuracy metrics – In case of discrete outputs (such as sentiment analysis), we can use standard accuracy metrics such as precision, recall, and F1 score
Similarity metrics – If the output is unstructured (such as a summary), we suggest similarity metrics like ROUGE and cosine similarity

Some use cases don’t lend themselves to having one true answer (for example, “Create a short children’s story for my 5-year-old daughter”). In such cases, it becomes more challenging to evaluate the models because you don’t have labeled test data. We propose two approaches, depending on the importance of human review of the model versus automated evaluation:

Human-in-the-Loop (HIL) – In this case, a team of prompt testers will review the responses from a model. Depending on how critical the application is, the prompt testers might review 100% of the model outputs or just a sample.
LLM-powered evaluation – In this scenario, the prompt testers are replaced by an LLM, ideally one that is more powerful (although perhaps slower and most costly) than the ones being tested. The LLM will review all model-generated responses and score them. This method may result in lower quality, but it’s a cheaper and faster evaluation option that might provide a good initial gauge on the models’ performance.

For example, we can use the following example prompt to an LLM, which includes the input query, the LLM response, and instructions: “We would like to request your feedback on the performance of an AI assistant in response to the user question displayed above. Please rate the helpfulness, relevance, accuracy, level of details of the assistant’s response. The assistant shall receive an overall score on a scale of 1 to 10, where a higher score indicates better overall performance.”
This prompt will be sent to another LLM (the EvalLLM) for evaluation. The following is an example where the response of the Falcon-40B model has been evaluated by an EvalLLM (in this case Claude). We use the question “What if the Suez Canal had never been constructed?” The response is as follows:

Question: What if the Suez Canal had never been constructed?
falcon-40b-instruct
If the Suez Canal had never been constructed, ships would have to travel around Africa to navigate between the Mediterranean and the Red Sea. This would add significant time and distance to the voyage, making it less efficient and more expensive. Additionally, without the Suez Canal, many countries in the Middle East and North Africa would have been much less connected to the rest of the world, hindering economic and cultural development.
Review (Score: 9/10)
The assistant’s response is highly relevant, accurate, and detailed. It provides an excellent answer to the user’s hypothetical question about the non-existence of the Suez Canal. The assistant correctly points out the implications on maritime travel and the economic and cultural impact on the Middle East and North Africa. However, it could have further elaborated on the geopolitical implications or the impact on global trade patterns for a more comprehensive response.

The following figure illustrates the end-to-end evaluation process example.

Based on this example, to perform evaluation, we need to provide the example prompts, which we store in the prompt catalog, and an evaluation labeled or unlabeled dataset based on our specific applications. For example, with a labeled evaluation dataset, we can provide prompts (input and query) such as “Give me the full name of the UK PM in 2023” and outputs and answers, such as “Rishi Sunak.” With an unlabeled dataset, we provide just the question or instruction, such as “Generate the source code for a retail website.” We call the combination of prompt catalog and evaluation dataset the evaluation prompt catalog. The reason that we differentiate the prompt catalog and evaluation prompt catalog is because the latter is dedicated to a specific use case instead of generic prompts and instructions (such as question answering) that the prompt catalog contains.
With this evaluation prompt catalog, the next step is to feed the evaluation prompts to the top FMs. The result is an evaluation result dataset that contains the prompts, outputs of each FM, and the labeled output together with a score (if it exists). In the case of an unlabeled evaluation prompt catalog, there is an additional step for an HIL or LLM to review the results and provide a score and feedback (as we described earlier). The final outcome will be aggregated results that combine the scores of all the outputs (calculate the average precision or human rating) and allow the users to benchmark the quality of the models.
After the evaluation results have been collected, we propose choosing a model based on several dimensions. These typically come down to factors such as precision, speed, and cost. The following figure shows an example.

Each model will possess strengths and certain trade-offs along these dimensions. Depending on the use case, we should assign varying priorities to these dimensions. In the preceding example, we elected to prioritize cost as the most important factor, followed by precision, and then speed. Even though it’s slower and not as efficient as FM1, it remains sufficiently effective and significantly cheaper to host. Consequently, we might select FM2 as the top choice.
Step 3. Develop the generative AI application backend and frontend
At this point, the generative AI developers have selected the right FM for the specific application together with the help of prompt engineers and testers. The next step is to start developing the generative AI application. We have separated the development of the generative AI application into two layers, a backend and front end, as shown in the following figure.

On the backend, the generative AI developers incorporate the selected FM into the solutions and work together with the prompt engineers to create the automation to transform the end-user input to appropriate FM prompts. The prompt testers create the necessary entries to the prompt catalog for automatic or manual (HIL or LLM) testing. Then, the generative AI developers create the prompt chaining and application mechanism to provide the final output. Prompt chaining, in this context, is a technique to create more dynamic and contextually-aware LLM applications. It works by breaking down a complex task into a series of smaller, more manageable sub-tasks. For example, if we ask an LLM the question “Where was the prime minister of the UK born and how far is that place from London,” the task can be broken down into individual prompts, where a prompt might be built based on the answer of a previous prompt evaluation, such as “Who is the prime minister of the UK,” “What is their birthplace,” and “How far is that place from London?” To ensure a certain input and output quality, the generative AI developers also need to create the mechanism to monitor and filter the end-user inputs and application outputs. If, for example, the LLM application is supposed to avoid toxic requests and responses, they could apply a toxicity detector for input and output and filter those out. Lastly, they need to provide a rating mechanism, which will support the augmentation of the evaluation prompt catalog with good and bad examples. A more detailed representation of those mechanisms will be presented in future posts.
To provide the functionality to the generative AI end-user, the development of a frontend website that interacts with the backend is necessary. Therefore, DevOps and AppDevs (application developers on the cloud) personas need to follow best development practices to implement the functionality of input/output and rating.
In addition to this basic functionality, the frontend and backend need to incorporate the feature of creating personal user accounts, uploading data, initiating fine-tuning as a black box, and using the personalized model instead of the basic FM. The productionization of a generative AI application is similar with a normal application. The following figure depicts an example architecture.

In this architecture, the generative AI developers, prompt engineers, and DevOps or AppDevs create and test the application manually by deploying it via CI/CD to a development environment (generative AI App Dev in the preceding figure) using dedicated code repositories and merging with the dev branch. At this stage, the generative AI developers will use the corresponding FM by calling the API as has been provided by the FM providers of fine-tuners. Then, to test the application extensively, they need to promote the code to the test branch, which will trigger the deployment via CI/CD to the preproduction environment (generative AI App Pre-prod). At this environment, the prompt testers need to try a large amount of prompt combinations and review the results. The combination of prompts, outputs, and review need to be moved to the evaluation prompt catalog to automate the testing process in the future. After this extensive test, the last step is to promote the generative AI application to production via CI/CD by merging with the main branch (generative AI App Prod). Note that all the data, including the prompt catalog, evaluation data and results, end-user data and metadata, and fine-tuned model metadata, need to be stored in the data lake or data mesh layer. The CI/CD pipelines and repositories need to be stored in a separate tooling account (similar to the one described for MLOps).
The journey of providers
FM providers need to train FMs, such as deep learning models. For them, the end-to-end MLOps lifecycle and infrastructure is necessary. Additions are required in historical data preparation, model evaluation, and monitoring. The following figure illustrates their journey.

In classic ML, the historical data is most often created by feeding the ground truth via ETL pipelines. For example, in a churn prediction use case, an automation updates a database table based on the new status of a customer to churn/not churn automatically. In the case of FMs, they need either billions of labeled or unlabeled data points. In text-to-image use cases, a team of data labelers need to label <text, image> pairs manually. This is an expensive exercise requiring a large number of people resources. Amazon SageMaker Ground Truth Plus can provide a team of labelers to perform this activity for you. For some use cases, this process can be also partially automated, for example by using CLIP-like models. In the case of an LLM, such as text-to-text, the data is unlabeled. However, they need to be prepared and follow the format of the existing historical unlabeled data. Therefore, data editors are needed to perform necessary data preparation and ensure consistency.
With the historical data prepared, the next step is the training and productionization of the model. Note that the same evaluation techniques as we described for consumers can be used.
The journey of fine-tuners
Fine-tuners aim to adapt an existing FM to their specific context. For example, an FM model can summarize a general-purpose text but not a financial report accurately or can’t generate source code for a non-common programming language. In those cases, the fine-tuners need to label data, fine-tune a model by running a training job, deploy the model, test it based on the consumer processes, and monitor the model. The following diagram illustrates this process.

For the time being, there are two fine-tuning mechanisms:

Fine-tuning – By using an FM and labeled data, a training job recalculates the weights and biases of the deep learning model layers. This process can be computationally intensive and requires a representative amount of data but can generate accurate results.
Parameter-efficient fine-tuning (PEFT) – Instead of recalculating all the weights and biases, researchers have shown that by adding additional small layers to the deep learning models, they can achieve satisfactory results (for example, LoRA). PEFT requires lower computational power than deep fine-tuning and a training job with less input data. The drawback is potential lower accuracy.

The following diagram illustrates these mechanisms.

Now that we have defined the two main fine-tuning methods, the next step is to determine how we can deploy and use the open-source and proprietary FM.
With open-source FMs, the fine-tuners can download the model artifact and the source code from the web, for example, by using the Hugging Face Model Hub. This gives you the flexibility to deep fine-tune the model, store it to a local model registry, and deploy it to an Amazon SageMaker endpoint. This process requires an internet connection. To support more secure environments (such as for customers in the financial sector), you can download the model on premises, run all the necessary security checks, and upload them to a local bucket on an AWS account. Then, the fine-tuners use the FM from the local bucket without an internet connection. This ensures data privacy, and the data doesn’t travel over the internet. The following diagram illustrates this method.

With proprietary FMs, the deployment process is different because the fine-tuners don’t have access to the model artifact or source code. The models are stored in proprietary FM provider AWS accounts and model registries. To deploy such a model to a SageMaker endpoint, the fine-tuners can request only the model package that will be deployed directly to an endpoint. This process requires customer data to be used in the proprietary FM providers’ accounts, which raises questions regarding customer-sensitive data being used in a remote account to perform fine-tuning, and models being hosted in a model registry that is shared among multiple customers. This leads to a multi-tenancy problem that becomes more challenging if the proprietary FM providers need to serve these models. If the fine-tuners use Amazon Bedrock, these challenges are resolved—the data doesn’t travel over the internet and the FM providers don’t have access to fine-tuners’ data. The same challenges hold for the open-source models if the fine-tuners want to serve models from multiple customers, such as the example we gave earlier with the website that thousands of customers will upload personalized images to. However, these scenarios can be considered controllable because only the fine-tuner is involved. The following diagram illustrates this method.

From a technology perspective, the architecture that a fine-tuner needs to support is like the one for MLOps (see the following figure). The fine-tuning needs to be conducted in dev by creating ML pipelines, such as using Amazon SageMaker Pipelines; performing preprocessing, fine-tuning (training job), and postprocessing; and sending the fine-tuned models to a local model registry in the case of an open-source FM (otherwise, the new model will be stored to the proprietary FM provide environment). Then, in pre-production, we need to test the model as we describe for the consumers’ scenario. Finally, the model will be served and monitored in prod. Note that the current (fine-tuned) FM requires GPU instance endpoints. If we need to deploy each fine-tuned model to a separate endpoint, this might increase the cost in the case of hundreds of models. Therefore, we need to use multi-model endpoints and resolve the multi-tenancy challenge.

The fine-tuners adapt an FM model based on a specific context to use it for their business purpose. That means that most of the time, the fine-tuners are also consumers required to support all the layers, as we described in the previous sections, including generative AI application development, data lake and data mesh, and MLOps.
The following figure illustrates the complete FM fine-tuning lifecycle that the fine-tuners need to provide the generative AI end-user.

The following figure illustrates the key steps.

The key steps are the following:

The end-user creates a personal account and uploads private data.
The data is stored in the data lake and is preprocessed to follow the format that the FM expects.
This triggers a fine-tuning ML pipeline that adds the model to the model registry,
From there, either the model is deployed to production with minimum testing or the model pushes extensive testing with HIL and manual approval gates.
The fine-tuned model is made available for end-users.

Because this infrastructure is complex for non-enterprise customers, AWS released Amazon Bedrock to offload the effort of creating such architectures and bringing fine-tuned FMs closer to production.
FMOps and LLMOps personas and processes differentiators
Based on the preceding user type journeys (consumer, producer, and fine-tuner), new personas with specific skills are required, as illustrated in the following figure.

The new personas are as follows:

Data labelers and editors – These users label data, such as <text, image> pairs, or prepare unlabeled data, such as free text, and extend the advanced analytics team and data lake environments.
Fine-tuners – These users have deep knowledge on FMs and know to tune them, extending the data science team that will focus on classic ML.
Generative AI developers – They have deep knowledge on selecting FMs, chaining prompts and applications, and filtering input and outputs. They belong a new team—the generative AI application team.
Prompt engineers – These users design the input and output prompts to adapt the solution to the context and test and create the initial version of prompt catalog. Their team is the generative AI application team.
Prompt testers – They test at scale the generative AI solution (backend and frontend) and feed their results to augment the prompt catalog and evaluation dataset. Their team is the generative AI application team.
AppDev and DevOps – They develop the front end (such as a website) of the generative AI application. Their team is the generative AI application team.
Generative AI end-users – These users consume generative AI applications as black boxes, share data, and rate the quality of the output.

The extended version of the MLOps process map to incorporate generative AI can be illustrated with the following figure.

A new application layer is the environment where generative AI developers, prompt engineers, and testers, and AppDevs created the backend and front end of generative AI applications. The generative AI end-users interact with the generative AI applications front end via the internet (such as a web UI). On the other side, data labelers and editors need to preprocess the data without accessing the backend of the data lake or data mesh. Therefore, a web UI (website) with an editor is necessary for interacting securely with the data. SageMaker Ground Truth provides this functionality out of the box.
Conclusion
MLOps can help us productionize ML models efficiently. However, to operationalize generative AI applications, you need additional skills, processes, and technologies, leading to FMOps and LLMOps. In this post, we defined the main concepts of FMOps and LLMOps and described the key differentiators compared to MLOps capabilities in terms of people, processes, technology, FM model selection, and evaluation. Furthermore, we illustrated the thought process of a generative AI developer and the development lifecycle of a generative AI application.
In the future, we will focus on providing solutions per the domain we discussed, and will provide more details on how to integrate FM monitoring (such as toxicity, bias, and hallucination) and third-party or private data source architectural patterns, such as Retrieval Augmented Generation (RAG), into FMOps/LLMOps.
To learn more, refer to MLOps foundation roadmap for enterprises with Amazon SageMaker and try out the end-to-end solution in Implementing MLOps practices with Amazon SageMaker JumpStart pre-trained models.
If you have any comments or questions, please leave them in the comments section.

About the Authors
Dr. Sokratis Kartakis is a Senior Machine Learning and Operations Specialist Solutions Architect for Amazon Web Services. Sokratis focuses on enabling enterprise customers to industrialize their Machine Learning (ML) solutions by exploiting AWS services and shaping their operating model, i.e. MLOps foundation, and transformation roadmap leveraging best development practices. He has spent 15+ years on inventing, designing, leading, and implementing innovative end-to-end production-level ML and Internet of Things (IoT) solutions in the domains of energy, retail, health, finance/banking, motorsports etc. Sokratis likes to spend his spare time with family and friends, or riding motorbikes.
Heiko Hotz is a Senior Solutions Architect for AI & Machine Learning with a special focus on natural language processing, large language models, and generative AI. Prior to this role, he was the Head of Data Science for Amazon’s EU Customer Service. Heiko helps our customers be successful in their AI/ML journey on AWS and has worked with organizations in many industries, including insurance, financial services, media and entertainment, healthcare, utilities, and manufacturing. In his spare time, Heiko travels as much as possible.

Is ChatGPT Really Neutral? An Empirical Study on Political Bias in AI- …

Posted on September 1, 2023 by i-genie

A recent study conducted by researchers from the UK and Brazil has illuminated concerns regarding the objectivity of ChatGPT, a popular AI language model developed by OpenAI. The researchers discovered a discernible political bias in ChatGPT’s responses, leaning towards the left side of the political spectrum. This bias, they argue, could perpetuate existing biases present in traditional media, potentially influencing various stakeholders such as policymakers, media outlets, political groups, and educational institutions.

At present, ChatGPT is one of the leading AI language models utilized for generating human-like text based on input prompts. While it has proven to be a versatile tool for various applications, the emergence of bias in its responses poses a significant challenge. Previous research has highlighted concerns about biases in AI models, emphasizing the importance of mitigating these biases to ensure fair and balanced outputs.

In response to the identified bias in ChatGPT, a team of researchers from the UK and Brazil has introduced a study aimed at addressing the political bias issue by analyzing ChatGPT’s responses to political compass questions and scenarios where the AI model impersonates both a Democrat and a Republican.

The researchers employed an empirical approach to gauge ChatGPT’s political orientation. They used questionnaires to evaluate the AI model’s stance on political issues and contexts. Additionally, they investigated scenarios where ChatGPT took on the persona of an average Democrat and a Republican. The study’s findings suggested that the bias was not a mechanical result but a deliberate tendency in the AI model’s output. The researchers explored both the training data and the algorithm, concluding that both factors likely contribute to the observed bias.

The study’s results indicated a substantial bias in ChatGPT’s responses, particularly favoring Democratic-leaning perspectives. This bias extended beyond the US and was also evident in responses related to Brazilian and British political contexts. The research shed light on the potential implications of biased AI-generated content on various stakeholders and emphasized the need for further investigation into the sources of the bias.

In light of the growing influence of AI-driven tools like ChatGPT, this study serves as a reminder of the necessity for vigilance and critical evaluation to ensure unbiased and fair AI technologies. Addressing biases in AI models is crucial to avoid perpetuating existing biases and uphold objectivity and neutrality principles. As AI technologies continue to evolve and expand into various sectors, it becomes imperative for developers, researchers, and stakeholders to work collectively toward minimizing biases and promoting ethical AI development. The introduction of ChatGPT Enterprise further underscores the need for robust measures to ensure that AI tools are not only efficient but also unbiased and reliable.

If you like our work, you will love our newsletter..

The post Is ChatGPT Really Neutral? An Empirical Study on Political Bias in AI-Driven Conversational Agents appeared first on MarkTechPost.

Top Data Privacy Tools 2023

Posted on September 1, 2023 by i-genie

Data privacy management software makes adhering to privacy regulations like the General Data Protection Regulation and the California Consumer Privacy Act easier. Data Subject Access Requests (DSARs) and the Right to Erasure (Right to Be Forgotten) under the General Data Protection Regulation (GDPR) are only a few examples of what must be done. Businesses can manage their privacy program more effectively by utilizing data privacy management solutions, which allow them to automate manual operations, increase transparency, and use reporting tools.

Enzuzo

Websites, online shops, mobile apps, and SaaS platforms benefit from legal privacy rules, and Enzuzo makes it possible to do so without breaking the bank. An automatic DSAR request generator, cookie consent banner templates and more make this a comprehensive compliance platform. Enzuzo’s ability to manage various features and demands from a single, intuitive interface is one of the platform’s greatest strengths. Unsurprisingly, methods for safely gathering individual details might take time to implement. The average business has a lot on its plate between dealing with regulatory frameworks, regional mandates, several languages, and frequent modifications to compliance requirements.

DataGrail

As a privacy management tool, DataGrail simplifies compliance for businesses by centralizing the tracking and administration of client data across systems and divisions. It also offers useful instruments for automating DSAR and other activities of a privacy-related nature. DataGrail complies with numerous international regulatory requirements, helping businesses manage data subject requests and other compliance issues. Among the many features offered by the system are data mapping and inventory, consent management, policy and notification management, and vendor management. It also provides real-time analytics and dashboards to monitor compliance operations and identify areas of concern for businesses.

PrivacyEngine

PrivacyEngine’s primary goal is to aid businesses in mitigating the risks associated with data privacy and establishing a culture in which privacy is prioritized. Data inventories and maps, privacy risk assessments, DSR management, incident management, and vendor management are just some of the services offered by PrivacyEngine. Through individualized risk evaluations and privacy impact evaluations, it assists businesses in identifying, measuring, and mitigating threats to the privacy of their sensitive data. However, PrivacyEngine can be pricey, making it unaffordable for certain small and medium-sized businesses. As a company expands or its needs change, the price of the software may also rise. PrivacyEngine’s implementation and maintenance can be challenging and time-consuming, much like similar platforms. It may take a substantial investment of time and energy for businesses to configure and integrate the software into their existing infrastructure and procedures.

OneTrust

Regarding privacy, security, and governance, OneTrust is your go-to provider. The company provides a whole suite of software solutions to aid organizations in meeting worldwide standards such as GDPR, CCPA, LGPD, and others. OneTrust’s software products provide businesses with robust tools for overseeing privacy, security, and governance initiatives. The software is flexible enough to be tailored to the specific requirements of firms in a wide range of sectors. Consulting and training are only two of the many expert services offered by OneTrust to assist organizations in improving their privacy initiatives and conforming to international standards.

Securiti

Securiti is an end-to-end privacy and data security automation system that works in on-premise, hybrid, and multi-cloud settings worldwide to provide security, governance, and compliance. Data cataloging, Sensitive data discovery, access intelligence & controls, security posture monitoring, and more are some features available in Securiti’s software products that aim to provide end-to-end data protection. Securiti can aid businesses in understanding and managing their sensitive data, lowering the likelihood of data breaches, and guaranteeing compliance with legal standards because of its all-encompassing approach to data protection. Securiti’s competitive edge is its extensive functionality at a low price. The platform provides helpful technical resources to facilitate onboarding and manages a set of useful dashboards and visualization tools to increase data transparency.

Collibra

Collibra is a data intelligence platform that operates in the cloud and aids businesses in managing and governing their data assets. The system provides a foundation for businesses to learn from and profit from their data. Data governance, categorization, data lineage, and data quality are only a few of the uses it was built to accommodate. Collibra’s software solutions provide features such as automated data discovery and categorization, visualization of data lineage, monitoring of data quality, cataloging and indexing of data, and administration of workflows. The platform’s integrated collaboration and communication features facilitate information sharing and cooperation between data stewards and analysts. Collibra’s platform complexity suits users with technical expertise, but deployment will require more technically savvy businesses. Many users may need clarification on the Collibra interface, and even with careful preparation, users may require outside assistance with implementation to get the most out of the software.

Palqee

Palqee is an all-encompassing tool for helping businesses meet their risk, compliance, and governance objectives. Data mapping, assessments, subject rights management, documentation, and fostering a privacy-first culture are some areas where this program shines. Palqee has become a popular alternative for compliance management and cooperation in the South American and Brazilian markets thanks to its user-friendly configuration choices, extensive library of pre-made templates, and active user community. Palqee, unlike many of the other software options on this list, has hefty upfront charges and requires users to commit to lengthier contract terms. The Palqee community and its compliance features primarily focus on South American markets, which may reduce the platform’s utility for businesses in other parts of the world.

Osano

Osano gives organizations a wide variety of resources for managing their website and app privacy policies and adhering to data privacy rules. Basic capabilities include analyzing sites and apps for privacy issues, managing user consent, and checking for regulatory compliance. Osano also offers businesses privacy policy templates and compliance reports that may be adapted for specific uses. The Osano platform has a straightforward interface and simple controls. Other useful tools include a thorough privacy scanning module that can check assets for privacy flaws. These capabilities are supported by editable reports that reveal how close an organization is to meeting major privacy standards.

TrustArc

TrustArc is a privacy management platform that centralizes compliance management by coordinating privacy frameworks, intelligence, reporting insights, and data inventory capabilities. For businesses looking to streamline their time-consuming and laborious compliance processes, TrustArc is an excellent option because of its completely automated platform for end-to-end compliance management that goes beyond the fundamentals. In addition, the platform is supported by a first-rate customer service team that can help with any problems that may crop up during deployment. TrustArc’s flaws stem from the platform’s extensive feature set and personalization choices. Users may discover that much oversight is needed to make TrustArc the reliable hub of information it was designed to be.

BigID

BigID is an enterprise-wide data discovery and management platform built on an artificial intelligence engine. Data discovery and intelligence are the tool’s strong suits; further features include efficient evaluation, categorization, and management of private data. To this end, BigID provides a set of out-of-the-box and customizable tools for businesses to understand better and utilize their data. BigID’s advantages and disadvantages reflect its status as an enterprise-level data discovery and classification solution. Powerful toolkits and a wide variety of discovery tools are great, but users should know that these tools require some control. BigID has a relatively unintuitive UI, making it difficult for newcomers to use.

Didomi

Didomi is a cookie consent solution that enables organizations to meet the requirements of local data privacy laws. It’s a system that allows users to record, modify, and demonstrate their acceptance of cookies and similar tracking technologies. In addition to helping organizations manage their cookie consent programs, Didomi provides several useful tools. An approval system that gives businesses the freedom to design their approval pop-ups and forms, A central repository for consent information that meets all legal requirements for storage and accessibility, and An authorization application programming interface that can be used to connect Didomi to other enterprise software. A consent analytics dashboard that can shed light on how users are using their permissions. Didomi is employed by companies of various sizes, from bootstrapped startups to multinational conglomerates. Google, Microsoft, and Salesforce are just a few top companies that put their faith in it.

IBM Security Verify

IBM Security Verify is an identity and access management (IAM) software hosted in the cloud that can assist businesses in ensuring the security of their sensitive information. Users’ identities, permissions, and consent settings may all be managed from one place. Data discovery, classification, and security functionalities are also available in Verify. IBM Security Verify’s consent management tools are very useful for protecting personal information. Organizations can use Verify to establish and administer consent rules that outline the parameters for collecting, using, and disclosing user data. Users can also give or withdraw narrowly tailored permissions. IBM Security Verify’s capacity for discovering and categorizing data is also useful for protecting user privacy. With the assistance of Verify, businesses can better recognize and organize sensitive information like PII. This information can be protected using additional security measures like encryption.

Privo

Privo helps businesses comply with the requirements of various privacy regulations by using a centralized data privacy management solution. Companies can communicate with children (or block them from accessing their services) in a way that does not compromise their customers’ personal information by using the tools provided by this platform. Organizations can benefit from age verification to verify they comply with regulations like COPPA and GDPR. A “consent management” system allows organizations to request and record individual users’ consent to collect and use their data. By employing data reduction strategies, companies may reduce the quantity of client data they collect and store without sacrificing quality. Not taking enough safety measures increases the risk of information being stolen, misused, manipulated, or destroyed. In the case of a data breach, firms may promptly alert their consumers thanks to data breach notification.

LogicGate

Businesses can use LogicGate, a cloud-based governance, risk, and compliance (GRC) software platform, to manage their data privacy initiatives better. Organizations may streamline and automate their data privacy operations using LogicGate’s Data Privacy Management solution. To better understand the risks connected with their data assets, LogicGate assists businesses in identifying and categorizing these assets. To comply with data privacy rules like the General Data Protection Regulation (GDPR), LogicGate assists enterprises in monitoring and managing their data processing activities. LogicGate facilitates DPIAs so that businesses can evaluate the potential privacy impacts of their data processing operations. LogicGate facilitates fast and lawful responses to DSARs for businesses. LogicGate aids businesses in discovering, evaluating, and resolving privacy threats. LogicGate’s pre-made reports and dashboards make it easy for companies to monitor their progress toward privacy compliance.

SAI360

SAI360 is data privacy management software that can assist with compliance with the California Consumer Privacy Act (CCPA), the General Data Protection Regulation (GDPR), and the Health Insurance Portability and Accountability Act (HIPAA). The SAI360 system provides a central repository for your company’s data privacy documentation, including policies, procedures, and training resources. Data privacy risk assessment, data flow management, and rule compliance monitoring tools are all provided. The PIA tool is useful for identifying and assessing risks to individuals’ privacy that may arise as a result of an organization’s data processing activities. The data flow mapping application provides a graphical depiction of the information flows occurring within a business. You can use this to ensure your data is handled properly and identify any potential weak spots.

Don’t forget to join our 29k+ ML SubReddit, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more. If you have any questions regarding the above article or if we missed anything, feel free to email us at Asif@marktechpost.com

Check Out 1000’s AI Tools in AI Tools Club

The post Top Data Privacy Tools 2023 appeared first on MarkTechPost.

Best Proxy Servers (September 2023)

Posted on September 1, 2023 by i-genie

A proxy server is an application or web service operating on the network that allows computers to make requests on their behalf. It acts as a go-between for you, the customer, and the service, the website you’ve asked to see on your computer.

Proxy servers are often used so that users may mask their real IP address while surfing the web.

In addition to allowing users to access otherwise-blocked websites, proxy servers may limit or monitor users, such as minors or workers. It may be set up to prevent users from visiting certain websites. You may use it to protect your data from prying eyes, remain anonymous online, and evaluate the effectiveness of content filters, all while enjoying improved network speed.

The top proxy servers are listed below.

Bright Data

Bright Data stands as the leading global platform for web data. It serves as a dependable resource for a wide spectrum of entities, from Fortune 500 corporations to academic institutions and small enterprises. These organizations leverage the efficient, reliable, and adaptable solutions offered by Bright Data to harvest key public web data. This data is then utilized for research, surveillance, data analysis, and to enhance decision-making processes. Bright Data boasts a vast array of proxies spread across 195 countries, a stellar success rate of 99.99%, and an enormous reservoir of over 72 million real residential IPs

Ake

Ake is a superb residential proxy network that distinguishes itself as the most dependable and stable choice accessible. It gives customers access to a large selection of geo-located content by allowing them to connect to the internet through reliable and trustworthy sources and a large pool of residential IP addresses. You can connect to and choose from proxy servers located in more than 150 different nations. The US, France, Germany, the UK, and the Netherlands all provide a large selection of proxy servers. There are 650 locations and 150 countries where you may find global proxy servers for application testing.

Live Proxies

Live Proxies sets the industry standard for private residential and mobile proxy solutions. They offer top-quality, stable, and unblocked proxies tailored for diverse use cases, with the assurance of optimal transparency and reliability. With an assortment of rotating and static residential IPs, as well as rotating mobile IPs, Live Proxies caters to various needs ranging from eCommerce, Market Research, and brand protection to SEO/SERP and AdTech. Their proxies are exclusively assigned, ensuring their unblocked status on all websites, while their robust customer support and custom solutions add to their stellar reputation. Moreover, the user-friendly management dashboard enables easy proxy analytics viewing. Users can choose from a range of flexible plans starting at competitive prices, making Live Proxies an invaluable asset in today’s digitally-driven world.

NodeMaven

Unlike other providers, NodeMaven uses an advanced filtering algorithm to screen their proxy IPs in real time before assigning them to you. When you connect to a proxy provided by NodeMaven, you are only assigned an IP after it has passed through their advanced quality assurance algorithm, which ensures 95% of our IPs have a clean record.

Additionally, NodeMaven uses a hybrid proxy technology, which allows them to hold an IP session for up to 24 hours — many times longer than the industry average. This makes them ideal for managing accounts on platforms, such as Facebook, Google, eBay, Amazon, LinkedIn, and many more.

They offer over 5 million residential IPs from 1400+ cities in 150+ countries. Pricing is also very competitive with roll-over bandwidth that never expires.

IPRoyal

With thousands of IP addresses in 195+ countries, IPRoyal’s network of ethically sourced residential proxies is unrivaled. A total of 8,056,839 residential IP addresses were used to create the proxy pool. Using IPRoyal, you can obtain a real IP address from a real home user with a real Internet service provider (ISP) connection in any country in the world. It’s ideal for any situation where credibility matters, whether professional or personal.

Nimble

With Nimble, you may use IP addresses from homes, data centers, Internet service providers, and everywhere else globally from a single interface. The system improves data accessibility, reduces expenses, and facilitates the achievement of challenging goals. Nimble’s user-friendly control panel sets it different from other proxy service providers. The dashboard is useful for taking stock of your spending habits, tracking your consumption, etc. The control panel may also be used to set up, alter, or remove pipelines.

Smartproxy

Market research, account management, and ad verification utilizing public data are some of the uses for Smartproxy’s proxy services and data-collecting tools. This service provider works with both people and corporations to offer proxy and data-collecting services that are ethically sourced. The proxy network provided by Smartproxy is safe and anonymous, providing an extra degree of protection. The supplier accommodates various customer requirements and uses cases by providing four distinct proxy options: residential, mobile, shared, and dedicated data center proxies. Smartproxy’s dashboard and free Chrome and Firefox extensions make creating endpoints in a flash with only a few clicks.

Bright Data

If you’re looking for a reliable replacement for free proxy services, go no further than Bright Data. It provides an array of powerful resources for data collection. Around the world, this vendor has earned the confidence of over 10,000 experts. A clean dashboard is waiting for you. The available services are briefly explained. There is a preview sample for each function as well. In addition, the dashboard allows the management of your internet proxy server services. This is because PAYG options are available. In this setup, allocating resources like bandwidth, IP addresses, and network ports is simple.

Oxylabs

When it comes to corporate proxy servers, Oxylabs is a top choice. It has a more than 100 million IP address pool and provides worldwide coverage. In particular, its services excel in proxy-driven activities like marketing intelligence. The user interface is bright and simple to understand. You may use it to see detailed use statistics. Billing and ordering new services are split up into their tabs. There are several varieties of server proxies available. If you require to use a proxy with an unusual application or protocol, SOCKS5 Proxy will come in handy. Bulk scrapping is handled through residential proxy pools. They provide infinite connections at once, making them ideal for getting around internet censorship. Cities may also be targeted using geo-targeting.

ExpressVPN

ExpressVPN works like any other VPN service, masking your internet movements effectively. It’s simple to set up and secures all incoming communication to your device with military-grade encryption when you choose a location. ExpressVPN is far superior to the standard free proxy service in this aspect. ExpressVPN, once set up, can avoid local network dangers. You may immediately begin using the completely free and open internet. As a result, you may see materials previously blocked by the appropriate authorities. ExpressVPN is continually fine-tuning itself to provide a consistent and fast connection.

Whoer

Whoer is the most reliable proxy server program for those with many gadgets. It’s available as a download for PCs, Macs, Linux, mobile browsers, and phones. You can comfortably fit a family of five or a small group of pals. Whoer has a specialized app or browser extension for your preferred gadget. After the setup, you can begin using the program with a single click. With the free plan, you can only connect to one server. There are 17 available areas for premium customers. The browser add-ons provide IP leak protection using WebRTC. Moreover, a kill switch is included. It will prevent your address from being revealed when internet access is temporarily lost.

Hide.me

You will find better free proxy server software than Hide.me. Auto-connect, IP leak prevention, and BitTorrent compatibility are just some of the features you’ll find useful in this solution. The greatest part is that you will be protected from advertisements while utilizing this service. Hide.me is a simple and lightweight anonymizing service. Including the extension in Chrome or Firefox is all that’s required. If you prefer not to use browser extensions, use the built-in utility instead. Hide.me uses WebRTC to prevent your IP address from being exposed. Changing servers is as simple as using an automatic connection. This service provider keeps no records of your activities, greatly increasing your privacy.

CroxyProxy

CroxyProxy is a powerful, free web proxy that prioritizes security and privacy. It’s useful for video-sharing platforms, SERPs, SM, etc. It allows you to watch full-length videos while being completely anonymous. It’s a no-cost tool that doesn’t need any setup. It’s a browser that acts as a proxy. CroxyProxy employs innovative technologies to facilitate the usage of today’s most popular websites and online apps.

ProxySite.com

ProxySite is the most feature-rich of the free proxy services available today. This option is not only packed with useful features but it was also developed with speed in mind. This solution is fantastic for customers that need to access restricted websites. The first setup is painless. ProxySite includes a proxy extension for Chrome and Firefox. It provides quick links to frequently visited websites, including Twitter, YouTube, and Facebook. You may also choose from more than twenty-five United States and Europe servers.

NetNut.io

If you’re looking for a reliable proxy server, go beyond NetNut.io. The provider guarantees an uptime of 99.9 percent, provides unlimited simultaneous connections, and boasts lightning-fast connections. It’s perfect for intensive market research and data collection. The first steps are really simple. Instructions on how to set up your proxies may be found in a straightforward console.

Additionally, external programs can be incorporated. It’s also available as a Chrome add-on. If you’re looking for an anonymous proxy server, go beyond NetNut. DiviNetworks has partnered with more than 100 carriers throughout the globe, allowing it to supply home IPs derived from top-tier ISPs.

ScraperAPI

When it comes to web scraping, ScraperAPI is a top choice for proxies. It offers over 40 million IPs from various countries and is easy to scrape. It’s capable of processing CAPTCHAs, other anti-bot measures, and JavaScript. Background proxy rotation occurs automatically. Enter any website’s URL, and you’ll be presented with unfiltered HTML. Use a static IP address instead for a session that lasts longer.

Additionally, the option for custom headers and automatic retries is provided to desktop and mobile user agents. You may disguise your traffic as originating from a PC or mobile phone. However, it’s less sophisticated than some, so you’ll have to do some manual data parsing. However, for developers working on their own scraping solutions, its ease of use and complete API access make it an excellent choice.

HMA

HideMyAss (HMA) is a lightweight, free proxy server that provides access to the web anonymously and easily. The ability to remain hidden while surfing the web is a huge benefit, especially for first-time users. HMA does not include any browser add-ons, in contrast to other companies. It provides easy access to websites through a web-based proxy. Type in your desired URL and choose your desired server; you’ll be online quickly. Bandwidth use is unmetered. You have limitless bandwidth to peruse, harvest info from, and download anything you choose. There are zero advertisements. Using secure proxy services is another area of focus. To protect your anonymity online, HMA provides URL encryption. Website tracking scripts, including JavaScript, may be disabled as well. Furthermore, you can choose to turn off cookie support.

VPNBook

VPNBook is the top choice if you need a free HTTPS proxy server. The service offers a static IP address and unlimited sessions. A sticky bar is always provided to navigate between sessions quickly. The addition of a web browser is like the icing on the cake. It takes little effort to begin using this product. It includes a web-based proxy browser like HMA. Choose the quickest server available to you. The list of data centers is organized geographically, so you may choose one that’s convenient for you.

Regarding safety, VPNBook is the greatest proxy server list available. It uses AES-256-bit encryption to make SSL surfing possible. To avoid leaving a digital footprint, it erases your past activities. It also stops additional scripts that certain sites employ to monitor your behavior. With WebRTC, your IP will be safe from prying eyes.

Tor

Tor is an anonymous web browser. You won’t have to worry about being monitored or restricting your Internet access. It works on several operating systems, including Android, iOS, and Linux. Applications that are set up correctly may be shielded by Tor. Tor encrypts data traveling to and within the Tor network, but the ultimate destination website is responsible for its encryption. Tor can protect you from being tracked, monitored, and having your digital fingerprint taken. It also offers many layers of security for your data. Since browser plugins like RealPlayer and QuickTime can expose your IP address, Tor has disabled them entirely and implemented HTTPS for private encryption of all traffic to and from the internet.

Check Out 1000’s AI Tools in AI Tools Club

Note: Colored Tools are supported by their respective companies and sponsored.

The post Best Proxy Servers (September 2023) appeared first on MarkTechPost.

Use Amazon SageMaker Model Card sharing to improve model governance

Posted on September 1, 2023 by i-genie

As Artificial Intelligence (AI) and Machine Learning (ML) technologies have become mainstream, many enterprises have been successful in building critical business applications powered by ML models at scale in production. However, since these ML models are making critical business decisions for the business, it’s important for enterprises to add proper guardrails throughout their ML lifecycle. Guardrails ensure that security, privacy, and quality of the code, configuration, and data and model configuration used in model lifecycle are versioned and preserved.
Implementing these guardrails is getting harder for enterprises because the ML processes and activities within enterprises are becoming more complex due to the inclusion of deeply involved processes that require contributions from multiple stakeholders and personas. In addition to data engineers and data scientists, there have been inclusions of operational processes to automate & streamline the ML lifecycle. Additionally, the surge of business stakeholders and in some cases legal and compliance reviews need capabilities to add transparency for managing access control, activity tracking, and reporting across the ML lifecycle.
The framework that gives systematic visibility into ML model development, validation, and usage is called ML governance. During AWS re:Invent 2022, AWS introduced new ML governance tools for Amazon SageMaker which simplifies access control and enhances transparency over your ML projects. One of the tools available as part of the ML governance is Amazon SageMaker Model Cards, which has the capability to create a single source of truth for model information by centralizing and standardizing documentation throughout the model lifecycle.
SageMaker model cards enable you to standardize how models are documented, thereby achieving visibility into the lifecycle of a model, from designing, building, training, and evaluation. Model cards are intended to be a single source of truth for business and technical metadata about the model that can reliably be used for auditing and documentation purposes. They provide a fact sheet of the model that is important for model governance.
As you scale your models, projects, and teams, as a best practice we recommend that you adopt a multi-account strategy that provides project and team isolation for ML model development and deployment. For more information about improving governance of your ML models, refer to Improve governance of your machine learning models with Amazon SageMaker.
Architecture overview
The architecture is implemented as follows:

Data Science Account – Data Scientists conduct their experiments in SageMaker Studio and build an MLOps setup to deploy models to staging/production environments using SageMaker Projects.
ML Shared Services Account – The MLOps set up from the Data Science account will trigger continuous integration and continuous delivery (CI/CD) pipelines using AWS CodeCommit and AWS CodePipeline.
Dev Account – The CI/CD pipelines will further trigger ML pipelines in this account covering data pre-processing, model training and post processing like model evaluation and registration. Output of these pipelines will deploy the model in SageMaker endpoints to be consumed for inference purposes. Depending on your governance requirements, Data Science & Dev accounts can be merged into a single AWS account.
Data Account – The ML pipelines running in the Dev Account will pull the data from this account.
Test and Prod Accounts – The CI/CD pipelines will continue the deployment after the Dev Account to set up SageMaker endpoint configuration in these accounts.
Security and Governance – Services like AWS Identity and Access Management (IAM), AWS IAM Identity Center, AWS CloudTrail, AWS Key Management Service (AWS KMS), Amazon CloudWatch, and AWS Security Hub will be used across these accounts as part of security and governance.

The following diagram illustrates this architecture.

For more information about setting scalable multi account ML architecture, refer to MLOps foundation for enterprises with Amazon SageMaker.
Our customers need the capability to share model cards across accounts to improve visibility and governance of their models through information shared in the model card. Now, with cross-account model cards sharing, customers can enjoy the benefits of multi-account strategy while having accessibility into the available model cards in their organization, so they can accelerate collaboration and ensure governance.
In this post, we show how to set up and access model cards across Model Development Lifecycle (MDLC) accounts using the new cross-account sharing feature of the model card. First, we will describe a scenario and architecture for setting up the cross-account sharing feature of the model card, and then dive deep into each component of how to set up and access shared model cards across accounts to improve visibility and model governance.
Solution overview
When building ML models, we recommend setting up a multi-account architecture to provide workload isolation improving security, reliability, and scalability. For this post, we will assume building and deploying a model for Customer Churn use case. The architecture diagram that follows shows one of the recommended approaches – centralized model card – for managing a model card in a multi-account Machine Learning Model-Development Lifecycle (MDLC) architecture. However, you can also adopt another approach, a hub-and-spoke model card. In this post, we will focus only on a centralized model card approach, but the same principles can be extended to a hub-and-spoke approach. The main difference is that each spoke account will maintain their own version of model card and it will have processes to aggregate and copy to a centralized account.
The following diagram illustrates this architecture.

The architecture is implemented as follows:

Lead Data Scientist is notified to solve the Customer Churn use case using ML, and they start the ML project through creation of a model card for Customer Churn V1 model in Draft status in the ML Shared Services Account
Through automation, that model card is shared with ML Dev Account
Data Scientist builds the model and starts to populate information via APIs into the model card based on their experimentation results and the model card status is set to Pending Review
Through automation, that model card is shared with the ML test account
ML Engineer (MLE) runs integration and validation tests in ML Test account and the model in the central registry is marked Pending Approval
Model Approver reviews the model results with the supporting documentation provided in the central model card and approves the model card for production deployment.
Through automation, that model card is shared with ML Prod account in read-only mode.

Prerequisites
Before you get started, make sure you have the following prerequisites:

Two AWS accounts.
In both AWS accounts, an IAM federation role with administrator access to do the following:

Create, edit, view, and delete model cards within Amazon SageMaker.
Create, edit, view, and delete resource share within AWS RAM.

For more information, refer to Example IAM policies for AWS RAM.
Setting up model card sharing
The account where the model cards are created is the model card account. Users in the model card account share them with the shared accounts where they can be updated. Users in the model card account can share their model cards through AWS Resource Access Manager (AWS RAM). AWS RAM helps you share resources across AWS accounts.
In the following section, we show how to share model cards.
First, create a model card for a Customer Churn use case as previously described. On the Amazon SageMaker console, expand the Governance section and choose Model cards.

We create the model card in Draft status with the name Customer-Churn-Model-Card. For more information, refer to Create a model card. In this demonstration, you can leave the remainder of the fields blank and create the model card.

Alternatively, you can use the following AWS CLI command to create the model card:

aws sagemaker create-model-card –model-card-name Customer-Churn-Model-Card –content “{“model_overview”: {“model_owner”: “model-owner”,”problem_type”: “Customer Churn Model”}}” –model-card-status Draft

Now, create the cross-account share using AWS RAM. In the AWS RAM console, select Create a resource share.

Enter a name for the resource share, for example “Customer-Churn-Model-Card-Share”. In the Resources – optional section, select the resource type as SageMaker Model Cards. The model card we created in the previous step will appear in the listing.
Select that model and it will appear in the Selected resources section. Select that resource again as shown in the following steps and choose Next.

On the next page, you can select the Managed permissions. You can create custom permissions or use the default option “AWSRAMPermissionSageMakerModelCards” and select Next. For more information, refer to Managing permissions in AWS RAM.

On the next page, you can select Principals. Under Select principal type, choose AWS Account and enter the ID of the account of the share the model card. Select Add and continue to the next page.

On the last page, review the information and select “Create resource share”. Alternatively, you can use the following AWS CLI command to create a resource share:

aws ram create-resource-share –name <Name of the Model Card>

aws ram associate-resource-share –resource-share-arn <ARN of resource share create from the previous command> –resource-arns <ARN of the Model Card>

On the AWS RAM console, you see the attributes of the resource share. Make sure that Shared resources, Managed permissions, and Shared principals are in the “Associated” status.

After you use AWS RAM to create a resource share, the principals specified in the resource share can be granted access to the share’s resources.

If you turn on AWS RAM sharing with AWS Organizations, and your principals that you share with are in the same organization as the sharing account, those principals can receive access as soon as their account administrator grants them permissions.
If you don’t turn on AWS RAM sharing with Organizations, you can still share resources with individual AWS accounts that are in your organization. The administrator in the consuming account receives an invitation to join the resource share, and they must accept the invitation before the principals specified in the resource share can access the shared resources.
You can also share with accounts outside of your organization if the resource type supports it. The administrator in the consuming account receives an invitation to join the resource share, and they must accept the invitation before the principals specified in the resource share can access the shared resources.

For more information about AWS RAM, refer to Terms and concepts for AWS RAM.
Accessing shared model cards
Now we can log in to the shared AWS account to access the model card. Make sure that you are accessing the AWS console using IAM permissions (IAM role) which allow access to AWS RAM.
With AWS RAM, you can view the resource shares to which you have been added, the shared resources that you can access, and the AWS accounts that have shared resources with you. You can also leave a resource share when you no longer require access to its shared resources.
To view the model card in the shared AWS account:

Navigate to the Shared with me: Shared resources page in the AWS RAM console.
Make sure that you are operating in the same AWS region where the share was created.
The model shared from the model account will be available in the listing. If there is a long list of resources, then you can apply a filter to find specific shared resources. You can apply multiple filters to narrow your search.
The following information is available:

Resource ID – The ID of the resource. This is the name of the model card that we created earlier in the model card account.
Resource type – The type of resource.
Last share date – The date on which the resource was shared with you.
Resource shares – The number of resource shares in which the resource is included. Choose the value to view the resource shares.
Owner ID – The ID of the principal who owns the resource.

You can also access the model card using the AWS CLI option. For the AWS IAM policy configured with the correct credentials, make sure that you have permissions to create, edit, and delete model cards within Amazon SageMaker. For more information, refer to Configure the AWS CLI.
You can use the following AWS IAM permissions policy as template:

{
“Version”: “2012-10-17”,
“Statement”: [
{
“Effect”: “Allow”,
“Action”: [
“sagemaker:DescribeModelCard”,
“sagemaker:UpdateModelCard”,
“sagemaker:CreateModelCardExportJob”,
“sagemaker:ListModelCardVersions”,
“sagemaker:DescribeModelCardExportJob”
],
“Resource”: [
“arn:aws:sagemaker:AWS-Region:AWS-model-card-account-id:model-card/example-model-card-name-0”,
“arn:aws:sagemaker:AWS-Region:AWS-model-card-account-id:model-card/example-model-card-name-1/*”
]
},
{
“Effect”: “Allow”,
“Action”: “s3:PutObject”,
“Resource”: “arn:aws:s3:::Amazon-S3-bucket-storing-the-pdf-of-the-model-card/model-card-name/*”
}
]
}

You can run the following AWS CLI command to access the details of the shared model card.

aws sagemaker describe-model-card –model-card-name <ARN of the model card>

Now you can make changes to this model card from this account.

aws sagemaker update-model-card –model-card-name <ARN of the Model Card> –content “{“model_overview”: {“model_owner”: “model-owner”,”problem_type”: “Customer Churn Model”}}”

After you make changes, go back to the model card account to see the changes that we made in this shared account.

The problem type has been updated to “Customer Churn Model” which we had provided as part of the AWS CLI command input.
Clean up
You can now delete the model card you created. Make sure that you delete the AWS RAM resource share that you created to share the model card.
Conclusion
In this post, we provided an overview of multi-account architecture for scaling and governing your ML workloads securely and reliably. We discussed the architecture patterns for setting up model card sharing and illustrated how centralized model card sharing patterns work. Finally, we set up model card sharing across multiple accounts for improving visibility and governance in your model development lifecycle. We encourage you try out the new model card sharing feature and let us know your feedback.

About the authors
Vishal Naik is a Sr. Solutions Architect at Amazon Web Services (AWS). He is a builder who enjoys helping customers accomplish their business needs and solve complex challenges with AWS solutions and best practices. His core area of focus includes Machine Learning, DevOps, and Containers. In his spare time, Vishal loves making short films on time travel and alternate universe themes.
Ram Vittal is a Principal ML Solutions Architect at AWS. He has over 20 years of experience architecting and building distributed, hybrid, and cloud applications. He is passionate about building secure and scalable AI/ML and big data solutions to help enterprise customers with their cloud adoption and optimization journey to improve their business outcomes. In his spare time, he rides his motorcycle and walks with his 2-year-old sheep-a-doodle!

Month: September 2023

Unveil The Secrets Of Anatomical Segmentation With HybridGNet: An AI E …

XLang NLP Lab Researchers Propose Lemur: The State-of-the-Art Open Pre …

Researchers from Inception, MBZUAI, and Cerebras Open-Sourced ‘Jais� …

Top Video Conferencing Tools 2023

Meet AnomalyGPT: A Novel IAD Approach Based on Large Vision-Language M …

This AI Research Paper Presents a Comprehensive Survey of Deep Learnin …

Unlocking the Power of Diversity in Neural Networks: How Adaptive Neur …

Elevating the generative AI experience: Introducing streaming support …

FMOps/LLMOps: Operationalize generative AI and differences with MLOps

Is ChatGPT Really Neutral? An Empirical Study on Political Bias in AI- …

Top Data Privacy Tools 2023

Best Proxy Servers (September 2023)

Use Amazon SageMaker Model Card sharing to improve model governance