Meet Time-LLM: A Reprogramming Machine Learning Framework to Repurpose LLMs for General Time Series Forecasting with the Backbone Language Models Kept Intact

In the rapidly evolving data analysis landscape, the quest for robust time series forecasting models has taken a novel turn with the introduction of TIME-LLM, a pioneering framework developed by a collaboration between esteemed institutions, including Monash University and Ant Group. This framework departs from traditional approaches by harnessing the vast potential of Large Language Models (LLMs), traditionally used in natural language processing, to predict future trends in time series data. Unlike the specialized models that require extensive domain knowledge and copious amounts of data, TIME-LLM cleverly repurposes LLMs without modifying their core structure, offering a versatile and efficient solution to the forecasting problem.

At the heart of TIME-LLM lies an innovative reprogramming technique that translates time series data into text prototypes, effectively bridging the gap between numerical data and the textual understanding of LLMs. A complementary technique, Prompt-as-Prefix (PaP), enriches the input with contextual cues, allowing the model to interpret and forecast time series data accurately. This approach not only leverages LLMs’ inherent pattern recognition and reasoning capabilities but also circumvents the need for large volumes of domain-specific data, setting a new benchmark for model generalizability and performance.

The methodology behind TIME-LLM is both intricate and ingenious. By segmenting the input time series into discrete patches, the model applies learned text prototypes to each segment, transforming them into a format that LLMs can comprehend. This process ensures that the vast knowledge embedded in LLMs is effectively utilized, enabling them to draw insights from time series data as if it were natural language. Adding task-specific prompts further enhances the model’s ability to make nuanced predictions, providing a clear directive for transforming the reprogrammed input.
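To make the patching idea concrete, the following is a minimal sketch of how a univariate series might be segmented and projected; the patch length, stride, hidden size, and the random projection standing in for the learned reprogramming layer are illustrative assumptions, not the paper’s exact configuration.

import numpy as np

# Minimal sketch of the patching step (illustrative only; patch length, stride,
# and the projection standing in for the learned reprogramming layer are assumptions).
def segment_into_patches(series: np.ndarray, patch_len: int = 16, stride: int = 8) -> np.ndarray:
    """Split a 1-D time series into overlapping patches."""
    patches = [
        series[i : i + patch_len]
        for i in range(0, len(series) - patch_len + 1, stride)
    ]
    return np.stack(patches)  # shape: (num_patches, patch_len)

rng = np.random.default_rng(0)
series = rng.standard_normal(512)
patches = segment_into_patches(series)

# Each patch is then mapped onto text-prototype embeddings so a frozen LLM can
# consume it; a random matrix stands in here for the learned reprogramming layer.
prototype_projection = rng.standard_normal((patches.shape[1], 768))  # 768 = hypothetical LLM hidden size
reprogrammed = patches @ prototype_projection  # (num_patches, 768), fed to the frozen LLM
print(patches.shape, reprogrammed.shape)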

Empirical evaluations of TIME-LLM have underscored its superiority over existing models. Notably, the framework has demonstrated exceptional performance in both few-shot and zero-shot learning scenarios, outclassing specialized forecasting models across various benchmarks. This is particularly impressive considering the diverse nature of time series data and the complexity of forecasting tasks. Such results highlight the adaptability of TIME-LLM, proving its efficacy in making precise predictions with minimal data input, a feat that traditional models often struggle to achieve.

The implications of TIME-LLM’s success extend far beyond time series forecasting. By demonstrating that LLMs can be effectively repurposed for tasks outside their original domain, this research opens up new avenues for applying LLMs in data analysis and beyond. The potential to leverage LLMs’ reasoning and pattern recognition capabilities for various types of data presents an exciting frontier for exploration.

In essence, TIME-LLM embodies a significant leap forward in data analysis. Its ability to transcend the limitations of traditional forecasting models, together with its efficiency and adaptability, positions it as a groundbreaking tool for future research and applications. Frameworks like TIME-LLM are vital for shaping the next generation of analytical tools: versatile and powerful, they are becoming indispensable for navigating complex data-driven decision-making.

This AI Paper from Apple Proposes Acoustic Model Fusion to Drastically Cut Word Error Rates in Speech Recognition Systems

Significant improvements have been made in enhancing the accuracy and efficiency of Automatic Speech Recognition (ASR) systems. The recent research delves into integrating an external Acoustic Model (AM) into End-to-End (E2E) ASR systems, presenting an approach that addresses the persistent challenge of domain mismatch – a common obstacle in speech recognition technology. This methodology by Apple, known as Acoustic Model Fusion (AMF), aims to refine the speech recognition process by leveraging the strengths of external acoustic models to complement the inherent capabilities of E2E systems.

E2E ASR systems are renowned for their streamlined architecture, which combines all essential speech recognition components into a single neural network. This integration simplifies the system’s learning process, allowing it to predict sequences of characters or words directly from audio input. Despite the simplicity and efficiency offered by this design, such models encounter limitations when dealing with rare or complex words that are underrepresented in their training data. Previous efforts have primarily focused on incorporating external Language Models (LMs) to enhance the system’s vocabulary, but this solution does not fully address the domain mismatch between the model’s internal acoustic understanding and the diversity of its real-world applications.

The Apple research team’s AMF technique emerges as a groundbreaking solution to this problem. By integrating an external AM with the E2E system, AMF enriches the system with broader acoustic knowledge and significantly reduces Word Error Rates (WER). The methodology involves meticulously interpolating scores from the external AM with those of the E2E system, akin to shallow fusion techniques but applied distinctly to acoustic modeling. This innovative approach has demonstrated remarkable improvements in the system’s performance, particularly in recognizing named entities and addressing the challenges of rare words.
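The article does not spell out the fusion equation, but the idea of interpolating the two models’ scores can be sketched as follows; the interpolation weight and the toy log-probabilities below are assumptions for illustration, not values from the paper.

def fuse_scores(e2e_log_prob: float, am_log_prob: float, lam: float = 0.3) -> float:
    """Interpolate E2E and external acoustic-model log-probabilities for one candidate.
    lam is a tunable interpolation weight (hypothetical value)."""
    return (1.0 - lam) * e2e_log_prob + lam * am_log_prob

# Toy rescoring step: candidate tokens with (E2E, external AM) log-probabilities (made-up numbers).
candidates = {"siri": (-1.2, -0.4), "syria": (-1.0, -2.5)}
best = max(candidates, key=lambda tok: fuse_scores(*candidates[tok]))
print(best)  # "siri" wins once the external AM's acoustic evidence is folded in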

The efficacy of AMF was rigorously tested through a series of experiments using diverse datasets, including virtual assistant queries, dictated sentences, and synthesized audio-text pairs designed to test the system’s ability to recognize named entities accurately. The results of these tests were compelling, showcasing a notable reduction in WER – up to 14.3% across different test sets. This achievement highlights the potential of AMF to enhance the accuracy and reliability of ASR systems.

Some key findings and contributions of this research include:

The introduction of Acoustic Model Fusion as a novel method for integrating external acoustic knowledge into E2E ASR systems, addressing the domain mismatch issue.

A significant reduction in Word Error Rates, with up to 14.3% improvement across various test sets, showcasing the effectiveness of AMF in enhancing speech recognition accuracy.

Enhanced recognition of named entities and rare words, underscoring the method’s potential to improve the system’s vocabulary and adaptability.

A demonstration of AMF’s advantages over traditional LM integration methods, offering a promising direction for future advancements in ASR technology.

The implications of this research are profound, paving the way for more accurate, efficient, and adaptable speech recognition systems. The success of Acoustic Model Fusion in mitigating domain mismatches and improving word recognition opens new avenues for applying ASR technology across a myriad of domains. This study contributes a significant innovation to speech recognition and sets the stage for further exploration and development in the quest for flawless human-computer interaction through speech.

This AI Paper from China Introduces BGE-M3: A New Member to BGE Model Series with Multi-Linguality (100+ languages)

BAAI introduces BGE M3-Embedding with the help of researchers from the University of Science and Technology of China. The M3 refers to three novel properties of the text embedding model: Multi-Linguality, Multi-Functionality, and Multi-Granularity. The work identifies the primary challenges in existing embedding models, such as the inability to support multiple languages, restricted retrieval functionality, and difficulty handling varied input granularities.

Existing embedding models, such as Contriever, GTR, E5, and others, have brought notable progress to the field, but they typically lack broad language coverage, multiple retrieval functionalities, or support for long input texts. These models are mainly trained for English and support only one retrieval functionality. The proposed solution, BGE M3-Embedding, supports over 100 languages, accommodates diverse retrieval functionalities (dense, sparse, and multi-vector retrieval), and processes inputs ranging from short sentences to lengthy documents of up to 8,192 tokens.

M3-Embedding involves a novel self-knowledge distillation approach and optimized batching strategies for long inputs, trained on large-scale, diverse multilingual datasets from sources such as Wikipedia and S2ORC. It facilitates three common retrieval functionalities: dense retrieval, lexical (sparse) retrieval, and multi-vector retrieval. The distillation process combines relevance scores from the different retrieval functionalities into a teacher signal, enabling the model to perform multiple retrieval tasks efficiently.
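To make the teacher-signal idea concrete, here is a minimal sketch, assuming equal weighting of the three scores and a simple regression-style distillation loss; the paper’s actual weighting and loss formulation differ.

import torch

# Minimal sketch of self-knowledge distillation across the three retrieval heads.
# Equal weights and an MSE loss are illustrative assumptions, not the paper's recipe.
def teacher_signal(dense, sparse, multivec):
    return (dense + sparse + multivec) / 3.0  # combined relevance score per (query, passage) pair

def distillation_loss(dense, sparse, multivec):
    teacher = teacher_signal(dense, sparse, multivec).detach()  # no gradient flows through the teacher
    return sum(torch.nn.functional.mse_loss(s, teacher) for s in (dense, sparse, multivec))

# Toy relevance scores for a batch of four query-passage pairs
dense = torch.randn(4, requires_grad=True)
sparse = torch.randn(4, requires_grad=True)
multivec = torch.randn(4, requires_grad=True)
loss = distillation_loss(dense, sparse, multivec)
loss.backward()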

The model was evaluated on multilingual text retrieval (MLDR), varied sequence lengths, and narrative QA responses, using nDCG@10 (normalized discounted cumulative gain) as the evaluation metric. The experiments demonstrated that the M3-Embedding model outperformed existing models in more than 10 languages while delivering on-par results in English. Its performance was similar to other models at shorter input lengths but showed clear improvements on longer texts.
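For reference, nDCG@10 can be computed as in the compact sketch below, which uses the linear-gain formulation and, for simplicity, derives the ideal ranking from the retrieved list itself.

import numpy as np

# Compact reference implementation of nDCG@k with linear gains.
# `relevances` are graded relevance labels of documents in ranked order.
def ndcg_at_k(relevances, k=10):
    rel = np.asarray(relevances, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))
    dcg = float(np.sum(rel * discounts))
    idcg = float(np.sum(np.sort(rel)[::-1] * discounts))
    return dcg / idcg if idcg > 0 else 0.0

print(round(ndcg_at_k([3, 2, 0, 1]), 3))  # 0.985 for this toy ranking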

In conclusion, M3 embedding is a significant advancement in text embedding models. It is a versatile solution that supports multiple languages, varied retrieval functionalities, and different input granularities. The proposed model addresses crucial limitations in existing methods, marking a substantial step forward in information retrieval. It outperforms baseline methods like BM25, mDPR, and E5, showcasing its effectiveness in addressing the identified challenges.

Announcing support for Llama 2 and Mistral models and streaming responses in Amazon SageMaker Canvas

Launched in 2021, Amazon SageMaker Canvas is a visual, point-and-click service for building and deploying machine learning (ML) models without the need to write any code. Ready-to-use Foundation Models (FMs) available in SageMaker Canvas enable customers to use generative AI for tasks such as content generation and summarization.
We are thrilled to announce the latest updates to Amazon SageMaker Canvas, which bring exciting new generative AI capabilities to the platform. With support for Meta Llama 2 and Mistral.AI models and the launch of streaming responses, SageMaker Canvas continues to empower everyone who wants to get started with generative AI without writing a single line of code. In this post, we discuss these updates and their benefits.
Introducing Meta Llama 2 and Mistral models
Llama 2 is a cutting-edge foundation model by Meta that offers improved scalability and versatility for a wide range of generative AI tasks. Users have reported that Llama 2 is capable of engaging in meaningful and coherent conversations, generating new content, and extracting answers from existing notes. Llama 2 is among the state-of-the-art large language models (LLMs) available today for the open source community to build their own AI-powered applications.
Mistral.AI, a leading French AI start-up, has developed Mistral 7B, a powerful language model with 7.3 billion parameters. Mistral models have been very well received by the open-source community thanks to their use of grouped-query attention (GQA) for faster inference, making them highly efficient and able to perform comparably to models with two or three times the number of parameters.
Today, we are excited to announce that SageMaker Canvas now supports three Llama 2 model variants and two Mistral 7B variants:

Llama-2-13B-chat and Llama-2-70B-chat, powered by Amazon Bedrock
Llama-2-7b-Chat, powered by Amazon SageMaker JumpStart
Mistral-7B and Mistral-7B-Chat, powered by Amazon SageMaker JumpStart

To test these models, navigate to the SageMaker Canvas Ready-to-use models page, then choose Generate, extract and summarize content. This is where you’ll find the SageMaker Canvas GenAI chat experience. Here, you can use any model from Amazon Bedrock or SageMaker JumpStart by selecting it from the model drop-down menu.
In our case, we choose one of the Llama 2 models. Now you can provide your input or query. As you send the input, SageMaker Canvas forwards your input to the model.

Choosing which of the models available in SageMaker Canvas best fits your use case requires you to take into account information about the models themselves: the Llama-2-70B-chat model is a bigger model (70 billion parameters, compared to 13 billion for Llama-2-13B-chat), which means its quality is generally higher than the smaller model’s, at the cost of slightly higher latency and an increased cost per token. Mistral-7B offers performance comparable to Llama-2-7B or Llama-2-13B; however, it is hosted on Amazon SageMaker, which means the pricing model is different, moving from dollar-per-token pricing to a dollar-per-hour model. This can be more cost effective with a significant number of requests per hour and consistent usage at scale. All of the models above can perform well on a variety of use cases, so our suggestion is to evaluate which model best solves your problem, considering output quality, throughput, and cost trade-offs.
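To make the pricing trade-off concrete, here is a rough back-of-the-envelope comparison; the prices and request size below are purely hypothetical placeholders, so check the current Amazon Bedrock and Amazon SageMaker pricing pages for real figures.

# Purely hypothetical prices, used only to illustrate the token-based vs. hourly trade-off.
price_per_1k_tokens = 0.002   # hypothetical dollar-per-token style pricing (Bedrock-hosted model)
price_per_hour = 1.50         # hypothetical dollar-per-hour pricing (SageMaker-hosted endpoint)
tokens_per_request = 1_000    # hypothetical average request size

cost_per_request = price_per_1k_tokens * tokens_per_request / 1_000
break_even = price_per_hour / cost_per_request
print(f"Hourly hosting becomes cheaper above ~{break_even:.0f} requests per hour")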
If you’re looking for a straightforward way to compare how models behave, SageMaker Canvas natively provides this capability in the form of model comparisons. You can select up to three different models and send the same query to all of them at once. SageMaker Canvas then gets the responses from each of the models and shows them in a side-by-side chat UI. To do this, choose Compare and select the other models to compare against, as shown below:

Introducing response streaming: Real-time interactions and enhanced performance
One of the key advancements in this release is the introduction of streamed responses. Streaming provides a richer experience for the user and better reflects the feel of a chat. With streaming responses, users receive instant feedback that integrates seamlessly into their chatbot applications, allowing for a more interactive and responsive experience. The ability to receive immediate responses in a chat-like manner creates a more natural conversation flow and improves overall user satisfaction.

With this feature, you can now interact with your AI models in real time, receiving instant responses and enabling seamless integration into a variety of applications and workflows. All models that can be queried in SageMaker Canvas—from Amazon Bedrock and SageMaker JumpStart—can stream responses to the user.
Get started today
Whether you’re building a chatbot, recommendation system, or virtual assistant, the Llama 2 and Mistral models combined with streamed responses bring enhanced performance and interactivity to your projects.
To use the latest features of SageMaker Canvas, make sure to delete and recreate the app. To do that, log out from the app by choosing Log out, then open SageMaker Canvas again. You should see the new models and enjoy the latest releases. Logging out of the SageMaker Canvas application will release all resources used by the workspace instance, therefore avoiding incurring additional unintended charges.

Conclusion
To get started with the new streamed responses for the Llama 2 and Mistral models in SageMaker Canvas, visit the SageMaker console and explore the intuitive interface. To learn more about how SageMaker Canvas and generative AI can help you achieve your business goals, refer to Empower your business users to extract insights from company documents using Amazon SageMaker Canvas and Generative AI and Overcoming common contact center challenges with generative AI and Amazon SageMaker Canvas.
If you want to learn more about SageMaker Canvas features and deep dive on other ML use cases, check out the other posts available in the SageMaker Canvas category of the AWS ML Blog. We can’t wait to see the amazing AI applications you will create with these new capabilities!

About the authors
Davide Gallitelli is a Senior Specialist Solutions Architect for AI/ML. He is based in Brussels and works closely with customers around the globe who are looking to adopt low-code/no-code machine learning technologies and generative AI. He has been a developer since he was very young, starting to code at the age of 7. He started learning AI/ML at university and has been in love with it ever since.
Dan Sinnreich is a Senior Product Manager at AWS, helping to democratize low-code/no-code machine learning. Prior to AWS, Dan built and commercialized enterprise SaaS platforms and time-series models used by institutional investors to manage risk and construct optimal portfolios. Outside of work, he can be found playing hockey, scuba diving, and reading science fiction.

How HSR.health is limiting risks of disease spillover from animals to humans using Amazon SageMaker geospatial capabilities

This is a guest post co-authored by Ajay K Gupta, Jean Felipe Teotonio and Paul A Churchyard from HSR.health.
HSR.health is a geospatial health risk analytics firm whose vision is that global health challenges are solvable through human ingenuity and the focused and accurate application of data analytics. In this post, we present one approach for zoonotic disease prevention that uses Amazon SageMaker geospatial capabilities to create a tool that provides more accurate disease spread information to health scientists to help them save more lives, quicker.
Zoonotic diseases affect both animals and humans. The transition of a disease from animal to human, known as spillover, is a phenomenon that continually occurs on our planet. According to health organizations such as the Centers for Disease Control and Prevention (CDC) and the World Health Organization (WHO), a spillover event at a wet market in Wuhan, China most likely caused the coronavirus disease 2019 (COVID-19). Studies suggest that a virus found in fruit bats underwent significant mutations, allowing it to infect humans. The initial patient, or ‘patient zero’, for COVID-19 likely started a local outbreak that eventually spread internationally. HSR.health’s Zoonotic Spillover Risk Index aims to assist in the identification of these early outbreaks before they cross international borders and lead to widespread global impact.
The main weapon public health has against the propagation of regional outbreaks is disease surveillance: an entire interlocking system of disease reporting, investigation, and data communication between different levels of a public health system. This system is dependent not only on human factors, but also on technology and resources to collect disease data, analyze patterns, and create a consistent and continuous stream of data transfer from local to regional to central health authorities.
The speed at which COVID-19 went from a local outbreak to a global disease present in every single continent should be a sobering example of the dire need to harness innovative technology to create more efficient and accurate disease surveillance systems.
The risk of zoonotic disease spillover is sharply correlated with multiple social, environmental, and geographic factors that influence how often human beings interact with wildlife. HSR.health’s Zoonotic Disease Spillover Risk Index uses over 20 distinct geographic, social, and environmental factors historically known to affect the risk of human-wildlife interaction and therefore zoonotic disease spillover risk. Many of these factors can be mapped through a combination of satellite imagery and remote sensing.
In this post, we explore how HSR.health uses SageMaker geospatial capabilities to retrieve relevant features from satellite imagery and remote sensing for developing the risk index. SageMaker geospatial capabilities make it easy for data scientists and machine learning (ML) engineers to build, train, and deploy models using geospatial data. With SageMaker geospatial capabilities, you can efficiently transform or enrich large-scale geospatial datasets, accelerate model building with pre-trained ML models, and explore model predictions and geospatial data on an interactive map using 3D accelerated graphics and built-in visualization tools.
Using ML and geospatial data for risk mitigation
ML is highly effective for anomaly detection on spatial or temporal data due to its ability to learn from data without being explicitly programmed to identify specific types of anomalies. Spatial data, which relates to the physical position and shape of objects, often contains complex patterns and relationships that may be difficult for traditional algorithms to analyze.
Incorporating ML with geospatial data enhances the capability to detect anomalies and unusual patterns systematically, which is essential for early warning systems. These systems are crucial in fields such as environmental monitoring, disaster management, and security. Predictive modeling using historical geospatial data allows organizations to identify and prepare for potential future events. These events range from natural disasters and traffic disruptions to, as this post discusses, disease outbreaks.
Detecting Zoonotic spillover risks
To predict zoonotic spillover risks, HSR.health has adopted a multimodal approach. By using a blend of data types—including environmental, biogeographical, and epidemiological information—this method enables a comprehensive assessment of disease dynamics. Such a multifaceted perspective is critical for developing proactive measures and enabling a rapid response to outbreaks.
The approach includes the following components:

Disease and outbreak data – HSR.health uses the extensive disease and outbreak data provided by Gideon and the World Health Organization (WHO), two trusted sources of global epidemiological information. This data serves as a fundamental pillar in the analytics framework. For Gideon, the data can be accessed through an API, and for the WHO, HSR.health has built a large language model (LLM) to mine outbreak data from past disease outbreak reports.
Earth observation data – Environmental factors, land use analysis, and the detection of habitat changes are integral to assessing zoonotic risk. These insights can be derived from satellite-based earth observation data. HSR.health is able to streamline the use of earth observation data by using SageMaker geospatial capabilities to access and manipulate large-scale geospatial datasets. SageMaker geospatial offers a rich data catalog, including datasets from USGS Landsat-8, Sentinel-1, Sentinel-2, and others. It is also possible to bring in other datasets, such as high-resolution imagery from Planet Labs.
Social determinants of risk – Beyond biological and environmental factors, the team at HSR.health also considered social determinants, which encompass various socioeconomic and demographic indicators, and play a pivotal role in shaping zoonotic spillover dynamics.

From these components, HSR.health evaluated a range of different factors, and the following features have been identified as influential for identifying zoonotic spillover risks:

Animal habitats and habitable zones – Understanding the habitats of potential zoonotic hosts and their habitable zones is fundamental to assessing transmission risk.
Population centers – Proximity to densely populated areas is a key consideration because it influences the likelihood of human-animal interactions.
Loss of habitat – The degradation of natural habitats, particularly through deforestation, can accelerate zoonotic spillover events.
Human-wildland interface – Areas where human settlements intersect with wildlife habitats are potential hotspots for zoonotic transmission.
Social characteristics – Socioeconomic and cultural factors can significantly impact zoonotic risk, and HSR.health examines these as well.
Human health characteristics – The health status of local human populations is an essential variable because it affects susceptibility and transmission dynamics.

Solution overview
HSR.health’s workflow encompasses data preprocessing, feature extraction, and the creation of informative visualizations using ML techniques. This allows for a clear understanding of the data’s evolution from its raw form to actionable insights.
The following is a visual representation of the workflow, starting with input data from Gideon, earth observation data, and social determinant of risk data.

Retrieve and process satellite imagery using SageMaker geospatial capabilities
Satellite data forms a cornerstone of the analysis performed to build the risk index, providing critical information on environmental changes. To generate insights from satellite imagery, HSR.health uses Earth Observation Jobs (EOJs). EOJs enable the acquisition and transformation of raster data gathered from the Earth’s surface. An EOJ obtains satellite imagery from a designated data source—for instance, a satellite constellation—over a specific area and time period. It then applies one or more models to the retrieved images.
Additionally, Amazon SageMaker Studio offers a geospatial notebook pre-installed with commonly-used geospatial libraries. This notebook enables direct visualization and processing of geospatial data within a Python notebook environment. EOJs can be created in the geospatial notebook environment.
To configure an EOJ, the following parameters are used:

InputConfig – The input configuration specifies the data sources and the filtering criteria to be used during data acquisition:

RasterDataCollectionArn – Specifies the satellite from which to collect data.
AreaOfInterest – The geographical area of interest (AOI) defines the polygon boundaries for image collection.
TimeRangeFilter – The time range of interest: {StartTime: <string>, EndTime: <string>}.
PropertyFilters – Additional property filters, such as acceptable percentage of cloud coverage or desired sun azimuth angles.

JobConfig – This configuration defines the type of job to be applied to the retrieved satellite image data. It supports operations such as band math, resampling, geomosaic or cloud removal.

The following example code demonstrates running an EOJ for cloud removal, representative of the steps performed by HSR.health:
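The snippet assumes that a client for SageMaker geospatial capabilities and an execution role have already been created; one way to set these up is shown here (the region choice and the use of the notebook execution role are assumptions, not part of the original walkthrough).

import boto3
import sagemaker

# Assumed setup for the example that follows: a boto3 client for SageMaker
# geospatial capabilities and the IAM execution role the job runs under.
session = boto3.Session(region_name="us-west-2")  # SageMaker geospatial capabilities are available in us-west-2
geospatial_client = session.client(service_name="sagemaker-geospatial")
execution_role = sagemaker.get_execution_role()  # or the ARN of a role with SageMaker geospatial permissions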

eoj_input_config = {
    "RasterDataCollectionQuery": {
        "RasterDataCollectionArn": "arn:aws:sagemaker-geospatial:us-west-2:378778860802:raster-data-collection/public/nmqj48dcu3g7ayw8",
        "AreaOfInterest": {
            "AreaOfInterestGeometry": {
                "PolygonGeometry": {
                    "Coordinates": [
                        [
                            [-76.23240119828894, -6.268815697653608],
                            [-76.23240119828894, -6.339419992332921],
                            [-76.13834453776985, -6.339419992332921],
                            [-76.13834453776985, -6.268815697653608],
                            [-76.23240119828894, -6.268815697653608],
                        ]
                    ]
                }
            }
        },
        "TimeRangeFilter": {
            "StartTime": "2022-03-01T00:00:00Z",
            "EndTime": "2022-06-30T23:59:59Z",
        },
        "PropertyFilters": {
            "Properties": [{"Property": {"EoCloudCover": {"LowerBound": 0.0, "UpperBound": 2.0}}}],
            "LogicalOperator": "AND",
        },
    }
}
eoj_job_config = {
    "CloudRemovalConfig": {
        "AlgorithmName": "INTERPOLATION",
        "InterpolationValue": "-9999",
        "TargetBands": ["red", "green", "blue", "nir", "swir16"],
    }
}

eoj = geospatial_client.start_earth_observation_job(
    Name="eoj-analysis-loreto",
    InputConfig=eoj_input_config,
    JobConfig=eoj_job_config,
    ExecutionRoleArn=execution_role,
)

HSR.health used several operations to preprocess the data and extract relevant features, including land cover classification, temperature variation mapping, and vegetation indices.
One vegetation index relevant for indicating vegetation health is the Normalized Difference Vegetation Index (NDVI). The NDVI quantifies vegetation health by using near-infrared light, which vegetation reflects, and red light, which vegetation absorbs. Monitoring the NDVI over time can reveal changes in vegetation, such as the impact of human activities like deforestation.
The following code snippet demonstrates how to calculate a vegetation index like the NDVI based on the data that has been passed through cloud removal:

eoj_input_config = {
    "PreviousEarthObservationJobArn": eoj["Arn"]
}
eoj_job_config = {
    "BandMathConfig": {
        "CustomIndices": {
            "Operations": [
                {
                    "Equation": "(nir - red) / (nir + red)",
                    "Name": "ndvi",
                    "OutputType": "FLOAT32",
                }
            ]
        }
    }
}
eoj = geospatial_client.start_earth_observation_job(
    Name="eoj-vi-ndvi",
    InputConfig=eoj_input_config,
    JobConfig=eoj_job_config,
    ExecutionRoleArn=execution_role,
)

We can visualize the job output using SageMaker geospatial capabilities. SageMaker geospatial capabilities can help you overlay model predictions on a base map and provide layered visualization to make collaboration easier. With the GPU-powered interactive visualizer and Python notebooks, it’s possible to explore millions of data points in one view, facilitating the collaborative exploration of insights and results.
The steps outlined in this post demonstrate just one of the many raster-based features that HSR.health has extracted to create the risk index.
Combining raster-based features with health and social data
After extracting the relevant features in raster format, HSR.health used zonal statistics to aggregate the raster data within the administrative boundary polygons to which the social and health data are assigned. The analysis incorporates a combination of raster and vector geospatial data. This kind of aggregation allows for the management of raster data in a geodataframe, which facilitates its integration with the health and social data to produce the final risk index.
The following code snippet demonstrates how to aggregate raster data to administrative vector boundaries:

import geopandas as gp
import numpy as np
import pandas as pd
import rasterio
from rasterstats import zonal_stats

def get_proportions(inRaster, inVector, classDict, idCols, year):
    # Reading in the vector file
    if '.parquet' in inVector:
        vector = gp.read_parquet(inVector)
    else:
        vector = gp.read_file(inVector)
    raster = rasterio.open(inRaster)
    vector = vector.to_crs(raster.crs)
    # Retrieving the bounding box for the raster image
    xmin, ymin, xmax, ymax = raster.bounds
    # Selecting the vector features that intersect with the raster bounding box
    vector = vector.cx[xmin:xmax, ymin:ymax]
    vector = vector.reset_index()
    # Calculate the sum of pixels of each class in the vector geometries
    stats = zonal_stats(vector.geometry, raster.read(1), affine=raster.transform, nodata=raster.nodata, categorical=True)
    # Creating a dataframe with the class sum of pixels and the id fields of the vector geometries
    df1 = pd.DataFrame(data=stats)
    df1 = df1.fillna(0)
    df1['totalpixels'] = df1.sum(axis=1)
    df1['year'] = year
    if 'year' in vector.columns.tolist():
        vector = vector.drop(['year'], axis=1)
    # Merging the class sum of pixels dataframe with the vector geodataframe
    df = vector.merge(df1, left_index=True, right_index=True)
    # Renaming columns
    cdict = pd.read_csv(classDict)
    cdict = cdict.set_index("Value")['Class_name'].to_dict()
    df = df.rename(columns=cdict)
    keptCols = [x for x in df.columns.tolist() if x in idCols + list(cdict.values()) + ['totalpixels', 'year']]
    df = df[keptCols]
    return df

def aggregateData(rasterList, inVector, classDict, idCols, years):
    dfList = []
    # Creating aggregated raster to vector geodataframes for all rasters in rasterList
    for tiff in rasterList:
        inRaster = tiff
        year = [x for x in years if x in tiff][0]
        dfList.append(get_proportions(inRaster, inVector, classDict, idCols, year))
    # Concatenating into a single geodataframe
    allDf = pd.concat(dfList, ignore_index=True)
    classDictDf = pd.read_csv(classDict)
    # Renaming the numerical values of the categories to the string version of the category name
    classCols = classDictDf['Class_name'].unique().tolist()
    # Summing the pixel counts by administrative division, as a single administrative division might cover more than one raster image
    for col in classCols:
        allDf[col] = allDf[col].fillna(0)
        allDf[col] = allDf.groupby(idCols + ['year'])[col].transform(lambda x: x.sum())
    # Removing duplicates from the dataframe
    allDf = allDf.groupby(idCols + ['year']).first().reset_index()
    # Reattaching the geometry to the aggregated raster data
    if '.parquet' in inVector:
        vector = gp.read_parquet(inVector)
    else:
        vector = gp.read_file(inVector)
    allDf = vector.merge(allDf, on=idCols)
    return allDf

To evaluate the extracted features effectively, ML models are used to predict factors representing each feature. One of the models used is a support vector machine (SVM). The SVM model assists in revealing patterns and associations within data that inform risk assessments.
The index represents a quantitative assessment of risk levels, calculated as a weighted average of these factors, to aid in understanding potential spillover events in various regions.

import pandas as pd
import numpy as np
import geopandas as gp

def finalIndicatorCalculation(inputLayer, weightDictionary, outLayer):
    # Creating a dictionary with the weights for each factor in the indicator
    weightsDict = pd.read_csv(weightDictionary).set_index('metric')['weight'].to_dict()
    # Reading in the data from the layer
    layer = gp.read_file(inputLayer)
    # Initializing the sum of the weights
    layer['sumweight'] = 0
    # Calculating the sum of the weighted factors
    for col in weightsDict.keys():
        layer[col] = layer[col].fillna(0)
        layer['sumweight'] = layer['sumweight'] + (layer[col] * weightsDict[col])
    # Calculating the raw zoonotic spillover risk index
    layer['raw_idx'] = np.log(layer['e_pop']) * layer['sumweight']
    # Normalizing the index between 0 and 100
    layer['zs_idx'] = ((layer['raw_idx'] - layer['raw_idx'].min()) / (layer['raw_idx'].max() - layer['raw_idx'].min()) * 100).round(2)
    return layer
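A hypothetical invocation might look like the following; the file names are placeholders, and the input layer is assumed to contain the per-district metric columns plus the e_pop population estimate used above (note that the function, as written, returns the layer rather than writing outLayer to disk).

# Hypothetical usage; file names and column names are placeholders.
risk_layer = finalIndicatorCalculation(
    inputLayer="district_features.geojson",   # aggregated raster, health, and social features per district
    weightDictionary="factor_weights.csv",    # CSV with 'metric' and 'weight' columns
    outLayer="zoonotic_risk_index.geojson",
)
print(risk_layer["zs_idx"].describe())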

The following figure on the left shows the aggregation of the image classification from the test area scene in northern Peru, aggregated to the district administrative level, with the calculated change in forest area between 2018 and 2023. Deforestation is one of the key factors that determine the risk of zoonotic spillover. The figure on the right highlights the zoonotic spillover risk severity levels within the regions covered, ranging from the highest (red) to the lowest (dark green) risk. The area was chosen as one of the training areas for the image classification due to the diversity of land cover captured in the scene, including urban, forest, sand, water, grassland, and agriculture, among others. Additionally, this is one of many areas of interest for potential zoonotic spillover events due to the deforestation and interaction between humans and animals.

By adopting this multi-modal approach, encompassing historical data on disease outbreak, Earth observation data, social determinants, and ML techniques, we can better understand and predict zoonotic spillover risk, ultimately directing disease surveillance and prevention strategies to areas of greatest outbreak risk. The following screenshot shows a dashboard of the output from a zoonotic spillover risk analysis. This risk analysis highlights where resources and surveillance for new potential zoonotic outbreaks can occur so that the next disease can be contained before it becomes an endemic or a new pandemic.

A novel approach to pandemic prevention
Between the fall of 1998 and the spring of 1999, along the Nipah River in Malaysia, 265 people were infected with a then-unknown virus that caused acute encephalitis and severe respiratory distress; 105 of them died, a 39.6% fatality rate. COVID-19’s untreated fatality rate, by contrast, is 6.3%. Since then, the Nipah virus, as it is now dubbed, has transitioned out of its forest habitat and caused over 20 deadly outbreaks, mostly in India and Bangladesh.
Viruses such as Nipah surface each year, posing challenges to our daily lives, particularly in countries where establishing strong, lasting, and robust systems for disease surveillance and detection is more difficult. These detection systems are crucial for reducing the risks associated with such viruses.
Solutions that use ML and geospatial data, such as the Zoonotic Spillover Risk Index, can assist local public health authorities in prioritizing resource allocation to areas of highest risk. By doing so, they can establish targeted and localized surveillance measures to detect and halt regional outbreaks before they extend beyond borders. This approach can significantly limit the impact of a disease outbreak and save lives.
Conclusion
This post demonstrated how HSR.health successfully developed the Zoonotic Spillover Risk Index by integrating geospatial data, health, social determinants, and ML. By using SageMaker, the team created a scalable workflow that can pinpoint the most substantial threats of a potential future pandemic. Effective management of these risks can lead to a reduction in the global disease burden. The substantial economic and social advantages of reducing pandemic risk cannot be overstated, with benefits extending regionally and globally.
HSR.health used SageMaker geospatial capabilities for an initial implementation of the Zoonotic Spillover Risk Index and is now seeking partnerships, as well as support from host countries and funding sources, to develop the index further and extend its application to additional regions around the world. For more information about HSR.health and the Zoonotic Spillover Risk Index, visit www.hsr.health.
Discover the potential of integrating Earth observation data into your healthcare initiatives by exploring SageMaker geospatial features. For more information, refer to Amazon SageMaker geospatial capabilities, or engage with additional examples to get hands-on experience.

About the Authors
Ajay K Gupta is Co-Founder and CEO of HSR.health, a firm that disrupts and innovates health risk analytics through geospatial tech and AI techniques to predict the spread and severity of disease, and provides these insights to industry, governments, and the health sector so they can anticipate, mitigate, and take advantage of future risks. Outside of work, you can find Ajay behind the mic bursting eardrums while belting out his favorite pop music tunes from U2, Sting, George Michael, or Imagine Dragons.
A driven physician and passionate expert in healthcare quality and infectious disease epidemiology, Jean Felipe Teotonio leads the HSR.health public health team. He works towards the shared goal of improving public health by reducing the global burden of disease, leveraging GeoAI approaches to develop solutions for the greatest health challenges of our time. Outside of work, his hobbies include reading sci-fi books, hiking, the English Premier League, and playing bass guitar.
Paul A Churchyard, CTO and Chief Geospatial Engineer for HSR.health, uses his broad technical skills and expertise to build the core infrastructure for the firm as well as its patented and proprietary GeoMD Platform. Additionally, he and the data science team incorporate geospatial analytics and AI/ML techniques into all health risk indices HSR.health produces. Outside of work, Paul is a self-taught DJ and loves snow.
Janosch Woschitz is a Senior Solutions Architect at AWS, specializing in geospatial AI/ML. With over 15 years of experience, he supports customers globally in leveraging AI and ML for innovative solutions that capitalize on geospatial data. His expertise spans machine learning, data engineering, and scalable distributed systems, augmented by a strong background in software engineering and industry expertise in complex domains such as autonomous driving.
Emmett Nelson is an Account Executive at AWS supporting Nonprofit Research customers across the Healthcare & Life Sciences, Earth / Environmental Sciences, and Education verticals. His primary focus is enabling use cases across analytics, AI/ML, high performance computing (HPC), genomics, and medical imaging. Emmett joined AWS in 2020 and is based in Austin, TX.

This AI Paper from CMU and Apple Unveils WRAP: A Game-Changer for Pre-training Language Models with Synthetic Data

Large Language Models (LLMs) have garnered a massive amount of attention and popularity in the Artificial Intelligence (AI) community in recent months. These models have demonstrated great capabilities in tasks including text summarization, question answering, code completion, and content generation.

LLMs are frequently trained on web-scraped data that is noisy, unstructured, and not necessarily expressed clearly. This poses a challenge for the existing scaling principles, which indicate that as the size of the model increases, computational power and data quantity should also increase proportionately.

There are two main limitations. Firstly, there is the significant computational cost and time involved in pre-training. Secondly, there is the looming scarcity of high-quality data available on the Internet. In recent research, a team of researchers from Apple and Carnegie Mellon University has addressed these issues by introducing the idea of Web Rephrase Augmented Pre-training (WRAP).

WRAP is an innovative method that makes use of an existing instruction-tuned LLM to paraphrase web pages into particular styles, such as mimicking the tone of Wikipedia or converting text into a question-answer format. The main goal of WRAP is to improve LLM pre-training by adding both genuine and synthetically rephrased data.

The primary features of WRAP are as follows:

Pre-training Efficiency: Applying WRAP to the noisy C4 dataset considerably speeds up pre-training, by around a factor of three. This effectiveness is critical in reducing the high expense and time commitment usually associated with LLM training.

Enhancement of Model Performance: Within the same computational budget, WRAP makes the model perform better. Across different subsets of the Pile, a large-scale dataset used for training and assessing LLMs, it reduces perplexity by more than 10% and improves zero-shot question-answering accuracy by over 2% across 13 different tasks.

Rephrasing Web Documents: WRAP uses a medium-sized LLM to paraphrase documents from the web into several styles. This method is different from creating new data because it improves already-existing content while preserving the original information’s quality and diversity.
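The rephrasing step can be sketched with an off-the-shelf instruction-tuned model through the Hugging Face transformers pipeline; the model choice, prompt wording, and generation settings below are illustrative assumptions rather than the paper’s exact setup.

from transformers import pipeline

# Illustrative only: the model and prompt are assumptions, not the paper's exact setup.
rephraser = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.2")

def rephrase(document: str, style: str = "the style of a Wikipedia article") -> str:
    prompt = (
        f"Rewrite the following web text in {style}, preserving all factual information:\n\n"
        f"{document}\n\nRewritten text:"
    )
    output = rephraser(prompt, max_new_tokens=512, do_sample=False)[0]["generated_text"]
    return output[len(prompt):].strip()  # the pipeline returns the prompt plus the continuation

# Pre-training then mixes the original documents with their rephrased counterparts.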

There are two main benefits to the synthetic data produced by WRAP. Firstly, it includes a range of styles that reflects the diversity of language used in downstream applications, which better prepares the LLM for a wider variety of real-world scenarios. Secondly, the rephrased synthetic data is of higher quality than the raw web-scraped data; the language is more ordered and cohesive, which promotes more efficient model learning.

In conclusion, WRAP is a significant advancement in the field of LLM pre-training. Through the use of high-quality, stylistically diverse synthetic data, WRAP not only expedites the training process but also improves the overall performance of LLMs. Given the abundance of low-quality web data and the resource-intensive nature of classic LLM training approaches, this approach presents a promising way forward.

Meet RAGatouille: A Machine Learning Library to Train and Use SOTA Retrieval Model, ColBERT, in Just a Few Lines of Code

Creating effective pipelines, especially using RAG (Retrieval-Augmented Generation), can be quite challenging in information retrieval. These pipelines involve various components, and choosing the right models for retrieval is crucial. While dense embeddings like OpenAI’s text-embedding-ada-002 serve as a good starting point, recent research suggests that they might not always be the optimal choice for every scenario.

The Information Retrieval field has seen significant advancements, with models like ColBERT proving to generalize better to diverse domains and exhibit high data efficiency. However, these cutting-edge approaches often remain underutilized due to their complexity and the lack of user-friendly implementations. This is where RAGatouille steps in, aiming to simplify the integration of state-of-the-art retrieval methods, specifically focusing on making ColBERT more accessible.

Existing solutions often fail to provide a seamless bridge between complex research findings and practical implementation. RAGatouille addresses this gap by offering an easy-to-use framework that allows users to incorporate advanced retrieval methods effortlessly. Currently, RAGatouille primarily focuses on simplifying the usage of ColBERT, a model known for its effectiveness in various scenarios, including low-resource languages.

RAGatouille emphasizes two key aspects: providing strong default settings requiring minimal user intervention and offering modular components that users can customize. The library streamlines the training and fine-tuning process of ColBERT models, making it accessible even for users who may not have the resources or expertise to train their models from scratch.

RAGatouille also showcases its capabilities through its TrainingDataProcessor, which automatically converts retrieval training data into training triplets. This process involves handling input pairs, labeled pairs, and various forms of triplets, removing duplicates, and generating hard negatives for more effective training. The library’s focus on simplicity is evident in its default settings, but users can easily tweak parameters to suit their specific requirements.
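The pair-to-triplet idea can be illustrated generically as below; this sketch does not reproduce RAGatouille’s actual API, and the lexical-overlap scoring is a simple stand-in for the real retriever used to mine hard negatives.

# Generic illustration of converting (query, positive) pairs into training triplets
# with mined hard negatives. Not RAGatouille's API; the overlap-based scoring is a
# stand-in for a real retriever such as BM25.
def mine_triplets(pairs, corpus, negatives_per_query=1):
    triplets = []
    for query, positive in pairs:
        q_terms = set(query.lower().split())
        # Rank candidate negatives by lexical overlap so they are "hard" (similar but wrong).
        candidates = [p for p in corpus if p != positive]
        candidates.sort(key=lambda p: len(q_terms & set(p.lower().split())), reverse=True)
        for negative in candidates[:negatives_per_query]:
            triplets.append((query, positive, negative))
    return triplets

corpus = [
    "ColBERT performs late interaction between query and document token embeddings.",
    "Dense retrieval models score documents with a single embedding vector.",
    "The Eiffel Tower is located in Paris.",
]
pairs = [("how does colbert score documents", corpus[0])]
print(mine_triplets(pairs, corpus))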

In conclusion, RAGatouille emerges as a solution to the complexities of incorporating state-of-the-art retrieval methods into RAG pipelines. By focusing on user-friendly implementations and simplifying the usage of models like ColBERT, it opens up possibilities for a wider audience. Its TrainingDataProcessor demonstrates its effectiveness in handling diverse training data and generating meaningful triplets for training. RAGatouille aims to make advanced retrieval methods more accessible, bridging the gap between research findings and practical applications in the information retrieval world.

Alibaba Researchers Introduce Mobile-Agent: An Autonomous Multi-Modal Mobile Device Agent

Mobile device agents utilizing Multimodal Large Language Models (MLLM) have gained popularity due to the rapid advancements in MLLMs, showcasing notable visual comprehension capabilities. This progress has made MLLM-based agents viable for diverse applications. The emergence of mobile device agents represents a novel application, requiring these agents to operate devices based on screen content and user instructions.

Existing work highlights the capabilities of Large Language Model (LLM)-based agents in task planning. However, challenges persist, particularly in the mobile device agent domain. While MLLMs such as GPT-4V show promise, they lack sufficient visual perception for effective mobile device operations. Previous attempts relied on interface layout files for localization but faced limitations in file accessibility, hindering their effectiveness.

Beijing Jiaotong University and Alibaba Group researchers have introduced Mobile-Agent, an autonomous multi-modal mobile device agent. Their approach utilizes visual perception tools to accurately identify and locate visual and textual elements within an app’s front-end interface. Leveraging the perceived visual context, Mobile-Agent autonomously plans and decomposes complex operation tasks, navigating through mobile apps step by step. Mobile-Agent differs from previous solutions by eliminating reliance on XML files or mobile system metadata, offering enhanced adaptability across diverse mobile operating environments through a vision-centric approach.

Mobile-Agent employs OCR tools for text localization and CLIP for icon localization. The framework defines eight operations, enabling the agent to perform tasks such as opening apps, clicking text or icons, typing, and navigating. The agent completes each operation iteratively: before the iterations begin, the user provides an instruction; during execution, the agent may encounter errors that prevent it from completing the instruction, so a self-reflection method is used to improve the success rate.
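The iterative perceive-plan-act-reflect cycle can be sketched as follows; every component here is a toy stand-in (scripted perception, a dummy planner), not Mobile-Agent’s actual implementation.

from dataclasses import dataclass
import random

# Schematic, self-contained sketch of the cycle described above. Each helper is a
# toy stand-in, not Mobile-Agent's real OCR, CLIP localization, or MLLM planner.
@dataclass
class Action:
    name: str
    target: str = ""

def perceive_screen(step):                     # stand-in for screenshot + OCR + CLIP icon localization
    return {"texts": ["Search", "Settings"], "icons": ["camera", "keyboard"], "step": step}

def plan_next_action(instruction, observation, history):   # stand-in for the MLLM planner
    if observation["step"] >= 3:
        return Action("stop")
    return Action("click_text", random.choice(observation["texts"]))

def execute(action):                           # stand-in for one of the eight device operations
    return random.random() > 0.2               # pretend the tap occasionally fails

def run_agent(instruction, max_steps=10):
    history = []
    for step in range(max_steps):
        observation = perceive_screen(step)
        action = plan_next_action(instruction, observation, history)
        if action.name == "stop":              # the agent decides the instruction is complete
            break
        ok = execute(action)
        history.append((action, "ok" if ok else "self-reflect and retry"))
    return history

print(run_agent("open the camera and take a photo"))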

The researchers presented Mobile-Eval, a benchmark of 10 popular mobile apps with three instructions each, to evaluate Mobile-Agent comprehensively. The framework achieved completion rates of 91%, 82%, and 82% across the three instruction types, with a high Process Score of around 80%, and its relative efficiency reached 80% of the capability of human-operated steps. The results highlight the effectiveness of Mobile-Agent, showcasing its self-reflective capabilities in correcting errors during the execution of instructions and contributing to its robust performance as a mobile device assistant.

To sum up, Beijing Jiaotong University and Alibaba Group researchers have introduced Mobile-Agent, an autonomous multimodal agent proficient in operating diverse mobile applications through a unified visual perception framework. By precisely identifying and locating visual and textual elements within app interfaces, Mobile-Agent autonomously plans and executes tasks. Its vision-centric approach enhances adaptability across mobile operating environments, eliminating the need for system-specific customizations. The study demonstrates Mobile-Agent’s effectiveness and efficiency through experiments, highlighting its potential as a versatile and adaptable solution for language-agnostic interaction with mobile applications.

This AI Paper from China Introduces SegMamba: A Novel 3D Medical Image Segmentation Mamba Model Designed to Effectively Capture Long-Range Dependencies within Whole Volume Features at Every Scale

Enhancing the receptive field of models is crucial for effective 3D medical image segmentation. Traditional convolutional neural networks (CNNs) often struggle to capture global information from high-resolution 3D medical images. One proposed solution is the utilization of depth-wise convolution with larger kernel sizes to capture a wider range of features. However, CNN-based approaches still struggle to capture relationships between distant pixels.

Recently, transformer architectures that leverage self-attention mechanisms to extract global information have been explored extensively for 3D medical image segmentation. Examples include TransBTS, which combines a 3D CNN with transformers to capture both local spatial features and global dependencies in high-level features, and UNETR, which adopts the Vision Transformer (ViT) as its encoder to learn contextual information. However, transformer-based methods often face computational challenges due to the high resolution of 3D medical images, leading to reduced speed.

To address the issues of long sequence modeling, researchers have previously introduced Mamba, a state space model (SSM), to model long-range dependencies efficiently through a selection mechanism and a hardware-aware algorithm. Various studies have applied Mamba in computer vision (CV) tasks. For instance, U-Mamba integrates the Mamba layer to improve medical image segmentation. 

At the same time, Vision Mamba proposes the Vim block, incorporating bidirectional SSM for global visual context modeling and position embeddings for location-aware understanding. VMamba also introduces a CSM module to bridge the gap between 1-D array scanning and 2-D plain traversing. However, traditional transformer blocks face challenges in handling large-size features, necessitating the modeling of correlations within high-dimensional features for enhanced visual understanding.

Motivated by this, researchers at the Beijing Academy of Artificial Intelligence introduced SegMamba, a novel architecture combining the U-shape structure with Mamba to model whole-volume global features at various scales. They utilize Mamba specifically for 3D medical image segmentation. SegMamba demonstrates remarkable capabilities in modeling long-range dependencies within volumetric data while maintaining outstanding inference efficiency compared to traditional CNN-based and transformer-based methods.

The researchers conducted extensive experiments on the BraTS2023 dataset to affirm SegMamba’s effectiveness and efficiency in 3D medical image segmentation tasks. Unlike transformer-based methods, SegMamba leverages the principles of state space modeling to excel at modeling whole-volume features while maintaining superior processing speed. Even with volume features at a resolution of 64 × 64 × 64 (equivalent to a sequence length of about 260k), SegMamba showcases remarkable efficiency.

A Meme’s Glimpse into the Pinnacle of Artificial Intelligence (AI) P …

In the dynamic field of Artificial Intelligence (AI), the trajectory from one foundational model to another has represented an amazing paradigm shift. The escalating series of models, including Mamba, Mamba MOE, and MambaByte, along with the latest approaches like Cascade, Layer-Selective Rank Reduction (LASER), and Additive Quantization for Language Models (AQLM), has revealed new levels of cognitive power. The famous ‘Big Brain’ meme has succinctly captured this progression, humorously illustrating the rise from ordinary competence to extraordinary brilliance as one delves into the intricacies of each language model.

Mamba

Mamba is a linear-time sequence model that stands out for its rapid inference capabilities. Foundation models are predominantly built on the Transformer architecture due to its effective attention mechanism, but Transformers encounter efficiency issues when dealing with long sequences. In contrast to conventional attention-based Transformer architectures, Mamba introduces structured State Space Models (SSMs) to address processing inefficiencies on extended sequences.

Mamba’s unique feature is its capacity for content-based reasoning, enabling it to propagate or ignore information based on the current token. Mamba demonstrated rapid inference, linear scaling in sequence length, and strong performance in modalities such as language, audio, and genomics. It is distinguished by its linear scalability when managing lengthy sequences and by quick inference that allows it to achieve up to five times the throughput of conventional Transformers.

Mamba MOE

MoE-Mamba builds on the foundation of Mamba and is the subsequent version that harnesses the power of Mixture of Experts (MoE). By integrating SSMs with MoE, this model surpasses the capabilities of its predecessor, exhibiting greater performance and efficiency. In addition to improving training efficiency, the integration of MoE retains Mamba’s inference performance gains over conventional Transformer models.

Mamba MOE serves as a link between traditional models and the field of big-brained language processing. One of its main achievements is the effectiveness of MoE-Mamba’s training. While requiring 2.2 times fewer training steps than Mamba, it achieves the same level of performance.

MambaByte MOE

Token-free language models represent a significant shift in Natural Language Processing (NLP), as they learn directly from raw bytes and bypass the biases inherent in subword tokenization. However, this strategy has a drawback: byte-level processing results in substantially longer sequences than token-level modeling. This length increase challenges ordinary autoregressive Transformers, whose complexity grows quadratically with sequence length, making it difficult to scale to such long inputs.

MambaByte is a solution to this problem: it is a modified version of the Mamba state space model designed to operate autoregressively on byte sequences. It removes subword tokenization biases by operating directly on raw bytes, marking a step toward token-free language modeling. Comparative tests revealed that MambaByte outperformed other models built for comparable tasks in computational efficiency while handling byte-level data.

Self-reward fine-tuning

The concept of self-rewarding language models has been introduced with the goal of having the language model itself produce its own training rewards. To do this, the model assesses and rewards its own outputs using a technique known as LLM-as-a-Judge prompting. This strategy represents a substantial shift from depending on external reward structures, and it can result in more flexible and dynamic learning processes.

With self-reward fine-tuning, the model takes charge of its own fate in the search for superhuman agents. After undergoing iterative DPO (Direct Preference Optimization) training, the model becomes more adept both at following instructions and at assigning itself high-quality rewards. MambaByte MOE with self-reward fine-tuning represents a step toward models that continuously improve in both directions: modeling rewards and obeying commands.

CASCADE

A unique technique called Cascade Speculative Drafting (CS Drafting) has been introduced to improve the effectiveness of Large Language Model (LLM) inference by tackling the difficulties associated with speculative decoding. Speculative decoding provides preliminary outputs with a smaller, faster draft model, which is evaluated and improved upon by a bigger, more precise target model. 

Though this approach aims to lower latency, it comes with certain inefficiencies.

First, speculative decoding still relies on slow, autoregressive generation in the draft model, which produces tokens sequentially and frequently causes delays. Second, it allocates the same amount of drafting time to every token, regardless of how much each token affects the overall quality of the output.

CS Drafting introduces both vertical and horizontal cascades to address these inefficiencies in speculative decoding. While the horizontal cascade optimizes the allocation of drafting time, the vertical cascade removes the need for autoregressive generation in drafting. Compared to standard speculative decoding, this new method can speed up processing by up to 72% while keeping the same output distribution.

LASER (LAyer-SElective Rank Reduction)

A counterintuitive approach referred to as LAyer-SElective Rank Reduction (LASER) has been introduced to improve LLM performance. It works by selectively removing higher-order components from the model’s weight matrices, in effect replacing selected matrices with low-rank approximations that keep only their dominant components.

LASER is a post-training intervention that requires no additional data or parameters. The major finding is that LLM performance can often be greatly increased by selectively reducing components of the weight matrices, in contrast to the typical trend of scaling models up. The generalizability of the strategy has been demonstrated through extensive tests across multiple language models and datasets.
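To make the rank-reduction idea concrete, here is a minimal sketch using a plain SVD truncation of a single weight matrix; the NumPy implementation, the toy matrix shape, and the fraction of components kept are illustrative assumptions rather than the authors’ exact procedure, which selects specific layers and matrices to reduce.

import numpy as np

def rank_reduce(weight, keep_fraction=0.05):
    # Decompose the weight matrix and keep only its top singular components,
    # which is equivalent to replacing it with a low-rank approximation.
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    k = max(1, int(len(s) * keep_fraction))
    return (u[:, :k] * s[:k]) @ vt[:k, :]

# Toy example: reduce one hypothetical layer's weight matrix.
rng = np.random.default_rng(0)
w = rng.normal(size=(512, 2048))
w_reduced = rank_reduce(w)
print(np.linalg.matrix_rank(w_reduced))  # far lower than the original rank of 512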

AQLM (Additive Quantization for Language Models)

AQLM pushes Multi-Codebook Quantization (MCQ) techniques into the regime of extreme LLM compression. The method, which builds upon additive quantization, achieves higher accuracy at very low bit counts per parameter than any other recent method. Additive quantization is a sophisticated technique that combines several low-dimensional codebooks, representing groups of model parameters as sums of codewords so that they can be stored far more compactly.

On benchmarks such as WikiText2, AQLM delivers unprecedented compression while keeping perplexity low. The strategy greatly outperformed earlier methods when applied to LLaMA 2 models of different sizes, with lower perplexity scores indicating higher performance.
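As a rough illustration of the multi-codebook idea, the sketch below approximates each group of weights as a sum of one codeword per codebook. Real AQLM learns its codebooks from calibration data and uses beam-search encoding; the random codebooks, the group length, and the greedy residual-style selection here are simplifying assumptions, not the paper’s algorithm.

import numpy as np

rng = np.random.default_rng(0)
group_len, num_codebooks, codebook_size = 8, 2, 256

weights = rng.normal(size=(1024, group_len))                            # toy weight groups
codebooks = rng.normal(size=(num_codebooks, codebook_size, group_len))  # random here; AQLM learns them

def encode(group):
    # Greedily pick one codeword per codebook to approximate the group.
    residual, codes = group.copy(), []
    for cb in codebooks:
        idx = int(np.argmin(((residual - cb) ** 2).sum(axis=1)))
        codes.append(idx)
        residual -= cb[idx]
    return codes

def decode(codes):
    # The reconstruction is the sum of the selected codewords.
    return sum(cb[i] for cb, i in zip(codebooks, codes))

codes = np.array([encode(g) for g in weights])   # small integer indices instead of floats
recon = np.array([decode(c) for c in codes])
print("mean squared reconstruction error:", ((weights - recon) ** 2).mean())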

DRUGS (Deep Random micro-Glitch Sampling)

DRµGS redefines sampling by introducing unpredictability into the model’s reasoning itself, which fosters originality. Instead of adding randomness after generation, it injects it into the thought process, permitting a variety of plausible continuations and providing adaptability in reaching different outcomes. It sets new benchmarks for effectiveness, originality, and compression.

Conclusion

To sum up, the progression of language modeling from Mamba to this latest set of remarkable models is evidence of an unwavering quest for improvement. Each model in the progression provides a distinct set of advances that push the field forward. The meme’s depiction of a growing brain is not just symbolic; it also captures the real increase in creativity, efficiency, and intellect inherent in each new model and approach.

This article was inspired by this Reddit post. All credit for this research goes to the researchers of these projects. Also, don’t forget to follow us on Twitter and Google News. Join our 36k+ ML SubReddit, 41k+ Facebook Community, Discord Channel, and LinkedIn Group.


The post A Meme’s Glimpse into the Pinnacle of Artificial Intelligence (AI) Progress in a Mamba Series: LLM Enlightenment appeared first on MarkTechPost.

Meet DiffMoog: A Differentiable Modular Synthesizer with a Comprehensi …

Synthesizers, electronic instruments that produce diverse sounds, are integral to many music genres. Traditional sound design involves intricate parameter adjustments and demands expertise. Neural networks can help by replicating input sounds, initially by optimizing synthesizer parameters. Recent advances focus on optimizing the sound directly for high-fidelity reproduction, which requires unsupervised learning for out-of-domain sounds. Differentiable synthesizers enable the automatic differentiation crucial for backpropagation, but existing models are either overly simplistic or lack modularity and essential sound modules. Practical applications require bridging this gap.

Researchers from Tel-Aviv University and The Open University, Israel, have unveiled DiffMoog, a differentiable modular synthesizer for AI-guided sound synthesis. DiffMoog integrates into neural networks, allowing automated sound matching by replicating audio inputs. Its modular architecture includes modules essential to commercial instruments, facilitating custom signal-chain creation. The open-source platform combines DiffMoog with an end-to-end system, introducing a unique signal-chain loss for optimization. Key contributions encompass an accessible gateway for AI sound synthesis research, a novel loss function, optimization insights, and a demonstration of the Wasserstein loss’s efficacy in frequency estimation. Challenges in frequency estimation persist, and the way DiffMoog departs from previous approaches underscores its innovation.

Works in sound matching have utilized supervised datasets of sound samples and their parameters derived from non-differentiable synthesizers, training neural networks to predict sound parameters. Differentiable digital signal processing (DDSP) integrates signal processing modules as differential operations into neural networks, allowing backpropagation. It uses additive synthesis based on the Fourier theorem to construct complex sounds. Differentiable methods have been employed in audio effects applications, including a differentiable mixing console for automatic multitrack mixing and automating DJ transitions with differentiable audio effects. Other works have explored the power of generative adversarial networks (GANs) and diffusion models in sound synthesis. DiffMoog is the first and most comprehensive modular differentiable synthesizer, integrating both FM and subtractive synthesis techniques.

DiffMoog is a differentiable modular synthesizer that integrates a comprehensive set of modules typically found in commercial instruments, including modulation capabilities, low-frequency oscillators, filters, and envelope shapers. The synthesizer is designed to be differentiable, allowing it to be integrated into neural networks for automated sound matching. The study mentions an open-source platform that combines DiffMoog with an end-to-end sound-matching framework, utilizing a signal-chain loss and an encoder network. The researchers also report on their experiments with different synthesizer chains, loss configurations, and neural architectures, exploring the challenges and findings in sound matching using differentiable synthesis. 

Beyond the architecture itself, the study provides insights and lessons learned toward sound matching using differentiable synthesis. With its comprehensive set of modules and differentiable design, DiffMoog stands as a premier asset for expediting research in audio synthesis and machine learning. The study also reports on the challenges faced in optimizing DiffMoog and demonstrates the strength of the Wasserstein loss in frequency estimation.

In conclusion, the research suggests that differentiable synthesizers offer real potential for sound matching when optimized with a spectral loss. However, accurately replicating common sounds remains a significant challenge. Using the Wasserstein distance may address gradient issues in frequency estimation via spectral loss. The platform presented in this study is expected to stimulate additional research in this intriguing field. The researchers recommend investigating improved audio loss functions, optimization techniques, and alternative neural network structures to overcome the existing challenges and enhance precision in emulating typical sounds.

Check out the Paper and Github. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and Google News. Join our 36k+ ML SubReddit, 41k+ Facebook Community, Discord Channel, and LinkedIn Group.


The post Meet DiffMoog: A Differentiable Modular Synthesizer with a Comprehensive Set of Modules Typically Found in Commercial Instruments appeared first on MarkTechPost.

Meet DrugAssist: An Interactive Molecule Optimization Model that can I …

With the rise of Large Language Models (LLMs) in recent years, generative AI has made significant strides in language processing, showcasing impressive abilities across a wide array of tasks. Given their potential for solving complex tasks, researchers have made numerous attempts to apply these models to drug discovery. However, molecule optimization is one critical aspect of drug discovery that LLMs have so far failed to impact significantly.

The existing methods generally focus on the patterns in the chemical structure provided by the data instead of leveraging the expert’s feedback and experience. This poses a problem as the drug discovery pipeline involves incorporating feedback from domain experts to refine the process further. In this work, the authors have tried to address the gaps in previous works by focusing on human-machine interaction and leveraging the interactivity and generalizability of powerful LLMs.

Researchers from Tencent AI Lab and the Department of Computer Science at Hunan University released MolOpt-Instructions, a large instruction-based dataset for fine-tuning LLMs on molecule optimization tasks. The dataset covers a broad range of molecule optimization tasks while enforcing similarity constraints and substantial differences in properties between molecule pairs. Additionally, the researchers proposed DrugAssist, a Llama-2-7B-Chat-based molecule optimization model capable of performing optimization interactively through human-machine dialogue. Through these dialogues, experts can further guide the model and refine the initially generated results.

For evaluation, the researchers compared DrugAssist with two previous molecule optimization models and three general-purpose LLMs, looking at property metrics such as solubility and BP as well as success rate and validity. As per the results, DrugAssist consistently achieved promising results in multi-property optimization and maintained optimized molecular property values within a given range.

Furthermore, the researchers demonstrated DrugAssist’s capabilities through a case study. Under the zero-shot setting, the model was asked to increase the values of two properties, BP and QED, by at least 0.1 simultaneously, and it achieved the task successfully even though this combination had not been seen during training.

Additionally, DrugAssist also successfully increased the logP value of a given molecule by 0.1, even though this property was not included in the training data. This showcases the good transferability of the model under zero-shot and few-shot settings, giving the users an option to combine individual properties and optimize them simultaneously. Lastly, in one of the interactions, the model generated a wrong answer by providing a molecule that did not meet the requirements. However, it corrected its mistake and provided a correct response based on human feedback.

In conclusion, DrugAssist is a molecule optimization model based on Llama-2-7B-Chat that can interact with humans in real time. It demonstrated exceptional results in both single- and multi-property optimization and showed strong transferability and iterative optimization capabilities. Lastly, the authors aim to further improve the model’s capabilities through multimodal data handling, which should further enhance and streamline the drug discovery process.

Check out the Paper and Github. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and Google News. Join our 36k+ ML SubReddit, 41k+ Facebook Community, Discord Channel, and LinkedIn Group.


The post Meet DrugAssist: An Interactive Molecule Optimization Model that can Interact with Humans in Real-Time Using Natural Language appeared first on MarkTechPost.

Seeking Faster, More Efficient AI? Meet FP6-LLM: the Breakthrough in G …

In computational linguistics and artificial intelligence, researchers continually strive to optimize the performance of large language models (LLMs). These models, renowned for their capacity to process a vast array of language-related tasks, face significant challenges due to their expansive size. For instance, models like GPT-3, with 175 billion parameters, require substantial GPU memory, highlighting a need for more memory-efficient and high-performance computational methods.

One of the primary challenges in deploying large language models is their enormous size, which necessitates significant GPU memory and computational resources. The memory wall issues further compound this challenge during token generation, where the speed of model inference is primarily limited by the time taken to read model weights from GPU DRAM. Consequently, there is a pressing need for efficient methods to reduce the memory and computational load without compromising the models’ performance.

Current approaches to handling large language models often involve quantization techniques that use fewer bits to represent each model weight, resulting in a more compact representation. However, these techniques have limitations. For example, while reducing the model size, 4-bit and 8-bit quantizations do not efficiently support the execution of linear layers on modern GPUs, compromising either model quality or inference speed.

A team of researchers from Microsoft, the University of Sydney, and Rutgers University introduced a system design, TC-FPx, the first full-stack GPU kernel design scheme with unified Tensor Core support for various quantization bit-widths, including 6-bit, 5-bit, and 3-bit. This design addresses the challenges of unfriendly memory access and high runtime overhead associated with weight de-quantization in large language models. By integrating TC-FPx into existing inference systems, they developed a new end-to-end support system, FP6-LLM, for quantized LLM inference.

TC-FPx employs ahead-of-time bit-level pre-packing and SIMT-efficient GPU runtime to optimize memory access and minimize the runtime overhead of weight de-quantization. This approach significantly enhances the performance of large language models by enabling more efficient inference with reduced memory requirements. The researchers demonstrated that FP6-LLM allows the inference of models like LLaMA-70b using only a single GPU, achieving substantially higher normalized inference throughput than the FP16 baseline.

The performance of FP6-LLM has been rigorously evaluated, showcasing its significant improvements in normalized inference throughput compared to the FP16 baseline. In particular, FP6-LLM enabled the inference of models like LLaMA-70b using only a single GPU while achieving 1.69-2.65 times higher throughput. This breakthrough demonstrates FP6-LLM’s potential to offer a more efficient and cost-effective solution for deploying large language models. The system’s ability to handle the inference of complex models with a single GPU represents a considerable advancement in the field, opening new possibilities for applying large language models in various domains.

In conclusion, the research introduces a groundbreaking approach to deploying large language models through the development of FP6-LLM. Utilizing the TC-FPx kernel design, this system addresses the significant challenges posed by these models’ size and computational demands. By enabling more efficient GPU memory usage and higher inference throughput, FP6-LLM represents a vital step towards the practical and scalable deployment of large language models, paving the way for their broader application and utility in the field of artificial intelligence.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and Google News. Join our 36k+ ML SubReddit, 41k+ Facebook Community, Discord Channel, and LinkedIn Group.


The post Seeking Faster, More Efficient AI? Meet FP6-LLM: the Breakthrough in GPU-Based Quantization for Large Language Models appeared first on MarkTechPost.

Seeking Speed without Loss in Large Language Models? Meet EAGLE: A Mac …

For LLMs, auto-regressive decoding is now considered the gold standard. Because LLMs generate output tokens one at a time, the procedure is time-consuming and expensive. Methods based on speculative sampling offer an answer to this problem. In the first, or “draft,” phase, candidate tokens are generated cheaply by a draft model; in the second, or “verification,” phase, all of the proposed tokens are checked in parallel using a single forward pass of the LLM. Speed improves greatly because speculative sampling can produce several verified tokens for every LLM forward pass.

Speculative sampling depends on finding a draft model whose predictions closely match those of the original LLM but whose latency is much lower. In most cases, a smaller LLM trained on the same data as the original serves as the draft model.

Speeding up speculative sampling requires lowering the time overhead of drafting and increasing the rate at which the original LLM accepts the drafted tokens. However, the drafts produced by these smaller models are less accurate, which limits the achievable speedup.
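The draft-then-verify loop can be sketched as follows for the greedy case. The toy draft_next and target_next functions stand in for a small draft model and the full LLM, and real speculative sampling verifies full probability distributions (and checks all draft positions in a single forward pass) rather than greedy tokens one at a time.

def speculative_step(prefix, draft_next, target_next, k=4):
    # Draft phase: the small model proposes k tokens autoregressively.
    draft = []
    for _ in range(k):
        draft.append(draft_next(prefix + draft))

    # Verify phase: accept draft tokens while the large model's own greedy
    # choice agrees; on the first disagreement, take the large model's token.
    accepted = []
    for tok in draft:
        target_tok = target_next(prefix + accepted)
        if target_tok == tok:
            accepted.append(tok)
        else:
            accepted.append(target_tok)   # correction from the target model
            break
    else:
        accepted.append(target_next(prefix + accepted))  # bonus token on full acceptance
    return prefix + accepted

def target_next(seq):
    # Toy "large" model: always continues the count.
    return seq[-1] + 1 if seq else 0

def draft_next(seq):
    # Toy "small" model: usually agrees, but is wrong every fourth position.
    nxt = seq[-1] + 1 if seq else 0
    return nxt if len(seq) % 4 else nxt + 1

print(speculative_step([0, 1, 2], draft_next, target_next))  # [0, 1, 2, 3, 4]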

Researchers from Peking University, Microsoft Research, the University of Waterloo, and the Vector Institute present EAGLE (Extrapolation Algorithm for Greater Language-model Efficiency). It is a straightforward framework that departs from direct token prediction and performs auto-regression at the feature level, based on the observation that feature-level auto-regression is easier to handle than token-level auto-regression. EAGLE resolves the inherent uncertainty in feature-level auto-regression by also feeding in the token sequence advanced by one time step.

Theoretically, in both greedy and non-greedy settings, EAGLE is guaranteed to preserve the output distribution and does not involve fine-tuning the original LLM. This matters because acceleration methods that alter the output distribution could make LLM outputs incorrect or even harmful; EAGLE’s guarantee prevents any such degradation. Lookahead and Medusa, on the other hand, are concerned solely with the greedy setting. Compared to Medusa’s draft accuracy of about 0.6, EAGLE reaches roughly 0.8, achieved with a draft model that includes only a single transformer decoder layer.

The study also offers insight into the factors contributing to EAGLE’s effectiveness and introduces its simple yet efficient structure. These factors may be of independent relevance to other speculative sampling approaches. EAGLE rests on two findings:

Top-layer features are more effective than bottom-layer token embeddings with the same lightweight network.

Draft models that take only top-layer features as input are severely limited in performance because of the inherent uncertainty in the sampling process.

That is why it is critical to also feed the token representing the sampling outcome into the draft model, as the sketch below illustrates.
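A minimal sketch of that second point, fusing the previous top-layer feature with the embedding of the already-sampled token before a single decoder-style block, might look like the following; the hidden sizes, the fusion by concatenation, and the use of a causally masked encoder layer as the lone block are assumptions for illustration, not EAGLE’s exact draft architecture.

import torch
import torch.nn as nn

class DraftHead(nn.Module):
    def __init__(self, hidden=256, heads=4):
        super().__init__()
        # Fuse the top-layer feature f_t with the embedding of the sampled token t+1.
        self.fuse = nn.Linear(2 * hidden, hidden)
        # One causally masked block stands in for the single decoder layer.
        self.block = nn.TransformerEncoderLayer(hidden, heads, batch_first=True)

    def forward(self, features, next_token_emb):
        # features, next_token_emb: (batch, seq, hidden)
        x = self.fuse(torch.cat([features, next_token_emb], dim=-1))
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        return self.block(x, src_mask=mask)   # predicted next top-layer features

head = DraftHead()
f = torch.randn(1, 8, 256)   # top-layer features from the frozen LLM
e = torch.randn(1, 8, 256)   # embeddings of the sampled next tokens
print(head(f, e).shape)      # torch.Size([1, 8, 256])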

The team tested EAGLE on MT-bench, a realistic benchmark mimicking real-world scenarios and applications that includes multi-turn instructions similar to ChatGPT dialogues. Because state-of-the-art methods such as Lookahead and Medusa report their speedup ratios on the same benchmark, using it makes comparing the proposed method to these baselines impartial and straightforward. With a greedy decoding configuration, EAGLE provides a 3x acceleration for Vicuna-13B and for LLaMA2-Chat 13B and 70B, is theoretically guaranteed to preserve the original LLM’s text distribution, and is immediately usable. EAGLE outpaces the recently proposed speculative sampling-based frameworks Lookahead and Medusa by roughly 2x and 1.6x, respectively. With EAGLE, performance improves and the throughput of LLM systems roughly doubles.

EAGLE runs in tandem with other acceleration or throughput-enhancing techniques such as quantization and compilation, so combining EAGLE with these approaches could further reduce the operational costs of LLM systems. Using gpt-fast, EAGLE can increase the throughput of LLaMA2-Chat 7B decoding on a single RTX 3090 GPU from 24.5 to 160.4 tokens/s. EAGLE is also cheap to train: to train a decoder layer with fewer than 1 billion parameters for the LLaMA2-Chat 70B model, it uses the ShareGPT dataset with no more than 70k dialogues, and training takes about a day or two on four A100 (40G) GPUs. A single training run lets EAGLE accelerate every subsequent query, so its amortized training cost approaches zero as the number of queries grows.

Check out the Paper and GitHub. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and Google News. Join our 36k+ ML SubReddit, 41k+ Facebook Community, Discord Channel, and LinkedIn Group.


The post Seeking Speed without Loss in Large Language Models? Meet EAGLE: A Machine Learning Framework Setting New Standards for Lossless Acceleration appeared first on MarkTechPost.

Monitor embedding drift for LLMs deployed from Amazon SageMaker JumpSt …

One of the most useful application patterns for generative AI workloads is Retrieval Augmented Generation (RAG). In the RAG pattern, we find pieces of reference content related to an input prompt by performing similarity searches on embeddings. Embeddings capture the information content in bodies of text, allowing natural language processing (NLP) models to work with language in a numeric form. Embeddings are just vectors of floating point numbers, so we can analyze them to help answer three important questions: Is our reference data changing over time? Are the questions users are asking changing over time? And finally, how well is our reference data covering the questions being asked?
In this post, you’ll learn about some of the considerations for embedding vector analysis and detecting signals of embedding drift. Because embeddings are an important source of data for NLP models in general and generative AI solutions in particular, we need a way to measure whether our embeddings are changing over time (drifting). You’ll then see an example of performing drift detection on embedding vectors using a clustering technique with large language models (LLMs) deployed from Amazon SageMaker JumpStart. You’ll also be able to explore these concepts through two provided examples, including an end-to-end sample application or, optionally, a subset of the application.
Overview of RAG
The RAG pattern lets you retrieve knowledge from external sources, such as PDF documents, wiki articles, or call transcripts, and then use that knowledge to augment the instruction prompt sent to the LLM. This allows the LLM to reference more relevant information when generating a response. For example, if you ask an LLM how to make chocolate chip cookies, it can include information from your own recipe library. In this pattern, the recipe text is converted into embedding vectors using an embedding model, and stored in a vector database. Incoming questions are converted to embeddings, and then the vector database runs a similarity search to find related content. The question and the reference data then go into the prompt for the LLM.
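A stripped-down version of that flow looks roughly like the following, with an in-memory list standing in for the vector database and a placeholder embed() function standing in for the embedding model endpoint (so retrieval quality here is meaningless; only the structure matters).

import numpy as np

def embed(text):
    # Placeholder for a call to the embedding model endpoint (for example, GPT-J 6B).
    # A hash-seeded random vector is NOT semantically meaningful; it only keeps the
    # example self-contained and runnable.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=16)
    return v / np.linalg.norm(v)

# "Vector database": reference chunks stored alongside their embeddings.
reference_chunks = ["classic chocolate chip cookie recipe",
                    "notes on sourdough starter maintenance"]
index = [(chunk, embed(chunk)) for chunk in reference_chunks]

def retrieve(question, top_k=1):
    # Similarity search: cosine similarity reduces to a dot product on unit vectors.
    q = embed(question)
    scored = sorted(index, key=lambda item: -float(q @ item[1]))
    return [chunk for chunk, _ in scored[:top_k]]

question = "How do I make chocolate chip cookies?"
context = retrieve(question)
prompt = f"Use the following context to answer.\nContext: {context}\nQuestion: {question}"
print(prompt)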
Let’s take a closer look at the embedding vectors that get created and how to perform drift analysis on those vectors.
Analysis on embedding vectors
Embedding vectors are numeric representations of our data, so analyzing these vectors can provide insight into our reference data that can later be used to detect potential signals of drift. Embedding vectors represent an item in n-dimensional space, where n is often large. For example, the GPT-J 6B model, used in this post, creates vectors of size 4096. To measure drift, assume that our application captures embedding vectors for both reference data and incoming prompts.
We start by performing dimension reduction using Principal Component Analysis (PCA). PCA tries to reduce the number of dimensions while preserving most of the variance in the data. In this case, we try to find the number of dimensions that preserves 95% of the variance, which should capture anything within two standard deviations.
Then we use K-Means to identify a set of cluster centers. K-Means tries to group points together into clusters such that each cluster is relatively compact and the clusters are as distant from each other as possible.
We calculate the following information based on the clustering output shown in the following figure:

The number of dimensions in PCA that explain 95% of the variance
The location of each cluster center, or centroid

Additionally, we look at the proportion (higher or lower) of samples in each cluster, as shown in the following figure.

Finally, we use this analysis to calculate the following (a code sketch of the workflow follows this list):

Inertia – Inertia is the sum of squared distances to cluster centroids, which measures how well the data was clustered using K-Means.
Silhouette score – The silhouette score is a measure for the validation of the consistency within clusters, and ranges from -1 to 1. A value close to 1 means that the points in a cluster are close to the other points in the same cluster and far from the points of the other clusters. A visual representation of the silhouette score can be seen in the following figure.
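The clustering workflow described above can be reproduced with scikit-learn roughly as follows; the synthetic embeddings, their dimensionality, and the choice of eight clusters are illustrative assumptions, while the 95% variance threshold and the three metrics come from the description above.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 512))   # stand-in for 4096-dimensional GPT-J 6B embeddings

# Reduce dimensions while preserving 95% of the variance.
pca = PCA(n_components=0.95)
reduced = pca.fit_transform(embeddings)
print("dimensions for 95% variance:", pca.n_components_)

# Cluster the reduced embeddings; eight clusters is an illustrative choice.
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(reduced)
print("inertia:", kmeans.inertia_)
print("silhouette score:", silhouette_score(reduced, kmeans.labels_))
print("cluster proportions:", np.bincount(kmeans.labels_) / len(reduced))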

We can periodically capture this information for snapshots of the embeddings for both the source reference data and the prompts. Capturing this data allows us to analyze potential signals of embedding drift.
Detecting embedding drift
Periodically, we can compare the clustering information through snapshots of the data, which includes the reference data embeddings and the prompt embeddings. First, we can compare the number of dimensions needed to explain 95% of the variation in the embedding data, the inertia, and the silhouette score from the clustering job. As you can see in the following table, compared to a baseline, the latest snapshot of embeddings requires 39 more dimensions to explain the variance, indicating that our data is more dispersed. The inertia has gone up, indicating that the samples are in aggregate farther away from their cluster centers. Additionally, the silhouette score has gone down, indicating that the clusters are not as well defined. For prompt data, that might indicate that the types of questions coming into the system are covering more topics.

Next, in the following figure, we can see how the proportion of samples in each cluster has changed over time. This can show us whether our newer reference data is broadly similar to the previous set, or covers new areas.

Finally, we can see if the cluster centers are moving, which would show drift in the information in the clusters, as shown in the following table.
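Centroid movement can be quantified by measuring, for each baseline centroid, how far away the nearest snapshot centroid lies. This assumes the two sets of centroids live in a comparable space, and nearest-centroid matching is one simple convention rather than the exact logic of the analysis job.

import numpy as np

def centroid_shift(baseline_centroids, snapshot_centroids):
    # For each baseline centroid, the distance to the nearest snapshot centroid;
    # large values suggest the information captured by that cluster has drifted.
    diffs = baseline_centroids[:, None, :] - snapshot_centroids[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    return dists.min(axis=1)

rng = np.random.default_rng(0)
baseline = rng.normal(size=(8, 128))                        # placeholder baseline centroids
snapshot = baseline + rng.normal(scale=0.2, size=(8, 128))  # slightly shifted centroids
print(centroid_shift(baseline, snapshot).round(2))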

Reference data coverage for incoming questions
We can also evaluate how well our reference data aligns to the incoming questions. To do this, we assign each prompt embedding to a reference data cluster. We compute the distance from each prompt to its corresponding center, and look at the mean, median, and standard deviation of those distances. We can store that information and see how it changes over time.
The following figure shows an example of analyzing the distance between the prompt embedding and reference data centers over time.

As you can see, the mean, median, and standard deviation of the distances between prompt embeddings and reference data centers are all decreasing between the initial baseline and the latest snapshot. Although the absolute value of the distance is difficult to interpret, we can use the trends to determine whether the semantic overlap between reference data and incoming questions is getting better or worse over time.
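In code, this coverage check amounts to assigning each prompt embedding to its nearest reference centroid and summarizing the distances; the synthetic vectors below are placeholders for the stored prompt embeddings and reference-data centroids.

import numpy as np

rng = np.random.default_rng(0)
reference_centroids = rng.normal(size=(8, 128))   # centroids from the reference-data clustering
prompt_embeddings = rng.normal(size=(200, 128))   # captured prompt embeddings (placeholders)

# Distance from every prompt to every centroid, keeping only the nearest one.
dists = np.linalg.norm(prompt_embeddings[:, None, :] - reference_centroids[None, :, :], axis=-1)
nearest = dists.min(axis=1)

# Tracking these statistics across snapshots shows whether the semantic overlap
# between reference data and incoming questions is improving or degrading.
print("mean:", nearest.mean(), "median:", np.median(nearest), "std:", nearest.std())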
Sample application
In order to gather the experimental results discussed in the previous section, we built a sample application that implements the RAG pattern using embedding and generation models deployed through SageMaker JumpStart and hosted on Amazon SageMaker real-time endpoints.
The application has three core components:

We use an interactive flow, which includes a user interface for capturing prompts, combined with a RAG orchestration layer, using LangChain.
The data processing flow extracts data from PDF documents and creates embeddings that get stored in Amazon OpenSearch Service. We also use these in the final embedding drift analysis component of the application.
The embeddings are captured in Amazon Simple Storage Service (Amazon S3) via Amazon Kinesis Data Firehose, and we run a combination of AWS Glue extract, transform, and load (ETL) jobs and Jupyter notebooks to perform the embedding analysis.

The following diagram illustrates the end-to-end architecture.

The full sample code is available on GitHub. The provided code is available in two different patterns:

Sample full-stack application with a Streamlit frontend – This provides an end-to-end application, including a user interface using Streamlit for capturing prompts, combined with the RAG orchestration layer, using LangChain running on Amazon Elastic Container Service (Amazon ECS) with AWS Fargate
Backend application – For those that don’t want to deploy the full application stack, you can optionally choose to only deploy the backend AWS Cloud Development Kit (AWS CDK) stack, and then use the Jupyter notebook provided to perform RAG orchestration using LangChain

To create the provided patterns, there are several prerequisites detailed in the following sections, starting with deploying the generative and text embedding models then moving on to the additional prerequisites.
Deploy models through SageMaker JumpStart
Both patterns assume the deployment of an embedding model and generative model. For this, you’ll deploy two models from SageMaker JumpStart. The first model, GPT-J 6B, is used as the embedding model and the second model, Falcon-40b, is used for text generation.
You can deploy each of these models through SageMaker JumpStart from the AWS Management Console, Amazon SageMaker Studio, or programmatically. For more information, refer to How to use JumpStart foundation models. To simplify the deployment, you can use the provided notebook derived from notebooks automatically created by SageMaker JumpStart. This notebook pulls the models from the SageMaker JumpStart ML hub and deploys them to two separate SageMaker real-time endpoints.
The sample notebook also has a cleanup section. Don’t run that section yet, because it will delete the endpoints just deployed. You will complete the cleanup at the end of the walkthrough.
After confirming successful deployment of the endpoints, you’re ready to deploy the full sample application. However, if you’re more interested in exploring only the backend and analysis notebooks, you can optionally deploy only that, which is covered in the next section.
Option 1: Deploy the backend application only
This pattern allows you to deploy the backend solution only and interact with the solution using a Jupyter notebook. Use this pattern if you don’t want to build out the full frontend interface.
Prerequisites
You should have the following prerequisites:

A SageMaker JumpStart model endpoint deployed – Deploy the models to SageMaker real-time endpoints using SageMaker JumpStart, as previously outlined
Deployment parameters – Record the following:

Text model endpoint name – The endpoint name of the text generation model deployed with SageMaker JumpStart
Embeddings model endpoint name – The endpoint name of the embedding model deployed with SageMaker JumpStart

Deploy the resources using the AWS CDK
Use the deployment parameters noted in the previous section to deploy the AWS CDK stack. For more information about AWS CDK installation, refer to Getting started with the AWS CDK.
Make sure that Docker is installed and running on the workstation that will be used for AWS CDK deployment. Refer to Get Docker for additional guidance.

$ cd pattern1-rag/cdk
$ cdk deploy BackendStack --exclusively \
-c textModelEndpointName=<Enter the SageMaker Endpoint Name for the Text generation model> \
-c embeddingsModelEndpointName=<Enter the SageMaker Endpoint Name for the Text embeddings model>

Alternatively, you can enter the context values in a file called cdk.context.json in the pattern1-rag/cdk directory and run cdk deploy BackendStack --exclusively.
The deployment will print out outputs, some of which will be needed to run the notebook. Before you can start question and answering, embed the reference documents, as shown in the next section.
Embed reference documents
For this RAG approach, reference documents are first embedded with a text embedding model and stored in a vector database. In this solution, an ingestion pipeline has been built that intakes PDF documents.
An Amazon Elastic Compute Cloud (Amazon EC2) instance has been created for the PDF document ingestion and an Amazon Elastic File System (Amazon EFS) file system is mounted on the EC2 instance to save the PDF documents. An AWS DataSync task is run every hour to fetch PDF documents found in the EFS file system path and upload them to an S3 bucket to start the text embedding process. This process embeds the reference documents and saves the embeddings in OpenSearch Service. It also saves an embedding archive to an S3 bucket through Kinesis Data Firehose for later analysis.
To ingest the reference documents, complete the following steps:

Retrieve the sample EC2 instance ID that was created (see the AWS CDK output JumpHostId) and connect using Session Manager, a capability of AWS Systems Manager. For instructions, refer to Connect to your Linux instance with AWS Systems Manager Session Manager.
Go to the directory /mnt/efs/fs1, which is where the EFS file system is mounted, and create a folder called ingest:

$ cd /mnt/efs/fs1
$ mkdir ingest && cd ingest

Add your reference PDF documents to the ingest directory.

The DataSync task is configured to upload all files found in this directory to Amazon S3 to start the embedding process.
The DataSync task runs on an hourly schedule; you can optionally start the task manually to start the embedding process immediately for the PDF documents you added.

To start the task, locate the task ID from the AWS CDK output DataSyncTaskID and start the task with defaults.

After the embeddings are created, you can start the RAG question and answering through a Jupyter notebook, as shown in the next section.
Question and answering using a Jupyter notebook
Complete the following steps:

Retrieve the SageMaker notebook instance name from the AWS CDK output NotebookInstanceName and connect to JupyterLab from the SageMaker console.
Go to the directory fmops/full-stack/pattern1-rag/notebooks/.
Open and run the notebook query-llm.ipynb in the notebook instance to perform question and answering using RAG.

Make sure to use the conda_python3 kernel for the notebook.
This pattern is useful to explore the backend solution without needing to provision additional prerequisites that are required for the full-stack application. The next section covers the implementation of a full-stack application, including both the frontend and backend components, to provide a user interface for interacting with your generative AI application.
Option 2: Deploy the full-stack sample application with a Streamlit frontend
This pattern allows you to deploy the solution with a user frontend interface for question and answering.
Prerequisites
To deploy the sample application, you must have the following prerequisites:

SageMaker JumpStart model endpoint deployed – Deploy the models to your SageMaker real-time endpoints using SageMaker JumpStart, as outlined in the previous section, using the provided notebooks.
Amazon Route 53 hosted zone – Create an Amazon Route 53 public hosted zone to use for this solution. You can also use an existing Route 53 public hosted zone, such as example.com.
AWS Certificate Manager certificate – Provision an AWS Certificate Manager (ACM) TLS certificate for the Route 53 hosted zone domain name and its applicable subdomains, such as example.com and *.example.com for all subdomains. For instructions, refer to Requesting a public certificate. This certificate is used to configure HTTPS on Amazon CloudFront and the origin load balancer.
Deployment parameters – Record the following:

Frontend application custom domain name – A custom domain name used to access the frontend sample application. The domain name provided is used to create a Route 53 DNS record pointing to the frontend CloudFront distribution; for example, app.example.com.
Load balancer origin custom domain name – A custom domain name used for the CloudFront distribution load balancer origin. The domain name provided is used to create a Route 53 DNS record pointing to the origin load balancer; for example, app-lb.example.com.
Route 53 hosted zone ID – The Route 53 hosted zone ID to host the custom domain names provided; for example, ZXXXXXXXXYYYYYYYYY.
Route 53 hosted zone name – The name of the Route 53 hosted zone to host the custom domain names provided; for example, example.com.
ACM certificate ARN – The ARN of the ACM certificate to be used with the custom domain provided.
Text model endpoint name – The endpoint name of the text generation model deployed with SageMaker JumpStart.
Embeddings model endpoint name – The endpoint name of the embedding model deployed with SageMaker JumpStart.

Deploy the resources using the AWS CDK
Use the deployment parameters you noted in the prerequisites to deploy the AWS CDK stack. For more information, refer to Getting started with the AWS CDK.
Make sure Docker is installed and running on the workstation that will be used for the AWS CDK deployment.

$ cd pattern1-rag/cdk
$ cdk deploy --all -c appCustomDomainName=<Enter Custom Domain Name to be used for Frontend App> \
-c loadBalancerOriginCustomDomainName=<Enter Custom Domain Name to be used for Load Balancer Origin> \
-c customDomainRoute53HostedZoneID=<Enter Route53 Hosted Zone ID for the Custom Domain being used> \
-c customDomainRoute53HostedZoneName=<Enter Route53 Hostedzone Name> \
-c customDomainCertificateArn=<Enter ACM Certificate ARN for Custom Domains provided> \
-c textModelEndpointName=<Enter the SageMaker Endpoint Name for the Text generation model> \
-c embeddingsModelEndpointName=<Enter the SageMaker Endpoint Name for the Text embeddings model>

In the preceding code, -c represents a context value supplied as input for each of the required prerequisites. Alternatively, you can enter the context values in a file called cdk.context.json in the pattern1-rag/cdk directory and run cdk deploy --all.
Note that we specify the Region in the file bin/cdk.ts. Configuring ALB access logs requires a specified Region. You can change this Region before deployment.
The deployment will print out the URL to access the Streamlit application. Before you can start question and answering, you need to embed the reference documents, as shown in the next section.
Embed the reference documents
For a RAG approach, reference documents are first embedded with a text embedding model and stored in a vector database. In this solution, an ingestion pipeline has been built that intakes PDF documents.
As we discussed in the first deployment option, an example EC2 instance has been created for the PDF document ingestion and an EFS file system is mounted on the EC2 instance to save the PDF documents. A DataSync task is run every hour to fetch PDF documents found in the EFS file system path and upload them to an S3 bucket to start the text embedding process. This process embeds the reference documents and saves the embeddings in OpenSearch Service. It also saves an embedding archive to an S3 bucket through Kinesis Data Firehose for later analysis.
To ingest the reference documents, complete the following steps:

Retrieve the sample EC2 instance ID that was created (see the AWS CDK output JumpHostId) and connect using Session Manager.
Go to the directory /mnt/efs/fs1, which is where the EFS file system is mounted, and create a folder called ingest:

$ cd /mnt/efs/fs1
$ mkdir ingest && cd ingest

Add your reference PDF documents to the ingest directory.

The DataSync task is configured to upload all files found in this directory to Amazon S3 to start the embedding process.
The DataSync task runs on an hourly schedule. You can optionally start the task manually to start the embedding process immediately for the PDF documents you added.

To start the task, locate the task ID from the AWS CDK output DataSyncTaskID and start the task with defaults.

Question and answering
After the reference documents have been embedded, you can start the RAG question and answering by visiting the URL to access the Streamlit application. An Amazon Cognito authentication layer is used, so it requires creating a user account in the Amazon Cognito user pool deployed via the AWS CDK (see the AWS CDK output for the user pool name) for first-time access to the application. For instructions on creating an Amazon Cognito user, refer to Creating a new user in the AWS Management Console.
Embed drift analysis
In this section, we show you how to perform drift analysis by first creating a baseline of the reference data embeddings and prompt embeddings, and then creating a snapshot of the embeddings over time. This allows you to compare the baseline embeddings to the snapshot embeddings.
Create an embedding baseline for the reference data and prompt
To create an embedding baseline of the reference data, open the AWS Glue console and select the ETL job embedding-drift-analysis. Set the parameters for the ETL job as follows and run the job (a programmatic alternative is sketched after this list):

Set --job_type to BASELINE.
Set --out_table to the Amazon DynamoDB table for reference embedding data. (See the AWS CDK output DriftTableReference for the table name.)
Set --centroid_table to the DynamoDB table for reference centroid data. (See the AWS CDK output CentroidTableReference for the table name.)
Set --data_path to the S3 bucket with the prefix; for example, s3://<REPLACE_WITH_BUCKET_NAME>/embeddingarchive/. (See the AWS CDK output BucketName for the bucket name.)
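If you prefer to start the baseline job programmatically rather than from the AWS Glue console, a boto3 call along the following lines should be equivalent, assuming your AWS credentials and Region are configured; the job name and argument keys come from the steps above, and the angle-bracket values are placeholders to fill in from your AWS CDK outputs.

import boto3

glue = boto3.client("glue")

# Same parameters as the console steps above, passed as Glue job arguments.
response = glue.start_job_run(
    JobName="embedding-drift-analysis",
    Arguments={
        "--job_type": "BASELINE",
        "--out_table": "<DriftTableReference value from the AWS CDK output>",
        "--centroid_table": "<CentroidTableReference value from the AWS CDK output>",
        "--data_path": "s3://<REPLACE_WITH_BUCKET_NAME>/embeddingarchive/",
    },
)
print(response["JobRunId"])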

Similarly, using the ETL job embedding-drift-analysis, create an embedding baseline of the prompts. Set the parameters for the ETL job as follows and run the job:

Set --job_type to BASELINE.
Set --out_table to the DynamoDB table for prompt embedding data. (See the AWS CDK output DriftTablePromptsName for the table name.)
Set --centroid_table to the DynamoDB table for prompt centroid data. (See the AWS CDK output CentroidTablePrompts for the table name.)
Set --data_path to the S3 bucket with the prefix; for example, s3://<REPLACE_WITH_BUCKET_NAME>/promptarchive/. (See the AWS CDK output BucketName for the bucket name.)

Create an embedding snapshot for the reference data and prompt
After you ingest additional information into OpenSearch Service, run the ETL job embedding-drift-analysis again to snapshot the reference data embeddings. The parameters will be the same as the ETL job that you ran to create the embedding baseline of the reference data as shown in the previous section, with the exception of setting the --job_type parameter to SNAPSHOT.
Similarly, to snapshot the prompt embeddings, run the ETL job embedding-drift-analysis again. The parameters will be the same as the ETL job that you ran to create the embedding baseline for the prompts as shown in the previous section, with the exception of setting the --job_type parameter to SNAPSHOT.
Compare the baseline to the snapshot
To compare the embedding baseline and snapshot for reference data and prompts, use the provided notebook pattern1-rag/notebooks/drift-analysis.ipynb.
To look at embedding comparison for reference data or prompts, change the DynamoDB table name variables (tbl and c_tbl) in the notebook to the appropriate DynamoDB table for each run of the notebook.
The notebook variable tbl should be changed to the appropriate drift table name. The following is an example of where to configure the variable in the notebook.

The table names can be retrieved as follows:

For the reference embedding data, retrieve the drift table name from the AWS CDK output DriftTableReference
For the prompt embedding data, retrieve the drift table name from the AWS CDK output DriftTablePromptsName

In addition, the notebook variable c_tbl should be changed to the appropriate centroid table name. The following is an example of where to configure the variable in the notebook.

The table names can be retrieved as follows:

For the reference embedding data, retrieve the centroid table name from the AWS CDK output CentroidTableReference
For the prompt embedding data, retrieve the centroid table name from the AWS CDK output CentroidTablePrompts

Analyze the prompt distance from the reference data
First, run the AWS Glue job embedding-distance-analysis. This job determines which cluster, from the K-Means evaluation of the reference data embeddings, each prompt belongs to. It then calculates the mean, median, and standard deviation of the distance from each prompt to the center of the corresponding cluster.
You can run the notebook pattern1-rag/notebooks/distance-analysis.ipynb to see the trends in the distance metrics over time. This will give you a sense of the overall trend in the distribution of the prompt embedding distances.
The notebook pattern1-rag/notebooks/prompt-distance-outliers.ipynb is an AWS Glue notebook that looks for outliers, which can help you identify whether you’re getting more prompts that are not related to the reference data.
Monitor similarity scores
All similarity scores from OpenSearch Service are logged in Amazon CloudWatch under the rag namespace. The dashboard RAG_Scores shows the average score and the total number of scores ingested.
Clean up
To avoid incurring future charges, delete all the resources that you created.
Delete the deployed SageMaker models
Refer to the cleanup section of the provided example notebook to delete the deployed SageMaker JumpStart models, or you can delete the models on the SageMaker console.
Delete the AWS CDK resources
If you entered your parameters in a cdk.context.json file, clean up as follows:

$ cd pattern1-rag/cdk
$ cdk destroy --all

If you entered your parameters on the command line and only deployed the backend application (the backend AWS CDK stack), clean up as follows:

$ cd pattern1-rag/cdk
$ cdk destroy --all \
-c textModelEndpointName=<Enter the SageMaker Endpoint Name for the Text generation model> \
-c embeddingsModelEndpointName=<Enter the SageMaker Endpoint Name for the Text embeddings model>

If you entered your parameters on the command line and deployed the full solution (the frontend and backend AWS CDK stacks), clean up as follows:

$ cd pattern1-rag/cdk
$ cdk destroy --all -c appCustomDomainName=<Enter Custom Domain Name to be used for Frontend App> \
-c loadBalancerOriginCustomDomainName=<Enter Custom Domain Name to be used for Load Balancer Origin> \
-c customDomainRoute53HostedZoneID=<Enter Route53 Hosted Zone ID for the Custom Domain being used> \
-c customDomainRoute53HostedZoneName=<Enter Route53 Hostedzone Name> \
-c customDomainCertificateArn=<Enter ACM Certificate ARN for Custom Domains provided> \
-c textModelEndpointName=<Enter the SageMaker Endpoint Name for the Text generation model> \
-c embeddingsModelEndpointName=<Enter the SageMaker Endpoint Name for the Text embeddings model>

Conclusion
In this post, we provided a working example of an application that captures embedding vectors for both reference data and prompts in the RAG pattern for generative AI. We showed how to perform clustering analysis to determine whether reference or prompt data is drifting over time, and how well the reference data covers the types of questions users are asking. If you detect drift, it can provide a signal that the environment has changed and your model is getting new inputs that it may not be optimized to handle. This allows for proactive evaluation of the current model against changing inputs.

About the Authors
Abdullahi Olaoye is a Senior Solutions Architect at Amazon Web Services (AWS). Abdullahi holds an MSc in Computer Networking from Wichita State University and is a published author who has held roles across various technology domains such as DevOps, infrastructure modernization, and AI. He is currently focused on Generative AI and plays a key role in helping enterprises architect and build cutting-edge solutions powered by Generative AI. Beyond the realm of technology, he finds joy in the art of exploration. When not crafting AI solutions, he enjoys traveling with his family to explore new places.
Randy DeFauw is a Senior Principal Solutions Architect at AWS. He holds an MSEE from the University of Michigan, where he worked on computer vision for autonomous vehicles. He also holds an MBA from Colorado State University. Randy has held a variety of positions in the technology space, ranging from software engineering to product management. He entered the Big Data space in 2013 and continues to explore that area. He is actively working on projects in the ML space and has presented at numerous conferences, including Strata and GlueCon.
Shelbee Eigenbrode is a Principal AI and Machine Learning Specialist Solutions Architect at Amazon Web Services (AWS). She has been in technology for 24 years spanning multiple industries, technologies, and roles. She is currently focusing on combining her DevOps and ML background into the domain of MLOps to help customers deliver and manage ML workloads at scale. With over 35 patents granted across various technology domains, she has a passion for continuous innovation and using data to drive business outcomes. Shelbee is a co-creator and instructor of the Practical Data Science specialization on Coursera. She is also the Co-Director of Women In Big Data (WiBD), Denver chapter. In her spare time, she likes to spend time with her family, friends, and overactive dogs.