Launch: AWS Sagemaker available as a Compute Engine on #LetsData
Automate your inference models and run inference / generate vectors at scale
Today, we are announcing the public availability of the AWS Sagemaker Compute Engine on #LetsData. Customers can now create vector embeddings, automate their model inference pipelines and run inference on their documents at scale on #LetsData.
Architecture
Here is an architecture diagram that shows how the Sagemaker compute engine has been integrated with #LetsData pipelines.
The Sagemaker compute engine has two major components - a Lambda compute component and a Sagemaker compute component. Here is how the pipeline works:
Read and Parse Feature Doc: The Lambda compute component is responsible for reading from the read destination, parsing the data using the user's data handler interface implementations and creating a feature document, as before (Steps 1-4).
Extract Doc Elements For Vectorization: Previously, the feature document would have been written directly to the write destination. With the Sagemaker compute engine, however, the feature document is vectorized first. Step 5 extracts the feature doc elements that require vectorization (this is a user's implementation of a #LetsData interface).
Generate Vector Embeddings using Sagemaker: The extracted elements are then sent to a Sagemaker Endpoint that generates vector embeddings (Step 6).
Construct Output Vector Doc: The output vector doc is constructed from these vectors (Step 7).
Write Vector Doc: The rest of the pipeline is similar to earlier - the output vector document is written to the write destination and any errors are recorded in the error destination (Steps 8-10). A sketch of this end-to-end flow follows.
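To make the flow concrete, here is a minimal sketch of the steps above, assuming hypothetical handler names (parse_document, extract_document_elements_for_vectorization, construct_vector_doc) as stand-ins for the user's #LetsData interface implementations; only the boto3 sagemaker-runtime call is a real AWS API.

import json

import boto3

sagemaker_runtime = boto3.client("sagemaker-runtime")

def process_record(raw_record, user_handlers, endpoint_name):
    # Steps 1-4: read the record and parse it into a feature document
    feature_doc = user_handlers.parse_document(raw_record)

    # Step 5: extract the feature doc elements that require vectorization
    elements = user_handlers.extract_document_elements_for_vectorization(feature_doc)

    # Step 6: generate vector embeddings by invoking the Sagemaker endpoint
    embeddings = {}
    for key, text in elements.items():
        response = sagemaker_runtime.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType="text/plain",
            Body=text.encode("utf-8"),
        )
        embeddings[key] = json.loads(response["Body"].read())

    # Step 7: construct the output vector doc from the feature doc and vectors
    vector_doc = user_handlers.construct_vector_doc(feature_doc, embeddings)

    # Steps 8-10: the engine writes the vector doc to the write destination
    # and records any errors in the error destination
    return vector_doc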
Let's look at:
how this architecture can be used in the emerging LLM app stacks
the new #LetsData Sagemaker Vectors Interface
the AI / ML models and how they can be used with #LetsData Sagemaker compute engine
the details around setting up Sagemaker Endpoints that can be invoked to generate vector embeddings at scale.
the overall Sagemaker configuration
#LetsData Sagemaker Compute Pipelines For LLM Apps
I shared some thoughts in an earlier post on how #LetsData might be useful for AI / ML apps and promised a deep dive. The LLM App Architecture from the Emerging Architectures for LLM Applications post (an Andreessen Horowitz blog) should help us understand how #LetsData can add value to the AI / ML ecosystem.
#LetsData Sagemaker pipelines can be used to implement Data Pipelines, Embedding Model, Orchestration, API / App Hosting and Queries.
#LetsData’s Sagemaker Vector Interface
We’ve defined a simple Sagemaker interface (GitHub) that users can implement to:
Extract Document for Vectorization from the Feature Doc
Construct a Vector Doc from the Feature Doc and the generated Vector Embeddings
The interface definition is available on GitHub; a rough sketch of its shape follows.
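This is an illustrative Python-style sketch only - the method names and types below are assumptions, not the published #LetsData signatures:

from abc import ABC, abstractmethod
from typing import Dict, List

class SagemakerVectorsInterface(ABC):
    # Illustrative sketch; the published definition lives in the #LetsData
    # interfaces repository on GitHub.

    @abstractmethod
    def extract_document_elements_for_vectorization(self, feature_doc: Dict) -> Dict[str, str]:
        # Select the fields of the feature doc that should be sent to the
        # Sagemaker endpoint for embedding.
        ...

    @abstractmethod
    def construct_vector_doc(self, feature_doc: Dict, vectors: Dict[str, List[float]]) -> Dict:
        # Combine the feature doc and the generated vector embeddings into the
        # output vector doc that is written to the write destination.
        ...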
Here is a sample implementation for the Common Crawl Web Archives (GitHub) documents.
The complete example and step by step instructions are also available on our website (Generate Vector Embeddings Using Lambda and Sagemaker Compute Engine).
Using AI / ML Models with #LetsData Sagemaker
The #LetsData Sagemaker Compute Engine is automation built around Sagemaker inference models and endpoints. This essentially means that any AI / ML model that can be used with AWS Sagemaker can be used with #LetsData: the model code is packaged as a zip file, uploaded to S3 and imported as an AWS Sagemaker model that #LetsData then uses.
#LetsData supports AI / ML models for Sagemaker in the following configurations (a sketch of the underlying AWS call follows the list):
Reuse Existing LetsData Model: You created a model for a dataset on LetsData and would like to reuse it. You can specify the model Arn and LetsData will use that model for Sagemaker.
Create New LetsData Model: You have the model code packaged as a zip file in S3. You'll specify the S3 Arn and LetsData will create a model for Sagemaker.
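For context, creating a Sagemaker model from a packaged artifact in S3 is a single CreateModel call; with boto3 it looks roughly like the sketch below (the names, Arns and image URI are placeholders - when you choose Create New LetsData Model, #LetsData drives the equivalent automation for you):

import boto3

sagemaker = boto3.client("sagemaker")

# All names, Arns and the image URI below are placeholders for illustration.
sagemaker.create_model(
    ModelName="letsdata-example-model",
    ExecutionRoleArn="arn:aws:iam::123456789012:role/ExampleSagemakerRole",
    PrimaryContainer={
        # One of the Sagemaker model container images (HuggingFace, PyTorch, ...)
        "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:latest",
        # The packaged model code uploaded to S3
        "ModelDataUrl": "s3://example-bucket/models/model-artifact.tar.gz",
    },
)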
We’ve tested with Hugging Face’s Sentence Transformer models in our implementations and have detailed examples on how to get AI models working with #LetsData and the different customizations that are offered. Here are some quick highlights:
Model Container Images: We support the entire gamut of ECR Sagemaker model container images - HuggingFace, Inferentia, PyTorch and Scikit-learn, to name a few. The complete support list is on our website and in the AWS ECR Sagemaker Image List.
Model Environment Variables: With #LetsData Sagemaker, you can customize your model environment to specify 1) model runtime configuration and 2) your model implementation details. For example, our HuggingFace SentenceTransformer model implementation uses the following environment variables, essentially informing the model to generate vectors for the question-answer task, and that our model code is in the inference.py file with the custom code in the model/ directory.
"HF_TASK": "question-answering", "SAGEMAKER_PROGRAM": "inference.py", "SAGEMAKER_SUBMIT_DIRECTORY": "model/"
Request and Response Customizations: While we’ve defined a fixed format for requests and responses to the Sagemaker endpoints, your model code can add customizations as needed.
Request
-------

def input_fn(request_body, request_content_type):
    """
    Args:
        request_body: The body of the request sent to the model.
        request_content_type: (string) the content type
    Returns:
        The decoded input text.
    """
    if request_content_type == 'text/plain':
        inp_var = request_body
        return inp_var.decode("utf-8")
    else:
        raise ValueError("This model only supports text/plain input")

Response
--------

def predict_fn(data, model_and_tokenizer):
    """
    Args:
        data: Returned input data from input_fn
        model_and_tokenizer: Returned (model, tokenizer) from model_fn
    Returns:
        The predictions
    """
    model, tokenizer = model_and_tokenizer
    ...
    return vector_embeddings[0].tolist()
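The predict_fn above unpacks a (model, tokenizer) pair, which implies a model_fn that loads both from the model directory. A minimal sketch of such a model_fn, assuming a Hugging Face transformers layout (not the exact code from our implementation), could look like this:

from transformers import AutoModel, AutoTokenizer

def model_fn(model_dir):
    # Load the model and tokenizer once at container startup; Sagemaker passes
    # the returned tuple to predict_fn as model_and_tokenizer.
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModel.from_pretrained(model_dir)
    model.eval()
    return model, tokenizer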
Our Compute Engine Documentation and Step By Step Example have more details around integrating models with #LetsData.
Sagemaker Endpoints to Generate Vector Embeddings
The Sagemaker Endpoint hosts the container image and the model; it is invoked with the documents and returns the vector results.
Sagemaker endpoints can be fine-tuned for concurrency, hardware, memory etc. Sagemaker supports two types of endpoints:
Serverless: Sagemaker automatically hosts the model and containers and scales it to your desired concurrency and memory
Provisioned: Sagemaker provisions the requested hardware and automatically hosts the model and containers (a configuration sketch for both types follows this list)
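As a point of reference, the underlying AWS endpoint configurations for the two types differ roughly as in the boto3 sketch below (names and sizing values are placeholders; with Create New LetsData Endpoint, #LetsData performs the equivalent setup from your endpoint configuration):

import boto3

sagemaker = boto3.client("sagemaker")

# Serverless: Sagemaker manages hosting; you choose memory and max concurrency.
sagemaker.create_endpoint_config(
    EndpointConfigName="letsdata-serverless-config",   # placeholder name
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "letsdata-example-model",
        "ServerlessConfig": {"MemorySizeInMB": 4096, "MaxConcurrency": 20},
    }],
)

# Provisioned: you choose the instance type and count; Sagemaker provisions it.
sagemaker.create_endpoint_config(
    EndpointConfigName="letsdata-provisioned-config",  # placeholder name
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "letsdata-example-model",
        "InstanceType": "ml.inf1.xlarge",
        "InitialInstanceCount": 1,
    }],
)

# Either config is then bound to an endpoint that can be invoked for inference.
sagemaker.create_endpoint(
    EndpointName="letsdata-example-endpoint",
    EndpointConfigName="letsdata-serverless-config",
)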
#LetsData supports both Serverless and Provisioned Endpoints for Sagemaker in the following configurations:
Bring Your Own Endpoint: You have an existing Sagemaker endpoint in an AWS account. You can specify the endpoint Arn and endpoint config Arn and LetsData will use that endpoint for Sagemaker.
Reuse Existing LetsData Endpoint: You created an endpoint for a dataset on LetsData and would like to reuse it. You can specify the endpoint Arn and endpoint config Arn and LetsData will use that endpoint for Sagemaker.
Create New LetsData Endpoint: You'd like a new Endpoint created for the dataset. You'll specify the endpoint type (Serverless/Provisioned) and its endpoint configuration and LetsData will create a Sagemaker endpoint for dataset execution.
We’ve run our tests with Serverless and Provisioned endpoints and have seen the benefits of GPU hardware acceleration and the ml.inf.* EC2 instance types. We’ve been impressed with the overall AI / ML inference infrastructure that AWS supports and how a diverse set of models and technology can all be integrated with the service. Our integration with Sagemaker further simplifies the end to end AI / ML use-case data integration.
Sagemaker Configuration
The overall Sagemaker configuration / schema is as follows:
Schema
Example - Create New Model & Endpoints
Example - Bring Your Own Model & Endpoints
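The schema and the full examples are in the docs linked below; purely to illustrate the knobs described above, a configuration distinguishes the two styles roughly as in this sketch (the field names here are hypothetical, not the published #LetsData schema):

# Hypothetical field names for illustration only; see the #LetsData docs for
# the actual Sagemaker compute engine schema.

# Create New Model & Endpoints: point at the packaged model code in S3 and
# describe the endpoint that should be created.
create_new_config = {
    "sagemakerModel": {"modelCodeS3Arn": "arn:aws:s3:::example-bucket/model.zip"},
    "sagemakerEndpoint": {
        "endpointType": "Serverless",
        "memorySizeInMB": 4096,
        "maxConcurrency": 20,
    },
}

# Bring Your Own Model & Endpoints: reference the existing model and endpoint
# by their Arns.
bring_your_own_config = {
    "sagemakerModel": {"modelArn": "arn:aws:sagemaker:us-east-1:123456789012:model/example"},
    "sagemakerEndpoint": {
        "endpointArn": "arn:aws:sagemaker:us-east-1:123456789012:endpoint/example",
        "endpointConfigArn": "arn:aws:sagemaker:us-east-1:123456789012:endpoint-config/example",
    },
}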
Our Compute Engine Documentation and Step By Step Example have the complete details around configuration and examples for the different Sagemaker configurations.
Future Work
Integration with Vector Database: The write destination should be a Vector Database / Index instead of a Kinesis Stream - we need to implement a Vector write destination. We are looking into different options and will work on having some option natively available in #LetsData. (Update: This is done, we are now integrated with Momento Vector Indexes: https://www.letsdata.io/docs/write-connectors?tab=momentovectorindexes)
Validate a Customer Journey: While we’ve tested and built the ML / AI pipelines, we’ve not constructed an end user example / customer journey yet, in large part because we aren’t putting these vector embeddings in a vector database / queryable source. We should do this to catch what we might have missed and validate the user scenario. (Update: We did validations with a Web Crawl Archives vector index and web search. See the search section for results: https://www.letsdata.io/docs/write-connectors?tab=momentovectorindexes#momento-vector-index-write-connector-implementation)
Enabling Learning Scenarios: Current Sagemaker support is inference only - we need to look at enabling learning / training scenarios as well.
Growing our system from individual datasets to a pipeline of connected datasets: Today, our datasets read from a read destination, perform compute and then write to a write destination. If the write destination is an intermediate destination such as a Kinesis stream, we need another dataset to read from the stream and write to some durable location such as a Vector Database. We need to natively support this. Example:
{ "pipelineName": "VectorIndexPipeline", "artifact": { ... }, "errorConnector": { ... }, "datasets": [ { // dataset 1 // - read from s3 // - run sagemaker compute // - write to kinesis }, { // dataset 2 // - read kinesis stream in dataset 1, // - run lambda compute // - write to database } ] }
Resources
#LetsData Sagemaker Compute Engine Docs: https://www.letsdata.io/docs#computeengine
#LetsData Example - Generate Vector Embeddings Using Lambda and Sagemaker Compute Engine: https://www.letsdata.io/docs#examples
Here are some references on customizing models with Sagemaker.
Hugging Face Sagemaker Docs: User defined code and modules
Hugging Face Sagemaker Custom Inference Notebook: Sentence Embeddings with Hugging Face Transformers
Blog at medium.com: Leveraging AWS SageMaker Serverless Inference for Customized Model Serving
Some foundational reading on AI / ML / LLM architectures:
Emerging Architectures for LLM Applications (an Andreessen Horowitz blog): https://a16z.com/2023/06/20/emerging-architectures-for-llm-applications/
How OpenAI trained ChatGPT (an excellent summary of the MS Build talk): https://blog.quastor.org/p/openai-trained-chatgpt