Custom Pre-packaged Servers on Seldon Core by Brian Musisi – Oct 2024

Deploying a Custom Model Server with Seldon Core

Welcome to our guide on how to deploy a custom model server with Seldon Core! In this article, we will walk you through the steps of creating and deploying a custom server based on FLAML, a Microsoft library for automated machine learning (AutoML). By following these steps, you will be able to deploy your own custom models on Seldon Core quickly and easily.

We have previously seen how to set up Seldon Core on Minikube, deploy the example models provided in the Seldon Core docs, and deploy a model stored on S3. To simplify model deployment, Seldon Core provides pre-packaged servers for scikit-learn, TensorFlow, XGBoost, MLflow, and others. But you may train a model with a library that does not have a pre-packaged server and still want to deploy it on Seldon Core.

Since Seldon Core lets you deploy any dockerized model, one option is to write a custom Python wrapper for each model, dockerize it, and deploy the resulting image to Seldon Core. However, if you expect to create multiple models with this library, it is better to create a reusable custom server implementation on Seldon Core, making model deployment as quick and easy as using the pre-packaged servers.
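
For context, the per-model wrapper route would look roughly like the sketch below, following Seldon's Python wrapper conventions. The file name, class name, and artifact path here are hypothetical examples; the class name must match the MODEL_NAME passed to seldon-core-microservice, and a new image has to be built for every model:

# Model.py -- a hypothetical per-model wrapper (for comparison only)
import joblib

class Model:
    def __init__(self):
        # The model artifact is assumed to be copied into this image at build time
        self._model = joblib.load("model.joblib")

    def predict(self, X, features_names=None):
        # Seldon Core calls predict with the request payload converted to a numpy array
        return self._model.predict(X)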

We'll walk through this process by creating a custom server based on FLAML. These are the steps we need to follow to create a custom server:

  1. Create a Python class defining the server implementation
  2. Create a Docker image with the required dependencies and the Python definition of the custom server and push it to a container registry
  3. Update the Seldon Core config map to include a reference to the path of the container in the registry
  4. Redeploy Seldon Core with the updated config map

Create the Python Class Implementation

Here, we define how the server will use our models by implementing the init, load, and predict methods. Let's create a file named FlamlServer.py and paste in the details below. You can also find this file in this GitHub repo

import joblib
import numpy as np
import seldon_core
from seldon_core.user_model import SeldonComponent
from typing import Dict, List, Union, Iterable
import os
import logging

logger = logging.getLogger(__name__)

JOBLIB_FILE = "model.joblib"

class FlamlServer(SeldonComponent):
    def __init__(self, model_uri: str = None,  method: str = "predict"):
        super().__init__()
        self.model_uri = model_uri
        self.method = method
        self.ready = False
        print("Model uri:", self.model_uri)
        print("method:", self.method)

    def load(self):
        print("load")
        # Download the model artifact to a local folder and load it with joblib
        model_file = os.path.join(seldon_core.Storage.download(self.model_uri), JOBLIB_FILE)
        print("model file", model_file)
        self._joblib = joblib.load(model_file)
        self.ready = True
        print("Model has been loaded")

    def predict(self, X: np.ndarray, names: Iterable[str], meta: Dict = None) -> Union[np.ndarray, List, str, bytes]:
        try:
            if not self.ready:
                self.load()
            if self.method == "predict_proba":
                logger.info("Calling predict_proba")
                result = self._joblib.predict_proba(X)
            else:
                logger.info("Calling predict")
                result = self._joblib.predict(X)
            return result
        except Exception:
            # Log the full traceback and re-raise so the request fails loudly
            # instead of silently returning None
            logger.exception("Exception during predict")
            raise
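
Before packaging the server, we can sanity-check the class locally. The snippet below is an optional sketch (it assumes seldon_core.Storage.download can resolve a local directory path, and uses a short, hypothetical time budget): it trains a quick FLAML model, saves it as model.joblib in a temporary folder, and exercises load and predict the same way Seldon Core would.

import tempfile

import joblib
from flaml import AutoML
from sklearn.datasets import load_iris

from FlamlServer import FlamlServer

# Train a quick model and save it as model.joblib in a temporary folder
X, y = load_iris(return_X_y=True)
automl = AutoML()
automl.fit(X, y, task="classification", time_budget=10)

model_dir = tempfile.mkdtemp()
joblib.dump(automl, f"{model_dir}/model.joblib")

# Point the server at the local folder and call it the way Seldon Core would
server = FlamlServer(model_uri=model_dir)
server.load()
print(server.predict(X[:2], names=[]))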

Package the Python File into a Docker Image

Next, we package this file into a Docker image that will be deployed to Seldon Core. We create a Dockerfile in the same folder as the FlamlServer.py file with the details below. This file is also available on GitHub for download here

FROM python:3.8-slim
WORKDIR /app

# libgomp1 provides the OpenMP runtime required by FLAML's tree-based learners (e.g. LightGBM)
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y libgomp1

# gcc is needed to compile any Python packages that do not ship pre-built wheels
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && \
    apt-get -y install gcc mono-mcs && \
    rm -rf /var/lib/apt/lists/*

# Install python packages
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt

# Copy source code
COPY . .

# Port for GRPC
EXPOSE 5000
# Port for REST
EXPOSE 9000

# Define environment variables: MODEL_NAME must match the Python file and class name
ENV MODEL_NAME FlamlServer
ENV SERVICE_TYPE MODEL

# Give the default Seldon user (UID 8888) ownership of the app folder
RUN chown -R 8888 /app

CMD exec seldon-core-microservice $MODEL_NAME --service-type $SERVICE_TYPE

In the same folder, we also create a file named requirements.txt containing the necessary Python packages

flaml[automl]
joblib==1.3.1
pandas==2.0.1
scikit-learn==1.3.0
seldon-core==1.17.1

We can now go ahead and build the Docker image. One thing to note: since we want to push the image to a container registry, we should include the registry namespace and repository in the image name, and optionally tag it with a version number. In my case, my Docker Hub namespace is bmusisi and I'll store the image in the flamlserver repository

docker build -t bmusisi/flamlserver:0.1 .

This builds a Docker image that I intend to push to Docker Hub, where I have an account named bmusisi. While logged in to Docker Hub from my terminal, I can push the image by running this command

docker push bmusisi/flamlserver:0.1

Update the Seldon Core Config Map

The Seldon Core config map can be updated using the Helm chart's values.yaml file. Download the file locally from GitHub, go down to the predictor_servers key, and add the details of the image we created and pushed to Docker Hub below the configurations for the other servers. The updated section should look like the example below, with our new server details at the end. The example file can be found here

predictor_servers:
  HUGGINGFACE_SERVER:
    protocols:
      v2:
        defaultImageVersion: 1.3.5-huggingface
        image: seldonio/mlserver
  MLFLOW_SERVER:
    protocols:
      seldon:
        defaultImageVersion: 1.17.0
        image: seldonio/mlflowserver
      v2:
        defaultImageVersion: 1.3.5-mlflow
        image: seldonio/mlserver
  SKLEARN_SERVER:
    protocols:
      seldon:
        defaultImageVersion: 1.17.0
        image: seldonio/sklearnserver
      v2:
        defaultImageVersion: 1.3.5-sklearn
        image: seldonio/mlserver
  TEMPO_SERVER:
    protocols:
      v2:
        defaultImageVersion: 1.3.5-slim
        image: seldonio/mlserver
  TENSORFLOW_SERVER:
    protocols:
      seldon:
        defaultImageVersion: 1.17.0
        image: seldonio/tfserving-proxy
      tensorflow:
        defaultImageVersion: 2.1.0
        image: tensorflow/serving
  TRITON_SERVER:
    protocols:
      v2:
        defaultImageVersion: 21.08-py3
        image: nvcr.io/nvidia/tritonserver
  XGBOOST_SERVER:
    protocols:
      seldon:
        defaultImageVersion: 1.17.0
        image: seldonio/xgboostserver
      v2:
        defaultImageVersion: 1.3.5-xgboost
        image: seldonio/mlserver
  FLAML_SERVER:
    protocols:
      seldon:
        defaultImageVersion: 0.1
        image: bmusisi/flamlserver
      v2:
        defaultImageVersion: 0.1
        image: bmusisi/flamlserver

Redeploy Seldon Core with the Updated Config Map

Once this is done, we can redeploy Seldon Core to include our custom server by specifying the values.yaml file we updated in the previous step

helm upgrade seldon-core seldon-core-operator \
--repo https://storage.googleapis.com/seldon-charts \
--namespace seldon-system \
--values values.yaml \
--set istio.enabled=true \
--set usageMetrics.enabled=true

We can then view the updated seldon-config config map, which now contains our new custom server

kubectl describe configmap seldon-config -n seldon-system

Testing Our New Model Server

To test the FLAML server, we’ll train a simple iris model, push it to S3, deploy it to Seldon Core, and test its API. To create the model we can use the FLAML package.

If they are not already installed, we can install the required libraries, including FLAML, by running the command below in a virtual environment or wherever this code is being run

pip install "flaml[automl]" pandas scikit-learn

We can now train our sample model

from flaml import AutoML
from sklearn.datasets import load_iris
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# load the iris dataset
data, target = load_iris(return_X_y=True)

# train-test split
X_train, X_test, y_train, y_test = train_test_split(data, target)

# run automl
automl = AutoML()
automl.fit(X_train, y_train, task="classification", time_budget=60)

# run predictions on test data and get test score
y_pred = automl.predict(X_test)
test_f1_score = round(f1_score(y_test, y_pred, average='macro'), 4)
print(f"F1 score on test set is {test_f1_score}")

The result should be similar to

F1 score on test set is 0.9762

We’ll save our model locally

import joblib
joblib.dump(automl, 'iris_model_flaml.joblib')

And then push it to S3

aws s3 mb s3://seldon-models-testing-flaml

aws s3 cp iris_model_flaml.joblib s3://seldon-models-testing-flaml/objects/model.joblib

Once done, we can deploy this model to Seldon Core. This requires setting up Seldon Core to pull the model from S3 using Rclone, which we covered in a previous article. Finally, we deploy the model by creating a deployment file named iris_model_flaml_deployment.yaml with the details below

apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: iris-model-flaml
  namespace: seldon
spec:
  name: iris-flaml
  predictors:
    - name: default
      replicas: 1
      graph:
        name: classifier
        implementation: FLAML_SERVER
        modelUri: s3://seldon-models-testing-flaml/objects
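
After applying the manifest with kubectl apply -f iris_model_flaml_deployment.yaml and waiting for the deployment to become available, we can send a test request to the Seldon REST endpoint. The sketch below assumes the Istio ingress gateway is reachable at localhost:8080 (adjust the address to however your gateway is exposed) and that the deployment lives in the seldon namespace:

import requests

# Assumed ingress address; adjust to how your Istio gateway is exposed
INGRESS = "http://localhost:8080"

# Seldon Core v1 REST endpoint pattern: /seldon/<namespace>/<deployment-name>/api/v1.0/predictions
url = f"{INGRESS}/seldon/seldon/iris-model-flaml/api/v1.0/predictions"

# One iris sample in the Seldon "ndarray" payload format
payload = {"data": {"ndarray": [[5.1, 3.5, 1.4, 0.2]]}}

response = requests.post(url, json=payload)
print(response.json())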
