Deploying a Custom Model Server with Seldon Core
Welcome to our guide on how to deploy a custom model server with Seldon Core! In this article, we will walk you through the steps of creating and deploying a custom server based on FLAML, a Microsoft library that provides automated machine learning (AutoML). By following these steps, you will be able to deploy your own custom models on Seldon Core quickly and easily.
We have previously seen how to set up Seldon Core on Minikube, deploy the example models provided in the Seldon Core docs, and deploy a model stored on S3. To simplify model deployment, Seldon Core provides pre-packaged servers for Sklearn, Tensorflow, PyTorch, and others. But you may train a model with a library that is not covered by these pre-packaged servers and still want to deploy it on Seldon Core.
Since Seldon Core lets you deploy any dockerized image, one option is to write a custom Python wrapper for each model, dockerize it, and deploy it to Seldon Core. However, if you expect to create multiple models with the same library, it is better to build a reusable custom server implementation on Seldon Core, making model deployment as quick and easy as using the pre-packaged servers.
We’ll walk through how to do this by creating a custom server based on FLAML. These are the steps we need to follow to create a custom server:
- Create a Python class defining the server implementation
- Create a Docker image with the required dependencies and the Python definition of the custom server and push it to a container registry
- Update the Seldon Core config map to include a reference to the path of the container in the registry
- Redeploy Seldon Core with the updated config map
Create the Python Class Implementation
Here, we will define how the server will use the created models by defining __init__, load, and predict methods. Let’s create a file named FlamlServer.py and paste in the code below. You can also find this file in this GitHub repo.
import joblib
import numpy as np
import seldon_core
from seldon_core.user_model import SeldonComponent
from typing import Dict, List, Union, Iterable
import os
import logging

logger = logging.getLogger(__name__)

JOBLIB_FILE = "model.joblib"


class FlamlServer(SeldonComponent):
    def __init__(self, model_uri: str = None, method: str = "predict"):
        super().__init__()
        self.model_uri = model_uri
        self.method = method
        self.ready = False
        print("Model uri:", self.model_uri)
        print("method:", self.method)

    def load(self):
        print("load")
        model_file = os.path.join(seldon_core.Storage.download(self.model_uri), JOBLIB_FILE)
        print("model file", model_file)
        self._joblib = joblib.load(model_file)
        self.ready = True
        print("Model has been loaded")

    def predict(self, X: np.ndarray, names: Iterable[str], meta: Dict = None) -> Union[np.ndarray, List, str, bytes]:
        try:
            if not self.ready:
                self.load()
            if self.method == "predict_proba":
                logger.info("Calling predict_proba")
                result = self._joblib.predict_proba(X)
            else:
                logger.info("Calling predict")
                result = self._joblib.predict(X)
            return result
        except Exception as ex:
            logging.exception("Exception during predict")
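Before packaging the server, it can be worth sanity-checking the class locally against a model saved on disk. The snippet below is a hypothetical smoke test, not part of the server code itself: it assumes a model.joblib saved under a local folder such as ./local_model, and that Storage.download accepts plain local paths (worth verifying for your seldon-core version).

# Hypothetical local smoke test for FlamlServer (run from the same folder as FlamlServer.py).
# Assumes ./local_model/model.joblib exists, e.g. created with joblib.dump().
import numpy as np
from FlamlServer import FlamlServer

server = FlamlServer(model_uri="./local_model")
server.load()

sample = np.array([[5.1, 3.5, 1.4, 0.2]])  # a single iris-like row
print(server.predict(sample, names=["f0", "f1", "f2", "f3"]))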
Package the Python file into a docker image
Next, we package this file into a Docker image that will be deployed to Seldon Core. We create a Dockerfile in the same folder as the FlamlServer.py file with the details below. This file is also available on GitHub for download here.
FROM python:3.8-slim

WORKDIR /app

RUN apt-get update
RUN apt-get upgrade -y
RUN apt-get install -y libgomp1

ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update && \
    apt-get -y install gcc mono-mcs && \
    rm -rf /var/lib/apt/lists/*

# Install python packages
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt

# Copy source code
COPY . .

# Port for GRPC
EXPOSE 5000

# Port for REST
EXPOSE 9000

# Define environment variables
# MODEL_NAME must match the Python module/class we created (FlamlServer.py / FlamlServer)
ENV MODEL_NAME FlamlServer
ENV SERVICE_TYPE MODEL

# Changing folder to default user
RUN chown -R 8888 /app

CMD exec seldon-core-microservice $MODEL_NAME --service-type $SERVICE_TYPE
In the same folder, we also create a file named requirements.txt containing the necessary Python packages:
flaml[automl]
joblib==1.3.1
pandas==2.0.1
scikit-learn==1.3.0
seldon-core==1.17.1
We can now go ahead and build the Docker image. Since we want to push the image to a container registry, we should include the registry and repository in the image name and optionally tag it with a version number. In my case, my registry is bmusisi and I’ll store the image in the flamlserver repository:
docker build -t bmusisi/flamlserver:0.1 .
This will build a Docker image that I intend to push to Docker Hub, where I have an account named bmusisi. While logged into Docker Hub from my terminal, I can push this image by running this command:
docker push bmusisi/flamlserver:0.1
Update the Seldon Core Config Map
The Seldon Core config map can be updated through its Helm values.yaml file. Download the file locally from GitHub, go down to the predictor_servers key, and add the details of the image we created and pushed to Docker Hub below the configurations for the other servers. The updated section should look like the example below, which includes our new server details. The example file can be found here.
predictor_servers:
  HUGGINGFACE_SERVER:
    protocols:
      v2:
        defaultImageVersion: 1.3.5-huggingface
        image: seldonio/mlserver
  MLFLOW_SERVER:
    protocols:
      seldon:
        defaultImageVersion: 1.17.0
        image: seldonio/mlflowserver
      v2:
        defaultImageVersion: 1.3.5-mlflow
        image: seldonio/mlserver
  SKLEARN_SERVER:
    protocols:
      seldon:
        defaultImageVersion: 1.17.0
        image: seldonio/sklearnserver
      v2:
        defaultImageVersion: 1.3.5-sklearn
        image: seldonio/mlserver
  TEMPO_SERVER:
    protocols:
      v2:
        defaultImageVersion: 1.3.5-slim
        image: seldonio/mlserver
  TENSORFLOW_SERVER:
    protocols:
      seldon:
        defaultImageVersion: 1.17.0
        image: seldonio/tfserving-proxy
      tensorflow:
        defaultImageVersion: 2.1.0
        image: tensorflow/serving
  TRITON_SERVER:
    protocols:
      v2:
        defaultImageVersion: 21.08-py3
        image: nvcr.io/nvidia/tritonserver
  XGBOOST_SERVER:
    protocols:
      seldon:
        defaultImageVersion: 1.17.0
        image: seldonio/xgboostserver
      v2:
        defaultImageVersion: 1.3.5-xgboost
        image: seldonio/mlserver
  FLAML_SERVER:
    protocols:
      seldon:
        defaultImageVersion: "0.1"
        image: bmusisi/flamlserver
      v2:
        defaultImageVersion: "0.1"
        image: bmusisi/flamlserver
Redeploy Seldon Core with the Updated config
Once this is done, we can now redeploy Seldon Core to include our custom server by specifying the values.yaml file we created in the previous step:
helm upgrade seldon-core seldon-core-operator \
  --repo https://storage.googleapis.com/seldon-charts \
  --namespace seldon-system \
  --values values.yaml \
  --set istio.enabled=true \
  --set usageMetrics.enabled=true
We can then view the updated seldon-config config map that contains our new custom server:

kubectl describe configmaps seldon-config -n seldon-system
Testing our new model server
To test the FLAML server, we’ll train a simple iris model, push it to S3, deploy it to Seldon Core, and test its API. To create the model we can use the FLAML package.
If not already installed, we can install the required libraries, including flaml, by running the command below in a virtual environment or wherever this is being run:

pip install flaml[automl] pandas scikit-learn
We can now train our sample model
from flaml import AutoML
from sklearn.datasets import load_iris
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# load the iris dataset
data, target = load_iris(return_X_y=True)

# train-test split
X_train, X_test, y_train, y_test = train_test_split(data, target)

# run automl
automl = AutoML()
automl.fit(X_train, y_train, task="classification", time_budget=60)

# run predictions on test data and get test score
y_pred = automl.predict(X_test)
test_f1_score = round(f1_score(y_test, y_pred, average='macro'), 4)
print(f"F1 score on test set is {test_f1_score}")
The result should be similar to
F1 score on test set is 0.9762
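It can also be useful to see which estimator and hyperparameters FLAML settled on; the best_estimator and best_config attributes of the fitted AutoML object expose this.

# Inspect what FLAML selected during the search
print(automl.best_estimator)  # e.g. "lgbm" or "xgboost"
print(automl.best_config)     # hyperparameters of the winning model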
We’ll save our model locally
import joblib

joblib.dump(automl, 'iris_model_flaml.joblib')
And then push it to S3
aws s3 mb s3://seldon-models-testing-flaml
aws s3 cp iris_model_flaml.joblib s3://seldon-models-testing-flaml/objects/model.joblib
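Alternatively, if you prefer to stay in Python, the same upload can be done with boto3. This is a sketch assuming your AWS credentials are already configured locally, using the same bucket and key as the CLI commands above.

# Sketch: upload the saved model with boto3 instead of the AWS CLI.
# Assumes AWS credentials are configured (e.g. in ~/.aws/credentials).
import boto3

s3 = boto3.client("s3")
# Skip create_bucket if the bucket already exists; outside us-east-1 you also
# need to pass a CreateBucketConfiguration with your region.
s3.create_bucket(Bucket="seldon-models-testing-flaml")
s3.upload_file("iris_model_flaml.joblib", "seldon-models-testing-flaml", "objects/model.joblib")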
Once done, we can deploy this model to Seldon Core. This will require setting up Seldon Core to pull the model from S3 using RClone, which we covered in a previous article. Finally, we can deploy the model to Seldon Core by creating a deployment file named iris_model_flaml_deployment.yaml with these details:
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: iris-model-flaml
  namespace: seldon
spec:
  name: iris-flaml
  predictors:
    - name: default
      replicas: 1
      graph:
        name: classifier
        implementation: FLAML_SERVER
        modelUri: s3://seldon-models-testing-flaml/objects
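After applying the deployment (for example with kubectl apply -f iris_model_flaml_deployment.yaml), we can test the model's REST API. The snippet below is a sketch: INGRESS_HOST is a placeholder for however your Istio ingress is exposed (for example via minikube tunnel or a port-forward), and the URL follows Seldon's /seldon/<namespace>/<deployment-name>/api/v1.0/predictions pattern.

# Hypothetical smoke test of the deployed FLAML model's REST endpoint.
import requests

INGRESS_HOST = "localhost:8080"  # placeholder: your Istio ingress IP/host and port

url = f"http://{INGRESS_HOST}/seldon/seldon/iris-model-flaml/api/v1.0/predictions"
payload = {"data": {"ndarray": [[5.1, 3.5, 1.4, 0.2]]}}  # one iris sample

response = requests.post(url, json=payload)
print(response.status_code)
print(response.json())  # expect a prediction for the single sample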