Creating a Machine Learning App using FastAPI and Deploying it Using Kubernetes

August 17, 2021

FastAPI is a modern Python web framework for building web APIs. It is fast at serving requests, which helps improve the performance of our application.

Prerequisites

  1. You must have a good understanding of Python.
  2. You must have an excellent working knowledge of machine learning models.
  3. You must have Docker installed on your machine.
  4. You must have Kubernetes installed on your machine. This tutorial uses Minikube to run a local cluster.
  5. Know how to use Google Colab or Jupyter Notebook. In this tutorial, we use Google Colab to build our model.

Note: to follow along easily, use Google Colab. It’s an easy-to-use platform for getting started quickly when building models.

Building the machine learning model

We will build a machine learning model that predicts the nationality of individuals from their names. This is a simple model that illustrates the key concepts used in machine learning modeling.

Dataset to be used

The dataset contains common names of people and their nationalities. Our data is as shown:

A snippet of the data

CSV File of data

Installing the Python packages

We will use the following packages when building our model:


Pandas is a software library written for the Python programming language for data manipulation and analysis. It’s a tool for reading and writing data between in-memory data structures and different file formats.


NumPy is the fundamental package for scientific computing in Python. NumPy arrays support advanced mathematical and other operations on large amounts of data.


Scikit-learn is an open-source machine learning library for the Python programming language. It includes various classification, regression, and clustering algorithms, including support vector machines, random forests, gradient boosting, k-means, and linear regression.

Run the following commands to install the packages:

pip install pandas
pip install numpy
pip install scikit-learn

Loading our exploratory data analysis (EDA) packages

These packages are used for Exploratory Data Analysis (EDA) to summarise the main characteristics of our data for easy visualization.

EDA helps us decide how best to manipulate our data to get the answers we need, making it easier to discover patterns, spot anomalies, test hypotheses, and check assumptions.

  • pandas is used for data manipulation and analysis.
  • NumPy provides arrays and mathematical operations for working efficiently with large amounts of data.
import pandas as pd
import numpy as np

Loading the Scikit-learn modules

Scikit-learn will be the package used for predictive analysis since it contains different tools for machine learning modeling and various algorithms for classification, regression, and clustering.

from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

In the above code, we have imported the following:


MultinomialNB

MultinomialNB is scikit-learn’s multinomial Naive Bayes classifier, and we use it to build our model. Naive Bayes classifiers are based on Bayes’ theorem; they are easy to build and work well even on very large datasets. Despite their simplicity, they often perform surprisingly well and can be competitive with more sophisticated classification methods.

Naive Bayes classifiers are also highly scalable: the number of parameters they need grows only linearly with the number of features. We use the multinomial variant because it is suitable for classification with discrete features, such as the token counts produced by CountVectorizer, which is the case for our model.

To read more about the Naive Bayes algorithm and how it performs classification, click here.
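To make the "discrete features" point concrete, here is a small sketch on a hypothetical count matrix (not our names data). It shows that, with the default Laplace smoothing (alpha=1.0), the per-class feature probabilities that MultinomialNB learns are simply smoothed count frequencies, which we recompute by hand.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical count matrix: 4 samples x 3 features (e.g. token counts)
X = np.array([
    [2, 0, 1],
    [1, 0, 0],
    [0, 3, 1],
    [0, 2, 0],
])
y = np.array(["a", "a", "b", "b"])

clf = MultinomialNB(alpha=1.0).fit(X, y)

# Recompute P(feature | class) by hand: (count + alpha) / (class total + alpha * n_features)
alpha = 1.0
manual = []
for label in clf.classes_:
    counts = X[y == label].sum(axis=0)
    manual.append((counts + alpha) / (counts.sum() + alpha * X.shape[1]))
manual = np.array(manual)

print(np.exp(clf.feature_log_prob_))  # probabilities learned by scikit-learn
print(manual)                         # the same values, computed by hand
print(clf.predict(np.array([[0, 1, 0]])))  # feature 2 is far more frequent in class "b"
```

Because the model only needs these per-class counts, training amounts to summing columns, which is one reason Naive Bayes trains quickly.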


CountVectorizer

CountVectorizer converts a collection of text documents (in our case, names) into a matrix of token counts. It learns a vocabulary from the dataset and represents each input as a vector of counts over that vocabulary. This is our feature-extraction step: the resulting vectors are the features, or inputs, that our model uses during training.

For more details about CountVectorizer click here.
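The sketch below shows what CountVectorizer produces for a few names from the dataset preview, using its default settings (lowercasing, word tokens). Note that each whole name becomes a single feature, so a name that was never seen during fitting maps to an all-zero vector; this is an observation about the default settings, which the tutorial itself does not explore.

```python
from sklearn.feature_extraction.text import CountVectorizer

# A few names taken from the dataset preview
names = ["Louane", "Lucien", "Yamazaki", "Zalman", "Zindel"]

vec = CountVectorizer()  # default settings: lowercase input, word tokens of 2+ characters
X = vec.fit_transform(names)

# Each whole (lowercased) name becomes one feature in the vocabulary
print(sorted(vec.vocabulary_))  # ['louane', 'lucien', 'yamazaki', 'zalman', 'zindel']
print(X.shape)                  # (5, 5): 5 names x 5 features

# A name that never appeared during fitting maps to an all-zero vector
unseen = vec.transform(["Yin"]).toarray()
print(unseen.sum())  # 0
```

If you want the model to generalize to unseen names, character n-grams (for example CountVectorizer(analyzer="char_wb", ngram_range=(2, 3))) are one option worth experimenting with.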


train_test_split

train_test_split splits our dataset into a training set, used to fit the model, and a test set, used to evaluate it.
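Here is a minimal sketch of the 70/30 split we use later, on hypothetical stand-in data of 10 samples. The random_state argument is an addition for illustration: it fixes the shuffle so the split is reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for our feature matrix and labels (10 samples)
X = np.arange(20).reshape(10, 2)
y = np.array(["french"] * 5 + ["japanese"] * 5)

# test_size=0.30 keeps 30% of the rows for testing; random_state fixes the shuffle
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)

print(len(x_train), len(x_test))  # 7 3
```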


accuracy_score

accuracy_score measures the fraction of predictions that match the true labels. We use it on the test set to gauge how well the trained model performs.
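As a quick illustration with hypothetical labels, accuracy is simply the share of positions where the prediction equals the true label, which we can also compute by hand:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical true labels and model predictions for five test names
y_true = np.array(["french", "japanese", "yiddish", "french", "greek"])
y_pred = np.array(["french", "japanese", "french", "french", "greek"])

score = accuracy_score(y_true, y_pred)
print(score)                      # 0.8: 4 of 5 predictions are correct
print(np.mean(y_true == y_pred))  # the same value, computed by hand
print(f"{score:.2%}")             # formatted as a percentage: 80.00%
```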

We use the Naive Bayes classifier for our modeling instead of other algorithms for the following reasons:

  1. It’s simple and easy to implement.
  2. It often gives competitive accuracy on text classification tasks compared with other algorithms.
  3. It is fast to train compared to many other algorithms.
  4. Its simplifying independence assumption makes it less prone to overfitting (memorizing the training data instead of learning general patterns) than more flexible models, especially on small datasets.

Other commonly used classification algorithms include:

  1. Logistic Regression.
  2. Stochastic Gradient Descent.
  3. K-Nearest Neighbours.
  4. Decision Tree.
  5. Random Forest.

Loading our dataset

We use the pandas package to import our nationality.csv dataset. We also use pandas for data manipulation and data analysis.

df = pd.read_csv("nationality.csv")

Nature of our data

We need to understand the nature of the dataset that we have. For example, we need to know the number of names in the dataset, the columns, and the rows present in the data.


df.shape

The output shows the size of our dataset as (rows, columns):

(3238, 3)

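Our dataset has 3238 rows and 3 columns. To preview the first five rows, we use:

df.head()

The useful columns are names and nationality; the extra Unnamed: 0 column appears to be a row index that was saved into the CSV file.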

Unnamed: 0  names nationality
0       0   Louane  french
1       1   Lucien  french
2       2   Yamazaki japanese
3       3   Zalman  yiddish
4       4   Zindel  yiddish

We can list the available columns in our dataset with:

df.columns

Index(['Unnamed: 0', 'names', 'nationality'], dtype='object')
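The Unnamed: 0 column is typical of a CSV written together with its pandas index (an assumption about how nationality.csv was created). The sketch below reproduces it with a tiny in-memory CSV that mimics our file, and shows that index_col=0 turns that column into the index instead of a data column.

```python
import io

import pandas as pd

# A tiny CSV that mimics nationality.csv: the first, unnamed column is a saved index
csv_text = """,names,nationality
0,Louane,french
1,Lucien,french
2,Yamazaki,japanese
"""

raw = pd.read_csv(io.StringIO(csv_text))
print(list(raw.columns))    # ['Unnamed: 0', 'names', 'nationality']

# Treat the first column as the index instead of as data
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(clean.columns))  # ['names', 'nationality']
```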

All the nationalities available in our data


df['nationality'].unique()

The output is an array of all the nationalities available in our dataset, as shown below.

array(['yiddish', 'gaelic', 'african', 'irish', 'hungarian', 'german',
       'swedish', 'japanese', 'italian', 'american', 'hawaiian', 'greek',
       'polynesian', 'scandinavian', 'spanish', 'celtic', 'old-english',
       'korean', 'sanskrit', 'african-american', 'hebrew', 'norse',
       'chinese', 'finnish', 'persian', 'scottish', 'slavic', 'english',
       'old-norse', 'dutch', 'armenian', 'welsh', 'polish', 'teutonic',
       'russian', 'egyptian', 'arabic', 'swahili', 'native-american',
       'old-french', 'french', 'middle-english', 'latin', 'vietnamese',
       'danish', 'hindi', 'old-german', 'turkish', 'indian',
       'czechoslovakian'], dtype=object)

Checking if our data is balanced

This shows the number of names in each nationality. Ideally, every nationality should have roughly the same number of names so that the model does not favor the larger classes. Our data is only partly balanced: 20 of the 50 nationalities have 100 names each, while others have as few as 11.

df.groupby('nationality')['names'].size()

The output for our nationalities:

african             100
african-american    100
american            100
arabic              100
armenian             17
celtic               62
chinese             100
czechoslovakian      38
danish               11
dutch                24
egyptian             30
english             100
finnish              13
french              100
gaelic               87
german              100
greek               100
hawaiian            100
hebrew              100
hindi               100
hungarian            64
indian               25
irish               100
italian             100
japanese            100
korean               16
latin               100
middle-english       45
native-american     100
norse                40
old-english         100
old-french           46
old-german           40
old-norse            28
persian              55
polish               48
polynesian           15
russian              85
sanskrit             28
scandinavian        100
scottish             74
slavic               79
spanish             100
swahili              16
swedish              14
teutonic             32
turkish              52
vietnamese           52
welsh                91
yiddish              11
Name: names, dtype: int64
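One simple way to quantify balance is to divide the size of the smallest class by the size of the largest; 1.0 means perfectly balanced. For our dataset, the output above gives 11 / 100 = 0.11. Here is a sketch of the calculation on hypothetical toy data, using value_counts(), which returns the same counts as the groupby above (sorted by frequency instead of by name).

```python
import pandas as pd

# Hypothetical toy data; in the tutorial this would be the full df
df = pd.DataFrame({
    "names": ["Louane", "Lucien", "Yamazaki", "Sato", "Tanaka", "Zalman"],
    "nationality": ["french", "french", "japanese", "japanese", "japanese", "yiddish"],
})

counts = df["nationality"].value_counts()
print(counts)

# Ratio of the smallest to the largest class: 1.0 means perfectly balanced
balance_ratio = counts.min() / counts.max()
print(round(balance_ratio, 2))  # 0.33
```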

Visualizing our data using the Matplotlib library

Matplotlib is a Python plotting library that makes it easy to visualize our data as graphs.

In this tutorial, we use Google Colab. Run the following snippet in Google Colab to import Matplotlib and display plots inline in the notebook.

import matplotlib.pyplot as plt
%matplotlib inline
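A bar graph of names per nationality can be drawn directly from the grouped counts with pandas’ plotting helper. This is a minimal sketch on hypothetical toy data; the figure size, title, and axis labels are illustrative choices, and in the tutorial you would use the full df.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data standing in for the full dataset
df = pd.DataFrame({
    "names": ["Louane", "Lucien", "Yamazaki", "Sato", "Tanaka", "Zalman"],
    "nationality": ["french", "french", "japanese", "japanese", "japanese", "yiddish"],
})

# One bar per nationality, with the bar height equal to the number of names
counts = df.groupby("nationality")["names"].size()
ax = counts.plot(kind="bar", figsize=(10, 4), title="Number of names per nationality")
ax.set_xlabel("nationality")
ax.set_ylabel("number of names")
plt.tight_layout()
plt.show()  # with %matplotlib inline, Colab displays the figure below the cell
```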

Our bar graph is as shown:

Bar Graph

Checking our features

  • Xfeatures holds the independent variables, which are the inputs to our model. Here that is the names column; the model uses these features to make its predictions.

  • ylabels holds the target labels, the nationalities, which the model learns to predict.

Xfeatures = df['names']
ylabels = df['nationality']

Vectorizing our features

We use CountVectorizer to convert the names into numerical vectors of token counts that our model can use as input. This is our feature-extraction step.

vec = CountVectorizer()
X = vec.fit_transform(Xfeatures)

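We can also inspect the extracted features (the learned vocabulary) with get_feature_names_out(). In scikit-learn versions before 1.0, this method is called get_feature_names().

vec.get_feature_names_out()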

Splitting the data

We split our dataset into a training set and a test set, using 70% of the data to train our model and 30% to test it.

x_train,x_test,y_train,y_test = train_test_split(X,ylabels,test_size=0.30)

Building the model

We create a MultinomialNB classifier and fit it to the training data using the fit() method:

nb = MultinomialNB()
nb.fit(x_train, y_train)

Checking the accuracy of our model

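We evaluate the model on the test set, which contains names it did not see during training. The higher the accuracy, the better the model generalizes to new names.

accuracy_score(y_test, nb.predict(x_test))

accuracy_score returns the fraction of test names that were classified correctly. Because train_test_split shuffles the data randomly, your score may differ slightly between runs.

In our run, the model reached an accuracy of about 85.04%.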

Making predictions

After training our model, we can feed it new inputs to make predictions. The better the model performed on the test set, the more we can trust its predictions.

name1 = ["Yin", "Bathsheba", "Brittany", "Vladmir"]
vector1 = vec.transform(name1).toarray()
nb.predict(vector1)


Saving our model using joblib

We will use joblib to save our model into a pickle file. Pickling our model makes it easier to use our model in the future without repeating the training process. A pickle file is a byte stream of our model.

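Older tutorials import joblib from sklearn.externals, but that copy was deprecated in scikit-learn 0.21 and has since been removed. We import the standalone joblib package instead, which is installed together with scikit-learn. Here is a detailed article that helps a reader fully grasp the use and functionalities of joblib.

import joblib

# Save the fitted CountVectorizer together with the classifier:
# the API must transform new names with the same vocabulary before predicting
with open("naive_bayes.pkl", "wb") as nationality_predictor:
    joblib.dump((vec, nb), nationality_predictor)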

We will name our pickle file naive_bayes.pkl.
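The following sketch shows the save-and-load round trip on a hypothetical toy model. Storing the (vectorizer, classifier) pair as one tuple is an assumption for illustration; the point is that both objects must be restored, because a classifier on its own cannot turn a raw name into the count vector it was trained on. An in-memory buffer stands in for naive_bayes.pkl.

```python
import io

import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy training data standing in for nationality.csv
names = ["Louane", "Lucien", "Yamazaki", "Sato", "Zalman", "Zindel"]
labels = ["french", "french", "japanese", "japanese", "yiddish", "yiddish"]

vec = CountVectorizer()
nb = MultinomialNB().fit(vec.fit_transform(names), labels)

# Save the vectorizer and classifier together; BytesIO stands in for naive_bayes.pkl
buffer = io.BytesIO()
joblib.dump((vec, nb), buffer)

# Load them back, as the API does, and predict with the restored objects
buffer.seek(0)
loaded_vec, loaded_nb = joblib.load(buffer)
prediction = loaded_nb.predict(loaded_vec.transform(["Zalman"]).toarray())
print(prediction)  # ['yiddish']
```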

Introduction to the FastAPI

FastAPI is a modern, fast web framework for building APIs with Python 3.6+, based on standard Python type hints. Its key features are as follows:

  • Fast to code: Increases the speed of developing new features.
  • Fewer bugs: Reduces developer-induced errors.
  • Intuitive: Has great editor support, completion everywhere, and less time debugging.
  • Easy: Designed to be easy to use and learn.
  • Short: Minimizes code duplication, with multiple features from each parameter declaration.
  • Robust: Get production-ready code with automatic interactive documentation.
  • Standards-based: Based on the open standards for APIs.

This makes FastAPI powerful, since it combines strengths found in other tools, such as the simple routing style of Flask and the automatic interactive documentation of Swagger (OpenAPI).

Installing FastAPI

Use the following command to install FastAPI on your machine:

pip install fastapi

Next, let’s install the server.

Uvicorn is an ASGI server used to run FastAPI applications. We install the standard version, uvicorn[standard], which adds optional extras such as uvloop and httptools for better performance. Plain pip install uvicorn installs a minimal version with pure-Python dependencies. The quotes keep shells such as zsh from interpreting the square brackets.

pip install "uvicorn[standard]"

Creating the API

First, create a new Python file for the API. Then, put our pickle file naive_bayes.pkl in a new folder named model.

The folder structure:

├── model
   ├── naive_bayes.pkl

Importing our FastAPI packages

import uvicorn
from fastapi import FastAPI, Query

Loading ML packages

We use joblib to load (unpickle) our previously pickled file, converting the serialized objects back to their original form.

import joblib

Unpickling our Naive Bayes classifier file

To use our saved model, we need to convert it back to the original object. This allows us to use our model in the original form we had created.

# The pickle stores the fitted CountVectorizer and the trained classifier
with open("model/naive_bayes.pkl", "rb") as nationality_naive_bayes:
    nationality_cv, nationality_clf = joblib.load(nationality_naive_bayes)

Initializing our app

We initialize our app by creating a FastAPI instance:

app = FastAPI()

Creating our routes

We will create a simple route that is served on localhost port 8000. We define our routes using asynchronous programming.

Asynchronous programming allows a program to run multiple operations without waiting for other operations to complete.

This matters because a program does not have to block while one operation waits (for example, on a network or disk request); it can make progress on other operations concurrently. Asynchronous programming has become an important concept in Python. For detailed guidance on this concept, this article is very helpful.
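The following sketch, using only the standard asyncio module, illustrates this idea. Two hypothetical coroutines wait for different amounts of time (asyncio.sleep stands in for slow I/O); because they run concurrently, the shorter wait finishes first, even though it was started second.

```python
import asyncio

finished = []


async def fetch(label: str, delay: float) -> str:
    # asyncio.sleep stands in for a slow I/O call (database, network, ...)
    await asyncio.sleep(delay)
    finished.append(label)
    return label


async def main() -> list:
    # Both coroutines run concurrently: the event loop switches between them while they wait
    return await asyncio.gather(fetch("slow", 0.2), fetch("fast", 0.1))


results = asyncio.run(main())
print(results)   # ['slow', 'fast']  (gather keeps the order of its arguments)
print(finished)  # ['fast', 'slow']  (the shorter wait completed first)
```

The total run time is roughly the longest wait (0.2 s), not the sum of both, which is the benefit FastAPI gets when it serves async routes.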

We use async def when creating our FastAPI routes. This lets FastAPI handle many requests concurrently instead of waiting for each one to finish.

To make our first route, we define an async def index() function and register it for the root path (/) with the @app.get decorator. It will run on localhost port 8000.

@app.get("/")
async def index():
    return {"text": "Our First route"}

if __name__ == '__main__':
    uvicorn.run(app, host="127.0.0.1", port=8000)

A snippet of our first route

Interactive API docs

The route above shows how to make a simple index route with FastAPI. Now we shall add more routes for our machine learning model.

Adding route for our machine learning logic

We will add a GET route for making nationality predictions.

The following helper function shows the prediction logic on its own: predict_nationality() takes a list of names and returns the predicted nationalities. Because CountVectorizer.transform() returns a sparse matrix, we convert it to a dense NumPy array with toarray() before passing it to the classifier.

def predict_nationality(x):
    # x is a list of names, for example ["Yin"]
    vect = nationality_cv.transform(x).toarray()
    result = nationality_clf.predict(vect)
    return result

Adding a route to make predictions

We will use this route to predict the nationality of a person from the name provided by the user. We send a GET request to the /predict route with the name as a query parameter, and the predict() function returns the prediction result.

@app.get("/predict")
async def predict(name: str = Query(..., min_length=2, max_length=12)):
    # Query(...) makes name a required query parameter; FastAPI validates its length
    data = [name]
    vect = nationality_cv.transform(data).toarray()
    result = nationality_clf.predict(vect)
    # Convert the NumPy string to a plain str so it can be returned as JSON
    return {"orig_name": name, "prediction": str(result[0])}

Make sure to include the following in your file to specify the host and port that serve your app. This runs our routes on localhost port 8000.

if __name__ == '__main__':
    uvicorn.run(app, host="127.0.0.1", port=8000)

Our output is as shown:

Interactive API docs:

All Routes

The route to be used to make a prediction:

Prediction route

We have now served our machine learning model as an API using FastAPI.

Dockerizing the FastAPI application

Dockerizing means packaging our application into a Docker container.

A Docker container is a standard unit of software that packages up code and all its dependencies, so the application runs quickly and reliably from one computing environment to another.

A Docker container image is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries, and settings.

Container images become containers at runtime; Docker images become containers when they run on Docker Engine. To create a Docker container, follow the steps below.

Create a Dockerfile

In your working directory, create a file named Dockerfile.

Your working directory is as shown below:

├── Dockerfile
├── model
   ├── naive_bayes.pkl

Creating Docker Layers

Docker layers make up the file system of both Docker images and Docker containers. Each layer corresponds to an instruction in your Dockerfile, from defining the base image to creating the entry point that runs the image.

If these steps are followed, we will end up with a Docker image. The steps are as follows.

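Writing the Dockerfile

A base image is the image that our own image is built on top of. Here we use tiangolo/uvicorn-gunicorn-fastapi:python3.7, which provides Python 3.7 with Uvicorn and Gunicorn preinstalled. The remaining instructions create a working directory, copy the app into it, install the dependencies, expose the port, and define how the server starts. Because the API unpickles a scikit-learn model, scikit-learn and joblib must also be installed in the image, ideally the same scikit-learn version that was used to train the model.

# Define the base image
FROM tiangolo/uvicorn-gunicorn-fastapi:python3.7
# Create a working directory
WORKDIR /app
# Copy the ./app directory (which should contain the API file and its model folder) into /app
COPY ./app /app
# Install the dependencies in the new working directory
RUN pip install fastapi uvicorn scikit-learn joblib
# Expose the port used to serve the application
EXPOSE 8000
# Entry point used to execute our image: Uvicorn serves the app object from app.py.
# --host 0.0.0.0 makes the server reachable from outside the container;
# --reload is a development option, so it is left out here.
ENTRYPOINT ["uvicorn"]
CMD ["app:app", "--host", "0.0.0.0", "--port", "8000"]

Docker will serve the application on port 8000.

Create Docker image

A Docker image contains application code, libraries, tools, dependencies, and other files needed to make an application run.

docker build -t fastapi-test-app:new .
The output is as shown

This output shows the process used when creating a Docker image.

Sending build context to Docker daemon  34.90kB
Step 1/7 : FROM tiangolo/uvicorn-gunicorn-fastapi:python3.7
Step 2/7 : WORKDIR /app
 ---> Using cache
Step 3/7 : COPY ./app /app
 ---> Using cache
Step 4/7 : RUN pip install fastapi uvicorn scikit-learn joblib
 ---> Using cache
Step 5/7 : EXPOSE 8000
 ---> Using cache
Step 6/7 : ENTRYPOINT ["uvicorn"]
Removing intermediate container 4edte5ta382
 ---> 2de6fstf5uv09
Step 7/7 : CMD ["app:app", "--host", "0.0.0.0", "--port", "8000"]
Successfully built 2de6fstf5uv09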
Successfully tagged fastapi-test-app:new
Listing all of our created images

To list all the docker images we had created earlier, you can use the following command. Our latest image is fastapi-test-app. This is the image we have just created with an id of 2de6fstf5uv09.

docker image ls


REPOSITORY                   TAG                 IMAGE ID            CREATED             SIZE
fastapi-test-app             new                2de6fstf5uv09      3 minutes ago       1.34GB
testing                      latest             d661f1t3e0b         2 weeks ago          994MB

Creating docker container

Docker containers are the live, running instances of Docker images. Users can interact with them, and administrators can adjust their settings and conditions using Docker commands. We run our image and map port 8000 on the host to port 8000 in the container:

docker run -p 8000:8000 fastapi-test-app:new



After Dockerizing our FastAPI application, we now need to deploy it to a Kubernetes cluster.

Deploying the FastAPI application to Kubernetes cluster

Kubernetes is a container orchestration system used to deploy and manage containers, such as the ones we build with Docker. It is designed to manage and coordinate clusters and workloads efficiently at large scale in production environments, automating the deployment and management of containerized services.

We create a new file called deployment.yaml in our working directory. Our folder structure is as shown:

├── Dockerfile
├── deployment.yaml
├── model
   ├── naive_bayes.pkl

The code snippet for the deployment.yaml file is as shown:

apiVersion: v1
kind: Service
metadata:
  name: fastapi-test-service
spec:
  selector:
    app: fastapi-test-app
  ports:
    - protocol: "TCP"
      port: 3000
      targetPort: 8000
  type: LoadBalancer
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fastapi-test-app
spec:
  selector:
    matchLabels:
      app: fastapi-test-app
  replicas: 5
  template:
    metadata:
      labels:
        app: fastapi-test-app
    spec:
      containers:
        - name: fastapi-test-app
          image: fastapi-test-app:new
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8000

The file has two sections:

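  1. Service: Exposes the application and acts as a load balancer. A load balancer distributes incoming requests across the available servers (here, the application’s pods) to make the best use of the available resources. Requests sent to port 3000 of the service are forwarded to port 8000 of the containers.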

  2. Deployment: Describes the application we want to run on the Kubernetes cluster. Kubernetes creates the number of replicas (pods) defined in the deployment.yaml file; here we use five for scalability, so five instances of the application run at a time. When users send requests to the service, the load balancer distributes them across these replicas.

Having multiple replicas also creates redundancy: if one instance fails, the others keep serving requests.

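The deployment.yaml file refers to the Docker image we created earlier: in its image field, we specify the image name and tag (fastapi-test-app:new). With Minikube, the image must be available inside the cluster, so either build it after running eval $(minikube docker-env), or load it with minikube image load fastapi-test-app:new.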

Deployment of our application to Kubernetes cluster

We have Dockerized our FastAPI application. We will now deploy it to a Kubernetes cluster.

Run the following command:

kubectl apply -f deployment.yaml

This command will deploy our service and application instances created above to the Kubernetes engine. After running this command, the fastapi-test-service and the fastapi-test-app are created.

Deployment dashboard

Minikube and Kubernetes provide a dashboard that is used to visualize the deployment. To see the deployed container in the dashboard, we use the following command:

minikube dashboard

Our dashboard will be as shown:

Dashboard Overview

Running Clusters

Accessing our application

We access our application using the following command:

minikube service fastapi-test-service

This opens the service’s URL in your browser. We have now deployed our containerized FastAPI application to the Kubernetes cluster.


Conclusion

In this tutorial, we have learned how to create a machine learning model, following all the steps from data preprocessing to training and building the model. We have also learned about FastAPI, an efficient framework for building web APIs, and used it to serve our machine learning model as an API.

We then containerized our FastAPI application using Docker. Finally, we deployed the application to a Kubernetes cluster. By following these steps, a reader should be able to comfortably build a FastAPI application and deploy it to a Kubernetes cluster.


Peer Review Contributions by: Lalithnarayan C