Deploy your apps to a supercloud in a few clicks

This Engineering Education program is supported by Section. Instantly deploy your GitHub apps, Docker containers or K8s namespaces to a supercloud.

Try It For Free

Implementing Rate Limiting in Flask APIs

March 31, 2021

Rate limiting is a technique for limiting network traffic in a system. It sets a limit when a user can repeat an operation in a system within a specific timeframe.

In this article, you will learn:

  • What rate limiting is.
  • Benefits of limiting network request rates.
  • Rate limiting techniques.
  • How to implement rate limiting techniques in Flask APIs.


To follow and fully understand this tutorial, you will need to have:

  • Python 3.6 or newer.
  • Basic understanding of Flask.
  • A text editor.


Rate limiting controls the frequency of the repetition of an operation and ensures the set constraint guiding this operation is not exceeded (usually within a specific timeframe).

Essentially, the process of rate limiting involves the following procedure:

  1. The system defines the rate limit rules for specific operations.
  2. The system counts every operation or request made by the user initiating it.
  3. The frequency of the request increases based on the user’s demand.
  4. Once the frequency reaches the system’s threshold (rate limit), further requests are not processed until the limitation is lifted (or modified).

Benefits of rate limiting

  • Increase in efficiency of security by mitigating cyber attacks such as DDoS.
  • Prevents web scraping/bots/spam users.
  • Prevents server resource exhaustion.
  • Manages internal/external services, policies, and quotas.
  • Controls the flow of system processes and data.
  • Avoids high maintenance costs.

Flask Web Framework

Flask is a lightweight web framework written in Python. It is designed to make web development using Python quick and easy and can build complex applications. Popular websites built with Flask include Airbnb, Netflix, Reddit, Uber, Mailgun, and many more.

Rate limiting techniques

In a general context, a rate is a count of the number of times an operation is run in a system. There are different techniques for measuring and limiting rates.

They include:

  • Fixed Window: This technique defines a fixed amount of requests accommodated within a specific duration. For example, setting a window size of 100 requests per hour means that the system will process only the first 100 requests the user makes within that hour, and every subsequent request will be discarded until the next hour.
  • Sliding Window: This technique is very similar to the Fixed Window, where a fixed amount of requests are allowed within a specific duration. However, it implements a Rolling Time Window to account for substantial request spikes that would have reduced the Fixed Window technique’s efficiency.
  • Token Bucket: In this technique, the system keeps counting the number of tokens a user can utilize to make requests in its memory (bucket). Whenever requests are made, tokens are reduced from the bucket and fulfilled until tokens are exhausted. An advantage of this technique is assigning a varying amount of tokens to different operations depending on the process’s operational power.
  • Leaky Bucket: This technique is very similar to the Token Bucket. However, the rates are limited by the number of requests that leak out of the memory (bucket). This technique first confirms that the system has sufficient processing power to handle the incoming request before processing it and discards it if it cannot.

Building a simple Flask API

Rate limiting is commonly used in web applications and APIs to prevent user requests’ excessive inflow into a server. Using Python, lets us build a Flask API and implement rate limiting techniques in it.

First, you must install the Flask web framework, which you will use to build the API.

Step 1:

In the terminal, type:

pip install Flask

Step 2:

Write the code responsible for setting up the endpoints of the Flask API. Start by creating a file named and save the following code in it:

from flask import Flask

app = Flask(__name__)

def index():
  return "Welcome to my Flask API"

if __name__ == "__main__":

In the code above, you created a simple Flask application that renders the text "Welcome to my Flask API" when the index route is requested. You should get a response similar to the image below after running the Flask application.

Flask API response

Implementing rate limiting in Flask

After installing the Flask Framework and saving the code responsible for setting up the endpoints described in the steps above, the next step here is to install the Flask-Limiter library.

Flask-Limiter is a Flask extension that helps to implement rate limiting rules in a Flask application quickly.

In the terminal, type:

pip install Flask-Limiter

Now, you need to update your file and integrate the Flask-Limiter library to define rate limiting rules for specific endpoints in your API.

Importing the Flask-Limiter library

from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

Setting Up Flask-Limiter

limiter = Limiter(app, key_func=get_remote_address)

Applying Rate Limiting Rules

@limiter.limit("10/minute") # maximum of 10 requests per minute
def index():
  return "Welcome to my Flask API"

In the code above, you defined the rate limiting rule for the endpoint by setting it to accept 10 requests per 60 seconds from every API user.

If the limit is exceeded, the user receives the response - status code 429 (Too Many Requests) with an error page like the image below.

Flask-Limiter limit exceeded error

Flask-Limiter provides a set of string notations for defining rate limit rules in its documentation with the given format:

[count] [per|/] [n (optional)] [second|minute|hour|day|month|year]

You can also combine multiple rate limits by separating them with a delimiter of your choice.

Examples include:

  • 10 per second
  • 60/hour
  • 5/second;60 per minute;2500 per hour
  • 50/day, 250/7days

Exploring Flask-Limiter functionalities

The Flask-Limiter documentation showcases a lot of features in the extension. We will briefly illustrate some of them in this section.

Setting default rate limit rules

Flask-Limiter provides the functionality to set default rate limit rules that Flask-Limiter would automatically apply to every endpoint in your Flask API.

To set default rules, use:

limiter = Limiter(
  default_limits=["200/day", "50/hour"]

NOTE: If an endpoint defines its rate limit rules when a default already exists, the newly defined rule is applied on the route alone, and the defaults are ignored.

To enable a route use the default rules and define its own simultaneously, use the override_defaults parameter in the decorator as shown:

@limiter.limit("1/second", override_defaults=False)
def ping():
  return "PONG"

Certain endpoints can also be excluded from applying the default rate limit rules with the @limiter.exempt decorator, as shown:

def ping():
  return "PONG"

Using custom rate limit keys

By default, Flask-Limiter uses the remote address of the request to identify each user interacting with the API. Flask-Limiter provides the functionality to use a custom user identifier function for events where you need to limit rates with API keys and usernames.

You can achieve this by passing the custom functions to the key_func parameter when initializing the limiter or when applying the @limiter.limit decorator on a route.

limiter = Limiter(app, key_func=custom_function_here)
@limiter.limit("10/minute", key_func=custom_function_here)

NOTE: The custom function is called from a Flask Request Context and must return a string. You can read more about key functions in the Flask-Limiter documentation.

Having multiple rate limit rules

Flask-Limiter can be used to define multiple rate limit rules for a particular route.

This can be done using the code:

@app.route("....") # for a single decorator
@app.route("....") # for multiple decorators

Generating custom limit exceeded responses

By default, Flask-Limiter triggers abort(429) each time a rate limit is exceeded for any particular route. You can customize this response for users by registering an error handler for 429 error code as shown below:

def ratelimit_handler(e):
  return "You have exceeded your rate-limit"

custom rate limit exceeded error


In this article, I introduced you to what rate limiting is, discussed the importance of limiting request rates, including mitigating cyberattacks, preventing server resource exhaustion, amongst other benefits. Also highlighted some standard rate limiting techniques and built a Flask application where you implemented rate limiting techniques with varying rules and customized responses.

I hope you find this tutorial as helpful as I anticipate.

Happy coding.


Peer Review Contributions by: Miller Juma