Monday, May 13, 2024

FastAPI middleware performance

As per the FastAPI docs, the way to create and add a custom middleware is:


@app.middleware("http")
async def add_my_middlware(request: Request, call_next):
response = await call_next(request)
return response

Seems simple enough. Before the await, you can do something with the request. After the await, you can do something with the response.
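
For example, a process-time header middleware in this style (essentially the timing example from the FastAPI docs) looks like this:

import time

from fastapi import FastAPI, Request

app = FastAPI()


@app.middleware("http")
async def add_process_time_header(request: Request, call_next):
    # Before the await: the request is available here
    start = time.perf_counter()
    response = await call_next(request)
    # After the await: the response is available here
    response.headers["X-Process-Time"] = str(time.perf_counter() - start)
    return response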

But if you run a benchmark, you will find something surprising.

Let's check a vanilla FastAPI app first.


from fastapi import FastAPI

app = FastAPI()


@app.get("/")
def read_root():
    return {"Hello": "World"}

wrk benchmark

❯ wrk -t12 -c400 -d30s http://localhost:9001
Running 30s test @ http://localhost:9001
  12 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    36.46ms    6.81ms 215.22ms   90.89%
    Req/Sec     0.91k    60.89     1.24k    72.67%
  326354 requests in 30.08s, 44.20MB read
  Socket errors: connect 0, read 1298, write 0, timeout 0
Requests/sec:  10849.19
Transfer/sec:      1.47MB

Now, with the middleware as suggested by the FastAPI docs:


from fastapi import FastAPI, Request

app = FastAPI()


@app.middleware("http")
async def add_my_middleware(request: Request, call_next):
    response = await call_next(request)
    return response


@app.get("/")
def read_root():
    return {"Hello": "World"}

wrk benchmark

❯ wrk -t12 -c400 -d30s http://localhost:9002
Running 30s test @ http://localhost:9002
  12 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    78.72ms   16.38ms 379.75ms   95.50%
    Req/Sec    420.40     45.60   710.00    78.94%
  151008 requests in 30.09s, 20.45MB read
  Socket errors: connect 0, read 1395, write 16, timeout 0
Requests/sec:   5018.51
Transfer/sec:    695.94KB

Notice anything?

The "Requests/sec" went down from 10849 to 5018. That is a close to 54% drop.

Let's look at what @app.middleware does under the hood:

# Inside FastAPI's middleware() decorator, your function ends up as:
from starlette.middleware.base import BaseHTTPMiddleware
self.add_middleware(BaseHTTPMiddleware, dispatch=func)

Starlette's BaseHTTPMiddleware is known for its issues. But one undocumented issue seems to be performance: for every request it builds Request and StreamingResponse objects and runs the downstream app in a separate task, and all of that adds up.

  • https://www.starlette.io/middleware/#limitations
  • https://github.com/encode/starlette/issues/1678
  • https://github.com/encode/starlette/discussions/1729

So what is the alternative?

Write Pure ASGI Middleware!

from fastapi import FastAPI
from starlette.types import ASGIApp, Message, Receive, Scope, Send

app = FastAPI()


class ASGIMiddleware:
    def __init__(self, app: ASGIApp) -> None:
        self.app = app

    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        messages = []

        # Create a new send channel which can be called
        # multiple times with a single event
        async def send_wrapper(message: Message):
            # Read a single message
            messages.append(message)
            more_body = message.get("more_body", False)
            # If the body is not complete yet, return to wait
            if more_body:
                return
            # The body is complete, you can now access it
            body = b"".join(m.get("body", b"") for m in messages)
            # Replay all the messages
            while messages:
                await send(messages.pop(0))

        # Pass the new send channel to the downstream application
        await self.app(scope, receive, send_wrapper)


app.add_middleware(ASGIMiddleware)


@app.get("/")
def read_root():
    return {"Hello": "World"}

That looks like a lot of code.

The other problem is understanding the flow.

Since the ASGI middleware works at a different level, you no longer have access to request and response objects. Instead, the middleware deals with raw ASGI messages, and even the response body can arrive across multiple messages. Read up on pure ASGI middleware in the Starlette docs.
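
To make the message flow concrete, here is roughly what send_wrapper sees for a single streamed JSON response (dict shapes per the ASGI spec; the exact values are illustrative):

# One response, delivered as three messages:
{"type": "http.response.start", "status": 200,
 "headers": [(b"content-type", b"application/json")]}
{"type": "http.response.body", "body": b'{"Hello": ', "more_body": True}
{"type": "http.response.body", "body": b'"World"}', "more_body": False}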

Now let's run the benchmark again.

❯ wrk -t12 -c400 -d30s http://localhost:9003
Running 30s test @ http://localhost:9003
  12 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    38.73ms    7.68ms 219.67ms   90.27%
    Req/Sec     0.86k    74.66     1.09k    74.94%
  307336 requests in 30.07s, 41.62MB read
  Socket errors: connect 0, read 1237, write 5, timeout 0
Requests/sec:  10220.39
Transfer/sec:      1.38MB

Requests/sec: vanilla FastAPI at 10849 vs. 10220 with the pure ASGI middleware.

That's only about a 6% drop in performance.
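
To close the loop: the process-time header from the beginning of the post can be written as pure ASGI middleware too. A minimal sketch, mutating the headers list on the "http.response.start" message (the header name and timing logic are my own, not part of the benchmark above):

import time

from starlette.types import ASGIApp, Message, Receive, Scope, Send


class TimingASGIMiddleware:
    def __init__(self, app: ASGIApp) -> None:
        self.app = app

    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        start = time.perf_counter()

        async def send_wrapper(message: Message) -> None:
            # Response headers live on the "http.response.start" message
            # as a list of (bytes, bytes) pairs
            if message["type"] == "http.response.start":
                elapsed = time.perf_counter() - start
                message.setdefault("headers", []).append(
                    (b"x-process-time", f"{elapsed:.6f}".encode())
                )
            await send(message)

        await self.app(scope, receive, send_wrapper)

Wire it in with app.add_middleware(TimingASGIMiddleware), same as before.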
