Python Backend Engineering Cheat Sheet
I/O vs CPU-bound, ASGI vs WSGI, Flask vs FastAPI, and production deployment with Docker.
1. Web Frameworks and Web Servers
What is the difference between a Web Framework and a Web Server?
▼
Web Server (Nginx, Apache, Uvicorn, Gunicorn) — the waiter.
Its primary job is handling the network. It listens on a specific IP address and port (like 80 or 443), accepts TCP connections from the client, parses raw HTTP requests, and manages the connection lifecycle.
Web Framework (FastAPI, Flask, Django) — the chef.
Its primary job is application logic. It doesn't communicate directly with the network. Instead, the web server passes it structured request data. The framework routes the request to your Python function, executes business logic (database calls, returning JSON), and sends the result back to the server.
Server handles networking. Framework handles logic.
What is Blocking vs Non-Blocking I/O?
▼
Blocking I/O — When your code makes a request to a database or an external API, the thread stops executing and waits, doing absolutely nothing, until the database responds. The thread is "blocked."
Non-Blocking (Async) I/O — When your code makes a database request, it effectively says: "I'm going to wait for this, but while I wait, go process another user's HTTP request." When the database finally responds, an "Event Loop" notifies your code to pick up where it left off.
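A minimal stdlib sketch of the difference: three simulated I/O waits run concurrently under asyncio, so the total time is roughly the longest single wait, not the sum of all three.

```python
import asyncio
import time

async def fake_io(delay: float) -> float:
    # Non-blocking wait: while this coroutine sleeps, the event loop
    # is free to run the other tasks.
    await asyncio.sleep(delay)
    return delay

async def main() -> float:
    start = time.perf_counter()
    # Three "requests" waiting at the same time: ~0.1s total, not ~0.3s.
    await asyncio.gather(fake_io(0.1), fake_io(0.1), fake_io(0.1))
    return time.perf_counter() - start

elapsed = asyncio.run(main())
print(f"elapsed: {elapsed:.2f}s")
```

Replace `asyncio.sleep` with `time.sleep` (a blocking call) and the same three waits would take ~0.3s, because the thread stalls on each one in turn.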
CPU-Bound vs I/O-Bound Tasks
▼
I/O-Bound Tasks (waiting on databases, APIs, disk reads) — Asynchronous code (ASGI/FastAPI) is king here.
CPU-Bound Tasks (image processing, heavy math, ML model inference on CPU) — Async code won't help you here! In fact, it might make it worse because it blocks the Event Loop. You need multi-processing (Celery, separate workers) for CPU-bound tasks.
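A quick stdlib sketch of the CPU-bound case: `ProcessPoolExecutor` sidesteps the GIL by running the work in separate processes (Celery applies the same idea with dedicated worker processes or machines).

```python
from concurrent.futures import ProcessPoolExecutor

def cpu_heavy(n: int) -> int:
    # Pure computation: there is nothing to await, so async cannot help.
    # Only a separate process can put another CPU core to work.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        # Each task runs in its own process, on its own core.
        results = list(pool.map(cpu_heavy, [100_000, 200_000]))
        print(results)
```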
What is the difference between WSGI and ASGI?
▼
WSGI (Web Server Gateway Interface) — The old, synchronous standard. It handles one request per thread at a time. If a WSGI app gets 10 concurrent requests, it needs 10 separate threads or it will process them one by one.
ASGI (Asynchronous Server Gateway Interface) — The modern, asynchronous standard. It uses an Event Loop. A single thread running ASGI can handle thousands of concurrent requests because it switches between them whenever one is waiting on non-blocking I/O.
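The two standards boil down to the shape of the application callable. A minimal sketch of each (no real server involved):

```python
# WSGI: a synchronous callable. The server calls it once per request
# and blocks until the full response body is returned.
def wsgi_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello from WSGI"]

# ASGI: an async callable. The server drives it through an event loop,
# exchanging messages via the awaitable receive/send channels.
async def asgi_app(scope, receive, send):
    await send({"type": "http.response.start", "status": 200,
                "headers": [(b"content-type", b"text/plain")]})
    await send({"type": "http.response.body", "body": b"hello from ASGI"})
```

Because `asgi_app` is a coroutine, the server can pause it at any `await` and service another connection on the same thread; `wsgi_app` offers no such pause point.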
What is FastAPI? Flask? Gunicorn? Uvicorn?
▼
Flask — A WSGI web framework. You write your routes with it. It is synchronous by default.
FastAPI — An ASGI web framework. You write your routes with it (async def). It is asynchronous by default. Prefer it by default, except for heavy CPU-bound workloads or very simple projects.
Gunicorn — A Process Manager (Application Server). Its job is to spawn, monitor, and restart separate OS processes (Workers). It is designed for WSGI.
Uvicorn — An ASGI Server. Its job is to run the Event Loop that makes async code work.
What are the alternatives to these servers in Python?
▼
For FastAPI, an alternative to Gunicorn + Uvicorn workers is Daphne, the ASGI server from the Django Channels project, used mainly for persistent connections such as WebSocket chats. For Flask, uWSGI is a good alternative.
2. Workers and Threads
What is a thread? Does it change with ASGI vs WSGI?
▼
A thread is the smallest sequence of programmed instructions that an operating system can schedule and execute independently. A single application (process) can have multiple threads running concurrently, sharing the same memory.
In WSGI (Synchronous) — The server typically assigns one thread per incoming request. If you have a pool of 20 threads, you can handle 20 concurrent users. If user 21 arrives, they have to wait until one of the threads finishes its job. This is why WSGI struggles with high concurrency if requests involve slow I/O (like long database queries).
In ASGI (Asynchronous) — The system uses a single thread to run an Event Loop. Because of Non-Blocking I/O, this single thread can juggle thousands of requests at once. It starts processing Request A, hits a database call, pauses A, and moves to Request B, all within the exact same thread.
What is a worker? Does it change with ASGI vs WSGI?
▼
A worker is a dedicated OS process created by a Master Process (like Gunicorn) to handle incoming requests. Unlike threads, multiple processes do not share memory; they are completely isolated.
Python is restricted by the GIL (Global Interpreter Lock), meaning a single Python process can execute Python bytecode on only one CPU core at a time. To use a multi-core server (e.g., an 8-core machine), you must spawn multiple workers.
In WSGI (e.g., standard Gunicorn worker) — A worker process typically handles one request at a time (unless configured with internal threads). To handle heavy traffic, you might spawn 4 to 8 workers to maximize throughput.
In ASGI (e.g., Uvicorn worker) — A worker process runs an Event Loop. Even just one worker can handle thousands of concurrent requests. You still spawn multiple workers (usually 1 per CPU core) to ensure you are utilizing all the hardware available to you.
What is an event loop in FastAPI? How can the code execute something while doing something else?
▼
The Event Loop is the beating heart of asyncio. It is a single Python thread that manages a list of tasks (coroutines). Whenever the code encounters an await statement, the function voluntarily yields control back to the loop. This allows the loop to "pause" that specific task and do something else, such as accepting a new request or processing a different background job.
The "Magic" of Juggling — Instead of creating a new thread for every person, the loop keeps a To-Do List. It checks which tasks are waiting for I/O (like a database response) and which are ready to run, ensuring the CPU is never sitting idle while waiting for data.
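The "juggling" is easy to observe with the stdlib: two tasks interleave on a single thread, each pausing at its await so the other can run.

```python
import asyncio
import threading

log = []

async def handle(name: str) -> None:
    log.append((name, "start", threading.get_ident()))
    await asyncio.sleep(0)  # voluntarily yield control back to the loop
    log.append((name, "resume", threading.get_ident()))

async def main() -> None:
    # Both tasks appear to run "at once" — but on a single thread.
    await asyncio.gather(handle("A"), handle("B"))

asyncio.run(main())
print([entry[:2] for entry in log])
# A starts, B starts, then A resumes, then B resumes — all on one thread.
```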
What is a thread pool in FastAPI?
▼
FastAPI is built on top of a framework called Starlette. Starlette uses an underlying asynchronous library called anyio. Sometimes, you have to run synchronous code that doesn't have an await (like standard requests.get(), time.sleep(), or heavy math). If you run this in the main event loop, the "waiter" freezes, and your entire web server stops responding. To prevent this, FastAPI uses a Thread Pool. A thread pool is a pre-created group of background threads. When FastAPI sees synchronous code, it offloads that work to a background thread so the main event loop can keep running freely.
How many thread pools can FastAPI use?
▼
There is one default thread pool per worker process (managed by anyio). However, the important metric is not the number of pools but the number of threads inside the pool. By default, the anyio thread pool in FastAPI has a limit of 40 threads. This means FastAPI can handle up to 40 concurrent blocking tasks in the background before it starts queuing them up. (You can configure this limit, but 40 is the default.)
How do the Thread Pool and the main Event Loop interact?
▼
This is FastAPI's secret weapon. It automatically routes your code based on how you define your endpoints to prevent the server from freezing.
Scenario A: You write async def endpoint(): — FastAPI runs this directly in the main Event Loop. It trusts you to use await for all I/O.
If you aren't using await inside the function, you should define it as def, not async def.
Scenario B: You write def endpoint(): (without async) — FastAPI recognizes this is synchronous. To prevent blocking the main loop, it packages the function and hands it off to an available thread in the Thread Pool.
While the background thread does the heavy lifting, the Event Loop keeps looping and serving other users. Once the thread finishes, it hands the result back to the Event Loop, which sends the final HTTP response to the user.
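This hand-off can be approximated with the stdlib: `asyncio.to_thread` gives a blocking function to a worker thread while the event loop awaits the result (FastAPI/Starlette use anyio's equivalent internally, so this is a sketch of the mechanism, not FastAPI's exact code).

```python
import asyncio
import threading
import time

def blocking_endpoint() -> int:
    # Synchronous code with no await — run directly on the loop,
    # this would freeze the whole server for 0.1s.
    time.sleep(0.1)
    return threading.get_ident()

async def main() -> tuple:
    loop_thread = threading.get_ident()
    # Offload to a worker thread; the event loop stays free while we wait.
    worker_thread = await asyncio.to_thread(blocking_endpoint)
    return loop_thread, worker_thread

loop_thread, worker_thread = asyncio.run(main())
print(loop_thread, worker_thread)  # two different thread ids
```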
3. FastAPI practical questions
Can I combine 3 uvicorn workers with several thread pools?
▼
Yes, absolutely. And this is exactly how you scale on Cloud Run. When you start your app with Gunicorn managing Uvicorn workers (e.g., --workers 3), you are spawning 3 completely separate Python processes.
Because they are separate processes, they each get their own setup. So, 3 workers means:
- 3 Main Event Loops
- 3 Independent Thread Pools (3 x 40 threads = 120 total background threads available)
If Cloud Run sends 100 concurrent requests to this single instance, they will be distributed across the 3 event loops, and any synchronous def endpoints will be handled by the 120 available threads.
I am using FastAPI with async def, but no await statements inside. Is it fine?
▼
No. It is a disaster.
This is a common trap, and it is the single most common cause of complete server lockups in Cloud Run.
- Why it's fatal: Because you wrote async def, FastAPI assumes you are going to use await. Therefore, it runs your function directly on the main event loop. But because you have no await statements, your code is synchronous (blocking).
- The Result: It freezes the entire event loop until that function finishes. If it takes 2 seconds to run, your web server is deaf, dumb, and blind for 2 seconds. If Cloud Run sends 80 concurrent requests to this container, the first one locks the server and the other 79 queue up and time out.
- The Fix: If you do not have await inside, you must drop the async keyword and just use def. This forces FastAPI to push the function to a background thread, keeping the main event loop alive to accept other requests.
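The lockup is easy to reproduce with the stdlib: a heartbeat task that should tick every 10 ms stalls while a blocking "async" function hogs the loop.

```python
import asyncio
import time

async def bad_endpoint() -> None:
    time.sleep(0.2)  # blocking call inside async def: no await, loop frozen

async def heartbeat(ticks: list) -> None:
    for _ in range(3):
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.01)

async def main() -> list:
    ticks: list = []
    await asyncio.gather(heartbeat(ticks), bad_endpoint())
    return ticks

ticks = asyncio.run(main())
# The gap between the first two ticks is ~0.2s instead of ~0.01s,
# because bad_endpoint monopolized the single event-loop thread.
print(ticks[1] - ticks[0])
```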
Imagine I have 3 steps in my FastAPI route, each one non-blocking I/O. If I use 'async def' with an await for each of the 3 calls, I can handle concurrent requests to the route. Will each request then take longer? How much longer? Is it better to use a 'def' function? What would be the difference?
▼
- Will it take more time? Yes, but the difference is measured in milliseconds, while the throughput (total requests handled) scales massively.
- How long? If one request takes exactly 500ms (mostly waiting on network I/O), running 10 of them concurrently might make each request take 510ms to 520ms. The extra time comes from context switching—the event loop jumping between tasks—and CPU sharing (since Cloud Run shares the CPU cores across all concurrent requests).
- Is it better to use def? Absolutely not. If you use def for I/O, FastAPI offloads them to the Thread Pool.
- Async context switching (in a single event loop) is incredibly fast and cheap in Python.
- Thread context switching is managed by the Operating System, is heavy on RAM, and fights the Python Global Interpreter Lock (GIL). Plus, the thread pool is limited to 40 threads by default. If you get 80 requests, 40 execute, and 40 wait in a queue. With async def, all 80 execute concurrently on the event loop.
How to optimize the code for async? In FastAPI, you can wrap a sync function with run_in_threadpool, which runs it in the thread pool and lets you await the result. This prevents the synchronous code from blocking the main event loop.
4. Dockerfile
How to use the Dockerfile command? Use 'python main.py', 'gunicorn', or 'uvicorn'?
▼
CMD ["python", "main.py"] When: Never for production web servers. Only for simple scripts, local development, or non-web background jobs.
Why: This typically launches the framework's built-in development server. It runs a single, synchronous process with no process management or robustness: if it crashes, it stays down. It handles very little concurrency and is a security liability.
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8080"] When: For FastAPI when you want maximum simplicity and your app is primarily I/O-bound.
Why: Uvicorn is an ASGI server; a single worker handles many concurrent requests efficiently. However, it lacks process management (if the worker crashes, the container goes down) and doesn't fully leverage multi-core CPUs for CPU-bound tasks.
CMD ["gunicorn", "app.main:app", "--bind", "0.0.0.0:8080"] When: For Flask (or other WSGI frameworks) as a basic production setup.
Why: Gunicorn provides a master process to manage and restart workers, improving robustness. By default, it uses one synchronous worker, meaning it processes requests one at a time unless configured otherwise.
CMD ["gunicorn", "--workers", "N", "--threads", "M", "app.main:app", ...] When: For Flask (or other WSGI frameworks) when you need both concurrency and robustness.
- --workers N: Use N for true CPU parallelism (e.g., (2 * cores) + 1). Each is a separate OS process bypassing the GIL.
- --threads M: Use M for I/O-bound concurrency within a single worker, allowing it to switch during I/O waits.
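The worker formula above can be computed at container start — a tiny sketch, assuming you size workers from the CPU count visible to the process:

```python
import multiprocessing

def suggested_workers() -> int:
    # Gunicorn's rule of thumb: (2 * CPU cores) + 1 worker processes.
    return (2 * multiprocessing.cpu_count()) + 1

print(suggested_workers())
```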
CMD ["gunicorn", "-k", "uvicorn.workers.UvicornWorker", ...] When: The recommended production setup for FastAPI on Cloud Run.
Why: Combines the best of both worlds:
- Gunicorn: Provides robust process management and restarts.
- Uvicorn Workers: Offers high concurrency for I/O-bound tasks.
- Multiple Workers: Provides true CPU parallelism and increased resilience.
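Spelled out in full, a sketch of that recommended command — assuming the FastAPI instance is named `app` inside `app/main.py` and that 3 workers fit your CPU allocation; adjust the module path and worker count for your setup:

```dockerfile
CMD ["gunicorn", "app.main:app", \
     "-k", "uvicorn.workers.UvicornWorker", \
     "--workers", "3", \
     "--bind", "0.0.0.0:8080"]
```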