
Part 2: Building an HTTP Rate Limiter with Cloudflare Workers


In Part 1, we introduced Cloudflare Workers and the concept of middleware at the edge. We saw how Workers let you execute logic close to your users, and we discussed their V8 isolate runtime. Now it’s time to move from theory to practice.

In this article, we’ll build a working HTTP rate limiter. You’ll learn:

  • How to track request usage per client
  • The differences between fixed-window and token-bucket algorithms
  • How to implement a Durable Object in TypeScript
  • How to integrate the limiter into Worker middleware
  • How to test and benchmark your limiter at the edge

By the end, you’ll have a foundation for building your own edge middleware that’s robust, efficient, and globally scalable.


What is HTTP Rate Limiting?

Rate limiting is the process of controlling how often clients can access an API. It is a common practice for throttling traffic, protecting backend workloads, and preventing abuse. For rate limiting to work, clients must be distinguishable; a unique identifier is either issued to the client (e.g., an API key) or extracted automatically from the request (e.g., the IP address).
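As a sketch, client identification might look like the helper below (`getClientId` is hypothetical, not part of any Cloudflare API; on Cloudflare, the connecting IP is available via the `CF-Connecting-IP` request header):

```typescript
// Prefer an explicit API key; fall back to the connecting IP address.
// Prefixes keep the two identifier spaces from colliding.
function getClientId(apiKey: string | null, clientIp: string | null): string | null {
  if (apiKey && apiKey.length > 0) return `key:${apiKey}`;
  if (clientIp && clientIp.length > 0) return `ip:${clientIp}`;
  return null; // caller should reject the request (e.g., HTTP 400)
}
```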

There are many rate-limiting strategies, but for the scope of this article we'll explore the two most common:

1. Fixed-Window Limiting

The fixed-window algorithm counts requests in discrete time intervals, like 1-minute blocks. For example, if a client is allowed 10 requests per minute, the counter resets at the start of each minute.

Pros:

  • Simple to implement
  • Easy to understand

Cons:

  • “Burstiness” at the window boundary: a client allowed 10 requests per minute could make 20 in quick succession by sending 10 at the end of one window and 10 at the start of the next
  • Less smooth distribution of requests

2. Token-Bucket Limiting

The token-bucket algorithm is more flexible. Each client has a “bucket” of tokens, every request consumes one token, and tokens are replenished at a fixed rate. If a client's bucket is empty, the request is blocked.

Pros:

  • Smooth request flow
  • Handles bursts gracefully
  • Easy to tune for limits and refill rates

Cons:

  • Slightly more complex than fixed-window

For edge deployments, token-bucket is usually preferable because it balances fairness and responsiveness. Think of your favourite rate-limited API service; there's a good chance it uses token-bucket limiting or a variant of it.
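The algorithm can be sketched as a small standalone class before we move it to the edge (a simplified sketch; the names `TokenBucket` and `tryConsume` and the constants are illustrative, not part of any Cloudflare API):

```typescript
// Minimal token bucket: `capacity` caps bursts, `refillPerMs` sets the
// sustained rate (tokens added per elapsed millisecond).
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,
    private refillPerMs: number,
    now: number = Date.now()
  ) {
    this.tokens = capacity;
    this.lastRefill = now;
  }

  // Returns true if the request is allowed, false if it should be throttled.
  tryConsume(now: number = Date.now()): boolean {
    const elapsed = now - this.lastRefill;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerMs);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

With `capacity = 10` and `refillPerMs = 0.002`, a client can burst 10 requests and then sustain roughly 2 requests per second.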


Setting Up the Worker and Durable Objects

To build a reliable rate limiter at the edge, we need a component that can hold state for each client; Cloudflare Durable Objects are perfect for this. They provide a single, consistent state instance per identifier, and all requests for that identifier are routed to the same object, no matter which Cloudflare data center handles them.

In this section, we’ll create a complete Durable Object–based rate limiting system with three different tiers: Free, Premium, and Ultimate. Each tier behaves differently, allowing you to enforce limits based on a client’s subscription level.

Before diving into the object classes, it’s important to keep TypeScript aware of your Worker bindings.

Note: Run wrangler types After Adding Durable Objects

Whenever you define new Durable Object classes or update your wrangler.jsonc bindings, regenerate your type definitions:

wrangler types

This keeps your editor and TypeScript compiler aligned with your Worker’s actual environment. It’s important after adding new Durable Object namespaces or changing class names, because the types govern autocompletion and type checking.

The Worker: Routing Requests with the Correct Rate Limiter

Here is the Worker entrypoint that receives incoming API requests, determines which rate tier the client belongs to, and asks the appropriate Durable Object whether the request should be allowed or not.

// src/index.ts

import { DurableObject } from "cloudflare:workers";

export interface Env {
  FREE_LIMITER: DurableObjectNamespace<FreeRequestLimiter>;
  PREMIUM_LIMITER: DurableObjectNamespace<PremiumRequestLimiter>;
  ULTIMATE_LIMITER: DurableObjectNamespace<UltimateRequestLimiter>;
}

// Worker logic
export default {
  async fetch(request, env, _ctx): Promise<Response> {
    // extract Client API_KEY
    const key = request.headers.get("API_KEY");
    if (key === null) {
      return new Response("Could not determine client key", { status: 400 });
    }

    // determine rate limit tier
    try {
      // in a production system, you would look up the key in a database for the tier
      const rateLimiter = key.startsWith("ult_")
        ? env.ULTIMATE_LIMITER
        : key.startsWith("pre_")
        ? env.PREMIUM_LIMITER
        : env.FREE_LIMITER;

      const stub = rateLimiter.getByName(key);
      const milliseconds_to_next_request =
        await stub.getMillisecondsToNextRequest();

      if (milliseconds_to_next_request > 0) {
        return new Response("Rate limit exceeded", { status: 429 });
      }
    } catch (error) {
      return new Response("Could not connect to rate limiter", { status: 502 });
    }

    // proceed with normal request handling, e.g. proxy to origin
    return new Response("Request successful", { status: 200 });
  },
} satisfies ExportedHandler<Env>;

The Worker does three things:

  • Extracts the unique identifier, in this case, the API key.
  • Determines the appropriate rate tier using a simple prefix-based rule (in a real system, this would come from a database).
  • Sends the unique identifier to the Durable Object for that tier. If the limiter reports that the request should wait, the Worker immediately returns a 429.

This structure separates the request logic from the rate-limiting logic, keeping everything clean and modular.

Building the Rate Limiter Durable Objects

The Durable Object itself implements the token-bucket style behaviour. To keep things maintainable, we define a BaseRateLimiter class that encapsulates the shared logic, and then three concrete classes that specify the performance characteristics for each tier.

// src/index.ts

// Durable Object
abstract class BaseRateLimiter extends DurableObject {
  abstract milliseconds_per_request: number;
  abstract milliseconds_for_updates: number;
  abstract capacity: number;

  abstract tokens: number;

  async getMillisecondsToNextRequest(): Promise<number> {
    await this.checkAndSetAlarm();

    let delay = this.milliseconds_per_request;
    if (this.tokens > 0) {
      this.tokens -= 1;
      delay = 0;
    }

    return delay;
  }

  private async checkAndSetAlarm() {
    const currentAlarm = await this.ctx.storage.getAlarm();
    if (currentAlarm == null) {
      await this.ctx.storage.setAlarm(
        Date.now() + this.milliseconds_for_updates
      );
    }
  }

  async alarm() {
    if (this.tokens < this.capacity) {
      // Refill one token for every `milliseconds_per_request` covered by
      // the refill interval (e.g. 5000ms / 100ms = 50 tokens per tick).
      this.tokens = Math.min(
        this.capacity,
        this.tokens +
          this.milliseconds_for_updates / this.milliseconds_per_request
      );
      await this.checkAndSetAlarm();
    }
  }
}

This base class gives each tier its own “bucket” size, refill rate, and request cost.

When getMillisecondsToNextRequest() is called, the object does the following:

  • Ensures an alarm is scheduled to refill tokens.
  • If tokens are available, it decrements one and returns 0, meaning the request is allowed immediately.
  • If tokens have run out, it returns a delay value indicating how long the client should wait.

The refill logic runs inside the alarm() method, which Cloudflare triggers based on the alarm schedule. This is an efficient way of replenishing tokens at the edge without needing to calculate elapsed time for every request.

Defining Each Tier

Each tier extends the base class, sets its own performance characteristics, and initializes the bucket with the corresponding capacity.

// src/index.ts

export class UltimateRequestLimiter extends BaseRateLimiter {
  milliseconds_per_request = 100; // 1 request / 100ms = 10 req/sec
  milliseconds_for_updates = 5000; // refill every 5s
  capacity = 80; // burst capacity
  tokens: number;

  constructor(ctx: DurableObjectState, env: Env) {
    super(ctx, env);
    this.tokens = this.capacity;
  }
}

export class PremiumRequestLimiter extends BaseRateLimiter {
  milliseconds_per_request = 200; // 1 request / 200ms = 5 req/sec
  milliseconds_for_updates = 5000; // refill every 5s
  capacity = 30; // burst capacity
  tokens: number;

  constructor(ctx: DurableObjectState, env: Env) {
    super(ctx, env);
    this.tokens = this.capacity;
  }
}

export class FreeRequestLimiter extends BaseRateLimiter {
  milliseconds_per_request = 500; // 1 request / 500ms = 2 req/sec
  milliseconds_for_updates = 5000; // refill every 5s
  capacity = 10; // burst capacity
  tokens: number;

  constructor(ctx: DurableObjectState, env: Env) {
    super(ctx, env);
    this.tokens = this.capacity;
  }
}

Each tier defines three important parameters:

  • milliseconds_per_request: how frequently a client is allowed to make a request
  • milliseconds_for_updates: how often to refill tokens
  • capacity: how many requests can be made in a burst before throttling begins

Because everything is based on an abstract parent class, defining new tiers becomes straightforward.
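If each alarm tick refills one token for every milliseconds_per_request of the refill interval, the tier parameters line up with the rates in the comments. This quick arithmetic check uses a hypothetical helper, tokensPerRefill, that is not part of the article's code:

```typescript
// Tokens restored per refill tick for a given tier's parameters.
function tokensPerRefill(msForUpdates: number, msPerRequest: number): number {
  return msForUpdates / msPerRequest;
}

// Sustained rate = tokensPerRefill / (msForUpdates / 1000) requests per second.
const ultimate = tokensPerRefill(5000, 100); // 50 tokens per 5s tick = 10 req/sec
const premium = tokensPerRefill(5000, 200); // 25 tokens per 5s tick = 5 req/sec
const free = tokensPerRefill(5000, 500); // 10 tokens per 5s tick = 2 req/sec
```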

Binding the Durable Objects in wrangler.jsonc

Finally, bindings for all three tier limiters must be declared in the configuration:

// wrangler.jsonc

"durable_objects": {
  "bindings": [
    {
      "name": "FREE_LIMITER",
      "class_name": "FreeRequestLimiter"
    },
    {
      "name": "PREMIUM_LIMITER",
      "class_name": "PremiumRequestLimiter"
    },
    {
      "name": "ULTIMATE_LIMITER",
      "class_name": "UltimateRequestLimiter"
    }
  ]
}

Each binding points to its corresponding class, allowing the Worker entrypoint to retrieve the right limiter when processing requests. Then run wrangler types to generate the type definitions for the durable objects.
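One caveat: the first time a Durable Object class is introduced, Wrangler also requires a migrations entry in the same configuration (the tag below is illustrative; use new_classes instead of new_sqlite_classes if you are on the legacy key-value storage backend):

```jsonc
// wrangler.jsonc

"migrations": [
  {
    "tag": "v1",
    "new_sqlite_classes": [
      "FreeRequestLimiter",
      "PremiumRequestLimiter",
      "UltimateRequestLimiter"
    ]
  }
]
```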


Deployment: Local and Cloudflare

Once your Worker and Durable Objects are ready, the final step is to run them locally and then publish them to Cloudflare. The process is simple, and it’s the same whether you’re building a rate limiter or a larger edge application.

Running Locally

Cloudflare’s local environment is started with:

wrangler dev

This runs your Worker on localhost:8787 and also spins up local Durable Objects automatically. Nothing extra needs to be configured. Every API key you use will create or reuse its corresponding Durable Object instance, exactly as it would in production on Cloudflare.

You can test requests immediately:

curl -H "API_KEY: pre_test123" http://localhost:8787

Deploying to Cloudflare

When everything works locally, deploy globally:

wrangler deploy

Wrangler builds your Worker, uploads it to Cloudflare’s edge network, creates Durable Object classes if needed, and returns a production URL.

You can test the deployed version the same way:

curl -H "API_KEY: pre_test123" https://your-worker.your-subdomain.workers.dev

When deployed to Cloudflare, you can confirm that your rate-limiter durable objects are active by going to:

Cloudflare Dashboard → Workers → Durable Objects

This page shows all instances created by your Worker, along with their current storage and activity.


Fixed-Window Implementation

Here's a snippet of a fixed-window implementation, for comparison (in a real Worker, this per-client state would live in a Durable Object, just like the token bucket above):

const windowSize = 60 * 1000; // 1 minute
const limit = 10; // max requests per window
let counter = 0;
let windowStart = Date.now();

function checkFixedWindow(): Response | null {
  // start a fresh window once the current one expires
  if (Date.now() - windowStart > windowSize) {
    counter = 0;
    windowStart = Date.now();
  }

  if (counter >= limit) {
    return new Response("Too Many Requests", { status: 429 });
  }

  counter++;
  return null; // request allowed
}

  • Simple, but can cause bursts at window boundaries.

Testing and Benchmarking at the Edge

Once the Worker and Durable Object are deployed, you can test with:

# Single request
curl -i https://your-worker.your-subdomain.workers.dev/api/test

# Burst test
for i in {1..30}; do curl -s -o /dev/null -w "%{http_code}\n" https://your-worker.your-subdomain.workers.dev/api/test; done

Metrics to monitor include:

  • Number of requests blocked (429)
  • Average latency of Durable Object calls
  • Token refill consistency
  • Error rate

Token checks at the edge are typically fast, though the latency of a Durable Object call grows with the caller's distance from where the object lives. Using wrangler dev locally is great for functional tests, but benchmarking in production ensures you capture global latency patterns.


Observations and Best Practices

  • Keep state lightweight: Store only what’s necessary, like token counts and last refill timestamps. This ensures fast execution and minimal storage overhead.

  • Avoid heavy computations in Durable Objects: CPU time is limited per execution. Complex logic can slow down requests and increase costs.

  • Plan object keys carefully: Use one unique key per client to prevent race conditions and ensure accurate rate tracking.

  • Provide client feedback: Return headers such as X-RateLimit-Remaining and Retry-After so clients know when to retry.

  • Test for edge cases: Simulate bursts, multi-client traffic, and slow refill scenarios to ensure limits behave as expected.
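For example, the 429 branch in the Worker could attach these headers (a sketch; the helper name and header values are illustrative):

```typescript
// Build a 429 response with standard rate-limit feedback headers.
// `msToNext` is the delay reported by the limiter; `remaining` is the
// client's leftover token count, if the limiter exposes it.
function rateLimitedResponse(msToNext: number, remaining: number = 0): Response {
  return new Response("Rate limit exceeded", {
    status: 429,
    headers: {
      "Retry-After": Math.ceil(msToNext / 1000).toString(), // in seconds
      "X-RateLimit-Remaining": remaining.toString(),
    },
  });
}
```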

By following these practices, your rate limiter stays predictable, efficient, and ready to handle traffic at the edge.


Closing Thoughts

In this part, we moved from theory to practice by implementing an HTTP rate limiter on Cloudflare Workers using Durable Objects. We explored how to track client requests, enforce limits per tier, and handle bursts efficiently at the edge.

You also learned the difference between fixed-window and token-bucket strategies, why token-bucket is often better for edge deployments, and how to benchmark and tune your limiter for real-world traffic.

In Part 3, we’ll take these concepts further by extending rate limiting to WebSockets, building a modular middleware pipeline, and covering best practices for deploying, monitoring, and scaling edge traffic. By the end, you’ll have a foundation for production-ready, globally distributed request control.

Next: [Part 3: Building a WebSocket Rate Limiter with Cloudflare Workers]