System Design Interview Prep: The 10 Most Important Patterns
The 10 architecture patterns that come up most in system design interviews. With explanations, use cases, and typical interview questions for the German tech market.
You are sitting in a system design interview. The interviewer says “Design a system that serves 10 million users,” and you start drawing boxes. At some point, you need to decide: How do I distribute the load? Where do I cache? How do the services talk to each other? If this is the first time you are thinking about these questions, it is too late.
System design interviews do not test whether you can pull a perfect architecture out of thin air. They test whether you have a repertoire of proven architecture patterns and know when to apply each one. The good news: the vast majority of problems at German tech companies can be solved with the same ten patterns.
This guide explains each pattern, shows you the interview context where it comes up, and gives you the reasoning interviewers want to hear. If you are not yet familiar with the basic flow of a system design interview, read the pillar guide on system design interviews in Germany first.
The 10 Most Important System Design Patterns
1. Load Balancing
Load balancing distributes incoming requests across multiple servers so that no single server becomes a bottleneck. In almost every system design problem, a load balancer is one of the first components you should draw once you have more than one application server.
How it works: A load balancer sits between client and server pool. It receives all incoming requests and forwards them according to a strategy. The most common strategies are round robin (even distribution), least connections (the server with the fewest active connections gets the next request), and IP hashing (the same client always lands on the same server).
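You will almost never write code for this in an interview, but the three strategies are easy to sketch. A minimal Python sketch, with placeholder server names:

```python
from itertools import cycle

# Hypothetical server pool for illustration.
servers = ["app-1", "app-2", "app-3"]

# Round robin: hand out servers in a fixed rotation.
rr = cycle(servers)

def round_robin_pick():
    return next(rr)

# Least connections: track active connections per server
# and pick the one with the fewest.
active = {s: 0 for s in servers}

def least_connections_pick():
    return min(active, key=active.get)

# IP hashing: the same client IP always maps to the same server.
def ip_hash_pick(client_ip):
    return servers[hash(client_ip) % len(servers)]
```

Note that IP hashing is what makes sticky sessions possible without shared state, which connects directly to the trade-off below.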
When to use it in an interview: As soon as you scale the system beyond a single server. If the problem mentions millions of users, load balancing is practically mandatory. Even for smaller systems, it shows the interviewer you are thinking about availability, because a single server is a single point of failure.
The trade-off to mention: Load balancing solves the problem of distributing load but introduces complexity around session management. If your application server holds state (for example, a user session), you need either sticky sessions or externalized state (say, Redis). In the interview, mentioning this point is sufficient. You do not need to solve it in detail unless the interviewer asks.
2. Caching
Caching stores frequently read data in a fast intermediate layer so that not every request hits the database. It is the pattern with the highest leverage on performance, and interviewers expect you to incorporate it in almost every problem.
How it works: The two fundamental strategies are cache-aside (the application checks the cache first; on a miss, it reads from the database and writes the result to the cache) and write-through (every write updates both database and cache). Redis and Memcached are the tools you should mention in an interview.
When to use it in an interview: Whenever the read-to-write ratio is high. A URL shortener, for example, has a ratio of 100:1 (100 reads per write). A cache can eliminate 90% of database reads in that scenario. For a system with heavy writes (like a logging system), caching provides less benefit.
The trade-off to mention: Cache invalidation is one of the hardest problems in computer science. Stale data in the cache can lead to inconsistent behavior. Name a concrete strategy: TTL (time-to-live) for simple cases, event-based invalidation for more complex ones.
3. Database Sharding
Sharding splits a large database horizontally into smaller pieces (shards), each holding a subset of the data. It is the answer to “What do you do when a single database can no longer handle the load?”
How it works: You choose a shard key, a value by which data is partitioned. For a user system, that could be the user ID: users 1 through 1,000,000 go to shard A, users 1,000,001 through 2,000,000 go to shard B. Range-based sharding is easy to understand but carries the risk of hot spots (when one range has significantly more traffic). Hash-based sharding distributes more evenly but makes range queries harder.
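Both shard-key strategies can be sketched in a few lines. This assumes four shards and the user-ID ranges from the example above:

```python
import hashlib

SHARDS = ["shard-a", "shard-b", "shard-c", "shard-d"]

def range_shard(user_id, range_size=1_000_000):
    # Range-based: contiguous ID ranges map to consecutive shards.
    # Easy to reason about, but a popular range becomes a hot spot.
    return SHARDS[(user_id - 1) // range_size % len(SHARDS)]

def hash_shard(user_id):
    # Hash-based: a stable hash spreads keys evenly across shards,
    # at the cost of losing efficient range queries.
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]
```

Note the stable hash (MD5 here, not Python's randomized `hash()`): the mapping must be identical across processes and restarts, or lookups land on the wrong shard.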
When to use it in an interview: When the data volume or query load exceeds what a single database instance can handle. For problems involving hundreds of millions of records or tens of thousands of writes per second, sharding is the standard answer.
The trade-off to mention: Sharding complicates queries that cross shard boundaries. JOINs across multiple shards are expensive or impossible. Resharding, the process of redistributing data after the fact, is operationally costly. Mention consistent hashing as a strategy to minimize resharding effort.
4. Message Queues
Message queues decouple producers and consumers of messages. Instead of calling a service synchronously and waiting for the response, you place the message in a queue and process it asynchronously.
How it works: A producer writes a message to the queue (for example, “Order 12345 was placed”). A consumer reads the message and processes it (for example, “Send confirmation email to the customer”). The producer does not need to wait until the email is sent. Kafka, RabbitMQ, and Amazon SQS are the tools you should know.
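The decoupling can be sketched with an in-memory queue standing in for Kafka, RabbitMQ, or SQS (the order ID and email text are illustrative):

```python
from queue import Queue

order_queue = Queue()   # stand-in for a real message broker
sent_emails = []

def place_order(order_id):
    # Producer: enqueue the event and return immediately,
    # without waiting for the email to be sent.
    order_queue.put({"event": "order_placed", "order_id": order_id})

def email_worker():
    # Consumer: drain the queue and do the slow work asynchronously.
    while not order_queue.empty():
        msg = order_queue.get()
        sent_emails.append(f"confirmation for order {msg['order_id']}")

place_order(12345)      # returns instantly; the email is sent later
```

The producer's latency no longer depends on the consumer's speed, which is exactly the buffering behavior described above.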
When to use it in an interview: Whenever you need asynchronous processing. Classic scenarios include email sending, image processing, notifications, and background data processing. When two services work at different speeds, a queue serves as a buffer.
The trade-off to mention: Message queues introduce eventual consistency. The message is in the queue but not yet processed. For systems that require immediate consistency (for example, an account balance after a transfer), asynchronous processing is not always the right choice. Also mention the risk of message loss and how you mitigate it with acknowledgements and dead-letter queues.
5. Content Delivery Network (CDN)
A CDN distributes static content (images, videos, CSS, JavaScript) to servers worldwide so that users load data from the geographically nearest server. It reduces latency and offloads the origin server.
How it works: When a user in Munich requests an image, the CDN edge server in Frankfurt delivers it instead of sending the request all the way to the origin server in Virginia. On a cache miss, the edge server fetches the file from the origin and stores it for subsequent requests.
When to use it in an interview: For problems with a global user base and static content. A video streaming system, a social media platform, or an e-commerce shop are classic scenarios. For APIs, a CDN can also make sense for frequently requested, rarely changing data.
The trade-off to mention: CDNs work best for static or rarely changing content. For dynamic data that differs per user, a CDN provides little benefit. CDN costs at high traffic volumes are significant, and cache invalidation on content updates requires a deliberate strategy (versioned URLs, cache-busting headers).
6. Rate Limiting
Rate limiting restricts the number of requests a client can send within a given time window. It protects your system from overload and abuse.
How it works: The most common algorithms are token bucket (a “bucket” fills with tokens at a constant rate; each request consumes a token), sliding window (counts requests in a rolling time window), and fixed window counter (counts requests in fixed time blocks). Implementation typically happens at the API gateway level.
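Token bucket is the algorithm interviewers most often ask you to detail. A minimal single-process sketch (in a distributed setup, this state would live in Redis instead of process memory):

```python
import time

class TokenBucket:
    """Tokens accrue at `rate` per second up to `capacity`;
    each request consumes one token or is rejected."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

The capacity allows short bursts, while the refill rate bounds the sustained request rate, which is why token bucket is usually preferred over a fixed window counter.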
When to use it in an interview: For any public-facing API. As soon as the problem mentions an API used by external clients, you should bring up rate limiting. For internal services, it can also make sense to prevent cascading failures.
The trade-off to mention: Rate limiting that is too aggressive locks out legitimate users. Rate limiting that is too loose does not protect. In distributed systems, you need a centralized counter (for example, in Redis), which introduces another network dependency. Mention that you would set different limits per user tier (free tier vs. premium).
7. API Gateway
An API gateway is the central entry point for all client requests. It routes requests to the correct backend services and handles authentication, rate limiting, logging, and protocol translation.
How it works: Instead of the client communicating directly with ten different microservices, it talks only to the API gateway. The gateway forwards the request to the responsible service and, when needed, aggregates responses from multiple services.
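The routing-plus-cross-cutting-concerns idea can be sketched as a toy request handler. Paths, service names, and the API key check are all illustrative placeholders:

```python
# Route table: path prefix -> backend service.
ROUTES = {"/orders": "order-service", "/users": "user-service"}

def handle(path, api_key):
    # Authentication happens once, at the gateway, instead of in
    # every backend service.
    if api_key != "valid-key":
        return (401, "unauthorized")
    # Route the request to the responsible backend service.
    for prefix, service in ROUTES.items():
        if path.startswith(prefix):
            return (200, f"forwarded to {service}")
    return (404, "no route")
```

Rate limiting and request logging would slot into the same handler, before the routing step.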
When to use it in an interview: As soon as your system has more than two backend services. In microservice architectures, an API gateway is practically standard. It is also the natural place for rate limiting, authentication, and request logging.
The trade-off to mention: The API gateway becomes a single point of failure if it is not operated redundantly. It can also become a bottleneck because every request passes through it. Name horizontal scaling of the gateway itself, combined with load balancing in front of it, as the solution.
8. Database Replication
Replication creates copies of your database on multiple servers. The purpose: higher read performance (read requests are distributed across replicas) and fault tolerance (if the primary fails, a replica can take over).
How it works: In a primary-replica setup, only the primary server accepts writes. All writes are replicated asynchronously (or synchronously) to the replicas. Read requests can be served by replicas. On a primary failure, a replica is promoted to the new primary (failover).
When to use it in an interview: As soon as availability is a topic, and it almost always is. When you have high read load but moderate write volume, replication is the simpler first step before sharding.
The trade-off to mention: Asynchronous replication introduces replication lag. A user writes a comment, refreshes the page, and does not see it because the read went to a replica that has not yet received the write. Read-after-write consistency is the solution: reads immediately after a write go to the primary instead of a replica.
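Read-after-write routing can be sketched by pinning a user to the primary for a short window after a write. The five-second window is an assumed value; in practice you would size it to your observed replication lag:

```python
import time

PIN_SECONDS = 5.0        # assumed: comfortably above typical replication lag
last_write_at = {}       # user_id -> timestamp of that user's last write

def record_write(user_id):
    last_write_at[user_id] = time.monotonic()

def choose_endpoint(user_id):
    # Reads shortly after a write go to the primary; all other
    # reads can safely go to a (possibly lagging) replica.
    wrote_recently = (time.monotonic()
                      - last_write_at.get(user_id, float("-inf"))) < PIN_SECONDS
    return "primary" if wrote_recently else "replica"
```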
9. Event-Driven Architecture
In an event-driven architecture, services communicate not through direct calls but by publishing and consuming events. A service publishes “Order created,” and every interested service reacts to it independently.
How it works: An event bus (often Kafka or a similar system) is the central infrastructure. Services publish events to topics. Other services subscribe to topics they care about and process events independently. The publishing service does not need to know who consumes the events.
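The publish/subscribe decoupling can be sketched with an in-process event bus standing in for Kafka topics (topic and event names are illustrative):

```python
from collections import defaultdict

# topic -> list of subscribed handlers
subscribers = defaultdict(list)

def subscribe(topic, handler):
    subscribers[topic].append(handler)

def publish(topic, event):
    # The publisher does not know who consumes the event.
    for handler in subscribers[topic]:
        handler(event)

log = []
subscribe("order.created", lambda e: log.append(f"inventory reserved for {e['id']}"))
subscribe("order.created", lambda e: log.append(f"invoice drafted for {e['id']}"))

publish("order.created", {"id": 12345})
```

Adding a third consumer (say, notifications) means one more `subscribe` call; the order service publishing the event does not change at all.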
When to use it in an interview: For problems where multiple services need to react to the same business event. Example: an order triggers inventory management, invoicing, and notification in parallel. Also relevant for problems that require real-time updates (feeds, dashboards, live tracking).
The trade-off to mention: Event-driven architecture makes debugging and tracing harder because the processing chain is no longer linear. Event ordering can become a problem (what happens if “Order cancelled” arrives before “Order created”?). Mention correlation IDs for tracing and idempotent processing as countermeasures.
10. Circuit Breaker
A circuit breaker prevents a failing service from bringing down the entire system. When a service stops responding, the circuit breaker cuts the connection instead of retrying repeatedly.
How it works: The circuit breaker has three states: closed (everything normal, requests go through), open (too many failures, requests are rejected immediately), and half-open (after a waiting period, individual test requests are allowed through to check whether the service has recovered). If the test requests succeed, the circuit closes again.
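The three states map directly onto a small state machine. A minimal sketch, with the threshold and timing values as assumptions you would tune in practice:

```python
import time

class CircuitBreaker:
    """closed -> open after `threshold` consecutive failures;
    open -> half-open after `reset_after` seconds."""

    def __init__(self, threshold=3, reset_after=30.0):
        self.threshold = threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def state(self):
        if self.opened_at is None:
            return "closed"
        if time.monotonic() - self.opened_at >= self.reset_after:
            return "half-open"
        return "open"

    def call(self, fn):
        if self.state() == "open":
            # Fail fast instead of waiting on a dead service.
            raise RuntimeError("circuit open: failing fast")
        try:
            result = fn()          # in half-open, this is the test request
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0          # success: close the circuit again
        self.opened_at = None
        return result
```

In production you would reach for a library (Resilience4j on the JVM, for example) rather than hand-rolling this, but being able to sketch the state machine is exactly what senior-level interviews probe.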
When to use it in an interview: In microservice architectures where services depend on each other. Especially relevant when the problem emphasizes reliability or fault tolerance. Also useful for systems with external API dependencies (payment providers, email services).
The trade-off to mention: An open circuit breaker means certain features are temporarily unavailable. You need to define a fallback strategy: return cached responses, queue requests for later processing, or show the user a meaningful error message. Also mention threshold configuration as a challenge: after how many failures should the circuit open?
Combining Patterns: How to Argue in the Interview
Knowing the ten patterns individually is not enough. In a real system design interview, you need to combine multiple patterns and explain why you use them in exactly that configuration.
Take the problem “Design a notification system for 10 million users” as an example. A strong argument looks like this:
- API gateway as the entry point, with rate limiting so no single client floods the system
- Message queue between API and notification workers, because delivery can happen asynchronously and the queue serves as a buffer during load spikes
- Database replication for the user and preference database, so read operations (which channels has the user enabled?) are fast
- Caching for user preferences, because they change rarely but are read on every notification send
- Circuit breaker in front of external providers (SMTP server, push notification service), so a provider outage does not block the entire pipeline
The key is not the list but the justification. Each pattern solves a concrete problem that you can name. Interviewers at German companies pay particular attention to whether you avoid over-engineering: a notification system for a startup with 50,000 users does not need a Kafka cluster with ten partitions.
Mistakes to Avoid
Pattern-Dropping Without Context
The most common mistake: candidates draw a diagram and scatter patterns like buzzwords. “Here is a cache, there is a load balancer, and of course a message queue.” Without an explanation of what problem each component solves, the diagram looks memorized rather than understood.
The fix is simple. Before you draw a pattern, say one sentence about what problem it solves. “The database receives 50,000 reads per second, and 80% of those hit the same top 1,000 URLs. A Redis cache in front of the database can absorb that 80% and reduce DB load to 10,000 QPS.” That takes ten seconds and makes all the difference.
Over-Engineering for a Startup
German Mittelstand companies and startups look for pragmatic developers. If the problem says “50,000 users,” you do not need a multi-region setup with a global CDN and sharding across 20 database nodes. Start with the simplest design that works and scale only when requirements demand it. This shows the interviewer that you have built real systems, not just studied textbook architectures.
Ignoring Trade-offs
Every pattern has a downside. Caching introduces stale data. Message queues introduce eventual consistency. Sharding complicates queries. If you only mention advantages, it creates the impression that you have not thought through the consequences. For every pattern, name at least one trade-off and explain why you accept it in this specific context.
Confusing or Misattributing Patterns
Replication is not sharding. A CDN is not a cache (though it uses caching). An API gateway is not a load balancer (though it can include load balancing). Use the terms precisely. If you are unsure about a distinction, say so openly: “I am using replication here for fault tolerance, not for data partitioning. For that, we would need sharding.”
Next Step
These ten patterns form the foundation on which you can build any system design solution. Knowledge alone is not enough, though. The difference between “I know the patterns” and “I can use them in an interview” comes down to practice: thinking out loud, reasoning under time pressure, responding to follow-up questions.
CodingCareer’s mock system design interviews simulate exactly this situation. You work through a realistic problem with an experienced developer as the interviewer, who gives you concrete feedback on your pattern selection, your reasoning, and your communication. Not textbook feedback, but feedback from someone who has conducted and passed system design interviews at German tech companies themselves.
Book your free 15-minute diagnostic call and find out where you stand in your system design preparation.
FAQ
Which system design patterns should I know for an interview?
The ten most important patterns for system design interviews are load balancing, caching, database sharding, message queues, CDN, rate limiting, API gateway, database replication, event-driven architecture, and circuit breaker. You do not need to know every pattern in exhaustive detail, but you should be able to explain what problem each one solves, when you would use it, and what trade-offs it introduces. In CodingCareer’s mock system design interviews, you practice exactly that: applying patterns in the context of a real problem and justifying your decisions to the interviewer.
How many patterns do I need for a system design interview?
For most system design interviews at German tech companies, five to seven patterns that you can apply confidently are enough. Depth matters more than breadth: can you explain why you introduce caching at a specific point and why not somewhere else? A candidate who truly understands three patterns performs better than one who can list ten but cannot apply any in context. CodingCareer’s system design coaching focuses on identifying the patterns relevant to your target companies and working through them with you until you are interview-ready.
What is the most common mistake with system design patterns in interviews?
The most common mistake is applying patterns without context. Candidates hear the problem and immediately add caching, load balancing, and message queues without explaining what problem each pattern solves. Interviewers at German companies do not evaluate how many patterns you know, but whether you apply them deliberately and with justification. “I am adding a cache here because 80% of requests hit the same 1,000 URLs and we need to reduce database load from 50,000 to 10,000 queries per second” is the kind of reasoning that scores points. CodingCareer trains this justified application in mock interviews with experienced developers.
Do I need to know patterns like consistent hashing or CRDTs in detail?
For mid-level positions, it is enough to know that consistent hashing exists and what problem it solves: distributing data evenly across shards without having to redistribute everything when shards change. CRDTs are too specialized for most interviews. For senior positions, you are expected to explain consistent hashing and know when it makes sense compared to range-based sharding. Staff-level candidates should also be able to discuss weaknesses and alternatives. In CodingCareer's coaching sessions, we calibrate preparation depth to your target level and target companies.