1. Introduction: The Real-Time Imperative
The ubiquity of instant messaging has fundamentally altered the expectations for digital communication. Users no longer tolerate the delay inherent in email or the page refreshes of early web forums; they demand an experience that mimics face-to-face interaction, where presence is felt, and responses are perceived as instantaneous. This shift has necessitated a radical departure from traditional web architecture, moving engineers away from stateless request-response models toward persistent, stateful, and event-driven distributed systems.
At the scale of platforms like WhatsApp, Discord, or Slack—serving hundreds of millions of concurrent users—the engineering challenges transcend simple data movement. Architects must contend with the "C10M" problem (handling ten million concurrent connections), the physics of network latency, the erratic nature of mobile networks, and the logical paradoxes of distributed time. A message sent is not merely a packet of text; it is a transactional event that must be ordered, delivered, persisted, and acknowledged across a fractured global network, often within milliseconds.
Furthermore, modern messaging experiences are defined not just by the text delivered, but by the ephemeral signals that accompany it. The "Typing..." indicator, the "Online" status dot, and the "Read" receipt create a sense of copresence that is technically expensive to maintain. These features generate a "signaling storm"—a volume of ephemeral events orders of magnitude larger than the message data itself. Managing this throughput without overwhelming backend infrastructure requires rigorous client-side optimization, such as throttling and debouncing, and highly specialized backend architectures.
This analysis provides an exhaustive technical examination of the "zero to hero" journey in building such systems. It explores the transition from simple polling to WebSocket-based gateways, the optimization of ephemeral event streams, the concurrency models (Erlang Actor Model vs. Go CSP) that make massive scale possible, and the complex distributed algorithms required to identify and resolve race conditions in real-time environments.
2. The Transport Layer: Protocols and Persistent Connections
The foundation of any real-time system is the transport layer. In the context of the web and mobile applications, this layer is responsible for establishing and maintaining the conduit through which data flows. The historical dominance of HTTP, designed for document retrieval, proved insufficient for the bidirectional demands of chat.
2.1 The Limitations of the Request-Response Model
Traditional HTTP functions on a client-initiated basis. The client sends a request header, the server processes it and returns a response, and the connection is typically closed or returned to a pool. For a chat application, this model forces the client to "poll" the server to check for new messages.
- Short Polling: The client repeatedly sends requests (e.g., every 5 seconds). This introduces a latency floor equal to the polling interval and wastes bandwidth on redundant HTTP headers, often creating more overhead than payload.
- Long Polling: A slight improvement where the server holds the connection open until data is available or a timeout occurs. While this simulates a push, it still requires a new connection setup for every message received, creating significant CPU overhead for SSL/TLS handshakes and header parsing at scale.
2.2 WebSockets: The Standard for Bidirectional Communication
The WebSocket protocol (RFC 6455) was standardized to solve the overhead of HTTP polling. A WebSocket connection begins its life as a standard HTTP request with an Upgrade: websocket header. If the server supports it, it responds with 101 Switching Protocols, and the connection transitions from a request-response model to a raw, full-duplex TCP stream.
Technical Advantages:
- Persistent State: Unlike REST, where authentication must be validated on every call, a WebSocket connection is authenticated once during the handshake. The server maintains the memory of who owns the socket, reducing database lookups for session validation.
- Low Overhead: Once established, data frames have a minimal header (2-14 bytes), compared to the 700-800 bytes typical of an HTTP header (containing cookies, user-agents, etc.). For small payloads like "user is typing," this efficiency is critical.
- Full Duplex: Both client and server can transmit data independently. This is essential for chat, where a user may be receiving a stream of incoming messages while simultaneously typing an outgoing one.
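As a concrete illustration, the sketch below shows a minimal browser-side client: constructing a WebSocket performs the HTTP Upgrade handshake, and once open, frames flow in both directions over the single connection. The endpoint URL, token parameter, and message envelope are illustrative assumptions, not any specific platform's API.

```typescript
// Minimal browser WebSocket client sketch. The endpoint URL, the auth token
// query parameter, and the message envelope shape are illustrative assumptions.
const token = "session-token";
const ws = new WebSocket(`wss://chat.example.com/gateway?token=${token}`);

ws.onopen = () => {
  // The HTTP Upgrade handshake has completed (101 Switching Protocols);
  // from here on, frames flow in both directions over one TCP connection.
  ws.send(JSON.stringify({ type: "SUBSCRIBE", channel: "room-123" }));
};

ws.onmessage = (event: MessageEvent) => {
  const frame = JSON.parse(event.data);
  console.log("server push:", frame);
};

ws.onclose = (event: CloseEvent) => {
  console.log("connection closed", event.code);
};
```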
2.3 MQTT: Optimizing for the Edge and IoT
While WebSockets are dominant in browser environments, the Mobile and IoT sectors often favor MQTT (Message Queuing Telemetry Transport). MQTT is a lightweight, binary protocol designed for unreliable networks and constrained devices.
Architecture Differences:
- Pub/Sub Model: Unlike the point-to-point nature of WebSockets, MQTT is inherently a publish-subscribe protocol. Clients connect to a central "Broker" and subscribe to "Topics" (e.g., chat/room/123). This decouples the sender from the receiver, allowing for efficient one-to-many broadcasting (fan-out) natively.
- Quality of Service (QoS): MQTT offers three levels of delivery guarantee:
- QoS 0 (At most once): Fire and forget. Fast, but messages may be lost.
- QoS 1 (At least once): Guarantees delivery but may result in duplicates (requires idempotency handling).
- QoS 2 (Exactly once): The highest guarantee, involving a 4-step handshake to ensure no duplicates. This is rarely used in high-volume chat due to latency costs.
- Last Will and Testament (LWT): A unique feature where a client pre-registers a message (e.g., "User Offline") that the broker automatically publishes if the client disconnects ungracefully (e.g., battery death or network timeout). This provides a reliable mechanism for presence detection without active polling.
Facebook Messenger and WhatsApp have historically utilized variations of MQTT (or protocols inspired by it) to minimize battery drain on mobile devices, leveraging its small packet size to reduce radio wake time.
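The sketch below illustrates these MQTT concepts (Last Will, QoS, keep-alive) using the MQTT.js client library; the broker URL, topic names, and payload shapes are assumptions made for illustration.

```typescript
import mqtt from "mqtt";

// Sketch of an MQTT client with a Last Will and QoS 1 delivery, using MQTT.js.
// Broker URL, topics, and payloads are illustrative assumptions.
const client = mqtt.connect("mqtts://broker.example.com:8883", {
  clientId: "user-102",
  keepalive: 60, // seconds between keep-alive pings
  will: {
    // Published by the broker automatically if this client vanishes ungracefully.
    topic: "presence/user-102",
    payload: JSON.stringify({ status: "offline" }),
    qos: 1,
    retain: true,
  },
});

client.on("connect", () => {
  client.publish("presence/user-102", JSON.stringify({ status: "online" }), { qos: 1, retain: true });
  client.subscribe("chat/room/123", { qos: 1 }); // QoS 1: at-least-once, dedupe by message id
});

client.on("message", (topic, payload) => {
  console.log(topic, payload.toString());
});
```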
2.4 Server-Sent Events (SSE): The Unidirectional Alternative
Server-Sent Events (SSE) allows servers to push updates to clients over a standard HTTP connection. Unlike WebSockets, SSE is unidirectional—data flows only from server to client.
- Use Cases: SSE is ideal for "read-only" real-time streams like stock tickers, news feeds, or the "incoming" channel of a chat app if the "outgoing" messages are sent via standard HTTP POST.
- Limitations: The inability to send binary data natively (SSE is text-based) and browsers' per-domain connection limits (typically six concurrent HTTP/1.1 connections) make SSE less attractive for complex, media-rich chat applications compared to WebSockets.
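A minimal browser sketch of this split follows, using the standard EventSource API for the incoming stream and plain HTTP POST for outgoing messages; the endpoint paths are illustrative assumptions.

```typescript
// Browser-side SSE sketch: the server pushes, the client sends via normal POSTs.
// The /chat/stream and /chat/send endpoints are illustrative assumptions.
const stream = new EventSource("/chat/stream?room=123");

stream.addEventListener("message", (event) => {
  const msg = JSON.parse(event.data); // SSE payloads are text; binary must be Base64
  console.log("incoming:", msg);
});

// The browser reconnects automatically if the stream drops.
stream.onerror = () => console.log("stream interrupted, browser will retry");

// Outgoing messages go over plain HTTP, since SSE is server -> client only.
async function sendMessage(text: string): Promise<void> {
  await fetch("/chat/send?room=123", { method: "POST", body: JSON.stringify({ text }) });
}
```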
2.5 Comparative Analysis of Transport Technologies
The following table summarizes the technical trade-offs faced by architects when selecting a transport layer.
| Protocol | Directionality | Header Overhead | Reconnection Logic | Binary Support | Best Use Case |
| --- | --- | --- | --- | --- | --- |
| WebSockets | Full Duplex | Low (2-14 bytes) | Manual implementation required | Native | High-frequency interactive apps (Chat, Gaming) |
| MQTT | Pub/Sub | Very Low (2 bytes) | Built-in (Keep-Alive/LWT) | Native | Mobile apps, IoT, battery-constrained environments |
| SSE | Server → Client | Low (text stream) | Automatic (browser handled) | Base64 encoded only | Notifications, live feeds, dashboards |
| Long Polling | Half Duplex | High (HTTP headers) | Manual implementation required | Native | Legacy fallback, corporate firewalls blocking WS |
| WebTransport | Full Duplex | Low (QUIC based) | Manual | Native | Next-gen applications requiring unreliable datagrams (Video) |
Insight: While WebSockets are the default choice for web-based chat, the most resilient architectures often implement a "protocol downgrade" strategy. The client attempts to connect via WebSocket/QUIC; if blocked by a corporate firewall or proxy, it automatically degrades to Long Polling to ensure connectivity, albeit with higher latency.
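A simplified sketch of such a downgrade strategy follows: attempt the WebSocket upgrade with a timeout, and fall back to a long-polling loop if it fails. The endpoints, timeout value, and polling semantics are assumptions for illustration.

```typescript
// Sketch of a "protocol downgrade" connector: try WebSocket first, fall back to
// long polling if the upgrade is blocked. Endpoints and timeouts are assumptions.
type OnFrame = (data: string) => void;

function connectWebSocket(onFrame: OnFrame): Promise<WebSocket> {
  return new Promise((resolve, reject) => {
    const ws = new WebSocket("wss://chat.example.com/gateway");
    const timer = setTimeout(() => { ws.close(); reject(new Error("timeout")); }, 5000);
    ws.onopen = () => { clearTimeout(timer); resolve(ws); };
    ws.onerror = () => { clearTimeout(timer); reject(new Error("blocked")); };
    ws.onmessage = (e) => onFrame(e.data);
  });
}

async function longPollLoop(onFrame: OnFrame): Promise<void> {
  // Each request hangs until the server has data or times out, then we loop.
  for (;;) {
    const res = await fetch("/chat/poll?timeout=30");
    if (res.status === 200) onFrame(await res.text());
  }
}

export async function connect(onFrame: OnFrame): Promise<void> {
  try {
    await connectWebSocket(onFrame);  // preferred path
  } catch {
    void longPollLoop(onFrame);       // degraded but firewall-friendly
  }
}
```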
3. Client-Side Engineering: The Typing Indicator Case Study
The typing indicator ("Alice is typing...") is a feature that appears trivial but presents a massive scaling challenge. It represents ephemeral, high-frequency state. Unlike a message, which is immutable and valuable, a typing event is transient and loses value within seconds. If a user types 300 characters a minute, sending a packet for every keystroke would result in 300 requests per minute per user. For 1 million concurrent users, this would generate 5 million requests per second—a DDoS attack by design.
To manage this, client-side engineering must focus on signal reduction through Throttling and Debouncing.
3.1 The "Signaling Storm" and Rate Limiting
The primary goal is to decouple the physical keystroke rate from the network packet rate. We do not need to know what key was pressed, only that activity is occurring.
3.1.1 Debouncing: The Waiting Game
Debouncing ensures that a function is only triggered after a specific period of inactivity.
- Logic: "Execute this function only if 500ms have passed since the last keystroke."
- Scenario: A user types "Hello". They press 'H', 'e', 'l', 'l', 'o' in rapid succession (100ms apart).
- Without Debounce: 5 events sent.
- With Debounce (500ms): The timer resets on every keypress. The event fires only after 'o' is pressed and the user pauses for 500ms.
- Critique for Typing Indicators: Pure debouncing is actually poor for the "Start Typing" signal because it introduces a delay. The recipient won't see "Alice is typing..." until Alice stops or pauses typing. It is, however, excellent for the "Stop Typing" signal.
3.1.2 Throttling: The Regular Pulse
Throttling ensures that a function is executed at most once in a specified time interval, regardless of trigger frequency.
- Logic: "Execute this function at most once every 3000ms."
- Scenario: User types continuously for 10 seconds.
- With Throttling (3000ms): An event is sent at t=0 (Start), t=3s, t=6s, and t=9s.
- Application: This is the ideal strategy for the "Is Typing" signal. It guarantees the indicator appears immediately (Leading Edge execution) and refreshes the state periodically while the user continues to type, preventing the indicator from timing out on the receiver's end.
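The sketch below shows generic debounce and throttle helpers of the kind described above, wired to hypothetical stop and pulse signals; the intervals follow the examples in the text.

```typescript
// Generic debounce and throttle helpers, as a sketch of the two signal-reduction
// primitives described above. Intervals mirror the illustrative values in the text.
function debounce<T extends unknown[]>(fn: (...args: T) => void, waitMs: number) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: T): void => {
    clearTimeout(timer);               // every call pushes the deadline back
    timer = setTimeout(() => fn(...args), waitMs);
  };
}

function throttle<T extends unknown[]>(fn: (...args: T) => void, intervalMs: number) {
  let lastRun = 0;
  return (...args: T): void => {
    const now = Date.now();
    if (now - lastRun >= intervalMs) { // leading edge: the first call fires immediately
      lastRun = now;
      fn(...args);
    }
  };
}

// Usage: fire the stop signal 500ms after typing pauses, and a pulse at most every 3s.
const sendStop = debounce(() => console.log("TYPING_STOP"), 500);
const sendPulse = throttle(() => console.log("TYPING_START/RENEW"), 3000);
```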
3.1.3 The Hybrid Implementation Strategy
A robust production implementation combines both techniques to manage the state machine effectively.
- Event: First Keystroke (t=0):
  - Condition: isTyping == false
  - Action: Send TYPING_START packet immediately. Set isTyping = true. Start Throttle Timer (e.g., 3s).
- Event: Continuous Typing (t < 3s):
  - Action: Ignored by Throttle logic. Internal LastKeystrokeTime updated locally.
- Event: Continuous Typing (t = 3s):
  - Condition: User still typing.
  - Action: Throttle timer expires. Send TYPING_RENEW packet. Reset Throttle Timer.
- Event: User Stops Typing:
  - Action: A separate Debounce Timer (e.g., 1000ms) runs on every keystroke.
  - Trigger: When the user stops, the Debounce Timer expires.
  - Output: Send TYPING_STOP packet. Set isTyping = false. Cancel Throttle Timer.
This logic ensures the network sees a low-frequency pulse (1 packet every 3s) during activity and a precise stop signal, minimizing load while maximizing UI responsiveness.
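A compact sketch of this hybrid state machine follows. The packet names (TYPING_START, TYPING_RENEW, TYPING_STOP) mirror the steps above, while the transport callback and timer wiring are assumptions for illustration.

```typescript
// Sketch of the hybrid typing-indicator state machine described above. The
// packet names and the send() transport callback are assumptions.
class TypingSignaler {
  private isTyping = false;
  private throttleTimer?: ReturnType<typeof setInterval>;
  private debounceTimer?: ReturnType<typeof setTimeout>;

  constructor(
    private send: (packet: string) => void,
    private renewMs = 3000, // throttle interval for TYPING_RENEW
    private stopMs = 1000,  // debounce window for TYPING_STOP
  ) {}

  onKeystroke(): void {
    if (!this.isTyping) {
      this.isTyping = true;
      this.send("TYPING_START"); // leading edge: instant feedback for the recipient
      this.throttleTimer = setInterval(() => this.send("TYPING_RENEW"), this.renewMs);
    }
    clearTimeout(this.debounceTimer); // every keystroke defers the stop signal
    this.debounceTimer = setTimeout(() => this.stop(), this.stopMs);
  }

  private stop(): void {
    if (!this.isTyping) return;
    this.isTyping = false;
    clearInterval(this.throttleTimer);
    this.send("TYPING_STOP");
  }
}

// const signaler = new TypingSignaler((p) => ws.send(p));
// inputElement.addEventListener("input", () => signaler.onKeystroke());
```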
3.2 Optimizing Payloads: JSON vs. Protobuf
Once the frequency of events is minimized, the size of each event becomes the next optimization target.
JSON (JavaScript Object Notation):
- Format: {"event": "typing", "user": 102, "chat": 555}
- Pros: Human-readable, native to browsers, easy to debug.
- Cons: Verbose. The field names ("event", "user") are repeated in every packet, wasting bandwidth. Payload size: ~45-50 bytes.
Protocol Buffers (Protobuf):
- Format: Binary serialization based on a predefined schema (.proto).
- Pros: Extremely compact. Field names are replaced by integer tags (1, 2, 3).
- Cons: Requires schema management and compilation steps.
- Impact: A typing event might be reduced to 0x08 0x66 0x10 0x2B (approx. 4-10 bytes). At 1 billion daily events, this roughly 80% reduction translates to terabytes of saved bandwidth per year and lower latency on congested mobile networks.
Insight: High-scale apps often use JSON for the control plane (creating groups, updating profiles) where flexibility is needed, but switch to Protobuf or custom binary formats for the data plane (messages, presence, typing) where volume is high and schema is stable.
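To make the size difference concrete, the sketch below hand-encodes a typing event using Protobuf's varint wire format (integer field tags replacing field names) and compares the byte count against the JSON form. The field numbering is an assumed schema for illustration, not a real service definition.

```typescript
// Rough illustration of why Protobuf-style binary encoding is so compact:
// hand-encode a typing event with protobuf's varint wire format.
function varint(n: number): number[] {
  const out: number[] = [];
  do {
    let byte = n & 0x7f;
    n >>>= 7;
    if (n > 0) byte |= 0x80; // continuation bit
    out.push(byte);
  } while (n > 0);
  return out;
}

function field(fieldNumber: number, value: number): number[] {
  const key = (fieldNumber << 3) | 0; // wire type 0 = varint
  return [key, ...varint(value)];
}

// Assumed schema: 1 = event type (1 = typing), 2 = user id, 3 = chat id.
const binary = Uint8Array.from([...field(1, 1), ...field(2, 102), ...field(3, 555)]);
const json = JSON.stringify({ event: "typing", user: 102, chat: 555 });

console.log(binary.length, "bytes binary vs", json.length, "bytes JSON"); // ~7 vs ~40
```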
4. Gateway Architecture: Managing Millions of Connections
Scaling a web server to handle millions of persistent connections is fundamentally different from scaling a REST API. In a REST architecture, a server is stateless; any request can be routed to any server. In a chat architecture, the connection is stateful. If User A is connected to Gateway-01, Gateway-01 is the only server that can push a message to User A over that specific WebSocket.
4.1 The Gateway Service Pattern
The Gateway Service (or Connection Layer) is responsible for holding the open TCP/WebSocket connections.
- Responsibility: It performs TLS termination, authentication, and protocol decoding (e.g., unwrapping the Protobuf payload).
- Sharding: Because a single server has limits on open file descriptors (sockets) and RAM, connections are sharded across hundreds of gateway nodes. Discord and WhatsApp use Consistent Hashing or a Service Discovery mechanism to route users to specific gateways based on User ID, or simply rely on Least-Connection load balancing.
- The Problem of Addressability: If User A sends a message to User B, and User B is on a different gateway, the system needs a way to route that message. This necessitates a "Session Map" or "Presence Service"—typically a highly available key-value store (like Redis or an in-memory distributed table) that maps UserID -> GatewayID.
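A sketch of such a session map follows, using a Redis key per user to record which gateway owns the socket, plus a lookup-and-forward routing step. The node-redis client usage, key names, and the forwardToGateway RPC are assumptions for illustration.

```typescript
import { createClient } from "redis";

// Session-map sketch: each gateway records which node owns a user's socket, and a
// sender looks it up to route a message. forwardToGateway() is an assumed internal RPC.
const redis = createClient({ url: "redis://sessions.internal:6379" });
await redis.connect();

async function registerConnection(userId: string, gatewayId: string): Promise<void> {
  // TTL guards against stale entries if the gateway dies without cleanup.
  await redis.set(`session:${userId}`, gatewayId, { EX: 60 });
}

async function routeMessage(toUserId: string, payload: string): Promise<void> {
  const gatewayId = await redis.get(`session:${toUserId}`);
  if (gatewayId === null) {
    // No live socket: hand off to offline delivery / push notifications instead.
    return;
  }
  await forwardToGateway(gatewayId, toUserId, payload); // assumed internal RPC
}

declare function forwardToGateway(gatewayId: string, userId: string, payload: string): Promise<void>;
```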
4.2 Handling Reconnections and the "Thundering Herd"
A critical failure mode in messaging architectures is the "Thundering Herd." If a Gateway node holding 100,000 connections crashes, all 100,000 clients effectively disconnect simultaneously. Their retry logic (often just a while(!connected) connect() loop) triggers an immediate storm of reconnection requests to the remaining healthy servers.
Mitigation Strategies:
- Exponential Backoff with Jitter: Clients must not reconnect immediately. They should wait a random amount of time (Jitter) that increases exponentially with each failure (e.g., wait = random(0, 2^n * 100ms)). This spreads the load over time.
- Shedding Load: Gateways should actively reject new connections if their pending handshake queue exceeds a threshold, prioritizing the stability of existing sessions over new ones.
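A minimal sketch of exponential backoff with full jitter, assuming a generic connect() function and illustrative base and cap values:

```typescript
// Full-jitter backoff: wait = random(0, min(cap, base * 2^attempt)).
function backoffDelayMs(attempt: number, baseMs = 100, capMs = 30_000): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.random() * ceiling; // randomness spreads clients apart in time
}

async function reconnectLoop(connect: () => Promise<void>): Promise<void> {
  for (let attempt = 0; ; attempt++) {
    try {
      await connect();
      return; // connected: stop retrying
    } catch {
      const delay = backoffDelayMs(attempt);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```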
4.3 The Ephemeral State Problem (Presence)
Tracking who is "Online" is another variant of the typing indicator problem. Writing to a database every time a user connects or disconnects is too slow.
Heartbeat-based Presence:
Gateways maintain a "last heartbeat" timestamp for every connection in local memory. Periodically (e.g., every 10s), the Gateway updates a distributed cache (Redis) with a TTL (Time To Live) of 15s.
- Online: User sends heartbeat -> Gateway updates Redis (TTL extended).
- Offline: User disconnects -> Heartbeats stop -> Gateway stops updating Redis -> Redis key expires automatically.
This "soft state" approach ensures that if a Gateway crashes, the users on it are eventually marked offline when their keys expire, without requiring an explicit "I am disconnecting" packet (which can't be sent during a crash).
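A gateway-side sketch of this TTL-based presence scheme, assuming a node-redis client; the key names and the 15-second TTL follow the example above:

```typescript
import { createClient } from "redis";

// Presence sketch: refresh a per-user key with a short TTL on every heartbeat.
// If heartbeats stop (disconnect or gateway crash), the key expires on its own
// and the user reads as offline.
const redis = createClient({ url: "redis://presence.internal:6379" });
await redis.connect();

const PRESENCE_TTL_SECONDS = 15;

// Called whenever the gateway receives a heartbeat frame (e.g., every 10s).
async function onHeartbeat(userId: string): Promise<void> {
  await redis.set(`presence:${userId}`, "online", { EX: PRESENCE_TTL_SECONDS });
}

async function isOnline(userId: string): Promise<boolean> {
  return (await redis.get(`presence:${userId}`)) !== null;
}
```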
5. Backend Concurrency: The Engine Room
Once a connection is established, the backend must route messages efficiently. The choice of programming language and concurrency model dictates the hardware efficiency of the system.
5.1 The Concurrency Challenge: Threads vs. Lightweight Processes
In traditional models (like Java's thread-per-request), each connection requires a dedicated OS thread. An OS thread has a large stack (1MB+) and significant scheduling overhead. Hosting 1 million connections would theoretically require 1TB of RAM just for stack space, and the CPU would spend 100% of its time context switching.
5.2 Erlang and the BEAM VM (The WhatsApp Model)
WhatsApp and Discord (via Elixir) famously utilize the Erlang ecosystem. Erlang solves the C10M problem using the Actor Model.
- Lightweight Processes (Actors): An Erlang process is not an OS thread. It is a virtual execution unit managed by the BEAM VM. It starts with a tiny stack (approx 300 words or ~2KB).
- Isolation: Actors share no memory. They communicate via message passing. If one Actor crashes (e.g., due to a malformed packet), it does not corrupt the memory of others. This "Let it Crash" philosophy allows the system to be self-healing.
- Preemptive Scheduling: The BEAM VM acts as an operating system for the application. It allocates a "reduction budget" (CPU ticks) to each process. Once the budget is spent, the process is paused, and another takes over. This ensures that a heavy calculation for one user never blocks the latency of another user's message.
5.3 Go and CSP (The Modern Alternative)
Go (Golang) approaches concurrency using Goroutines and Channels (Communicating Sequential Processes).
- Goroutines: Like Erlang processes, these are lightweight, user-space threads managed by the Go runtime. They start small (2KB) and grow dynamically.
- Performance: Go compiles to machine code (unlike Erlang's bytecode), offering raw CPU performance closer to C++.
- Trade-off: Go lacks the complete memory isolation of Erlang. An unrecovered panic in a goroutine can crash the entire program, and garbage collection (though highly optimized) is global, whereas Erlang's GC is per-process.
Comparative Concurrency:
| Feature | Erlang/Elixir (BEAM) | Go (Golang) | Java (Traditional) |
| --- | --- | --- | --- |
| Unit of Concurrency | Actor (Process) | Goroutine | OS Thread |
| Memory Footprint | Very Low (~2KB) | Very Low (~2KB) | High (~1MB) |
| Communication | Message Passing (Copy) | Channels (Ref/Copy) | Shared Memory (Locks) |
| Garbage Collection | Per-Process | Global Mark-and-Sweep | Global (Generational) |
| Fault Tolerance | Supervisor Trees (High) | Manual Error Handling | Exception Handling |
Insight: For chat systems where latency consistency and uptime are paramount (WhatsApp), Erlang/Elixir is often favored. For systems requiring raw throughput and compute power (e.g., video encoding or massive logic processing), Go or Rust is preferred.
6. Message Routing and Brokerage
When Gateway-A receives a message intended for a user on Gateway-B, it needs a mechanism to transfer that data. This is the domain of the Message Broker.
6.1 The "Fan-Out" Problem
The complexity of routing scales with group size.
- 1:1 Chat: Gateway-A publishes 1 message; Gateway-B receives 1 message.
- Group Chat (1000 users): Gateway-A receives 1 message. It must identify the 1000 recipients, look up their Gateway locations (which might be spread across 500 different servers), and forward the message.
- Fan-out on Write: The sender bears the cost. The message is duplicated into the queues of all recipients immediately. This optimizes for read latency (the receiver gets it instantly) but spikes write load.
- Fan-out on Read: The message is stored in a single "Group Timeline." Recipients poll/pull the timeline. This saves write resources but introduces latency and complexity in notifying users that new data exists.
Most real-time systems use a hybrid: Fan-out on Write for active sessions (pushing to connected sockets) and Fan-out on Read for persistence (storing one copy in the DB).
6.2 Distributed Pub/Sub: Redis vs. Kafka
The choice of broker defines the system's reliability and latency profile.
Redis Pub/Sub:
- Architecture: In-memory, push-based.
- Pros: Ultra-low latency (sub-millisecond). Simple semantics (PUBLISH channel payload).
- Cons: Fire-and-forget. If a subscriber (Gateway) is temporarily disconnected or busy, the message is lost. Redis buffers are limited; if the consumer can't keep up, it disconnects.
- Use Case: Ephemeral events (Typing indicators, Presence, Online/Offline status) where occasional data loss is acceptable in exchange for speed.
Apache Kafka:
- Architecture: Disk-based (log), pull-based.
- Pros: Durability and Persistence. Messages are written to disk and replicated. If a consumer crashes, it can restart and "replay" the log from the last known offset to recover missed messages.
- Cons: Higher latency (milliseconds to tens of milliseconds) due to disk I/O and replication. Complexity of partition management.
- Use Case: The actual Chat Messages. Durability is non-negotiable here. A message must not be lost even if a gateway crashes.
The Hybrid Architecture:
Modern architectures often use both.
- Typing/Presence: Routed via Redis Pub/Sub for instant feedback.
- Messages: Written to Kafka first. A separate fleet of "Consumer Workers" reads from Kafka and pushes to the Gateways (for delivery) and to the Database (for long-term storage).
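The sketch below illustrates this routing rule, assuming node-redis for the ephemeral path and the kafkajs client for the durable path; broker addresses, topic names, and event shapes are illustrative.

```typescript
import { createClient } from "redis";
import { Kafka } from "kafkajs";

// Hybrid routing sketch: ephemeral signals go through Redis Pub/Sub (fast, lossy),
// durable chat messages go through Kafka (slower, replayable).
const redis = createClient({ url: "redis://broker.internal:6379" });
await redis.connect();

const kafka = new Kafka({ clientId: "gateway-01", brokers: ["kafka.internal:9092"] });
const producer = kafka.producer();
await producer.connect();

type Event =
  | { kind: "typing" | "presence"; chatId: string; userId: string }
  | { kind: "message"; chatId: string; userId: string; body: string };

async function routeEvent(event: Event): Promise<void> {
  if (event.kind === "message") {
    // Durability is non-negotiable: append to the log, keyed by chat for ordering.
    await producer.send({
      topic: "chat-messages",
      messages: [{ key: event.chatId, value: JSON.stringify(event) }],
    });
  } else {
    // Fire-and-forget: if a gateway misses a typing pulse, nothing of value is lost.
    await redis.publish(`chat:${event.chatId}:signals`, JSON.stringify(event));
  }
}
```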
7. Data Persistence and Storage Engines
Chat applications generate a write-heavy workload with a specific access pattern: Temporal Locality. Users heavily access the most recent messages, while older history is rarely read.
7.1 Evolution of Storage: The Discord Case Study
- MongoDB: Initially used by Discord. It stores documents (messages) flexibly. However, as data grew to billions of messages, the random I/O patterns of users jumping between channels caused significant latency. Compaction became costly, and the data and indexes could no longer fit in RAM.
- Cassandra: A wide-column store designed for write-heavy workloads. Discord moved to Cassandra, using channel_id as the partition key. This means all messages for a single channel are stored together on disk, so reading history becomes a fast sequential read rather than random I/O.
- ScyllaDB: A C++ rewrite of Cassandra. Discord eventually migrated to ScyllaDB to escape the Java Garbage Collection pauses inherent in Cassandra. By using a shard-per-core architecture (similar to the Actor model), ScyllaDB provided predictable low latency even during massive traffic spikes.
7.2 WhatsApp and Mnesia
WhatsApp leverages Mnesia, the distributed database built into Erlang. Mnesia runs in the same memory space as the application, providing microsecond-level access speeds. WhatsApp uses Mnesia primarily for routing tables (User -> Gateway mappings) and transient message queues. For long-term storage, they have historically relied on a sharded setup, minimizing server-side storage by deleting messages once delivered (though this has changed with multi-device support).
8. Distributed Consistency and Race Conditions
Perhaps the most intellectually demanding aspect of messaging engineering is ensuring that all users see the same reality. In a distributed system, "Time" is an illusion, leading to race conditions.
8.1 The Illusion of Time and Ordering
Scenario:
- User A sends "Hello" (processed by Server 1).
- User B sees "Hello" and replies "Hi" (processed by Server 2).
If Server 1's system clock is 100ms ahead of Server 2's, User B's "Hi" might receive a timestamp earlier than User A's "Hello." If the client sorts by timestamp, the chat will read "Hi" before "Hello", with the reply appearing above the message it answers.
This violation of causality creates confusion.
8.1.1 Logical Clocks
To solve this, engineers rely on Logical Clocks rather than physical wall clocks.
- Lamport Timestamps: A simple counter passed with every message. If a server receives a message with counter N, it sets its own clock to N+1. This ensures that if Event A caused Event B, A has a lower number.
- Sequence Numbers: A more practical implementation for chat is the Sequence ID. The database (or a dedicated Sequencer service like Snowflake) assigns a strictly increasing ID to every message in a channel. Clients sort by Sequence ID, not timestamp. If a client receives Message ID 5 and then Message ID 7, it knows definitively that Message ID 6 is missing and can request a refetch (Gap Detection).
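A client-side sketch of sequence-based gap detection follows; the fetchRange backfill call is an assumed API, not a real endpoint.

```typescript
// Apply messages in per-channel sequence order and backfill when an ID is skipped.
class ChannelTimeline {
  private lastSeq = 0;

  constructor(
    private apply: (seq: number, body: string) => void,
    private fetchRange: (fromSeq: number, toSeq: number) => Promise<Array<{ seq: number; body: string }>>,
  ) {}

  async onMessage(seq: number, body: string): Promise<void> {
    if (seq <= this.lastSeq) return; // duplicate or stale: ignore
    if (seq > this.lastSeq + 1) {
      // Gap detected (e.g., received 7 while last seen was 5): backfill 6..6 first.
      const missing = await this.fetchRange(this.lastSeq + 1, seq - 1);
      for (const m of missing) this.apply(m.seq, m.body);
    }
    this.apply(seq, body);
    this.lastSeq = seq;
  }
}
```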
8.2 Identifying and Handling Race Conditions
Scenario 1: The "Ghost" Notification (Out-of-Order Delivery)
- Race: User receives a "New Message" push notification (via APNs) before the actual message data arrives via WebSocket (due to network routing).
- Identification: The client app compares the local max Sequence ID with the ID in the notification.
- Handling: If the notification ID > local ID, the client shows a generic "New Message" alert and triggers a fetch. It does not try to display the message content until the gap is filled.
Scenario 2: The Group Permission Race
- Race: Admin A removes User B from a group. Simultaneously, User B sends a message.
- Identification: The server processing User B's message checks the membership cache. If the "Remove" event hasn't propagated to that cache yet, the message might go through.
- Handling (Causal Consistency): Systems use Vector Clocks or Version Vectors for group state. The "Remove" command increments the group version (v1 -> v2). User B's message is signed with v1. When the message reaches the persistence layer, the database sees the group is at v2 and rejects the v1 message, returning an error to User B.
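A server-side sketch of this version check, with the group state shape and rejection reasons invented for illustration:

```typescript
// Reject messages stamped with an older group version than the current one.
interface GroupState {
  version: number;        // incremented by membership/permission changes
  members: Set<string>;
}

function acceptMessage(
  group: GroupState,
  senderId: string,
  senderSeenVersion: number,
): { accepted: boolean; reason?: string } {
  if (senderSeenVersion < group.version) {
    // Sender acted on stale state (e.g., before their removal propagated).
    return { accepted: false, reason: "STALE_GROUP_VERSION" };
  }
  if (!group.members.has(senderId)) {
    return { accepted: false, reason: "NOT_A_MEMBER" };
  }
  return { accepted: true };
}

// Example: the removal bumped the group from v1 to v2; a message signed with v1 is rejected.
const group: GroupState = { version: 2, members: new Set(["admin-a"]) };
console.log(acceptMessage(group, "user-b", 1)); // { accepted: false, reason: "STALE_GROUP_VERSION" }
```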
Scenario 3: Concurrent Edits
- Race: Two admins change the group description at the same time.
- Handling (LWW or CRDTs):
- Last Write Wins (LWW): The server with the highest physical timestamp overwrites the other. Simple, but can lose data.
- CRDTs (Conflict-free Replicated Data Types): Advanced systems (like Discord's collaborative editing features) use CRDTs to merge changes mathematically, ensuring both edits are preserved or resolved deterministically without a central arbiter.
9. Mobile Optimization and Battery Life
Mobile architecture is dominated by the constraints of the cellular radio. The radio has high-power (Active) and low-power (Idle) states. Every packet sent wakes the radio, consuming significant battery.
9.1 The Radio State Machine and Adaptive Heartbeats
If a chat app sends a "keep-alive" ping every 15 seconds, the radio never sleeps. The battery drains in hours.
However, if the app doesn't ping, the NAT (Network Address Translation) router at the ISP will drop the silent connection mapping, and the user will stop receiving messages.
Optimization: Adaptive Heartbeats
The client algorithmically discovers the maximum timeout of the current network.
- Start with a keep-alive interval of 4 minutes.
- If the connection dies, reduce to 3 minutes.
- If the connection survives, try 5 minutes.
This allows the app to find the "sweet spot"—keeping the connection open with the minimum number of wake-ups (e.g., one ping every 9 minutes on T-Mobile, vs. every 4 minutes on AT&T).
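A minimal sketch of such an adaptive heartbeat controller, with assumed step sizes and bounds around the intervals mentioned above:

```typescript
// Probe upward while the connection survives, back off when it dies, converging
// on the longest interval the network's NAT will tolerate.
class AdaptiveHeartbeat {
  private intervalMs = 4 * 60_000;      // start at 4 minutes, per the text
  private readonly stepMs = 60_000;     // assumed adjustment step
  private readonly minMs = 60_000;
  private readonly maxMs = 15 * 60_000;

  current(): number {
    return this.intervalMs;
  }

  // The last interval kept the NAT mapping alive: try a longer sleep next time.
  onIntervalSurvived(): void {
    this.intervalMs = Math.min(this.maxMs, this.intervalMs + this.stepMs);
  }

  // The connection was found dead at the next ping: the interval was too long.
  onIntervalFailed(): void {
    this.intervalMs = Math.max(this.minMs, this.intervalMs - this.stepMs);
  }
}
```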
9.2 Push Notifications as Fallback
Operating Systems (iOS/Android) aggressively kill background processes. A reliable messaging architecture cannot depend solely on WebSockets.
- The Flow: When the app is killed, the WebSocket is severed. The Gateway marks the user as "Push Reachable."
- The Wake-up: When a message arrives, the Gateway sends a payload to Apple (APNs) or Google (FCM).
- The "Tickle": The OS receives this, wakes the app in the background, and gives it 30 seconds of execution time. The app reconnects the WebSocket, fetches the message, stores it locally, and then displays the notification. This ensures that when the user taps "Open," the message is already there.
10. Security and Encryption
The modern standard is not just transport security (TLS/SSL) but End-to-End Encryption (E2EE), popularized by Signal and WhatsApp.
10.1 The Signal Protocol (Double Ratchet)
In E2EE, the server is blind. It routes encrypted blobs.
- Pre-Keys: Users upload a bundle of public keys to the server upon registration.
- Session Setup: When User A messages User B, they fetch a pre-key and mathematically derive a shared secret without User B needing to be online.
- Double Ratchet: Every message sent generates a new ephemeral key. If a hacker steals a key for Message 5, they cannot read Message 4 (Forward Secrecy) nor Message 6 (Future Secrecy).
- Architectural Impact: The backend must effectively act as a Key Distribution Center and a dumb pipe for encrypted data. Features like "server-side search" or "server-side spam detection" become impossible, forcing these complex logic tasks onto the client device.
11. Conclusion
The "zero to hero" journey of building a massive-scale messaging app is a progression from simple data movement to complex distributed state management. It begins with selecting the right transport (WebSockets/MQTT) to minimize overhead. It evolves into a concurrency challenge, handled by the isolated processes of Erlang or the efficient goroutines of Go. It matures into a data storage challenge, requiring specialized databases like ScyllaDB to handle trillions of records.
Ultimately, the polish of the system—the seamless typing indicators, the instant presence, the battery efficiency—relies on identifying and mitigating race conditions and resource constraints at every layer of the stack. The successful architecture is one that assumes the network is unreliable, the clock is wrong, and the user is mobile, and builds a resilient consistency model on top of that chaotic reality.