The Application Layer is the most important layer because network applications are a computer network’s raison d’être: the reason it exists. The continuous creation of new and useful applications has been the driving force behind the Internet’s success.

Internet applications have continuously evolved:

Classic (1970s-1980s): Text-based applications like text email, remote access, file transfers, and newsgroups.
The Killer App (Mid-1990s): The World Wide Web (Web surfing, search, e-commerce).
New Millennium: Highly compelling applications emerged, including:
- Voice/Video: VoIP and video conferencing (Skype, Facetime).
- User-Generated Content: Video platforms (YouTube) and streaming (Netflix).
- Social Networking: Platforms that created human networks layered on top of the physical Internet (Facebook, Twitter).
Recent Mobile Era (4G/5G): A huge growth of location-based mobile apps (Waze, Yelp, Tinder), mobile payment apps (Apple Pay), and messaging apps (WhatsApp).

This chapter will study the conceptual and implementation aspects of these applications by:

Defining key concepts like network services, clients and servers, and processes.
Examining applications in detail: Web, e-mail, DNS, P2P file distribution, and video streaming.
Covering application development using the socket interface over both TCP and UDP.

The Application Layer is a good starting point because it is familiar and helps introduce core protocol issues that will be seen again in the lower layers.

2.1 Principles of Network Applications

Developing a network application involves writing programs that run on different end systems and communicate with each other over the network.

Example (The Web): There are two distinct programs that communicate: the browser program (running on your host/client) and the Web server program (running on the server host).
- The Server Location: Servers are often housed in large buildings called data centers.

Crucial Design Principle (Confining Software). When you develop a network application, you only need to write software that runs on the end systems (hosts). You do not need to write software for the network-core devices, like routers or link-layer switches. This is because network-core devices only operate at the Network Layer and below, not the Application Layer. This fundamental design choice—confining application software to the edge—has significantly facilitated the rapid development and deployment of new network applications.

2.1.1 Network Application Architectures

An application’s architecture is the plan for how the software is structured across the various end systems. This is something the developer designs, and it is different from the fixed Internet network architecture (the 5-layer model).

When choosing an application architecture, developers typically pick one of two main designs: Client-Server or Peer-to-Peer (P2P).

1. Client-Server Architecture

This is the traditional and most common model.

Server: An always-on host that is ready to serve requests. It has a fixed, well-known IP address so clients can always find it.
Clients: Many other hosts that request services from the server.
Interaction: Clients request objects (like a web page), and the server sends the requested object back. Clients do not communicate directly with each other.
Examples: The Web, e-mail, FTP, and Telnet.
Handling High Demand (Data Centers): For popular applications like Google, Amazon, and Facebook, a single server can’t handle all the requests. To solve this, a data center is used. A data center houses a large number of hosts that work together to create a powerful virtual server. Service providers must pay for the maintenance, power, and bandwidth costs for these large centers.

2. Peer-to-Peer (P2P) Architecture

In a P2P architecture, there is minimal or no reliance on dedicated servers.

Peers: The application uses direct communication between pairs of hosts (peers) that are often intermittently connected (meaning they are not always online). These peers are not owned by the service provider; they are users’ desktops or laptops.
Examples: File-sharing applications like BitTorrent.
Advantages:
- Self-Scalability: As more users (peers) join and request files, they also add service capacity by distributing those files to others.
- Cost Effective: They usually don’t require expensive dedicated server infrastructure or bandwidth.
Challenges: P2P networks face difficulties with security, performance, and reliability because the structure is highly decentralized and relies on individual user devices.

2.1.2 Processes Communicating

It’s not actually programs that communicate, but processes: a process is simply a program that is currently running on an end system. Communication between processes on the same end system is governed by that end system’s operating system. Here, we are instead interested in how processes running on different hosts communicate, which they do by exchanging messages over the network.

Client and Server Processes

A network application is built upon pairs of communicating processes. To organize this communication, we label one process as the client and the other as the server.

Client Definition: The process that initiates the communication (makes the first contact).
Server Definition: The process that waits to be contacted to begin the session.

This definition holds true even in flexible architectures like P2P:

In the Web, the browser process is the client, and the Web server process is the server.
In P2P file sharing, if Peer A asks Peer B for a file, Peer A is the client, and Peer B is the server for that specific file transfer session. (The same peer can be both a client and a server at different times).

The interface between the Process and the Computer Network

Most applications consist of pairs of communicating processes, where each process sends messages into, and receives messages from, the network through a software interface called a socket.

Analogy: The process is like a house, and the socket is like the house’s door. A message is sent out the sender’s “door” (socket) and arrives at the receiver’s “door” (socket).
API: The socket acts as the interface between the Application Layer and the Transport Layer within a host. It is also known as the Application Programming Interface (API) for building network applications.
Developer Control: The application developer controls the application-layer side of the socket. They only have limited control on the transport side, mainly choosing the transport protocol (TCP or UDP) and perhaps setting buffer size.
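
To make the client/server roles and the “door” analogy concrete, here is a minimal sketch using Python’s standard socket module over the loopback interface. The function names (run_echo_server, echo_once) and the choice to upper-case the reply are illustrative assumptions, not part of any standard.

```python
import socket
import threading

def run_echo_server(listener: socket.socket) -> None:
    """Server role: wait to be contacted, then answer one message."""
    conn, _client_addr = listener.accept()  # blocks until a client makes first contact
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())          # upper-case so the reply is visibly the server's

def echo_once(message: bytes) -> bytes:
    # The server's "door": a TCP socket bound to an OS-chosen port on loopback.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))     # port 0 asks the OS for any free port
        listener.listen(1)
        server_addr = listener.getsockname()
        server = threading.Thread(target=run_echo_server, args=(listener,))
        server.start()

        # The client's "door": the client is the side that initiates contact.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            client.connect(server_addr)
            client.sendall(message)
            reply = client.recv(1024)
        server.join()
    return reply

print(echo_once(b"hello"))
```

Note that the only transport-side decision made here is the protocol itself (SOCK_STREAM selects TCP); everything else the code touches is on the application side of the socket.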

Addressing Processes

To successfully send data (packets) from a process on one computer (host) to a process on another, the network needs two key pieces of information to create a complete address:

The Address of the Host: This identifies the specific computer on the network.
- In the Internet, this is the IP address. For IPv4, you can think of the IP address as a unique 32-bit quantity that identifies the host.
An Identifier for the Receiving Process: This identifies which specific application (or socket) on that host should receive the message.
- This information is necessary because one host can run many different network applications at the same time. A destination port number serves this purpose: it’s used to identify the receiving process.
  - Port Numbers are assigned to popular, standard applications to make them easy to find.
    - For example, a Web server uses port number 80.
    - A mail server (using the SMTP protocol) uses port number 25.
  - The official list of these well-known port numbers is maintained by the IANA.
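
As a rough illustration of why a port number is needed in addition to the host address, the sketch below binds two UDP sockets on the same host (loopback) and shows that the destination port alone decides which one receives each message. The function name demo_ports and the message contents are assumptions made for this example.

```python
import socket

def demo_ports() -> dict:
    # Two receiving sockets on the same host, told apart only by port number.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as rx_a, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as rx_b, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as tx:
        for rx in (rx_a, rx_b):
            rx.bind(("127.0.0.1", 0))   # same IP address, OS-chosen free port
            rx.settimeout(2.0)          # do not block forever if a datagram is lost
        addr_a, addr_b = rx_a.getsockname(), rx_b.getsockname()

        # The full destination address is the pair (IP address, port number).
        tx.sendto(b"for A", addr_a)
        tx.sendto(b"for B", addr_b)
        got_a, _ = rx_a.recvfrom(1024)
        got_b, _ = rx_b.recvfrom(1024)
    return {
        "same_host": addr_a[0] == addr_b[0],
        "different_ports": addr_a[1] != addr_b[1],
        "a": got_a,
        "b": got_b,
    }

print(demo_ports())
```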

2.1.3 Transport Services Available to Applications

Let’s recall what a socket is:

A socket is the interface between the application process and the transport-layer protocol. The application at the sending side pushes messages through the socket. At the other side of the socket, the transport-layer protocol has the responsibility of getting the messages to the socket of the receiving process.

When you, as a developer, create an application, you must choose one of the available transport-layer protocols (like TCP or UDP in the Internet). This choice depends on which protocol’s services best match your application’s needs.

Transport protocols can offer services categorized into four main areas.

1. Reliable Data Transfer

The Problem: Packets can be lost in the network (due to router buffer overflow or bit corruption). For crucial applications (like email, file transfer, or financial apps), losing data is catastrophic.
Service: A protocol that offers reliable data transfer guarantees that the data sent by the application will arrive at the destination correctly and completely. The sending process can send data into the socket with complete confidence it will be delivered.
Alternative (Unreliable): If a transport protocol doesn’t guarantee reliable transfer (like UDP), data might be lost. This is acceptable for loss-tolerant applications, such as conversational audio/video, where a small glitch is better than waiting for re-sent data.

2. Throughput

Definition: Throughput is the rate at which the sending process can actually deliver bits to the receiving process. This rate can change because other sessions are sharing the same network links.
Service: A transport protocol could potentially offer guaranteed available throughput at a specified rate ($r$ bits/sec).
- Bandwidth-Sensitive Applications: These applications, like Internet telephony (VoIP) that encodes voice at a fixed rate (e.g., 32 kbps), require a minimum guaranteed throughput to function correctly.
- Elastic Applications: These apps, such as email, file transfer, and Web transfers, can use as much or as little throughput as is available. They work better with more bandwidth, but they don’t have a fixed minimum requirement.

3. Timing

Service: The protocol can offer a timing guarantee, ensuring, for example, that every bit sent arrives at the receiver no more than a certain amount of time later (e.g., 100 milliseconds).
Importance: This is vital for interactive real-time applications (like multiplayer games, video conferencing, and VoIP). Long delays in these apps create unnatural pauses or make the experience feel unrealistic. Non-real-time apps prefer lower delay, but don’t have a strict timing requirement.

4. Security

Service: A transport protocol can provide security services, often achieved through encryption.
- The protocol can encrypt the data in the sending host and decrypt it in the receiving host before delivering it to the application.
- This ensures confidentiality between the two processes, even if the data is monitored or observed while traveling through the network.
Additional Services: Transport protocols can also offer data integrity and end-point authentication.

2.1.4 Transport Services Provided by the Internet

Okay, so we just talked about the four main services a transport protocol could offer in theory. Now, let’s get specific and look at what the Internet actually gives us. As an application developer, one of the first big choices you have to make is simple: Do I use TCP or UDP? They are completely different because they offer a different set of services to the invoking application. The following figure shows the service requirements for some selected applications:

TCP Services (Transmission Control Protocol)

Think of TCP as the dependable, high-service option. It gives you two main guarantees:

It’s Connection-Oriented: Before any application data starts flying, the client and server processes have to exchange control information first. This is like a little “handshaking” procedure that gets them ready. Once that’s done, you have a full-duplex connection—meaning both sides can send messages to each other at the same time—and you have to remember to close that connection when you finish.
It’s Reliable Data Transfer: This is its most famous feature. You can rely on TCP to deliver all your data to the receiving side, without errors and in the proper order. You give it a stream of bytes, and it promises to deliver the exact same stream.

A Note on Congestion: TCP also plays a good citizen role. It includes congestion control, which means it will slow down your sending process if the network between you and the receiver gets too congested. This benefits the general health of the Internet rather than your process directly, and it aims to give each connection a fair share of the bandwidth.

UDP Services (User Datagram Protocol)

UDP is the opposite: it’s a no-frills, lightweight protocol offering only minimal services.

It’s Connectionless: There is no handshaking needed at all—you just start sending data immediately.
It’s Unreliable: This is important: UDP offers no guarantee that your messages will arrive, and the messages that do arrive might be out of order.
No Congestion Control: UDP doesn’t slow down for anyone. The sender can pump data into the network at whatever rate it wants, even if that causes congestion for others (though the links themselves will still limit the speed eventually).
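
The difference in service models shows up directly in code. The sketch below, which assumes Python’s standard socket module and runs entirely over loopback, sends two messages with each protocol: with UDP there is no connection setup and each message is its own datagram, while with TCP a connection is established first and the receiver sees one ordered byte stream. The function names are illustrative, and on loopback UDP loss is unlikely, which is not true of a real network.

```python
import socket

def udp_two_messages() -> list:
    """UDP: no handshake; each sendto() travels as its own datagram (if it arrives)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as rx, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as tx:
        rx.bind(("127.0.0.1", 0))
        rx.settimeout(2.0)
        tx.sendto(b"first", rx.getsockname())   # sent immediately, no connection setup
        tx.sendto(b"second", rx.getsockname())
        return [rx.recvfrom(1024)[0], rx.recvfrom(1024)[0]]

def tcp_two_messages() -> bytes:
    """TCP: handshake first, then an ordered, reliable stream of bytes."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        # create_connection() performs the handshake before any data is sent.
        with socket.create_connection(listener.getsockname()) as client:
            conn, _ = listener.accept()
            with conn:
                client.sendall(b"first")
                client.sendall(b"second")
                client.shutdown(socket.SHUT_WR)  # tell the receiver the stream has ended
                received = b""
                while chunk := conn.recv(1024):  # read until end of stream
                    received += chunk
        return received

print(udp_two_messages())
print(tcp_two_messages())
```

The TCP receiver gets the bytes in order but without message boundaries, so an application that needs separate messages has to define its own framing on top of the stream.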

Services Not Provided by Internet Transport Protocols

We listed throughput and timing as possible services, but it’s important to know that neither TCP nor UDP provides any guaranteed throughput or timing services today.

Does that mean time-sensitive apps like VoIP can’t run? No, they run all the time! They work by being designed to cope with the lack of guarantees, tolerating some delay or loss. But when the delay gets really bad, even clever design can’t save them. The Internet is generally satisfactory for these apps, but it simply can’t offer a hard guarantee.

The following figure indicates the transport protocols used by some popular Internet applications:

Considerations on this figure:

Apps that cannot tolerate data loss—like Web, email, and file transfer—must use TCP for its reliability.
Apps that are loss-tolerant but hate delays—like Internet telephony—usually prefer UDP. They choose UDP to bypass TCP’s strict reliability features and especially to avoid the congestion control mechanism that might slow them down. (However, because many firewalls block UDP, these apps often use TCP as a backup.)

2.1.5 Application-Layer Protocols

Processes communicate by sending messages into sockets, but to do this correctly, they need to agree on the rules—this is where application-layer protocols come in.

An application-layer protocol is the formal rulebook that defines how an application’s processes, running on different computers, pass messages to each other. Specifically, it defines four things:

The Message Types: For instance, which messages are requests and which are responses.
The Syntax: The structure of the messages, including the fields and how those fields are separated.
The Semantics: The meaning of the information contained within the fields.
The Rules: When and how a process should send messages and how it should respond to incoming messages.
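
To see how these four elements fit together, consider a deliberately tiny, entirely hypothetical protocol (it is not a real standard): a one-line calculator in which the client sends "ADD <int> <int>" and the server answers "OK <int>" or "ERR <reason>". The sketch below implements only the server-side rules.

```python
def handle_request(line: str) -> str:
    """Server-side rules of a hypothetical one-line calculator protocol.

    Message types: a request ("ADD <int> <int>") and a response
                   ("OK <int>" or "ERR <reason>").
    Syntax:        ASCII fields separated by single spaces.
    Semantics:     the two request fields are the integers to add.
    Rules:         every request receives exactly one response.
    """
    fields = line.strip().split(" ")
    if len(fields) != 3 or fields[0] != "ADD":
        return "ERR bad-syntax"
    try:
        a, b = int(fields[1]), int(fields[2])
    except ValueError:
        return "ERR not-an-integer"
    return f"OK {a + b}"

print(handle_request("ADD 2 3"))
```

Any client that follows the same rules can talk to this handler, regardless of who wrote it, which is the point of agreeing on a protocol.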

Some protocols are standardized and available in the public domain (defined in RFCs). The most famous example is the Web’s protocol, HTTP (HyperText Transfer Protocol). If a browser follows the HTTP rules, it can talk to any Web server that also follows them. Other protocols, like those used by Skype, are proprietary and kept private.

Protocol vs. Application

It’s important to remember the difference between the network application and the application-layer protocol.

An application-layer protocol is only one piece of a network application.

Consider the Web as an example:

The Web is a client-server application that allows users to obtain documents from Web servers on demand. The Web application consists of many components, including document standards (HTML), browsers (Chrome), servers (Apache), and an application-layer protocol.
The Web’s application-layer protocol is HTTP, which defines the format and sequence of messages exchanged between the browser and the server. HTTP is only the protocol: it is just one piece of the larger Web application.

Similar to the Web, Netflix’s video service is the entire application, while its application-level DASH protocol is just the rule set for message exchange between the Netflix server and your client app.

2.1.6 Network Applications Covered in This Book

Instead of listing every application, the book focuses on a small group that is both pervasive and important. The discussion is organized strategically:

The Web: Covered first because it’s hugely popular and its protocol, HTTP, is straightforward and easy to understand.
Electronic Mail (E-mail): This was the Internet’s first “killer application.” It’s slightly more complex than the Web because it uses several application-layer protocols, not just one.
DNS (Directory Service): This is a piece of core network functionality—it translates domain names (names we can read) into network addresses. It’s an excellent example of how such a core service is actually implemented at the Application Layer.
P2P File Sharing and Video Streaming: These topics will finish the chapter’s focus on modern applications, including the use of Content Distribution Networks (CDNs) for video streaming.

2.2 The Web and HTTP

Before the 1990s, the Internet was mostly used by academics for things like file transfer and basic email. Then, the World Wide Web arrived. This application was so successful that it captured the attention of the general public and basically turned the Internet into the only data network that matters.

People love the Web because it operates on demand—you get the content when you want it, unlike traditional broadcast TV. The Web also allows anyone to become a publisher at a low cost, and features like hyperlinks, search engines, photos, videos, and interactive forms make it a powerful platform. In fact, many modern applications like YouTube and mobile apps like Google Maps run on top of the Web’s architecture.

2.2.1 Overview of HTTP

The Web’s heart is its application-layer protocol, the HyperText Transfer Protocol (HTTP). HTTP is implemented as two programs—a client program (your browser) and a server program (the Web server)—that exchange HTTP messages.

Before explaining HTTP in detail, we should review some Web terminology:

Web Page (also called Document): This is made up of objects.
- Object: Simply a file, such as an HTML file, a JPEG image, a video clip, etc., that can be addressed by a single URL.
- Most Web pages consist of a base HTML file and several referenced objects.
  - Example: if a Web page contains HTML text and five JPEG images, then the Web page has six objects.
The base HTML file references the other objects in the page by their URLs (Uniform Resource Locators). Each URL has two parts: the hostname of the server (e.g., www.someSchool.edu) and the object’s path name (e.g., /someDepartment/picture.gif).
Because Web Browsers implement the client side of HTTP, in the context of the web, we will use the words browser and client interchangeably.
Web Server: Implements the server side of HTTP and houses the Web objects (e.g., Apache), each addressable by a URL.
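
The two-part structure of a URL described above can be checked with Python’s standard urllib.parse module. This is only a small sketch; the URL is assembled from the hostname and path name used as examples in the terminology list.

```python
from urllib.parse import urlsplit

url = "http://www.someSchool.edu/someDepartment/picture.gif"
parts = urlsplit(url)

print(parts.netloc)    # the server part, exactly as written in the URL
print(parts.hostname)  # the same hostname, normalized to lower case
print(parts.path)      # the object's path name on that server
```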

The HTTP Interaction

HTTP defines how Web clients request Web pages from Web servers and how servers transfer Web pages to clients. The general process is straightforward: when a user asks for a page (e.g., clicks a link), the browser (client) sends HTTP request messages to the server for each object in the page. The server then receives these requests and answers with HTTP response messages that contain the requested objects.

A very important technical detail is that HTTP uses TCP as its underlying transport protocol, not UDP.

The HTTP client starts by initiating a TCP connection with the server.
Once that connection is established (through their respective sockets), the client sends its request message into its socket interface. The server receives it from its socket, and the response flows back the same way.
Because TCP provides reliable data transfer, HTTP doesn’t have to worry about crucial details like lost data, or dealing with data that arrives out of order. This is a huge benefit of the layered architecture—HTTP simply relies on TCP and the lower layers to handle those complex tasks.

A very important characteristic of HTTP is that it is a stateless protocol. The server maintains no state information about the client. If a client asks for the exact same object twice, the server doesn’t remember doing it the first time; it simply responds by sending the object again.

Remember, as described in 2.1 Principles of Network Applications, the Web uses the client-server application architecture—the server is always on and has a fixed address, and it services requests from potentially millions of different browsers.

The original version of HTTP is called HTTP/1.0. The most common version used today is HTTP/1.1, though the newer HTTP/2 is increasingly supported.

2.2.2 Non-Persistent and Persistent Connections

When a client and server need to communicate for an extended time, sending a series of requests and responses, the application developer has to make a core choice about the TCP connection:

Non-Persistent Connections: Each request/response pair is sent over a separate, brand-new TCP connection, which is closed immediately after the object is transferred.
Persistent Connections: All requests and their corresponding responses are sent over the same, single TCP connection, which stays open for a period of time.

While HTTP can be configured for either, the default mode today uses persistent connections.

HTTP with Non-Persistent Connections

Let’s trace what happens when a page, consisting of a base HTML file and 10 images, is requested from one server using non-persistent connections. Suppose the URL for the base HTML is http://www.someSchool.edu/someDepartment/home.index.

1. Connection Setup (HTML): The client process initiates a TCP connection to the server www.someSchool.edu on port 80 (the default HTTP port).
2. Request (HTML): The HTTP client sends an HTTP request message for the base HTML via its socket. The request message includes the path name /someDepartment/home.index.
3. Response (HTML): The HTTP server receives the request message via its socket, retrieves the HTML file (the object /someDepartment/home.index) from its storage, encapsulates it in an HTTP response message, and sends it to the client via its socket.
4. Connection Close (HTML): The server tells TCP to close the connection. (TCP does not actually terminate the connection until it knows that the client has received the response message intact.)
5. Parsing: The client receives the response, the connection terminates, and the client reads the HTML file, finding the 10 references to the JPEG images.
6. Repetition: Steps 1 through 4 are repeated for each of the 10 referenced JPEG images.

In a non-persistent connection, each TCP connection is closed after the server sends the object, and each non-persistent TCP connection transports exactly one request message and one response message. For this reason, for this single Web page, 11 separate TCP connections are generated (1 for the base HTML file + 10 for the ten images).

Let’s do a quick calculation to estimate the amount of time it takes to fetch a single object (like the base HTML file) using a non-persistent connection. To do this, we need to introduce the Round-Trip Time (RTT): this is the fundamental time measurement in networking. It’s the time it takes for a small packet to travel from the client to the server and then receive a response back from the server to the client. The RTT includes every possible delay along the way: the time it takes for the signal to travel across the wire (propagation delay), any time spent waiting in router buffers (queuing delay), and the time routers spend processing the packet (processing delay).

Now, consider what happens when you click a link, initiating a non-persistent TCP connection:

TCP Handshake: The browser must first establish the TCP connection, which requires a “three-way handshake”.
- The client sends a small TCP segment (the SYN message) to the server.
- The server receives this, acknowledges it, and responds with its own small TCP segment (SYNACK).
  - These first two steps (client $\to$ server, then server $\to$ client) together take one RTT.
- The client sends its HTTP request message to the server, often combined with the final acknowledgement of the handshake. This ends the three-way handshake process.
- The server processes this request and starts sending the HTML file back.
  - The time for these last two steps consumes another RTT.

Therefore, the total time required to get even the smallest object is two RTTs, plus the time it takes for the server to actually transmit the file to the client. Since a new connection—and thus a new two-RTT delay—is required for every single object on a non-persistent page, this architecture is inherently inefficient. (As a side note, browsers sometimes mitigate this by opening multiple parallel TCP connections to fetch objects simultaneously, which helps but doesn’t eliminate the RTT overhead.)
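
The two-RTT rule can be turned into a back-of-the-envelope estimate. The sketch below is a simplified model, not a measurement: it assumes every object has the same transmission time, that the base file is fetched before any referenced object, and that parallel connections do not slow each other down. The function name and the sample numbers (100 ms RTT, 10 ms transmission time) are assumptions chosen for illustration.

```python
def non_persistent_fetch_time(rtt: float, transmit: float,
                              num_objects: int, parallel: int = 1) -> float:
    """Estimated page-load time with non-persistent connections.

    Each object costs 2 * RTT (handshake, then request/response) plus its
    transmission time. `parallel` is the number of simultaneous TCP
    connections used for the referenced objects (1 means strictly serial).
    """
    per_object = 2 * rtt + transmit
    referenced = num_objects - 1            # everything except the base HTML file
    rounds = -(-referenced // parallel)     # ceiling division
    return per_object + rounds * per_object

# Base HTML file plus 10 images, times in milliseconds.
print(non_persistent_fetch_time(100, 10, 11))              # serial
print(non_persistent_fetch_time(100, 10, 11, parallel=5))  # 5 parallel connections
```

Under these assumptions, parallel connections cut the number of rounds, but each round still pays the full two RTTs.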

HTTP with Persistent Connections

The shortcomings of the non-persistent model are significant:

Server Burden: A new connection means the server has to allocate dedicated TCP buffers and maintain variables for every requested object, putting a heavy load on a busy Web server.
Delay Overhead: Every single object suffers a delay of two full RTTs just for connection setup and initial request/response.

This led to the design of HTTP/1.1 persistent connections (which is the default today):

Connection Stays Open: The key is that the server does not close the TCP connection after sending a response.
Efficiency: Subsequent requests (like for the 10 images in our example) and their responses can be sent over this same connection. An entire Web page, or even multiple pages from the same server, can be delivered over one persistent TCP connection.
Pipelining: Clients can send requests for multiple objects back-to-back, without waiting for a reply to the previous request. The server then sends the objects back in sequence, allowing for much faster delivery.
Closure: The connection is only closed after a period of inactivity (a configurable timeout interval).
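
A rough estimate shows how much a persistent connection can save. The sketch below is a simplified model under stated assumptions: equal-sized objects, a single connection, the handshake paid once, and (with pipelining) all referenced objects requested back-to-back so that they cost about one RTT in total. The function names and sample values are illustrative only.

```python
def non_persistent_time(rtt: float, transmit: float, num_objects: int) -> float:
    # Serial non-persistent baseline: every object pays 2 RTTs plus transmission.
    return num_objects * (2 * rtt + transmit)

def persistent_time(rtt: float, transmit: float,
                    num_objects: int, pipelined: bool) -> float:
    base = 2 * rtt + transmit          # handshake + request/response for the base HTML
    referenced = num_objects - 1
    if referenced == 0:
        return base
    if pipelined:
        # Requests go out back-to-back: about one RTT for all referenced objects.
        return base + rtt + referenced * transmit
    # Without pipelining: one RTT per referenced object, but no new handshakes.
    return base + referenced * (rtt + transmit)

# Base HTML file plus 10 images, times in milliseconds.
print(non_persistent_time(100, 10, 11))
print(persistent_time(100, 10, 11, pipelined=False))
print(persistent_time(100, 10, 11, pipelined=True))
```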

2.2.3 HTTP Message Format

HTTP defines two types of messages, request messages and response messages. We’ll start with the requests sent by the client (your browser).

HTTP Request Message

The most important thing to notice right away is that the entire message is written in ordinary ASCII text. That means if you captured one of these messages, you (as a computer-literate human) could easily read and understand exactly what the client is asking for.

Let’s look at a classic example of an HTTP request message:

GET /somedir/page.html HTTP/1.1
Host: www.someschool.edu
Connection: close
User-agent: Mozilla/5.0
Accept-language: fr

The very first line is called the request line. It is the most critical part of the message and contains three key fields:

The Method Field: This tells the server the specific action the client wants to perform. In this example, it’s GET, which is the most common method, used when the browser simply wants to retrieve an object. Other common methods are POST (usually for sending form data), HEAD (for requesting header information only), PUT (for uploading files), and DELETE.
The URL Field: This identifies the specific object being requested. In our example, the client wants the object /somedir/page.html.
The HTTP Version Field: This is self-explanatory; it indicates the version of the protocol the client is using, which here is HTTP/1.1.

All the subsequent lines are called header lines, and they provide supplementary information about the client and the nature of the request:

Host: www.someschool.edu: This field specifies the name of the host where the object resides. This might seem redundant since the TCP connection is already open to this host, but this information is actually required by Web proxy caches (which we’ll cover later) to correctly route the request.
Connection: close: By including this header, the browser is specifically telling the server that it prefers a non-persistent connection. The server is instructed to close the TCP connection immediately after the requested object is sent.
User-agent: Mozilla/5.0: This header identifies the browser type (the “user agent”) making the request. In this case the value is Mozilla/5.0, which the example attributes to a Firefox browser (in practice, most modern browsers send a longer string that begins with Mozilla/5.0). This is useful because a server might send slightly different versions of the same object optimized for different types of browsers.
Accept-language: fr: This is an example of a content negotiation header. It tells the server that the user prefers the French version of the object, if one is available.
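
Because the message is plain ASCII, splitting it into the request line and the header lines takes only a few lines of code. The sketch below parses the example request shown earlier; it assumes lines are terminated by a carriage return and line feed (\r\n) with a blank line after the last header, and the helper name parse_request is made up for this example.

```python
REQUEST = (
    "GET /somedir/page.html HTTP/1.1\r\n"
    "Host: www.someschool.edu\r\n"
    "Connection: close\r\n"
    "User-agent: Mozilla/5.0\r\n"
    "Accept-language: fr\r\n"
    "\r\n"
)

def parse_request(message: str):
    """Split an HTTP request into (method, url, version, headers, body)."""
    head, _, body = message.partition("\r\n\r\n")   # blank line ends the headers
    request_line, *header_lines = head.split("\r\n")
    method, url, version = request_line.split(" ")  # the three request-line fields
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return method, url, version, headers, body

method, url, version, headers, body = parse_request(REQUEST)
print(method, url, version)
print(headers)
```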

Let’s look at the general format of a request message:

Besides the fields we already examined, there is another one which is the entity body. The request message can optionally include an entity body after the headers:

With the common GET method, the entity body is empty.
The entity body is primarily used with the POST method. When a user fills out an HTML form and submits it, the data the user typed (like search terms) is put into the entity body of the POST message. The client is still requesting a page, but the content of that page depends on the submitted form data.
(Note: Forms sometimes use the GET method too; in that case, the data is just included as a long string directly within the URL itself.)
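
The two ways of submitting form data can be compared side by side. In this sketch the path /animalsearch, the field name q, and the search terms are hypothetical; urlencode from Python’s standard library produces the usual form encoding, and application/x-www-form-urlencoded is the content type HTML forms use by default.

```python
from urllib.parse import urlencode

form = {"q": "monkeys bananas"}      # what the user typed (hypothetical field name)
encoded = urlencode(form)            # spaces become '+', fields become name=value

# GET: the form data rides inside the URL; the entity body stays empty.
get_request_line = f"GET /animalsearch?{encoded} HTTP/1.1"

# POST: the same data travels in the entity body instead.
post_request = (
    "POST /animalsearch HTTP/1.1\r\n"
    "Host: www.someschool.edu\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(encoded)}\r\n"
    "\r\n"
    f"{encoded}"
)

print(get_request_line)
print(post_request)
```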

Finally, besides GET and POST, keep in mind methods like HEAD, which retrieves only the header information without the actual object (often for debugging), and PUT and DELETE, which allow clients to upload or delete objects on the server, respectively.

HTTP Response Message

A server’s reply to a client request has three main parts: the status line, the header lines, and the entity body.

Here is a typical HTTP response message:

HTTP/1.1 200 OK
Connection: close
Date: Tue, 18 Aug 2015 15:44:04 GMT
Server: Apache/2.2.3 (CentOS)
Last-Modified: Tue, 18 Aug 2015 15:11:03 GMT
Content-Length: 6821
Content-Type: text/html
(data data data data data ...)

This response message has three sections: an initial status line, six header lines, and then the entity body. Let’s look at them one by one.

The Status Line is the first line of the response, and it quickly tells the client the outcome of the request. It contains three fields:

Protocol Version: Confirms the HTTP version used by the server (e.g., HTTP/1.1).
Status Code: A three-digit number that indicates the result of the request (e.g., 200).
Status Message: A corresponding, human-readable phrase (e.g., OK, which means the server has found and is sending the requested object).

The status code is critical for the client software:

200 OK: The request succeeded, and the requested information is contained in the response.
301 Moved Permanently: The requested object has been permanently moved. The response includes the new URL in the Location: header, and the client browser will automatically fetch the object from the new address.
400 Bad Request: A generic error code, meaning the server could not understand the request message (often due to bad syntax).
404 Not Found: The server could not locate the requested document.
505 HTTP Version Not Supported: The server doesn’t support the specific HTTP version requested by the client.
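
Client software usually branches on the first digit of the status code. The short sketch below uses Python’s standard http.HTTPStatus enumeration to look up the phrase for each code listed above and to classify it; the helper name describe is an assumption for this example.

```python
from http import HTTPStatus

def describe(code: int) -> str:
    """Return 'code phrase (class)' for a standard HTTP status code."""
    status = HTTPStatus(code)          # raises ValueError for unknown codes
    if code < 200:
        kind = "informational"
    elif code < 300:
        kind = "success"
    elif code < 400:
        kind = "redirection"
    elif code < 500:
        kind = "client error"
    else:
        kind = "server error"
    return f"{code} {status.phrase} ({kind})"

for code in (200, 301, 400, 404, 505):
    print(describe(code))
```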

The subsequent six lines compose the Header Lines and provide details about the server, the object, and how the connection should be handled:

Connection: close: Just like in the request, this informs the client that the server is going to terminate the TCP connection immediately after this message is fully sent.
Date:: Indicates the exact time and date when the HTTP response was created and sent by the server (not the time the object was modified).
Server:: Identifies the specific Web server software that generated the response (e.g., Apache/2.2.3). This is the server’s equivalent to the client’s User-agent: header.
Last-Modified:: This is a very important field! It indicates the time and date when the object was last changed or created. This header is critical for object caching (allowing clients and proxies to determine if their stored copy of an object is still fresh).
Content-Length:: Specifies the size of the object in the entity body, measured in bytes.
Content-Type:: Indicates the type of object contained in the entity body (e.g., text/html for an HTML document). The client uses this official header, not the file extension, to determine the object type.

The last line is the Entity Body, and it’s the “meat” of the message; it contains the requested object itself.
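The three sections can be separated programmatically. The sketch below assumes a complete response held in memory; the `SAMPLE` message and the `parse_response` helper are illustrative, not taken from any particular library, and the sample body is a short placeholder rather than the 6821-byte object from the example.

```python
# Illustrative response: status line, header lines, blank line, entity body.
SAMPLE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Connection: close\r\n"
    b"Server: Apache/2.2.3 (CentOS)\r\n"
    b"Content-Length: 15\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<html>hi</html>"
)


def parse_response(raw: bytes):
    """Return (status_line, headers_dict, body_bytes) for a full HTTP/1.1 response."""
    # The first blank line (CRLF CRLF) separates the header section from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line, header_lines = lines[0], lines[1:]
    headers = {}
    for line in header_lines:
        # Split at the first colon only, since values such as dates contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


if __name__ == "__main__":
    status, headers, body = parse_response(SAMPLE)
    print(status)
    print(headers["content-type"], headers["content-length"], len(body))
```

Header names are lower-cased here because HTTP header field names are case-insensitive.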

Let’s see the general format of an HTTP response message:

While HTTP servers are intentionally stateless (which simplifies their design and lets them handle many simultaneous connections), websites often want to identify users, either to restrict access or to personalize content. Cookies allow sites to keep track of users.

Cookie technology involves four main components:

A Cookie header line in the HTTP request message.
A Set-cookie header line in the HTTP response message.
A cookie file stored on the user’s computer, managed by their browser.
A back-end database at the Web site (the server).

How Cookies Work (The First Visit)

Using the following picture, let’s walk through an example of how cookies work:

Let’s look at the process when a user, Susan, visits Amazon.com for the first time:

Request: Susan’s browser sends an HTTP request to Amazon.
Server Action: The Amazon server creates a unique identification number (e.g., 1678) and stores this number in its back-end database along with information about her session.
Response: The server sends the response to Susan’s browser, including the Set-cookie: header line, which contains this new identification number (e.g., Set-cookie: 1678).
Browser Action: Susan’s browser receives the response, sees the Set-cookie: header, and writes a new entry into its local cookie file, associating Amazon’s hostname with the ID 1678. (Note in the picture that the cookie file already contains an entry for eBay, because Susan has visited that site in the past.)

From this point on, every time Susan’s browser sends an HTTP request to Amazon:

The browser consults its local cookie file, retrieves the ID (1678), and inserts a Cookie: header line into the request (e.g., Cookie: 1678).
The Amazon server receives the request and uses the ID (1678) to look up Susan’s history and stored information in its back-end database.

The site can now track Susan’s activity (which pages she visited, when, and what she put in her shopping cart).
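The exchange above can be sketched with two toy classes. This is a simplified model under stated assumptions: `ToyServer` and `ToyBrowser` are hypothetical names, headers are plain dictionaries, and the cookie value is just the bare ID as in the example (real cookies carry name=value pairs and attributes).

```python
import itertools


class ToyServer:
    """Models the Web site: assigns IDs and keeps a back-end database."""

    def __init__(self, first_id=1678):
        self._next_id = itertools.count(first_id)
        self.db = {}  # back-end database: cookie ID -> list of requested paths

    def handle(self, path, request_headers):
        response_headers = {}
        cookie = request_headers.get("Cookie")
        if cookie is None or cookie not in self.db:
            # First visit: create an ID, store it, and ask the browser to remember it.
            cookie = str(next(self._next_id))
            self.db[cookie] = []
            response_headers["Set-cookie"] = cookie
        self.db[cookie].append(path)  # track activity under this ID
        return response_headers


class ToyBrowser:
    """Models the browser and its local cookie file."""

    def __init__(self):
        self.cookie_file = {}  # hostname -> cookie ID

    def get(self, host, server, path):
        headers = {}
        if host in self.cookie_file:
            headers["Cookie"] = self.cookie_file[host]
        response = server.handle(path, headers)
        if "Set-cookie" in response:
            self.cookie_file[host] = response["Set-cookie"]
        return response


if __name__ == "__main__":
    amazon, susan = ToyServer(), ToyBrowser()
    susan.get("amazon.com", amazon, "/")
    susan.get("amazon.com", amazon, "/cart")
    print(susan.cookie_file, amazon.db)
```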

Here are the pros and cons:

Benefits: Cookies create a user session layer on top of stateless HTTP, enabling features like shopping carts, personalized product recommendations, and “one-click shopping” (if the user registers their personal and payment info with that ID).
Controversy: Cookies are often considered an invasion of privacy. By combining the cookie ID with user-supplied account information (name, email), a website can build a detailed profile of a user and potentially sell that information to third parties.

2.2.5 Web Caching

A Web cache (also known as a proxy server) is a network entity that fulfills HTTP requests on behalf of the origin Web server. It has local storage where it keeps copies of recently requested objects.

A user’s browser is typically configured to direct all HTTP requests to the local Web cache first. Suppose a browser is requesting the object http://www.someschool.edu/campus.gif; here’s what happens:

Client Request: The browser establishes a TCP connection to the Web cache and sends an HTTP request for an object.
Cache Check: The cache checks its local disk storage.
- If the object is found (Cache Hit): The cache immediately sends the object back to the browser in an HTTP response.
- If the object is not found (Cache Miss): The cache acts as a client.
Cache Retrieves: The cache opens a TCP connection to the origin server (the actual server that hosts the object, that is www.someschool.edu) and sends the request.
Server Response: The origin server sends the object back to the cache.
Cache Forwards & Stores: The cache stores a copy of the object locally and simultaneously forwards the object to the original requesting browser.
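The hit/miss logic above can be sketched as follows. `ToyWebCache` is a hypothetical class, and `fetch_from_origin` stands in for the TCP connection and HTTP request to the origin server; eviction and freshness checks are deliberately left out.

```python
class ToyWebCache:
    """Serves objects from local storage, contacting the origin only on a miss."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin  # callable: url -> object bytes
        self._store = {}                 # local storage: url -> object bytes
        self.hits = 0
        self.misses = 0

    def get(self, url):
        if url in self._store:
            # Cache hit: answer immediately from local storage.
            self.hits += 1
            return self._store[url]
        # Cache miss: act as a client toward the origin server,
        # store a copy, then forward the object to the requester.
        self.misses += 1
        obj = self._fetch(url)
        self._store[url] = obj
        return obj


if __name__ == "__main__":
    def origin(url):
        print("origin contacted for", url)
        return b"GIF89a..."

    cache = ToyWebCache(origin)
    cache.get("http://www.someschool.edu/campus.gif")  # miss
    cache.get("http://www.someschool.edu/campus.gif")  # hit
    print(cache.hits, cache.misses)
```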

Note that a cache acts as both a server (to the client) and a client (to the origin server). Caches are typically installed by ISPs (Internet Service Providers) or institutions (like a university) to serve their users.

Web caching is deployed for two major reasons:

Reduced Response Time: If the connection between the client and the cache is high-speed (which is usually the case), and the object is found in the cache, the user gets the object much faster. This drastically improves the user experience, especially when the bottleneck is the connection to the public Internet.
Reduced Traffic on Access Links: By fulfilling requests locally, the cache significantly reduces the amount of traffic traveling over an institution’s expensive, lower-bandwidth link to the Internet. This saves the institution money on bandwidth upgrades.

The Power of Web Caching: A Quantitative Example

Let’s walk through the exact calculations that demonstrate why installing a Web cache is so beneficial for an institution. Imagine an institutional network (like a large university) connected to the public Internet.

Access Link Speed: The link connecting the institution to the Internet has a capacity of 15 Mbps (Megabits per second).
LAN Speed: The internal institutional network (LAN) is much faster, say, 100 Mbps.
Traffic Load:
- Average Object Size: 1 Mbit.
- Average Request Rate: 15 requests per second.
Internet Delay: The average time for a request to travel from the access link router, get processed by the origin server, and return is 2.0 seconds (this is the external delay).
LAN Delay: The delay on the high-speed LAN is considered negligible.

No Cache (The Problem)

Without a cache, every single request must travel over the slow 15 Mbps access link. We first calculate the traffic intensity on this bottleneck link. Traffic intensity measures how heavily a network link is utilized (requests × size / bandwidth):

On the LAN:

(15 requests/sec × 1 Mbit/request) ÷ 100 Mbps = 0.15 (15% utilization)

Result: Only tens of milliseconds delay
Not a problem

On the Access Link:

(15 requests/sec × 1 Mbit/request) ÷ 15 Mbps = 1.0 (100% utilization)

Result: Massive delays (minutes!)
This is the bottleneck

Traffic Intensity

Traffic Intensity tells you what fraction of the link’s capacity is being used.

How much data arrives per second?

15 requests/sec × 1 Mbit/request = 15 Mbits/sec of data

What’s the link’s maximum capacity?

15 Mbps (15 Mbits/sec)

What fraction is being used?

15 Mbits/sec ÷ 15 Mbps = 1.0 (or 100%)

When traffic intensity approaches 1.0, queuing delay grows without bound. The 15 Mbps access link is fully saturated: it is being asked to carry exactly as much traffic as its maximum capacity. The resulting response time is:

Total response time = LAN delay + Access link delay + Internet delay
Total response time ≈ negligible + ENORMOUS + 2 seconds = MINUTES

This makes the system unusable. The access link is the chokepoint that needs solving—which is where web caching comes in as a solution.

One solution is to upgrade the access link to, say, 100 Mbps. The new traffic intensity is $I_{\text{new access}} = 15\ \text{Mbps} / 100\ \text{Mbps} = 0.15$. With a traffic intensity of $0.15$, the access link delay is negligible. The total average response time would then simply be the Internet delay, which is 2.0 seconds. This works, but it requires a costly link upgrade.

With Cache (The Cost-Effective Fix)

Instead of upgrading the link, the institution installs a Web cache and keeps the access link at 15 Mbps. We assume a typical hit rate for the cache:

Cache Hit Rate: 0.4 (40% of requests are satisfied almost immediately by the cache, say, within 10 milliseconds).

Since 40% of requests are handled by the cache, only the remaining 60% need to go over the slow 15 Mbps access link to go to the origin server, so the traffic intensity on the access link is reduced from 1.0 to 0.6. A traffic intensity of 0.6 is low enough that the access link delay is now only in the range of tens of milliseconds—we can consider it negligible again (typically, a traffic intensity less than 0.8 corresponds to a small delay).

The total response time is the weighted average of the time for cache hits and the time for cache misses.

Cache Hit Time (40%): Satisfied immediately over the high-speed LAN, resulting in a negligible delay (e.g., 0.01 seconds).
Cache Miss Time (60%): Must go through the access link and the Internet. Since the access link delay is now negligible, this time is dominated by the Internet delay ($\approx 2.0\ \text{seconds}$).

$\text{Avg. Delay} = (\text{Hit Rate} \times \text{Hit Delay}) + (\text{Miss Rate} \times \text{Miss Delay})$

$\text{Avg. Delay} = (0.4 \times 0.01\ \text{sec}) + (0.6 \times 2.0\ \text{sec})$

$\text{Avg. Delay} = 0.004\ \text{sec} + 1.2\ \text{sec}$

$\text{Avg. Delay} \approx 1.2\ \text{seconds}$
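The arithmetic in this example is easy to reproduce. The sketch below uses the numbers from the scenario; the function names are illustrative, and the model ignores the small access-link delay on misses, as the text does.

```python
def traffic_intensity(requests_per_sec, object_mbits, link_mbps):
    """Fraction of link capacity in use: (requests x size) / bandwidth."""
    return requests_per_sec * object_mbits / link_mbps


def average_delay(hit_rate, hit_delay_sec, miss_delay_sec):
    """Weighted average of hit and miss delays."""
    return hit_rate * hit_delay_sec + (1 - hit_rate) * miss_delay_sec


if __name__ == "__main__":
    rate, size = 15, 1  # requests/sec, Mbits per object
    print("LAN intensity:            ", traffic_intensity(rate, size, 100))
    print("Access link, no cache:    ", traffic_intensity(rate, size, 15))
    print("Access link, upgraded:    ", traffic_intensity(rate, size, 100))
    # With a 0.4 hit rate only 60% of requests cross the access link.
    print("Access link, with cache:  ", traffic_intensity(rate * 0.6, size, 15))
    print("Average delay with cache: ", average_delay(0.4, 0.01, 2.0))
```

With these inputs the cached design averages roughly 1.2 seconds, which is better than the 2.0 seconds obtained by upgrading the link alone.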

Through the use of Content Distribution Networks (CDNs), Web caches are increasingly playing an important role in the Internet. A CDN company installs many geographically distributed caches throughout the Internet, thereby localizing much of the traffic. There are shared CDNs (such as Akamai and Limelight) and dedicated CDNs (such as Google and Netflix). We will discuss CDNs in more detail in Section 2.6.

The Conditional GET

Caching introduces a problem: the copy stored in the cache might be stale (modified at the origin server since it was cached). HTTP solves this with the conditional GET. A request is a conditional GET if it uses the GET method and includes an If-Modified-Since: header line.

Conditional GET Process:

Initial Fetch: On behalf of a requesting browser, a proxy cache sends a request message to a Web server:
```
GET /fruit/kiwi.gif HTTP/1.1
Host: www.exotiquecuisine.com
```

The Web server sends a response message with the requested object to the cache, including the object’s creation/last modification time in the Last-Modified: header:

HTTP/1.1 200 OK
Date: Sat, 3 Oct 2015 15:39:29
Server: Apache/1.3.0 (Unix)
Last-Modified: Wed, 9 Sep 2015 09:23:24
Content-Type: image/gif
(data data data data data ...)

The cache forwards the object to the requesting browser, but also caches the object locally (the cache also stores the last-modified date along with the object).
Subsequent Request: When, one week later, a new request arrives, the cache sends a conditional GET to the server, using the stored date:
```
GET /fruit/kiwi.gif HTTP/1.1
Host: www.exotiquecuisine.com
If-Modified-Since: Wed, 9 Sep 2015 09:23:24
```
Server Decision (Object Not Modified): If the object has not changed since that date, the server responds with:
```
HTTP/1.1 304 Not Modified
(empty entity body)
```
This status code 304 Not Modified tells the cache, “You’re good to go,” allowing the cache to forward its local, existing copy to the browser without wasting bandwidth by re-sending the object data.
Server Decision (Object Modified): If the object has changed, the server sends a 200 OK response with the new object data in the entity body, and the cache updates its copy.
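The server's decision can be sketched as a date comparison. This is a simplified model: `conditional_get` is a hypothetical helper, and the date strings below add a `GMT` suffix (an assumption, since the example messages above omit the time zone) so that the standard-library parser returns comparable timestamps.

```python
from email.utils import parsedate_to_datetime
from typing import Optional, Tuple


def conditional_get(last_modified: str, if_modified_since: Optional[str]) -> Tuple[int, bool]:
    """Return (status_code, send_body) for a GET on an object.

    last_modified     -- the object's Last-Modified date at the origin server
    if_modified_since -- value of the If-Modified-Since: header, or None for a plain GET
    """
    if if_modified_since is not None:
        changed_at = parsedate_to_datetime(last_modified)
        cached_at = parsedate_to_datetime(if_modified_since)
        if changed_at <= cached_at:
            return 304, False  # Not Modified: empty entity body
    return 200, True           # OK: send the (possibly new) object


if __name__ == "__main__":
    stored = "Wed, 09 Sep 2015 09:23:24 GMT"
    print(conditional_get(stored, stored))
    print(conditional_get("Thu, 10 Sep 2015 08:00:00 GMT", stored))
```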

The conditional GET ensures that caching saves bandwidth while still providing the user with the most current version of the content.

2.2.6 HTTP/2

Standardized in 2015, HTTP/2 is the first significant update to HTTP since 1997. It is widely supported by top websites and major browsers today. The primary goal of HTTP/2 is to reduce user-perceived latency by improving the transport mechanism. It achieves this through several key features:

Multiplexing: Sending multiple requests and responses concurrently over a single TCP connection.
Prioritization: Allowing clients to indicate which responses are more important.
Server Push: Allowing the server to proactively send needed objects.
Compression: Efficiently compressing HTTP header fields.

Crucially, HTTP/2 does not change the application-level semantics: the methods (GET, POST), status codes (200 OK), URLs, and most header fields remain the same. It only changes how the data is formatted and transported.

While HTTP/1.1’s persistent connection simplified things by using only one TCP connection per page, developers ran into a serious issue: Head-of-Line (HOL) blocking.

The Issue: If a large object (like a video clip) is being sent first over that single TCP connection, smaller, more critical objects behind it are blocked and delayed, especially over slow links.
HTTP/1.1 Workaround: To avoid this, HTTP/1.1 browsers “cheat” by opening multiple parallel TCP connections (often up to six) to the same server. This allows the small objects to bypass the large blocked object, reducing perceived delay.
Unintended Side Effect: Because standard TCP congestion control algorithms (like CUBIC) aim to share the bottleneck bandwidth fairly per connection, opening multiple parallel connections lets one browser compete as several flows. A browser using six connections will attempt to claim roughly six times the share of a browser using only one, because the network sees six separate, independent flows rather than one user.
- If you (User A) have 6 connections and another user (User B) has 1 connection, the total number of connections is 7.
- Under TCP’s model, your browser receives $\frac{6}{7}$ (about 86%) of the total available bandwidth, while User B receives only $\frac{1}{7}$ (about 14%).
  - This is considered unfair because your single application is intentionally taking a disproportionately large share of the common resource, degrading the performance (speed and latency) for all other users on the shared network (like users on the same Wi-Fi, the same local area network, or sharing the same connection to an ISP).
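The 6/7 versus 1/7 split follows from an idealized model in which every TCP connection gets an equal share of the bottleneck. The sketch below encodes only that assumption; real shares also depend on round-trip times, loss, and the congestion control algorithm in use.

```python
from fractions import Fraction


def bandwidth_shares(connections_per_user):
    """Idealized per-user share when each connection gets an equal slice."""
    total = sum(connections_per_user.values())
    return {user: Fraction(n, total) for user, n in connections_per_user.items()}


if __name__ == "__main__":
    shares = bandwidth_shares({"User A": 6, "User B": 1})
    for user, share in shares.items():
        print(f"{user}: {share} (about {float(share):.0%})")
```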

HTTP/2 Framing

HTTP/2’s core design goal is to eliminate the need for these parallel connections to both reduce server overhead and allow TCP congestion control to operate fairly. To do this without causing HOL blocking, it introduces framing:

Message Breakdown: Each HTTP message (request or response) is broken down into small, independent frames.
Interleaving: These frames from different requests/responses are then interleaved and sent over the single persistent TCP connection.
HOL Solved: This mechanism prevents a single large object from dominating the line.
- For example, consider a Web page consisting of one large video clip (consisting of 1000 frames) and 8 smaller objects (each consisting of 2 frames). Thus the server will receive 9 concurrent requests from any browser that wants to display this page. With frame interleaving, after sending one frame from the video clip, the first frames of each of the small objects are sent. Then after sending the second frame of the video clip, the last frames of each of the small objects are sent. Thus, all of the smaller objects are sent after sending a total of 18 frames. This significantly decreases the user-perceived delay for small, important objects.
Reassembly: Frames are then reassembled on the other end.
Binary Encoding: The frames are also binary encoded, making the protocol more efficient to parse, smaller, and less prone to errors than the ASCII-based HTTP/1.1.
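The 18-frame figure from the interleaving example can be checked with a small simulation. This sketch assumes strict round-robin scheduling, one frame per object per turn, with the video first; the function names and object labels are illustrative, and real HTTP/2 scheduling also depends on priorities and flow control.

```python
from collections import deque


def interleave(streams):
    """Round-robin one frame per stream; return the send order as (name, frame_no)."""
    queues = deque((name, deque(range(1, count + 1))) for name, count in streams)
    order = []
    while queues:
        name, frames = queues.popleft()
        order.append((name, frames.popleft()))
        if frames:  # stream still has frames left: go to the back of the line
            queues.append((name, frames))
    return order


def completion_positions(order):
    """Map each stream to the position (1-based) of its last frame in the send order."""
    last = {}
    for position, (name, _) in enumerate(order, start=1):
        last[name] = position
    return last


STREAMS = [("video", 1000)] + [(f"small{i}", 2) for i in range(1, 9)]

if __name__ == "__main__":
    done = completion_positions(interleave(STREAMS))
    small_done = max(pos for name, pos in done.items() if name != "video")
    print("all small objects delivered after frame", small_done)
    print("video delivered after frame", done["video"])
```

Sent back to back with the video first, the same small objects would have to wait behind all 1000 video frames.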

Response Message Prioritization and Server Pushing

Framing is the most important enhancement of the HTTP/2 protocol. However, HTTP/2 has other features:

Response Message Prioritization:
- Clients can assign a weight (1 to 256) to concurrent requests to the server, indicating which responses are more urgent.
- The server uses these weights (and message dependencies) to prioritize which frames to send first, optimizing overall application performance.
Server Pushing:
- The server can send multiple responses for a single client request. That is, in addition to the response to the original request, the server can push additional objects to the client, without the client having to request each one.
- This is possible because the HTML base page indicates the objects that will be needed to fully render the Web page. The server analyzes the base HTML page, identifies objects that will definitely be needed (images, CSS, etc.), and proactively pushes them to the client before receiving explicit HTTP requests for them. This eliminates the latency that would be incurred waiting for those client requests.

The Future: HTTP/3

HTTP/3 is the successor to HTTP/2 and is designed to operate over a new “transport” protocol called QUIC (which is implemented over UDP, not TCP). QUIC natively provides features like message multiplexing and low-latency connection setup, which simplifies the design of HTTP/3 by taking over many of the complex functions HTTP/2 had to implement on its own.

2.4 DNS—The Internet’s Directory Service

Human beings can be identified in many ways: by names on birth certificates, social security numbers, or driver’s license numbers. Within a given context, one identifier may be more appropriate than another. Computers at the IRS prefer fixed-length social security numbers, while ordinary people prefer the more mnemonic birth certificate names. (Can you imagine saying, “Hi. My name is 132-67-9875. Please meet my husband, 178-87-1146.”)

Similarly, Internet hosts can be identified in multiple ways. One identifier is the hostname. Hostnames such as www.facebook.com and gaia.cs.umass.edu are mnemonic and appreciated by humans but provide little information about the host’s location within the Internet. A hostname like www.eurecom.fr, ending with country code .fr, tells us the host is probably in France but doesn’t say much more. Furthermore, because hostnames consist of variable-length alphanumeric characters, they would be difficult for routers to process. For these reasons, hosts are also identified by IP addresses.

An IP address consists of four bytes with a rigid hierarchical structure, looking like 121.7.106.83, where each period separates one of the bytes expressed in decimal notation from 0 to 255. An IP address is hierarchical because scanning the address from left to right yields increasingly specific information about where the host is located in the Internet, that is, within which network in the network of networks, similar to how scanning a postal address from bottom to top yields increasingly specific information about the addressee’s location.

2.4.1 Services Provided by DNS

There are two ways to identify a host: by hostname and by IP address. People prefer the more mnemonic hostname identifier, while routers prefer fixed-length, hierarchically structured IP addresses. To reconcile these preferences, we need a directory service that translates hostnames to IP addresses. This is the main task of the Internet’s domain name system (DNS). The DNS is simultaneously:

a distributed database implemented in a hierarchy of DNS servers
an application-layer protocol that allows hosts to query the distributed database.

DNS servers are often UNIX machines running the Berkeley Internet Name Domain (BIND) software. The DNS protocol runs over UDP and uses port 53.

DNS is commonly employed by other application-layer protocols, including HTTP and SMTP, to translate user-supplied hostnames to IP addresses. Consider what happens when a browser requests the URL www.someschool.edu/index.html:

The same user machine runs the client side of the DNS application.
The browser extracts the hostname (www.someschool.edu).
The browser passes this hostname to the DNS client running on the user’s machine.
The DNS client sends a query containing the hostname to a DNS server.
The DNS client eventually receives a reply containing the IP address for the hostname.
With the IP address, the browser can finally initiate the TCP connection to the HTTP server.
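From an application's point of view, these steps reduce to extracting the hostname and calling the resolver. The sketch below uses Python's standard `urllib.parse.urlsplit` and `socket.getaddrinfo`; the helper names are illustrative, and the `http://` prefix is added as an assumption because the example URL has no scheme. Calling `resolve` requires a working network and DNS configuration.

```python
import socket
from urllib.parse import urlsplit


def extract_hostname(url: str) -> str:
    """Pull the hostname out of a URL, tolerating a missing scheme."""
    if "://" not in url:
        url = "http://" + url  # assumption: treat scheme-less URLs as HTTP
    return urlsplit(url).hostname


def resolve(hostname: str, port: int = 80):
    """Ask the operating system's DNS client for the host's IP addresses."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    # Each entry ends with a socket address whose first element is the IP address.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    host = extract_hostname("www.someschool.edu/index.html")
    print("hostname handed to the DNS client:", host)
    try:
        print("addresses:", resolve(host))
    except socket.gaierror as err:
        print("lookup failed:", err)
```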

DNS adds an additional delay—sometimes substantial—to Internet applications that use it. Fortunately, the desired IP address is often cached in a nearby DNS server, which helps reduce DNS network traffic and average DNS delay.

DNS provides other important services beyond translating hostnames to IP addresses:

Host Aliasing: A host might have a complicated canonical hostname (e.g., relay1.west-coast.enterprise.com) but can also have easier, more mnemonic alias names (e.g., enterprise.com). DNS provides the service to look up the canonical hostname and its IP address using the alias.
Mail Server Aliasing: It’s desirable for email addresses to be simple (e.g., bob@yahoo.com). However, the hostname of the Yahoo mail server is more complicated and much less mnemonic than simply yahoo.com (for example, the canonical hostname might be something like relay1.west-coast.yahoo.com). Thus, DNS uses MX records to map the simple alias (yahoo.com) to the mail server’s more complex canonical hostname (e.g., relay1.west-coast.yahoo.com). This allows the mail server and the Web server to share the same domain name alias.
Load Distribution: DNS distributes traffic among replicated servers. Busy sites like cnn.com are replicated over multiple servers, each with a different IP address. A set of IP addresses is associated with one alias hostname in the DNS database. When clients make a DNS query for a name mapped to a set of addresses, the server responds with the entire set but rotates the ordering of addresses within each reply. Because a client typically sends its HTTP request to the IP address listed first, DNS rotation distributes traffic among replicated servers.
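The rotation used for load distribution can be sketched with a deque. `RotatingRecordSet` is a hypothetical class, and the addresses come from the 192.0.2.0/24 documentation range rather than any real site.

```python
from collections import deque


class RotatingRecordSet:
    """Returns the full address set for a name, rotating the order on each reply."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def answer(self):
        reply = list(self._addresses)
        self._addresses.rotate(-1)  # next reply starts with the following address
        return reply


if __name__ == "__main__":
    records = RotatingRecordSet(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
    for _ in range(4):
        reply = records.answer()
        # A typical client sends its HTTP request to the first address in the reply.
        print("client connects to", reply[0], "from reply", reply)
```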

The DNS is specified in RFC 1034 and RFC 1035, and updated in several additional RFCs.

DNS as a Core Internet Function

DNS is a key example of the Internet design philosophy that puts complexity at the edges of the network. While it uses the client-server paradigm like applications such as HTTP, DNS itself is not an application that a user directly interacts with. Instead, it’s a core Internet function—the name-to-address translation—that supports all other user applications.

2.4.2 Overview of How DNS Works

From the perspective of an application (like a browser), DNS is a simple service: you give it a hostname, and it returns an IP address. This translation is handled by a function call (e.g., gethostbyname() on many UNIX-based machines), which sends a query message over the network using UDP datagrams to port 53. After a delay ranging from milliseconds to seconds, DNS in the user’s host receives a DNS reply message providing the desired mapping, which is then passed to the invoking application. From the perspective of the invoking application, DNS is a black box providing simple translation service. In fact, the black box is complex, consisting of a large number of DNS servers distributed around the globe and an application-layer protocol specifying how DNS servers and querying hosts communicate.

A simple design for DNS would have one DNS server containing all mappings, with clients directing all queries to this single server. Although the simplicity of this design is attractive, it is inappropriate for today’s Internet with its vast and growing number of hosts. Problems with a centralized design include:

Single Point of Failure: If the one server crashes, the entire Internet effectively stops working, as no addresses can be resolved.
Traffic Volume: A single server couldn’t handle the immense volume of DNS queries generated by hundreds of millions of hosts.
Distant Centralized Database: A single server cannot be geographically close to all clients. Queries traveling across continents would suffer significant delays.
Maintenance: The database would be massive and require constant, frequent updates for every new host, making maintenance impossible.

Consequently, DNS is distributed by design and represents a wonderful example of how a distributed database can be implemented in the Internet.

A Distributed, Hierarchical Database

To overcome the scaling problems of a single, centralized server, the DNS system is built using a massive number of servers that are organized in a strict hierarchy and spread out all over the world. No single DNS server holds all the hostname-to-IP address mappings for the entire Internet; instead, the information is distributed across the hierarchy. To a first approximation, there are three classes of DNS servers—root DNS servers, top-level domain (TLD) DNS servers, and authoritative DNS servers—organized in a hierarchy.

To understand how these three classes interact, suppose a DNS client wants to determine the IP address for www.amazon.com. The client first contacts one of the root servers, which returns IP addresses for TLD servers for the top-level domain com. The client then contacts one of these TLD servers, which returns the IP address of an authoritative server for amazon.com. Finally, the client contacts one of the authoritative servers for amazon.com, which returns the IP address for www.amazon.com.

This system is generally broken down into three main classes of servers that interact during a lookup:

Root DNS Servers: These sit at the very top of the hierarchy. There are over 1000 physical server instances globally, derived from 13 logical root servers coordinated by organizations like the Internet Assigned Numbers Authority (IANA). The Root Servers do not know the IP address for any specific host; their sole job is to provide the IP addresses of the Top-Level Domain (TLD) servers for the domain suffix being queried (e.g., they point to the servers for .com, .org, or .edu).
Top-Level Domain (TLD) Servers: For each top-level domain—including com, org, net, edu, and gov, and all country top-level domains such as uk, fr, ca, and jp—there is a TLD server or server cluster. Verisign Global Registry Services maintains the TLD servers for com, and Educause maintains the TLD servers for edu. TLD servers provide the IP addresses for authoritative DNS servers.
Authoritative DNS Servers: Every organization with publicly accessible hosts such as web servers and mail servers on the Internet must provide publicly accessible DNS records mapping the names of those hosts to IP addresses. An organization’s authoritative DNS server houses these DNS records. An organization can implement its own authoritative DNS server or pay to have these records stored in an authoritative DNS server of some service provider. Most universities and large companies implement and maintain their own primary and secondary (backup) authoritative DNS servers.

The root, TLD, and authoritative DNS servers belong to the DNS server hierarchy. There is another important type of DNS server called the local DNS server. A local DNS server does not strictly belong to the hierarchy of servers but is nevertheless central to the DNS architecture. Each ISP—such as a residential ISP or an institutional ISP—has a local DNS server (also called a default name server). When a host connects to an ISP, the ISP provides the host with the IP addresses of one or more of its local DNS servers (typically through DHCP, which is discussed in Chapter 4). You can easily determine the IP address of your local DNS server by accessing network status windows in Windows or UNIX. A host’s local DNS server is typically “close to” the host. For an institutional ISP, the local DNS server may be on the same LAN as the host; for a residential ISP, it is typically separated from the host by no more than a few routers. When a host makes a DNS query, the query is sent to the local DNS server, which acts as a proxy, forwarding the query into the DNS server hierarchy, as we’ll discuss in more detail below.

Local DNS Servers by Claude Sonnet 4.5

The following is a clarification on local DNS servers given by Claude Sonnet 4.5:

Why they’re “outside” the hierarchy: The DNS hierarchy consists of root servers → TLD servers → authoritative servers, which form a structured tree for domain name resolution. Local DNS servers don’t fit into this tree structure—they don’t have authority over any specific domain zones. Instead, they act as intermediaries or recursive resolvers that query the hierarchy on behalf of clients.

What they actually do: When your device needs to resolve a domain name, it doesn’t contact root servers directly. Instead:

Your device sends the query to your local DNS server (configured via DHCP)

The local DNS server either returns a cached answer or performs recursive queries through the DNS hierarchy

It caches the results to speed up future queries

The “closeness” aspect: The statement about local DNS servers being “close to” hosts is both topologically and geographically true:

Network proximity: Few hops away (same LAN for universities/companies, or a few routers away for home ISPs)

Lower latency: This proximity means faster query responses

Purpose: ISPs place these servers nearby to reduce DNS lookup times and decrease traffic to upstream DNS infrastructure

Modern terminology note: Today, “local DNS server” is often called a recursive resolver or DNS resolver. The term emphasizes its role: it performs recursive queries through the DNS hierarchy rather than having authoritative data itself.

For example, suppose the host cse.nyu.edu needs the IP address for gaia.cs.umass.edu (and suppose that an authoritative DNS server for it is dns.umass.edu):

Host to Local DNS: The requesting host first sends a DNS query to its Local DNS Server (e.g., dns.nyu.edu).
Local DNS to Root: The local DNS server forwards the query to a Root DNS Server.
Root to Local DNS: The Root Server notes the .edu suffix and replies to the local DNS server with the IP addresses for the TLD servers responsible for .edu.
Local DNS to TLD: The local DNS server resends the query to one of the TLD servers.
TLD to Local DNS: The TLD server notes the umass.edu suffix and replies to the local DNS server with the IP address of the Authoritative DNS Server for that domain (dns.umass.edu).
Local DNS to Authoritative: The local DNS server sends the final query directly to the Authoritative Server.
Authoritative to Local DNS: The Authoritative Server replies to the local DNS server with the desired IP address for gaia.cs.umass.edu.
Local DNS to Host: Finally, the local DNS server sends the mapping back to the original requesting host.

In this full example, eight DNS messages (four queries, four replies) were exchanged just to get a single IP address! DNS caching reduces this query traffic.
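The eight-message exchange can be traced with a toy model of the hierarchy. Everything below is an in-memory simulation under stated assumptions: the tables, the server label `tld-edu`, and the address `192.0.2.10` (from the documentation range) are placeholders, and no real DNS traffic is generated.

```python
# Toy hierarchy: each level only knows how to point one level down.
ROOT = {"edu": "tld-edu"}                                   # suffix -> TLD server
TLD = {"tld-edu": {"umass.edu": "dns.umass.edu"}}           # domain -> authoritative server
AUTHORITATIVE = {"dns.umass.edu": {"gaia.cs.umass.edu": "192.0.2.10"}}  # placeholder address


def resolve(hostname, host="cse.nyu.edu", local="dns.nyu.edu"):
    """Return (address, message_log) for one lookup through the local DNS server."""
    log = [(host, local, "query")]                          # 1: host asks local DNS
    log.append((local, "root", "query"))                    # 2: local DNS asks a root server
    tld_server = ROOT[hostname.rsplit(".", 1)[-1]]
    log.append(("root", local, "reply"))                    # 3: root returns TLD servers
    log.append((local, tld_server, "query"))                # 4: local DNS asks the TLD server
    auth_server = TLD[tld_server][".".join(hostname.split(".")[-2:])]
    log.append((tld_server, local, "reply"))                # 5: TLD returns authoritative server
    log.append((local, auth_server, "query"))               # 6: local DNS asks authoritative
    address = AUTHORITATIVE[auth_server][hostname]
    log.append((auth_server, local, "reply"))               # 7: authoritative returns mapping
    log.append((local, host, "reply"))                      # 8: local DNS answers the host
    return address, log


if __name__ == "__main__":
    address, log = resolve("gaia.cs.umass.edu")
    for number, (src, dst, kind) in enumerate(log, start=1):
        print(f"{number}. {src} -> {dst} ({kind})")
    print("resolved to", address)
```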

The previous example assumed the TLD server knows the authoritative DNS server for the hostname. In general, this is not always true. Instead, the TLD server may know only of an intermediate DNS server, which in turn knows the authoritative DNS server. For example, suppose the University of Massachusetts has a DNS server called dns.umass.edu, and each department has its own DNS server that is authoritative for all hosts in that department. In this case, when the intermediate DNS server dns.umass.edu receives a query for a host with a hostname ending with cs.umass.edu, it returns to dns.nyu.edu the IP address of dns.cs.umass.edu, which is authoritative for all hostnames ending with cs.umass.edu. The local DNS server dns.nyu.edu then sends the query to the authoritative DNS server, which returns the desired mapping to the local DNS server, which in turn returns the mapping to the requesting host. In this case, a total of 10 DNS messages are sent.

The example shown in Figure 2.19 uses both recursive queries and iterative queries. The query sent from cse.nyu.edu to dns.nyu.edu is a recursive query, since it asks dns.nyu.edu to obtain the mapping on its behalf. However, the subsequent three queries are iterative since all replies are directly returned to dns.nyu.edu. In theory, any DNS query can be iterative or recursive. For example, the following figure shows a DNS query chain where all queries are recursive:

In practice, queries typically follow the pattern in Figure 2.19: the query from the requesting host to the local DNS server is recursive, and the remaining queries are iterative.
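The recursive-then-iterative pattern can be sketched as a toy simulation that counts messages. The server labels `root` and `tld-edu`, the table contents, and the address `192.0.2.10` are made up for illustration; a real resolver exchanges DNS messages over the network rather than looking up dictionaries.

```python
# Toy model of the pattern in Figure 2.19: one recursive query from the host,
# then iterative queries from the local DNS server. All data is illustrative.

# Each "server" maps a name suffix to either a referral or a final answer.
SERVERS = {
    "root": {"edu": ("referral", "tld-edu")},
    "tld-edu": {"umass.edu": ("referral", "dns.umass.edu")},
    "dns.umass.edu": {"gaia.cs.umass.edu": ("answer", "192.0.2.10")},
}


def query(server, hostname):
    """Return the (kind, value) entry whose suffix matches the hostname."""
    for suffix, entry in SERVERS[server].items():
        if hostname == suffix or hostname.endswith("." + suffix):
            return entry
    raise LookupError(f"{server} has no entry for {hostname}")


def resolve_via_local_dns(hostname):
    """Resolve iteratively, counting every query and reply message."""
    messages = 1            # recursive query: host -> local DNS server
    server = "root"
    while True:
        kind, value = query(server, hostname)
        messages += 2       # iterative query plus its reply
        if kind == "answer":
            messages += 1   # final reply: local DNS server -> host
            return value, messages
        server = value      # follow the referral to the next server


ip, count = resolve_via_local_dns("gaia.cs.umass.edu")
print(ip, count)
```

With the three servers above the count is eight messages. Inserting one more referral (for example, an intermediate server that points to a departmental server) adds one query and one reply, which gives the ten messages mentioned earlier.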

DNS Caching

DNS Caching is a critically important feature that allows the Domain Name System to minimize delay and significantly reduce the sheer volume of query traffic bouncing across the Internet.

The concept is straightforward:

When any DNS server receives a DNS reply (containing a hostname-to-IP address mapping), it caches the mapping in its local memory.
Suppose a subsequent query for the same hostname arrives at that server before the cache expires. In that case, the DNS server can provide the correct IP address immediately from its cache, even if it is not the authoritative source for that hostname.

This simple mechanism offers huge performance advantages:

Improved Delay: If a host, such as apricot.nyu.edu, queries its local server dns.nyu.edu for the IP address of cnn.com, and a few hours later another host, kiwi.nyu.edu, queries the same local server for the same hostname, the local server can immediately provide the IP address from its cache. This eliminates the multi-step, multi-server query chain and drastically reduces the response time.
Reduced Traffic: By serving cached answers, DNS dramatically reduces the number of messages (the 8 messages we saw in the previous example) that need to be sent across the wider network.
Bypassing Roots: Local DNS servers can also cache the IP addresses of TLD servers (like those for .com or .edu). Because of this, the root DNS servers are bypassed for all but a very small fraction of DNS queries, saving immense traffic at the highest level of the hierarchy.

Since hosts and their IP addresses are not permanent, DNS servers cannot store cached information forever. Mappings are typically discarded after a period of time (often around two days) to ensure the information provided to clients remains relatively current.
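A minimal sketch of such an expiring cache follows. The class name `DnsCache`, its methods, and the injectable `clock` parameter are hypothetical and exist only for illustration; real DNS servers take the lifetime from the TTL field of each record.

```python
import time


class DnsCache:
    """Hostname -> IP cache whose entries expire after a time-to-live (TTL)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # hostname -> (ip, expiry_time)

    def put(self, hostname, ip, ttl_seconds):
        """Store a mapping that stays valid for ttl_seconds."""
        self._entries[hostname] = (ip, self._clock() + ttl_seconds)

    def get(self, hostname):
        """Return the cached IP, or None if absent or expired."""
        entry = self._entries.get(hostname)
        if entry is None:
            return None
        ip, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[hostname]   # stale: force a fresh lookup
            return None
        return ip


cache = DnsCache()
cache.put("cnn.com", "192.0.2.1", ttl_seconds=2 * 24 * 3600)  # two days
print(cache.get("cnn.com"))
```

A `None` result means the server has to run the full query chain again and re-populate the cache.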

2.4.3 DNS Records and Messages (TODO)

2.5 Peer-to-Peer File Distribution (TODO)

2.6 Video Streaming and Content Distribution Networks (CDNs)

Video streaming services like Netflix and YouTube account for an estimated 80% of all Internet traffic. These services are built using specialized application-level protocols and server infrastructures that function much like the Web caches described in Section 2.2.5.

2.6.1 Internet Video

Streaming stored video involves delivering prerecorded content (movies, shows, user-generated clips) from servers to users on demand. A video is simply a sequence of images displayed at a constant rate (e.g., 24 or 30 frames per second):

Compression and Bit Rate: Digitally encoded video can be heavily compressed, allowing developers to trade off video quality with bit rate. Higher bit rates (more bits per second) result in better image quality and a superior user experience.
High Bit Rate Requirements: Video is a high bit rate medium.
- Low-quality video: $\approx$ 100 kbps.
- High-definition movies: $\approx$ 4 Mbps (megabits per second).
- 4K streaming: $>$ 10 Mbps.
Performance Measure: The most critical performance metric for streaming is average end-to-end throughput. To guarantee continuous, uninterrupted playout, the network must deliver data to the client at an average rate that is at least as large as the compressed video’s bit rate.
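The relationship between bit rate, storage, and required throughput is simple arithmetic. The sketch below uses hypothetical helper names and the 4 Mbps figure from the list above; the two-hour duration is an assumed example.

```python
def video_size_bytes(bit_rate_bps, duration_seconds):
    """Bytes needed to store (or transfer) a video at a constant bit rate."""
    return bit_rate_bps * duration_seconds / 8


def can_play_continuously(avg_throughput_bps, bit_rate_bps):
    """Continuous playout needs average throughput >= the video's bit rate."""
    return avg_throughput_bps >= bit_rate_bps


# A two-hour movie encoded at 4 Mbps:
size = video_size_bytes(4_000_000, 2 * 3600)
print(f"{size / 1e9:.1f} GB")

# A 3 Mbps path cannot sustain a 4 Mbps stream; a 5 Mbps path can.
print(can_play_continuously(3_000_000, 4_000_000))
print(can_play_continuously(5_000_000, 4_000_000))
```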

Compression techniques allow providers to create multiple versions of the same video, each encoded at a different quality level (e.g., 300 kbps, 1 Mbps, and 3 Mbps). This lets users select the best quality their currently available bandwidth can support: a user on high-speed fiber can choose the 3 Mbps stream, while a user on a slower 3G connection might be limited to the 300 kbps stream.

2.6.2 HTTP Streaming and DASH

In the earliest form of streaming (used by platforms like early YouTube), the video is treated as a single, ordinary file stored on an HTTP server:

The client establishes a TCP connection and sends an HTTP GET request for the video’s URL.
The server sends the video file as quickly as network conditions allow within an HTTP response message.
The client collects the bytes in an application buffer.
Once the buffer exceeds a threshold, playback begins. The client is thus displaying video frames while simultaneously receiving and buffering the frames for later parts of the video.

The major shortcoming of this approach is that all clients receive the same single encoded version of the video, regardless of the significant variations in their available bandwidth, leading to poor quality or buffering for many users.
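The effect of a single encoding on clients with different bandwidth can be illustrated with a coarse simulation of the client buffer. This is a sketch under simplifying assumptions: time advances in one-second ticks, throughput is constant, and the function name and the two-second start threshold are invented for the example.

```python
def simulate_playout(throughput_bps, video_bps, duration_s, threshold_s=2):
    """Return the seconds spent waiting (startup plus rebuffering).

    Each tick the client downloads up to throughput_bps bits. Playback starts
    or resumes once threshold_s seconds of video are buffered (or the whole
    file has arrived) and pauses whenever the buffer runs dry.
    """
    if throughput_bps <= 0:
        raise ValueError("throughput must be positive")
    total_bits = video_bps * duration_s
    threshold_bits = video_bps * threshold_s
    downloaded = buffered = 0
    played_s = waiting_s = 0
    playing = False
    while played_s < duration_s:
        # One second of downloading (never more than what is left).
        fetched = min(throughput_bps, total_bits - downloaded)
        downloaded += fetched
        buffered += fetched
        # Start or resume once the threshold is met or the file is complete.
        if not playing and (buffered >= threshold_bits or downloaded == total_bits):
            playing = True
        if playing and buffered >= video_bps:
            buffered -= video_bps     # play one second of video
            played_s += 1
        else:
            playing = False           # buffer too low: pause and refill
            waiting_s += 1
    return waiting_s


# Same 1 Mbps encoding, two clients with different available bandwidth.
print(simulate_playout(3_000_000, 1_000_000, duration_s=10))
print(simulate_playout(500_000, 1_000_000, duration_s=10))
```

The client whose throughput is below the encoding's bit rate spends a large share of the session waiting, which is the problem that offering several encodings is meant to address.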

To solve the limitations of traditional HTTP streaming, a new method called Dynamic Adaptive Streaming over HTTP (DASH) was developed. DASH is characterized by two core technical mechanisms:

Multiple Encodings: The video is pre-encoded into several different versions, each with a different bit rate and corresponding quality level.
Client-Side Dynamic Adaptation: The client takes control, dynamically requesting short chunks (segments) of the video (a few seconds long) one at a time via HTTP GET requests.

DASH enables clients to adapt to both their starting bandwidth and fluctuations in bandwidth during a session (critical for mobile users):

Manifest File: The HTTP server stores all video versions, each with a different URL. It also stores a manifest file that lists the URL and bit rate for every version.
Client Initial Request: The client first requests the manifest file to learn about the available versions.
Chunk Selection: The client then begins the session, requesting one chunk at a time by specifying a URL and a byte range in an HTTP GET request.
Rate Determination: While downloading, the client continuously measures the received bandwidth and monitors its current buffer level.
Dynamic Switching:
- If the client has a high buffer level and high measured bandwidth, it selects the next chunk from a high-bitrate version (high quality).
- If the client has a low buffer level and low measured bandwidth, it switches to requesting a chunk from a low-bitrate version (low quality).

DASH allows the client to freely switch quality levels throughout the stream, ensuring continuous playout even as network conditions change.
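The selection logic can be sketched as follows. The manifest contents, the URLs, the five-second low-buffer threshold, and the 0.8 safety factor are all assumptions made for illustration; actual DASH players use their own, usually more elaborate, adaptation algorithms.

```python
# Hypothetical manifest: bit rate (bps) -> URL of that encoding.
MANIFEST = {
    300_000: "http://example.com/video_300k.mp4",
    1_000_000: "http://example.com/video_1m.mp4",
    3_000_000: "http://example.com/video_3m.mp4",
}


def choose_bit_rate(manifest, measured_bps, buffer_s,
                    low_buffer_s=5.0, safety=0.8):
    """Pick the bit rate for the next chunk.

    With a nearly empty buffer, fall back to the lowest rate to avoid a stall.
    Otherwise take the highest rate that fits within a fraction (safety) of
    the measured bandwidth.
    """
    rates = sorted(manifest)
    if buffer_s < low_buffer_s:
        return rates[0]
    budget = measured_bps * safety
    affordable = [rate for rate in rates if rate <= budget]
    return affordable[-1] if affordable else rates[0]


def chunk_request_headers(start_byte, chunk_bytes):
    """HTTP Range header selecting one chunk of the chosen version."""
    return {"Range": f"bytes={start_byte}-{start_byte + chunk_bytes - 1}"}


rate = choose_bit_rate(MANIFEST, measured_bps=5_000_000, buffer_s=20)
print(MANIFEST[rate], chunk_request_headers(0, 1_000_000))
```

The client would re-run `choose_bit_rate` before every chunk, so quality follows the measured bandwidth and buffer level over the course of the session.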

2.6.3 Content Distribution Networks (CDNs)

A Content Distribution Network (CDN) is a system of servers deployed in multiple, geographically distributed locations across the Internet. A CDN manages these servers, stores copies of content (documents, videos, images, audio), and directs each user request to the server location that can provide the best user experience (lowest delay and highest throughput).

Without a CDN, a video streaming company would have three major problems trying to stream videos from a single, massive data center:

Low End-to-End Throughput: If the client is far from the single data center, the packets cross many links. If even one link (the bottleneck) has a throughput lower than the video’s required bit rate, the user will experience annoying freezing delays. The likelihood of hitting a bottleneck increases with distance.
Wasted Bandwidth and Cost: A popular video would be sent many times over the same links leading out of the data center. This wastes network bandwidth and forces the video company to pay its ISP repeatedly for sending the same bytes over and over.
Single Point of Failure: If the single data center or its connection to the Internet goes down, the company cannot distribute any video streams.

CDNs are implemented either as private CDNs (owned by the content provider, e.g., Google/YouTube) or as third-party CDNs (serving multiple providers, e.g., Akamai). They adopt one of two main philosophies for placing their server clusters:

Enter Deep (Pioneered by Akamai)
- Philosophy: Deploy many server clusters deep inside the access networks of Internet Service Providers (ISPs) all over the world (often thousands of locations).
- Goal: To get the content as close as possible to the end users, significantly decreasing the number of links and routers between the user and the CDN server, thereby reducing user-perceived delay and increasing throughput.
- Trade-off: This highly distributed design creates a significant challenge for maintenance and management.
Bring Home (Used by Limelight)
- Philosophy: Build fewer, very large server clusters (perhaps tens of sites) in central locations, typically at Internet Exchange Points (IXPs).
- Goal: To simplify the infrastructure and reduce maintenance and management overhead.
- Trade-off: The content is further from the user, potentially resulting in higher delay and lower throughput compared to the “Enter Deep” model.

CDNs typically do not store every single video in every single cluster, since some videos are rarely viewed or are only popular in some countries. Instead, they often use a pull strategy:

If a cluster receives a request for a video it doesn’t have (a cache miss), the cluster pulls (retrieves) the video from a central repository or another cluster.
It then stores a copy locally while simultaneously streaming the video to the client.
Similar to Web caching, when a cluster’s storage is full, it removes the least frequently requested content to make room for new, popular content.
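The pull strategy with eviction of rarely requested content can be sketched as a small cache class. `ClusterCache`, its `origin` callback, and counting capacity in number of videos rather than bytes are simplifications chosen for this example, not a description of any particular CDN.

```python
from collections import Counter


class ClusterCache:
    """Pull-through cache for one CDN cluster with least-frequently-used eviction."""

    def __init__(self, capacity, origin):
        self.capacity = capacity      # maximum number of videos stored locally
        self.origin = origin          # callable: video_id -> bytes (central repository)
        self.store = {}               # video_id -> bytes
        self.requests = Counter()     # video_id -> number of requests seen

    def get(self, video_id):
        """Return (data, 'hit' or 'miss'), pulling from the origin on a miss."""
        self.requests[video_id] += 1
        if video_id in self.store:
            return self.store[video_id], "hit"
        data = self.origin(video_id)                  # cache miss: pull the video
        if len(self.store) >= self.capacity:
            # Evict the stored video with the fewest requests so far.
            victim = min(self.store, key=lambda v: self.requests[v])
            del self.store[victim]
        self.store[video_id] = data                   # keep a local copy
        return data, "miss"


cache = ClusterCache(capacity=2, origin=lambda vid: f"<bytes of {vid}>".encode())
for vid in ["a", "a", "b", "c"]:
    print(vid, cache.get(vid)[1])
print(sorted(cache.store))
```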

CDN Operation (TODO: still to be read)

The core function of a CDN is to intercept a client’s request for content and redirect it to a server cluster that is best suited to serve the content at that moment.

CDNs heavily rely on the Domain Name System (DNS) to perform request interception. Let’s trace the process using the example of a content provider, NetCinema, using the third-party CDN, KingCDN, to distribute the video http://video.netcinema.com/6Y7B23V:

Client Request: The user clicks the video link, and the client host sends a DNS query for the hostname: video.netcinema.com.
LDNS Relays: The user’s Local DNS Server (LDNS) relays this query to the Authoritative DNS Server for NetCinema.
CDN Handover (NetCinema’s DNS): The NetCinema authoritative server sees the video prefix, recognizes the request is for CDN-served content, and instead of returning an IP address, it returns an alias hostname belonging to the CDN’s domain (e.g., a1105.kingcdn.com). This hands the request over to the CDN’s infrastructure.
CDN Cluster Selection (KingCDN’s DNS): The user’s LDNS sends a second DNS query for the CDN alias hostname (a1105.kingcdn.com). This query enters KingCDN’s private DNS system, which is where the CDN determines the optimal server cluster for the client. The KingCDN DNS system then returns the IP address of that specific optimal server to the LDNS.
IP Forwarding: The LDNS forwards the content server’s IP address to the user’s host.
Direct Connection: The client host establishes a direct TCP connection with the determined CDN server and issues an HTTP GET request for the video. (If DASH is used, the server sends a manifest file, and the client requests chunks dynamically.)
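The DNS-based handover in the steps above can be traced with a toy model. The dictionaries, the LDNS address `198.51.100.7`, and the cluster addresses are invented for illustration, and the alias is modeled as a plain tuple rather than a real DNS record.

```python
# Toy data: NetCinema's authoritative server answers with an alias hostname.
NETCINEMA_DNS = {"video.netcinema.com": ("alias", "a1105.kingcdn.com")}

# Hypothetical KingCDN policy: which cluster serves clients of which LDNS.
CLUSTER_BY_LDNS = {"198.51.100.7": "192.0.2.50"}
DEFAULT_CLUSTER = "192.0.2.99"


def kingcdn_dns(hostname, ldns_ip):
    """Stand-in for KingCDN's DNS: choose a cluster based on the LDNS address."""
    return "address", CLUSTER_BY_LDNS.get(ldns_ip, DEFAULT_CLUSTER)


def resolve_for_client(hostname, ldns_ip):
    """Follow the redirection chain and return (server_ip, trace of steps)."""
    trace = [f"client -> LDNS: {hostname}"]
    kind, value = NETCINEMA_DNS[hostname]
    trace.append(f"NetCinema DNS -> LDNS: {kind} {value}")
    if kind == "alias":
        # Second query: the LDNS asks the CDN's DNS about the alias hostname.
        kind, value = kingcdn_dns(value, ldns_ip)
        trace.append(f"KingCDN DNS -> LDNS: {kind} {value}")
    trace.append(f"LDNS -> client: {value}")
    return value, trace


server_ip, steps = resolve_for_client("video.netcinema.com", "198.51.100.7")
print("\n".join(steps))
```

Note that the CDN's DNS only ever sees the LDNS address, not the client's own address, which is why cluster selection is keyed on `ldns_ip` here.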

Cluster Selection Strategies (TODO: still to be read)

The cluster selection strategy is the proprietary mechanism used by the CDN (in Step 4 above) to dynamically map the client to the best server cluster. This decision is based on the IP address of the client’s Local DNS Server (LDNS), which the CDN learns through the DNS lookup process.

Geographically Closest Strategy (Simple)
- Mechanism: The CDN uses commercial geo-location databases to map the LDNS IP address to a physical geographic location. It then selects the CDN cluster that is the fewest kilometers away from the LDNS (“as the bird flies”).
- Pros: It is a simple strategy that works reasonably well for a large number of clients.
- Cons/Challenges:
  - Network Path vs. Distance: The geographically closest cluster may not be the closest in terms of actual network distance, path length, or number of hops.
  - Remote LDNS: Some end-users are configured to use an LDNS that is far from their actual location, causing the CDN to incorrectly map the client to a distant server.
  - Ignoring Congestion: This static strategy fails to account for real-time variations in delay and available bandwidth due to Internet congestion.
Real-Time Measurement Strategies (Dynamic)
- Mechanism: To account for current traffic, CDNs can perform periodic real-time measurements of delay and loss performance between their clusters and the LDNSs worldwide. This is often done by having CDN clusters send probes (like ping messages) to the LDNSs.
- Pros: Can determine the best cluster based on current network conditions and congestion, offering a more optimized user experience.
- Cons/Challenges: Many LDNSs are configured not to respond to such probe messages, limiting the effectiveness and coverage of this approach.
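Both strategies reduce to picking a minimum over candidate clusters. In the sketch below the cluster names, coordinates, and RTT values are made-up inputs; a real CDN would obtain locations from a geo-location database and RTTs from its own probes.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))   # 6371 km: mean Earth radius


def closest_cluster(ldns_location, clusters):
    """Geographically closest strategy: minimize distance to the LDNS."""
    return min(clusters, key=lambda name: haversine_km(ldns_location, clusters[name]))


def best_measured_cluster(rtt_ms):
    """Measurement strategy: lowest RTT among clusters whose probes were answered.

    rtt_ms maps cluster name -> RTT in ms, or None when the LDNS did not reply.
    Returns None if no probe was answered, so the caller can fall back.
    """
    measured = {name: rtt for name, rtt in rtt_ms.items() if rtt is not None}
    return min(measured, key=measured.get) if measured else None


CLUSTERS = {"nyc": (40.71, -74.01), "london": (51.51, -0.13), "tokyo": (35.68, 139.69)}
ldns_in_boston = (42.36, -71.06)
print(closest_cluster(ldns_in_boston, CLUSTERS))
print(best_measured_cluster({"nyc": 80, "london": 30, "tokyo": None}))
```

The two functions can disagree for the same LDNS, which reflects the point that geographic proximity does not guarantee the best network path.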

2.7 Socket Programming: Creating Network Applications

Network applications are fundamentally built as a pair of communicating programs: a client program and a server program, which run on different end systems. When executed, they create communicating processes that send and receive data by reading from and writing to sockets. The developer’s main task is writing the code for these two programs.

Network applications fall into two categories based on their underlying protocols:

Open Applications (Standardized):
- Protocol: Operation is strictly defined in an open standard document (like an RFC).
- Interoperability: Client and server programs must conform to the RFC’s rules. This allows programs written by independent developers (e.g., a Google Chrome browser and an Apache Web server) to successfully communicate.
Proprietary Applications (Custom):
- Protocol: The application-layer protocol is not openly published.
- Development: A single developer or team creates both the client and server programs, giving them complete control over the code and protocol.
- Interoperability: Other developers cannot easily create programs that interoperate with this application because the rules are secret.

A developer creating a client-server application must make two critical decisions regarding the transport layer:

TCP vs. UDP:
- TCP (Transmission Control Protocol): Used for applications needing a connection-oriented service and a reliable byte-stream channel (guaranteed delivery, ordered data).
- UDP (User Datagram Protocol): Used for applications needing a connectionless service, sending independent data packets without any guarantees of delivery, order, or reliability.
Port Numbers:
- Standardized Protocols: Applications implementing open protocols (like HTTP or SMTP) must use the well-known port number associated with that protocol (e.g., port 80 for HTTP).
- Proprietary Applications: Developers must select a port number that avoids these well-known numbers.

We introduce UDP and TCP socket programming by way of a simple UDP application and a simple TCP application.

2.7.1 Socket Programming with UDP

In this subsection, we detail how to construct basic client-server programs using the User Datagram Protocol (UDP). Recall that UDP is connectionless and provides no reliability guarantees. Processes communicate by exchanging packets, and the application’s socket acts as a doorway between the application layer and the transport layer. The application developer controls everything on the application-layer side of the socket but has little control over the transport-layer side.

For a sending process to transmit a packet through its UDP socket, it must explicitly specify the packet’s destination address. This address is essential for network routing and process delivery. The complete destination address comprises two necessary identifiers:

Destination Host’s IP Address: Used by routers across the Internet to direct the packet to the correct physical machine (host).
Destination Socket’s Port Number: Used by the destination host’s operating system to demultiplex the packet and deliver it to the socket bound to that port.

The packet is also automatically stamped with a source address (source IP address and source port number) by the operating system kernel, which allows the receiving process to send a reply.

We will use a simple client-server application where the client sends a line of lowercase text, and the server converts it to uppercase and sends it back. We will use Python, choosing the arbitrary port 12000 for the server.

This figure highlights the main socket-related activity of the client and server that communicate over the UDP transport service:

UDP Client Code: `UDPClient.py`

This code initializes the client’s socket and sends the user’s input to the server address.

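from socket import socket, AF_INET, SOCK_DGRAM

serverName = 'localhost'   # server hostname or IP address
serverPort = 12000         # must match the port the server binds to

# Create an IPv4 (AF_INET) UDP (SOCK_DGRAM) socket; the OS picks the client port.
clientSocket = socket(AF_INET, SOCK_DGRAM)
message = input('Input lowercase sentence:')
# Convert the string to bytes and send it to the server's (host, port) address.
clientSocket.sendto(message.encode(), (serverName, serverPort))
# Block until a reply arrives; 2048 is the receive buffer size in bytes.
modifiedMessage, serverAddress = clientSocket.recvfrom(2048)
print(modifiedMessage.decode())
clientSocket.close()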

Client Code Analysis:

clientSocket = socket(AF_INET, SOCK_DGRAM): This line creates the client’s socket. AF_INET specifies the IPv4 address family, and SOCK_DGRAM signifies that it is a UDP socket. The operating system automatically assigns a temporary client port number to this socket.
clientSocket.sendto(...): This method is unique to UDP. It takes the message (which is converted from a string to bytes using .encode()) and attaches the destination tuple (serverName, serverPort) before sending the resulting datagram.
clientSocket.recvfrom(2048): This method blocks until a packet arrives. It retrieves the received data (modifiedMessage) and, importantly, captures the sender’s full return address (serverAddress), which contains both the server’s IP and port number.

UDP Server Code: `UDPServer.py`

The server must initialize first, explicitly bind to a designated port, and then loop indefinitely to handle incoming client requests.

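from socket import socket, AF_INET, SOCK_DGRAM

serverPort = 12000

serverSocket = socket(AF_INET, SOCK_DGRAM)
# Bind to port 12000 on all local interfaces ('' means any address).
serverSocket.bind(('', serverPort))
print("The server is ready to receive")
while True:
    # Block until a datagram arrives; clientAddress is the sender's (IP, port).
    message, clientAddress = serverSocket.recvfrom(2048)
    modifiedMessage = message.decode().upper()
    # Reply to the address the datagram came from.
    serverSocket.sendto(modifiedMessage.encode(), clientAddress)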

Server Code Analysis:

serverSocket.bind(('', serverPort)): This critical line explicitly assigns the port number 12000 to the server’s socket. (12000 is an arbitrary choice for this example, not a well-known port.) The empty string '' (or 0.0.0.0) instructs the host to listen on any network interface. This ensures that any packet sent to port 12000 at the server’s IP address is correctly directed to this server process.
while True:: The server enters an infinite loop, remaining available to service requests from any client.
message, clientAddress = serverSocket.recvfrom(2048): Upon receiving a datagram, the server extracts the message and the full client return address (clientAddress). This address is vital, as it provides the necessary destination for the server’s reply.
serverSocket.sendto(modifiedMessage.encode(), clientAddress): The server sends the processed data back. Because the reply is sent to the address captured in clientAddress, the operating system can correctly route the datagram back to the client’s temporary port, completing the transaction.
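To try the exchange without two terminals, the client and server roles can run in one process over the loopback interface. This is a sketch rather than a replacement for the programs above: the function names are hypothetical, the server handles a single datagram in a background thread, and binding to port 0 (instead of 12000) asks the operating system for any free port.

```python
import threading
from socket import socket, AF_INET, SOCK_DGRAM


def serve_once(server_socket):
    """Handle a single datagram: uppercase it and send it back to its source."""
    message, client_address = server_socket.recvfrom(2048)
    server_socket.sendto(message.decode().upper().encode(), client_address)


def uppercase_via_udp(text, timeout=2.0):
    """Send text through a loopback UDP server; return (reply, source_matches)."""
    with socket(AF_INET, SOCK_DGRAM) as server, socket(AF_INET, SOCK_DGRAM) as client:
        server.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
        server.settimeout(timeout)         # avoid hanging if a datagram is lost
        client.settimeout(timeout)
        worker = threading.Thread(target=serve_once, args=(server,))
        worker.start()
        client.sendto(text.encode(), server.getsockname())
        reply, server_address = client.recvfrom(2048)
        worker.join()
        # The reply's source address is the address the server socket is bound to.
        return reply.decode(), server_address == server.getsockname()


if __name__ == "__main__":
    print(uppercase_via_udp("hello udp"))
```

The timeouts matter because UDP gives no delivery guarantee: without them a lost datagram would leave `recvfrom` blocked forever.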

2.7.2 Socket Programming with TCP

Unlike UDP, TCP is a connection-oriented protocol. This means that the client and server must first handshake and establish a TCP connection before any data can be exchanged. One end of this connection is attached to the client socket, and the other is attached to a server socket. Once the TCP connection is established, either side can send data by simply dropping the data into the connection via its socket. This is a key difference from UDP, where the sender must explicitly attach a destination address to the packet.

The client has the job of initiating contact with the server, which must be running first:

Server Preparation: The server program must be running as a process and must have a special socket—a welcoming socket—ready to accept the initial contact from any client process.
Client Contact: The client initiates the TCP connection by creating its own TCP socket and specifying the address (IP address and port number) of the server’s welcoming socket.
The Handshake: The client invokes the connect() method, initiating the three-way handshake within the transport layer. This handshake is completely invisible to the client and server application programs.
New Socket Creation: During the handshake, the client process “knocks on the welcoming door.” The server, which has been waiting in the accept() method on the welcoming socket, responds by creating a new socket (e.g., connectionSocket) that is dedicated to this particular client.

From the application’s perspective, the client’s socket and the server’s connection socket are now directly connected by a pipe. The client process can send arbitrary bytes into its socket, and TCP guarantees that the server process will receive each byte in the order sent, providing a reliable service.

This figure highlights the main socket-related activity of the client and server that communicate over the TCP transport service:

We use the same simple client-server application: the client sends a line of data, and the server capitalizes and returns it.

TCP Client Code: `TCPClient.py`

The client’s code reflects the need to establish the connection before sending data.

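from socket import socket, AF_INET, SOCK_STREAM

serverName = 'localhost'   # server hostname or IP address
serverPort = 12000         # must match the port the server binds to

# Create an IPv4 (AF_INET) TCP (SOCK_STREAM) socket.
clientSocket = socket(AF_INET, SOCK_STREAM)
# Establish the connection (three-way handshake) before sending any data.
clientSocket.connect((serverName, serverPort))
sentence = input('Input lowercase sentence:')
# No destination address is needed: the socket is already connected.
clientSocket.send(sentence.encode())
# Block until up to 1024 bytes of the reply arrive.
modifiedSentence = clientSocket.recv(1024)
print('From Server: ', modifiedSentence.decode())
clientSocket.close()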

Client Code Analysis:

clientSocket = socket(AF_INET, SOCK_STREAM): The second parameter, SOCK_STREAM, indicates that this is a TCP socket. The operating system assigns the client’s port number.
clientSocket.connect((serverName, serverPort)): This line initiates the TCP connection. After it executes, the three-way handshake is complete, and the connection is established.
clientSocket.send(sentence.encode()): The client program simply drops the bytes into the TCP connection via its socket; it does not explicitly attach a destination address.
clientSocket.close(): This line closes the socket and, consequently, closes the TCP connection, causing TCP in the client to send the necessary termination message to the server’s TCP.

TCP Server Code: `TCPServer.py`

The server must bind the welcoming socket and listen for connection requests indefinitely.

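from socket import socket, AF_INET, SOCK_STREAM

serverPort = 12000

# serverSocket is the welcoming socket; it only accepts new connections.
serverSocket = socket(AF_INET, SOCK_STREAM)
serverSocket.bind(('', serverPort))
serverSocket.listen(1)   # start listening; 1 is the backlog of pending connections
print('The server is ready to receive')
while True:
    # Block until a client connects; returns a socket dedicated to that client.
    connectionSocket, addr = serverSocket.accept()
    sentence = connectionSocket.recv(1024).decode()
    capitalizedSentence = sentence.upper()
    connectionSocket.send(capitalizedSentence.encode())
    # Close only this client's connection; the welcoming socket stays open.
    connectionSocket.close()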

Server Code Analysis:

serverSocket.bind(('', serverPort)): This assigns the server port number (12000) to the welcoming socket.
serverSocket.listen(1): This command has the server listen for TCP connection requests from clients. The argument is the backlog, the number of not-yet-accepted connections the system will queue.
connectionSocket, addr = serverSocket.accept(): The server calls this method on serverSocket and blocks until a client “knocks.” The method then returns a new socket (connectionSocket) dedicated to that client. The client and server complete the handshake, and the dedicated TCP connection is established.
connectionSocket.recv(...) and connectionSocket.send(...): The server uses this new, dedicated socket to receive the client’s sentence and send the modified sentence back.
connectionSocket.close(): The server closes the dedicated connection socket. The serverSocket remains open in the while True loop, allowing it to immediately accept another client connection.
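As with UDP, both roles can be exercised in a single process over loopback. The sketch below uses hypothetical function names, serves exactly one client in a background thread, binds to port 0 so the operating system chooses a free port, and uses `sendall` (which keeps sending until every byte is handed to TCP) in place of `send`.

```python
import threading
from socket import socket, AF_INET, SOCK_STREAM


def handle_one_client(welcoming_socket):
    """Accept a single connection, uppercase what arrives, and reply."""
    connection_socket, _addr = welcoming_socket.accept()
    with connection_socket:                       # closes the dedicated socket
        sentence = connection_socket.recv(1024).decode()
        connection_socket.sendall(sentence.upper().encode())


def uppercase_via_tcp(text, timeout=2.0):
    """Send text through a loopback TCP server and return the reply."""
    with socket(AF_INET, SOCK_STREAM) as welcoming:
        welcoming.bind(("127.0.0.1", 0))          # port 0: OS picks a free port
        welcoming.listen(1)
        welcoming.settimeout(timeout)             # do not wait forever in accept()
        worker = threading.Thread(target=handle_one_client, args=(welcoming,))
        worker.start()
        with socket(AF_INET, SOCK_STREAM) as client:
            client.settimeout(timeout)
            client.connect(welcoming.getsockname())   # three-way handshake
            client.sendall(text.encode())
            reply = client.recv(1024).decode()
        worker.join()
        return reply


if __name__ == "__main__":
    print(uppercase_via_tcp("hello tcp"))
```

Because TCP is a byte stream, a single `recv(1024)` is only safe here because the message is short; longer messages would need a loop that reads until the expected amount of data has arrived.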
