The application layer sits at the top of the network protocol stack, hosting actual user-facing applications. Unlike lower layers that merely provide transport services, this layer performs real work for users. This chapter examines several critical network applications and their supporting protocols, beginning with the Domain Name System (DNS), which maps Internet names to IP addresses. The discussion then covers three major applications: electronic mail, the World Wide Web, and multimedia services including video streaming, concluding with an analysis of content distribution through peer-to-peer networks and content delivery networks.

7.3 The World Wide Web

The World Wide Web, commonly known simply as the Web, is an architectural framework for accessing linked content distributed across millions of machines throughout the Internet. In just ten years, it evolved from a tool for coordinating high-energy physics experiments in Switzerland to the application that millions of people identify as “The Internet” itself. Its enormous popularity stems from its ease of use for beginners and its rich graphical interface providing access to an enormous wealth of information on virtually every conceivable subject, from aardvarks to Zulus.

The Web began in 1989 at CERN, the European Center for Nuclear Research. The initial idea was to help large teams, often with members in a dozen or more countries and time zones, collaborate using a constantly changing collection of reports, blueprints, drawings, photos, and other documents produced by particle physics experiments. CERN physicist Tim Berners-Lee proposed a web of linked documents, and the first text-based prototype was operational eighteen months later. A public demonstration at the Hypertext ‘91 conference caught the attention of other researchers, which led Marc Andreessen at the University of Illinois to develop the first graphical browser called Mosaic, released in February 1993.

Mosaic’s popularity was so immense that a year later Andreessen left to form Netscape Communications Corp., a company whose goal was developing Web software. For the next three years, Netscape Navigator and Microsoft’s Internet Explorer engaged in a “browser war,” each trying to capture a larger share of the new market by frantically adding more features, and consequently more bugs, than the other. Through the 1990s and 2000s, Web sites and Web pages grew exponentially until there were millions of sites and billions of pages. A small number of these sites became tremendously popular, and those sites and the companies behind them largely define the Web as people experience it today. Examples include a bookstore (Amazon, started in 1994), a flea market (eBay, 1995), search (Google, 1998), and social networking (Facebook, 2004). The period through 2000, when many Web companies became worth hundreds of millions of dollars overnight only to go bust practically the next day when they turned out to be hype, has become known as the dot com era. New ideas continue to strike it rich on the Web, many originating from students. Mark Zuckerberg was a Harvard student when he started Facebook, and Sergey Brin and Larry Page were students at Stanford when they started Google.

In 1994, CERN and M.I.T. signed an agreement establishing the W3C (World Wide Web Consortium), an organization devoted to further developing the Web, standardizing protocols, and encouraging interoperability between sites. Berners-Lee became the director. Since then, several hundred universities and companies have joined the consortium. Although numerous books about the Web now exist, the best place to obtain up-to-date information about the Web is naturally on the Web itself, with the consortium’s home page at www.w3.org providing links to all of the consortium’s numerous documents and activities.

7.3.1 Architectural Overview

From the users’ perspective, the Web comprises a vast, worldwide collection of content in the form of Web pages. Each page typically contains links to hundreds of other objects, which may be hosted on any server on the Internet anywhere in the world. These objects may be text and images, but nowadays also include a wide variety of objects including advertisements and tracking scripts. A page may also link to other Web pages, and users can follow a link by clicking on it, which takes them to the page pointed to. This process can be repeated indefinitely. The idea of having one page point to another, now called hypertext, was invented by visionary M.I.T. professor of electrical engineering Vannevar Bush in 1945, long before the Internet was invented. In fact, this was before commercial computers existed, although several universities had produced crude prototypes that filled large rooms and had millions of times less computing power than a smart watch but consumed more electrical power than a small factory.

Pages are generally viewed with a program called a browser. Brave, Chrome, Edge, Firefox, Opera, and Safari are examples of popular browsers. The browser fetches the requested page, interprets the content, and displays the page, properly formatted, on the screen. The content itself may be a mix of text, images, and formatting commands in the manner of a traditional document, or other forms of content such as video or programs that produce a graphical interface for users.

Here’s an example of a Web page:

A Web page contains many objects. The index page, which the browser loads, typically contains instructions for the browser concerning the locations of other objects to assemble, as well as how and where to render those objects on the page. A piece of text, icon, graphic image, photograph, or other page element that can be associated with another page is called a hyperlink. To follow a link, a desktop or notebook computer user places the mouse cursor on the linked portion of the page area, which causes the cursor to change shape, and clicks. On a smartphone or tablet, the user taps the link. Following a link is simply a way of telling the browser to fetch another page. In the early days of the Web, links were highlighted with underlining and colored text so they would stand out. Now, page creators can use style sheets to control the appearance of many aspects of the page including hyperlinks, so links can effectively appear however the website designer wishes. The appearance of a link can even be dynamic, for example changing when the mouse passes over it. It is up to the page creators to make links visually distinct to provide a good user experience. A reader who finds a story of interest clicks on the linked area, and the browser fetches the new page and displays it. Dozens of other pages may be linked from the first page. Each of them can consist of content on the same machine as the first page or on machines halfway around the globe. The user cannot tell the difference, because the browser fetches whatever objects the user indicates through a series of clicks. Thus, moving between machines while viewing content is seamless.

The browser displays a Web page on the client machine. Each page is fetched by sending a request to one or more servers, which respond with the contents of the page. The request-response protocol for fetching pages is a simple text-based protocol that runs over TCP, just as with SMTP. It is called HTTP (HyperText Transfer Protocol). The secure version of this protocol, which is now the predominant mode of retrieving content on the Web today, is called HTTPS (Secure HyperText Transfer Protocol). The content may simply be a document read off a disk or the result of a database query and program execution. The page is a static page if it is a document that is the same every time it is displayed. In contrast, if it was generated on demand by a program or contains a program, it is a dynamic page.

A dynamic page may present itself differently each time it is displayed. For example, the front page for an electronic store may be different for each visitor. If a bookstore customer has bought mystery novels in the past, upon visiting the store’s main page, the customer is likely to see new thrillers prominently displayed, whereas a more culinary-minded customer might be greeted with new cookbooks. How the Web site keeps track of who likes what involves cookies, even for culinarily challenged visitors.

The browser contacts a number of servers to load the Web page. The content on the index page might be loaded directly from files hosted at fcc.gov. Auxiliary content, such as an embedded video, might be hosted at a separate server, still at fcc.gov, but perhaps on infrastructure dedicated to hosting the content. The index page may also contain references to other objects that the user may not even see, such as tracking scripts or advertisements hosted on third-party servers. The browser fetches all of these objects, scripts, and so forth and assembles them into a single page view for the user.

Display entails a range of processing depending on the content type. Besides rendering text and graphics, it may involve playing a video or running a script that presents its own user interface as part of the page. In this case, the fcc.gov server supplies the main page, the fonts.gstatic.com server supplies additional objects like fonts, and the google-analytics.com server supplies nothing visible to the user but tracks visitors to the site.

The Client Side

A browser is a program that displays a Web page and captures a user’s request to follow other content on the page. When an item is selected, the browser follows the hyperlink and retrieves the indicated object. When the Web was first created, three questions needed answers before a selected page could be displayed:

What is the page called?
Where is it located?
How can it be accessed?

If every page had a unique name, identifying pages would be unambiguous, but the problem would remain unsolved. Just as a Social Security number uniquely identifies a person but provides no way to find their address or determine what language to use when writing to them, the Web faces similar challenges.

The solution identifies pages in a way that solves all three problems at once. Each page is assigned a URL (Uniform Resource Locator) that serves as the page’s worldwide name. URLs have three parts:

The protocol (also known as the scheme)
The DNS name of the machine hosting the page
The path uniquely indicating the specific page, which may be a file to read or program to run. In the general case, the path has a hierarchical name modeling a file directory structure, though the interpretation is up to the server and may or may not reflect the actual directory structure.

As an example, the URL https://fcc.gov/ consists of three parts:

the protocol (https)
the DNS name (fcc.gov)
the path (/), which the Web server often treats as some default index object.
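
The three-part split can be tried out with a short Python sketch. It uses the standard library's urllib.parse module; the helper name split_url is made up for illustration, and treating an empty path as "/" is an assumption that mirrors what the server-side default index handling usually implies.

```python
from urllib.parse import urlsplit

def split_url(url):
    """Return (protocol, DNS name, path) for a URL (hypothetical helper)."""
    parts = urlsplit(url)
    # An empty path is treated as "/", the root object that a server
    # often maps to some default index page.
    path = parts.path or "/"
    return parts.scheme, parts.hostname, path

print(split_url("https://fcc.gov/"))  # ('https', 'fcc.gov', '/')
```

Real URLs can also carry a port, a query string, and a fragment, which urlsplit exposes as separate fields; they are ignored here to stay close to the three parts named above.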

When a user selects a hyperlink, the browser executes a series of steps to fetch the pointed-to page:

The browser determines the URL.
The browser asks DNS for the IP address of the server fcc.gov.
DNS replies with IP address 23.1.55.196.
The browser makes a TCP connection to that address on port 443 (the default for HTTPS; HTTP’s less-used default is port 80).
The browser sends an HTTPS request asking for the page /, which the web server typically maps to a default index page (such as index.html, index.php, or similar), as configured on fcc.gov.
The server sends the page as an HTTPS response, for example, by sending the file /index.html, if that is determined to be the default index object.
If the page includes URLs that are needed for display, the browser fetches the other URLs using the same process.
The browser displays the page /index.html.
The TCP connections are released if there are no other requests to the same servers for a short period.
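
The steps above can be approximated in a few lines of Python. This is only a rough sketch of what a browser does, not how any real browser is implemented: the function names build_get_request and fetch are invented for illustration, the request carries just two headers, and the response is returned as raw bytes without being parsed or displayed. Fetching embedded objects and connection reuse are left out.

```python
import socket
import ssl

def build_get_request(host, path="/"):
    """Build the bytes of a minimal HTTP/1.1 GET request (illustrative only)."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # ask the server to close after one response
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

def fetch(host, path="/", port=443):
    """Resolve the name, connect over TCP, wrap in TLS, send GET, read reply."""
    # Ask DNS for an address of the server.
    addr = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)[0][4]
    context = ssl.create_default_context()
    # Make a TCP connection to that address on port 443.
    with socket.create_connection(addr[:2], timeout=10) as tcp:
        # HTTPS is HTTP carried over TLS, so wrap the TCP socket.
        with context.wrap_socket(tcp, server_hostname=host) as tls:
            tls.sendall(build_get_request(host, path))
            chunks = []
            while True:
                data = tls.recv(4096)
                if not data:  # server closed the connection
                    break
                chunks.append(data)
    return b"".join(chunks)
```

Calling fetch("fcc.gov") needs network access and returns the status line, headers, and page body exactly as the server sent them.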

Many browsers display the current step in a status line at the bottom of the screen, allowing users to identify whether poor performance is due to DNS not responding, a server not responding, or slow page transmission. A more detailed way to explore and understand Web page performance is through a waterfall diagram, which shows all objects the browser loads, timing dependencies for loading each request, and operations associated with each page load such as DNS lookups, TCP connections, and content downloading. Here’s an example:

These diagrams reveal browser behavior including the number of parallel connections to any given server, whether connections are being reused, and relative time for DNS lookups versus object downloads, as well as other potential performance bottlenecks.

The URL design is open-ended, in the sense that it is straightforward for browsers to use multiple protocols to retrieve different resource types. Slightly simplified forms of the common ones are listed here:

The http protocol is the Web’s native language spoken by Web servers, with HTTP standing for HyperText Transfer Protocol.
The ftp protocol accesses files by FTP, the Internet’s file transfer protocol, which predates the Web by nearly two decades. The Web makes obtaining files from FTP servers easy by providing a simple, clickable interface instead of the older command-line interface, contributing to the Web’s spectacular growth.
The file protocol allows accessing local files as Web pages without requiring a server, though it works only for local files, not remote ones.
The mailto protocol allows users to send email from a Web browser, typically starting the user’s mail agent with the address field already filled in.
The rtsp and sip protocols establish streaming media sessions and audio and video calls.
Finally, the about protocol provides information about the browser, such as about:plugins showing MIME types handled by plug-ins, about:telemetry showing performance and user activity information the browser gathers, about:preferences showing user preferences, and about:config showing browser configuration aspects including whether DNS-over-HTTPS lookups are being performed.

URLs are designed not only for Web navigation but to run older protocols like FTP and email as well as newer protocols for audio and video, and to provide convenient access to local files and browser information. This approach makes specialized user interface programs for those other services unnecessary and integrates nearly all Internet access into a single program: the Web browser.

The Server Side

When a user types a URL or clicks hypertext, the browser parses the URL, interprets the part between https:// and the next slash as a DNS name to look up, establishes a TCP connection to port 443 on that server using the IP address, then sends a command containing the rest of the URL as the path to the page. The server returns the page for the browser to display.

To a first approximation, a simple Web server performs these steps in its main loop:

Accept a TCP connection from a client browser
Get the path to the page (the requested file name)
Get the file from disk
Send the file contents to the client
Release the TCP connection.
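
A minimal sketch of this five-step loop in Python follows. It assumes plain HTTP on a local port, a document root directory passed in by the caller, and index.html as the default object; the names parse_request_path, build_response, and serve are hypothetical. It ignores query strings, headers, and every method other than a simple fetch.

```python
import os
import socket

def parse_request_path(request_text):
    """Step 2: extract the path from a line like 'GET /index.html HTTP/1.1'."""
    request_line = request_text.split("\r\n", 1)[0]
    parts = request_line.split(" ")
    path = parts[1] if len(parts) >= 2 else "/"
    return "/index.html" if path == "/" else path  # assumed default object

def build_response(status, body):
    """Prefix the body with a small header block."""
    header = (f"HTTP/1.1 {status}\r\n"
              f"Content-Length: {len(body)}\r\n"
              "Connection: close\r\n\r\n")
    return header.encode("ascii") + body

def serve(docroot, host="127.0.0.1", port=8080):
    """Handle one request at a time, forever."""
    root = os.path.realpath(docroot)
    with socket.create_server((host, port)) as listener:
        while True:
            conn, _ = listener.accept()                 # step 1
            with conn:                                  # step 5 on exit
                text = conn.recv(4096).decode("ascii", errors="replace")
                path = parse_request_path(text)         # step 2
                full = os.path.realpath(os.path.join(root, path.lstrip("/")))
                # Refuse paths that escape the document root.
                inside = os.path.commonpath([root, full]) == root
                if inside and os.path.isfile(full):
                    with open(full, "rb") as f:
                        body = f.read()                 # step 3
                    response = build_response("200 OK", body)
                else:
                    response = build_response("404 Not Found", b"not found\n")
                conn.sendall(response)                  # step 4
```

Because the loop finishes one connection before accepting the next, a large file blocks every other client until it has been sent.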

Modern Web servers have more features, but this is what a Web server does for simple file-based content. For dynamic content, the third step may be replaced by executing a program determined from the path that generates and returns the contents.

However, Web servers that must handle hundreds or thousands of requests per second use a different design. One problem with the simple design is that accessing files is often the bottleneck, as disk reads are very slow compared to program execution and the same files may be read repeatedly from disk. Another problem is that only one request is processed at a time, so if a file is large, other requests are blocked during transfer.

One obvious improvement used by all Web servers is maintaining a cache in memory of the n most recently read files or a certain number of gigabytes of content. Before accessing disk, the server checks the cache. If the file is there, it can be served directly from memory, eliminating disk access. Although effective caching requires large amounts of main memory and extra processing time to check the cache and manage its contents, the time savings nearly always justify the overhead and expense.
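
As a rough illustration, an in-memory cache bounded by file count might look like the sketch below. The class name FileCache and the read_file callback are assumptions made for the example, and the eviction rule is least recently used (LRU), which is one common policy rather than the only one; a real server would more likely bound the cache by bytes than by number of files.

```python
from collections import OrderedDict

class FileCache:
    """Keep up to `capacity` recently used files in memory (LRU eviction)."""

    def __init__(self, capacity, read_file):
        self.capacity = capacity
        self.read_file = read_file      # function: path -> bytes (e.g. a disk read)
        self.entries = OrderedDict()    # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def get(self, path):
        if path in self.entries:
            self.entries.move_to_end(path)    # mark as most recently used
            self.hits += 1
            return self.entries[path]
        self.misses += 1
        data = self.read_file(path)           # slow path: go to disk
        self.entries[path] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used file
        return data
```

The hits and misses counters make it easy to check whether the cache is earning its memory for a given workload.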

To tackle serving more than a single request at a time, one strategy makes the server multithreaded. In one design, the server consists of a front-end module accepting all incoming requests and k processing modules, as shown in this figure:

The k + 1 threads all belong to the same process, so processing modules have access to the cache within the process’s address space. When a request comes in, the front end accepts it, builds a short record describing it, and hands the record to a processing module.

The processing module first checks the cache for the requested object. If present, it updates the record to include a pointer to the file. If not, the processing module starts a disk operation to read it into the cache, possibly discarding other cached files to make room according to a cache replacement policy such as LRU. When the file arrives from disk, it is put in the cache and sent back to the client. The advantage is that while one or more processing modules are blocked waiting for disk or network operations, other modules can actively work on other requests. With k processing modules, throughput can be as much as k times higher than with a single-threaded server, though when the disk or network is the limiting factor, multiple disks or a faster network are necessary for real improvement over the single-threaded model.
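
The front-end and processing-module split can be sketched with a queue and k threads. This is a simplified model, not a real server: requests are plain path strings instead of network connections, the handle callback stands in for the cache lookup and disk read, and the names run_server and processing_module are invented for the example.

```python
import queue
import threading

def run_server(requests, k, handle):
    """Front end queues request records; k processing threads handle them."""
    work = queue.Queue()
    results = {}
    lock = threading.Lock()

    def processing_module():
        while True:
            record = work.get()
            if record is None:          # sentinel: no more work
                break
            response = handle(record["path"])
            with lock:                  # results dict is shared by all threads
                results[record["id"]] = response

    workers = [threading.Thread(target=processing_module) for _ in range(k)]
    for w in workers:
        w.start()
    # Front end: build a short record per request and hand it off.
    for i, path in enumerate(requests):
        work.put({"id": i, "path": path})
    for _ in workers:
        work.put(None)                  # one sentinel per processing thread
    for w in workers:
        w.join()
    return [results[i] for i in range(len(requests))]

print(run_server(["/a", "/b", "/c"], 2, lambda p: p.upper()))  # ['/A', '/B', '/C']
```

While one thread is blocked inside handle, for example waiting on a disk read, the others keep pulling records from the queue, which is where the throughput gain comes from.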

Essentially all modern Web architectures are now designed with a split between the front end and a back end. The front-end Web server is often called a reverse proxy because it retrieves content from other, typically back-end, servers and serves those objects to the client. The proxy is “reverse” because it acts on behalf of servers rather than clients. When loading a Web page, a client is often first directed using DNS to a reverse proxy (front-end server), which begins returning static objects to the client’s Web browser so it can begin loading page contents quickly. While those typically static objects are loading, the back end can perform complex operations such as performing Web searches, doing database lookups, or otherwise generating dynamic content, which it serves back to the client via the reverse proxy as results and content become available.

A reverse proxy serves as a front-end server that can cache static content, distribute incoming requests across multiple processing threads, and allow dynamic content to be generated asynchronously while static content is delivered immediately.

Personal Note: Although a reverse proxy can act as a load balancer by distributing requests across different backend servers, in this case we are referring to concurrency within a single server, where multiple threads (or processing modules) handle requests independently. And I am a little bit confused because ChatGPT says that this is an internal server design choice rather than a proxy function. So, the book is wrong? I strongly doubt that the book is wrong.

Here’s an overview of the differences between Forward Proxy and Reverse Proxy by ByteByteGo.

7.3.2 Static Web Objects

The basis of the Web is transferring Web pages from server to client. In the simplest form, Web objects are static. Even on dynamic pages, substantial content like logos, style sheets, headers, and footers remains static. Static objects are files on servers that appear identically each time they’re fetched and viewed. They’re highly cacheable, often for extended periods, and frequently placed on caches near users. Being static doesn’t mean they’re inert at the browser—videos, for example, are static objects.

HTML (HyperText Markup Language), the Web’s lingua franca, allows users to create pages with text, graphics, video, and links to other pages. HTML is a markup language that describes document formatting. The term “markup” derives from copyeditors marking up documents to instruct printers which fonts to use. Markup languages contain explicit formatting commands: <b> starts boldface mode, </b> ends it, and <h1> begins a level 1 heading. LaTeX and TeX are other well-known markup languages. Microsoft Word is not a markup language because its formatting commands aren’t embedded in the text.

The key advantage of markup languages is separating content from presentation. Modern web pages use style sheets written in CSS (Cascading Style Sheets) to define typefaces, colors, sizes, padding, and other attributes of text, lists, tables, headings, ads, and page elements. Browsers simply understand markup commands and style sheets and apply them to content. Embedding and standardizing all markup commands within each HTML file enables any browser to read and reformat any page. This is crucial because a page created in a 3840 × 2160 window with 24-bit color must display properly in a 640 × 320 mobile phone window. Linear scaling would render letters unreadably small. Documents can be written with plain text editors, word processors, or HTML editors like Adobe Dreamweaver.

7.3.3 Dynamic Web Pages and Web Applications

The static page model treats pages as linked multimedia documents, which worked well as vast information went online in the Web’s early days. Today’s excitement centers on using the Web for applications and services like e-commerce, library catalogs, maps, email, and document collaboration. These applications resemble conventional software like mail readers and word processors but run inside the browser with user data stored on servers in Internet data centers. They use Web protocols to access information and the browser to display interfaces. Users don’t need to install separate programs, and data can be accessed from different computers and backed up by service operators. This successful model rivals traditional application software, aided by large providers offering free applications. It represents a prevalent form of cloud computing, moving computation from individual desktops into shared server clusters.

To act as applications, Web pages can no longer be static. A library catalog page must reflect which books are available or checked out. A stock market page should let users interact to see prices over different periods and compute profits and losses. Dynamic content can be generated by programs running on the server, in the browser, or both.

The general situation is as shown in the following figure:

For example, consider a map service that lets the user enter a street address and presents a corresponding map of the location. Given a request for a location, the Web server must use a program to create a page that shows the map for the location from a database of streets and other geographic information. This action is shown as steps 1 through 3. The request (step 1) causes a program to run on the server. The program consults a database to generate the appropriate page (step 2) and returns it to the browser (step 3). There is more to dynamic content, however. The page that is returned may itself contain programs that run in the browser. In our map example, the program would let the user find routes and explore nearby areas at different levels of detail. It would update the page, zooming in or out as directed by the user (step 4). To handle some interactions, the program may need more data from the server. In this case, the program will send a request to the server (step 5) that will retrieve more information from the database (step 6) and return a response (step 7). The program will then continue updating the page (step 4). The requests and responses happen in the background; the user may not even be aware of them because the page URL and title typically do not change. By including client-side programs, the page can present a more responsive interface than with server-side programs alone.

Server-Side Dynamic Web Page Generation

Let us look briefly at the case of server-side content generation. When a user fills out a form and submits it to buy something, the browser sends a request, along with the form contents, to the URL specified by the form. These data go to a program or script for processing. The URL identifies the program to run, and the data provide the program’s input. The returned page depends on the processing results rather than being fixed like a static page. A successful order might return the expected shipping date; an unsuccessful one might indicate that the widgets are out of stock or that the credit card is invalid.

How the server runs a program instead of retrieving a file depends on Web server design and isn’t specified by Web protocols, since the interface can be proprietary and browsers don’t need the details. Browsers simply make requests and fetch pages. Nonetheless, standard APIs have been developed for Web servers to invoke programs, making it easier for developers to extend servers with Web applications.

The first API, CGI (Common Gateway Interface), defined in RFC 3875, has handled dynamic page requests since the Web’s beginning. CGI provides an interface allowing Web servers to communicate with back-end programs and scripts that accept input from forms and generate HTML pages in response. Programs can be written in any convenient language, usually scripting languages like Python, Ruby, or Perl for ease of development. By convention, CGI programs reside in a cgi-bin directory visible in the URL. The server maps requests to this directory to a program name and executes it as a separate process, providing request data as program input. Program output gives the Web page returned to the browser.
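
To make the interface concrete, here is a small CGI-style program sketched in Python. Under CGI the server passes request details in environment variables such as QUERY_STRING, and the program writes a header block, a blank line, and then the page to standard output. The form field name item and the page text are made up for the example, and the sketch handles only query-string input, not a POST body.

```python
import html
import os
import sys
from urllib.parse import parse_qs

def render(environ):
    """Build CGI output (header, blank line, HTML body) from the environment."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    item = query.get("item", ["nothing"])[0]   # hypothetical form field
    # Escape user input before placing it in HTML.
    body = f"<html><body><p>You ordered: {html.escape(item)}</p></body></html>"
    return "Content-Type: text/html\r\n\r\n" + body

if __name__ == "__main__":
    # When run by a Web server as a CGI program, the output goes to stdout.
    sys.stdout.write(render(os.environ))
```

Placed in a server's cgi-bin directory and requested with a query string such as ?item=widget, a script along these lines would return a page naming the item.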

The second API takes a different approach, embedding small scripts inside HTML pages that the server itself executes to generate the page. PHP (PHP: Hypertext Preprocessor) is a popular language for these scripts. Servers must understand PHP, just as browsers must understand CSS for style sheets. Servers usually identify PHP pages by the php file extension rather than html or htm. PHP is simpler than CGI and widely used. Although easy to use, PHP is actually a powerful programming language for interfacing the Web and server databases. It has variables, strings, arrays, and most control structures found in C, but more powerful I/O than just printf. PHP is open source, freely available, and designed specifically to work well with Apache, the world’s most widely used open source Web server.

Client-Side Dynamic Web Page Generation

PHP and CGI scripts handle input and database interactions on the server, accepting form information, looking up database information, and generating HTML pages with results. They cannot respond to mouse movements or interact directly with users. For this, scripts embedded in HTML pages must execute on the client machine rather than the server. Starting with HTML 4.0, such scripts were permitted using the <script> tag. The current HTML standard, HTML5, includes new syntactic features for multimedia and graphical content, including <video>, <audio>, and <canvas> tags. The canvas element facilitates dynamic rendering of two-dimensional shapes and bitmap images. Interestingly, it has privacy considerations because HTML canvas properties are often unique on different devices. This uniqueness allows website operators to track users even if they delete tracking cookies and block tracking scripts.

JavaScript is the most popular client-side scripting language. Despite similar names, JavaScript has almost nothing to do with Java. Like other scripting languages, it’s very high-level—a single line can pop up a dialog box, wait for text input, and store the resulting string in a variable. Such high-level features make JavaScript ideal for interactive Web pages. However, its rapid evolution makes writing JavaScript programs that work on all platforms difficult, though it may eventually stabilize.

While PHP and JavaScript both embed code in HTML files, they’re processed completely differently:

With PHP, after users click submit, the browser collects information into a long string and sends it to the server as a request for a PHP page. The server loads the PHP file, executes the embedded script to produce a new HTML page, and sends it back to the browser for display. The browser cannot even be certain it was produced by a program. This processing is shown as steps 1 to 4 in the following figure.
With JavaScript, when submit is clicked, the browser interprets a JavaScript function contained on the page. All work is done locally inside the browser with no server contact. This processing is shown as steps 1 and 2 in the following figure. Results display virtually instantaneously, whereas PHP can have several-second delays before resulting HTML arrives at the client.

This difference doesn’t mean JavaScript is better than PHP. Their uses are completely different. PHP is used when server database interaction is needed. JavaScript and other client-side languages are used when interaction is with the user at the client computer. They can certainly be combined for complementary functionality.

7.3.4 HTTP and HTTPS

Before examining the protocol transporting Web information between servers and clients, it’s worth noting distinctions between HTTP (HyperText Transfer Protocol, specified in RFC 2616) and HTTPS (Secure HyperText Transfer Protocol). Both protocols retrieve objects essentially the same way, and the HTTP standard for retrieving Web objects evolves independently from its secure counterpart, which uses the HTTP protocol over a secure transport protocol called TLS (Transport Layer Security). This section focuses on HTTP protocol details and its evolution from early versions to the modern HTTP/3. Chapter 8 discusses TLS in detail, which effectively transports HTTP to constitute HTTPS. For this section, think of HTTPS as simply HTTP transported over TLS.

Overview

HTTP is a simple request-response protocol. Conventional HTTP versions typically run over TCP, although the most modern version, HTTP/3, runs over QUIC, a transport protocol built on UDP. It specifies what messages clients may send to servers and what responses they receive in return. Request and response headers are given in ASCII, just like SMTP, and contents are given in a MIME-like format, also like SMTP. This simple model was partly responsible for the Web’s early success because it made development and deployment straightforward.

HTTP is evolving in how it’s used on the Internet. In one sense, HTTP is an application-layer protocol because it runs on top of TCP and is closely associated with the Web, which is why we cover it in this chapter. In another sense, HTTP is becoming more like a transport protocol providing a way for processes to communicate content across different network boundaries. These processes do not have to be a Web browser and server. A media player could use HTTP to request album information from a server. Antivirus software could use HTTP to download updates. Developers could use HTTP to fetch project files. Consumer electronics like digital photo frames often use an embedded HTTP server as an interface to the outside world. Machine-to-machine communication increasingly runs over HTTP. For example, an airline server might contact a car rental server and make a reservation as part of a vacation package the airline offers.

Methods

Although HTTP was designed for Web use, it was intentionally made more general than necessary with an eye toward future object-oriented uses. For this reason, operations called methods are supported beyond just requesting a Web page.

Each request consists of one or more lines of ASCII text, with the first word on the first line being the method name. Method names are case sensitive, so GET is allowed but not get. The built-in methods are listed in the following figure:

The GET method requests the server to send the page (where “page” means “object” in the most general case, though thinking of a page as file contents suffices to understand concepts). The page is suitably encoded in MIME. The vast majority of requests to Web servers are GETs with simple syntax. The usual form is GET filename HTTP/1.1 where filename names the page to fetch and 1.1 is the protocol version.

The HEAD method requests just the message header without the actual page. This method can collect information for indexing purposes or test a URL for validity.

The POST method is used when forms are submitted. Like GET, it bears a URL, but instead of simply retrieving a page it uploads data to the server (the form contents or parameters). The server then does something with the data depending on the URL, conceptually appending the data to the object. The effect might be purchasing an item or calling a procedure. Finally, the method returns a page indicating the result.

The remaining methods aren’t used much for browsing. The PUT method is GET’s reverse: instead of reading the page, it writes it. This makes it possible to build a collection of Web pages on a remote server. The request body contains the page, possibly encoded using MIME, in which case the lines following PUT might include authentication headers proving the caller has permission to perform the operation.

DELETE does what you’d expect: it removes the page, or at least indicates the Web server agreed to remove it. As with PUT, authentication and permission play major roles.

The TRACE method is for debugging, instructing the server to send back the request. This is useful when requests aren’t being processed correctly and the client wants to know what request the server actually received.

The CONNECT method lets a user make a connection to a Web server through an intermediate device like a Web cache.

The OPTIONS method provides a way for the client to query the server for a page and obtain the methods and headers that can be used with that page.

Every request gets a response consisting of a status line and possibly additional information like all or part of a Web page. The status line contains a three-digit status code telling whether the request was satisfied and, if not, why not. The first digit divides responses into five major groups, as shown in the following figure:

The 1xx codes are rarely used in practice.
The 2xx codes mean the request was handled successfully and content (if any) is being returned.
The 3xx codes tell the client to look elsewhere, either using a different URL or in its own cache.
The 4xx codes mean the request failed due to client error like an invalid request or nonexistent page.
The 5xx errors mean the server itself has an internal problem, either due to code error or temporary overload.
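
The grouping by first digit can be illustrated with a short Python sketch. The function names and the group labels in the dictionary are illustrative choices, not part of any HTTP library; the sketch only assumes a status line of the form "version code reason".

```python
# Names of the five response groups, keyed by the first digit of the code.
# The labels are illustrative; only the digit-based grouping comes from the text.
STATUS_CLASSES = {
    1: "Information",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}

def parse_status_line(line):
    """Split a status line such as 'HTTP/1.1 404 Not Found' into its parts."""
    parts = line.strip().split(" ", 2)
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    return version, code, reason

def status_class(code):
    """Map a three-digit status code to its group using the first digit."""
    return STATUS_CLASSES.get(code // 100, "Unknown")

for line in ["HTTP/1.1 200 OK", "HTTP/1.1 304 Not Modified", "HTTP/1.1 404 Not Found"]:
    version, code, reason = parse_status_line(line)
    print(code, reason, "->", status_class(code))
```

A client can branch on the group alone (for example, treating any 5xx as a candidate for a later retry) without knowing every individual code.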

Message Headers

The request line (the line with the GET method) may be followed by additional lines with more information called request headers. This information can be compared to procedure call parameters. Responses may also have response headers. Some headers can be used in either direction. A selection of the more important ones is given in the following figure:

The User-Agent header allows the client to inform the server about its browser implementation (e.g., Mozilla/5.0 and Chrome/74.0.3729.169). This information is useful for servers to tailor responses to the browser, since different browsers can have widely varying capabilities and behaviors.

The four Accept headers tell the server what the client is willing to accept if it has a limited repertoire of what’s acceptable. The first header specifies acceptable MIME types (e.g., text/html). The second gives the character set (e.g., ISO-8859-5 or Unicode-1-1). The third deals with compression methods (e.g., gzip). The fourth indicates a natural language (e.g., Spanish). If the server has a choice of pages, it can use this information to supply what the client seeks. If it’s unable to satisfy the request, an error code is returned and the request fails.

The If-Modified-Since and If-None-Match headers are used with caching. They let the client ask for a page to be sent only if the cached copy is no longer valid.

The Host header names the server, taken from the URL. This header is mandatory because some IP addresses may serve multiple DNS names and the server needs to tell which host to hand the request to.
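
To make the wire format concrete, here is a minimal Python sketch of how a request line and request headers, including the mandatory Host header, might be serialized. The helper name build_request, the host www.example.com, and the header values are placeholders chosen for illustration; the sketch covers only body-less HTTP/1.1 requests.

```python
def build_request(method, path, host, headers=None):
    """Serialize a body-less HTTP/1.1 request as ASCII bytes."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Every line ends with CRLF; an empty line terminates the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

# Hypothetical request; host and header values are placeholders.
request = build_request(
    "GET",
    "/index.html",
    "www.example.com",
    {"User-Agent": "sketch/0.1", "Accept": "text/html"},
)
print(request.decode("ascii"))
```

Because the Host header travels inside the request, a server listening on one IP address can inspect it to decide which of several hosted sites should handle the request.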

The Authorization header is needed for protected pages. In this case, the client may have to prove it has a right to see the requested page.

The client uses the (misspelled) Referer [sic] header to give the URL that referred to the URL now requested. Most often this is the previous page’s URL. This header is particularly useful for tracking Web browsing, as it tells servers how a client arrived at the page.

Cookies are small files that servers place on client computers to remember information for later. A typical example is an e-commerce site that uses a client-side cookie to track what the client has ordered so far. Every time the client adds an item to their shopping cart, the cookie is updated to reflect the new item. Although cookies are dealt with in RFC 2109 rather than RFC 2616, they also have headers. The Set-Cookie header is how servers send cookies to clients. The client is expected to save the cookie and return it on subsequent requests using the Cookie header. (Note that a more recent specification for cookies with newer headers, RFC 2965, has been largely rejected by industry and isn’t widely implemented.)
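
The Set-Cookie and Cookie exchange can be sketched with Python's standard http.cookies module. The cookie name cart and its value are made up for the example, and a real browser also applies domain, path, and expiry rules that this sketch ignores.

```python
from http.cookies import SimpleCookie

# Hypothetical value of a Set-Cookie header as a server might send it.
set_cookie_value = "cart=item42; Path=/; Max-Age=3600"

# The client parses the header and stores the cookie with its attributes.
jar = SimpleCookie()
jar.load(set_cookie_value)
print(jar["cart"].value, jar["cart"]["path"], jar["cart"]["max-age"])

# On a later request to the same site, only name=value pairs are returned
# in the Cookie header; attributes such as Path are not echoed back.
cookie_header = "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())
print("Cookie:", cookie_header)
```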

Many other headers are used in responses. The Server header allows the server to identify its software build if it wishes. The next five headers, all starting with Content-, allow the server to describe properties of the page it’s sending.

The Last-Modified header tells when the page was last modified, and the Expires header tells how long the page will remain valid. Both headers play important roles in page caching.

The Location header is used by the server to inform the client it should try a different URL. This can be used if the page has moved or to allow multiple URLs to refer to the same page (possibly on different servers). It’s also used for companies with a main Web page in the com domain that redirect clients to national or regional pages based on their IP addresses or preferred language.

If a page is large, a small client may not want it all at once. Some servers accept requests for byte ranges, so the page can be fetched in multiple small units. The Accept-Ranges header announces the server’s willingness to handle this.
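
As a rough sketch of fetching a page in small units, the following Python helper computes the values a client could place in successive Range request headers. It assumes the standard bytes=first-last syntax with inclusive offsets; the function name and the sizes are illustrative.

```python
def byte_ranges(total_size, chunk_size):
    """Return Range header values that cover total_size bytes in chunks."""
    ranges = []
    for start in range(0, total_size, chunk_size):
        # Byte offsets are inclusive, so the last byte is end - 1.
        end = min(start + chunk_size, total_size)
        ranges.append(f"bytes={start}-{end - 1}")
    return ranges

# A 1200-byte page fetched in units of at most 500 bytes.
for value in byte_ranges(1200, 500):
    print("Range:", value)
```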

Some headers can be used in either direction. The Date header contains the time and date the message was sent, and the Range header tells the byte range of the page provided by the response. The ETag header gives a short tag serving as a name for the page content and is used for caching. The Cache-Control header gives other explicit instructions about how to cache (or more usually, how not to cache) pages.

Finally, the Upgrade header is used for switching to a new communication protocol, such as a future HTTP protocol or secure transport. It allows the client to announce what it can support and the server to assert what it’s using.

Caching

Users frequently revisit pages and related pages share common resources like navigation images, style sheets, and scripts. Caching stores fetched pages for reuse, eliminating redundant transfers. HTTP includes mechanisms to help clients determine when cached pages remain valid, reducing network traffic and latency. Pages are typically stored on disk for future browser sessions.

The core challenge is determining when a cached copy matches what would be fetched fresh. URLs alone can’t answer this—a URL for “latest news” changes frequently while one for “Greek mythology” rarely changes. HTTP employs two strategies:

Page validation checks if cached content is still fresh using the Expires header and current time. Without an Expires header, browsers apply heuristics—if a page hasn’t changed in a year (per Last-Modified header), it likely won’t change in the next hour. Such heuristics work well but aren’t foolproof.
When freshness is uncertain, clients issue a conditional GET asking servers to confirm cached validity. The If-Modified-Since header sends the cached page’s timestamp; servers respond with a short confirmation or send the full updated page.
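
The two strategies can be sketched in Python using the standard library's HTTP date helpers. The function names is_fresh and revalidate are hypothetical, the dates are fixed so the example is reproducible, and the server side is reduced to a single timestamp comparison; real caches also consult Cache-Control and ETag.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

def is_fresh(expires_header, now):
    """Strategy 1: the cached copy is usable if Expires lies in the future."""
    return now < parsedate_to_datetime(expires_header)

def revalidate(resource_last_modified, if_modified_since_header):
    """Strategy 2, server side: answer a conditional GET.

    Returns 304 (use the cached copy) if the page has not changed since the
    timestamp the client sent, and 200 (full page follows) otherwise.
    """
    since = parsedate_to_datetime(if_modified_since_header)
    return 304 if resource_last_modified <= since else 200

# Fixed, made-up timestamps for illustration.
now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
expires = format_datetime(now + timedelta(hours=1), usegmt=True)
cached_last_modified = format_datetime(now - timedelta(days=30), usegmt=True)

print(is_fresh(expires, now))                                      # still within Expires
print(revalidate(now - timedelta(days=30), cached_last_modified))  # page unchanged
print(revalidate(now - timedelta(days=1), cached_last_modified))   # page changed since
```

The client would send cached_last_modified as the value of If-Modified-Since; a 304 reply carries no page body, which is what makes revalidation cheap.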

HTTP/1 and HTTP/1.1

Browsers typically connect to servers via TCP on port 443 (HTTPS) or port 80 (HTTP). The value of using TCP is that neither browsers nor servers have to worry about how to handle long messages, reliability, or congestion control. All of these matters are handled by the TCP implementation.

HTTP/1.0 established a TCP connection, sent one request, received one response, then closed the connection. This worked for simple HTML pages but became inefficient as pages grew to include numerous embedded resources like icons—each requiring its own TCP connection.

HTTP/1.1 introduced persistent connections (connection reuse), allowing multiple request-response pairs over a single TCP connection. This amortizes TCP setup and teardown costs across requests. Additionally, request pipelining sends subsequent requests before receiving prior responses. These improvements provide speedups for two reasons: eliminating redundant connection establishments (each requiring at least one round-trip time) and avoiding TCP slow-start warmup for each transfer. Multiple short connections take disproportionately longer than one longer connection due to slow-start overhead.

Pipelining further improves performance by sending requests for embedded images as soon as the main page identifies them, reducing server idle time. However, persistent connections raise the question of when to close them. In practice, connections stay open until idle for a short period (e.g., 60 seconds) or when too many connections exist.

The performance difference between these three cases is shown in the following figure:

An alternative approach—parallel connections running multiple simultaneous TCP connections—was popular before persistent connections. While hiding some latency by parallelizing setup, this method is discouraged because each TCP connection performs independent congestion control, causing connections to compete, increase packet loss, and act more aggressively than a single connection. Persistent connections are superior, avoiding overhead without congestion issues.
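
The cases compared earlier in this subsection (a new connection per object, a persistent connection, and a persistent connection with pipelining) can be contrasted with a deliberately crude round-trip count. This Python sketch assumes one round trip for TCP setup and one per request-response exchange, and it ignores transmission time, slow start, and server processing; the function names are illustrative.

```python
def round_trips_separate(embedded):
    """New TCP connection per object: 1 RTT setup + 1 RTT exchange, each time."""
    return 2 * (embedded + 1)

def round_trips_persistent(embedded):
    """One connection setup, then one round trip per object, in sequence."""
    return 1 + (embedded + 1)

def round_trips_pipelined(embedded):
    """One setup, one round trip for the page, one for all embedded objects."""
    return 1 + 1 + (1 if embedded > 0 else 0)

embedded_objects = 4   # hypothetical page with four embedded images
rtt_ms = 100           # hypothetical round-trip time in milliseconds
for name, fn in [
    ("separate connections", round_trips_separate),
    ("persistent connection", round_trips_persistent),
    ("persistent + pipelining", round_trips_pipelined),
]:
    trips = fn(embedded_objects)
    print(f"{name}: {trips} round trips, about {trips * rtt_ms} ms")
```

Even this simplified model shows why the gap widens as pages embed more objects: the first case grows by two round trips per object, the second by one, and the third not at all.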

HTTP/2

HTTP/1.0 existed from the Web’s beginning, and HTTP/1.1 was first standardized in 1997 (RFC 2068) and revised in 1999 as RFC 2616. By 2012 it was outdated, prompting the IETF to create a working group that produced HTTP/2, starting from Google’s earlier SPDY protocol. The final specification was published as RFC 7540 in May 2015.

The working group aimed to:

Allow clients and servers to choose HTTP versions
Maintain compatibility with HTTP/1.1
Improve performance through multiplexing, pipelining, and compression
Support existing practices in browsers, servers, proxies, and delivery networks.

A key principle was backward compatibility—existing applications had to work with HTTP/2, while new ones could leverage new features for better performance. Therefore, headers, URLs, and general semantics changed little. What changed was encoding and client-server interaction patterns:

In HTTP/1.1, a client opens a TCP connection to a server, sends a text request, waits for a response, and often closes the connection. This repeats as needed to fetch an entire page.
In HTTP/2, a TCP connection is established and many requests can be sent in binary format, possibly prioritized, with the server responding in any order. Only after all requests are answered is the TCP connection closed.

An example of getting the same information (a Web page, its style sheet, and two images) in HTTP/1.1 and HTTP/2 is shown in the following figure:

HTTP/2 introduces server push, allowing servers to proactively send files they know will be needed even before the client requests them. For example, if a client requests a page and the server sees it uses a style sheet and JavaScript file, the server can send those resources before they’re requested, eliminating delays. In HTTP/1.1, multiple requests can be sent consecutively over the same TCP connection, but they must be processed and responded to in order. In HTTP/2, responses can return in any order. If image 1 is very large, the server could send image 2 first so the browser can start displaying the page before image 1 arrives—not allowed in HTTP/1.1. The server can also send the style sheet without being asked.

Beyond pipelining and multiplexing requests over the same TCP connection, HTTP/2 compresses headers and sends them in binary to reduce bandwidth usage and latency. An HTTP/2 session consists of frames, each with a separate identifier. Responses may return in different order than requests, but since each response carries the request identifier, the browser can determine which request each response corresponds to.
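
The way identifiers let a browser match interleaved responses to requests can be sketched in Python. The sketch assumes the 9-byte frame header layout from RFC 7540 (24-bit length, 8-bit type, 8-bit flags, one reserved bit, 31-bit stream identifier) and uses only DATA frames with made-up payloads; header compression, flow control, and connection setup are left out.

```python
import struct
from collections import defaultdict

DATA = 0x0        # frame type for body bytes
END_STREAM = 0x1  # flag marking the last frame of a stream

def pack_frame(frame_type, flags, stream_id, payload):
    """Prefix payload with a 9-byte HTTP/2 frame header."""
    length = len(payload)
    header = struct.pack(
        ">BHBBI",
        length >> 16, length & 0xFFFF,   # 24-bit payload length
        frame_type, flags,
        stream_id & 0x7FFFFFFF,          # reserved bit cleared
    )
    return header + payload

def unpack_frames(data):
    """Yield (type, flags, stream_id, payload) for each frame in data."""
    offset = 0
    while offset < len(data):
        hi, lo, frame_type, flags, stream_id = struct.unpack_from(">BHBBI", data, offset)
        length = (hi << 16) | lo
        start = offset + 9
        yield frame_type, flags, stream_id & 0x7FFFFFFF, data[start:start + length]
        offset = start + length

# Two responses interleaved on one connection: stream 3 finishes before stream 1.
wire = (
    pack_frame(DATA, 0, 1, b"image1-part1 ")
    + pack_frame(DATA, END_STREAM, 3, b"image2")
    + pack_frame(DATA, END_STREAM, 1, b"image1-part2")
)

bodies = defaultdict(bytes)
for frame_type, flags, stream_id, payload in unpack_frames(wire):
    bodies[stream_id] += payload   # reassemble each response by its identifier
print(dict(bodies))
```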

Encryption was contentious during HTTP/2 development. Some strongly favored it, others opposed it—particularly for Internet-of-Things applications where devices lack computing power. Ultimately, the standard didn’t require encryption, but all browsers do require it, making it de facto mandatory for Web browsing.

HTTP/3

HTTP/3 (H3) is the third major HTTP revision, designed as HTTP/2’s successor. The major distinction is its transport protocol: rather than TCP, it uses QUIC, a transport protocol that runs on top of UDP and implements congestion control in user space. HTTP/3 started as HTTP-over-QUIC and has become the latest proposed major revision. Many open-source libraries supporting client and server logic for QUIC and HTTP/3 exist in languages including C, C++, Python, Rust, and Go. Popular web servers like nginx now support HTTP/3 through patches.

The QUIC transport protocol supports stream multiplexing and per-stream flow control, similar to HTTP/2. Stream-level reliability and connection-wide congestion control can dramatically improve HTTP performance, since congestion information can be shared across sessions and reliability amortized across multiple parallel connections fetching objects. Once a connection exists to a server endpoint, HTTP/3 allows the client to reuse that connection with multiple different URLs.

HTTP/3, running HTTP over QUIC, promises many performance enhancements over HTTP/2, primarily from QUIC’s benefits for HTTP versus TCP. In some ways, QUIC could be viewed as the next generation of TCP. It offers connection setup with no additional round trips between client and server. When a previous connection has been established between client and server, zero-round-trip connection re-establishment is possible if a secret from the previous connection was cached. QUIC guarantees reliable, in-order delivery of bytes within a single stream but provides no guarantees regarding bytes on other QUIC streams. QUIC permits out-of-order delivery within a stream, though HTTP/3 doesn’t use this feature. HTTP/3 over QUIC will be performed exclusively using HTTPS—requests to the increasingly deprecated HTTP URLs won’t be upgraded to use HTTP/3.

7.3.5 Web Privacy

One of the most significant recent concerns involves privacy issues associated with Web browsing. Web sites, applications, and third parties often use HTTP mechanisms to track user behavior both within single sites and across the Internet. Additionally, attackers may exploit information side channels in browsers or devices to track users. This section describes mechanisms used to track and fingerprint individual users and devices.

Cookies

A conventional tracking method involves placing a cookie (effectively a small amount of data) on client devices, which clients then send back upon subsequent visits to various sites. When a user requests a Web object like a page, a server may place persistent state called a cookie on the user’s device using the Set-Cookie header in the HTTP response. This data is stored locally on the device. When the device visits that Web domain in the future, the HTTP request passes the cookie along with the request itself.

First-party HTTP cookies, set by the domain of the site the user intends to visit (like shopping or news sites), are useful for improving user experience on many sites. For example, cookies preserve state across a Web session, allowing sites to track useful information about ongoing user behavior, such as recent logins or shopping cart contents.

Cookies set by one domain are generally only visible to that same domain. For example, one advertising network may set a cookie on a user device, but no other third party can see it. This Web security policy, called the same-origin policy, prevents one party from reading another party’s cookie and in some sense limits how information about individual users is shared.

While first-party cookies often improve user experience, third parties such as advertisers and tracking companies can also set cookies on client devices, allowing them to track the sites users visit as they navigate across the entire Internet. This tracking works as follows: When a user visits a site, in addition to directly requested content, the device may load content from third-party sites, including advertising network domains. Loading an advertisement or script from a third party allows that party to set a unique cookie on the user’s device. That user may subsequently visit different sites that load objects from the same third party that set tracking information on a different site.

A common example involves two different sites using the same advertising network to serve ads. The advertising network would see the user’s device return the cookie it set on a different site, plus the HTTP referer request header accompanying the request to load the advertiser’s object, indicating the original site the user’s device was visiting. This practice is commonly called cross-site tracking.
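
The mechanics of cross-site tracking can be illustrated with a toy Python simulation. The AdNetwork class, the uid-N cookie format, and the site URLs are all invented for the example; it models only the two pieces of information the text mentions, the returned cookie and the Referer header.

```python
from collections import defaultdict

class AdNetwork:
    """Toy third party that serves ads embedded in many first-party sites."""

    def __init__(self):
        self.profiles = defaultdict(list)   # cookie value -> sites seen
        self._next_id = 0

    def serve_ad(self, cookie, referer):
        """Handle one ad request and return the cookie the browser should hold."""
        if cookie is None:
            # First contact: assign a unique identifier via Set-Cookie.
            self._next_id += 1
            cookie = f"uid-{self._next_id}"
        # The Referer header reveals which first-party page embedded the ad.
        self.profiles[cookie].append(referer)
        return cookie

network = AdNetwork()
browser_cookie = None   # the browser starts with no cookie for the ad network

# The same browser visits two unrelated sites that embed ads from the network.
for page in ["https://news.example/", "https://shop.example/cart"]:
    browser_cookie = network.serve_ad(browser_cookie, page)

print(browser_cookie, network.profiles[browser_cookie])
```

Neither first-party site shares anything with the other, yet the network ends up with a single profile listing both visits.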

Super cookies and other locally stored tracking identifiers that users cannot control like regular cookies allow intermediaries to track users across sites over time. Unique identifiers can include third-party tracking identifiers encoded in HTTP, specifically HSTS (HTTP Strict Transport Security) headers that aren’t cleared when users clear their cookies, and tags that intermediate third parties like mobile ISPs can insert into unencrypted Web traffic traversing a network segment. This enables third parties like advertisers to build profiles of user browsing across sites, similar to Web tracking cookies used by ad networks and application providers.

Third-Party Trackers

Web cookies from third-party domains used across many sites allow advertising networks or other third parties to track user browsing habits on any site where that tracking software is deployed (any site carrying their advertisements, sharing buttons, or other embedded code). Advertising networks and other third parties typically track user browsing patterns across the range of sites users browse, often using browser-based tracking software. Sometimes a third party develops its own tracking software (like Web analytics software), or they may use a different third-party service to collect and aggregate this behavior across sites.

Sites may permit advertising networks and other third-party trackers to operate on their site, enabling them to collect analytics data, advertise on other sites (called re-targeting), or monetize available advertising space via carefully targeted ads. Advertisers collect data about users using various tracking mechanisms including HTTP cookies, HTML5 objects, JavaScript, device fingerprinting, browser fingerprinting, and other common Web technologies. When users visit multiple sites leveraging the same advertising network, that network recognizes the user’s device, enabling them to track Web behavior over time.

Using such tracking software, a third party or advertising network can discover user interactions, social networks and contacts, likes, interests, purchases, and more. This information enables precise tracking of whether advertisements resulted in purchases, mapping of relationships between people, creation of detailed user tracking profiles, highly targeted advertising, and significantly more due to the breadth and scope of tracking.

Even when someone isn’t a registered user of a particular service (like a social media site or search engine), has ceased using that service, or has logged out, they’re often still being uniquely tracked using third-party and first-party trackers. Third-party trackers are increasingly concentrated with a few large providers.

Beyond third-party tracking with cookies, advertisers and third-party trackers can track user browsing behavior with techniques like canvas fingerprinting (a type of browser fingerprinting), session replay (whereby a third party can see a playback of every user interaction with a particular page), and even exploitation of a browser or password manager’s auto-fill feature to send back data from Web forms, often before a user fills out the form. These sophisticated technologies provide detailed information about user behavior and data, including fine-grained details like user scrolls and mouse-clicks and even sometimes the user’s username and password for a given site (which can be intentional on the user’s part or unintentional on the site’s part).

Recent studies suggest specific instances of third-party tracking software are pervasive. The same studies discovered that news sites have the largest number of tracking parties on any given first-party site, with other popular tracking categories including arts, sports, and shopping sites.

Cross-device tracking refers to linking activities of a single user across multiple devices (smartphones, tablets, desktop machines, other smart devices), aiming to track user behavior even as they use different devices.

Certain aspects of cross-device tracking may improve user experience. For example, like cookies on a single device or browser, cross-device tracking can allow users to maintain a seamless experience when moving from one device to the next (like continuing to read a book or watch a movie from where they left off). Cross-device tracking can also help prevent fraud—for example, a service provider may notice that a user has logged in from an unfamiliar device in a completely new location. When a user attempts login from an unrecognized device, a service provider can take additional authentication steps like two-factor authentication.

Cross-device tracking is most common by first-party services like email service providers, content providers (streaming video services), and commerce sites, but third parties are also becoming increasingly adept at tracking users across devices. Cross-device tracking may be deterministic, based on a persistent identifier like a login tied to a specific user, or probabilistic. The IP address is one example of a probabilistic identifier for implementing cross-device tracking. For example, network address translation can cause multiple devices on a network to have the same public IP address. Suppose a user visits a site from a mobile device like a smartphone and uses that device at both home and work. A third party can set IP address information in the device’s cookies. That user may then appear from two public IP addresses (one at work, one at home), and those two IP addresses may be linked by the same third-party cookie. If the user then visits that third party from different devices sharing either of those IP addresses, those additional devices can be linked to the same user with high confidence.
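
The IP-based linking described above can be sketched as a small Python example. The device labels, the documentation-range IP addresses, and the function name linked_devices are hypothetical, and the sketch links any two devices that ever shared a public address, which a real tracker would weigh against other signals since unrelated devices behind the same NAT would also match.

```python
from collections import defaultdict

# Hypothetical (device cookie, public IP address) pairs seen by a third party.
observations = [
    ("phone", "203.0.113.5"),     # phone at home
    ("phone", "198.51.100.7"),    # same phone at work
    ("laptop", "203.0.113.5"),    # shares the home address
    ("work-pc", "198.51.100.7"),  # shares the work address
    ("stranger", "192.0.2.9"),    # unrelated address
]

def linked_devices(seed, observations):
    """Return devices that shared at least one public IP address with seed."""
    ips_by_device = defaultdict(set)
    devices_by_ip = defaultdict(set)
    for device, ip in observations:
        ips_by_device[device].add(ip)
        devices_by_ip[ip].add(device)
    candidates = set()
    for ip in ips_by_device[seed]:
        candidates |= devices_by_ip[ip]
    candidates.discard(seed)
    return candidates

print(sorted(linked_devices("phone", observations)))
```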

Cross-device tracking often uses a combination of deterministic and probabilistic techniques, many not requiring the user to be logged into any site to enable tracking. For example, some parties offer analytics services that, when embedded across many first-party sites, allow the third party to track a user across sites and devices. Third parties often work together to track users across devices and services using a practice called cookie syncing.

Cross-device tracking enables more sophisticated inference of higher-level user activities, since data from different devices can be combined to build a more comprehensive picture of individual user activity. For example, data about a user’s location (collected from a mobile device) can be combined with a user’s search history and social network activity (such as likes) to determine whether a user has physically visited a store following an online search or online advertising exposure.

Device and Browser Fingerprinting

Even when users disable common tracking mechanisms like third-party cookies, sites and third parties can still track users based on environmental, contextual, and device information that the device returns to the server. Based on a collection of this information, a third party may be able to uniquely identify or fingerprint a user across different sites and over time.
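
A minimal Python sketch shows how a collection of individually unremarkable attributes can be reduced to one stable identifier. The attribute names and values are invented, and hashing a sorted attribute list with SHA-256 is just one simple way to do it; actual trackers gather these values in the browser, typically through JavaScript.

```python
import hashlib

def fingerprint(attributes):
    """Hash a dictionary of device attributes into a short identifier."""
    # Sort by key so the result does not depend on collection order.
    canonical = "\n".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

# Hypothetical attributes a device might reveal to a server.
device_a = {
    "user_agent": "Mozilla/5.0",
    "screen": "1920x1080",
    "timezone": "UTC+1",
    "fonts": "Arial,Helvetica,Times",
}
device_b = dict(device_a, screen="1366x768")

print(fingerprint(device_a))
print(fingerprint(device_b))
```

No cookie is stored on the device: the same attributes yield the same identifier on every visit, which is why clearing cookies does not defeat this form of tracking.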

One well-known fingerprinting method is canvas fingerprinting, whereby the HTML canvas is used to identify a device. The HTML canvas allows a Web application to draw graphics in real time. Differences in font rendering, smoothing, dimensions, and other features may cause each device to draw an image differently, and the resulting pixels can serve as a device fingerprint. The technique was first discovered in 2012 but not brought to public attention until 2014. Although there was a backlash at that time, many trackers continue to use canvas fingerprinting and related techniques like canvas font fingerprinting, which identifies a device based on the browser’s font list. Recent studies found these techniques are still present on thousands of sites. Sites can also use browser APIs to retrieve other information for tracking devices, including information like battery status, which can be used to track a user based on battery charge level and discharge time. Other reports describe how knowing the battery status of a device can be used to track a device and therefore associate it with a user.

When different third-party trackers share information with each other, a practice known as cookie syncing, these parties can track an individual user even as they visit sites with different tracking mechanisms installed. Cookie syncing is difficult to detect and also facilitates merging of datasets about individual users between disparate third parties, creating significant privacy concerns. Recent studies suggest the practice of cookie syncing is widespread among third-party trackers.
