lf122, Webdesign: Apache part I

Apache part I

Abstract:

This article about Apache, the most widely used web server, is divided into two parts. The first part briefly describes the history of the World Wide Web; the second is an introduction to the HTTP protocol.

History

The concept of the HTTP client and server was developed by people working at CERN (Conseil Européen pour la Recherche Nucléaire).
Once their research job was completed, they gave it to an American university (NCSA).
I guess that a number of people would be amazed to learn that the basis of the modern World Wide Web was created by Europeans (and particularly by French people).

Apache is the name of a free web server project. The origin of the name is slightly contested: some say it comes from "a patchy server", because of the numerous patches applied in the beginning (again a hacker trick :) ); others have a much more serious explanation and say that the founders of the project chose the name in memory of the Apache tribe, a tribe with great adaptability on the land.
It is the most used web server on the Internet. It follows the HTTP protocol (1.1), standardized by the IETF in cooperation with the W3 consortium.
A Netcraft survey made in June 1999 estimates that 60.05% of web servers are Apache servers.
A web server is the "server" side of the client-server model. It answers queries from "web clients", e.g. the lynx web browser ;-).

The HTTP protocol

Server and client talk to each other using HTTP (Hypertext Transfer Protocol). The current version is HTTP 1.1, as specified in RFC 2616.
An exchange is divided into two parts: the client query and the server answer. The protocol is ASCII text based.

The query :

In its simplest form it is one line of text divided into 3 parts, separated by spaces :

[query type]
[URL]
[Protocol used]

Possible queries are : GET, POST, HEAD, PUT, DELETE, TRACE.
The URL is the path to what you want to see and follows the domain name (for instance, www.linuxfocus.org is the domain name and /Francais is the URL of the welcome page for French speakers).
The protocol used can be HTTP/1.0 or HTTP/1.1
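To make the three parts concrete, here is a minimal Python sketch that splits a query line. The names RequestLine and parse_request_line are mine and only serve the illustration; the sketch insists on exactly three parts, so it does not accept shorter forms of the query.

```python
from typing import NamedTuple


class RequestLine(NamedTuple):
    method: str    # query type, e.g. GET
    url: str       # path requested on the server
    protocol: str  # e.g. HTTP/1.0


def parse_request_line(line: str) -> RequestLine:
    """Split a query line into its three space-separated parts."""
    parts = line.strip().split(" ")
    if len(parts) != 3:
        raise ValueError(f"expected 3 parts, got {len(parts)}: {line!r}")
    return RequestLine(*parts)
```

For instance, parse_request_line("GET /Francais HTTP/1.0") gives the method GET, the URL /Francais and the protocol HTTP/1.0.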

The answer :

The answer from the server consists of a header and, depending on the query type, a body.

>telnet www.linuxfocus.org 80
Trying 195.53.25.18...
Connected to nova.linuxfocus.org.
Escape character is '^]'.
GET / HTTP/1.0 <return>
<return>
HTTP/1.1 200 OK
Date: Mon, 27 Sep 1999 21:23:20 GMT
Server: Apache/1.3.3 (Unix) (Red Hat/Linux)
Last-Modified: Sun, 26 Sep 1999 16:40:44 GMT
ETag: "4b005-1616-37ee4c8c"
Accept-Ranges: bytes
Content-Length: 5654
Connection: close
Content-Type: text/html
<PAGE HTML>

What does this answer say?
The first line shows the protocol used and the return value of the server (a return value of 400 or more indicates an error). It is followed by the date, the version of the server, and the date of the last modification of the requested URL (this allows the client to know whether the files in its cache are still valid). Content-Length is the length of the answer body in bytes (answers produced by CGI scripts often do not provide this information) and Content-Type tells the web client the MIME type of the answer (text, HTML, images ...).
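As a rough illustration of how a client could read such a header, the following Python sketch splits the first line and the "Name: value" lines. The function name parse_response_head is my own, and the sample text is an abridged copy of the answer shown above; a real client has to cope with more cases than this.

```python
def parse_response_head(head: str):
    """Return (protocol, status code, reason, headers) from a response header."""
    lines = head.strip().splitlines()
    # First line: protocol, numeric return value, then a free-text reason.
    protocol, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        # Split at the first colon only: dates contain colons too.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return protocol, int(code), reason, headers


SAMPLE = """\
HTTP/1.1 200 OK
Date: Mon, 27 Sep 1999 21:23:20 GMT
Server: Apache/1.3.3 (Unix) (Red Hat/Linux)
Last-Modified: Sun, 26 Sep 1999 16:40:44 GMT
Content-Length: 5654
Content-Type: text/html
"""

protocol, code, reason, headers = parse_response_head(SAMPLE)
is_error = code >= 400  # return values of 400 or more indicate an error
```

Header names are stored in lower case here because HTTP header names are not case-sensitive.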

This is not a complete description : some lines are still a mystery to me ;-)
Let's see what happens when an error occurs (here the query type is typed in lower case) :

>telnet www.linuxfocus.org 80
Trying 195.53.25.18...
Connected to nova.linuxfocus.org.
Escape character is '^]'.
get / HTTP/1.0 <return>
<return>
HTTP/1.1 501 Method Not Implemented
Date: Mon, 27 Sep 1999 21:22:03 GMT
Server: Apache/1.3.3 (Unix) (Red Hat/Linux)
Allow: GET, HEAD, OPTIONS, TRACE
Connection: close
Content-Type: text/html

As you can see, the header is talkative enough ;-)
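The failed query used "get" in lower case, and query types are case-sensitive. The tiny Python sketch below mimics that check using the methods listed in the Allow line. It is a simplification of my own (the name status_for_method is invented), not a description of how Apache decides internally.

```python
# Methods listed in the Allow header of the 501 answer.
ALLOWED_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}


def status_for_method(method: str) -> int:
    """Return 200 for a recognised method, 501 otherwise (case-sensitive)."""
    return 200 if method in ALLOWED_METHODS else 501
```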
HTTP is a very simple protocol, as we will see in these examples :

>telnet www.linuxfocus.org 80
Trying 195.53.25.18...
Connected to nova.linuxfocus.org.
Escape character is '^]'.
GET / <return>
<return>

[the contents of index.html from www.linuxfocus.org are then displayed]

What happens inside the Apache server ?
You connected with the telnet command to port 80 of www.linuxfocus.org (IP address 195.53.25.18); port 80 is the default port for an HTTP server. The server was waiting for a query and you wrote GET / followed by 2 carriage returns.
Why those 2 carriage returns ?
The empty line just signals to the server that this is the end of the query. The server answered by sending the requested file (index.html). The TCP/IP connection is closed at the end of the transfer.
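The same exchange can be sketched in Python with a plain socket instead of telnet. The helper names build_request and fetch are mine. HTTP ends each line with CR LF, so the two returns become "\r\n\r\n", and the sketch assumes the server closes the connection once the transfer is over, as described above.

```python
import socket


def build_request(path: str = "/", protocol: str = "HTTP/1.0") -> bytes:
    """Return a GET query terminated by the empty line the server waits for."""
    # The first CR LF ends the query line, the second one is the empty line.
    return f"GET {path} {protocol}\r\n\r\n".encode("ascii")


def fetch(host: str, port: int = 80, path: str = "/") -> bytes:
    """Send the query and read until the server closes the connection."""
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(build_request(path))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # connection closed: the transfer is finished
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling fetch("www.linuxfocus.org") would send the same bytes as the telnet session; what comes back depends on the server answering today.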

As you can see, the language used between the client and the server is very simple, but difficulties arise when you use version 1.1 instead of 1.0 for your query. Compare the answer to a 1.0 query with the answer to the same query in version 1.1:

GET / HTTP/1.0 <return>
<return>
HTTP/1.1 200 OK
Date: Tue, 24 Aug 1999 22:25:11 GMT
Server: Apache/1.3.3 (Unix)  (Red Hat/Linux)
Last-Modified: Sun, 01 Aug 1999 11:50:52 GMT
ETag: "4b005-1462-37a4349c"
Accept-Ranges: bytes
Content-Length: 5218
Connection: close
Content-Type: text/html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> ....

GET / HTTP/1.1 <return>
<return>
HTTP/1.1 400 Bad Request
Date: Tue, 24 Aug 1999 22:24:59 GMT
Server: Apache/1.3.3 (Unix)  (Red Hat/Linux)
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>400 Bad Request</TITLE>
</HEAD><BODY>
<H1>Bad Request</H1>
Your browser sent a request that
this server could not understand.<P>
client sent HTTP/1.1 request without hostname
    (see RFC2068 section 9, and 14.23): </P>
</BODY></HTML>

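The error message gives the reason: a query in version 1.1 must also contain a Host line naming the server. Example :

GET / HTTP/1.1 <return>
Host: www.linuxfocus.org <return>
<return>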
[...]
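The rule behind the 400 answer, namely that an HTTP/1.1 query must carry a Host line, can be checked with a few lines of Python. This is only a sketch with names I made up (build_request_11, is_missing_host); it looks at nothing but the version and the presence of Host.

```python
def build_request_11(host: str, path: str = "/") -> str:
    """Build an HTTP/1.1 GET query with the mandatory Host line."""
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n"


def is_missing_host(request: str) -> bool:
    """True if an HTTP/1.1 query has no Host line."""
    head = request.split("\r\n\r\n", 1)[0]  # everything before the empty line
    lines = head.split("\r\n")
    if not lines[0].endswith("HTTP/1.1"):
        return False  # version 1.0 does not require Host
    names = [line.partition(":")[0].strip().lower() for line in lines[1:]]
    return "host" not in names
```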

When the server receives a query, it forks a child process to answer it;
the "parent process" keeps listening on port 80 for new queries
while the child answers the query.
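The fork-per-query idea can be sketched in Python with os.fork (Unix only). This is a toy of my own, not Apache's code: the function names, the fixed "hello" answer and the use of localhost with an OS-chosen port are all assumptions made for the illustration.

```python
import os
import socket

# A fixed answer, just enough to show the mechanism.
RESPONSE = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 6\r\n"
    b"\r\n"
    b"hello\n"
)


def make_listener(port: int = 0) -> socket.socket:
    """Listen on localhost; port 0 lets the OS pick a free port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", port))
    listener.listen(5)
    return listener


def answer(conn: socket.socket) -> None:
    """Child's job: read the query up to the empty line, then reply."""
    data = b""
    while b"\r\n\r\n" not in data:
        chunk = conn.recv(1024)
        if not chunk:
            break
        data += chunk
    conn.sendall(RESPONSE)
    conn.close()


def serve_one(listener: socket.socket) -> None:
    """Accept one connection and hand it to a forked child."""
    conn, _addr = listener.accept()
    pid = os.fork()
    if pid == 0:          # child process
        listener.close()  # the child does not accept new connections
        try:
            answer(conn)
        finally:
            os._exit(0)   # never fall back into the parent's code
    conn.close()          # parent: the child owns this connection now
    os.waitpid(pid, 0)    # reap the child (a real server would not block here)
```

Calling serve_one(listener) in an endless loop gives the parent its role of permanent listener. Waiting for each child before accepting the next connection keeps the sketch short, but it also means queries are served one at a time.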

Functionality

The web server is an interface between the web client asking for a URL (Uniform Resource Locator) and the operating system Apache is running on. URL is not the only abbreviation in use: you will also find URI and URN, which are closely related. The web client sends its query and the server answers with the page that corresponds to the requested URL.
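One way to picture this interface role is the translation of a requested URL into a file name on the server's disk. The Python sketch below is an assumption-laden illustration: the function url_to_path and the directory /var/www are hypothetical, and index.html is used as the default page because that is the file served for / in the examples above.

```python
import posixpath


def url_to_path(doc_root: str, url_path: str, index: str = "index.html") -> str:
    """Translate the path part of a URL into a file name under doc_root."""
    # Collapse "." and ".." so that a query cannot climb out of doc_root.
    clean = posixpath.normpath("/" + url_path.lstrip("/"))
    if url_path.endswith("/"):
        # A URL ending in "/" names a directory: serve its index page.
        clean = posixpath.join(clean, index)
    return doc_root.rstrip("/") + clean
```

With the hypothetical root /var/www, the URL / maps to /var/www/index.html and /Francais maps to /var/www/Francais.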

Some queries sent by the client cannot be answered directly by the server. The server can spawn other programs to do the job and return their results : this is exactly how CGI (Common Gateway Interface) scripts work.
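The CGI mechanism can be imitated in a few lines of Python: the server starts a program, passes details of the query through environment variables such as REQUEST_METHOD and QUERY_STRING, and sends back whatever the program prints. The function run_cgi and the one-line demo "script" are inventions for this sketch, which leaves out most of what a real server does.

```python
import os
import subprocess
import sys


def run_cgi(argv: list, query_string: str = "") -> bytes:
    """Run an external program the CGI way and return what it printed."""
    env = dict(os.environ)
    # CGI hands the details of the query to the program in its environment.
    env["GATEWAY_INTERFACE"] = "CGI/1.1"
    env["REQUEST_METHOD"] = "GET"
    env["QUERY_STRING"] = query_string
    result = subprocess.run(argv, env=env, stdout=subprocess.PIPE, check=True)
    return result.stdout


# A stand-in "script": a one-line Python program, purely for illustration.
DEMO_SCRIPT = [
    sys.executable,
    "-c",
    "import os; print('Content-Type: text/plain'); print(); "
    "print('query=' + os.environ['QUERY_STRING'])",
]
```

The program prints its own Content-Type line followed by an empty line and then the body, which the server forwards to the client.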

Conclusion

Translation information:

fr --> -- : Charles vidal <charles_vidal/at/bigfoot.com>

fr --> en: Frédéric Raynal <pappy/at/users.sourceforge.net>

fr --> en: Alexandre Abbes <alexandre.abbel/at/inria.fr>

2003-01-04, generated by lfparser version 2.35