Apache part I




[Photo of the author]


Original in fr Charles Vidal

fr to en Frédéric Raynal

fr to en Alexandre Abbes


Chairman of a gastronomical LUG (Linux User Group) in Paris. He likes the philosophy behind GNU and Open Source because it helps people share knowledge. He would like to have more time to play the saxophone.


This article about Apache, the most widely used web server, is divided into two parts. The first part briefly describes the history of the World Wide Web, and the second part is an introduction to the HTTP protocol.





The concept of the HTTP client and server was developed by people working at CERN, the European laboratory for particle physics (the acronym comes from Conseil Européen pour la Recherche Nucléaire).
Once their research job was completed, they handed it over to an American university institute, the NCSA (National Center for Supercomputing Applications).
I guess that a number of people would be amazed to learn that the foundations of the modern World Wide Web were laid in Europe, at a laboratory on the French-Swiss border.

Apache is the name of a free web server project. The origin of the name is slightly contested: some say it comes from "a patchy server", because of the numerous patches applied in the beginning (again a hacker trick :) ); others have a much more serious explanation and say that the founders of the project chose the name in honor of the Apache tribe, a tribe known for its great adaptability.
It is the most widely used web server on the Internet. It implements version 1.1 of the HTTP protocol, which is published as an IETF RFC and was developed together with the W3C (World Wide Web Consortium).
A Netcraft survey from June 1999 estimates that 60.05% of web servers run Apache.
A web server is the "server" side of the client-server model. It answers queries from "web clients" such as the lynx web browser ;-).

The HTTP protocol

Server and client talk to each other using HTTP (Hypertext Transfer Protocol). The current version is HTTP/1.1, as specified in RFC 2616.
An exchange is divided into two parts: the client query and the server answer. The protocol is based on ASCII text.
  1. The query: it is one line of text divided into 3 parts:

    1. [query type]
    2. [URL]
    3. [Protocol used]
    This basic line may also be followed by other lines to specify the query, as we shall see for an HTTP/1.1 query.

  2. The answer: the answer from the server is built with a header and a body, depending on the query type.
    >telnet www.linuxfocus.org 80
    Connected to nova.linuxfocus.org.
    Escape character is '^]'.
    GET / HTTP/1.0 <return>
    HTTP/1.1 200 OK
    Date: Mon, 27 Sep 1999 21:23:20 GMT
    Server: Apache/1.3.3 (Unix)  (Red Hat/Linux)
    Last-Modified: Sun, 26 Sep 1999 16:40:44 GMT
    ETag: "4b005-1616-37ee4c8c"
    Accept-Ranges: bytes
    Content-Length: 5654
    Connection: close
    Content-Type: text/html

    What does this answer say?
    The first line shows the protocol used and the status code returned by the server (a status code of 400 or greater indicates an error). It is followed by the date, the name and version of the server, and the date of the last modification of the requested URL (this allows the client to know whether the files in its cache are still valid). Content-Length is the length of the answer body in bytes (answers generated by CGI scripts often do not provide this information), and Content-Type tells the web client the MIME type of the answer (plain text, HTML, images ...).
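
    The reading of the header described above can be sketched in a few lines of Python. This is only an illustration, not a complete HTTP parser: the function name parse_response_head and the SAMPLE constant are hypothetical, and the sample text is copied from the telnet session shown earlier.

```python
def parse_response_head(raw: str):
    """Split the head of an HTTP answer into (version, status, reason, headers)."""
    lines = raw.strip().splitlines()
    # Status line: protocol version, numeric status code, human-readable reason.
    version, status, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if not line.strip():
            break  # an empty line ends the header block
        # Split on the first colon only: dates contain colons too.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(status), reason, headers


# Header lines copied from the telnet session in this article.
SAMPLE = """\
HTTP/1.1 200 OK
Date: Mon, 27 Sep 1999 21:23:20 GMT
Server: Apache/1.3.3 (Unix)  (Red Hat/Linux)
Last-Modified: Sun, 26 Sep 1999 16:40:44 GMT
Content-Length: 5654
Connection: close
Content-Type: text/html
"""

version, status, reason, headers = parse_response_head(SAMPLE)
print(version, status, reason)                 # HTTP/1.1 200 OK
print("error" if status >= 400 else "ok")      # ok
print(headers["content-type"], headers["content-length"])  # text/html 5654
```

    Header names are stored in lowercase here because HTTP header names are not case-sensitive.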

    This is not a complete description : some lines are still a mystery to me ;-)
    Let's see what happens when an error occurs. Here the method is typed in lowercase, and HTTP method names are case-sensitive:
    >telnet www.linuxfocus.org 80
    Connected to nova.linuxfocus.org.
    Escape character is '^]'.
    get / HTTP/1.0 <return>
    HTTP/1.1 501 Method Not Implemented
    Date: Mon, 27 Sep 1999 21:22:03 GMT
    Server: Apache/1.3.3 (Unix)  (Red Hat/Linux)
    Connection: close
    Content-Type: text/html

    As you can see, the header is talkative enough ;-)
    HTTP is a very simple protocol, as we will see in these examples:
    >telnet www.linuxfocus.org 80
    Connected to nova.linuxfocus.org.
    Escape character is '^]'.
    GET / <return>
          <return>
    [the contents of index.html from www.linuxfocus.org are then displayed] ..

    What happens inside the Apache server?
    You connected with the telnet command to port 80 of www.linuxfocus.org (port 80 is the default port for an HTTP server). The server was waiting for a query, and you wrote GET / followed by two carriage returns.
    Why those two carriage returns?
    The empty line simply signals to the server that this is the end of the query. The server answered by sending the requested file (index.html). The TCP/IP connection is closed at the end of the transfer.
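
    What telnet does by hand can also be sketched with a plain TCP socket. The helper names build_request and fetch are hypothetical, and the sketch assumes the server closes the connection after the answer, as in the HTTP/1.0 sessions shown in this article.

```python
import socket


def build_request(method: str, path: str, version: str = "HTTP/1.0", headers=None) -> bytes:
    """Return the raw bytes of a query: request line, optional headers, empty line."""
    lines = [f"{method} {path} {version}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Each line ends with CR LF; the final empty line marks the end of the query.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def fetch(host: str, request: bytes, port: int = 80, timeout: float = 10.0) -> bytes:
    """Send the query over TCP and read until the server closes the connection."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(request)
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closed the connection: transfer finished
                break
            chunks.append(data)
    return b"".join(chunks)


request = build_request("GET", "/")
print(request)  # b'GET / HTTP/1.0\r\n\r\n'
```

    Calling fetch("www.linuxfocus.org", request) would replay the telnet session; it is not called here because it needs network access.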

    As you can see, the language used between the client and the server is very simple, but difficulties arise when you use version 1.1 instead of 1.0 for your query. A 1.0 query works as before:

    GET / HTTP/1.0 <return>
    <return>
    HTTP/1.1 200 OK
    Date: Tue, 24 Aug 1999 22:25:11 GMT
    Server: Apache/1.3.3 (Unix)  (Red Hat/Linux)
    Last-Modified: Sun, 01 Aug 1999 11:50:52 GMT
    ETag: "4b005-1462-37a4349c"
    Accept-Ranges: bytes
    Content-Length: 5218
    Connection: close
    Content-Type: text/html
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> ....
    But typing 1.1 gives this:
    GET / HTTP/1.1 <return>
    <return>
    HTTP/1.1 400 Bad Request
    Date: Tue, 24 Aug 1999 22:24:59 GMT
    Server: Apache/1.3.3 (Unix)  (Red Hat/Linux)
    Connection: close
    Transfer-Encoding: chunked
    Content-Type: text/html
    <TITLE>400 Bad Request</TITLE>
    <H1>Bad Request</H1>
    Your browser sent a request that
    this server could not understand.<P>
    client sent HTTP/1.1 request without hostname 
        (see RFC2068 section 9, and 14.23): </P>
    An HTTP/1.1 query requires more information fields and is therefore built from several lines. In particular, the Host field is mandatory, which is why the one-line query above was rejected with "400 Bad Request". The added lines allow more precise information to be transmitted and therefore improve the quality of the communication.
    The Apache team has strictly followed the new specification, which provides more functionality: authentication, virtual sites (several sites sharing the same IP address) and so on ...

    Example:

    GET / HTTP/1.1 <return>
    Host: www.linuxfocus.org <return>
    <return>
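
    The rule behind the "400 Bad Request" answer shown earlier can be sketched as a small check. The function check_request is hypothetical and models only the Host rule; a real server such as Apache performs many more checks.

```python
def check_request(raw: str) -> int:
    """Return 400 if an HTTP/1.1 query lacks a Host field, otherwise 200."""
    head = raw.split("\r\n\r\n", 1)[0]       # everything before the empty line
    lines = head.split("\r\n")
    parts = lines[0].split()                  # [query type, URL, protocol]
    # A request line without a protocol is treated as the oldest, simplest form.
    version = parts[2] if len(parts) == 3 else "HTTP/0.9"
    header_names = {
        line.partition(":")[0].strip().lower()
        for line in lines[1:]
        if ":" in line
    }
    if version == "HTTP/1.1" and "host" not in header_names:
        return 400
    return 200


print(check_request("GET / HTTP/1.0\r\n\r\n"))                             # 200
print(check_request("GET / HTTP/1.1\r\n\r\n"))                             # 400
print(check_request("GET / HTTP/1.1\r\nHost: www.linuxfocus.org\r\n\r\n"))  # 200
```

    The Host field is what lets one server with one IP address tell several virtual sites apart.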
    As with most client-server systems, the main principle is that the web server sends back exactly one answer for each query it receives. The client just sees that it sends a query and gets back the answer.

    The web server is an interface between the web client asking for a URL (Uniform Resource Locator) and the operating system Apache is running on. URL is not the only abbreviation in use: you will also find URI and URN, which are closely related (URI is the general term, and URLs and URNs are kinds of URI). The web client sends its query, and the server answers with the page that corresponds to the requested URL.

    Some queries sent by the client cannot be answered directly by the server. The server can then start an external program to do the job and return its result to the client: this is exactly how CGI (Common Gateway Interface) scripts work.
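
    As an illustration, a CGI script can be as small as the following Python sketch. The function name cgi_response is hypothetical. REQUEST_METHOD and QUERY_STRING are standard CGI environment variables set by the server; how Apache is configured to run such a script is not covered here.

```python
import html
import os
import sys


def cgi_response(environ) -> str:
    """Build a complete CGI answer: header block, empty line, then the body."""
    method = environ.get("REQUEST_METHOD", "GET")
    query = environ.get("QUERY_STRING", "")
    # Escape client-supplied text before placing it in an HTML page.
    body = (
        "<html><body>"
        f"<p>Method: {html.escape(method)}</p>"
        f"<p>Query: {html.escape(query)}</p>"
        "</body></html>\n"
    )
    # The script writes its own Content-Type; the server adds the status line.
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # The server starts the script and sends its standard output to the client.
    sys.stdout.write(cgi_response(os.environ))
```

    Note that the script does not send a Content-Length, which matches the remark made earlier about answers produced by CGI scripts.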


    To understand how Apache works, just try telnet on different HTTP servers. This way you can also see which server software a specific site is running, as the name of the server appears in the answer.
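
    The same check can be sketched with Python's standard http.client module instead of telnet. The function name server_banner is hypothetical, and the sketch uses the HEAD query type, which this article does not cover: it asks the server for the header only, without the body.

```python
import http.client


def server_banner(host: str, port: int = 80, timeout: float = 10.0):
    """Return the value of the Server field sent by host, or None if it is absent."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", "/")          # header only, no body
        response = conn.getresponse()
        return response.getheader("Server")
    finally:
        conn.close()
```

    For example, server_banner("www.linuxfocus.org") would return the Server line of that site, provided the machine has network access. A site may also choose to hide or alter this field.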