Avoiding security holes when developing an application - Part 6: CGI scripts




Original in fr Frédéric Raynal, Christophe Blaess, Christophe Grenier

fr to en Georges Tarbouriech

en to en Lorne Bailey


Christophe Blaess is an independent aeronautics engineer. He is a Linux fan and does much of his work on this system. He coordinates the translation of the man pages as published by the Linux Documentation Project.

Christophe Grenier is a 5th year student at the ESIEA, where he works as a sysadmin too. He has a passion for computer security.

Frédéric Raynal has been using Linux for many years because it doesn't pollute, it doesn't use hormones, MSG or animal bone meal... only sweat and tricks.


Getting a file, running a program from a badly programmed Perl script ... "There's More Than One Way To Do It!"


Web server, URI and configuration problems

(Too short) introduction to how a web server works and how to build a URI

When a client asks for an HTML file, the server sends the requested page (or an error message). The browser interprets the HTML code to format and display the file. For instance, when you type the URL (Uniform Resource Locator) http://www.linuxdoc.org/HOWTO/HOWTO-INDEX/howtos.html, the client connects to the www.linuxdoc.org server and asks for the /HOWTO/HOWTO-INDEX/howtos.html page (its URI, Uniform Resource Identifier), using the HTTP protocol. With this static model, if the file is present on the server, it is sent "as is" to the client; otherwise an error message is sent (the well-known 404 - Not Found).

Unfortunately, this doesn't allow interactivity with the user, making features such as e-business, e-reservation for holidays or e-whatever impossible.

Fortunately, there are solutions to dynamically generate HTML pages. CGI (Common Gateway Interface) scripts are one of them. In this case, the URI to access web pages is built in a slightly different way :

http://<server><pathToScript>[?[param_1=val_1][...] [&param_n=val_n]]
The arguments list is stored in the QUERY_STRING environment variable. In this context, a CGI script is nothing but an executable file. It uses the stdin (standard input) or the environment variable QUERY_STRING to get the arguments passed to it. After executing the code, the result is displayed on the stdout (standard output) and then, redirected to the web client. Almost every programming language can be used to write a CGI script (compiled C program, Perl, shell-scripts...).

For example, let's search what the HOWTOs from www.linuxdoc.org know about ssh :

http://www.linuxdoc.org/cgi-bin/ldpsrch.cgi?svr=http%3A%2F%2Fwww.linuxdoc.org&srch=ssh&db=1&scope=0&rpt=20
In fact, this is much simpler than it seems. Let's analyze this URL:
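
The URL can be taken apart mechanically. The following Perl sketch splits it into the server, the script path and the list of decoded arguments. The parse_cgi_url name is ours (a hypothetical helper, not part of any module), and the sketch only handles plain http:// URLs of the form shown above.

```perl
use strict;
use warnings;

# Hypothetical helper: split a URL into server, script path and
# a list of [name, value] pairs with the %XX sequences decoded.
sub parse_cgi_url {
    my ($url) = @_;
    my ($server, $path, $query) =
        $url =~ m{^http://([^/]+)(/[^?]*)(?:\?(.*))?\z}
        or return;
    my @params;
    foreach my $pair (split /&/, defined $query ? $query : '') {
        next if $pair eq '';
        my ($key, $val) = split /=/, $pair, 2;
        $val = '' unless defined $val;
        for ($key, $val) {
            tr/+/ /;                               # '+' encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # %XX -> character
        }
        push @params, [$key, $val];
    }
    return ($server, $path, \@params);
}

my $url = 'http://www.linuxdoc.org/cgi-bin/ldpsrch.cgi?'
        . 'svr=http%3A%2F%2Fwww.linuxdoc.org&srch=ssh&db=1&scope=0&rpt=20';
my ($server, $script, $params) = parse_cgi_url($url);
print "server: $server\n";
print "script: $script\n";
print "  $_->[0] = $_->[1]\n" for @$params;
```

The part before the '?' names the server and the script, and everything after it is the argument list that ends up in QUERY_STRING.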

Often, argument names and values are explicit enough to understand their meaning. Furthermore, the content of the page displaying the answers gives further hints.

Now you know that the bright side of CGI scripts is the user's ability to pass in arguments... but the dark side is that a badly written script opens a security hole.

You probably noticed the strange characters used by your preferred browser or present within the previous request. Those characters are encoded with the ISO 8859-1 charset (have a look at man iso_8859_1). Table 1 gives the meaning of some of these codes. Note that some IIS 4.0 and IIS 5.0 servers have a very dangerous vulnerability called the unicode bug, based on the extended unicode representation of "/" and "\".

Apache configuration with "SSI Server Side Include"

Server Side Include is a part of a web server's functionality. It allows integrating instructions into web pages, either to include a file "as is", or to execute a command (shell or CGI script).

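In the Apache configuration file httpd.conf, the "AddHandler server-parsed .shtml" instruction activates this mechanism. Often, to avoid the distinction between .html and .shtml, one can add the .html extension. Of course, this slows down the server... This can be controlled at directory level with the Options directive: "Options Includes" activates every SSI, while "Options IncludesNOEXEC" activates SSI but prohibits the execution of commands and CGI scripts (exec cmd and exec cgi).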

In the attached guestbook.cgi script, the text provided by the user is included in an HTML file without converting the '<' and '>' characters into the &lt; and &gt; HTML entities. A curious person could submit an SSI instruction such as <!--#printenv --> or <!--#exec cmd="..." -->.

With the first one, you get a few lines of information about the system:
HTTP_ACCEPT=image/gif, image/jpeg, image/pjpeg, image/png, */*
HTTP_USER_AGENT=Mozilla/4.76 [fr] (X11; U; Linux 2.2.16 i686)
SERVER_SIGNATURE=<ADDRESS>Apache/1.3.14 Server www.esiea.fr Port 8080</ADDRESS>

SERVER_SOFTWARE=Apache/1.3.14 (Unix)  (Red-Hat/Linux) PHP/3.0.18
DATE_LOCAL=Tuesday, 27-Feb-2001 15:33:56 CET
DATE_GMT=Tuesday, 27-Feb-2001 14:33:56 GMT
LAST_MODIFIED=Tuesday, 27-Feb-2001 15:28:05 CET

The exec instruction gives you almost the equivalent of a shell: the server runs the given command and inserts its output into the page.


Don't try "<!--#include file="/etc/passwd"-->": the path is relative to the directory where the HTML file is found and can't contain "..". The Apache error_log file then contains a message indicating an attempt to access a prohibited file. The user sees the message [an error occurred while processing this directive] in the HTML page.

SSI is not often needed, so it is better to deactivate it on the server. However, the cause of the problem is the combination of the broken guestbook application and SSI.
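
On the guestbook side, the missing conversion can be sketched as follows. This is only an illustration under our own assumptions: the escape_html name is ours, and in a real script a maintained module would be a better choice than a hand-written substitution.

```perl
use strict;
use warnings;

# Hypothetical helper: convert the characters that have a special
# meaning in HTML into entities before writing user text to a page.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;     # first, so the entities below are not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

my $texte   = '<!--#include file="/etc/passwd"-->';
my $escaped = escape_html($texte);
print "$escaped\n";
# prints: &lt;!--#include file=&quot;/etc/passwd&quot;--&gt;
```

Once the '<' is stored as &lt;, the server no longer sees an SSI directive in the guestbook page; the browser simply displays the text.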

Perl Scripts

In this section, we present security holes related to CGI scripts written in Perl. To keep things clear, we don't provide the full code of the examples but only the parts required to understand where the problem is.

Each of our scripts is built according to the following template:

#!/usr/bin/perl -wT
BEGIN { $ENV{PATH} = '/usr/bin:/bin' }
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};   # Make %ENV safer =:-)
print "Content-type: text/html\n\n";
print "<HTML>\n<HEAD>";
print "<TITLE>Remote Command</TITLE></HEAD>\n";
my %input;
&ReadParse(\%input);   # '&' skips the prototype check (sub is defined below)
# now use %input e.g. like this:
# print "<p>$input{filename}</p>\n";
# #################################### #
# Start of problem description         #
# #################################### #

# ################################## #
# End of problem description         #
# ################################## #

form:   # target of the "goto form" used in the examples
print "<form action=\"$ENV{'SCRIPT_NAME'}\">\n";
print "<input type=text name=filename>\n </form>\n";
print "</BODY>\n";
print "</HTML>\n";

# first arg must be a reference to a hash.
# The hash will be filled with data.
sub ReadParse($) {
  my $in=shift;
  my ($i, $key, $val);
  my $in_first;
  my @in_second;

  # Read in text
  if ($ENV{'REQUEST_METHOD'} eq "GET") {
    $in_first = $ENV{'QUERY_STRING'};
  } elsif ($ENV{'REQUEST_METHOD'} eq "POST") {
    read(STDIN, $in_first, $ENV{'CONTENT_LENGTH'});
  } else {
    die "ERROR: unknown request method\n";
  }

  @in_second = split(/&/,$in_first);

  foreach $i (0 .. $#in_second) {
    # Convert plus's to spaces
    $in_second[$i] =~ s/\+/ /g;

    # Split into key and value.
    ($key, $val) = split(/=/,$in_second[$i],2);

    # Convert %XX from hex numbers to characters
    $key =~ s/%([0-9A-Fa-f]{2})/pack("C",hex($1))/ge;
    $val =~ s/%([0-9A-Fa-f]{2})/pack("C",hex($1))/ge;

    # Associate key and value
    # \0 is the multiple separator
    $$in{$key} .= "\0" if (defined($$in{$key}));
    $$in{$key} .= $val;
  }

  # number of key=value pairs read
  return scalar(@in_second);
}

More on the arguments passed to Perl (-wT) later. We begin by cleaning up %ENV, in particular the PATH environment variable, and we send the HTTP header (it is part of the HTTP protocol between browser and server; you can't see it in the web page displayed by the browser). The ReadParse() function reads the arguments passed to the script. This can be done more easily with modules, but this way you can see the whole code. Next, we present the examples. Last, we finish the HTML file.

The null byte

Perl treats every character the same way, which differs from C functions, for instance. For Perl, the null character that ends a string in C is a character like any other. So what?

Let's add the following code to our script to create showhtml.cgi  :

  # showhtml.cgi
  my $filename= $input{filename}.".html";
  print "<BODY>File : $filename<BR>";
  if (-e $filename) {
      open(FILE,"$filename") || goto form;
      print <FILE>;
      close(FILE);
  }

The ReadParse() function gets the only argument : the name of the file to display. To prevent some "rude guest" from reading more than the HTML files, we add the ".html" extension at the end of the filename. But, remember, the null byte is a character like any other one...

Thus, if our request is showhtml.cgi?filename=%2Fetc%2Fpasswd%00, the file name becomes $filename = "/etc/passwd\0.html" and our astounded eyes gaze at something that is not HTML.

What happens ? The strace command shows how Perl opens a file:

  /tmp >>cat >open.pl << EOF
  > #!/usr/bin/perl
  > open(FILE, "/etc/passwd\0.html");
  > EOF
  /tmp >>chmod 0700 open.pl 
  /tmp >>strace ./open.pl 2>&1 | grep open
  execve("./open.pl", ["./open.pl"], [/* 24 vars */]) = 0
  open("./open.pl", O_RDONLY)             = 3
  read(3, "#!/usr/bin/perl\n\nopen(FILE, \"/et"..., 4096) = 51
  open("/etc/passwd", O_RDONLY)           = 3

The last open() shown by strace corresponds to the system call, written in C. We can see that the .html extension has disappeared, and this allowed us to open /etc/passwd.

This problem is solved with a single regular expression which removes all null bytes:
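
A minimal sketch of that substitution, wrapped in a helper whose name (strip_nulls) is ours:

```perl
use strict;
use warnings;

# Hypothetical helper: remove every null byte from a parameter.
sub strip_nulls {
    my ($value) = @_;
    $value =~ s/\0//g;
    return $value;
}

# The request from the showhtml.cgi example, once decoded
my $filename = strip_nulls("/etc/passwd\0") . ".html";
print "$filename\n";    # prints: /etc/passwd.html
```

With the null byte gone, the ".html" extension stays attached to the name that reaches the open() system call.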


Using pipes

Here is a script without any protection. It displays a given file from the directory tree /home/httpd/ :


my $filename= "/home/httpd/".$input{filename};
print "<BODY>File : $filename<BR>";
open(FILE,"$filename") || goto form;
print <FILE>;

Don't laugh at this example ! I have seen such scripts.

The first exploit is obvious: a request such as pipe1.cgi?filename=..%2F..%2F..%2Fetc%2Fpasswd.

One need only go up the tree to access any file. But there is another, much more interesting possibility: executing the command of your choice. In Perl, the open(FILE, "/bin/ls") command opens the "/bin/ls" binary file... but open(FILE, "/bin/ls |") executes the specified command. Adding a single pipe | changes the behavior of open().

Another problem comes from the fact that the existence of the file is not tested, which allows us to execute any command but also to pass any arguments : pipe1.cgi?filename=..%2F..%2F..%2Fbin%2Fcat%20%2fetc%2fpasswd%20| displays the password file content.

Testing the existence of the file to open gives less freedom :


# pipe2.cgi
my $filename= "/home/httpd/".$input{filename};
print "<BODY>File : $filename<BR>";
if (-e $filename) {
  open(FILE,"$filename") || goto form;
  print <FILE>;
} else {
  print "-e failed: no file\n";
}

The previous example doesn't work anymore. The "-e" test fails since it can't find the "../../../bin/cat /etc/passwd |" file.

Let's now try the /bin/ls command. The behavior is the same as before. That is, if we try, for instance, to list the content of the /etc directory, "-e" tests the existence of the "../../../bin/ls /etc |" file, which doesn't exist either. As long as we only supply the name of such a "ghost" file, we won't get anything interesting :(

However, there is still a "way out", even if the result is not as good. The /bin/ls file exists (well, on most systems), but if open() is called with this filename, the command won't be executed; the binary will be displayed instead. We must then find a way to put a pipe '|' at the end of the name without it being taken into account by the "-e" check. We already know the solution: the null byte. If we send "../../../bin/ls\0|" as the name, the existence check succeeds since it only considers "../../../bin/ls", but open() sees the pipe and executes the command. Thus, the URI providing the current directory content is pipe2.cgi?filename=..%2F..%2F..%2Fbin%2Fls%00|


Line feed

The finger.cgi script executes the finger instruction on our machine :


print "<BODY>";
$login = $input{'login'};
$login =~ s/([;<>\*\|`&\$!#\(\)\[\]\{\}:'"])/\\$1/g;
print "Login $login<BR>\n";
print "Finger<BR>\n";
$CMD= "/usr/bin/finger $login|";
open(FILE,"$CMD") || goto form;
print <FILE>;

This script takes (at least) one useful precaution: it escapes some strange characters with a '\' to prevent the shell from interpreting them. Thus, the semicolon is changed to "\;" by the regular expression. But the list doesn't contain every important character. Among others, the line feed '\n' is missing.

In your preferred shell command line, you validate an instruction typing the RETURN or ENTER key that sends a '\n' character. In Perl, you can do the same. We already saw the open() instruction allowed us to execute a command as soon as the line ended with a pipe '|'.

To simulate this behavior, we only need to add a line feed and an instruction after the login sent to the finger command, for instance finger.cgi?login=kmaster%0Acat%20/etc/passwd (%0A being the encoding of '\n').


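Other characters are quite interesting for executing several instructions in a row: ';' ends the first instruction and goes on to the next one; '&&' runs the next instruction only if the first one succeeds (returns 0); '||' runs the next instruction only if the first one fails (returns a non-zero value).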

They don't work here since they are protected with the regular expression. But, let's find a way to work this out.
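
The behavior of the filter can be checked outside the web server. The sketch below applies the regular expression from finger.cgi to two logins; the finger_filter name is ours, and nothing is executed, the filtered strings are only printed.

```perl
use strict;
use warnings;

# Same substitution as in finger.cgi, wrapped in a sub for testing.
sub finger_filter {
    my ($login) = @_;
    $login =~ s/([;<>\*\|`&\$!#\(\)\[\]\{\}:'"])/\\$1/g;
    return $login;
}

# The semicolon is escaped...
my $with_semicolon = finger_filter("kmaster;cat /etc/passwd");
# ...but the line feed goes through untouched.
my $with_linefeed  = finger_filter("kmaster\ncat /etc/passwd");

print "[$with_semicolon]\n";
print "[$with_linefeed]\n";
```

The first result contains "\;", whereas the second one still holds two lines, which the shell reads as two separate commands.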

Backslash and semicolon

The previous finger.cgi script avoids problems with some strange characters. Thus, the URI finger.cgi?login=kmaster;cat%20/etc/passwd doesn't work since the semicolon is escaped. However, one character is not protected: the backslash '\'.

Let's take, for instance, a script that prevents us from going up the tree by using the regular expression s/\.\.//g to get rid of "..". It doesn't matter! Shells can manage various numbers of '/' at once (just try cat ///etc//////passwd to get convinced).

For example, in the above pipe2.cgi script, the $filename variable is initialized from the "/home/httpd/" prefix. Using the previous regular expression could seem sufficient to prevent going up through the directories. Of course, this expression protects from "..", but what happens if we escape the '.' character? The regular expression doesn't match if the filename is .\./.\./etc/passwd. Note that this works very well with system() (or ` ... `), but open() or "-e" fails.

Let's go back to the finger.cgi script. Using the semicolon, the finger.cgi?login=kmaster;cat%20/etc/passwd URI doesn't give the expected result since the semicolon is escaped by the regular expression. That is, the shell receives the instruction :

/usr/bin/finger kmaster\;cat /etc/passwd
The following errors are found in the web server logs :
finger: kmaster;cat: no such user.
finger: /etc/passwd: no such user.
These messages are identical to those you get when typing this line in a shell. The problem is that the shell treats the escaped ';' as an ordinary character belonging to the string "kmaster;cat".

We want to separate both instructions, the one from the script and the one we want to run. To do so, we escape the ';' ourselves with a backslash: finger.cgi?login=kmaster\;cat%20/etc/passwd. The script then changes the "\;" string into "\\;" and sends it to the shell, which reads:

/usr/bin/finger kmaster\\;cat /etc/passwd
The shell splits this into two different instructions :
  1. /usr/bin/finger kmaster\ which probably will fail... but we don't care ;-)
  2. cat /etc/passwd which displays the password file.
The solution is simple : the backslash '\' must be escaped, too.
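
The difference can be sketched by comparing the finger.cgi filter with a variant that also escapes the backslash. Both sub names are ours and the variant is only an illustration, not a complete filter.

```perl
use strict;
use warnings;

# Filter from finger.cgi: the backslash is not in the list.
sub weak_filter {
    my ($login) = @_;
    $login =~ s/([;<>\*\|`&\$!#\(\)\[\]\{\}:'"])/\\$1/g;
    return $login;
}

# Variant with the backslash added to the character class.
sub fixed_filter {
    my ($login) = @_;
    $login =~ s/([\\;<>\*\|`&\$!#\(\)\[\]\{\}:'"])/\\$1/g;
    return $login;
}

my $attack = 'kmaster\;cat /etc/passwd';    # what the attacker sends
my $weak   = weak_filter($attack);
my $fixed  = fixed_filter($attack);

print "weak : $weak\n";     # two backslashes, then a bare ';'
print "fixed: $fixed\n";    # three backslashes: '\\' followed by '\;'
```

In the first case the shell sees an escaped backslash followed by a live semicolon; in the second, both the backslash and the semicolon are escaped, so the whole string remains a single argument to finger.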

Using an unprotected " character

Sometimes, the parameter is "protected" using quotes. We have changed the previous finger.cgi script to protect the $login variable that way.

However, if the quotes are not escaped, it's useless. A single quote added in your request is enough to defeat the protection. This happens because the first quote sent closes the opening one from the script. Next, you write the command, and a second quote pairs with the last (closing) quote from the script.

The finger2.cgi script illustrates this :


print "<BODY>";
$login = $input{'login'};
$login =~ s/\0//g;
$login =~ s/([<>\*\|`&\$!#\(\)\[\]\{\}:'\n])/\\$1/g;
print "Login $login<BR>\n";
print "Finger<BR>\n";
#New (in)efficient super protection :
$CMD= "/usr/bin/finger \"$login\"|";  
open(FILE,"$CMD") || goto form;
while(<FILE>) {
  print;
}

The URI to execute the command then becomes finger2.cgi?login=kmaster%22;cat%20/etc/passwd%22 (%22 being the encoding of the " character).

The shell receives the command /usr/bin/finger "kmaster";cat /etc/passwd"" and the quotes are not a problem anymore.

So, it's important, if you wish to protect the parameters with quotes, to escape them as for the semicolon or the backslash already mentioned.
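
As an illustration, the sketch below builds the command string the way finger2.cgi does, first with its original filter and then with a variant that also escapes the double quote, the semicolon and the backslash. The sub names are ours; the commands are only printed, never run.

```perl
use strict;
use warnings;

# Filter from finger2.cgi: neither '"' nor ';' nor '\' is escaped.
sub finger2_filter {
    my ($login) = @_;
    $login =~ s/\0//g;
    $login =~ s/([<>\*\|`&\$!#\(\)\[\]\{\}:'\n])/\\$1/g;
    return $login;
}

# Variant with '\', '"' and ';' added to the character class.
sub quoted_filter {
    my ($login) = @_;
    $login =~ s/\0//g;
    $login =~ s/([\\";<>\*\|`&\$!#\(\)\[\]\{\}:'\n])/\\$1/g;
    return $login;
}

my $attack    = 'kmaster";cat /etc/passwd"';
my $cmd_weak  = '/usr/bin/finger "' . finger2_filter($attack) . '"';
my $cmd_fixed = '/usr/bin/finger "' . quoted_filter($attack)  . '"';

print "$cmd_weak\n";    # /usr/bin/finger "kmaster";cat /etc/passwd""
print "$cmd_fixed\n";   # /usr/bin/finger "kmaster\"\;cat /etc/passwd\""
```

In the second command the quotes sent by the attacker are escaped, so they no longer close the quote opened by the script.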

Writing in Perl

Warning and tainting options

When programming in Perl, use the -w option or "use warnings;" (Perl 5.6.0 and later); it informs you about potential problems, such as uninitialized variables or obsolete expressions/functions.

The -T option (taint mode) provides higher security. This mode activates various tests. The most important concerns possible tainting of variables. Variables are either clean or tainted. Data coming from outside the program is considered tainted as long as it hasn't been cleaned up. A tainted variable cannot be used to set anything that has an effect outside the program (for example, calls to shell commands).

In taint mode, the command line arguments, the environment variables, some system call results (readdir(), readlink(), ...) and the data coming from files are considered suspicious and thus tainted.

To clean up a variable, you must filter it through a regular expression. Obviously, using .* is useless. The goal is to force you to take care of provided arguments. Always use a regular expression that is as specific as possible.
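
A minimal sketch of such a cleanup is given below. The untaint_filename name and the accepted character set (letters, digits, '_' and '-', at most 64 characters) are assumptions chosen for the example; adapt the pattern to what your script really expects. Under taint mode (perl -T), the value captured in $1 is considered clean.

```perl
use strict;
use warnings;

# Hypothetical helper: return a cleaned-up copy of the name,
# or nothing if the name contains anything unexpected.
sub untaint_filename {
    my ($name) = @_;
    return unless defined $name;
    if ($name =~ /^([A-Za-z0-9_-]{1,64})\z/) {
        return $1;    # the captured text is what taint mode treats as clean
    }
    return;
}

for my $candidate ('howtos', '../../etc/passwd', "index\0") {
    my $clean = untaint_filename($candidate);
    print defined $clean ? "accepted: $clean\n" : "rejected\n";
}
```

Only the first candidate is accepted; the path going up the tree and the name holding a null byte are both rejected.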

Nevertheless, this mode doesn't protect from everything : the tainting of arguments passed to system() or exec() as a list variable is not checked. You must then be very careful if one of your scripts uses these functions. The exec "sh", '-c', $arg; instruction is considered as secure, whether $arg is tainted or not :(

It's also recommended to add "use strict;" at the beginning of your programs. This forces you to declare variables; some people will find that annoying, but it's mandatory if you use mod_perl.

Thus, your Perl CGI scripts must begin with :

#!/usr/bin/perl -wT
use strict;
use CGI;
or with Perl 5.6.0 :
#!/usr/bin/perl -T
use warnings;
use strict;
use CGI;

Call to open()

Many programmers open a file simply using open(FILE,"$filename") || .... We already saw the risks of such code. To reduce the risk, specify the open mode: open(FILE,"<$filename") || ... for reading only, open(FILE,">$filename") || ... for writing only.

Don't open your files in an unspecified way.

Before accessing a file, it's recommended to check whether the file exists. This doesn't prevent the race-condition problems presented in the previous article, but it avoids some traps such as commands with arguments.

if ( -e $filename ) { ... }

Starting from Perl 5.6, there's a new syntax for open(): open(FILEHANDLE,MODE,LIST). With the '<' mode, the file is opened for reading; with the '>' mode, the file is truncated or created if needed, and opened for writing. This becomes interesting for modes communicating with other processes. If the mode is '|-' or '-|', the LIST argument is interpreted as a command and is respectively found before or after the pipe.
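
The sketch below illustrates both uses. It assumes a Unix-like system where /bin/echo exists and a Perl recent enough to accept a list after the '-|' mode (5.8 or later); the variable names are ours.

```perl
use strict;
use warnings;

# A name that the two-argument open() would treat as a command to run.
my $hostile = '/bin/ls |';

# With an explicit '<' mode, the trailing pipe is just part of the
# file name, so the open fails instead of running /bin/ls.
my $opened = open(my $in, '<', $hostile) ? 1 : 0;
close($in) if $opened;
print "read-mode open succeeded: $opened\n";

# With the '-|' mode and a LIST, the command is run without a shell:
# each element is passed as one argument, so ';' is not interpreted.
my $arg = 'kmaster;echo INJECTED';
open(my $pipe, '-|', '/bin/echo', $arg)
    or die "cannot run /bin/echo: $!";
my $output = do { local $/; <$pipe> };
close($pipe);
print $output;    # the argument is echoed back verbatim
```

Because no shell is involved in the second open(), the semicolon reaches /bin/echo as plain text and the "echo INJECTED" part is never executed as a command.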

Before Perl 5.6 and open() with three arguments, some people used the sysopen() command.
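
A minimal sketch of that approach, assuming the standard Fcntl module for the O_RDONLY constant; the open_readonly name is ours. sysopen() passes the name to the system call as is, so characters such as '|' or '<' have no special meaning.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDONLY);

# Hypothetical helper: open a file for reading only, with no
# interpretation of the name; returns nothing on failure.
sub open_readonly {
    my ($path) = @_;
    sysopen(my $fh, $path, O_RDONLY) or return;
    return $fh;
}

my $fh = open_readonly('/bin/ls |');
print defined $fh ? "opened\n" : "refused: $!\n";
```

Here the trailing pipe is part of the file name, so the call fails because no such file exists.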

Input escaping and filtering

There are two methods: either you specify the forbidden characters, or you explicitly define the allowed characters using regular expressions. The example programs should have convinced you that it's quite easy to forget to filter potentially dangerous characters; that's why the second method is recommended.

Practically, here is what to do : first, check the request only holds allowed characters. Next, escape the characters considered as dangerous among the allowed ones.

#!/usr/bin/perl -wT

# filtre.pl

#  The $safe and $danger variables respectively define
#  the characters without risk and the risky ones.
#  Add or remove some to change the filter.
#  Only $input containing characters included in the
#  definitions are valid.

use strict;

my $input = shift;
die "Usage: $0 <string>\n" unless defined $input;

my $safe = '\w\d';
#  the backslash is doubled so that it really belongs to the set
my $danger = '&`\'\\\\|"*?~<>^(){}\$\n\r\[\]';
#  '/', space and tab are not part of the definitions on purpose

if ($input =~ m/^[$safe$danger]+$/) {
    #  escape each risky character individually
    $input =~ s/([$danger])/\\$1/g;
} else {
    die "Bad input chars in $input\n";
}
print "input = [$input]\n";

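This script defines two character sets: $safe contains the characters considered harmless (here only letters, digits and the underscore), and $danger contains the characters that must be escaped because they are potentially dangerous.

Every request containing a character not present in one of the two sets is immediately rejected.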

PHP scripts

I don't want to be controversial, but I think it's better to write scripts in PHP rather than in Perl. More exactly, as a system administrator, I prefer my users to write scripts in PHP rather than in Perl. Someone programming insecurely in PHP is as dangerous as someone doing so in Perl, so why prefer PHP? If you have programming problems with PHP, you can activate the Safe mode (safe_mode=on) or deactivate functions (disable_functions=...). This mode prevents accessing files not belonging to the user, changing environment variables unless explicitly allowed, executing commands, etc.

By default, the Apache banner informs us about the PHP being used.

$ telnet localhost 80
Connected to localhost.localdomain.
Escape character is '^]'.
HTTP/1.1 200 OK
Date: Tue, 03 Apr 2001 11:22:41 GMT
Server: Apache/1.3.14 (Unix)  (Red-Hat/Linux) mod_ssl/2.7.1
        OpenSSL/0.9.5a PHP/4.0.4pl1 mod_perl/1.24
Connection: close
Content-Type: text/html

Connection closed by foreign host.
Write expose_php = Off into /etc/php.ini to hide the information :
Server: Apache/1.3.14 (Unix)  (Red-Hat/Linux) mod_ssl/2.7.1 
OpenSSL/0.9.5a mod_perl/1.24 

The /etc/php.ini file (PHP4) and the /etc/httpd/php3.ini file (PHP3) have many parameters that can help harden the system. For instance, the "magic_quotes_gpc" option escapes quotes with a backslash in the arguments received by the GET and POST methods and via cookies; this avoids a number of problems found in our Perl examples.


This article is probably the most easily understood among the articles in this series. It shows vulnerabilities exploited every day on the web. There are many others, often related to bad programming (for instance, a script sending a mail, taking the From: field as an argument, provides a good site for spamming). Examples are too numerous. As soon as a script is on a web site, you can bet at least one person will try to use it the wrong way.

This article ends the series about secure programming. We hope we helped you discover the main security holes found in too many applications, and that you will take into account the "security" parameter when designing and programming your applications. Security problems are often neglected because of the limited scope of the development (internal use, private network use, temporary model, etc.). Nevertheless, a module originally designed for only very restricted use can become the base for a much bigger application and then changes later on will be much more expensive.

Some URI Encoded characters

URI Encoding (ISO 8859-1)    Character
%00                          \0 (end of string)
%0a                          \n (line feed)
%20                          space
%21                          !
%22                          "
%23                          #
%26                          & (ampersand)
%2f                          /
%3b                          ;
%3c                          <
%3e                          >
Table 1: URI encodings and the corresponding ISO 8859-1 characters


The faulty guestbook.cgi program

#!/usr/bin/perl -w

# guestbook.cgi

BEGIN { $ENV{PATH} = '/usr/bin:/bin' }
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};   # Make %ENV safer =:-)
print "Content-type: text/html\n\n";
print "<HTML>\n<HEAD><TITLE>Buggy Guestbook</TITLE></HEAD>\n";
my %input;
ReadParse(\%input);
my $email= $input{email};
my $texte= $input{texte};
$texte =~ s/\n/<BR>/g;
print "<BODY><A HREF=\"guestbook.html\">
       GuestBook </A><BR><form action=\"$ENV{'SCRIPT_NAME'}\">\n
      Email: <input type=text name=email><BR>\n
      Texte:<BR>\n<textarea name=\"texte\" rows=15 cols=70>
      </textarea><BR><input type=submit value=\"Go!\">
      </form>\n";
print "</BODY>\n";
print "</HTML>";
# The bug: $email and $texte are written "as is",
# without converting '<' and '>' into &lt; and &gt;
open (FILE,">>guestbook.html") || die ("Cannot write\n");
print FILE "Email: $email<BR>\n";
print FILE "Texte: $texte<BR>\n";
print FILE "<HR>\n";
close(FILE);
exit(0);

sub ReadParse {
  my $in =shift;
  my ($i, $key, $val);
  my $in_first;
  my @in_second;
  # Read in text
  if ($ENV{'REQUEST_METHOD'} eq "GET") {
    $in_first = $ENV{'QUERY_STRING'};
  } elsif ($ENV{'REQUEST_METHOD'} eq "POST") {
    read(STDIN, $in_first, $ENV{'CONTENT_LENGTH'});
  } else {
    die "ERROR: unknown request method\n";
  }
  @in_second = split(/&/,$in_first);
  foreach $i (0 .. $#in_second) {
    # Convert plus's to spaces
    $in_second[$i] =~ s/\+/ /g;
    # Split into key and value.
    ($key, $val) = split(/=/,$in_second[$i],2);
    # Convert %XX from hex numbers to characters
    $key =~ s/%([0-9A-Fa-f]{2})/pack("C",hex($1))/ge;
    $val =~ s/%([0-9A-Fa-f]{2})/pack("C",hex($1))/ge;
    # Associate key and value
    $$in{$key} .= "\0" if (defined($$in{$key}));
    $$in{$key} .= $val;
  }
  # number of key=value pairs read
  return scalar(@in_second);
}