Fundamentals of CGI
The Common Gateway Interface, or CGI, is an interface between a web server and any backend scripts or programs that perform a given task.
It's very easy to get by without knowing the full details of how CGI works, but that gap often leads to a lack of confidence when debugging website issues.
How Web Servers Work
The job of the web server is to handle the HTTP (HyperText Transfer Protocol) requests sent to it and to send back an appropriate HTTP response. What the web server actually does with a request can vary, from serving up static HTML files (the simplest and most common job for a web server) to, for example, processing a web form and sending an email.
CGI Basics
If the web server wants to handle anything more complicated than serving up a static HTML file, there needs to be an agreed way to forward essential information from the web server to an external script or program.
CGI standardises this process by defining how information is passed between the web server and the script. The server sets environment variables describing the request and, when the request has a body (such as POST data), sends that body to the script's STDIN (standard input). Whatever the script writes to STDOUT (standard output) becomes the response.
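As a rough sketch of the script's side of this exchange, the Perl below reads the request data from wherever CGI puts it: QUERY_STRING for a GET request, or STDIN for a POST, using CONTENT_LENGTH to know how many bytes to read. The function name read_request_data is an illustrative choice, not part of any library, and a real script would normally leave this to a CGI library.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Return the raw, still-encoded request parameters as a single string.
# GET: taken from the QUERY_STRING environment variable.
# POST: read from STDIN; CONTENT_LENGTH says how many bytes to expect.
sub read_request_data {
    my $method = $ENV{REQUEST_METHOD} // 'GET';
    if ($method eq 'POST') {
        my $length = $ENV{CONTENT_LENGTH} // 0;
        my $body   = '';
        # read() may return fewer bytes than requested, so keep reading
        while (length($body) < $length) {
            my $got = read(STDIN, $body, $length - length($body), length($body));
            die "read failed: $!" unless defined $got;
            last if $got == 0;    # EOF before CONTENT_LENGTH bytes arrived
        }
        return $body;
    }
    return $ENV{QUERY_STRING} // '';
}

my $data = read_request_data();
print "Content-Type: text/plain\n\n";
print "Method: ", ($ENV{REQUEST_METHOD} // 'GET'), "\n";
print "Raw data: $data\n";
```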
HTTP Environment Variables
I won't cover every HTTP/CGI environment variable that is set during this process, and be aware that different web servers may set slightly different ones. The following list shows some of the important ones, using this example URL:
https://www.example.com/cgi-bin/script.cgi?foo=bar
- DOCUMENT_ROOT => /var/www/html
- GATEWAY_INTERFACE => CGI/1.1
- HTTP_COOKIE => sessionid=abc123
- HTTP_HOST => www.example.com
- HTTP_USER_AGENT => Mozilla/5.0
- QUERY_STRING => foo=bar
- REQUEST_METHOD => GET
- REQUEST_URI => /cgi-bin/script.cgi?foo=bar
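To see what a particular server actually sets, a small diagnostic script can print the variables from the list above. This is a sketch: describe_env is a hypothetical helper name, and the names it checks are only the subset listed here, not everything a server may provide.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# The variables from the list above; a given server may set more, fewer,
# or differently named ones.
my @names = qw(
    DOCUMENT_ROOT GATEWAY_INTERFACE HTTP_COOKIE HTTP_HOST
    HTTP_USER_AGENT QUERY_STRING REQUEST_METHOD REQUEST_URI
);

# Build "NAME => value" lines, marking any variable the server did not set.
sub describe_env {
    my @lines;
    for my $name (@_) {
        my $value = exists $ENV{$name} ? $ENV{$name} : '(not set)';
        push @lines, "$name => $value";
    }
    return join("\n", @lines) . "\n";
}

print "Content-Type: text/plain\n\n";
print describe_env(@names);
```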
When the script executes, it does so within this environment and can read the variables directly. In a Perl script, for example:
```perl
#!/usr/bin/env perl
print "Content-Type: text/plain\n\n";   # header, then the blank line
print $ENV{HTTP_USER_AGENT};            # body: the browser's User-Agent
```
This would result in "Mozilla/5.0" being returned to the browser.
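The output of this script is the response itself: a header section (here just the Content-Type header) followed by a body. A blank line separates the two; the first \n ends the Content-Type line and the second produces the empty line that marks the end of the headers. The web server adds the HTTP status line before passing the response on to the browser.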
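The same header, blank line, body layout applies when a script sends more than one header. As an illustration, the sketch below adds a Status header, which CGI/1.1 defines for a script to request a status other than 200 OK; the server then builds the real HTTP status line from it. The helper cgi_response and the "400 when the query string is empty" rule are invented for this example.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Assemble CGI output: header lines, a blank line, then the body.
# "Status" is the CGI header a script uses to ask the server for a
# non-200 status code.
sub cgi_response {
    my ($status, $type, $body) = @_;
    my $headers = '';
    $headers .= "Status: $status\n" if defined $status;
    $headers .= "Content-Type: $type\n";
    return $headers . "\n" . $body;    # this "\n" is the blank line ending the headers
}

my $query = $ENV{QUERY_STRING} // '';
if ($query eq '') {
    print cgi_response('400 Bad Request', 'text/plain', "No query string supplied\n");
} else {
    print cgi_response(undef, 'text/plain', "You sent: $query\n");
}
```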
Testing CGI Scripts on the Command Line
A CGI script simply expects certain environment variables to be set when it runs. It neither knows nor cares whether a web server is running, or whether it was invoked by one.
Testing it on the command line is therefore just a matter of recreating a CGI environment for the script to run in.
Even a full POST request can be mimicked this way: set REQUEST_METHOD to POST and CONTENT_LENGTH to the size of the data, then pipe the POST data to the script's STDIN.
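One way to automate this is sketched below in Perl, under a few assumptions: the CGI script under test is a tiny stand-in written to a temporary file so the example is self-contained (in practice you would point $path at your own script), and run_cgi is a hypothetical helper name. It sets the CGI variables only for the duration of the call, runs the script as a child process, and pipes any POST data to its STDIN.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use IPC::Open2 qw(open2);

# Hypothetical stand-in CGI script; replace with the path to a real one.
my $script = <<'END_CGI';
my $data = ($ENV{REQUEST_METHOD} // '') eq 'POST'
    ? do { read(STDIN, my $body, $ENV{CONTENT_LENGTH} // 0); $body }
    : ($ENV{QUERY_STRING} // '');
print "Content-Type: text/plain\n\n";
print "$ENV{REQUEST_METHOD} $data\n";
END_CGI

my ($fh, $path) = tempfile(SUFFIX => '.cgi', UNLINK => 1);
print {$fh} $script;
close $fh or die "close: $!";

# Run the script with the given CGI variables set, feeding $stdin to it.
sub run_cgi {
    my ($env, $stdin) = @_;
    local @ENV{ keys %$env } = values %$env;   # restored when run_cgi returns
    my $pid = open2(my $out, my $in, $^X, $path);
    print {$in} $stdin if defined $stdin;
    close $in;                                  # script sees EOF on STDIN
    my $output = do { local $/; <$out> };
    waitpid($pid, 0);
    return $output;
}

print run_cgi({ REQUEST_METHOD => 'GET', QUERY_STRING => 'foo=bar' });

my $post = 'foo=bar&baz=1';
print run_cgi({ REQUEST_METHOD => 'POST', CONTENT_LENGTH => length $post,
                CONTENT_TYPE   => 'application/x-www-form-urlencoded' }, $post);
```

From a shell, the GET case reduces to something like REQUEST_METHOD=GET QUERY_STRING='foo=bar' perl script.cgi, where script.cgi stands for your own script.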
CGI Languages
Another interesting consequence is that the script can be written in any language you like, so long as it can read environment variables and STDIN and write to STDOUT. Take your pick from C, Perl, Python, Ruby and so on; even a Bash script will do.
I started out writing CGI programs in C: parsing the QUERY_STRING by hand (splitting on '&', then on '='), managing memory manually, and decoding URL-encoded (percent-encoded) strings.
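Those manual parsing steps look roughly like this, sketched in Perl rather than C to keep the examples in one language. It is a deliberately simplified version under stated assumptions: parse_query and url_decode are hypothetical names, only '&' is treated as a pair separator, a repeated key overwrites the earlier value, and decoded values are left as raw bytes rather than decoded text.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Decode one URL-encoded component: '+' means space, %XX is a byte given in hex.
sub url_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;
    return $s;
}

# Split "a=1&b=2" on '&', then each pair on the first '=', decoding both halves.
# Splitting happens before decoding, so an encoded %26 stays inside its value.
sub parse_query {
    my ($query) = @_;
    my %params;
    for my $pair (split /&/, $query // '') {
        next if $pair eq '';
        my ($key, $value) = split /=/, $pair, 2;
        $params{ url_decode($key) } = url_decode($value // '');
    }
    return \%params;
}

my $params = parse_query($ENV{QUERY_STRING});
print "Content-Type: text/plain\n\n";
print "$_ = $params->{$_}\n" for sort keys %$params;
```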
I then moved to Perl and found the CGI module on CPAN, which takes care of all this for you. Other languages no doubt have their own libraries and modules for common scenarios such as handling CGI environments.