Apache Basics and simple CGI scripts


Motivation

The web is ubiquitous today. Many devices ship with built-in web servers, and there are many free and open source web servers available. For a long time Apache was the most widely used web server, and it is still prominent today. It has recently been surpassed by nginx[1] (pronounced: engine-x). Still, Apache's versatility and time-tested security make it a good choice for many applications.

Here you will learn the basics needed to get started with running an Apache web server; many of the concepts apply to any web server.

Apache Configuration

Global Configuration

Apache has one main configuration file. Its location depends on how Apache was started, and the default depends on your Linux distribution. E.g. on Debian it is /etc/apache2/apache2.conf and on Red Hat it is /etc/httpd/conf/httpd.conf.

Within the main configuration there are usually Include statements that include other files. E.g.

IncludeOptional sites-enabled/*.conf

This includes every .conf file from the sub-directory sites-enabled. Usually the configuration is split up into many files, e.g. one for each module that is loaded and one for each virtual web server that is hosted.

Some directives are global, regardless of whether they appear in the main config file or in an included file: they change parameters of the server itself. E.g.

Listen 80
Listen 443
Listen 127.0.0.1:9980

The above tells Apache to create listening sockets on ports 80 and 443 on all addresses, plus one additional socket on port 9980 that is only reachable from the local host.

Modules

Many functions of Apache live in modules and are only available if the module is loaded. On Debian-based systems you can use a2enmod to enable a module. (This places links from the directory that holds the module load files into the directory that is actually included in the config.)

# a2enmod proxy
# cd /etc/apache2/
# ls -l mods-enabled/proxy.conf
lrwxrwxrwx 1 root root 28 Apr  1 12:37 mods-enabled/proxy.conf -> ../mods-available/proxy.conf

Virtual Web Servers

A web server without encryption answers on port 80 by default. With https encryption it answers on port 443. If you have more than one IP address you can choose which address the socket binds to.

When a web client connects, it asks for the URI part (the part behind the host name). In HTTP/1.1 requests the client also sends the host name it wants to reach, in the Host header. Thus the server can present different web pages depending on the host name.

So the server can discern what the client wants either by the IP address the request arrived on, by the host name that the client requested, or both. Accordingly we speak of IP-based and name-based virtual hosts.

With https-protected services there is a small chicken-and-egg problem: when the SSL/TLS connection is established, the server needs to present the certificate for the requested host. If it hosts several virtual servers it does not know which certificate to use, since the host name is only sent inside the established session. To solve this, and to allow more than one https-protected virtual server on the same IP address, SNI was invented. SNI (Server Name Indication) is supported by all modern browsers. With SNI the server name is already sent within the SSL/TLS handshake.

<VirtualHost 10.11.12.13:80>

ServerName www.test.example.org
ServerAlias test.example.org


ServerAdmin admin@example.org
DocumentRoot /var/www/testsever/


ErrorLog /var/log/apache/test.error_log
TransferLog /var/log/apache/test.access_log

RedirectPermanent /wuwien http://www.wu.ac.at

Alias /projectdata/ /home/anna/projectx/data/

</VirtualHost>

The above example defines a virtual host (you might want to place it in its own config file, but it works in the main file as well). The virtual host listens on the private IP address 10.11.12.13 and accepts requests on port 80. Among several virtual hosts on that address and port, this one is selected when the host name sent by the client matches ServerName or ServerAlias; if no virtual host matches, Apache falls back to the first one defined for that address and port. Documents are served from the DocumentRoot, and the config also specifies the location of the log files.

If someone browses to http://www.test.example.org/projectdata/ they will actually see what is in the /home/anna/projectx/data/ directory - but only if the user that the web server runs as has permission to read that directory.

You could specify many other directives within that block. One example here is RedirectPermanent: if a user goes to http://www.test.example.org/wuwien they will be redirected to another server.
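
The name-based selection described in this section can be pictured with a small shell sketch. It is only a toy model of the idea, not how Apache is implemented. The host names and the document root of the first entry are taken from the example above; www.other.example.org and /var/www/other/ are hypothetical additions for illustration.

```shell
#!/bin/bash
# Toy model of name-based virtual host selection (not Apache code).
# www.other.example.org and /var/www/other/ are hypothetical.
pick_docroot() {
  # Host names are case-insensitive; drop an optional ":port" suffix.
  host=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  host=${host%%:*}
  case "$host" in
    www.test.example.org|test.example.org) echo /var/www/testsever/ ;;
    www.other.example.org)                 echo /var/www/other/ ;;
    # No name matches: fall back to the first virtual host defined.
    *)                                     echo /var/www/testsever/ ;;
  esac
}

pick_docroot "test.example.org"          # matches the ServerAlias
pick_docroot "WWW.Other.Example.org:80"  # matches the hypothetical second host
pick_docroot "unknown.example.net"       # falls back to the first host
```

The argument plays the role of the Host header value that the client sends with each request.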

Directory and Location Configuration

Often we want special settings that only apply to one directory (where the files are on the server) or one location (the part specified in the URL).

For this you can specify settings that are only valid in these directories. Of course this can be nested within VirtualHost blocks. E.g.

<Location /server-status>
 SetHandler server-status
</Location>

This tells Apache to serve server-status pages (if the module is enabled) under the URI /server-status.

In most cases it is better to use Directory. E.g.

<Directory "/opt/some/data/">
    Options -Indexes 
    AllowOverride AuthConfig
</Directory>

The above example turns off the indexing of directories. (That is: if you browse to a directory instead of a file, Apache can create a listing of its content. This is turned off here.) It also says that directives of the AuthConfig class may be set in a different place: in so-called .htaccess files. The AuthConfig class covers the authentication and authorization directives, i.e. whether you need a password to access the web page.

.htaccess files

When you place a file named .htaccess in a directory, you can change some configuration settings just for that directory (and its sub-directories). This only works if the class of settings that you want to change is allowed to be overridden there (see AllowOverride in the example above). Most of the time this is used to password-protect access:

AuthName Streng-Vertraulich
AuthType Basic
AuthUserFile /opt/myapp/webusers
require valid-user

The above directives tell the server to ask for a user name and password before granting access. The password dialog shows the text "Streng-Vertraulich" (German for "strictly confidential") to the user. Users and passwords are checked against the given file, and any user in that file has access.

$ touch webusers
$ htpasswd  -B webusers anna
New password:
Re-type new password:
Updating password for user anna
$ cat webusers
anna:$2y$05$amEPdHfhgbggHblFGUx2ZeuVGNFKbSZoc1kamltBZJrj.YoX1YEwW

The above creates the file (if it does not exist yet) with the touch command. The htpasswd tool is then used to create a user named anna in that file. The password is read interactively. The -B option tells the tool to use the secure bcrypt algorithm for password hashing. For each user the file contains one line in the format user:hashed-password.
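
To make the user:hashed-password format more concrete, here is a minimal sketch that reads such a file and reports which hashing scheme each entry uses, judging only by the prefix of the hash. The function names hash_scheme and report_users are made up for this illustration, and the temporary file stands in for the webusers file from the transcript.

```shell
#!/bin/bash
# Guess the hashing scheme of an htpasswd entry from the hash prefix.
hash_scheme() {
  case "$1" in
    '$2y$'*)   echo bcrypt ;;    # written by htpasswd -B
    '$apr1$'*) echo apr1-md5 ;;  # Apache-specific MD5 variant
    '{SHA}'*)  echo sha1 ;;      # unsalted SHA-1, considered weak
    *)         echo unknown ;;
  esac
}

# Print "user scheme" for every line of an htpasswd-style file ($1).
report_users() {
  while IFS=: read -r user hash; do
    [ -n "$user" ] || continue   # skip empty lines
    printf '%s %s\n' "$user" "$(hash_scheme "$hash")"
  done < "$1"
}

# Demo with the line shown in the transcript above.
tmp=$(mktemp)
printf '%s\n' 'anna:$2y$05$amEPdHfhgbggHblFGUx2ZeuVGNFKbSZoc1kamltBZJrj.YoX1YEwW' > "$tmp"
report_users "$tmp"   # prints: anna bcrypt
rm -f "$tmp"
```

A check like this can help spot old entries that were created without -B and should be re-hashed.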

Useful Apache Features

Reverse Proxy

Here are a few of the many features of Apache that might be useful. You can use it as a reverse proxy, where incoming requests are passed on to a totally different server and presented as if they were located on your server. E.g.

ProxyPass "/bilder/"  "http://www.example.com/img/"
ProxyPassReverse "/bilder/"  "http://www.example.com/img/"

The above lines present the files under img on the www.example.com server as if they were on the local server under /bilder/. ProxyPassReverse rewrites the URLs in redirect headers sent by the backend so that they point to the local server again.
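
The path mapping that ProxyPass performs can be sketched as a simple prefix replacement. This is only a model of the idea under the two values from the example above; the function name map_proxy is hypothetical, and the real module handles many more cases.

```shell
#!/bin/bash
# Toy model of the ProxyPass path mapping (not Apache code).
map_proxy() {
  prefix=/bilder/
  backend=http://www.example.com/img/
  case "$1" in
    # Request path starts with the prefix: swap it for the backend URL.
    "$prefix"*) echo "${backend}${1#"$prefix"}" ;;
    # Anything else is served locally.
    *)          echo "not proxied" ;;
  esac
}

map_proxy /bilder/logo.png   # prints: http://www.example.com/img/logo.png
map_proxy /index.html        # prints: not proxied
```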

Connecting to Scripts

In order to create dynamic content, the Apache server can embed scripting languages that are executed directly in the context of the server (see below), can call CGI scripts (where one script is executed for each request), or can connect to various services, e.g. FastCGI servers or WSGI servers. In many cases languages bring their own web server, so the ProxyPass shown above is all that is needed to connect to such applications.


Including Script Languages

PHP, Python, Perl and other languages can be included in Apache so that pages can directly execute code in that language.


CGI Scripts

One of the oldest and easiest ways to create dynamic content on the server is CGI (Common Gateway Interface). CGI scripts are scripts (or compiled programs) that are started anew for every request; details of the request are passed to the script via environment variables.

In the Apache configuration you need an entry like this:

ScriptAlias /mycgi/ /opt/mycgi/cgi-bin/
<Directory /opt/mycgi/cgi-bin/>
  AllowOverride AuthConfig
  Options +ExecCGI -Indexes
  Require all granted
</Directory>

With this you can use /opt/mycgi/cgi-bin/ as a directory for CGI scripts that can then be executed by the server. (The scripts need execute permission for the web server user - or for all.) On a typical Debian installation /usr/lib/cgi-bin is already configured for CGI scripts. You may need to enable the cgi module: a2enmod cgi

Your script could look like this:

#!/bin/bash
echo Content-Type: text/plain
echo

echo hello, today is: $(date) 
echo we are running under user $(id)

If this is saved as test.cgi in that directory, you can browse to http://example.com/mycgi/test.cgi

The first two echo lines are dictated by the CGI standard: the script must tell the server the type of the document and then output an empty line. In our case the document is plain text.
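
As mentioned at the start of this section, the server hands the details of the request to the script in environment variables. The following sketch prints a few of them. REQUEST_METHOD, QUERY_STRING and REMOTE_ADDR are variable names defined by the CGI standard; the function name cgi_main and the file name env.cgi used below are only assumptions for this example.

```shell
#!/bin/bash
# Sketch of a CGI script that reports some details of the request.
cgi_main() {
  # Header and empty line, as required by the CGI standard.
  echo "Content-Type: text/plain"
  echo
  # Request details arrive in environment variables set by the server.
  echo "method: ${REQUEST_METHOD:-unknown}"
  echo "query: ${QUERY_STRING:-}"
  echo "client: ${REMOTE_ADDR:-unknown}"
}

cgi_main
```

Because the input is just environment variables, such a script can be tried without a web server. If it is saved as env.cgi, a call like REQUEST_METHOD=GET QUERY_STRING='name=anna' ./env.cgi on the command line shows roughly what the server would send back for a request to env.cgi?name=anna.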

Exercises

  • Try to install Apache using the package management. Look at the existing config.
  • Try to see where the cgi directory is or try to enable the cgi module.
  • Write a short CGI script and see if it works with your web browser.

References