mod_perl_tuning(3)    User Contributed Perl Documentation   mod_perl_tuning(3)



NAME
       mod_perl_tuning - mod_perl performance tuning

DESCRIPTION
       Described here are examples and hints on how to configure a mod_perl
       enabled Apache server, concentrating on tips for configuration for
       high-speed performance.  The primary way to achieve maximal performance
       is to reduce the resources consumed by the mod_perl enabled HTTPD
       processes.

       This document assumes familiarity with Apache configuration directives,
       some familiarity with the mod_perl configuration directives, and that
       you have already built and installed a mod_perl enabled Apache server.
       Please also read the documentation that comes with mod_perl for
       programming tips.  Some configurations below use features from
       mod_perl version 1.03 which were not present in earlier versions.

       These performance tuning hints are collected from my experiences in
       setting up and running servers for handling large promotional sites,
       such as The Weather Channel's "Blimp Site-ings" game, the MSIE 4.0
       "Subscribe to Win" game, and the MSN Million Dollar Madness game.

BASIC CONFIGURATION
       The basic configuration for mod_perl is as follows.  In the httpd.conf
       file, I add configuration parameters to make the
       "http://www.domain.com/programs" URL be the base location for all
       mod_perl programs.  Thus, access to
       "http://www.domain.com/programs/printenv" will run the printenv
       script, as we'll see below.  Also, any *.perl file will be interpreted
       as a mod_perl program just as if it were in the programs directory,
       and *.rperl will be mod_perl, but without any HTTP headers
       automatically sent; you must send them explicitly.  If you don't want
       these last two, just leave them out of your configuration.

       In the configuration files, I use /var/www as the "ServerRoot"
       directory, and /var/www/docs as the "DocumentRoot".  You will need to
       change these to match your particular setup.  The network address
       below in the access to perl-status should also be changed to match
       yours.

       Additions to httpd.conf:

        # put mod_perl programs here
        # startup.perl loads all functions that we want to use within mod_perl
        PerlRequire /var/www/perllib/startup.perl
        <Directory /var/www/docs/programs>
          AllowOverride None
          Options ExecCGI
          SetHandler perl-script
          PerlHandler Apache::Registry
          PerlSendHeader On
        </Directory>

        # like above, but no PerlSendHeader
        <Directory /var/www/docs/rprograms>
          AllowOverride None
          Options ExecCGI
          SetHandler perl-script
          PerlHandler Apache::Registry
          PerlSendHeader Off
        </Directory>

        # allow arbitrary *.perl files to be scattered throughout the site.
        <Files *.perl>
          SetHandler perl-script
          PerlHandler Apache::Registry
          PerlSendHeader On
          Options +ExecCGI
        </Files>

        # like *.perl, but do not send HTTP headers
        <Files *.rperl>
          SetHandler perl-script
          PerlHandler Apache::Registry
          PerlSendHeader Off
          Options +ExecCGI
        </Files>

        <Location /perl-status>
          SetHandler perl-script
          PerlHandler Apache::Status
          order deny,allow
          deny from all
          allow from 204.117.82.
        </Location>

       Now, you'll notice that I use a "PerlRequire" directive to load in the
       file startup.perl.  In that file, I include all of the "use" statements
       that occur in any of my mod_perl programs (either from the programs
       directory, or the *.perl files).  Here is an example:

        #! /usr/local/bin/perl
        use strict;

        # load up necessary perl function modules to be able to call from Perl-SSI
        # files.  These objects are reloaded upon server restart (SIGHUP or SIGUSR1)
        # if PerlFreshRestart is "On" in httpd.conf (as of mod_perl 1.03).

        # only library-type routines should go in this directory.

        use lib "/var/www/perllib";

        # make sure we are in a sane environment.
        $ENV{GATEWAY_INTERFACE} =~ /^CGI-Perl/ or die "GATEWAY_INTERFACE not Perl!";

        use Apache::Registry ();       # for things in the "/programs" URL

        # pull in things we will use in most requests so it is read and compiled
        # exactly once
        use CGI (); CGI->compile(':all');
        use CGI::Carp ();
        use DBI ();
        use DBD::mysql ();

        1;

       What this does is pull in all of the code used by the programs (but
       does not "import" any of the module methods) into the main HTTPD
       process, which then creates the child processes with the code already
       in place.  You can also put any new modules you like into the
       /var/www/perllib directory and simply "use" them in your programs.
       There is no need to put "use lib "/var/www/perllib";" in all of your
       programs.  You do, however, still need to "use" the modules in your
       programs.  Perl is smart enough to know it doesn't need to recompile
       the code, but it does need to "import" the module methods into your
       program's name space.
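
       The difference between loading a module and importing from it can be
       seen outside of Apache as well.  The following is a minimal standalone
       sketch, not part of the original configuration; it uses the core module
       List::Util purely as a stand-in for whatever modules startup.perl
       pre-loads.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Load the module's code once, importing nothing (as startup.perl does).
# List::Util is only a stand-in for a pre-loaded module.
use List::Util ();

# The code is compiled and recorded in %INC ...
print "loaded:   ", (exists $INC{'List/Util.pm'} ? "yes" : "no"), "\n";   # yes
# ... but no function has been imported into this name space yet.
print "imported: ", (defined &main::max ? "yes" : "no"), "\n";            # no

# A later "use List::Util qw(max);" in a program amounts to a require
# (a no-op here, since %INC already lists the file) plus this import call.
List::Util->import('max');
print "imported: ", (defined &main::max ? "yes" : "no"), "\n";            # yes
print "max = ", main::max(3, 9, 4), "\n";                                 # 9
```

       The import step is cheap because it only creates aliases in the
       caller's name space; the expensive part, reading and compiling the
       module, happened once when it was first loaded.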

       If you only have a few modules to load, you can use the PerlModule
       directive to pre-load them with the same effect.

       The biggest benefit here is that the child process never needs to
       recompile the code, so it is faster to start, and the child process
       actually shares the same physical copy of the code in memory due to the
       way the virtual memory system in modern operating systems works.
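
       The inheritance of already-compiled code by child processes can be
       illustrated with a plain fork(), which is roughly what happens when the
       main server creates its children.  This is a standalone sketch under
       that assumption; List::Util again stands in for a pre-loaded module,
       and the script only checks that the child sees the compiled code, it
       does not measure how many memory pages are physically shared.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for startup.perl: compile the module once, in the parent.
use List::Util ();

my $pid = fork();
die "fork failed: $!\n" unless defined $pid;

if ($pid == 0) {
    # Child: the compiled code was inherited across fork(); nothing is
    # read from disk or compiled again here.
    my $inherited = exists $INC{'List/Util.pm'}
                 && defined &List::Util::sum;
    exit($inherited ? 0 : 1);
}

# Parent: collect the child's verdict from its exit status.
waitpid($pid, 0);
my $child_ok = ($? == 0);
print "child inherited preloaded module: ", ($child_ok ? "yes" : "no"), "\n";
```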

       You will want to replace the "use" lines above with modules you actu-
       ally need.

       Simple Test Program

       Here's a sample script called printenv that you can stick in the pro-
       grams directory to test the functionality of the configuration.

        #! /usr/local/bin/perl
        use strict;
        # print the environment in a mod_perl program under Apache::Registry

        print "Content-type: text/html\n\n";

        print "<HEAD><TITLE>Apache::Registry Environment</TITLE></HEAD>\n";

        print "<BODY><PRE>\n";
        print map { "$_ = $ENV{$_}\n" } sort keys %ENV;
        print "</PRE></BODY>\n";

       When you run this, check the value of the GATEWAY_INTERFACE variable to
       see that you are indeed running mod_perl.

REDUCING MEMORY USE
       As a side effect of using mod_perl, your HTTPD processes will be larger
       than without it.  There is just no way around it, as you have this
       extra code to support your added functionality.

       On a very busy site, the number of HTTPD processes can grow to be quite
       large.  For example, on one large site, the typical HTTPD was about
       5 MB large.  With 30 of these, all of RAM was exhausted, and we started
       to go to swap.  With 60 of these, swapping turned into thrashing, and
       the whole machine slowed to a crawl.

       To reduce thrashing, it is necessary to limit the maximum number of
       HTTPD processes to a number that is just larger than what will fit
       into RAM (in this case, 45).  The drawback is that when the server is
       serving 45 requests, new requests will queue up and wait; however, if
       you let the maximum number of processes grow, the new requests will
       start to get served right away, but they will take much longer to
       complete.

       One way to reduce the amount of real memory taken up by each process is
       to pre-load commonly used modules into the primary HTTPD process so
       that the code is shared by all processes.  This is accomplished by
       inserting the "use Foo ();" lines into the startup.perl file for any
       "use Foo;" statement in any commonly used Registry program.  The idea
       is that the operating system's VM subsystem will share the data across
       the processes.

       You can also pre-load Apache::Registry programs using the
       "Apache::RegistryLoader" module so that the code for these programs is
       shared by all HTTPD processes as well.

       NOTE: When you pre-load modules in the startup script, you may need to
       kill and restart HTTPD for changes to take effect.  A simple "kill
       -HUP" or "kill -USR1" will not reload that code unless you have set
       the "PerlFreshRestart" configuration parameter in httpd.conf to be
       "On".

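
       The memory figures above can be turned into a rough capacity estimate.
       The sketch below is only back-of-the-envelope arithmetic: the function
       name is made up for illustration, the 150 MB of usable RAM is inferred
       from "30 processes of 5 MB exhausted RAM", and the 2 MB shared figure
       is purely hypothetical.  It assumes the shared part of each process is
       counted once and the rest is private to each process.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Estimate how many httpd children fit in RAM when $shared_mb of each
# child's $proc_mb is shared with the parent (counted once) and the
# remainder is private to each child.  Hypothetical helper, not an API.
sub children_that_fit {
    my ($ram_mb, $proc_mb, $shared_mb) = @_;
    my $private_mb = $proc_mb - $shared_mb;
    die "private size must be positive\n" if $private_mb <= 0;
    return int(($ram_mb - $shared_mb) / $private_mb);
}

print children_that_fit(150, 5, 0), "\n";   # nothing shared: prints 30
print children_that_fit(150, 5, 2), "\n";   # 2 MB shared (assumed): prints 49
```

       Under these assumptions, sharing 2 MB per process raises the number of
       processes that fit from 30 to 49, which is why pre-loading helps.  The
       limit of 45 quoted above was deliberately set somewhat higher than the
       count that strictly fits; where to set it remains a judgment call.
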
REDUCING THE NUMBER OF LARGE PROCESSES
       Unfortunately, simply reducing the size of each HTTPD process is not
       enough on a very busy site.  You also need to reduce the quantity of
       these processes.  This reduces memory consumption even more, and
       results in fewer processes fighting for the attention of the CPU.  If
       you can reduce the quantity of processes to fit into RAM, your
       response time improves even more.

       The idea of the techniques outlined below is to offload the normal
       document delivery (such as static HTML and GIF files) from the
       mod_perl HTTPD, and let it only handle the mod_perl requests.  This
       way, your large mod_perl HTTPD processes are not tied up delivering
       simple content when a smaller process could perform the same job more
       efficiently.

       In the techniques below where there are two HTTPD configurations, the
       same httpd executable can be used for both configurations; there is no
       need to build HTTPD both with and without mod_perl compiled into it.
       With Apache 1.3 this can be done with the DSO configuration: just
       configure one httpd invocation to dynamically load mod_perl and the
       other not to do so.

       These approaches work best when most of the requests are for static
       content rather than mod_perl programs.  Log file analysis becomes a
       bit of a challenge when you have multiple servers running on the same
       host, since you must log to different files.

       TWO MACHINES

       The simplest way is to put all static content on one machine, and all
       mod_perl programs on another.  The only trick is to make sure all
       links are properly coded to refer to the proper host.  The static
       content will be served up by lots of small HTTPD processes (configured
       not to use mod_perl), and the relatively few mod_perl requests can be
       handled by the smaller number of large HTTPD processes on the other
       machine.

       The drawback is that you must maintain two machines, and this can get
       expensive.  For extremely large projects, this is the best way to go.

       TWO IP ADDRESSES

       Similar to above, but one HTTPD runs bound to one IP address, while the
       other runs bound to another IP address.  The only difference is that
       one machine runs both servers.  Total memory usage is reduced because
       the majority of files are served by the smaller HTTPD processes, so
       there are fewer large mod_perl HTTPD processes sitting around.

       This is accomplished using the httpd.conf directive "BindAddress" to
       make each HTTPD respond only to one IP address on this host.  One will
       have mod_perl enabled, and the other will not.

       TWO PORT NUMBERS

       If you cannot get two IP addresses, you can also split the HTTPD
       processes as above by putting one on the standard port 80, and the
       other on some other port, such as 8042.  The only configuration
       changes will be the "Port" and log file directives in the httpd.conf
       file (and also one of them does not have any mod_perl directives).

       The major flaw with this scheme is that some firewalls will not allow
       access to the server running on the alternate port, so some people
       will not be able to access all of your pages.

       If you use this approach or the one above with dual IP addresses, you
       probably do not want to have the *.perl and *.rperl sections from the
       sample configuration above, as this would require that your primary
       HTTPD server be mod_perl enabled as well.

       Thanks to Gerd Knops for this idea.

       USING ProxyPass WITH TWO SERVERS

       To overcome the limitation of the alternate port above, you can use
       dual Apache HTTPD servers with just a slight difference in
       configuration.  Essentially, you set up two servers just as you would
       with the two ports on the same IP address method above.  However, in
       your primary HTTPD configuration you add a line like this:

        ProxyPass /programs http://localhost:8042/programs

       Where your mod_perl enabled HTTPD is running on port 8042, and has only
       the directory programs within its DocumentRoot.  This assumes that you
       have included the mod_proxy module in your server when it was built.

       Now, when you access http://www.domain.com/programs/printenv it will
       internally be passed through to your HTTPD running on port 8042 as the
       URL http://localhost:8042/programs/printenv and the result relayed
       back transparently.  To the client, it all seems as if it is just one
       server running.  This can also be used on the dual-host version to
       hide the second server from view if desired.

       Thanks to Bowen Dwelle for this idea.

       SQUID ACCELERATOR

       Another approach to reducing the number of large HTTPD processes on one
       machine is to use an accelerator such as Squid (which can be found at
       http://squid.nlanr.net/Squid/ on the web) between the clients and your
       large mod_perl HTTPD processes.  The idea here is that squid will
       handle the static objects from its cache while the HTTPD processes
       will handle mostly just the mod_perl requests once the cache is
       primed.  This reduces the number of HTTPD processes and thus reduces
       the amount of memory used.

       To set this up, just install the current version of Squid (at this
       writing, this is version 1.1.22) and use the RunAccel script to start
       it.  You will need to reconfigure your HTTPD to use an alternate port,
       such as 8042, rather than its default port 80.  To do this, you can
       either change the httpd.conf line "Port" or add a "Listen" directive
       to match the port specified in the squid.conf file.  Your URLs do not
       need to change.  The benefit of using the "Listen" directive is that
       redirected URLs will still use the default port 80 rather than your
       alternate port, which might reveal your real server location to the
       outside world and bypass the accelerator.

       In the squid.conf file, you will probably want to add "programs" and
       "perl" to the "cache_stoplist" parameter so that these are always
       passed through to the HTTPD server under the assumption that they
       always produce different results.

       This is very similar to the two port, ProxyPass version above, but the
       Squid cache may be more flexible to fine tune for dynamic documents
       that do not change on every view.  The Squid proxy server also seems
       to be more stable and robust than the Apache 1.2.4 proxy module.

       One drawback to using this accelerator is that the logfiles will always
       report access from IP address 127.0.0.1, which is the local host
       loopback address.  Also, any access permissions or other user tracking
       that requires the remote IP address will always see the local address.
       The following code uses a feature of recent mod_perl versions (tested
       with mod_perl 1.16 and Apache 1.3.3) to trick Apache into logging the
       real client address and giving that information to mod_perl programs
       for their purposes.

       First, in your startup.perl file add the following code:

        use Apache::Constants qw(OK);

        sub My::SquidRemoteAddr ($) {
          my $r = shift;

          # take the right-most address listed in X-Forwarded-For
          if (my ($ip) = $r->header_in('X-Forwarded-For') =~ /([^,\s]+)$/) {
            $r->connection->remote_ip($ip);
          }

          return OK;
        }

       Next, add this to your httpd.conf file:

        PerlPostReadRequestHandler My::SquidRemoteAddr

       This will cause every request to have its "remote_ip" address
       overridden by the value set in the "X-Forwarded-For" header added by
       Squid.  Note that if you have multiple proxies between the client and
       the server, you want the IP address of the last machine before your
       accelerator.  This will be the right-most address in the
       X-Forwarded-For header (assuming the other proxies append their
       addresses to this same header, like Squid does).

       If you use apache with mod_proxy at your frontend, you can use Ask
       Bjorn Hansen's mod_proxy_add_forward module from
       ftp://ftp.netcetera.dk/pub/apache/ to make it insert the
       "X-Forwarded-For" header.

SUMMARY
       To gain maximal performance of mod_perl on a busy site, one must reduce
       the amount of resources used by the HTTPD to fit within what the
       machine has available.  The best way to do this is to reduce memory
       usage.

       If your mod_perl requests are fewer than your static page requests,
       then splitting the servers into mod_perl and non-mod_perl versions
       further allows you to tune the amount of resources used by each type
       of request.  Using the "ProxyPass" directive allows these multiple
       servers to appear as one to the users.  Using the Squid accelerator
       also achieves this effect, but Squid takes care of deciding when to
       access the large server automatically.

       If all of your requests require processing by mod_perl, then the only
       thing you can really do is throw a lot of memory on your machine and
       try to tweak the perl code to be as small and lean as possible, and to
       share the virtual memory pages by pre-loading the code.

AUTHOR
       This document is written by Vivek Khera.  If you need to contact me,
       just send email to the mod_perl mailing list.

       This document is copyright (c) 1997-1998 by Vivek Khera.

       If you have contributions for this document, please post them to the
       mailing list.  Perl POD format is best, but plain text will do, too.

       If you need assistance, contact the mod_perl mailing list at
       modperl@perl.apache.org first (send 'subscribe' to
       modperl-request@apache.org to subscribe).  There are lots of people
       there that can help.  Also, check the web pages
       http://perl.apache.org/ and http://www.apache.org/ for explanations of
       the configuration options.

       $Revision: 1.1.1.2 $ $Date: 2003/10/08 21:31:40 $

perl v5.8.6                       2000-03-30                mod_perl_tuning(3)