Archive for September, 2010

Issues with iptables stateful filtering

Wednesday, September 29th, 2010

I hit a weird issue today: I have Apache configured as a reverse proxy using mod_proxy_balancer, which is a welcome addition in Apache 2.2.x. This forwards selected requests to more Apache instances running mod_perl, although this could be any sort of application layer; pretty standard stuff.
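To give a rough idea of the setup, the proxy configuration looked something like this (the balancer name, path, second member address, and retry value here are illustrative rather than lifted from the real config):

<Proxy balancer://appcluster>
    BalancerMember http://192.0.2.1:80 retry=60
    BalancerMember http://192.0.2.3:80 retry=60
</Proxy>
ProxyPass / balancer://appcluster/

The retry parameter is the cooldown period mentioned below; a worker marked in error state isn’t retried until it expires.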

With ProxyStatus enabled and a reasonable amount of traffic pushed through the proxy, I started to notice that the application-layer Apache instances would consistently get marked as being in an error state every so often, removing them from the pool of available servers until their cooldown period expired and the proxy enabled them again.

Investigating the logs showed up this error:

[Tue Sep 28 00:24:39 2010] [error] (113)No route to host: proxy: HTTP: attempt to connect to 192.0.2.1:80 (192.0.2.1) failed
[Tue Sep 28 00:24:39 2010] [error] ap_proxy_connect_backend disabling worker for (192.0.2.1)

A spot of the ol’ google-fu turned up descriptions of similar problems, some not even related to Apache; the common factor looked to be iptables.

All the servers are running CentOS 5, and anyone who runs it is probably aware of the stock Red Hat iptables ruleset. With HTTP access enabled, it looks something like this:

 1  -A INPUT -j RH-Firewall-1-INPUT
 2  -A FORWARD -j RH-Firewall-1-INPUT
 3  -A RH-Firewall-1-INPUT -i lo -j ACCEPT
 4  -A RH-Firewall-1-INPUT -p icmp --icmp-type any -j ACCEPT
 5  -A RH-Firewall-1-INPUT -p 50 -j ACCEPT
 6  -A RH-Firewall-1-INPUT -p 51 -j ACCEPT
 7  -A RH-Firewall-1-INPUT -p udp --dport 5353 -d 224.0.0.251 -j ACCEPT
 8  -A RH-Firewall-1-INPUT -p udp -m udp --dport 631 -j ACCEPT
 9  -A RH-Firewall-1-INPUT -p tcp -m tcp --dport 631 -j ACCEPT
10  -A RH-Firewall-1-INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
11  -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
12  -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 80 -j ACCEPT
13  -A RH-Firewall-1-INPUT -j REJECT --reject-with icmp-host-prohibited

Almost all of it is boilerplate apart from line 12, which I added and which is identical to the line above it granting access to SSH. Analysing the live hit counts against each of these rules showed a large number hitting that last catch-all rule on line 13, and indeed, this was what was causing the Apache errors.
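For reference, the live per-rule packet counters (along with the rule numbers) can be inspected with something like:

# iptables -L RH-Firewall-1-INPUT -v -n --line-numbers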

Analysing the traffic with tcpdump/wireshark showed that the frontend Apache server was only getting as far as sending the initial SYN packet, which was failing to match either the dedicated rule on line 12 for HTTP traffic or the rule on line 10 matching any related or previously established traffic (although I wouldn’t really expect it to match the latter).
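A capture along these lines (the interface name is assumed) is enough to watch the lone SYNs go unanswered:

# tcpdump -nn -i eth0 'tcp port 80 and tcp[tcpflags] & tcp-syn != 0'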

Adding a rule before the last one to match and log any HTTP packets considered to be in the INVALID state showed that, for some strange reason, iptables was indeed deciding that an initial SYN was somehow invalid.
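The logging rule was something along these lines, inserted just before the final REJECT (the log prefix is just illustrative):

-A RH-Firewall-1-INPUT -m state --state INVALID -p tcp -m tcp --dport 80 -j LOG --log-prefix "INVALID HTTP: "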

More information about why it might be invalid can be coaxed from the kernel by issuing the following:

# echo 255 > /proc/sys/net/ipv4/netfilter/ip_conntrack_log_invalid

Although all this gave me was some extra text saying “invalid state” and the same output you get from the standard logging target:

ip_ct_tcp: invalid state IN= OUT= SRC=192.0.2.2 DST=192.0.2.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=21158 DF PROTO=TCP SPT=57351 DPT=80 SEQ=1402246598 ACK=0 WINDOW=5840 RES=0x00 SYN URGP=0 OPT (020405B40402080A291DC7600000000001030307)

Searching the netfilter Bugzilla for matching bug reports turned up a knob that relaxes the connection tracking, enabled with the following:

# echo 1 > /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_be_liberal

This didn’t fix things, plus it’s recommended to only use it in extreme cases involving a broken router or firewall. As all of these machines are on the same network segment with just a switch connecting them, there shouldn’t be anything mangling the packets.

With those avenues exhausted, the suggested workaround of adding a rule similar to the following:

-A RH-Firewall-1-INPUT -m tcp -p tcp --dport 80 --syn -j REJECT --reject-with tcp-reset

before your last rule doesn’t really work either, as the proxy still drops the servers from the pool as before, just logging a different reason instead:

[Tue Sep 28 13:59:23 2010] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 192.0.2.1:80 (192.0.2.1) failed
[Tue Sep 28 13:59:23 2010] [error] ap_proxy_connect_backend disabling worker for (192.0.2.1)

This isn’t really acceptable to me anyway, as it requires your proxy to retry in response to a server politely telling it to go away, and that’s going to be a performance hit. The whole point of using mod_proxy_balancer for me was so it could legitimately remove servers that are dead or administratively stopped; how is it supposed to tell the difference between that and a dodgy firewall?

The only solution that worked and was deemed acceptable was to simply remove the state requirement when matching the HTTP traffic, like so:

-A RH-Firewall-1-INPUT -m tcp -p tcp --dport 80 -j ACCEPT

This will match both new and invalid packets; however, it’s left me with a pretty low opinion of iptables. Give me pf any day.

Building CentOS 5 images for EC2

Thursday, September 16th, 2010

I had a need to create some CentOS 5 hosts on Amazon’s EC2 platform, and while there’s nothing stopping you from reusing a pre-built AMI, it’s always handy to know how these things are built from scratch.

I had a few basic requirements:

  • I’ll be creating various sizes of EC2 instances, so both i386 and x86_64 AMIs are required.
  • Preferably boot the native CentOS kernel rather than use the generic EC2 kernel, as I know the CentOS-provided Xen kernel JFW.

You’ll need the following:

  • An existing Intel/AMD Linux host; this should be running CentOS, Fedora, RHEL, or anything else as long as it ships a usable yum(8). It should also be an x86_64 host if you’re planning on building for both architectures, and you’ll need around 6GB of free disk space.
  • Amazon AWS account with working Access Key credentials for S3 and a valid X.509 certificate & private key pair for EC2.
  • The EC2 AMI tools and the EC2 API tools installed and available in your $PATH.
  • A flask of weak lemon drink.

Let’s assume you’re working under /scratch. You’ll need to first create a directory to hold your root filesystem, and also a couple of directories within that, ahead of installing anything:

# mkdir -p /scratch/ami/{dev,etc,proc,sys}

The /dev directory needs a handful of devices created within it:

# MAKEDEV -d /scratch/ami/dev -x console
# MAKEDEV -d /scratch/ami/dev -x null
# MAKEDEV -d /scratch/ami/dev -x zero

A minimal /etc/fstab needs to be created:

/dev/sda1               /                       ext3    defaults        1 1
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0

There are other partitions that will be available to your instance when it’s up and running, but they vary between instance types; this is the bare minimum required and should work for all instance types. If you want to add the additional partitions here, refer to the Instance Storage Documentation. I will instead use Puppet to set up any additional partitions after the instance is booted.
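As an example, on some instance types the first ephemeral volume appears as /dev/sdb and is traditionally mounted at /mnt, so a hypothetical extra entry would look like:

/dev/sdb                /mnt                    ext3    defaults        0 0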

/proc and /sys should also be mounted inside your AMI root:

# mount -t proc proc /scratch/ami/proc
# mount -t sysfs sysfs /scratch/ami/sys

Create a custom /scratch/yum.cfg which will look fairly similar to the one your host system uses:

[main]
cachedir=/var/cache/yum
keepcache=0
debuglevel=2
logfile=/var/log/yum.log
distroverpkg=redhat-release
tolerant=1
exactarch=1
obsoletes=1
gpgcheck=0
plugins=1
reposdir=/dev/null
 
# Note: yum-RHN-plugin doesn't honor this.
metadata_expire=1h
 
# Default.
# installonly_limit = 3
 
[centos-5]
name=CentOS 5 - Base
baseurl=http://msync.centos.org/centos-5/5/os/$basearch/
enabled=1
 
[centos-5-updates]
name=CentOS 5 - Updates
baseurl=http://msync.centos.org/centos-5/5/updates/$basearch/
enabled=1
 
[centos-5-epel]
name=Extra Packages for Enterprise Linux 5 - $basearch
baseurl=http://download.fedora.redhat.com/pub/epel/5/$basearch/
enabled=1

Notably, disable the gpgcheck directive and make sure no additional repositories are picked up by pointing reposdir at somewhere no .repo files are located, otherwise you’ll scoop up any repositories configured on your host system. By making use of the $basearch variable in the URLs, this configuration should work for both i386 and x86_64.

If you have local mirrors of the package repositories, alter the file to point at them and be a good netizen. You will need to make sure that your base repository has the correct package groups information available. Feel free to also add any additional repositories.

You’re now ready to install the bulk of the Operating System. If the host architecture and target architecture are the same, you can just do:

# yum -c /scratch/yum.cfg --installroot /scratch/ami -y groupinstall base core

If however you’re creating an i386 AMI on an x86_64 host, you need to prefix the above command with setarch(8), like so:

# setarch i386 yum -c /scratch/yum.cfg --installroot /scratch/ami -y groupinstall base core

This mostly fools yum and any child commands into thinking the host is i386; without it, you’ll just get another x86_64 image. Sadly, you can’t do the reverse to build an x86_64 AMI on an i386 host.
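You can see the effect for yourself; on an x86_64 host, something like the following should report a 32-bit architecture:

# uname -m
x86_64
# setarch i386 uname -m
i686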

This should give you a fairly minimal yet usable base, however it won’t have the right kernel installed, so do the following to remedy that:

# yum -c /scratch/yum.cfg --installroot /scratch/ami -y install kernel-xen
# yum -c /scratch/yum.cfg --installroot /scratch/ami -y remove kernel

(Remember to use setarch(8) again if necessary)

You can also use variations of the above commands to add or remove additional packages as you see fit.

All that’s required now is to perform a bit of manual tweaking here and there. Firstly you need to set up the networking, which on EC2 is simple: one interface using DHCP. Create /etc/sysconfig/network-scripts/ifcfg-eth0:

DEVICE=eth0
BOOTPROTO=dhcp
ONBOOT=yes
TYPE=Ethernet
USERCTL=yes
PEERDNS=yes
IPV6INIT=no

And also /etc/sysconfig/network:

NETWORKING=yes

The networking still won’t work without the correct kernel module(s) being loaded, so create /etc/modprobe.conf with the following:

alias eth0 xennet
alias scsi_hostadapter xenblk

The first module fixes the networking, while the second lets the instance see its various block devices. The ramdisk for the kernel now needs to be updated so it knows to pull in these two modules and load them at boot; for this you need to know the version of the kernel installed. You can find this a number of ways, but the easiest is to just look at the /boot directory:

# ls -1 /scratch/ami/boot
config-2.6.18-164.15.1.el5xen
grub
initrd-2.6.18-164.15.1.el5xen.img
message
symvers-2.6.18-164.15.1.el5xen.gz
System.map-2.6.18-164.15.1.el5xen
vmlinuz-2.6.18-164.15.1.el5xen
xen.gz-2.6.18-164.15.1.el5
xen-syms-2.6.18-164.15.1.el5

In this case the version is “2.6.18-164.15.1.el5xen”. Using this, we need to run mkinitrd(8), but via chroot(1) so that the copy installed in your new filesystem runs with that filesystem as its /, otherwise it will attempt to overwrite bits of your host system. So, something like the following:

# chroot /scratch/ami mkinitrd -f /boot/initrd-2.6.18-164.15.1.el5xen.img 2.6.18-164.15.1.el5xen

No /etc/hosts file is created, so it’s probably a good idea to create one:

127.0.0.1	localhost.localdomain localhost

SELinux will be enabled by default, and although your instance will boot you won’t be able to log in, so the easiest thing is to just disable it entirely by editing /etc/selinux/config so it looks like this:

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#	enforcing - SELinux security policy is enforced.
#	permissive - SELinux prints warnings instead of enforcing.
#	disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
#	targeted - Only targeted network daemons are protected.
#	strict - Full SELinux protection.
#	mls - Multi Level Security protection.
SELINUXTYPE=targeted 
# SETLOCALDEFS= Check local definition changes
SETLOCALDEFS=0

You could also disable it with the correct kernel parameter at boot time. There may be a way to allow SELinux to work; it may just be that the filesystem needs relabelling, which you can force on the first boot by creating an empty /scratch/ami/.autorelabel file. I’ll leave that as an exercise for the reader, or for myself when I’m bored enough.
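If you do go the kernel parameter route, it’s just a case of appending selinux=0 to the kernel line in the grub.conf we’re about to create, e.g.:

kernel /boot/vmlinuz-2.6.18-164.15.1.el5xen ro root=/dev/sda1 selinux=0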

Now we need to deal with how to boot the native CentOS kernel. Amazon don’t allow you to upload your own kernels or ramdisks to boot with your instances, so how do you do it? Apart from their own kernels, they now provide a PV-GRUB kernel image which, when it boots, behaves just like the regular GRUB bootloader: it reads your instance filesystem for a grub.conf, uses that to select the kernel, and then loads the kernel from your instance filesystem along with the accompanying ramdisk.

We don’t need to install any boot blocks, but we will need to create a simple /boot/grub/grub.conf using the same kernel version we used when recreating the ramdisk:

default=0
timeout=5
title CentOS (2.6.18-164.15.1.el5xen)
	root (hd0)
	kernel /boot/vmlinuz-2.6.18-164.15.1.el5xen ro root=/dev/sda1
	initrd /boot/initrd-2.6.18-164.15.1.el5xen.img

If we install any updated kernels, they should automatically manage this file for us and insert their own entries; we just need to do this once.

To match what you normally get on a regular CentOS host, a couple of symlinks should also be created:

# ln -s grub.conf /scratch/ami/boot/grub/menu.lst
# ln -s ../boot/grub/grub.conf /scratch/ami/etc/grub.conf

When you create an EC2 instance you have to specify an existing SSH keypair created within EC2, which you should then be able to use to log into the instance. This is accomplished by the usual practice of copying the public part of the key into /root/.ssh/authorized_keys. I initially thought that was magic Amazon did for you, but they don’t; you need to do it yourself.

When the instance is booted, the public part of the key (as well as various other bits of metadata) is available at the URL http://169.254.169.254/latest/meta-data/public-keys/0/openssh-key, so the easiest thing to do is add the following to /etc/rc.d/rc.local:

#!/bin/sh
#
# This script will be executed *after* all the other init scripts.
# You can put your own initialization stuff in here if you don't
# want to do the full Sys V style init stuff.
 
touch /var/lock/subsys/local
 
if [ ! -d /root/.ssh ] ; then
        mkdir -p /root/.ssh
        chmod 700 /root/.ssh
fi
 
/usr/bin/curl -f http://169.254.169.254/latest/meta-data/public-keys/0/openssh-key > /root/.ssh/authorized_keys
 
chmod 600 /root/.ssh/authorized_keys

You can be more elaborate if you want, but this is enough to allow you to log in with the SSH key. One thing I found was that because the firstboot service started at boot, it sat for a while asking if you wanted to do any firstboot-y things, delaying your rc.local hack from running until it timed out; Amazon would say your instance was running but you couldn’t SSH in for a minute or two. The easiest thing is to disable firstboot:

# chroot /scratch/ami chkconfig firstboot off

You can also use this to disable more services; there are a few enabled by default that are arguably useless in an EC2 instance, but they won’t break anything if you leave them enabled. You can also enable other services if you installed any additional packages.
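For example, to see what’s enabled in the default runlevel and then turn off the printing daemon (assuming you have no use for it in EC2):

# chroot /scratch/ami chkconfig --list | grep '3:on'
# chroot /scratch/ami chkconfig cups off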

Finally, if you need to play about with the image you can just do the following:

# chroot /scratch/ami

(Remember to use setarch(8) again if necessary)

This gives you a shell inside your new filesystem if you need to tweak anything else. One thing I found necessary was to set up your /etc/passwd file correctly and optionally set a root password, which you could use instead of the SSH key:

# pwconv
# passwd
Changing password for user root.
New UNIX password: ********
Retype new UNIX password: ********
passwd: all authentication tokens updated successfully.

If you want to use any RPM-based commands while you’re inside this chroot and you created an i386 image on an x86_64 host, you may get the following error:

# rpm -qa
rpmdb: Program version 4.3 doesn't match environment version
error: db4 error(-30974) from dbenv->open: DB_VERSION_MISMATCH: Database environment version mismatch
error: cannot open Packages index using db3 -  (-30974)
error: cannot open Packages database in /var/lib/rpm

This is because some of the files are architecture-dependent, and despite using setarch(8) they still get written in x86_64 format. It’s a simple fix:

# rm -f /scratch/ami/var/lib/rpm/__db*

If you query the RPM database inside the chroot and then want to install or remove more packages with yum outside the chroot, you will need to do the above again.

Before you start to package up your new image there’s one small bit of cleanup: for some reason yum creates some transaction files that you’ll notice under /scratch/ami/scratch/ami/var/lib/yum. I couldn’t work out how to stop it making those, so you just need to blow that directory away:

# rm -rf /scratch/ami/scratch

You should also unmount the /proc and /sys filesystems you mounted before installing packages:

# umount /scratch/ami/proc
# umount /scratch/ami/sys

Right, you’re now ready to package up your new image.

The first thing is to bundle the filesystem, which will create one big image file, chop it up into small ~10MB pieces, and then create an XML manifest file that ties it all together. You will need your AWS user ID for this part, which you can find in your AWS account:

# ec2-bundle-vol -c <certificate_file> -k <private_keyfile> -v /scratch/ami -p centos5-x86_64 -u <user_id> -d /scratch -r x86_64 --no-inherit

This can take a few minutes to run. Remember to also set the architecture appropriately.
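For the i386 build, the equivalent invocation (with the prefix changed to match) would be along these lines:

# ec2-bundle-vol -c <certificate_file> -k <private_keyfile> -v /scratch/ami -p centos5-i386 -u <user_id> -d /scratch -r i386 --no-inherit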

The next step is to then upload the manifest and image pieces either to an existing S3 bucket that you own, or a new bucket that will be created:

# ec2-upload-bundle -m /scratch/centos5-x86_64.manifest.xml -b <bucket> -a <access_key> -s <secret_key>

This part has the longest wait, depending on how fast your internet connection is; you’ll be uploading around 330MB per image. If you seriously made a flask of weak lemon drink, I’d drink most of it now.

Once that finishes, the final step is to register the uploaded files as an AMI, ready to create instances from it. Before we do that though, we need to find the correct AKI to boot it with. There should be four AKIs available in each region, two for each architecture, which differ in how they try to find the grub.conf on the image: one treats the image as one big filesystem with no partitioning, the other assumes the image is partitioned and that grub.conf is on the first partition.

List all of the available AKIs with the following:

# ec2-describe-images -C <certificate_file> -K <private_keyfile> -o amazon | grep pv-grub
IMAGE	aki-407d9529	ec2-public-images/pv-grub-hd0-V1.01-i386.gz.manifest.xml	amazon	available	public		i386	kernel				instance-store
IMAGE	aki-427d952b	ec2-public-images/pv-grub-hd0-V1.01-x86_64.gz.manifest.xml	amazon	available	public		x86_64	kernel				instance-store
IMAGE	aki-4c7d9525	ec2-public-images/pv-grub-hd00-V1.01-i386.gz.manifest.xml	amazon	available	public		i386	kernel				instance-store
IMAGE	aki-4e7d9527	ec2-public-images/pv-grub-hd00-V1.01-x86_64.gz.manifest.xml	amazon	available	public		x86_64	kernel				instance-store

Assuming this is still an x86_64 image, the AKI we want is aki-427d952b: the “hd0” variant, since ec2-bundle-vol produces one big unpartitioned filesystem image. More documentation about these is available in the EC2 documentation.

Now we can register the AMI like so:

# ec2-register -C <certificate_file> -K <private_keyfile> -n centos5-x86_64 -d "CentOS 5 x86_64" --kernel aki-427d952b <bucket>/centos5-x86_64.manifest.xml
IMAGE   ami-deadbeef

The output is our AMI id, which we’ll use for creating instances. If you haven’t already created an SSH keypair, do that now:

# ec2-add-keypair -C <certificate_file> -K <private_keyfile> <key_id>

This returns the private key portion of the SSH keypair, which you need to save and keep safe; there’s no way of retrieving it if you lose it.

Finally, create an instance using the AMI id we got from ec2-register along with the id of a valid SSH keypair:

# ec2-run-instances -C <certificate_file> -K <private_keyfile> -t m1.large -k <key_id> ami-deadbeef

This will return information about your new instance which will initially be in the pending state. Periodically run the following:

# ec2-describe-instances -C <certificate_file> -K <private_keyfile>

Once your instance is in the running state, you should be able to see the hostname and IP that have been allocated, and you can now SSH in using the private key you saved from ec2-add-keypair.
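That last part is just plain SSH, something like:

# ssh -i <saved_private_key> root@<instance_hostname>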

When you’ve finished with the instance, you can terminate it as usual with ec2-terminate-instances.

If you no longer need the AMI image or wish to change it in some way, you need to first deregister it with:

# ec2-deregister -C <certificate_file> -K <private_keyfile> ami-deadbeef

Then remove the bundle from the S3 bucket:

# ec2-delete-bundle -b <bucket> -a <access_key> -s <secret_key> -m /scratch/centos5-x86_64.manifest.xml

Then in the case of making changes, repeat the steps from ec2-bundle-vol onwards.

Loops and Variable Indirection in Puppet

Friday, September 3rd, 2010

A while back I wrote about writing some custom Puppet facts, one of which enumerates the enslaved interfaces for each bonded network interface on a Linux host. So, for example, your facter output might contain the following facts:

interfaces => bond0,eth0,eth1,eth2,sit0
ipaddress_bond0 => 192.168.0.1
ipaddress_eth2 => 10.0.0.1
slaves_bond0 => eth0,eth1

The bottom fact is my addition. Why do I even want that? Well, I have Nagios monitoring the host via the bonded interface, so I’ll know if that breaks completely as the host will be down, but I’d also like to know if any one of the enslaved interfaces is down: even though the host should still be up thanks to the bonded interface doing its magic, a cable might have been unplugged or something more serious might have happened. On a modern Linux host with the /sys filesystem, you can do the following to see if an interface is connected to hardware:

# cat /sys/class/net/eth0/carrier
1

That value will flip to 0 should the cable be pulled, etc., so I wrote a really basic Nagios test that exits with an OK or CRITICAL value based on it.
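The test itself isn’t much more than a wrapper around that check; a minimal sketch (the script name and argument handling are my own invention) might look like:

#!/bin/sh
# check_carrier: report OK if the interface given as $1 has carrier,
# CRITICAL otherwise (including when the interface doesn't exist).
IF="$1"
if [ "$(cat /sys/class/net/$IF/carrier 2>/dev/null)" = "1" ]; then
    echo "OK: $IF has carrier"
    exit 0
else
    echo "CRITICAL: $IF has no carrier"
    exit 2
fi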

Now to write the Puppet manifest. In English, the logic sounds pretty simple:

“Loop through each interface reported by the $interfaces fact; if the interface looks like a bonded interface (i.e. it matches /bond\d+/), then loop through each interface reported by the $slaves_$interface fact, which should now exist, and export a Nagios service definition for it.”

First problem: there’s no such thing as a loop in Puppet. You can get around this by using a definition:

define match_bonded_interfaces {
    if $name =~ /bond\d+/ {
        ...
    }
}
 
$array_of_interfaces = split($interfaces, ',')
 
match_bonded_interfaces { $array_of_interfaces: }

This will call the definition for each value in the array with $name being set to the current element each time.
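Given the example facts above, that declaration is equivalent to writing out:

match_bonded_interfaces { 'bond0': }
match_bonded_interfaces { 'eth0': }
match_bonded_interfaces { 'eth1': }
match_bonded_interfaces { 'eth2': }
match_bonded_interfaces { 'sit0': }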

The second problem I hit after this was that I needed to access my custom fact for the bonded interface to enumerate its enslaved interfaces. Not wanting to hardcode the interface name, and without thinking, I tried the following:

define match_bonded_interfaces {
    if $name =~ /bond\d+/ {
        $slaves = ${slaves_$name}
    }
}

It seemed semi-logical at the time, until I actually came to try it out and found it didn’t do what I had hoped at all. After a few further failed attempts at guessing some undocumented syntax to do what I wanted, I gave up, thinking there was no way to do variable indirection in Puppet; in other words, accessing a variable by the value of another variable.

To me this seemed kind of odd. After all, quite a lot of the facts reported by facter are structured like this:

list_of_things => key1,key2,key3
value_key1 => foo
value_key2 => bar
value_key3 => baz

And if there’s any trace of programmer in you, you’re going to want to access those in a way that scales anywhere between 1 and n keys, rather than hardcode each possible key value and the associated variable names. Something deep inside would stop me from committing something like the following:

if $interfaces =~ /\beth0\b/ {
    notice("IP address of eth0 is ${ipaddress_eth0}")
}
 
if $interfaces =~ /\beth1\b/ {
    notice("IP address of eth1 is ${ipaddress_eth1}")
}
 
etc.

But annoyingly, that was the only way I could utilise my new fact. Granted, most of the servers only had one bonded interface, but it was still annoying nonetheless, so I was pleasantly surprised when a friend pointed me towards this blog entry, which offered a possible solution: you use Puppet’s inline_template function to evaluate a small ERB template fragment, like so:

define monitor_interface {
    @@nagios_service {
        ...
    }
}
 
define match_bonded_interfaces {
    if $name =~ /bond\d+/ {
        $slave_fact = "slaves_${name}"
        $slaves = inline_template('<%= scope.lookupvar(slave_fact) %>')
        $array_of_slaves = split($slaves, ',')
        monitor_interface { $array_of_slaves: }
    }
}
 
$array_of_interfaces = split($interfaces, ',')
 
match_bonded_interfaces { $array_of_interfaces: }

I feel dirty now.