Tuesday, May 20, 2014

Deleting locked IPSec SAs from Fortigates

We have had a borked IPSec Phase1 definition in our configuration since the initial setup. The delete option was grayed out for it, despite the reference count showing 0. I finally had to call Fortinet about it. The engineer I spoke with said that a ref count of 0 doesn't necessarily mean there are no references (what good is the ref count, then?). He grabbed a copy of the configuration and searched it for the name of the Phase1. Sure enough, a policy routing entry turned up that we had long forgotten about. After removing it, I was able to delete the Phase1.
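
For what it's worth, the same search can be done from the FGT CLI without opening a ticket. Something like the following should work; "my-phase1-name" is a placeholder, and the -f flag (which, as I recall, prints the enclosing config block for each match) is my best recollection of the grep options:

show full-configuration | grep "my-phase1-name"
show full-configuration | grep -f "my-phase1-name"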

Monday, May 19, 2014

Advertising arbitrary routes via OSPF on Fortigate

To be clear, I'm not sure this is the correct way to inject routes into OSPF. That being said, it yields the desired behavior. The situation goes like this...

I have addressed all of the interfaces of my Fortigate (FGT) with subnets of network 65.65.65.0/24. Additionally, I have some virtual IPs (VIPs) defined that map addresses from 65.65.64.0/24 to corresponding addresses in 65.65.65.0/24. For example:

65.65.64.1 -> 65.65.65.1

This is not a garden-variety configuration, mapping one public subnet to another. The reasons are complex, involving BGP, portable subnets, and multiple data centers. The bottom line is that I need the FGT to NAT this traffic.

My initial solution to this problem was static routes. However, static routes become difficult to maintain as the network grows in complexity (we're definitely into that territory). What I want is to advertise a subnet of 65.65.64.0/24 via OSPF. On the Fortigate, it's not as easy as saying "inject this subnet into OSPF." My solution: create a loopback interface on the FGT, and redistribute the connected subnet into OSPF.
  1. Create a loopback network interface with an address in the subnet you want to advertise. It doesn't seem to make a difference which address you use.
        config system interface
            edit "port-loopback"
                set vdom "root"
                set ip 65.65.64.1 255.255.255.192
                set type loopback
                set description "Loopback interface used to provide route for portable addresses."
                set snmp-index 28
            next
        end

  2. Create a prefix-list entry that identifies the loopback subnet.
        config router prefix-list
            edit "connected-to-ospf-v4"
                set comments "Define connected routes to export to OSPF"
                config rule
                    edit 10
                        set prefix 65.65.64.0 255.255.255.192
                        unset ge
                        unset le
                    next
                end
            next
        end

  3. Create a route-map that uses the prefix list.
        config router route-map
            edit "rm-connected-to-ospf"
                set comments "Defines IPv4 connected routes to redistribute to OSPF"
                config rule
                    edit 10
                        set match-ip-address "connected-to-ospf-v4"
                    next
                end
            next
        end

  4. Configure OSPF to redistribute connected networks.
        config router ospf
            config redistribute "connected"
                set status enable
                set routemap "rm-connected-to-ospf"
            end
        end
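
To verify the advertisement, check the routing table on an OSPF neighbor; the loopback subnet should show up as an external route. On another FGT, something along these lines (exact output varies by firmware):

get router info routing-table ospf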

Tuesday, May 6, 2014

Fortigate VIPs ate my packets.

We traded our Cisco ASAs for Fortinet Fortigates (FGT). So far, the trade-off seems to be a pretty, usable interface in exchange for the rock-solid (albeit annoying) functionality of the Cisco. One major issue (for us) that came up during our rollout was related to Virtual IPs (VIPs), essentially Fortinet parlance for destination NAT.

We have a very odd NAT situation. For a particular service we offer, we have clients that are incapable of connecting to the listening port (more accurately, the amount of red tape required to change a port number in their scripts requires hundreds of hours of meetings and many thousands of developer hours).

As a result, we have supported these clients by using a port redirection, but only for certain source addresses, because the port they MUST connect to is in use by another application (confused yet? I am). On the Fortigate, we solved this by creating a pair of VIPs: one broad, for all ports; the other specific to the goofy awfulness. Here is an example of how we made it work.

edit "srv-v4-redir"
set src-filter "1.2.3.4" "5.6.7.8"
    set extip 111.222.0.1
    set extintf "any"
    set portforward enable
    set mappedip 10.0.0.1
    set extport 77
    set mappedport 10077
next
edit "srv-v4"
    set extip 111.222.0.1
    set extintf "any"
    set mappedip 10.0.0.1
next

As horrible as it looks, it actually works. The result is that clients 1.2.3.4 and 5.6.7.8 connect to port 77, but actually get DNATed to port 10077. Anyone else connecting to port 77 goes to port 77.

It is worth noting that our original configuration was more awful, and broken. When I originally configured this bit of NAT, I was still learning my way around the FGT. I mistakenly configured the "srv-v4-redir" VIP with an extintf of "vlan7", our outside interface. This broke other services using the broader "srv-v4" VIP, but in amazingly random ways. All traffic from the outside worked fine. However, we discovered breakage for some users who connect to another service on that VIP from "vlan4", but only for users coming from some source networks (networks unrelated to the "redir" sources).
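
In other words, the broken version of the redirect VIP differed from the working configuration above only in its interface binding, roughly:

edit "srv-v4-redir"
    set extintf "vlan7"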

In these cases, the traffic would just vanish into the FGT, as confirmed by both the sniffer and flow traces. In the flow traces, sessions would fail with the following cryptic messages.

fortinet-1a # id=12 trace_id=26 msg="vd-root received a packet(proto=6, 172.25.1.7:52606->111.222.0.1:443) from vlan4."
id=12 trace_id=26 msg="allocate a new session-00e3addb"
id=12 trace_id=26 msg="find SNAT: IP-10.0.0.1(from IPPOOL), port-0"
id=12 trace_id=26 msg="use addr/intf hash, len=13"
id=12 trace_id=26 msg="pre_route_auth check fail(id=0), drop"

After escalating the ticket several times with Fortinet, and two weeks of broken connections (I'm calling you out here, Fortinet: two weeks for an answer is unacceptable), we finally got assigned to a foul-mouthed engineer (the best kind). He identified the extintf problem in between bouts of telling me what a kludgy setup this is (yes, I know; I don't like it either), and he offered a number of possible solutions, with the caveat of "I can't guarantee it, because nobody does this." We tested the change, which annoyingly required us to first remove all references to the "redir" VIP, and it worked. Life goes on, and I'm pretty satisfied with the FGT.

Thursday, March 13, 2014

Understanding resolvconf behavior on pxe-booted hosts.

I have been learning more than I ever cared to know about resolvconf, and what happens when you use it on a host with a read-only root filesystem. I have been preparing to roll out pxe-booted virtual machines at a second location. The PXE image has the following characteristics.

* NFS-mounted, read-only / filesystem.
* Local writeable disk for swap and /var.
* Unionfs, md-backed /etc (non-persistent, r/w)

I noticed that on the initial boot of a new VM, /etc/resolv.conf would be written correctly. On all subsequent boots, however, the NFS-supplied resolv.conf never gets updated. After a frustrating afternoon of digging, I determined why resolvconf appears to stop working.

Resolvconf stores state data in /var/run/resolvconf. When dhclient runs for an interface, the dhclient-script script calls resolvconf with the DNS particulars, and resolvconf looks in the interfaces/ sub-directory for an entry named after the interface. If the file does not exist, or does not match the domain/nameserver options resolvconf just received, a new file is written and the appropriate changes are made to /etc/resolv.conf. If the options match what resolvconf already has, no changes are made. The output below shows the contents of the interfaces/ directory on my pxe host.

> ll /var/run/resolvconf/interfaces/
total 8
-rw-r--r--  1 root  wheel  80 Mar 13 18:00 vmx0:dhcp4
-rw-r--r--  1 root  wheel  76 Mar 13 18:00 vmx1

The problem with my pxe hosts lies in the volatile /etc. Every time the host reboots, the modified contents of /etc vanish, and /etc/resolv.conf is replaced with the copy from NFS. In my case, that copy reflects the nameservers at the "original" datacenter. Since the state directory for resolvconf lives on the persistent /var, resolvconf sees the old [unchanged] lease data and assumes everything is peachy with the resolv.conf file.
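
A quick way to confirm the diagnosis is to clear the stale state by hand and restart dhclient for the interface (vmx0 here); /etc/resolv.conf should then be rewritten correctly:

# wipe resolvconf's per-interface state, then bring dhclient back up
rm /var/run/resolvconf/interfaces/*
service dhclient restart vmx0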

I don't need the extra features of resolvconf, so I can solve the problem by disabling it. I created a file in the pxe image, /etc/dhclient-enter-hooks, containing the following.

resolvconf_enable=NO

My initial, more complicated fix was to create an rc script that re-initializes the resolvconf state directory on every boot. This also worked flawlessly.

#!/bin/sh
#
# Clean out the contents of the resolvconf state directory. Otherwise,
# /etc/resolv.conf never gets updated after the initial boot of a new pxe host.
#
# PROVIDE: clean_resolvconf
# REQUIRE: FILESYSTEMS
# BEFORE: netif

echo -n "Cleaning out resolvconf state directory: "

# resolvconf -I (re)initializes the state directory.
if /sbin/resolvconf -I; then
    echo "OK"
else
    echo "FAILED"
fi
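
One note if you borrow this: rc scripts only run if they are executable and live in a directory scanned by rcorder(8), such as /etc/rc.d (or /usr/local/etc/rc.d for local additions).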

Wednesday, February 5, 2014

dhclient exits chroot on FreeBSD 10.0

I've booted my first pxe FreeBSD 10.0 image and discovered that dhclient, of all things, doesn't seem to work. Running dhclient from the command line after boot results in the following output:

WARNING dhclient failed to start
chroot
exiting

Some searching turns up a thread on the FreeBSD forums; a missing /var/empty directory is to blame. It should be owned by root, with permissions of 755.
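
The fix is just to recreate the directory in the pxe image, something like:

# /var/empty is dhclient's privilege-separation chroot directory
mkdir -p /var/empty
chown root:wheel /var/empty
chmod 755 /var/empty

(Stock FreeBSD also marks /var/empty system-immutable via chflags schg; that shouldn't matter to dhclient, but it won't hurt either.)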