Friday, July 30, 2010

Random server crashes R300 + IPMI = BAD

Ever since enabling IPMI on our Dell servers last week, we have been experiencing problems with random hangs on our R300s. I suspected IPMI immediately, particularly IPMI over a VLAN. When I finally went to the data center to reboot a server myself, I noticed the following error on the front LCD display.

E1410 CPU 1 IERR


Some googling suggested that this error points to a faulty CPU, and our Dell contact thought it was probably a memory or drive failure. However, further reading suggested that the same error can also have non-hardware causes.

Going back to the original IPMI theory, I found that I was able to reproduce the hang quite easily by starting parallel iperf sessions between an R300 and another host to saturate the interface. I then started running constant ipmitool queries. I found that I was able to lock up the R300 within 10 minutes, consistently.
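For anyone who wants to try the same reproduction, here is a rough sketch of how it could be scripted. The function names, the peer hostname, the stream count and the choice of "ipmitool sensor" as the query are all my own placeholders; the original test was run by hand and the exact commands were not recorded. The IPERF and IPMITOOL variables exist only so the commands can be swapped out.

```shell
#!/bin/sh
# Sketch only: saturate the shared NIC with iperf while polling the BMC.

# start_load PEER STREAMS SECONDS
# Runs one iperf client with STREAMS parallel streams toward PEER.
# The peer must already be running "iperf -s".
start_load() {
    ${IPERF:-iperf} -c "$1" -P "$2" -t "$3"
}

# poll_bmc COUNT DELAY
# Runs an ipmitool query COUNT times, DELAY seconds apart, and prints
# how many of the queries failed.
poll_bmc() {
    count=$1
    delay=$2
    i=0
    failures=0
    while [ "$i" -lt "$count" ]; do
        if ! ${IPMITOOL:-ipmitool} sensor >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "$failures"
}

# Example (hypothetical peer name), ten minutes of load plus polling:
#   start_load peer.example.com 4 600 >/dev/null &
#   poll_bmc 600 1
```

If the machine hangs, the script hangs with it when run locally, so watching the console (or pinging from another host) is still the real indicator.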

I resolved the issue by moving the primary network interface for the OS to NIC #2, leaving NIC #1 for exclusive use by IPMI. In this configuration I was not able to crash the server in 30 minutes and it has run all night without issue.

When I discussed the issue and its resolution with one of the FreeBSD developers, he stated that this is not just a Dell issue: sharing IPMI with the LAN on FreeBSD is really dodgy, depending on the particular NIC chipset in use (the Broadcom bge driver in this case). The VLAN tagging may have been the straw that broke the camel's back. The server that caused the most trouble in this episode had previously run for over a year with IPMI enabled, but without VLAN tagging. To be fair, we were not previously doing any monitoring of this machine via IPMI, so the potential exposure was far less.

Wednesday, July 21, 2010

IPMI on FreeBSD

Here are some notes regarding how to use IPMI on FreeBSD. This information is relevant to the Dell boxes we have at work, no guarantees otherwise.

To load the IPMI module into a running system, use

kldload ipmi

or add the following to loader.conf and reboot (if you want the changes to be persistent)
vi /boot/loader.conf
ipmi_load="YES"

shutdown -r now


The kernel log should show output similar to this

ipmi0: on isa0
ipmi0: KCS mode found at io 0xca8 alignment 0x4 on isa
ipmi0: KCS error: ff
ipmi0: IPMI device rev. 0, firmware rev. 2.2, version 2.0
ipmi0: Number of channels 4
ipmi0: Attached watchdog

Install the ipmitool package/port. You should now be able to talk to IPMI on the local machine (or on remote machines, for that matter). Here are a couple of commands that I've found useful.

ipmitool lan print
Prints the current Ethernet configuration for the BMC.

ipmitool lan set
Prints the usage information for configuring the BMC LAN settings. A channel is required for setting these parameters. In my [limited] experience, the channel is always "1".

ipmitool sensor
ipmitool sdr list
Prints information about the sensors that can be monitored via ipmitool. Adding the -v parameter to ipmitool sensor prints the same information organized in a list format.
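The default "ipmitool sensor" output is a pipe-separated table (name, reading, unit, status, then thresholds), which makes it easy to filter. As a small sketch, the helper below pulls out just the temperature readings. The function name temps is made up for this example, and sensor names differ from one machine to the next.

```shell
#!/bin/sh
# temps: reads "ipmitool sensor" output on stdin and prints name=value
# for every sensor whose unit column is "degrees C" and that has a reading.
temps() {
    awk -F'|' '
        # Strip leading and trailing blanks from a field.
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        trim($3) == "degrees C" && trim($2) != "na" {
            print trim($1) "=" trim($2)
        }
    '
}

# Usage:
#   ipmitool sensor | temps
```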



Additional Information (sources):
FreeBSDwiki IPMI page
Linux IPMI notes (some FreeBSD info here)
Dell Linux IPMI page

Is FreeBSD clobbering your IPMI LAN access?

We've been enabling IPMI/BMC on our servers for environment monitoring, remote control, etc. Our newer Dell R300 servers share NIC #1 with IPMI and the Operating System. I noticed that IPMI works before FreeBSD starts the Ethernet drivers, then it stops responding. It turns out that this behavior can be stopped by adding a line to loader.conf. Here are the steps to do this (found on this page):
  1. Edit /boot/loader.conf, appending the following line:
    hw.bge.allow_asf="1"
  2. Save the file and reboot.
This also works if you have configured the BMC to use VLAN tagging.
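If you are doing this on more than a couple of machines, the edit in step 1 can be scripted. The following is only a sketch: set_loader_tunable is a hypothetical helper, not a FreeBSD tool, and it deliberately leaves an existing setting alone instead of changing it.

```shell
#!/bin/sh
# set_loader_tunable FILE NAME VALUE
# Appends NAME="VALUE" to FILE unless a line starting with NAME= is
# already there. Running it twice does not add a duplicate line.
set_loader_tunable() {
    file=$1
    name=$2
    value=$3
    # index() does a literal match, so the dots in the tunable name are
    # not treated as regex wildcards. A missing file counts as "not found".
    if awk -v n="$name" 'index($0, n "=") == 1 { found = 1 }
                         END { exit !found }' "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s="%s"\n' "$name" "$value" >> "$file"
}

# Usage (as root), followed by a reboot:
#   set_loader_tunable /boot/loader.conf hw.bge.allow_asf 1
```

The same helper would work for other loader.conf lines, such as ipmi_load="YES".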

On a side note, IPMI != DRAC; IPMI == BMC. DRAC refers to the enhanced management tools provided by an add-in DRAC card or integrated into some higher-end Dell servers. These include a web interface for configuration and monitoring, plus a remote console in the higher-end implementations. DRAC provides IPMI instrumentation and control, but IPMI does not provide DRAC functionality.

Update/Big Fat Warning: using IPMI on the same interface as your LAN can cause BIG problems with the bge driver. See this post.

Thursday, July 15, 2010

Getting net-snmp to use Liebert MIBs...any MIBs for that matter.

After far too much screwing around trying to get this to work, I finally figured out how to get snmpwalk to display the text names of OIDs for our Liebert MPH rack PDUs. I downloaded the Liebert Global Products MIBs from the Liebert website and placed them in ~/.snmp/mibs/. The Readme file in the downloaded archive says that the MIBs need to be loaded in a specific order, which did nothing for me other than waste a lot of time. In order to use the extracted MIBs while walking the Emerson (Liebert) tree, use the -mall argument with snmpwalk (the net-snmp man pages spell the keyword as ALL, i.e. -m ALL).

snmpwalk -v 2c -c public -mall -OS 10.20.30.40 1.3.6.1.4.1.476
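If you would rather not type the MIB option every time, net-snmp also reads the MIB list from the MIBS environment variable. A minimal sketch, assuming a Bourne-style shell and the same placeholder address and community string as above:

```shell
#!/bin/sh
# Load every MIB module found in the MIB search path, for all net-snmp
# commands started from this shell (same effect as -m ALL on each one).
MIBS=ALL
export MIBS

# The walk can then be run without the -m option:
#   snmpwalk -v 2c -c public -OS 10.20.30.40 1.3.6.1.4.1.476
```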