Enterprise Linux 7.3 makes some backwards-incompatible changes to interface names

Today, I was caught off guard by a change in Enterprise Linux 7.3. Apparently, systemd was assigning interface names like eno16780032 based on “garbage” data. I’m not really a fan of ANY of the names the modern schemes generate (what was the problem with eth#?), but that’s beside the point. What hit me was that, starting in 7.3, that garbage data is discarded, which results in a change of interface names. All this, in a point release. Here’s KB 2592561, which details the change. This applies to both Red Hat EL and CentOS, and presumably to other members of the family based on RHEL 7.

The good news is that existing nodes that are updated are left alone. A udev rule file is generated during the update to preserve the garbage data and your interface name. Unlike the other naming rules, this one is ONLY generated on existing systems so that they keep their current naming convention:

[root@kickstarted ~]# cat /etc/udev/rules.d/90-eno-fix.rules
# This file was automatically generated on systemd update
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:50:56:95:de:4e", NAME="eno16780032"

As you can see, it’s based on the MAC. That’s the bad news. If you plan to use the node as a template to deploy other VMs, the resulting VMs will effectively receive “new” interfaces based on their new MACs, ending up with ens192 instead of eno16780032. This introduces at least one minor issue: the eno16780032 configuration is left intact while the interface itself is missing, so every call to systemctl restart network generates an error. It can also cause other issues if you have scripts, tools, provisioning systems, etc., that expect your nodes to have interface names like ens192. Thankfully, this is not difficult to remedy.
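If you hit this on a cloned VM, the cleanup is roughly as follows (a sketch, not from the original post; ens192 is only an example, so use whatever name ip link actually reports):

# Confirm what the interface is called on the clone
ip link show

# Rename the stale config to match the new interface name and fix the references inside it
cd /etc/sysconfig/network-scripts
mv ifcfg-eno16780032 ifcfg-ens192
sed -i 's/eno16780032/ens192/g' ifcfg-ens192

# Remove the udev rule that pins the old name to the template's MAC, then restart networking
rm -f /etc/udev/rules.d/90-eno-fix.rules
systemctl restart network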

Continue reading

Updating Linux Puppet Enterprise agent versions with puppet_agent

Happy New Year, everyone! I know that when I last blogged before the holiday, I was getting settled in with Jenkins, but I really needed to relax during the break, so I had to hold off on that. I promise I’ll get back to it soon, but in the meantime I encountered an issue that needed fixing and required a little more than the docs provided.

I ran into an issue recently with Puppet Enterprise agents that weren’t able to purge some resource types properly due to the use of anchors. This is an issue with Puppet, not the module in question, and it was fixed in Puppet 4.4.0, but the affected nodes were running older versions. Of course, rather than upgrade the agents by hand, I decided to automate it. Puppet has an Upgrading PE agents: *nix page that describes a few ways to do this. The latter options require manual effort, but the first option, via the puppet_agent module, can do this automatically for us. I diverged from the instructions a bit because I do my classifying with hiera rather than the PE Classifier, and because there’s a missing step! (I’m trying to get that document updated as well.)

The guide says to install the module on the master and add some classification. To do this in a traditional roles and profiles setup, using hiera as the classifier and a monolithic controlrepo, we need to add the module to the Puppetfile and .fixtures.yml, then either add a profile for puppet_agent or add it to a base profile. I chose the latter. Let’s add the module first:

Continue reading

Change your default linux shell without root access

A friend recently mentioned his hatred of ksh and his inability to change his default shell, because he is not root and root will not make the change. I concur, ksh is turrible, and that needs fixing. Here’s a little trick for anyone else who has this issue. Let’s say you want to use bash and the shell forced upon you is ksh. The Korn shell uses .profile to set up your environment when you log in, so append these two lines to your .profile:

# run bash damnit
[ ! "`which bash | egrep "^no"`" ] && [ -z $BASH ] && exec bash --login

We look at the output of which bash and ensure it does not say no bash in …, and we check that $BASH is empty, meaning we are not already running bash. If both checks pass, we exec bash as a login shell, which replaces the running ksh process. You can read man exec for details on how it works in a given shell, but it essentially replaces the running shell with the new command. Once the lines are added to your .profile, log in again in a new session. Voila, you now have a bash login shell!

rnelson0@dumbserver ~ $ ps -ef | grep bash
  rnelson0  5927  5298   0 19:56:42 pts/1       0:00 grep bash
  rnelson0  5298  5292   0 19:47:48 pts/1       0:00 bash --login

If the shell forced on you is something other than ksh, you may need to locate its equivalent of .profile. If you prefer a shell other than bash, just sub in the name of your preferred shell. Enjoy!

Ruby net/https debugging and modern protocols

I ran into a fun problem recently with Zabbix and the zabbixapi gem. During puppet runs, each puppetdb record for a Zabbix_host resource is pushed through the zabbixapi to create or update the host in the Zabbix system. When this happens, an interesting error crops up:

Error: /Stage[main]/Zabbix::Resources::Web/Zabbix_host[kickstart.example.com]: Could not evaluate: SSL_connect SYSCALL returned=5 errno=0 state=SSLv2/v3 read server hello A

If you google for that, you’ll find a lot of different errors and causes described across a host of systems. Puppet itself is one of those systems, but it’s not the only one. All of the systems have something in common: Ruby. What they rarely have is an actual resolution, though. Possible causes include time out of sync between nodes, errors with the certificates and stores on the client or server side, and of course a bunch of “it works now!” with no explanation of what changed. To confuse matters even more, the Zabbix web interface works just fine in the latest browsers, so the SSL issue seems restricted to zabbixapi.
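One way to narrow this down, independent of Ruby, is to ask the web server directly which protocol versions it will still accept using openssl s_client (a sketch, not from the original post; substitute your Zabbix host for zabbix.example.com):

# Each attempt either completes a handshake and prints certificate details,
# or fails immediately, showing which protocol versions the server accepts.
openssl s_client -connect zabbix.example.com:443 -tls1   < /dev/null
openssl s_client -connect zabbix.example.com:443 -tls1_1 < /dev/null
openssl s_client -connect zabbix.example.com:443 -tls1_2 < /dev/null

If an older Ruby or OpenSSL on the client can only speak versions the server no longer allows, it can produce exactly this kind of abrupt handshake failure.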

To find the cause, we looked at recent changes. The Apache SSLProtocol setting had been changed recently, which shows up in a previous puppet run’s output:

Continue reading

Tcpdump: When and How?

A tool I rely on heavily for network debugging is tcpdump. This tool naturally comes to mind when I run into issues, but it may not for others. I thought I’d take a moment and describe when I reach for tcpdump and give a quick primer on how to use it.

If you’re using Windows, WinDump and Wireshark are the CLI and GUI equivalents. I’ll stick to tcpdump in this article, but many of the CLI options are the same and the filters are pretty similar, if not identical.

When should you use tcpdump?

Whenever you’re troubleshooting an application, you hopefully have some sort of application-level logging to help you figure out what’s going on. Sometimes, you don’t have that – or what does exist provides inadequate detail or appears to be lying to you. You may also not have access to a device that you think is affecting the traffic, and you need to ensure that the traffic flow meets your expectations. As long as your application talks on the network, even locally, tcpdump may be able to help you!

You may have users on the internet who need to reach your application but cannot, receiving only a timeout, while other users have no issues. You look in your web server logs and see no entries for the complaining user, though there are entries for the users who are not complaining. You can use tcpdump to listen on the web server’s port, filtered to the customer’s IP, and see whether their connection attempts arrive at all. You can also see the packet contents in cleartext (as opposed to binary format; encrypted content is not decrypted, it’s just more easily readable) if that helps diagnose the issue.
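For that scenario, the capture might look something like this (a sketch, not from the original post; 203.0.113.50 and port 80 are placeholders for the complaining client and your service, and the flags are explained in the next section):

# Only show traffic to/from the complaining client on the web port;
# -A prints packet contents in ASCII so cleartext payloads are readable.
sudo tcpdump -nn -A -i eth0 host 203.0.113.50 and port 80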

Many applications also rely on local connections, typically on the loopback interface, and may be affected by the local firewall (iptables or the Windows Firewall service, for example). Using tcpdump, you can see whether the packets are immediately rejected, which is likely the firewall, or whether a three-way handshake completes before the connection is closed. In almost all cases, if a three-way handshake is observed, the application has received the connection.
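A loopback capture for that case could look like this (again a sketch; port 5432 is just an example local service port):

# Watch a local service on the loopback interface. An immediate RST suggests
# the firewall or nothing listening; SYN / SYN-ACK / ACK means the application
# accepted the connection.
sudo tcpdump -nn -i lo port 5432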

Given the name tcpdump, it’s worth noting that you can see almost anything on the wire, not just TCP packets. UDP, GRE, even IPX are visible with the right filters.

How do you use tcpdump?

Let’s look at how you use tcpdump. In the examples below, I’m using a Linux VM with one interface, eth0, and the address 10.0.0.8. It has ssh, apache, and postfix services running. Tcpdump requires root access to see the raw packets on the wire, which I will gain with sudo. Be extremely careful who you grant this access to, for two reasons: 1) zombied tcpdump sessions can gobble all the CPU, and 2) since packet contents can be inspected, sensitive information can be seen by anyone with permission to run tcpdump. This is a security risk, especially when you must meet PCI-DSS audit requirements. I’ll be using my unprivileged user rnelson0.
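If you do need to delegate this, a sudoers entry scoped to just tcpdump limits the exposure compared to full sudo (a sketch, assuming the binary lives at /usr/sbin/tcpdump and the user is rnelson0; edit with visudo):

# /etc/sudoers.d/tcpdump
# rnelson0 may run only tcpdump as root. Packet contents are still visible,
# so treat this as sensitive access.
rnelson0 ALL=(root) /usr/sbin/tcpdump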

Tcpdump by default will try to resolve IP addresses and service names. This can be slow, as it relies on DNS and file lookups, and confusing, as most people search by the IP addresses. We can disable these lookups by adding the n flag to the CLI, one instance for IPs and one for services: -nn. We also want to specify the interface, even on a single-NIC node, since tcpdump may not pick the interface you expect; that’s -i <interface>. This gives us a default argument string of -nni eth0 or -nni lo, depending on which interface we are looking at.

Next, we need to build a filter to look at traffic. The tcpdump man page provides a lengthy list of filter components. One of the most common is src|dst|host <scope>, which filters for packets from, to, or bidirectionally with the specified IP or network. Others are port <portnumber> and <protocol>, like icmp or gre. We can combine individual components with the standard logical operators and, or, and not: filter for non-ssh traffic to/from 10.0.0.200 with host 10.0.0.200 and not port 22.

As a “bonus”, when you run tcpdump with a bad filter, it will exit immediately. It doesn’t offer hints on how to fix the error, but it does let you know right away.

We put this together with the full command tcpdump -nni eth0 host 10.0.0.200 and not port 22. If we ssh to our node and just run this, we won’t see anything happen right away, but we’ll eventually see some ARP packets:

[rnelson0@kickstart ~]$ sudo tcpdump -nni eth0 host 10.0.0.200 and not port 22
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
22:05:50.309737 ARP, Request who-has 10.0.0.1 tell 10.0.0.200, length 46
22:05:50.589052 ARP, Request who-has 10.0.0.253 tell 10.0.0.200, length 46
22:05:59.934464 ARP, Request who-has 10.0.0.200 tell 10.0.0.253, length 46
22:06:51.315637 ARP, Request who-has 10.0.0.1 tell 10.0.0.200, length 46
22:06:51.519754 ARP, Request who-has 10.0.0.8 tell 10.0.0.200, length 46
22:06:51.519807 ARP, Reply 10.0.0.8 is-at 00:50:56:ac:f2:f7, length 28

Now if we view a file on the web server, we’ll see a three-way handshake followed by a few PSH packets:

22:17:29.686840 IP 10.0.0.200.59916 > 10.0.0.8.80: Flags [S], seq 1113320281, win 8192, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
22:17:29.687041 IP 10.0.0.8.80 > 10.0.0.200.59916: Flags [S.], seq 741099373, ack 1113320282, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 5], length 0
22:17:29.690439 IP 10.0.0.200.59916 > 10.0.0.8.80: Flags [.], ack 1, win 256, length 0
22:17:29.690475 IP 10.0.0.200.59916 > 10.0.0.8.80: Flags [P.], seq 1:412, ack 1, win 256, length 411
22:17:29.690540 IP 10.0.0.8.80 > 10.0.0.200.59916: Flags [.], ack 412, win 490, length 0
22:17:29.693772 IP 10.0.0.8.80 > 10.0.0.200.59916: Flags [P.], seq 1:151, ack 412, win 490, length 150
22:17:29.694090 IP 10.0.0.8.80 > 10.0.0.200.59916: Flags [F.], seq 151, ack 412, win 490, length 0
22:17:29.696030 IP 10.0.0.200.59916 > 10.0.0.8.80: Flags [.], ack 152, win 256, length 0
22:17:29.700858 IP 10.0.0.200.59916 > 10.0.0.8.80: Flags [F.], seq 412, ack 152, win 256, length 0
22:17:29.700893 IP 10.0.0.8.80 > 10.0.0.200.59916: Flags [.], ack 413, win 490, length 0

For comparison, here’s what an HTTPS request looks like when HTTPS is not enabled on the server. You see the SYN packets from the client, and the RST packets come from the OS since no service is listening there:

22:18:50.057972 IP 10.0.0.200.59917 > 10.0.0.8.443: Flags [S], seq 825112119, win 8192, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
22:18:50.058088 IP 10.0.0.8.443 > 10.0.0.200.59917: Flags [R.], seq 0, ack 825112120, win 0, length 0
22:18:50.558200 IP 10.0.0.200.59917 > 10.0.0.8.443: Flags [S], seq 825112119, win 8192, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
22:18:50.558264 IP 10.0.0.8.443 > 10.0.0.200.59917: Flags [R.], seq 0, ack 1, win 0, length 0
22:18:51.060995 IP 10.0.0.200.59917 > 10.0.0.8.443: Flags [S], seq 825112119, win 8192, options [mss 1460,nop,nop,sackOK], length 0
22:18:51.061065 IP 10.0.0.8.443 > 10.0.0.200.59917: Flags [R.], seq 0, ack 1, win 0, length 0

Summary

I hope this short tutorial helps you figure out when and how to use tcpdump. If you have specific questions, post them in a comment or ask on twitter and I’ll respond.

Kickstart your CentOS Template, EL7 Edition

I wrote an article on kickstarting your CentOS Template in early 2014 that focused on Enterprise Linux 6. Later in the year, RHEL7 was announced and CentOS 7 soon followed. It’s well past time to refresh the kickstart article. To keep this more of a “moving target”, I’ve created a github repo to host the kickstart files at puppetinabox/centos-kickstart, so you can turn there for updates or submit your own PRs. I’m also toying with an existing puppet module danzilio/kickstart that generates kickstart files, and I plan to contribute some PRs to it to manage the kickstart service itself. In the meantime, I’ll show a small profile that will do the same thing, since it’s just apache and a few files.

Kickstart Configuration

The new EL7 file was based on the EL6 version. I simply changed the package list, as some packages were no longer available and open-vm-tools is now the preferred method of VMware Tools management, so that section was removed from the bottom. In the additional steps section, I changed the yum repo for puppet from Puppet 3 to Puppet Collections 1 for Puppet 4. I also removed the banner setup; that’s easy enough to add back in if you like.
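For reference, the relevant part of the %packages section ends up looking something like this (a sketch, not the full kickstart from the repo; pull the real file from puppetinabox/centos-kickstart):

%packages
@core
open-vm-tools
%end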

Kickstart Service Management

The kickstart service itself is pretty simple. You can use puppetlabs-apache to install apache and then place your files in its default root of /var/www/html. Take the kickstart files and add them to dist/profile/files with any modifications you require. Then create a profile that includes apache plus the kickstart files. That would look something like this:

Continue reading

NFS Export Settings for vSphere

Over the past few days, I’ve had to set up an NFS export for use with vSphere. I found myself frustrated with the vSphere docs because they only seem to cover the vSphere side of the house – when discussing the export that vSphere will use, the docs simply said, “Ask your vendor.” If you’re enabling NFS on your SAN and they have a one-click setup for you or a document with settings to enable, great, but if you’re setting up on a bare metal server or VM running Linux like me, you don’t have a vendor to ask.

Since I didn’t have a guide, I took a look at what happens when you enable NFS on a Synology. I don’t know if this is optimal, but this works for many people with the defaults. You can replicate this in Control Panel -> Shared Folders -> Highlight and Edit -> NFS Permissions. Add a new rule, add a hostname or IP entry, and hit OK. Here’s what the defaults look like:

NFS Exports fig 1

Let’s take a look at what happened at the BusyBox shell. SSH to your Synology as root with the admin password, then check the permissions on the mount path (e.g. /volume1/rnelson0) and the contents of /etc/exports.

NFS Exports fig 2

(there is no newline at the end of the file; the ‘ds214>’ is part of the prompt, not the exports contents)
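In text form, the export line looks roughly like this (reconstructed from the options discussed below; the client scope is whatever host or network you entered, and the anonuid/anongid match the Synology guest account):

/volume1/rnelson0  10.0.0.0/24(rw,async,no_wdelay,no_root_squash,insecure_locks,sec=sys,anonuid=1025,anongid=100)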

A working mode for the directory is ‘0777’ and there’s a long string of nfs options. They are described in detail in the exports(5) man page. Here’s a high-level summary of each:

  • rw: Enable writes to the export (default is read-only).
  • async: Allow the NFS server process to accept additional writes before the current writes are written to disk. This is very much a preference and has potential for lost data.
  • no_wdelay: Do not delay writes when the server suspects (how, I don’t know) that another related write is coming. This is implied by async, so it has no specific benefit here unless you remove async. It can have performance impacts, so check whether wdelay is more appropriate.
  • no_root_squash: Do not map requests from uid/gid 0 (typically root) to the anonymous uid/gid.
  • insecure_locks: Do not require authentication of locking requests.
  • sec=sys: There are a number of modes, sys means no cryptographic security is used.
  • anonuid/anongid: The uid/gid for the anonymous user. On my Synology these are 1025/100 and match the guest account. Many Linux distros, including the RHEL family, use 99/99 for the nobody account. vSphere will be writing as root, so this likely has no actual effect.

I changed the netmask and anon(u|g)id values, as it’s most likely that a Linux box with a nobody user would be the only non-vSphere client. Those should be the only values you need to change; async and no_wdelay are up to your preference.

If you use Puppet, you can automate the setup of your NFS server and exports. I use the echocat/nfs module from the forge (don’t forget the dependencies!). With the assumption that you already have a /nfs mount of sufficient size in place, the following manifest will create a share with the correct permissions and export flags for use with vSphere:

node server {
  include ::nfs::server

  file{ '/nfs/vsphere':
    ensure => directory,
    mode   => '0777',
  }
  ::nfs::server::export{ '/nfs/vsphere':
    ensure  => 'mounted',
    nfstag  => 'vsphere',
    clients => '10.0.0.0/24(rw,async,no_wdelay,no_root_squash,insecure_locks,sec=sys,anonuid=99,anongid=99)',
  }
}

To add your new datastore to vSphere, launch the vSphere Web Client. Go to the Datastores page, select the Datacenter you want to add the export to, and click on the Related Objects tab. Select Datastores; the first icon adds a new datastore. Select NFS and the NFS version (new in vSphere 6), give your datastore a name, and provide the name of the export (/nfs/vsphere) and the hostname/IP of the server (nfs.example.com). Continue through the wizard, checking each host that will need the new datastore.
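If you prefer the command line, each host can also mount the export with esxcli (a sketch; the server, share, and datastore name below are the examples from this post, so adjust as needed):

# Run on each ESXi host that needs the datastore.
# -H = NFS server, -s = exported path, -v = datastore name shown in vSphere
esxcli storage nfs add -H nfs.example.com -s /nfs/vsphere -v nfs-vsphere

# Verify the mount
esxcli storage nfs list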

NFS Exports fig 3

Click Finish and you have a datastore that uses NFS! You may need to tweak the export flags for your environment, but this should be a good starting point. If you are aware of a better set of export flags, please let me know in the comments. Thanks!