Planning Your Distributed Log Insight Deployments

As I mentioned recently, I’ve changed jobs and it’s giving me more time for my blog. One of my first challenges at the new job is to look at how to deploy a Log Insight cluster, with the wrinkle that there are multiple datacenters and availability zones that need to leverage Log Insight. Previously, I have only worked with single-node instances of Log Insight, so I had a lot to learn.

The design includes three datacenters, A, B, and C, and the last has two availability zones, giving us A, B, C-1, and C-2. Each site has between 10 and 50 ESXi hosts, so not small but not gigantic, either. Each datacenter should also forward data to a per-datacenter instance of a separate system for compliance.

Log Insight Product Docs

I found a few articles to help me out with the design aspect. The first are the official VMware docs: vRealize Log Insight Configuration Limits, Sizing the vRealize Log Insight Virtual Appliance, and Planning Your vRealize Log Insight Deployment. Each cluster consists of 3-12 members – one master and 2-11 workers – and each node can have up to 4TB of storage. Each cluster can talk to one vROps Manager, one Active Directory domain, up to 15 vCenter servers, and up to 10 forwarders. When using a cluster, nodes must be Medium or Large size and all nodes must be the same size. We should always use the Integrated Load Balancer (ILB) and point log sources at its virtual IP (VIP), even in non-cluster mode. This allows the addition of nodes in the future without having to adjust the destination address on other devices.

There are a few caveats, as well. LI’s ILB does not support geoclusters yet, so all cluster members must be on the same Layer 2 network; nodes in different L2 networks must go in separate clusters. If you’re using NSX, exclude the LI nodes from Distributed Firewall protection, otherwise the ILB traffic may be blocked as spoofed traffic.

Initial Design

This gives us a lot of information to start designing. We need at least 3 medium or large Log Insight nodes in each datacenter, plus a VIP address assigned to the ILB. Since we have thin provisioned storage, we assign an additional 4TB disk to each node – it’s slightly above the space that LI will use, but hey, thin provisioned and no math required! If you don’t have thin provisioned storage, you can check the usable storage in the UI after deployment (IIRC it’s .6G for LI 8, but might be different if you’re upgrading a node from a previous version) and add whatever the delta is.

There is also one paragraph under the Planning doc’s Clusters with Forwarders section that I almost missed on the first readthrough:

The design is extended through the addition of multiple forwarder clusters at remote sites or clusters. Each forwarder cluster is configured to forward all its log messages to the main cluster and users connect to the main cluster, taking advantage of CFAPI for compression and resilience on the forwarding path. Forwarder clusters configured as top-of-rack can be configured with a larger local retention.

What this means is that in addition to clusters for A, B, C-1 and C-2, which will receive logs from the hosts in their respective datacenters, we also need a main cluster for our “Single Pain of Glass” view. This means 5 clusters of at least 3 nodes each, with clusters A, B, C-1, and C-2 forwarding to both the compliance system and the Main cluster.

Less obvious from the reading, but implied, is that the retention times of the SPOG cluster will likely be shorter than those of the datacenter clusters. Log Insight simply rotates out the oldest logs when the disks get full (by default to the bitbucket, but you can set up a long term archive location), so your log ingestion rate determines your retention timeline in each cluster. However, you can’t combine 4 sets of logs into 1 without taking up more space, which means the retention time with the same disk space will be lower. We could increase the size of the SPOG cluster to 12 nodes, but it’s still possible that one noisy datacenter drowns out logs from the other 3 datacenters. Regardless of whether you increase the SPOG cluster size or not, it’s best to assume that your longest retention timeframes will be on the local LI clusters. As a result, when you log into the main cluster and need to go back just a bit further in time than it retains, you may have to log in to the respective datacenter LI instance to view the older records.
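
To make that tradeoff concrete with purely illustrative numbers: a 3-node cluster with 4TB per node has roughly 12TB of log storage. If a datacenter ingests about 30GB of logs per day, its local cluster retains on the order of 400 days; the SPOG cluster has the same 12TB but receives all four streams, roughly 120GB per day, so its retention drops to around 100 days unless you add nodes. Your real ingest rates will differ, but the ratio is the point: the aggregation cluster’s retention shrinks roughly in proportion to the number of sites forwarding to it.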

Additional Guides

Before finalizing the design, I looked beyond the product docs and found two more great references. VMware has an Architecting a VMware vRealize Log Insight Solution for VMware Cloud Providers whitepaper from January 2018, which makes it a bit older than Log Insight 8.0 but is still very relevant (reminder: LI jumped from 4.8 to 8.0 to match product numbers with other vRealize Suite products, not because of any major changes to LI). There’s a ton of valuable information in this paper, including some tuning advice for ESXi that I’ll inevitably come back to later, but right now we’re focused on the cluster design aspects.

While it’s not a factor for me, section 3.4.2 (pages 25-27) covers how to use a non-LI system as an intermediate syslog forwarder. Section 4.5 (page 33) has a table showing estimated log retention sizes for a single ESXi node or an 8-node ESXi cluster, which may be helpful in understanding the retention pattern of 3x4TB in a cluster and whether additional nodes are required just for storage. Section 6.3 (pages 36-38) includes a number of tables of the required firewall ports and directions to build a proper firewall policy. Page 39 reminds us that the Log Insight nodes should probably run on a management cluster that has more nodes than the LI cluster (at least N+1), has HA enabled, and is using local (non-auto deploy) boot. Section 7.3 (page 42) notes that while an LI cluster can only communicate with one vROps Manager, one vROps Manager can communicate with multiple LI clusters – though Launch in Context only works with a 1:1 mapping. Lots of good but random stuff there to inform the overall plan.

Section 10, starting on page 52, examines three scenarios and includes details on the resulting design. Design scenario C (page 54) most closely approximates my scenario. The London, Paris, and Frankfurt datacenters approximate my A, B, C-1, and C-2 datacenters, and the GNOC cluster approximates my Main cluster. The sizing is a bit more conservative (1 node at datacenters, 3 in the GNOC) but otherwise pretty close to what I came up with. Score 1 for me!

There’s one more document I reviewed, Log Insight Best Practices: Server by Steve Flanders. I worked my way backwards to this link, which is pretty much a checklist of everything I already identified, and a lot more, but all in one document! Really wish I had found this one first. Some of the things I didn’t catch already include:

  • Only list the local Active Directory domain servers; listing distant servers could result in up to 20-minute wait times for logins (or, IME, just broken logins).
  • Use a service account for AD binding; this decreases the chances of an expired password impairing your ability to log in.
  • If you use data archiving, LI does NOT clean up the archive location. It’s your job to make sure it continues to have free space and deletes data older than your long term retention requirements.
  • The 2TB storage limit mentioned in the post dates from 2015; the current limit is 4TB.
  • Steve recommends using the DNS name instead of the IP as the log destination. There are pros and cons to each; I recommend investigating this and making an informed decision.
  • Replace the self-signed SSL certificate with a custom SSL certificate. Don’t forget to import the updated cert into other systems that need it, especially vROps.
  • We hopefully all know the importance of NTP, and Steve has written an article on proper NTP configuration. There’s nothing specific to LI here, but it’s especially vital that our log analysis systems are all properly synchronized. There’s nothing like events that actually happen hours apart appearing correlated because NTP isn’t working!

Full Design

With the addition of the whitepaper and Steve’s checklist, we have a better design. Let’s lay it out in two pieces: the installed components and the checklist of changes.

Components

  • Datacenter A
    • 3 node LI management cluster with a VIP, each node is a Medium install with an extra 4TB disk
    • 3 node LI forwarding cluster with a VIP, each node is a Medium install with an extra 4TB disk
  • Datacenter B
    • 3 node LI forwarding cluster with a VIP, each node is a Medium install with an extra 4TB disk
  • Datacenter C-1
    • 3 node LI forwarding cluster with a VIP, each node is a Medium install with an extra 4TB disk
  • Datacenter C-2
    • 3 node LI forwarding cluster with a VIP, each node is a Medium install with an extra 4TB disk

Checklist

Follow this for every cluster.

  • Deploy a single node as a new deployment to create the cluster
  • Configure NTP
  • Configure Active Directory authentication
    • Use the in-datacenter DCs only, accept certificates when using SSL
    • Use a service account for the binding
  • Replace the self-signed certificate with a custom certificate
  • Deploy 2 additional nodes as members of an existing cluster
  • (Forwarding clusters) Configure forwarding with a target of the management cluster’s VIP and the compliance system, by FQDN
  • (Forwarding clusters) Configure vCenter/vROps integration with the local datacenter’s vCenter(s) and single vROps instance.

Once the clusters are stood up and configured, they are ready to ingest data. You may choose an IP or FQDN as the destination. Since DNS is not hosted on the ESXi hosts themselves, I still prefer an IP-based destination, as it is more likely to keep working during a datacenter issue; your preference may be different. How you apply it is also up to you. For ESXi hosts, you can use the vCenter integration in each forwarding cluster, Host Profiles, PowerCLI, or other vSphere APIs. LI can also ingest data from non-ESXi hosts, if you want to point them at LI as well.
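
If you do configure hosts directly rather than through the vCenter integration, the syslog destination is just a host setting. As a rough sketch (the VIP address, protocol, and port below are placeholders for whatever your cluster uses), you could set it per host with esxcli:

esxcli system syslog config set --loghost='udp://192.0.2.50:514'
esxcli system syslog reload
esxcli network firewall ruleset set --ruleset-id=syslog --enabled=true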

Summary

We’ve reviewed a number of documents, official and unofficial, on Log Insight designs. We’ve taken the general instructions and guidelines and built our own design and implementation checklists. The design complies with best practices and can easily be expanded – just add additional nodes to each cluster as the need arises.

If I’ve missed anything, or you have questions about the design, drop a note in the comments or ping me on Twitter. Thanks!

Change is in the air

As the leaves start to change color and fall, you can’t help but notice the changes in the air. Our family is no exception.

First, I am so excited and happy for my wife, Michelle, who accepted the Stark Professor Endowed Chair here at IUSM. If you’re not familiar with academic positions, this is a really big deal! In addition, it’s at the same institute she’s already at, so no moving this time. Super proud of you, Michelle! She’ll be starting her new position on November 1st.

Second, I am changing jobs. I just celebrated my last day at AT&T, where I spent almost 15 years. It was a great ride, but sometimes you need a change. I’m gonna miss y’all! Next week, I start my new job at athenahealth. Like my previous one, this is full time remote and I will go to the Boston office a few times a year. I’m super excited about this change; it’s my first completely new job since 2004 and it’s in a new industry. This change should also give me more time to work on Puppet and the blog, which is great since I have some really great ideas following Puppetize PDX. However, I’m taking the week off between jobs to relax and avoid doing work-like things – and we’re redoing the floor in the guest bathroom to help me avoid temptation.

There’s also some other stuff in the works that we don’t want to jinx, but you’ll hear more soon. Tune in on Twitter if you want to be the first to get the deets.

Fall is always an exciting time, and I hope yours is as exciting as ours!

Adding, extending, and removing Linux disks and partitions in 2019

Managing disks and partitions in Linux has changed quite a bit over time. Unfortunately, as Jonathan Frappier points out, a lot of advice is either wrong, dated, or makes some poor assumptions along the way:

I know I almost always go back to the search results drawing board when I have to manage a disk, since it’s pretty infrequent, and I have the same problem of sifting through documentation that will work and documentation that will not. I hope this article can be used as a source of modern, tested documentation for some common use cases.

Tools and File systems

First, a quick mention of the tools and file systems available.

fdisk

A venerable tool that continues to work fully as long as your disks are under 2TB in size. Once you’re over 2TB, I’d treat it as read-only and not use it to make edits. You can use fdisk -l to display information on all disks it sees or fdisk <devicepath> to interact with a specific disk:

[rnelson0@build03 ~]$ sudo fdisk -l

Disk /dev/sda: 107.4 GB, 107374182400 bytes, 209715200 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000b06ec

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048     1026047      512000   83  Linux
/dev/sda2         1026048   209715199   104344576   8e  Linux LVM

Disk /dev/mapper/VolGroup00-lv_root: 12.6 GB, 12582912000 bytes, 24576000 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/mapper/VolGroup00-lv_swap: 4294 MB, 4294967296 bytes, 8388608 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/mapper/VolGroup00-lv_home: 83.9 GB, 83886080000 bytes, 163840000 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/loop0: 107.4 GB, 107374182400 bytes, 209715200 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/loop1: 2147 MB, 2147483648 bytes, 4194304 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/mapper/docker-253:0-263061-pool: 107.4 GB, 107374182400 bytes, 209715200 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 65536 bytes / 65536 bytes

[rnelson0@build03 ~]$ sudo fdisk /dev/sda
Welcome to fdisk (util-linux 2.23.2).

Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.


Command (m for help): m
Command action
   a   toggle a bootable flag
   b   edit bsd disklabel
   c   toggle the dos compatibility flag
   d   delete a partition
   g   create a new empty GPT partition table
   G   create an IRIX (SGI) partition table
   l   list known partition types
   m   print this menu
   n   add a new partition
   o   create a new empty DOS partition table
   p   print the partition table
   q   quit without saving changes
   s   create a new empty Sun disklabel
   t   change a partition's system id
   u   change display/entry units
   v   verify the partition table
   w   write table to disk and exit
   x   extra functionality (experts only)

Command (m for help): p

Disk /dev/sda: 107.4 GB, 107374182400 bytes, 209715200 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000b06ec

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048     1026047      512000   83  Linux
/dev/sda2         1026048   209715199   104344576   8e  Linux LVM

Command (m for help): q

parted

A more modern tool that supports disks over 2TB. I prefer it, though there’s nothing wrong with fdisk other than the size limit. Similarly, you can use -l to show all data or a device name to interact with a specific disk. One nice thing is it defaults to human-readable sizes instead of a jumble of long numbers without commas.

[rnelson0@build03 ~]$ sudo parted -l
Model: VMware Virtual disk (scsi)
Disk /dev/sda: 107GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:

Number  Start   End    Size   Type     File system  Flags
 1      1049kB  525MB  524MB  primary  ext4         boot
 2      525MB   107GB  107GB  primary               lvm


Model: Linux device-mapper (thin-pool) (dm)
Disk /dev/mapper/docker-253:0-263061-pool: 107GB
Sector size (logical/physical): 512B/512B
Partition Table: loop
Disk Flags:

Number  Start  End    Size   File system  Flags
 1      0.00B  107GB  107GB  xfs


Model: Linux device-mapper (linear) (dm)
Disk /dev/mapper/VolGroup00-lv_home: 83.9GB
Sector size (logical/physical): 512B/512B
Partition Table: loop
Disk Flags:

Number  Start  End     Size    File system  Flags
 1      0.00B  83.9GB  83.9GB  ext4


Model: Linux device-mapper (linear) (dm)
Disk /dev/mapper/VolGroup00-lv_swap: 4295MB
Sector size (logical/physical): 512B/512B
Partition Table: loop
Disk Flags:

Number  Start  End     Size    File system     Flags
 1      0.00B  4295MB  4295MB  linux-swap(v1)


Model: Linux device-mapper (linear) (dm)
Disk /dev/mapper/VolGroup00-lv_root: 12.6GB
Sector size (logical/physical): 512B/512B
Partition Table: loop
Disk Flags:

Number  Start  End     Size    File system  Flags
 1      0.00B  12.6GB  12.6GB  ext4


[rnelson0@build03 ~]$ sudo parted /dev/sda
GNU Parted 3.1
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) help
  align-check TYPE N                        check partition N for TYPE(min|opt) alignment
  help [COMMAND]                           print general help, or help on COMMAND
  mklabel,mktable LABEL-TYPE               create a new disklabel (partition table)
  mkpart PART-TYPE [FS-TYPE] START END     make a partition
  name NUMBER NAME                         name partition NUMBER as NAME
  print [devices|free|list,all|NUMBER]     display the partition table, available devices, free space, all found partitions, or a particular partition
  quit                                     exit program
  rescue START END                         rescue a lost partition near START and END

  resizepart NUMBER END                    resize partition NUMBER
  rm NUMBER                                delete partition NUMBER
  select DEVICE                            choose the device to edit
  disk_set FLAG STATE                      change the FLAG on selected device
  disk_toggle [FLAG]                       toggle the state of FLAG on selected device
  set NUMBER FLAG STATE                    change the FLAG on partition NUMBER
  toggle [NUMBER [FLAG]]                   toggle the state of FLAG on partition NUMBER
  unit UNIT                                set the default unit to UNIT
  version                                  display the version number and copyright information of GNU Parted
(parted) print
Model: VMware Virtual disk (scsi)
Disk /dev/sda: 107GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:

Number  Start   End    Size   Type     File system  Flags
 1      1049kB  525MB  524MB  primary  ext4         boot
 2      525MB   107GB  107GB  primary               lvm

(parted) quit

ext4

The “extended” family of filesystems (currently ext4, but possibly ext3 or ext2 if you work on some really old systems) has been used by Linux for a long time. It’s still a default in a number of distros, especially for smaller partitions. It’s a very competent journaled filesystem and the maximums have been massively extended over earlier versions (for example, 1 EiB max volume and 16 TiB max file size), but it’s still considered yesterday’s technology. The biggest fault I find with ext4 is that inodes – which track the metadata for files – are set at filesystem creation time and cannot be changed. This means that when a volume is extended, the inode count is not increased. Thus, after an expansion, you could wind up horribly undersized on inodes while still having free space available. New files cannot be created when the inodes are exhausted. Most are familiar with measuring disk usage by space capacity, but you can also view your inode capacity:

[rnelson0@build03 ~]$ df -h
Filesystem                      Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-lv_root   12G  6.7G  4.3G  62% /
devtmpfs                        908M     0  908M   0% /dev
tmpfs                           920M     0  920M   0% /dev/shm
tmpfs                           920M   98M  822M  11% /run
tmpfs                           920M     0  920M   0% /sys/fs/cgroup
/dev/sda1                       477M  185M  263M  42% /boot
/dev/mapper/VolGroup00-lv_home   77G  5.8G   68G   8% /home
tmpfs                           184M     0  184M   0% /run/user/1000
[rnelson0@build03 ~]$ df -i
Filesystem                      Inodes  IUsed   IFree IUse% Mounted on
/dev/mapper/VolGroup00-lv_root  768544 202858  565686   27% /
devtmpfs                        232260    360  231900    1% /dev
tmpfs                           235373      1  235372    1% /dev/shm
tmpfs                           235373    850  234523    1% /run
tmpfs                           235373     16  235357    1% /sys/fs/cgroup
/dev/sda1                       128016    354  127662    1% /boot
/dev/mapper/VolGroup00-lv_home 5120000 634404 4485596   13% /home
tmpfs                           235373      1  235372    1% /run/user/1000

This limitation leads me to recommend other filesystems, especially when you expect it will grow in the future.
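
If you do stay on ext4 and expect a large number of small files, you can at least size the inode table generously at creation time; it still cannot be changed later, so this is a mitigation rather than a fix. A rough example (the device path is a placeholder and the ratio is a choice for your workload):

# allocate one inode per 4 KiB of space instead of the distro default
mkfs.ext4 -i 4096 /dev/sdX1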

btrfs

Designed at Oracle for Linux, btrfs uses a copy-on-write B-tree data structure and provides advanced pooling, snapshot, and checksum features. Initially developed in 2008, it took until 2013 to be marked as stable and even longer for Linux distros to mark it as supported – even now, only Oracle Linux 7, SuSE 15, and Synology DSM v6 do so. Plenty of people do love it in spite of its support status, and it is a viable solution; I’m just not as familiar with it personally.

xfs

Originally designed by SGI, xfs was ported to Linux in 2001. This, too, took a long while to become supported by distros, but now is supported by almost every distro, many of which use it as the default file system. It supports dynamic inode allocation, up to 8 EiB – 1 byte volumes AND files, online resizing, and many other features. This is the default in Red Hat Enterprise Linux and my preferred FS, which I’ll be using later – but is roughly equivalent to btrfs in general, if you prefer that.

LVM

Not quite a file system or a tool, the Logical Volume Manager is a device mapper that provides an extra layer of abstraction between raw file systems and the OS. All of ext4, btrfs, and xfs can be used separately or with LVM. Among other things, it lets you easily create device maps that span partitions and disks, resize them, and export/import them (for instance, to migrate disks to another system and retain the FS). I prefer using LVM for everything; if you never modify the partitions it causes no harm, but if you ever need to and have not used LVM, it’s far more painful.

Most LVM commands begin with pv, vg, or lv, and you can use tab-completion to discover the utilities we do not cover today. I regularly use pvs/vgs/lvs to get short summaries and pvdisplay/vgdisplay/lvdisplay for detailed status. All the others are very situational.

Adding a new volume

With some understanding of the tools and filesystems at our fingertips, let’s take a look at a common scenario: adding a new volume. I’m only focused on the OS steps here, so we will assume that you have added a new physical disk, attached a VMDK or EC2 volume, etc. as your implementation requires. We will also assume that you want an xfs filesystem at /mnt/newdisk, the system has a single disk /dev/sda that was automatically partitioned by the installer, and the new disk receives the moniker /dev/sdb. Not all OSes and systems will provide the same name; for instance, Jonathan’s AWS Linux system’s second disk was called /dev/nvme1n1. In such a case, just swap in the correct device name.

Most modern distros will automatically detect new disks while running. If yours does not, or fails to detect the disk, you can force a rescan with the following command (change the host number as appropriate):

echo "- - -" > /sys/class/scsi_host/host0/scan

Make sure the new device shows up before continuing:

[rnelson0@build03 ~]$ ls /dev/sd?
/dev/sda
[rnelson0@build03 ~]$ ls /dev/sd?
/dev/sda /dev/sdb

You can use fdisk or parted to get information on the device. There will be no label, since we haven’t touched it, but the reported size should be accurate (be sure to account for the difference between gigabytes and gibibytes: I was surprised to learn that when I specify 16 GB in my vSphere system, it apparently means gibibytes, where the proper label is actually GiB, while parted’s GB really does mean gigabytes).

[rnelson0@build03 ~]$ sudo parted /dev/sdb
GNU Parted 3.1
Using /dev/sdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print
Error: /dev/sdb: unrecognised disk label
Model: VMware Virtual disk (scsi)
Disk /dev/sdb: 17.2GB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags:

If we plan to have multiple partitions on a single disk, this is where we would create them using mkpart. However, you do not need to create a partition and can use the whole disk. I generally prefer using the whole disk; it’s fairly cheap to add a new disk and I believe it is cognitively easier to manage than extending disks and partitions. For instance, if you add a 100 GB disk with a single partition and later grow the disk to 200 GB, you can extend that partition easily. If you instead create two 50 GB partitions and grow the disk to 100 GB later, you can only extend the 2nd partition; to effectively increase the space of the first, you would have to create a new partition and join it to an existing LVM mapping. Alternatively, you can just always add entire disks to LVM groups, assuming that any extension comes via an additional disk, which is what we will do here.
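
For completeness, if you did want a partition rather than handing the whole disk to LVM, a minimal sketch with parted would look something like this (reusing the /dev/sdb example disk; the label type and boundaries are choices, not requirements):

parted /dev/sdb mklabel gpt
parted /dev/sdb mkpart primary 0% 100%
parted /dev/sdb set 1 lvm on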

Though we are not creating partitions on the disk, we do want to leverage LVM to create a mapping that we can manipulate semi-independently of the disks. Each LVM mapping has 3 layers: the physical, virtual, and logical layers. The physical layer refers to the disk itself (even if it’s a virtualized system, we pretend it’s a physical disk). The logical layer is where we create the end mapping that receives the file system. The virtual layer sits in the middle, sort of akin to the partitions we skipped, and presents physical layer information to the logical layer. In our case, we simply map the entire disk to the physical and virtual layers, then assign 100% of the free space of the virtual device to the new logical device.

Once that is done, we format the result with our filesystem of choice, in this case xfs by using mkfs.xfs, and if it doesn’t exist, create the mount point. I normally put some of the information in variables, since the LVM commands are almost always the same. You’ll also notice I switched to root to save a few keystrokes over using sudo constantly, as all the LVM commands require elevated permissions.

#COMMANDS
DEVICE=/dev/sdb
NAME=newdisk
MOUNT=/mnt/newdisk

pvcreate ${DEVICE}
vgcreate vg_xfs_${NAME} ${DEVICE}
lvcreate -n lv_${NAME} -l 100%FREE vg_xfs_${NAME}
mkfs.xfs /dev/vg_xfs_${NAME}/lv_${NAME}
if [[ ! -d "${MOUNT}" ]]; then mkdir ${MOUNT}; fi

#OUTPUT
[root@build03 ~]# DEVICE=/dev/sdb
[root@build03 ~]# NAME=newdisk
[root@build03 ~]# MOUNT=/mnt/newdisk
[root@build03 ~]#
[root@build03 ~]# pvcreate ${DEVICE}
  Physical volume "/dev/sdb" successfully created.
[root@build03 ~]# vgcreate vg_xfs_${NAME} ${DEVICE}
  Volume group "vg_xfs_newdisk" successfully created
[root@build03 ~]# lvcreate -n lv_${NAME} -l 100%FREE vg_xfs_${NAME}
  Logical volume "lv_newdisk" created.
[root@build03 ~]# mkfs.xfs /dev/vg_xfs_${NAME}/lv_${NAME}
meta-data=/dev/vg_xfs_newdisk/lv_newdisk isize=512    agcount=4, agsize=1048320 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0, sparse=0
data     =                       bsize=4096   blocks=4193280, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

[root@build03 ~]# parted -l
...
Model: Linux device-mapper (linear) (dm)
Disk /dev/mapper/vg_xfs_newdisk-lv_newdisk: 17.2GB
Sector size (logical/physical): 512B/512B
Partition Table: loop
Disk Flags:

Number  Start  End     Size    File system  Flags
 1      0.00B  17.2GB  17.2GB  xfs

Finally, we have to mount this new partition. How your system manages the fstab can vary wildly, but generally just appending a line to /etc/fstab works (some distros automagically manage it). Mount the partition and now we can start using it!

[root@build03 ~]# echo "/dev/vg_xfs_${NAME}/lv_${NAME} ${MOUNT}          xfs    defaults        1 2" >> /etc/fstab
[root@build03 ~]# mount /mnt/newdisk
[root@build03 ~]# df -h /mnt/newdisk
Filesystem                             Size  Used Avail Use% Mounted on
/dev/mapper/vg_xfs_newdisk-lv_newdisk   16G   33M   16G   1% /mnt/newdisk
[root@build03 ~]# df -i /mnt/newdisk
Filesystem                             Inodes IUsed   IFree IUse% Mounted on
/dev/mapper/vg_xfs_newdisk-lv_newdisk 8386560     3 8386557    1% /mnt/newdisk

Extending an LVM device with an existing disk

The next most common case is to extend the disk a device uses. As I mentioned before, I prefer to just add a new disk as they’re frequently cheap, but some systems do make them more expensive. In my lab, I increased the size of my /dev/sdb to 32 GiB. Because the disk has already been detected, I have to force the OS to rescan and see the new space using echo 1 > /sys/class/block/<DEVICE>/device/rescan:

[root@build03 ~]# parted -l
...
Error: /dev/sdb: unrecognised disk label
Model: VMware Virtual disk (scsi)
Disk /dev/sdb: 17.2GB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags:

Model: Linux device-mapper (linear) (dm)
Disk /dev/mapper/vg_xfs_newdisk-lv_newdisk: 17.2GB
Sector size (logical/physical): 512B/512B
Partition Table: loop
Disk Flags:

Number  Start  End     Size    File system  Flags
 1      0.00B  17.2GB  17.2GB  xfs
...
[root@build03 ~]# echo 1>/sys/class/block/sdb/device/rescan
[root@build03 ~]# parted -l
...
Error: /dev/sdb: unrecognised disk label
Model: VMware Virtual disk (scsi)
Disk /dev/sdb: 34.4GB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags:

Model: Linux device-mapper (linear) (dm)
Disk /dev/mapper/vg_xfs_newdisk-lv_newdisk: 17.2GB
Sector size (logical/physical): 512B/512B
Partition Table: loop
Disk Flags:

Number  Start  End     Size    File system  Flags
 1      0.00B  17.2GB  17.2GB  xfs

With the physical size increase visible, we need to expand the LVM device. First, pvresize so the physical layer sees the change, then vgscan to see the changes, and finally lvextend to have the logical volume allocate the virtual group’s free (unassigned) space (the + in +100%FREE means to ADD free space; 100%FREE without the plus means it allocates as much space as is free, but starting from the beginning – I don’t make the rules, I just get run over by them like everyone else!). Extend the FS with xfs_growfs and then look at the space and inodes. We can use the same variables as before to make it a little easier:

#COMMANDS
DEVICE=/dev/sdb
NAME=newdisk
MOUNT=/mnt/newdisk

pvresize ${DEVICE}
vgscan
lvextend -l +100%FREE /dev/vg_xfs_${NAME}/lv_${NAME}
xfs_growfs ${MOUNT}

#OUTPUT
[root@build03 ~]# DEVICE=/dev/sdb
[root@build03 ~]# NAME=newdisk
[root@build03 ~]# MOUNT=/mnt/newdisk
[root@build03 ~]#
[root@build03 ~]# pvresize ${DEVICE}
  Physical volume "/dev/sdb" changed
  1 physical volume(s) resized / 0 physical volume(s) not resized
[root@build03 ~]# vgscan
  Reading volume groups from cache.
  Found volume group "vg_xfs_newdisk" using metadata type lvm2
  Found volume group "VolGroup00" using metadata type lvm2
[root@build03 ~]# lvextend -l +100%FREE /dev/vg_xfs_${NAME}/lv_${NAME}
  Size of logical volume vg_xfs_newdisk/lv_newdisk changed from 16.00 GiB (4096 extents) to <32.00 GiB (8191 extents).
  Logical volume vg_xfs_newdisk/lv_newdisk successfully resized.
[root@build03 ~]# xfs_growfs ${MOUNT}
meta-data=/dev/mapper/vg_xfs_newdisk-lv_newdisk isize=512    agcount=4, agsize=1048320 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=4193280, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 4193280 to 8387584

[root@build03 ~]# df -h /mnt/newdisk
Filesystem                             Size  Used Avail Use% Mounted on
/dev/mapper/vg_xfs_newdisk-lv_newdisk   32G   33M   32G   1% /mnt/newdisk
[root@build03 ~]# df -i /mnt/newdisk
Filesystem                              Inodes IUsed    IFree IUse% Mounted on
/dev/mapper/vg_xfs_newdisk-lv_newdisk 16775168     3 16775165    1% /mnt/newdisk

You can see that the inode count is roughly twice the size as before, so you are in no danger of running out any time soon, even with a ton of tiny files!

Extending an LVM device with a new disk

We can also extend an LVM device using a new disk. After attaching the new disk, you need to pvcreate the new physical device, vgextend the virtual group onto the new disk, and then lvextend the volume and xfs_growfs the filesystem. In this example, the new disk is /dev/sdc and is 16 GiB:

#COMMANDS
DEVICE=/dev/sdc
NAME=newdisk
MOUNT=/mnt/newdisk

pvcreate ${DEVICE}
vgextend vg_xfs_${NAME} ${DEVICE}
lvextend -l +100%FREE /dev/vg_xfs_${NAME}/lv_${NAME}
xfs_growfs ${MOUNT}

#OUTPUT
[root@build03 ~]# DEVICE=/dev/sdc
[root@build03 ~]# NAME=newdisk
[root@build03 ~]# MOUNT=/mnt/newdisk
[root@build03 ~]#
[root@build03 ~]# pvcreate ${DEVICE}
  Physical volume "/dev/sdc" successfully created.
[root@build03 ~]# vgextend vg_xfs_${NAME} ${DEVICE}
  Volume group "vg_xfs_newdisk" successfully extended
[root@build03 ~]# vgs
  VG             #PV #LV #SN Attr   VSize   VFree
  VolGroup00       1   3   0 wz--n- <99.51g   5.66g
  vg_xfs_newdisk   2   1   0 wz--n-  47.99g <16.00g
[root@build03 ~]# lvextend -n lv_${NAME} -l +100%FREE /dev/vg_xfs_${NAME}/lv_${NAME}
  Please specify a logical volume path.
  Run `lvextend --help' for more information.
[root@build03 ~]# lvextend -l +100%FREE /dev/vg_xfs_${NAME}/lv_${NAME}
  Size of logical volume vg_xfs_newdisk/lv_newdisk changed from <32.00 GiB (8191 extents) to 47.99 GiB (12286 extents).
  Logical volume vg_xfs_newdisk/lv_newdisk successfully resized.
[root@build03 ~]# xfs_growfs ${MOUNT}
meta-data=/dev/mapper/vg_xfs_newdisk-lv_newdisk isize=512    agcount=9, agsize=1048320 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=8387584, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 8387584 to 12580864
[root@build03 ~]# df -h /mnt/newdisk
Filesystem                             Size  Used Avail Use% Mounted on
/dev/mapper/vg_xfs_newdisk-lv_newdisk   48G   33M   48G   1% /mnt/newdisk
[root@build03 ~]# df -i /mnt/newdisk
Filesystem                              Inodes IUsed    IFree IUse% Mounted on
/dev/mapper/vg_xfs_newdisk-lv_newdisk 25161728     3 25161725    1% /mnt/newdisk

Once again, the space and the inodes have increased!

Removing an LVM device

Finally, there are times when you will remove a mount point entirely. You can just “yank” the disks and you’ll probably be okay, but rather than cross our fingers and hope, we can manually remove the configuration to ensure no auto-detection goes wrong. The process includes unmounting the partition, removing the logical/virtual/physical layer mappings, then yanking the disks. We will undo our previous examples, where /dev/sdb and /dev/sdc provide vg_xfs_newdisk and lv_newdisk. Just start at the end and work our way back:

[root@build03 ~]# umount /mnt/newdisk
[root@build03 ~]# lvremove lv_${NAME}
  Volume group "lv_newdisk" not found
  Cannot process volume group lv_newdisk
[root@build03 ~]# lvremove /dev/vg_xfs_newdisk/lv_newdisk
Do you really want to remove active logical volume vg_xfs_newdisk/lv_newdisk? [y/n]: y
  Logical volume "lv_newdisk" successfully removed
[root@build03 ~]# vgremove vg_xfs_newdisk
  Volume group "vg_xfs_newdisk" successfully removed
[root@build03 ~]# pvremove /dev/sdc
  Labels on physical volume "/dev/sdc" successfully wiped.
[root@build03 ~]# pvremove /dev/sdb
  Labels on physical volume "/dev/sdb" successfully wiped.
[root@build03 ~]#
[root@build03 ~]# lvs
  LV      VG         Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lv_home VolGroup00 -wi-ao----  78.12g
  lv_root VolGroup00 -wi-ao---- <11.72g
  lv_swap VolGroup00 -wi-ao----   4.00g
[root@build03 ~]# vgs
  VG         #PV #LV #SN Attr   VSize   VFree
  VolGroup00   1   3   0 wz--n- <99.51g 5.66g
[root@build03 ~]# pvs
  PV         VG         Fmt  Attr PSize   PFree
  /dev/sda2  VolGroup00 lvm2 a--  <99.51g 5.66g

Do not forget to modify /etc/fstab to remove the mapping, or the next boot may complain about the missing disk or even fail outright. Make sure to follow your distro guide for this, as sometimes systems like anaconda can undo your edits silently. While this can all be done live, I do recommend a reboot at the end, so that if something went wrong that affects the boot cycle, it is found immediately.
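
Assuming the mount point string appears only on the line you want gone, a quick way to drop the entry (after taking a backup) is something like:

cp /etc/fstab /etc/fstab.bak
sed -i '\|/mnt/newdisk|d' /etc/fstab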

Summary

I covered common tasks – adding, extending, and removing partitions on the same system – but there’s a lot more you can do. As mentioned, parted replaces fdisk, but we glossed over that in favor of using entire disks. But, at least you know that any documentation using fdisk is pretty aged. I also briefly mentioned vgexport/vgimport. If you plan to detach disks in LVM – even a single disk – and re-attach them to another system, you want to export the virtual group on the old system and import it on the new system. This will help ensure that any mismatch in device naming by the OS – say sdc and sde on the old system and sdc and sdd on the new system – do not result in any data loss.
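
A minimal sketch of that export/import flow, reusing the vg_xfs_newdisk names from the earlier examples and assuming the filesystem is already unmounted, might look like:

# on the old system: deactivate and export the volume group, then detach the disks
vgchange -an vg_xfs_newdisk
vgexport vg_xfs_newdisk

# on the new system: scan for the attached disks, import, activate, and mount
pvscan
vgimport vg_xfs_newdisk
vgchange -ay vg_xfs_newdisk
mount /dev/vg_xfs_newdisk/lv_newdisk /mnt/newdisk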

I hope this article is a good reference for fundamental filesystem tasks for novice and expert Linux administrators alike. Please let me know on Twitter or in the comments if there is anything else you would like to see and I’ll keep this updated. Thanks!

Convert a controlrepo to using the Puppet Development Kit (PDK)

I previously wrote about converting an individual puppet module’s repo to use the Puppet Development Kit. We can also convert controlrepos to use the PDK. I am starting with a “traditional” controlrepo, described here, as well as centralized tests, described here. To follow this article directly, you need to:

  • Have all hiera data and role/profile/custom modules in the /dist directory
  • Have all tests, for all modules, in the /spec directory

If your controlrepo looks different, this article can be used to migrate to the PDK, but you will have to modify some of the sections a bit.

This will be a very lengthy blog post (over 4,000 words!) and covers a very time-consuming process. It took me about 2 full days to work through this on my own controlrepo. Hopefully, this article helps you shave significant time off the effort, but don’t expect a very quick change.

Managing the PDK itself

First, let’s make sure we have a profile to install and manage the PDK. As we use the role/profile pattern, we create a class profile::pdk with the parameter version, which we can specify in hiera as profile::pdk::version: ‘1.7.1’ (current version as of this writing). This profile can then be added to an appropriate role class, like role::build for a build server, or applied directly to your laptop. I use only Enterprise Linux 7, but we could certainly flesh this out to support multiple OSes:

# dist/profile/manifests/pdk.pp
class profile::pdk (
  String $version = 'present',
) {
  package {'puppet6-release':
    ensure => present,
    source => "https://yum.puppet.com/puppet6/puppet6-release-el-7.noarch.rpm",
  }
  package {'pdk':
    ensure => $version,
    require => Package['puppet6-release'],
  }
}

# spec/classes/profile/pdk_spec.rb
require 'spec_helper'
describe 'profile::pdk' do
  on_supported_os.each do |os, facts|
    next unless facts[:kernel] == 'Linux'
    context "on #{os}" do
      let (:facts) {
        facts.merge({
          :clientcert => 'build',
        })
      }

      it { is_expected.to compile.with_all_deps }

      it { is_expected.to contain_package('puppet6-release') }
      it { is_expected.to contain_package('pdk') }
    end
  end
end

Once this change is pushed upstream and the build server (or other target node) checks in, the PDK is available:

$ pdk --version
1.7.1

Now we are almost ready to go. Of course, we need to start with good, working tests! If any tests are currently failing, we need to get them to a passing state before continuing, like this:

Finished in 3 minutes 12.2 seconds (files took 1 minute 17.89 seconds to load)
782 examples, 0 failures

With everything in a known good state, we can then be sure that any failures are related to the PDK changes, and only the PDK changes.

Setting up the PDK Template

The PDK comes with a set of pre-packaged templates. It is recommended to stick with a set of templates designed for the current PDK version for stability. However, the templates are online and may be updated without an accompanying PDK release. We may choose to stick with the on-disk templates, we may point to the online templates from Puppet, or we may create our own! If you are working with the on-disk templates, you can skip down to working with .sync.yml.

To use another template, we run pdk convert --template-url. If this is our own template, we should make sure the latest commit is compliant with the PDK version we are using. If we point to Puppet’s templates, we essentially shift to the development track. Make sure you understand this before changing the templates. We can get back to using the on-disk template with the URL file:///opt/puppetlabs/pdk/share/cache/pdk-templates.git, though, so this isn’t a decision we have to live with forever. Here’s the command to switch to the official Puppet templates:

$ pdk convert --template-url=https://github.com/puppetlabs/pdk-templates

------------Files to be added-----------
.travis.yml
.gitlab-ci.yml
.yardopts
appveyor.yml
.rubocop.yml
.pdkignore

----------Files to be modified----------
metadata.json
Gemfile
Rakefile
spec/spec_helper.rb
spec/default_facts.yml
.gitignore
.rspec

----------------------------------------

You can find a report of differences in convert_report.txt.

pdk (INFO): Module conversion is a potentially destructive action. Ensure that you have committed your module to a version control system or have a backup, and review the changes above before continuing.
Do you want to continue and make these changes to your module? Yes

------------Convert completed-----------

6 files added, 7 files modified.

Now, everyone’s setup is probably a little different and thus we cannot predict the entirety of the changes each of us must make, but there are some minimal changes everyone must make. The file .sync.yml can be created to allow each of us to override the template defaults without having to write our own templates. The layout of the YAML starts with the filename the changes will modify, followed by the appropriate config section and then the value(s) for that section. We can find the appropriate config section by looking at the template repo’s erb templates. For instance, I do not use AppVeyor, GitLab, or Travis with this controlrepo, so to have git ignore them, I made the following changes to the .gitignore‘s required hash:

$ cat .sync.yml
---
.gitignore:
  required:
    - 'appveyor.yml'
    - '.gitlab-ci.yml'
    - '.travis.yml'

When changes are made to the sync file, they must be applied with the pdk update command. We can see that originally, these unused files were to be committed, but now they are properly ignored:

$ git status
# On branch pdk
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#       modified:   .gitignore
#       modified:   .rspec
#       modified:   Gemfile
#       modified:   Rakefile
#       modified:   metadata.json
#       modified:   spec/default_facts.yml
#       modified:   spec/spec_helper.rb
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       .gitlab-ci.yml
#       .pdkignore
#       .rubocop.yml
#       .sync.yml
#       .travis.yml
#       .yardopts
#       appveyor.yml
no changes added to commit (use "git add" and/or "git commit -a")

$ cat .sync.yml
---
.gitignore:
  required:
    - 'appveyor.yml'
    - '.gitlab-ci.yml'
    - '.travis.yml'

$ pdk update
pdk (INFO): Updating mss-controlrepo using the template at https://github.com/puppetlabs/pdk-templates, from 1.7.1 to 1.7.1

----------Files to be modified----------
.gitignore

----------------------------------------

You can find a report of differences in update_report.txt.

Do you want to continue and make these changes to your module? Yes

------------Update completed------------

1 files modified.

$ git status
# On branch pdk
# Changes not staged for commit:
# (use "git add <file>..." to update what will be committed)
# (use "git checkout -- <file>..." to discard changes in working directory)
#
# modified: .gitignore
# modified: .rspec
# modified: Gemfile
# modified: Rakefile
# modified: metadata.json
# modified: spec/default_facts.yml
# modified: spec/spec_helper.rb
#
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# .pdkignore
# .rubocop.yml
# .sync.yml
# .yardopts
no changes added to commit (use "git add" and/or "git commit -a")

Anytime we pdk update, we will still receive new versions of the ignored files, but they won’t be committed to the repo and a git clean or a clean checkout will remove them.

After initial publication, I was made aware that you can completely delete or unmanage a file using delete:true or unmanage:true, as described here, rather than using .gitignore.
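
I have not adopted that approach here, but based on that description, deleting the unused files via .sync.yml would look something like the following (unmanaging a file follows the same per-file pattern):

---
appveyor.yml:
  delete: true
.gitlab-ci.yml:
  delete: true
.travis.yml:
  delete: true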

We may need to implement other overrides, but we do not know what they would be yet, so let’s commit our changes so far. Then we can start working on validation or unit tests. It doesn’t really matter which we choose to work on first, though my preference is validation first as it does not depend on the version of Puppet we are testing.

PDK Validate

The PDK validation check, pdk validate, will check the syntax and style of metadata.json and any task json files, syntax and style of all puppet files, and ruby code style. This is roughly equivalent to our old bundle exec rake syntax task. Since the bundle setup is a wee bit old and the PDK is kept up to date, we shouldn’t be surprised if what was passing before now has failures. Here’s a sample of the errors I encountered on my first run – there were hundreds of them:

$ pdk validate
pdk (INFO): Running all available validators...
pdk (INFO): Using Ruby 2.5.1
pdk (INFO): Using Puppet 6.0.2
[✔] Checking metadata syntax (metadata.json tasks/*.json).
[✔] Checking module metadata style (metadata.json).
[✔] Checking Puppet manifest syntax (**/**.pp).
[✔] Checking Puppet manifest style (**/*.pp).
[✖] Checking Ruby code style (**/**.rb).
info: task-metadata-lint: ./: Target does not contain any files to validate (tasks/*.json).
warning: puppet-lint: dist/eyaml/manifests/init.pp:43:12: indentation of => is not properly aligned (expected in column 14, but found it in column 12)
warning: puppet-lint: dist/eyaml/manifests/init.pp:51:11: indentation of => is not properly aligned (expected in column 12, but found it in column 11)
warning: puppet-lint: dist/msswiki/manifests/init.pp:56:12: indentation of => is not properly aligned (expected in column 13, but found it in column 12)
warning: puppet-lint: dist/msswiki/manifests/init.pp:57:10: indentation of => is not properly aligned (expected in column 13, but found it in column 10)
warning: puppet-lint: dist/msswiki/manifests/init.pp:58:11: indentation of => is not properly aligned (expected in column 13, but found it in column 11)
warning: puppet-lint: dist/msswiki/manifests/init.pp:59:10: indentation of => is not properly aligned (expected in column 13, but found it in column 10)
warning: puppet-lint: dist/msswiki/manifests/init.pp:60:12: indentation of => is not properly aligned (expected in column 13, but found it in column 12)
warning: puppet-lint: dist/msswiki/manifests/rsync.pp:37:140: line has more than 140 characters
warning: puppet-lint: dist/msswiki/manifests/rsync.pp:43:140: line has more than 140 characters
warning: puppet-lint: dist/profile/manifests/access_request.pp:21:3: optional parameter listed before required parameter
warning: puppet-lint: dist/profile/manifests/access_request.pp:22:3: optional parameter listed before required parameter

We can control puppet-lint Rake settings in .sync.yml – but it only works for rake tasks. pdk validate will ignore it because puppet-lint isn’t invoked via rake. The same settings need to be put in .puppet-lint.rc in the proper format. That file is not populated via pdk, so just create it by hand. I don’t care about the arrow alignment or 140 characters checks, so I’ve added the appropriate lines to both files and re-run pdk update. We all have different preferences, just make sure they are reflected in both locations:

$ cat .sync.yml
---
.gitignore:
  required:
    - 'appveyor.yml'
    - '.gitlab-ci.yml'
    - '.travis.yml'
Rakefile:
  default_disabled_lint_checks:
    - '140chars'
    - 'arrow_alignment'
$ cat .puppet-lint.rc
--no-arrow_alignment-check
--no-140chars-check
$ grep disable Rakefile
PuppetLint.configuration.send('disable_relative')
PuppetLint.configuration.send('disable_140chars')
PuppetLint.configuration.send('disable_arrow_alignment')

Now we can use pdk validate and see a lot fewer violations. We can try to automatically correct the remaining violations with pdk validate -a, which will also try to auto-fix other syntax violations, or pdk bundle exec rake lint_fix, which restricts fixes to just puppet-lint. Not all violations can be auto-corrected, so some may still need to be fixed manually. I also found I had a .rubocop.yml in a custom module’s directory causing rubocop failures, because apparently rubocop parses EVERY config file it finds no matter where it’s located, and I had to remove it to prevent errors. It may take you numerous tries to get through this. I recommend fixing a few things and committing before moving on to the next set of violations, so that you can find your way back if you make mistakes. Here’s a command that can help you edit all the files that can’t be autofixed by puppet-lint or rubocop (assuming you’ve already completed an autofix attempt):

vi $(pdk validate | egrep "(puppet-lint|rubocop)" | awk '{print $3}' | awk -F: '{print $1}' | sort | uniq | xargs)

Alternatively, you can disable rubocop entirely if you want by adding the following to your .sync.yml. If you are only writing spec tests, this is probably fine, but if you are writing facts, types, and providers, I do not suggest it.

.rubocop.yml:
  selected_profile: off

We have quite a few methods to fix all the possible errors that come our way. Once we have fixed everything, we can move on to the Unit Tests. We will re-run validation again after the unit tests, to ensure any changes we make for unit tests do not introduce new violations.

Unit Tests

Previously, we used bundle exec rake spec to run unit tests. The PDK way is pdk test unit. It performs pretty much the same, but it does collect all the output before displaying it, so if you have lots of fixtures and tests, you won’t see any output for a good long while and then bam, you get it all at once. The results will probably be just a tad overwhelming at first:

$ pdk test unit
pdk (INFO): Using Ruby 2.5.1
pdk (INFO): Using Puppet 6.0.2
[✔] Preparing to run the unit tests.
[✖] Running unit tests.
  Evaluated 782 tests in 110.76110479 seconds: 700 failures, 0 pending.
failed: rspec: ./spec/classes/profile/base__junos_spec.rb:11: Evaluation Error: Error while evaluating a Resource Statement, Unknown resource type: 'cron' (file: /home/rnelson0/puppet/controlrepo/spec/fixtures/modules/profile/manifests/base/junos.pp, line: 15, column: 3) on node build
  profile::base::junos with defaults for all parameters should contain Cron[puppetrun]
  Failure/Error:
    context 'with defaults for all parameters' do
      it { is_expected.to create_class('profile::base::junos') }
      it { is_expected.to create_cron('puppetrun') }
    end
  end

failed: rspec: ./spec/classes/profile/base__linux_spec.rb:12: Evaluation Error: Error while evaluating a Resource Statement, Unknown resource type: 'sshkey' (file: /home/rnelson0/puppet/controlrepo/spec/fixtures/modules/ssh/manifests/hostkeys.pp, line: 13, column: 5) on node build
  profile::base::linux on redhat-6-x86_64 disable openshift selinux policy should contain Selmodule[openshift-origin] with ensure => "absent"
  Failure/Error:
        if (facts[:os]['family'] == 'RedHat') && (facts[:os]['release']['major'] == '6')
          context 'disable openshift selinux policy' do
            it { is_expected.to contain_selmodule('openshift-origin').with_ensure('absent') }
            it { is_expected.to contain_selmodule('openshift').with_ensure('absent') }
          end
failed: rspec: ./spec/classes/profile/base__linux_spec.rb:162: Evaluation Error: Error while evaluating a Resource Statement, Unknown resource type: 'cron' (file: /home/rnelson0/puppet/controlrepo/spec/fixtures/modules/os_patching/manifests/init.pp, line: 113, column: 3) on node build
  profile::base::linux on redhat-6-x86_64 when managing OS patching should contain Class[os_patching]
  Failure/Error:
          end

          it { is_expected.to contain_class('os_patching') }
          if (facts[:os]['family'] == 'RedHat') && (facts[:os]['release']['major'] == '7')
            it { is_expected.to contain_package('yum-utils') }

failed: rspec: ./spec/classes/profile/base__linux_spec.rb:18: error during compilation: Evaluation Error: Unknown variable: '::sshdsakey'. (file: /home/rnelson0/puppet/controlrepo/spec/fixtures/modules/ssh/manifests/hostkeys.pp, line: 12, column: 6) on node build
  profile::base::linux on redhat-7-x86_64 with defaults for all parameters should compile into a catalogue without dependency cycles
  Failure/Error:

        context 'with defaults for all parameters' do
          it { is_expected.to compile.with_all_deps }

          it { is_expected.to create_class('profile::base::linux') }

Whoa. Not cool. From 782 working tests to 700 failures is quite the explosion! And they blew up on missing resource types that are built into Puppet. What happened? Puppet 6, that’s what! However…

Fix Puppet 5 Tests First

When I ran pdk convert, it updated my metadata.json to specify it supported Puppet versions 4.7.0 through 6.x because I was missing any existing requirements section. The PDK defaults to using the latest Puppet version your metadata supports. Whoops! It’s okay, we can test against Puppet 5, too. I recommend that we get our existing tests working with the version of Puppet we wrote them for, just to get back to a known good state. We don’t want to be troubleshooting too many changes at once.

There are two ways to specify the version to use. The envvar PDK_PUPPET_VERSION accepts a simple number like 5 or 6 and is intended for automated systems like CI/CD rather than for humans. You can also use --puppet-version or --pe-version to set an exact version. I’m an old curmudgeon, so I’m using the non-preferred envvar setting today, but Puppet recommends using the actual program arguments! Regardless of how you specify the version, the PDK changes not just the Puppet version, but also which version of Ruby it uses:

$ PDK_PUPPET_VERSION='5' pdk test unit
pdk (INFO): Using Ruby 2.4.4
pdk (INFO): Using Puppet 5.5.6
[✔] Preparing to run the unit tests.
[✖] Running unit tests.
  Evaluated 782 tests in 171.447295254 seconds: 519 failures, 0 pending.
failed: rspec: ./spec/classes/msswiki/init_spec.rb:24: error during compilation: Evaluation Error: Error while evaluating a Function Call, You must provide a hash of packages for wiki implementation. (file: /home/rnelson0/puppet/controlrepo/spec/fixtures/modules/msswiki/manifests/init.pp, line: 109, column: 5) on node build
  msswiki on redhat-6-x86_64  when using default params should compile into a catalogue without dependency cycles
  Failure/Error:
          end

          it { is_expected.to compile.with_all_deps }

          it { is_expected.to create_class('msswiki') }

failed: rspec: ./spec/classes/profile/apache_spec.rb:31: Evaluation Error: Error while evaluating a Resource Statement, Evaluation Error: Empty string title at 0. Title strings must have a length greater than zero. (file: /home/rnelson0/puppet/controlrepo/spec/fixtures/modules/concat/manifests/setup.pp, line: 59, column: 10) (file: /home/rnelson0/puppet/controlrepo/spec/fixtures/modules/apache/manifests/init.pp, line: 244) on node build
  profile::apache on redhat-7-x86_64 with additional listening ports should contain Firewall[100 Inbound apache listening ports] with dport => [80, 443, 8088]
  Failure/Error:
          end

          it {
            is_expected.to contain_firewall('100 Inbound apache listening ports').with(dport: [80, 443, 8088])
          }

failed: rspec: ./spec/classes/profile/base__linux_spec.rb:12: Evaluation Error: Unknown variable: '::sshecdsakey'. (file: /home/rnelson0/puppet/controlrepo/spec/fixtures/modules/ssh/manifests/hostkeys.pp, line: 36, column: 6) on node build
  profile::base::linux on redhat-6-x86_64 disable openshift selinux policy should contain Selmodule[openshift-origin] with ensure => "absent"
  Failure/Error:
        if (facts[:os]['family'] == 'RedHat') && (facts[:os]['release']['major'] == '6')
          context 'disable openshift selinux policy' do
            it { is_expected.to contain_selmodule('openshift-origin').with_ensure('absent') }
            it { is_expected.to contain_selmodule('openshift').with_ensure('absent') }
          end

Some of us may be lucky enough to make it through without errors here, but I assume most of us will encounter at least a few failures, like I did – “only” 519 compared to 700 before. Don’t worry, we can fix this! To help us focus a bit, we can run tests on individual spec files using pdk bundle exec rspec <filename> (remembering to specify PDK_PUPPET_VERSION or to export the variable). Everyone has different problems here, but there are some common failures, such as missing custom facts:

  69) profile::base::linux on redhat-7-x86_64 when managing OS patching should contain Package[yum-utils]
      Failure/Error: include ::ssh::hostkeys

      Puppet::PreformattedError:
        Evaluation Error: Unknown variable: '::sshdsakey'. (file: /home/rnelson0/puppet/controlrepo/spec/fixtures/modules/ssh/manifests/hostkeys.pp, line: 12, column: 6) on node build

Nate McCurdy commented that instead of calling rspec directly, you can pass a comma-separated list of files with --tests, e.g. pdk test unit --tests path/to/spec/file.rb,path/to/spec/file2.rb.

I defined my custom facts in spec/spec_helper.rb. That has definitely changed. Here’s part of the diff from running pdk convert:

$ git diff origin/production spec/spec_helper.rb
diff --git a/spec/spec_helper.rb b/spec/spec_helper.rb
index de3e7e6..5e721b7 100644
--- a/spec/spec_helper.rb
+++ b/spec/spec_helper.rb
@@ -1,45 +1,44 @@
 require 'puppetlabs_spec_helper/module_spec_helper'
 require 'rspec-puppet-facts'

-add_custom_fact :concat_basedir, '/dne'
-add_custom_fact :is_pe, true
-add_custom_fact :root_home, '/root'
-add_custom_fact :pe_server_version, '2016.4.0'
-add_custom_fact :selinux, true
-add_custom_fact :selinux_config_mode, 'enforcing'
-add_custom_fact :sshdsakey, ''
-add_custom_fact :sshecdsakey, ''
-add_custom_fact :sshed25519key, ''
-add_custom_fact :pe_version, ''
-add_custom_fact :sudoversion, '1.8.6p3'
-add_custom_fact :selinux_agent_vardir, '/var/lib/puppet'
-
+include RspecPuppetFacts
+
+default_facts = {
+  puppetversion: Puppet.version,
+  facterversion: Facter.version,
+}

+default_facts_path = File.expand_path(File.join(File.dirname(__FILE__), 'default_facts.yml'))
+default_module_facts_path = File.expand_path(File.join(File.dirname(__FILE__), 'default_module_facts.yml'))

Instead of modifying spec/spec_helper.rb, facts should now go in spec/default_facts.yml and spec/default_module_facts.yml. As the former is modified by pdk update, it is easier to maintain the latter. Review the diff of spec/spec_helper.rb and spec/default_facts.yml (if we have the latter) for our previous custom facts and their values. When a test fails for a missing fact, we can add it to spec/default_module_facts.yml in the format factname: "factvalue".
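For example, porting the facts from the old spec_helper.rb diff above into spec/default_module_facts.yml would look something like this (these are the values I had been using; yours will differ):

# spec/default_module_facts.yml
---
concat_basedir: '/dne'
is_pe: true
root_home: '/root'
pe_server_version: '2016.4.0'
selinux: true
selinux_config_mode: 'enforcing'
sshdsakey: ''
sshecdsakey: ''
sshed25519key: ''
pe_version: ''
sudoversion: '1.8.6p3'
selinux_agent_vardir: '/var/lib/puppet'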

  1) profile::base::linux on redhat-6-x86_64 disable openshift selinux policy should contain Selmodule[openshift-origin] with ensure => "absent"
     Failure/Error: include concat::setup

     Puppet::PreformattedError:
       Evaluation Error: Error while evaluating a Resource Statement, Evaluation Error: Empty string title at 0. Title strings must have a length greater than zero. (file: /home/rnelson0/puppet/controlrepo/spec/fixtures/modules/concat/manifests/setup.pp, line: 59, column: 10) (file: /home/rnelson0/puppet/controlrepo/spec/fixtures/modules/ssh/manifests/server/config.pp, line: 12) on node  build

This is related to an older version of puppetlabs/concat (v1.2.5). The latest is v5.1.0. After updating my Puppetfile and .fixtures.yml with the new version, I ran pdk bundle exec rake spec_prep to update the test fixtures, which resolved this failure.
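For reference, the updated entries looked roughly like this (a sketch; substitute whichever modules and versions apply to you):

# Puppetfile
mod 'puppetlabs/concat', '5.1.0'

# .fixtures.yml
fixtures:
  forge_modules:
    concat:
      repo: "puppetlabs/concat"
      ref: "5.1.0"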

  2) profile::base::linux on redhat-6-x86_64 domain_join is true should contain Class[profile::domain_join]
     Failure/Error: include domain_join

     Puppet::PreformattedError:
       Evaluation Error: Error while evaluating a Function Call, Class[Domain_join]:
         expects a value for parameter 'domain_fqdn'
         expects a value for parameter 'domain_shortname'
         expects a value for parameter 'ad_dns'
         expects a value for parameter 'register_account'
         expects a value for parameter 'register_password' (file: /home/rnelson0/puppet/controlrepo/spec/fixtures/modules/profile/manifests/domain_join.pp, line: 8, column: 3) on node build

In this case, the mandatory parameters for an included class were not provided. The let(:params) block of our rspec contexts only allows us to set parameters for the current class; we have been pulling these parameters from hiera instead. This data should be coming from spec/fixtures/hieradata/default.yml; however, the hiera lookup settings were also removed from spec/spec_helper.rb, breaking the existing hiera configuration:

 RSpec.configure do |c|
-  c.hiera_config = File.expand_path(File.join(__FILE__, '../fixtures/hiera.yaml'))
-  default_facts = {
-    puppetversion: Puppet.version,
-    facterversion: Facter.version
-  }

There is no replacement setting provided. The PDK was designed with hiera use in mind, so add this line (replacing the filename if your hiera.yaml is stored elsewhere) to .sync.yml and run pdk update. Your hiera config should start working again:

spec/spec_helper.rb:
  hiera_config: 'spec/fixtures/hiera.yaml'
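With the hiera config wired back up, the mandatory parameters for included classes come out of the fixture hieradata again. As a sketch, spec/fixtures/hieradata/default.yml might carry the domain_join values from the failure above (the values here are placeholders, not real settings):

# spec/fixtures/hieradata/default.yml
---
domain_join::domain_fqdn: 'example.com'
domain_join::domain_shortname: 'EXAMPLE'
domain_join::ad_dns: '192.0.2.10'
domain_join::register_account: 'svc-puppet-join'
domain_join::register_password: 'not-a-real-password'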

These are just some common issues with a PDK conversion, but we may have others to resolve. We just need to keep iterating until we get through everything.

Things are looking good! But we are only done with the Puppet 5 rodeo. Before we move on to Puppet 6, now is a good time to make sure syntax validation works; if you need to make syntax changes, re-run the Puppet 5 tests that you know should pass. Get that all settled before moving on.

Puppet 6 Unit Tests

There’s one big change to be aware of before we even run unit tests against Puppet 6. To make updating core types easier without requiring a brand new release of Puppet, a number of types were moved out of Puppet and into dedicated modules. This means that for Puppet 6 testing, we need to update our Puppetfile and .fixtures.yml (though the puppet-agent all-in-one package bundles these modules, we do not want our tests relying on an installed puppet agent). When we update these files, we need to make sure we ONLY deploy these core modules on Puppet 6, not Puppet 5, both for the master and for testing, or we will encounter issues with Puppet 5. The Puppetfile is actually Ruby code, so we can check the version before loading the modules (see the note below), and .fixtures.yml accepts a puppet_version parameter on module entries. Puppet’s release notes and documentation list each moved type with a link to its replacement module. We do not have to add all of the modules, just the ones we use, but including the ones we are likely to use or that other modules depend on can reduce friction. The changes will look like this:

# Puppetfile
require 'puppet'
# as of Puppet 6, several core types have been moved out of Puppet and into separate modules
if Puppet.version =~ /^6\.\d+\.\d+/
  mod 'puppetlabs-augeas_core', '1.0.3'
  mod 'puppetlabs-cron_core', '1.0.0'
  mod 'puppetlabs-host_core', '1.0.1'
  mod 'puppetlabs-mount_core', '1.0.2'
  mod 'puppetlabs-sshkeys_core', '1.0.1'
  mod 'puppetlabs-yumrepo_core', '1.0.1'
end

# .fixtures.yml
fixtures:
  forge_modules:
    augeas_core:
      repo: "puppetlabs/augeas_core"
      ref: "1.0.3"
      puppet_version: ">= 6.0.0"
    cron_core:
      repo: "puppetlabs/cron_core"
      ref: "1.0.0"
      puppet_version: ">= 6.0.0"
    host_core:
      repo: "puppetlabs/host_core"
      ref: "1.0.1"
      puppet_version: ">= 6.0.0"
    mount_core:
      repo: "puppetlabs/mount_core"
      ref: "1.0.2"
      puppet_version: ">= 6.0.0"
    scheduled_task:
      repo: "puppetlabs/scheduled_task"
      ref: "1.0.0"
      puppet_version: ">= 6.0.0"
    selinux_core:
      repo: "puppetlabs/selinux_core"
      ref: "1.0.1"
      puppet_version: ">= 6.0.0"
    sshkeys_core:
      repo: "puppetlabs/sshkeys_core"
      ref: "1.0.1"
      puppet_version: ">= 6.0.0"
    yumrepo_core:
      repo: "puppetlabs/yumrepo_core"
      ref: "1.0.1"
      puppet_version: ">= 6.0.0"

Note: Having now performed an actual upgrade to Puppet 6, I do NOT recommend adding the modules to the Puppetfile after all, unless you need a newer version of a module than the one shipped with your version of puppet-agent, or you are not using the AIO packages, and ONLY if you have no Puppet 5 agents. Puppet 5 agents connecting to a Puppet 6 master will pluginsync these modules and throw errors instead of applying a catalog. If you do have multiple compile masters, you could conceivably keep a few running Puppet 5 and only have Puppet 5 agents connect to those, but that seems like a really specific and potentially problematic scenario, so in general, I repeat, I do NOT recommend adding the core modules to the Puppetfile. They must be placed in .fixtures.yml for testing, though.

Now give pdk test unit a try and see how it behaves. All the missing types will be back, so any errors we see now should be related to actual failures, or some other edge case I did not experience.

Note: I experienced the error Error: Evaluation Error: Error while evaluating a Function Call, undefined local variable or method `created' for Puppet::Pops::Loader::RubyLegacyFunctionInstantiator:Class when running my Puppet 6 tests immediately after working on the Puppet 5 tests. Nothing I found online could resolve this and when I returned to it later, it worked fine. I could not replicate the error, so I am unsure of what caused it. If you run into that error, I suggest starting a new session and running git clean -ffdx to remove unmanaged files, so that you start with a clean environment.

Updating CI/CD Integrations

Once both pdk validate and pdk test unit complete without error, we need to update the automated checks our CI/CD system uses. We all use different systems, but thankfully the PDK has many of us covered. For those who use Travis CI, AppVeyor, or GitLab, there are pre-populated .travis.yml, .appveyor.yml, and .gitlab-ci.yml files, respectively. For those of us who use Jenkins, we have two options: 1) copy one of these CI configs and integrate it into our build process (traditional or pipeline), or 2) apply profile::pdk to the Jenkins node and use the PDK for tests. Let’s look at the first option, basing it off the Travis CI config:

before_install:
  - bundle -v
  - rm -f Gemfile.lock
  - gem update --system
  - gem --version
  - bundle -v
script:
  - 'bundle exec rake $CHECK'
bundler_args: --without system_tests
rvm:
  - 2.5.0
env:
  global:
    - BEAKER_PUPPET_COLLECTION=puppet6 PUPPET_GEM_VERSION="~> 6.0"
matrix:
  fast_finish: true
  include:
    -
      env: CHECK="syntax lint metadata_lint check:symlinks check:git_ignore check:dot_underscore check:test_file rubocop"
    -
      env: CHECK=parallel_spec
    -
      env: PUPPET_GEM_VERSION="~> 5.0" CHECK=parallel_spec
      rvm: 2.4.4
    -
      env: PUPPET_GEM_VERSION="~> 4.0" CHECK=parallel_spec
      rvm: 2.1.9

We have to merge this into something useful for Jenkins. I am unfamiliar with Pipelines myself (I know, it’s the future, but I have $reasons!), but I have built a Jenkins server with RVM installed and configured it with a freestyle job. Here’s the current job:

#!/bin/bash
[[ -s /usr/local/rvm/scripts/rvm ]] && source /usr/local/rvm/scripts/rvm
# Use the correct ruby
rvm use 2.1.9

git clean -ffdx
bundle install --path vendor --without system_tests
bundle exec rake test

The new before_install section cleans up the local directory, equivalent to git clean -ffdx, but it also spits out some version information and runs gem update. These are optional, and the latter is only helpful if you have your gems cached elsewhere (otherwise the git clean will wipe the updated gems, wasting time). The bundler_args are already part of the bundle install command. The rvm version varies by Puppet version, so that will need to be tweaked. The test command is now bundle exec rake $CHECK, with a variety of checks added in the matrix section. The old test target used to do everything in the first two matrix sections; parallel_spec just runs multiple tests at once instead of in serial, which can be faster. The 3rd and 4th matrix sections are for older Puppet versions. We can put this together into multiple jobs, into a single job that tests multiple versions of Puppet, or into a single job testing just one Puppet version. Here’s what a Jenkins job that tests Puppet 6 and 5 would look like:

#!/bin/bash
[[ -s /usr/local/rvm/scripts/rvm ]] && source /usr/local/rvm/scripts/rvm

# Puppet 6
export BEAKER_PUPPET_COLLECTION=puppet6
export PUPPET_GEM_VERSION="~> 6.0"
rvm use 2.5.0
bundle -v
git clean -ffdx
# Comment the next line if you do not have gems cached outside the job workspace
gem update --system
gem --version
bundle -v

bundle install --path vendor --without system_tests
bundle exec rake syntax lint metadata_lint check:symlinks check:git_ignore check:dot_underscore check:test_file rubocop
bundle exec rake parallel_spec

# Puppet 5
rvm use 2.4.4
export PUPPET_GEM_VERSION="~> 5.0"
bundle exec rake parallel_spec

This creates parity with the pre-defined CI tests for other services.

The other option is adding profile::pdk to the Jenkins server, probably through the role, and using the PDK to run tests. That Jenkins freestyle job looks a lot simpler:

#!/bin/bash
PDK=/usr/local/bin/pdk
echo -n "PDK Version: "
$PDK --version

# Puppet 6
git clean -ffdx
$PDK validate
$PDK test unit

# Puppet 5
git clean -ffdx
PDK_PUPPET_VERSION=5 $PDK test unit

This is much simpler, and it should not need to be updated until we remove Puppet 5 or add Puppet 7 when it is released, whereas the RVM versions in the bundler-based job may need tweaking throughout the Puppet 6 lifecycle as Ruby versions change. However, the tests aren’t exactly the same. Currently, pdk validate does not run the rake target check:git_ignore, and possibly other check: tasks. In my opinion, as the PDK improves, the benefit of only having to update the PDK package version, and not the git-based automation, outweighs the single missing check and the maintenance of RVM on a Jenkins server. And for those of us using Travis CI/AppVeyor/GitLab CI, it definitely makes sense to stick with the provided test setup as it requires almost no maintenance.

I used this earlier without explaining it, but the PDK also provides the ability to run pdk bundle, similar to the native bundle command but using the vendored ruby and gems provided by the PDK. We can run individual tests like pdk bundle exec rake check:git_ignore, or install the PDK and modify the “bundle” Jenkins job to recreate the bundler setup using the PDK and not have to worry about RVM at all. I’ll sketch that last idea below rather than leave it entirely as an exercise for the reader.
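A rough sketch of that hybrid job, assuming the PDK is installed at /usr/local/bin/pdk as in the job above, and reusing the check: targets from the Travis CI config since pdk validate does not run them:

#!/bin/bash
PDK=/usr/local/bin/pdk
echo -n "PDK Version: "
$PDK --version

git clean -ffdx

# Validation plus the extra check: targets, run through the PDK's vendored ruby and gems
$PDK validate
$PDK bundle exec rake check:symlinks check:git_ignore check:dot_underscore check:test_file

# Unit tests against Puppet 6, then Puppet 5
$PDK test unit
PDK_PUPPET_VERSION=5 $PDK test unit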

We must be sure to review our entire Puppet pull request process and see what other integrations need to be updated, and of course we must update the documentation for our colleagues. Remember, documentation is part of “done”, so we cannot say we are done until we update it.

Finally, with all updates in place, submit a Pull Request for your controlrepo changes. This Pull Request must go through the new process, not just to verify that it passes, but to identify any configuration steps you missed or did not document properly.

Summary

Today, we looked at converting our controlrepo to use the Puppet Development Kit for testing instead of bundler-based testing. It required lots of changes to our controlrepos, many of which the PDK handled for us via autocorrect; others involved manual updates. We reviewed a variety of changes required for CI/CD integrations such as Travis CI or Jenkins. We reviewed the specifics of our setup that others don’t share so we had a working setup top to bottom, and we updated our documentation so all of our colleagues can make use of the new setup. Finally, we opened a PR with our changes to validate the new configuration.

By using the PDK, we have leveraged the hard work of many Puppet employees and Puppet users alike who provide a combination of rigorously vetted sets of working dependencies and beneficial practices, and we can continue to improve those benefits by simply updating our version of the PDK and templates in the future. This is a drastic reduction in the mental load we all have to carry to keep up with Puppet best practices, an especially significant burden on those responsible for Puppet in their CI/CD systems. I would like to thank everyone involved with the PDK, including David Schmitt, Lindsey Smith, Tim Sharpe, Bryan Jen, Jean Bond, and the many other contributors inside and outside Puppet who made the PDK possible.

Linux OS Patching with Puppet Tasks

One of the biggest gaps in most IT security policies is a very basic feature, patching. Specific numbers vary, but most surveys show a majority of hacks are due to unpatched vulnerabilities. Sadly, in 2018, automatic patching on servers is still out of the grasp of many, especially those running older OSes.

While there are a number of solutions out there from OS vendors (WSUS for Microsoft, Satellite for RHEL, etc.), I manage a number of OSes and the one commonality is that they are all managed by Puppet. A single solution with central reporting of success and failure sounds like a plan. I took a look at Puppet solutions and found a module called os_patching by Tony Green. I really like this module and what it has to offer, even though it doesn’t address all my concerns at this time. It shows a lot of promise and I suspect I will be working with Tony on some features I’d like to see in the future.

Currently, os_patching only supports Red Hat/Debian-based Linux distributions. Support is planned for Windows, and I know someone is looking at contributing to provide SuSE support. The module will collect information on patching that can be used for reporting, and patching is performed through a Task, either at the CLI or using the PE console’s Task pane.

Setup

Configuring your system to use the module is pretty easy. Add the module to your Puppetfile / .fixtures.yml, add a feature flag to your profile, and include os_patching behind the feature flag. Implement your tests and you’re good to go. Your only real decision is whether you default the feature flag to enabled or disabled. In my home network, I will enable it, but a production environment may want to disable it by default and enable it as an override through hiera. Because the fact collects data from the node, it will add a few seconds to each agent’s runtime, so be sure to include that in your calculation.

Adding the module is pretty simple. Here are the Puppetfile / .fixtures.yml additions:

# Puppetfile
mod 'albatrossflavour/os_patching', '0.3.5'

# .fixtures.yml
fixtures:
  forge_modules:
    os_patching:
      repo: "albatrossflavour/os_patching"
      ref: "0.3.5"

Next, we need an update to our tests. I will be adding this to my profile::base, so I modify that spec file. Add a test for the default feature flag setting and one for the non-default setting. Flip the to and not_to if you default the feature flag to disabled. If you run the tests now, you’ll get a failure, which is expected since there is no supporting code in the class yet (there is more to this spec file; I have only included the framework plus the new tests):

require 'spec_helper'
describe 'profile::base', :type => :class do
  on_supported_os.each do |os, facts|
    let (:facts) {
      facts
    }

    context 'with defaults for all parameters' do
      it { is_expected.to contain_class('os_patching') }
    end

    context 'with manage_os_patching disabled' do
      let (:params) do {
        manage_os_patching: false,
      }
      end

      # Disabled feature flags
      it { is_expected.not_to contain_class('os_patching') }
    end
  end
end

Finally, add the feature flag and feature to profile::base (the additions are the manage_os_patching parameter and its if block):

class profile::base (
  Hash    $sudo_confs = {},
  Boolean $manage_puppet_agent = true,
  Boolean $manage_firewall = true,
  Boolean $manage_syslog = true,
  Boolean $manage_os_patching = true,
) {
  if $manage_firewall {
    include profile::linuxfw
  }

  if $manage_puppet_agent {
    include puppet_agent
  }
  if $manage_syslog {
    include rsyslog::client
  }
  if $manage_os_patching {
    include os_patching
  }
  ...
}

Your tests will pass now. That’s all it takes! For any nodes where it is enabled, you will see a new fact and some scripts pushed down on the next run:

[rnelson0@build03 controlrepo:production]$ sudo puppet agent -t
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Notice: /File[/opt/puppetlabs/puppet/cache/lib/facter/os_patching.rb]/ensure: defined content as '{md5}af52580c4d1fb188061e0c51593cf80f'
Info: Retrieving locales
Info: Loading facts
Info: Caching catalog for build03.nelson.va
Info: Applying configuration version '1535052836'
Notice: /Stage[main]/Os_patching/File[/etc/os_patching]/ensure: created
Info: /Stage[main]/Os_patching/File[/etc/os_patching]: Scheduling refresh of Exec[/usr/local/bin/os_patching_fact_generation.sh]
Notice: /Stage[main]/Os_patching/File[/usr/local/bin/os_patching_fact_generation.sh]/ensure: defined content as '{md5}af4ff2dd24111a4ff532504c806c0dde'
Info: /Stage[main]/Os_patching/File[/usr/local/bin/os_patching_fact_generation.sh]: Scheduling refresh of Exec[/usr/local/bin/os_patching_fact_generation.sh]
Notice: /Stage[main]/Os_patching/Exec[/usr/local/bin/os_patching_fact_generation.sh]: Triggered 'refresh' from 2 events
Notice: /Stage[main]/Os_patching/Cron[Cache patching data]/ensure: created
Notice: /Stage[main]/Os_patching/Cron[Cache patching data at reboot]/ensure: created
Notice: Applied catalog in 54.18 seconds

You can now examine a new fact, os_patching, which shows tons of information including the pending package updates, the number of packages, which ones are security patches, whether the node is blocked (explained in a bit), and whether a reboot is required:

[rnelson0@build03 controlrepo:production]$ sudo facter -p os_patching
{
  package_updates => [
    "acl.x86_64",
    "audit.x86_64",
    "audit-libs.x86_64",
    "audit-libs-python.x86_64",
    "augeas-devel.x86_64",
    "augeas-libs.x86_64",
    ...
  ],
  package_update_count => 300,
  security_package_updates => [
    "epel-release.noarch",
    "kexec-tools.x86_64",
    "libmspack.x86_64"
  ],
  security_package_update_count => 3,
  blocked => false,
  blocked_reasons => [],
  blackouts => {},
  pinned_packages => [],
  last_run => {},
  patch_window => "",
  reboots => {
    reboot_required => "unknown"
  }
}

Additional Configuration

There are a number of other settings you can configure if you’d like.

  • patch_window: a string descriptor used to “tag” a group of machines, e.g. Week3 or Group2
  • blackout_windows: a hash of datetime start/end dates during which updates are blocked
  • security_only: boolean, when enabled only the security_package_updates packages and dependencies are updated
  • reboot_override: boolean, overrides the task’s reboot flag (default: false)
  • dpkg_options/yum_options: a string of additional flags/options to dpkg or yum, respectively

You can set these in hiera. For instance, my global config has some blackout windows for the next few years:

os_patching::blackout_windows:
  'End of year 2018 change freeze':
    'start': '2018-12-15T00:00:00+1000'
    'end':   '2019-01-05T23:59:59+1000'
  'End of year 2019 change freeze':
    'start': '2019-12-15T00:00:00+1000'
    'end':   '2020-01-05T23:59:59+1000'
  'End of year 2020 change freeze':
    'start': '2020-12-15T00:00:00+1000'
    'end':   '2021-01-05T23:59:59+1000'
  'End of year 2021 change freeze':
    'start': '2021-12-15T00:00:00+1000'
    'end':   '2022-01-05T23:59:59+1000'
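The other parameters follow the same hiera pattern. For example, tagging a group of nodes and limiting them to security-only updates might look like this (the values are illustrative, not recommendations):

os_patching::patch_window: 'Week3'
os_patching::security_only: true
os_patching::reboot_override: false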

Patching Tasks

Once the module is installed and all of your agents have picked up the new config, they will start reporting their patch status. You can query nodes with outstanding patches using PQL. A search like inventory[certname] {facts.os_patching.package_update_count > 0 and facts.clientcert !~ 'puppet'} can find all your agents that have outstanding patches (except the puppet masters – kernel patches require reboots and puppet will have a hard time talking to itself across a reboot). You can also narrow the selection to a patch_window with and facts.os_patching.patch_window = "Week3" or similar. You can then provide that query to the command line task:

puppet task run os_patching::patch_server --query="inventory[certname] {facts.os_patching.package_update_count > 0 and facts.clientcert !~ 'puppet'}"

Or use the Console’s Task view to run the task against the PQL selection.

Add any other parameters you want in the dialog/CLI args, like setting reboot to true, then run the task. An individual job will be created for each node, all run in parallel. If you are selecting too many nodes for simultaneous runs, use additional filters, like the aforementioned patch_window or other facts (EL6 vs EL7, Debian vs Red Hat), to narrow the node selection [I blew up my home lab, which couldn’t handle the CPU/IO load, when I ran it against all systems the first time, whoops!]. When the job is complete, you will get your status back for each node as a hash of status elements and the corresponding values, including return (success or failure), reboot, packages_updated, etc. You can extract the logs from the Console or pipe CLI logs directly to jq (json query) to analyze as necessary.

Summary

Patching for many of us requires additional automation and reporting. The relatively new puppet module os_patching provides helpful auditing and compliance information alongside orchestration tasks for patching. Applying a little Puppet Query Language allows you to update the appropriate agents on your schedule, or to pull the compliance information for any reporting needs, always in the same format regardless of the (supported) OS. Currently, this is restricted to Red Hat/Debian-based Linux distributions, but there are plans to expand support to other OSes soon. Many thanks to Tony Green for his efforts in creating this module!

Using Puppet Enterprise 2018’s new backup/restore features

I was pretty excited when I read the new features in Puppet Enterprise 2018.1. There are a lot of cool new features and fixes, but the backup/restore feature stood out for me. Even with just 5 VMs at home, I don’t want to rock the boat when rebuilding my master by losing my CA or agent certs, much less with a lot more managed nodes at work, and all the little bootstrap requirements have changed since I started using PE in 2014. Figuring out how to get everything running myself would be possible, but it would take a while and be out of date in a few months anyway. Then there is everything in PuppetDB that I do not want to lose, like collected facts/resources and run reports.

Not coincidentally, I still had a single CentOS 6 VM around because it was my all-in-one puppet master, and migrating to CentOS 7 was not something I looked forward to due to the anticipated work it would require. With the release of this feature, I decided to get off my butt and do the migration. It still took over a month to make it happen, between other work, and I want to share my experience in the hope it saves someone else a bit of pain.

Create your upgrade outline

I want to summarize the plan at a really high level, then dive in a bit deeper. Keep in mind that I have a single all-in-one master using r10k, and my plan does not address multi-master or split deployments. Both of those deployment models have significantly different upgrade paths, so please be careful if you try to map this outline onto those models without adjusting. For the all-in-one master, it’s pretty simple:

  • Backup old master
  • Deploy a new master VM running EL7
  • Complete any bootstrapping that isn’t part of the backup
  • Install the same version of PE
  • Restore the old master’s backup onto the new master
  • Run puppet
  • Point agents at the new master

I will cover the backup/restore steps at the end, so the first step to cover is deploying a new master. This part sounds simple, but if Puppet is currently part of your provisioning process and you only have one master, you’ve got a catch-22 situation: new deployments must talk to puppet to complete without errors, and if you deploy a new puppet master using the same process, it will either fail to communicate with itself since PE is not installed, or it will talk to a PE installation that does not reflect your production environment. We need to make sure that we have the ability to provision without puppet, or be prepared for some manual effort in the deploy. With a single master, manual effort isn’t that burdensome, but it can still reduce accuracy, which is why I prefer a modified automated provisioning workflow.

A lot of bootstrapping – specifically hiera and r10k/code manager – should be handled by the restore. There were just a few things I needed to do:

  • Run ssh-keygen/install an existing key and attach that key to the git system. You can avoid this by managing the ssh private/public keys via file resources, but you will not be able to pull new code until puppet processes that resource.
  • SSH to your git server and accept the key. You can avoid this with the sshkey resource (see the sketch after this list), with the same restriction.
  • Check your VMs default iptables/selinux posture. I suggest managing security policy via puppet, which should prevent remote agents from connecting before the first puppet run, but it’s also possible to prevent the master from communicating with itself with the wrong default policy.
  • Check that the hostname matches your expectations. All of /etc/hosts, /etc/hostname, and /etc/sysconfig/network should list the short and fully qualified names properly, and hostname and hostname -f should return the values you expect. /etc/resolv.conf may also need the search domain. Fix any issues before installing PE, as certs are generated during install, and having the wrong hostname can cause cascading faults best addressed by starting over.
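As mentioned in the second bullet, the interactive key acceptance can be avoided with the sshkey resource, which manages entries in /etc/ssh/ssh_known_hosts. A minimal sketch, with a placeholder hostname and key value:

sshkey { 'git.example.com':
  ensure => present,
  type   => 'ssh-rsa',
  key    => 'AAAAB3Nza...your-git-servers-actual-public-host-key...',
}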

The restore should get the rest from the PE side of things. If your provisioning automation performs other work that you had to skip, make sure you address it now, too.

Installing PE is probably the one manual step you cannot avoid. You can go to https://support.puppet.com and find links to current and past PE versions. Make sure you get the EL7 edition and not the EL6 edition. I did not check with Support, but I assume that you must restore onto the same version you backed up; I would not risk even a patch release difference.

Skipping past the restore (covered below) brings us to running the agent: a simple puppet agent -t on the master, or waiting up to 30 minutes for the run to complete on its own.

The final step may not apply to your situation. In addition to refreshing the OS of the master, I switched to a new hostname. If you’re dropping your new master on top of the existing one’s hostname/IP, you can skip this step. I forked a new branch from production called mastermigration. The only change in this branch is to set the server value in /etc/puppetlabs/puppet/puppet.conf. There are a number of ways to do this; I went with a few ini_setting resources and a manage_puppet_conf flag in my profile::base::linux. The value should only be in one of the main or agent sections, so I ensured it is in main and absent elsewhere:

  if $manage_puppet_conf {
    # These settings are very useful during migration but are not needed most of the time
    ini_setting { 'puppet.conf main server':
      ensure  => present,
      path    => '/etc/puppetlabs/puppet/puppet.conf',
      section => 'main',
      setting => 'server',
      value   => 'puppet.example.com',
    }
    ini_setting { 'puppet.conf agent server':
      ensure  => absent,
      path    => '/etc/puppetlabs/puppet/puppet.conf',
      section => 'agent',
      setting => 'server',
    }
  }

During the migration, I can just set profile::base::linux::manage_puppet_conf: true in hiera for the appropriate hosts, or globally, and they’ll point themselves at the new master. Later, I can set it to false if I don’t want to continue managing it (while there is no reason you cannot leave the flag enabled, by leaving it false normally you can ensure that changing the server name here does not take effect unless you purposefully flip the flag; you could also parameterize the server name).

Now let’s examine the new feature that makes it go.

Backups and Restores

Puppet’s documentation on the backup/restore feature provides lots of detail. It will capture the CA and certs, all your currently deployed code, your PuppetDB contents including facts, and almost all of your PE config. About the only thing missing are some gems, which you should hopefully be managing and installing with puppet anyway.

Using the new feature is pretty simple: puppet-backup create or puppet-backup restore <filename> will suffice for this effort. There are a few options for more fine-grained control, such as backing up or restoring individual scopes with --scope=<scopes>[,<additionalscopes>...], e.g. --scope=certs.
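Putting that together, the whole thing really is just a couple of commands (a sketch; run them as root or via sudo, and substitute your own backup filename):

# On the old master
puppet-backup create
puppet-backup create --scope=certs    # example: limit the backup to the certs scope

# On the new master, after installing the same PE version
puppet-backup restore <backup filename>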

 

The backup will only back up the current PE edition’s files, so if you still have /etc/puppet on your old master from the PE 3 days, that will not be part of the backup. However, files in directories it does back up, like /etc/puppetlabs/puppet/puppet.conf.rpmsave, will persist. This will help reduce cruft, but not eliminate it. You will still need to police on-disk content. In particular, if you accidentally placed a large file in /etc/puppetlabs, say the PE install tarball, that will end up in your backup and can inflate the size a bit. If you feel the backup is exceptionally large, you may want to search for large files in that path.

The restore docs also specify two commands to run after a restore when Code Manager is used. If you use CM, make sure not to forget this step:

puppet access login
puppet code deploy --all --wait 

The backup and restore times depend mostly on the size of your PuppetDB. With ~120 agents and 14 days of reports, it took less than 10 minutes for either process and generated a ~1G tarball. Larger environments may expect the master to be offline for a bit longer if they want to retain their full history.

Lab it up

The backup/restore process is great, but it’s new, and some of us have very ancient systems laying around. I highly recommend testing this in the lab. My test looked like this:

  • Clone the production master to a VM on another hostname/IP
  • Run puppet-backup create
  • Fully uninstall PE (sudo /opt/puppetlabs/bin/puppet-enterprise-uninstaller -p -d -y)
  • Remove any remaining directories with puppet in them, excepting the PE 2018 install files, to ensure all cruft is gone
  • Disable and uninstall any r10k webhook or puppet-related services that aren’t provided by PE itself.
  • Reboot
  • Bootstrap (from above)
  • Install PE (sudo /opt/puppetlabs/bin/puppet-enterprise-installer) only providing an admin password for the console
  • Run puppet-backup restore <backup file>
  • Run puppet agent -t
  • Make sure at least one agent can check in with puppet agent -t --server=<lab hostname> (clone an agent too if need be)
  • Reboot
  • Make sure the master and agent can still check in, Console works, etc.
  • If possible, test any systems that use puppet to make sure they work with the new master
  • Identify any missing components/errors and repeat the process until none are observed

I mentioned that I used PE3. My master had been upgraded all the way from version 3.7 to 2018.1.2. I’m glad I tested this, because there were some unexpected database settings that the restore choked on. I had to engage Puppet Support who provided the necessary commands to update the database so I could get a useful backup. This also allowed me to identify all of my bootstrap items and of course, gain familiarity and confidence with the process.

This became really important for me because, during my production migration, I ran into a bug in my provisioning system where the symptom presented itself through Puppet. Because I was very practiced with the backup/restore process, I was able to quickly determine PE was NOT the problem and correctly identify the faulty system. Though it took about 6 hours to do my “very quick” migration, only about an hour of that was actually spent on the Puppet components.

I also found a few managed files on the master where the code presumed the directory structure would already exist, which it turns out was not the case. I must have manually created some directories 4 years ago. I think the most common issues you would find at this point are dependencies and ordering, but there may be others. Either fix the code now or, if it would negatively affect the production server, prep a branch for merging just prior to the migration, with the plan to revert if you roll back.

I strongly encourage running through the process a few times and building the most complete checklist you can before moving on to production.

Putting it together

With everything I learned in the lab, my final outline looked like this:

  • Backup old master, export to another location
  • Deploy a new master VM running EL7 using an alternative workflow
  • Run ssh-keygen/install an existing key and attach that key to the git system
  • SSH to the git server and accept the key
  • Verify your VMs default iptables/selinux posture; disable during bootstrap if required
  • Validate the hostname is correct
  • Install PE
  • Restore the backup
  • [Optional] Merge any code required for the new server; run r10k/CM to ensure it’s in place on the new master
  • Run puppet
  • Point agents at the new master

Yours may look slightly different. Please, spend the time in the lab to practice and identify any missing steps, it’s well worth it.

Summary

Refreshing any system of significant age is always possible, and often fraught with manual processes that are prone to error. Puppet Enterprise 2018.1 delivered a new backup/restore capability that automates much of this process. We put together a rough outline, refined it in the lab, and then used it to perform the migration in production with high confidence, accounting for any components the backup did not include. I really appreciate this new feature and I look forward to refinements in the future. I hope that soon migrations will be as simple and effective as in-place upgrades.

Contributing to a Political Campaign as a Nerd

As I promised in my previous politics article, I will continue not to advocate for specific politics and remain non-partisan on my blog. I do encourage everyone, regardless of beliefs or party, to participate in politics, because it affects you whether you participate or not. Participation in our democracy can only improve it.

Over the past year or two, I have come to feel far more strongly about my politics. I live in the United States, and it’s impossible to pay attention to the news and not feel some kind of way about our Republic. This year, I decided that I needed to contribute more directly, not just passively partake in politics. If you already follow me on twitter, you know that I wear my politics on my sleeve. Like many readers, I regularly vote. Like many readers, I donate to campaigns. That’s not enough for me anymore. So I decided to do more, and reached out to some local campaigns to find out what I could offer.

I will admit that this is scary. I have spent almost 20 years working on a relatively narrow field of expertise within IT. I had no experience with politics. Going from 20 years of experience to 0 is intimidating – but if I did it, so can you. If you read this and want to contribute, I want you to know that you will do just fine. As divisive and loud and argumentative and nasty as politics seems on the evening news, every group I have worked with over the past year has been welcoming, very graceful about my lack of knowledge and mistakes, and very accommodating to how much time I have available. Please, don’t let your fear keep you out. Reach out if you have any questions!

The expertise we have is sorely needed, though, especially in political campaigns. You will quickly find out that most people involved in political campaigns are not computer experts in any way. Sure, they’re computer savvy as most people are nowadays, but there are significant gaps in that knowledge that need to be filled. All you need to do is read the news and you will quickly hear about campaigns that are hacked, crowdsourced analysis of what’s on a politician’s phone, and overwhelming numbers of twitter and facebook shenanigans. Your help is needed and will be welcomed.

I joined a campaign and I had no idea what I was getting into. I did not find many others in tech who have shared their experiences joining their first campaign. I hope this article helps fill this gap a little bit – and if any readers are in the same situation, I would encourage you to blog about it as well! Alright, let’s get volunteering!

As I hit publish on this, there are 85 days until Election Day in the US. It is NOT too late to volunteer! Your assistance will be welcomed up until the final moments on Election Day, and there are always future elections to prepare for.

What to expect

When you click Sign Up on a campaign web site, you’ll be offered some “normal” work – canvassing, phone banking, putting a sign in your yard, etc. All things technical are notably missing from the list. To offer your technical expertise, you will have to reach out to the campaign directly. Many campaigns or candidates have a listed phone number on their website. If not, try looking at the county or state party’s website for a phone number, and inform them that you would like to get in contact with the campaign.

You will have a chance at some point to talk to the candidate or a campaign manager and make sure that’s who you want to work for. Treat it like a job interview! The campaign will ask what you can do, and you get to ask the campaign about what they will do if elected. Be honest! When I first talked to my campaign, I explained that I had 20 years of IT experience but no campaign experience. I was willing to take on things unknown, but I wanted to make sure they knew it would be new to me. I found out they had an experienced webmaster who would be providing me assistance if I joined. I also asked a lot of pointed policy questions to ensure that I would be happy if this candidate was elected. Get your questions answered and let the campaign know whether you want to join and what you can contribute.

General tech tasks

There are so many areas you can contribute to the technology side of a campaign, regardless of where in technology you work. Here’s a very short, very incomplete list of items you can help with:

  • Setting up a free Slack and teaching people how to use it (EVERYONE uses slack nowadays!)
  • Setting up a website and analytics
  • Configuring multi-factor authentication on all services
  • Setting up apps on phones and tablets
  • Answering questions about how to use a computer, application, or service, even if you’re just functioning as Google as a Service for really busy people
  • Providing a sounding board for anything technical, including how technical people and companies may respond to something

Depending on what your expertise is, you may be able to offer some very specific needs. Surely, what you know can be applied to a campaign, though I may not be able to tell you how. A lot of my expertise is in information security. Here are some examples of InfoSec advice you can provide:

  • Explain threat models. Make sure you know how they apply, too; the threat model of running for US Senator is much different than that of running for a local council seat. Everyone can benefit from making sure they don’t expose their financial details to the world, but fewer are worried about specific attacks by enemy nation-states.
  • Ensure services are registered to a well protected account that belongs to the campaign instead of a random gmail account that belongs to someone who may leave the campaign.
  • Make sure MFA is enabled everywhere possible.
  • Restrict access to services to those who need it, and at the least permission level required

Remember that you are advising a campaign, not running your own business, so you will probably “lose” some arguments in areas where you are objectively the expert, and that’s okay. Make sure everyone acknowledges the trade-offs being made, and do your best to minimize the potential fallout. If there’s a realistic chance of failure, prepare remediation plans so that you are ready if they are needed – the same kinds of thing you do at work to cover your company’s butt.

Be aware of how much you can contribute. If you can only spend 2 hours a week with the campaign at odd times of the day, maybe you are not the best person to run their web site. That’s okay, just make the campaign aware that you want to help as well as your limits and surely they can find a way to make it fit. If you can spend more hours, then I encourage you to take on more significant tasks. If your circumstances change, just let the campaign know!

There are tons of small things like this that you can contribute. Don’t worry if you can’t think of something now, if you reach out to the campaign I’m sure you can come up with something together!

Larger projects

In addition to these general tasks, you may be able to contribute to higher level projects to help the campaign. If you are a data scientist, a campaign needs you! Everyone needs to know which voters to target, and they’re hopefully looking for more ethical assistance than we have seen campaigns pursue in the past. Many campaigns can determine what kinds of voters they want to target, but they may lack the skills to find those voters within the mass of voter information available. Those who are great with analytics can help get data from the web sites to the voter analysis teams. Social media experts can help leverage Twitter, Facebook, Instagram, and other services to get messaging out effectively. Online advertising needs your expertise in marketing and advertising. Larger campaigns may need custom applications like HillaryBnB (an AirBnB-style app for canvassers).

Again, I had no experience with campaigns so these are just a few of the efforts I’ve observed recently, it’s a very non-comprehensive list of options. Each campaign’s needs are different, so I suggest checking with the campaign to see what is needed, rather than trying to offer specific projects.

Tackling the unknown

Though you are volunteering because you have expertise, sometimes what a campaign needs does not line up exactly. It’s a good thing we are an industry that is constantly learning! Lean on your existing expertise to get going.

I decided to help out a campaign with Google AdWords, as the lack of such a campaign was identified when I joined. Prior to June 1st, I had never used AdWords or done anything with advertising, online or otherwise. Yeah, it was intimidating. But I believe in my candidate, so I tackled it like any other tech I’ve learned. I found some technical articles about how AdWords works and tips for novices, scrounged up some YouTube videos so I could see it in action, and then set up an account and got to work. After almost 3 weeks, I am starting to figure it out, and the campaign is benefiting from my efforts. Find something and dig in. You can grow your technical skills and help advocate for your politics at the same time!

While I will not pretend to be authoritative on AdWords, I want to share some things I learned, beyond the simple mechanics that you can learn through the documentation and tutorials:

  • Create the AdWords account with a central campaign account. Add additional administrators with campaign-specific emails, rather than personal emails. This makes it more difficult for someone to leave the campaign and take down advertising.
  • There’s a pretty decent iPhone app for AdWords, and it gives you some views not available on the web site, but you cannot edit very much on it. I am sure there is one for Android as well.
  • Google will provide a $100 coupon after spending your first $25. It will be sent in an email to the account owner’s email and it’s not automatic, you need to apply the code.
  • Google will also send an email advertising a free review of your account. I would wait a few weeks before calling, so that the specialists can see some data from your campaigns.
  • Impressions (someone seeing your ad) are free. Clicks (when someone actually clicks on them) are the only thing that costs you money.
  • Campaigns are made up of Ad Groups. Each ad group can advertise a different set of text and point to a different page, and you can add as many ad groups to a single campaign as you like, but funding is allocated at the campaign level. You need to balance the number of campaigns and ad groups, and keep balancing them as voters’ interests change.
  • Each Campaign can also be targeted to different locations. You can limit some AdWords Campaigns to the political campaign’s region (by district for federal offices; by using zip codes for state and local offices) to focus your advertising, such as issues-based campaigns. Others may be made open to a wider area, maybe even the whole country, such as donation campaigns.
  • Each Ad Group can be made up of keywords which receive mostly-opaque scoring. The better the score, the better the placement. Keywords that do not result in Impressions receive a lower score and can drag down the score of the entire Ad Group. Disable keywords that do not work, or replace them with more specific and helpful keywords, to keep the scoring up. All the scoring is Google magic, and the effectiveness of this will vary quite a bit for every organization. Keep an eye on it.
  • Keywords are words or phrases that you want your ad matched to. You can also add Exclusions. Combine this with Search Terms results, which show the actual search someone used and the keyword or category it matched, to filter out Impressions/Clicks that are not helpful to your campaign. I have seen some really ridiculous search terms, including the amazing “imagen you are an environmentalist giving a speech environmental due to population growth in the western united states”, which matched the keyword environment. Whoops, way too generic; it was replaced with a more specific phrase. Another was a search including the name of another candidate in an entirely different state, and that one click ate the entire campaign budget for the day. Keeping up with Exclusions can save your campaign a lot of money.
  • Google watches monthly trends to determine when best to spend money. Your daily budget is better thought of as the average daily spend over a 30 day period. For example, Google may determine that you won’t get much out of ads on Saturday and spend close to $0, but that the 2nd Wednesday of the month gets the most impact and spend far more than your budget. You will only ever be charged up to 2x your specified daily budget, even if Google “spends” 2.3x your budget (I have never observed them going past 2.0x).
  • Almost everything you do with AdWords can be changed on the fly. However, there are two things to keep in mind:
    • Any new ad or keywords (and you cannot edit ads/keywords, you actually make a new one to replace the existing one) must be approved. It can take up to 24 hours to approve. You can add ads/keywords and disable them, then enable them when needed, to ensure they are approved prior to when they are needed.
    • Significant budget changes may flag a fraud alert. If you have a significant event-related campaign coming up, set it up at least 3 days in advance, as it can take 3 days to resolve suspected fraud. If you are spending $10 a day and want to increase that to $300 for a weekend event and your account gets flagged on Friday, it may be Monday before it is unfrozen and your event will be over.
  • You won’t find many AdWords tutorials that speak to political campaigns. It’s strongly associated with businesses. A few articles and charts mentioned Social Advocacy, which is probably the closest, but…
  • Success is difficult to measure. A business may track how many people click versus how many people order something. A donation campaign can track how many people donate, but an issues or awareness campaign cannot correlate visitors to the site with votes in a primary or general election. Conversion rates on their own won’t tell you much.
    • For many candidates, awareness itself is the goal. Many voters do not fill out the whole ballot and only check non-federal boxes if they recognize the name. Responsive Ads (as opposed to Text Ads) are fairly unobtrusive but can display small graphics. Logos are very helpful to start creating brand awareness.
    • Run at least two Ads for each campaign, much like you would do A/B testing at work. Review regularly and tweak the ads over time.
  • Advertising is not an island. Coordinate with the Social Media team and the event planners. If your candidate is going to be at a local festival or state fair in a few weeks, add some keywords for the event. If Social Media is blitzing on a policy, make sure common keywords will drive voters to your candidate. When things happen in the news that affect your voters, make sure their searches will bring them to your candidate. Likewise, you can review search terms people use and inform the other teams that these are some of the things voters are searching for and make sure the candidate speaks to those concerns.
    • This can feel very ghoulish or morbid. Much of news that drives people to politicians is going to be of a negative nature. We generally don’t call our elected officials when things are looking up. You WILL probably have to capitalize on an event that resulted in harm or death. In my case, unfortunately, there was a school shooting near the district. Ugh. But voters do want to know their candidate’s policies on subjects like school shootings. Be responsible and principled and above all, caring. Do not let the need to respond compromise your integrity or the candidate’s.
  • Your candidate’s party has other candidates running. Reach out to them for assistance, for ideas, for additional eyes on problems. You can get contact info from your county/state party’s offices, usually.
  • AdWords includes a large number of reports. Working in IT, our tendency is to encourage others to run reports themselves, but everyone on a political campaign is likely already spending all the time they can on their areas of expertise. It can be a huge help if you create/tweak reports for others, and schedule them to email the requester regularly.

Summary

Just because many of us make technology a huge part of our lives does not mean we are one dimensional. If you feel inspired by politics, don’t hide it, become active. I’ve discussed what you might expect if you join a political campaign, some of the work and expertise technologists can offer campaigns, and my experience in joining a campaign. Whether you contribute a few hours a month or hours every day, you will be a vital part of your chosen campaign. That’s awesome! Participation in democracy is what makes our Republic so strong, and that of most democratic governments.

If you have any questions about volunteering, whether it’s technical or about the experience or something else, please reach out. You can drop a comment here, or @rnelson0 on twitter.

Enjoy, and thank you!