Adding, extending, and removing Linux disks and partitions in 2019

Managing disks and partitions in Linux has changed quite a bit over time. Unfortunately, as Jonathan Frappier points out, a lot of advice is either wrong, dated, or makes some poor assumptions along the way:

I know I almost always go to the search results drawing board when I have to manage a disk, so it’s pretty infrequent, and have the same problems of sifting through documentation that will work and that which will. I hope this article can be used as a source of modern, tested documentation for some common use cases.

Tools and File systems

First, a quick mention of the tools and file systems available.

fdisk

A venerable tool that continues to work fully as long as your disks are under 2 T in size. Once you’re over 2 T, I’d treat it like a RO only and not use it to make edits. You can use fdisk -l to display information on all disks it sees or fdisk <devicepath> to interact with a specific disk:

[rnelson0@build03 ~]$ sudo fdisk -l

Disk /dev/sda: 107.4 GB, 107374182400 bytes, 209715200 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000b06ec

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048     1026047      512000   83  Linux
/dev/sda2         1026048   209715199   104344576   8e  Linux LVM

Disk /dev/mapper/VolGroup00-lv_root: 12.6 GB, 12582912000 bytes, 24576000 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/mapper/VolGroup00-lv_swap: 4294 MB, 4294967296 bytes, 8388608 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/mapper/VolGroup00-lv_home: 83.9 GB, 83886080000 bytes, 163840000 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/loop0: 107.4 GB, 107374182400 bytes, 209715200 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/loop1: 2147 MB, 2147483648 bytes, 4194304 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/mapper/docker-253:0-263061-pool: 107.4 GB, 107374182400 bytes, 209715200 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 65536 bytes / 65536 bytes

[rnelson0@build03 ~]$ sudo fdisk /dev/sda
Welcome to fdisk (util-linux 2.23.2).

Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.


Command (m for help): m
Command action
   a   toggle a bootable flag
   b   edit bsd disklabel
   c   toggle the dos compatibility flag
   d   delete a partition
   g   create a new empty GPT partition table
   G   create an IRIX (SGI) partition table
   l   list known partition types
   m   print this menu
   n   add a new partition
   o   create a new empty DOS partition table
   p   print the partition table
   q   quit without saving changes
   s   create a new empty Sun disklabel
   t   change a partition's system id
   u   change display/entry units
   v   verify the partition table
   w   write table to disk and exit
   x   extra functionality (experts only)

Command (m for help): p

Disk /dev/sda: 107.4 GB, 107374182400 bytes, 209715200 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000b06ec

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048     1026047      512000   83  Linux
/dev/sda2         1026048   209715199   104344576   8e  Linux LVM

Command (m for help): q

parted

A more modern tool that supports over 2 T disks. I would prefer this, though there’s nothing wrong with fdisk other than the size limit. Similarly, you can use -l to show all data or a device name to interact. Once nice thing is it defaults to human-readable sizes instead of a jumble of long numbers without commas.

[rnelson0@build03 ~]$ sudo parted -l
Model: VMware Virtual disk (scsi)
Disk /dev/sda: 107GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:

Number  Start   End    Size   Type     File system  Flags
 1      1049kB  525MB  524MB  primary  ext4         boot
 2      525MB   107GB  107GB  primary               lvm


Model: Linux device-mapper (thin-pool) (dm)
Disk /dev/mapper/docker-253:0-263061-pool: 107GB
Sector size (logical/physical): 512B/512B
Partition Table: loop
Disk Flags:

Number  Start  End    Size   File system  Flags
 1      0.00B  107GB  107GB  xfs


Model: Linux device-mapper (linear) (dm)
Disk /dev/mapper/VolGroup00-lv_home: 83.9GB
Sector size (logical/physical): 512B/512B
Partition Table: loop
Disk Flags:

Number  Start  End     Size    File system  Flags
 1      0.00B  83.9GB  83.9GB  ext4


Model: Linux device-mapper (linear) (dm)
Disk /dev/mapper/VolGroup00-lv_swap: 4295MB
Sector size (logical/physical): 512B/512B
Partition Table: loop
Disk Flags:

Number  Start  End     Size    File system     Flags
 1      0.00B  4295MB  4295MB  linux-swap(v1)


Model: Linux device-mapper (linear) (dm)
Disk /dev/mapper/VolGroup00-lv_root: 12.6GB
Sector size (logical/physical): 512B/512B
Partition Table: loop
Disk Flags:

Number  Start  End     Size    File system  Flags
 1      0.00B  12.6GB  12.6GB  ext4


[rnelson0@build03 ~]$ sudo parted /dev/sda
GNU Parted 3.1
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) help
  align-check TYPE N                        check partition N for TYPE(min|opt) alignment
  help [COMMAND]                           print general help, or help on COMMAND
  mklabel,mktable LABEL-TYPE               create a new disklabel (partition table)
  mkpart PART-TYPE [FS-TYPE] START END     make a partition
  name NUMBER NAME                         name partition NUMBER as NAME
  print [devices|free|list,all|NUMBER]     display the partition table, available devices, free space, all found partitions, or a particular partition
  quit                                     exit program
  rescue START END                         rescue a lost partition near START and END

  resizepart NUMBER END                    resize partition NUMBER
  rm NUMBER                                delete partition NUMBER
  select DEVICE                            choose the device to edit
  disk_set FLAG STATE                      change the FLAG on selected device
  disk_toggle [FLAG]                       toggle the state of FLAG on selected device
  set NUMBER FLAG STATE                    change the FLAG on partition NUMBER
  toggle [NUMBER [FLAG]]                   toggle the state of FLAG on partition NUMBER
  unit UNIT                                set the default unit to UNIT
  version                                  display the version number and copyright information of GNU Parted
(parted) print
Model: VMware Virtual disk (scsi)
Disk /dev/sda: 107GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:

Number  Start   End    Size   Type     File system  Flags
 1      1049kB  525MB  524MB  primary  ext4         boot
 2      525MB   107GB  107GB  primary               lvm

(parted) quit

ext4

The “extended” family of filesystems (currently ext4, but possibly ext3 or ext2 if you work on some really old systems) have been used by Linux for a long time. It’s still a default in a number of distros, especially for smaller partitions. It’s a very competent journaled filesystem and the maximums have been massively extended over earlier versions (for example, 1 EiB max volume and 16 TiB max file size), but it’s still considered yesterday’s technology. The biggest fault I find with ext4 is that inodes – what tracks the metadata for files – are set at filesystem creation time and cannot be changed. This means that when extended volumes, the inode count is not increased. Thus, after an expansion, you could wind up horribly undersized in inodes while still having free space available. New files cannot be created when the inodes are exhausted. Most are familiar with measuring disk usage by space capacity, but you can also view your inode capacity:

[rnelson0@build03 ~]$ df -h
Filesystem                      Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-lv_root   12G  6.7G  4.3G  62% /
devtmpfs                        908M     0  908M   0% /dev
tmpfs                           920M     0  920M   0% /dev/shm
tmpfs                           920M   98M  822M  11% /run
tmpfs                           920M     0  920M   0% /sys/fs/cgroup
/dev/sda1                       477M  185M  263M  42% /boot
/dev/mapper/VolGroup00-lv_home   77G  5.8G   68G   8% /home
tmpfs                           184M     0  184M   0% /run/user/1000
[rnelson0@build03 ~]$ df -i
Filesystem                      Inodes  IUsed   IFree IUse% Mounted on
/dev/mapper/VolGroup00-lv_root  768544 202858  565686   27% /
devtmpfs                        232260    360  231900    1% /dev
tmpfs                           235373      1  235372    1% /dev/shm
tmpfs                           235373    850  234523    1% /run
tmpfs                           235373     16  235357    1% /sys/fs/cgroup
/dev/sda1                       128016    354  127662    1% /boot
/dev/mapper/VolGroup00-lv_home 5120000 634404 4485596   13% /home
tmpfs                           235373      1  235372    1% /run/user/1000

This limitation leads me to recommend other filesystems, especially when you expect it will grow in the future.

btrfs

Designed at Oracle for Linux, btrfs uses a copy-on-write B-Tree data structure and provides advanced pooling, snapshots, and checksum features. Initially developed in 2008, it took until 2013 to be marked as stable and even longer for Linux distros to mark it as supported – even now, only Oracle Linux 7, SuSE 15, and Synology DSM v6 do so. Plenty of people do love in spite of its support status and it is a viable solution, I’m just not as familiar with it personally.

xfs

Originally designed by SGI, xfs was ported to Linux in 2001. This, too, took a long while to become supported by distros, but now is supported by almost every distro, many of which use it as the default file system. It supports dynamic inode allocation, up to 8 EiB – 1 byte volumes AND files, online resizing, and many other features. This is the default in Red Hat Enterprise Linux and my preferred FS, which I’ll be using later – but is roughly equivalent to btrfs in general, if you prefer that.

LVM

Not quite a file system or a tool, the Logical Volume Manager is a device mapper that provides an extra layer of an abstraction between raw file systems and the OS. All of ext4, btrfs, and xfs can be used separately or with the LVM. Among other things, it lets you easily create device maps that span partitions and disks, resize them, and export/import them (for instance, to migrate disks to another system and retain the FS). I prefer using the LVM for everything; if you never modify the partitions it causes no harm, but if you ever need to and have not used LVM, it’s far more painful.

Most LVM commands begin with pv, vg, or lv and you use can use tab-completion to discover the utilities we do not cover today. I regularly use pvs/vgs/lvs to get short summaries and pvdisplay/vgdisplay/lvdisplay for detailed status. All others are very situational.

Adding a new volume

With some understanding of the tools and filesystems at our fingertips, let’s take a look at a common scenario: adding a new volume. I’m only focused on the OS steps here, so we will assume that you have added a new physical disk, attached a VMDK or EC2 volume, etc. as your implementation requires. We will also assume that you want an xfs filesystem at /mnt/newdisk, the system had a single disk /dev/sda that was automatically partitioned by the installer, and the new disk receives the moniker /dev/sdb. Not all OSes and systems will provide the same name; for instance, Jonathan’s AWS Linux system’s second disk was called /dev/nvme1n1. In such a case, just swap in the correct partition name.

Most modern distros will automatically detect new disks while running. If yours does not, or fails to detect the disk, you can force a rescan with the following command (change the host number as appropriate):

echo "- - -" > /sys/class/scsi_host/host0/scan

Make sure the new device shows up before continuing:

[rnelson0@build03 ~]$ ls /dev/sd?
/dev/sda
[rnelson0@build03 ~]$ ls /dev/sd?
/dev/sda /dev/sdb

You can use fdisk or parted to get information on the device. There will be no label, since we haven’t touched it, but the reported size should be accurate (be sure to account for the difference between, say, gigabytes and gibibytes. I was surprised to learn that when I specify 16 GB in my vSphere system, it apparently means gibibytes, where the proper label is actually GiB. By comparison, parted’s GB does mean gigabytes.

[rnelson0@build03 ~]$ sudo parted /dev/sdb
GNU Parted 3.1
Using /dev/sdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print
Error: /dev/sdb: unrecognised disk label
Model: VMware Virtual disk (scsi)
Disk /dev/sdb: 17.2GB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags:

If we plan to have multiple partitions on a single disk, this is where we would create them using mkpart. However, you do not need to create a partition and can use the whole disk. I generally prefer using the whole disk; it’s fairly cheap to add a new disk and I believe is cognitively easier to manage than extending disks and partitions. For instance, if you add a 100 GB disk and make a single partition and later extend it to 200 GB, you can just extend it easily. If you create two 50 GB partitions and extend it to 100 GB later, you can only extend the 2nd partition and would have to create a new partition and join it to an existing LVM mapper to (effectively) increase the space of the partition. Alternatively, you can just always add entire disks to LVM groups, assuming that any extension is via an additional disk, which is what we will do here.

Though we are not creating partitions on the disk, we do want to leverage the LVM to create a mapping that we can manipulate semi-independently of the disks. Each LVM mapping has 3 layers – the physical, virtual, and logical layers. The physical layer refers to the disk itself (even if it’s a virtualized system, we pretend it’s a physical disk). The logical layer is where we create the end mapping that receives the file system. The virtual layers sits in the middle, sort of akin to the partitions we skipped, and presents physical layer information to the virtual layer. In our case, we simply map the entire disk to the physical and virtual layers, then assign 100% of the free space of the virtual device to the new logical device.

Once that is done, we format the result with our filesystem of choice, in this case xfs by using mkfs.xfs, and if it doesn’t exist, create the mount point. I normally put some of the information in variables, since the LVM commands are almost always the same. You’ll also notice I switched to root to save a few keystrokes over using sudo constantly, as all the LVM commands require elevated permissions.

#COMMANDS
DEVICE=/dev/sdb
NAME=newdisk
MOUNT=/mnt/newdisk

pvcreate ${DEVICE}
vgcreate vg_xfs_${NAME} ${DEVICE}
lvcreate -n lv_${NAME} -l 100%FREE vg_xfs_${NAME}
mkfs.xfs /dev/vg_xfs_${NAME}/lv_${NAME}
if [[ ! -d "${MOUNT}" ]]; then mkdir ${MOUNT}; fi

#OUTPUT
[root@build03 ~]# DEVICE=/dev/sdb
[root@build03 ~]# NAME=newdisk
[root@build03 ~]# MOUNT=/mnt/newdisk
[root@build03 ~]#
[root@build03 ~]# pvcreate ${DEVICE}
  Physical volume "/dev/sdb" successfully created.
[root@build03 ~]# vgcreate vg_xfs_${NAME} ${DEVICE}
  Volume group "vg_xfs_newdisk" successfully created
[root@build03 ~]# lvcreate -n lv_${NAME} -l 100%FREE vg_xfs_${NAME}
  Logical volume "lv_newdisk" created.
[root@build03 ~]# mkfs.xfs /dev/vg_xfs_${NAME}/lv_${NAME}
meta-data=/dev/vg_xfs_newdisk/lv_newdisk isize=512    agcount=4, agsize=1048320 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0, sparse=0
data     =                       bsize=4096   blocks=4193280, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

[root@build03 ~]# parted -l
...
Model: Linux device-mapper (linear) (dm)
Disk /dev/mapper/vg_xfs_newdisk-lv_newdisk: 17.2GB
Sector size (logical/physical): 512B/512B
Partition Table: loop
Disk Flags:

Number  Start  End     Size    File system  Flags
 1      0.00B  17.2GB  17.2GB  xfs

Finally, we have to mount this new partition. How your system manages the fstab can vary wildly, but generally just appending a line to /etc/fstab works (some distros automagically manage it). Mount the partition and now we can start using it!

[root@build03 ~]# echo "/dev/vg_xfs_${NAME}/lv_${NAME} ${MOUNT}          xfs    defaults        1 2" >> /etc/fstab
[root@build03 ~]# mount /mnt/newdisk
[root@build03 ~]# df -h /mnt/newdisk
Filesystem                             Size  Used Avail Use% Mounted on
/dev/mapper/vg_xfs_newdisk-lv_newdisk   16G   33M   16G   1% /mnt/newdisk
[root@build03 ~]# df -i /mnt/newdisk
Filesystem                             Inodes IUsed   IFree IUse% Mounted on
/dev/mapper/vg_xfs_newdisk-lv_newdisk 8386560     3 8386557    1% /mnt/newdisk

Extending an LVM device with an existing disk

The next most common case is to extend the disk a device uses. As I mentioned before, I prefer to just use a new disk as they’re frequently cheap, but some systems do make them more expensive. In my lab, I increased the size of my /dev/sdb to 32 GiB. Because the disk has already been detected, I have to force the OS to rescan and see the new space using echo 1>/sys/class/block/<DEVICE>/device/rescan:

[root@build03 ~]# parted -l
...
Error: /dev/sdb: unrecognised disk label
Model: VMware Virtual disk (scsi)
Disk /dev/sdb: 17.2GB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags:

Model: Linux device-mapper (linear) (dm)
Disk /dev/mapper/vg_xfs_newdisk-lv_newdisk: 17.2GB
Sector size (logical/physical): 512B/512B
Partition Table: loop
Disk Flags:

Number  Start  End     Size    File system  Flags
 1      0.00B  17.2GB  17.2GB  xfs
...
[root@build03 ~]# echo 1>/sys/class/block/sdb/device/rescan
[root@build03 ~]# parted -l
...
Error: /dev/sdb: unrecognised disk label
Model: VMware Virtual disk (scsi)
Disk /dev/sdb: 34.4GB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags:

Model: Linux device-mapper (linear) (dm)
Disk /dev/mapper/vg_xfs_newdisk-lv_newdisk: 17.2GB
Sector size (logical/physical): 512B/512B
Partition Table: loop
Disk Flags:

Number  Start  End     Size    File system  Flags
 1      0.00B  17.2GB  17.2GB  xfs

With the physical size increase visible, we need to expand the LVM device. First, pvresize so the physical layer sees the change, then vgscan to see the changes, and finally lvextend to have the logical volume allocate the virtual group’s free (unassigned) space (the + in +100%FREE means to ADD free space; 100%FREE without the plus means it allocates as much space as is free, but starting from the beginning – I don’t make the rules, I just get run over by them like everyone else!). Extend the FS with xfs_growfs and then look at the space and inodes. We can use the same variables as before to make it a little easier:

#COMMANDS
DEVICE=/dev/sdb
NAME=newdisk
MOUNT=/mnt/newdisk

pvresize ${DEVICE}
vgscan
lvextend -l +100%FREE /dev/vg_xfs_${NAME}/lv_${NAME}
xfs_growfs ${MOUNT}

#OUTPUT
[root@build03 ~]# DEVICE=/dev/sdb
[root@build03 ~]# NAME=newdisk
[root@build03 ~]# MOUNT=/mnt/newdisk
[root@build03 ~]#
[root@build03 ~]# pvresize ${DEVICE}
  Physical volume "/dev/sdb" changed
  1 physical volume(s) resized / 0 physical volume(s) not resized
[root@build03 ~]# vgscan
  Reading volume groups from cache.
  Found volume group "vg_xfs_newdisk" using metadata type lvm2
  Found volume group "VolGroup00" using metadata type lvm2
[root@build03 ~]# lvextend -l +100%FREE /dev/vg_xfs_${NAME}/lv_${NAME}
  Size of logical volume vg_xfs_newdisk/lv_newdisk changed from 16.00 GiB (4096 extents) to <32.00 GiB (8191 extents).
  Logical volume vg_xfs_newdisk/lv_newdisk successfully resized.
[root@build03 ~]# xfs_growfs ${MOUNT}
meta-data=/dev/mapper/vg_xfs_newdisk-lv_newdisk isize=512    agcount=4, agsize=1048320 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=4193280, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 4193280 to 8387584

[root@build03 ~]# df -h /mnt/newdisk
Filesystem                             Size  Used Avail Use% Mounted on
/dev/mapper/vg_xfs_newdisk-lv_newdisk   32G   33M   32G   1% /mnt/newdisk
[root@build03 ~]# df -i /mnt/newdisk
Filesystem                              Inodes IUsed    IFree IUse% Mounted on
/dev/mapper/vg_xfs_newdisk-lv_newdisk 16775168     3 16775165    1% /mnt/newdisk

You can see that the inode count is roughly twice the size as before, so you are in no danger of running out any time soon, even with a ton of tiny files!

Extending an LVM device with a new disk

We can also extend an LVM device using a new disk. After attaching the new disk, you need to pvcreate the new physical device, vgextend the virtual group into the new disk, and then lvextend the volume and xfs_growfs filesystem. In this example, the new disk is /dev/sdc and is 16 GiB:

#COMMANDS
DEVICE=/dev/sdc
NAME=newdisk
MOUNT=/mnt/newdisk

pvcreate ${DEVICE}
vgextend vg_xfs_${NAME} ${DEVICE}
lvextend /dev/vg_xfs_${NAME}/lv_${NAME}
xfs_growfs ${MOUNT}

#OUTPUT
[root@build03 ~]# DEVICE=/dev/sdc
[root@build03 ~]# NAME=newdisk
[root@build03 ~]# MOUNT=/mnt/newdisk
[root@build03 ~]#
[root@build03 ~]# pvcreate ${DEVICE}
  Physical volume "/dev/sdc" successfully created.
[root@build03 ~]# vgextend vg_xfs_${NAME} ${DEVICE}
  Volume group "vg_xfs_newdisk" successfully extended
[root@build03 ~]# vgs
  VG             #PV #LV #SN Attr   VSize   VFree
  VolGroup00       1   3   0 wz--n- <99.51g   5.66g
  vg_xfs_newdisk   2   1   0 wz--n-  47.99g <16.00g
[root@build03 ~]# lvextend -n lv_${NAME} -l +100%FREE /dev/vg_xfs_${NAME}/lv_${NAME}
  Please specify a logical volume path.
  Run `lvextend --help' for more information.
[root@build03 ~]# lvextend -l +100%FREE /dev/vg_xfs_${NAME}/lv_${NAME}
  Size of logical volume vg_xfs_newdisk/lv_newdisk changed from <32.00 GiB (8191 extents) to 47.99 GiB (12286 extents).
  Logical volume vg_xfs_newdisk/lv_newdisk successfully resized.
[root@build03 ~]# xfs_growfs ${MOUNT}
meta-data=/dev/mapper/vg_xfs_newdisk-lv_newdisk isize=512    agcount=9, agsize=1048320 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=8387584, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 8387584 to 12580864
[root@build03 ~]# df -h /mnt/newdisk
Filesystem                             Size  Used Avail Use% Mounted on
/dev/mapper/vg_xfs_newdisk-lv_newdisk   48G   33M   48G   1% /mnt/newdisk
[root@build03 ~]# df -i /mnt/newdisk
Filesystem                              Inodes IUsed    IFree IUse% Mounted on
/dev/mapper/vg_xfs_newdisk-lv_newdisk 25161728     3 25161725    1% /mnt/newdisk

Once again, the space and the inodes have increased!

Removing an LVM device

Finally, there are times when you will remove a mount point entirely. You can just “yank” the disks and you’ll probably be okay, but rather than cross our fingers and hope, we can manually remove the configuration to ensure no auto-detection goes wrong. The process includes unmounting the partition, removing the logical/virtual/physical layer mappings, then yanking the disks. We will undo our previous examples, where /dev/sdb and /dev/sdc provide vg_xfs_newdisk and lv_newdisk. Just start at the end and work our way back:

[root@build03 ~]# umount /mnt/newdisk
[root@build03 ~]# lvremove lv_${NAME}
  Volume group "lv_newdisk" not found
  Cannot process volume group lv_newdisk
[root@build03 ~]# lvremove /dev/vg_xfs_newdisk/lv_newdisk
Do you really want to remove active logical volume vg_xfs_newdisk/lv_newdisk? [y/n]: y
  Logical volume "lv_newdisk" successfully removed
[root@build03 ~]# vgremove vg_xfs_newdisk
  Volume group "vg_xfs_newdisk" successfully removed
[root@build03 ~]# pvremove /dev/sdc
  Labels on physical volume "/dev/sdc" successfully wiped.
[root@build03 ~]# pvremove /dev/sdb
  Labels on physical volume "/dev/sdb" successfully wiped.
[root@build03 ~]#
[root@build03 ~]# lvs
  LV      VG         Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lv_home VolGroup00 -wi-ao----  78.12g
  lv_root VolGroup00 -wi-ao---- <11.72g
  lv_swap VolGroup00 -wi-ao----   4.00g
[root@build03 ~]# vgs
  VG         #PV #LV #SN Attr   VSize   VFree
  VolGroup00   1   3   0 wz--n- <99.51g 5.66g
[root@build03 ~]# pvs
  PV         VG         Fmt  Attr PSize   PFree
  /dev/sda2  VolGroup00 lvm2 a--  <99.51g 5.66g

Do not forget to modify /etc/fstab to remove the mapping, or the next boot may complain about the missing disk or even fail outright. Make sure to follow your distro guide for this, as sometimes systems like anaconda can undo your edits silently. While this can all be done live, I do recommend a reboot at the end, so that if something went wrong that affects the boot cycle, it is found immediately.

Summary

I covered common tasks – adding, extending, and removing partitions on the same system – but there’s a lot more you can do. As mentioned, parted replaces fdisk, but we glossed over that in favor of using entire disks. But, at least you know that any documentation using fdisk is pretty aged. I also briefly mentioned vgexport/vgimport. If you plan to detach disks in LVM – even a single disk – and re-attach them to another system, you want to export the virtual group on the old system and import it on the new system. This will help ensure that any mismatch in device naming by the OS – say sdc and sde on the old system and sdc and sdd on the new system – do not result in any data loss.

I hope this article is a good reference for fundamental filesystem tasks for novice and expert linux administrators alike. Please let me know on twitter or in the comments of anything else you would like to see and I’ll keep this updated. Thanks!

Linux OS Patching with Puppet Tasks

One of the biggest gaps in most IT security policies is a very basic feature, patching. Specific numbers vary, but most surveys show a majority of hacks are due to unpatched vulnerabilities. Sadly, in 2018, automatic patching on servers is still out of the grasp of many, especially those running older OSes.

While there are a number of solutions out there from OS vendors (WSUS for Microsoft, Satellite for RHEL, etc.), I manage a number of OSes and the one commonality is that they are all managed by Puppet. A single solution with central reporting of success and failure sounds like a plan. I took a look at Puppet solutions and found a module called os_patching by Tony Green. I really like this module and what it has to offer, even though it doesn’t address all my concerns at this time. It shows a lot of promise and I suspect I will be working with Tony on some features I’d like to see in the future.

Currently, os_patching only supports Red Hat/Debian-based Linux distributions. Support is planned for Windows, and I know someone is looking at contributing to provide SuSE support. The module will collect information on patching that can be used for reporting, and patching is performed through a Task, either at the CLI or using the PE console’s Task pane.

Setup

Configuring your system to use the module is pretty easy. Add the module to your Puppetfile / .fixtures.yml, add a feature flag to your profile, and include os_patching behind the feature flag. Implement your tests and you’re good to go. Your only real decision is whether you default the feature flag to enabled or disabled. In my home network, I will enable it, but a production environment may want to disable it by default and enable it as an override through hiera. Because the fact collects data from the node, it will add a few seconds to each agent’s runtime, so be sure to include that in your calculation.

Adding the module is pretty simple, Here are the Puppetfile / .fixtures.yml diffs:

# Puppetfile
mod 'albatrossflavour/os_patching', '0.3.5'

# .fixtures.yml
fixtures:
  forge_modules:
    os_patching:
      repo: "albatrossflavour/os_patching"
      ref: "0.3.5"

Next, we need an update to our tests. I will be adding this to my profile::base, so I modify that spec file. Add a test for the default feature flag setting, and one for the non-default setting. Flip the to and not_to if you default the feature flag to disabled. If you run the tests now, you’ll get a failure, which is expected since there is no supporting code in the class yet.(there is more to the test, I have only included the framework plus the next tests):

require 'spec_helper'
describe 'profile::base', :type => :class do
  on_supported_os.each do |os, facts|
    let (:facts) {
      facts
    }

    context 'with defaults for all parameters' do
      it { is_expected.to contain_class('os_patching') }
    end

    context 'with manage_os_patching enabled' do
      let (:params) do {
        manage_os_patching: false,
      }
      end

      # Disabled feature flags
      it { is_expected.not_to contain_class('os_patching') }
    end
  end
end

Finally, add the feature flag and feature to profile::base (the additions are in italics):

class profile::base (
  Hash    $sudo_confs = {},
  Boolean $manage_puppet_agent = true,
  Boolean $manage_firewall = true,
  Boolean $manage_syslog = true,
  Boolean $manage_os_patching = true,
) {
  if $manage_firewall {
    include profile::linuxfw
  }

  if $manage_puppet_agent {
    include puppet_agent
  }
  if $manage_syslog {
    include rsyslog::client
  }
  if $manage_os_patching {
    include os_patching
  }
  ...
}

Your tests will pass now. That’s all it takes! For any nodes where it is enabled, you will see a new fact and some scripts pushed down on the next run:

[rnelson0@build03 controlrepo:production]$ sudo puppet agent -t
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Notice: /File[/opt/puppetlabs/puppet/cache/lib/facter/os_patching.rb]/ensure: defined content as '{md5}af52580c4d1fb188061e0c51593cf80f'
Info: Retrieving locales
Info: Loading facts
Info: Caching catalog for build03.nelson.va
Info: Applying configuration version '1535052836'
Notice: /Stage[main]/Os_patching/File[/etc/os_patching]/ensure: created
Info: /Stage[main]/Os_patching/File[/etc/os_patching]: Scheduling refresh of Exec[/usr/local/bin/os_patching_fact_generation.sh]
Notice: /Stage[main]/Os_patching/File[/usr/local/bin/os_patching_fact_generation.sh]/ensure: defined content as '{md5}af4ff2dd24111a4ff532504c806c0dde'
Info: /Stage[main]/Os_patching/File[/usr/local/bin/os_patching_fact_generation.sh]: Scheduling refresh of Exec[/usr/local/bin/os_patching_fact_generation.sh]
Notice: /Stage[main]/Os_patching/Exec[/usr/local/bin/os_patching_fact_generation.sh]: Triggered 'refresh' from 2 events
Notice: /Stage[main]/Os_patching/Cron[Cache patching data]/ensure: created
Notice: /Stage[main]/Os_patching/Cron[Cache patching data at reboot]/ensure: created
Notice: Applied catalog in 54.18 seconds

You can now examine a new fact, os_patching, which will shows tons of information including the pending package updates, the number of packages, which ones are security patches, whether the node is blocked (explained in a bit), and whether a reboot is required:

[rnelson0@build03 controlrepo:production]$ sudo facter -p os_patching
{
  package_updates => [
    "acl.x86_64",
    "audit.x86_64",
    "audit-libs.x86_64",
    "audit-libs-python.x86_64",
    "augeas-devel.x86_64",
    "augeas-libs.x86_64",
    ...
  ],
  package_update_count => 300,
  security_package_updates => [
    "epel-release.noarch",
    "kexec-tools.x86_64",
    "libmspack.x86_64"
  ],
  security_package_update_count => 3,
  blocked => false,
  blocked_reasons => [],
  blackouts => {},
  pinned_packages => [],
  last_run => {},
  patch_window => "",
  reboots => {
    reboot_required => "unknown"
  }
}

Additional Configuration

There are a number of other settings you can configure if you’d like.

  • patch_window: a string descriptor used to “tag” a group of machines, i.e. Week3 or Group2
  • blackout_windows: a hash of datetime start/end dates during which updates are blocked
  • security_only: boolean, when enabled only the security_package_updates packages and dependencies are updated
  • reboot_override: boolean, overrides the task’s reboot flag (default: false)
  • dpkg_options/yum_options: a string of additional flags/options to dpkg or yum, respectively

You can set these in hiera. For instance, my global config has some blackout windows for the next few years:

os_patching::blackout_windows:
  'End of year 2018 change freeze':
    'start': '2018-12-15T00:00:00+1000'
    'end':   '2019-01-05T23:59:59+1000'
  'End of year 2019 change freeze':
    'start': '2019-12-15T00:00:00+1000'
    'end':   '2020-01-05T23:59:59+1000'
  'End of year 2020 change freeze':
    'start': '2020-12-15T00:00:00+1000'
    'end':   '2021-01-05T23:59:59+1000'
  'End of year 2021 change freeze':
    'start': '2021-12-15T00:00:00+1000'
    'end':   '2022-01-05T23:59:59+1000'

Patching Tasks

Once the module is installed and all of your agents have picked up the new config, they will start reporting their patch status. You can query nodes with outstanding patches using PQL. A search like inventory[certname] {facts.os_patching.package_update_count > 0 and facts.clientcert !~ 'puppet'} can find all your agents that have outstanding patches (except puppet – kernel patches require reboots and puppet will have a hard time talking to itself across a reboot). You can also select against a patch_window selection with and facts.os_patching.patch_window = "Week3" or similar. You can then provide that query to the command line task:

puppet task run os_patching::patch_server --query="inventory[certname] {facts.os_patching.package_update_count > 0 and facts.clientcert !~ 'puppet'}"

Or use the Console’s Task view to run the task against the PQL selection:

Add any other parameters you want in the dialog/CLI args, like setting rebootto true, then run the task. An individual job will be created for each node, all run in parallel. If you are selecting too many nodes for simultaneous runs, use additional filters, like the aforementioned patch_window or other facts (EL6 vs EL7, Debian vs Red Hat), etc. to narrow the node selection [I blew up my home lab, which couldn’t handle the CPU/IO lab, when I ran it against all systems the first time, whooops!]. When the job is complete, you will get your status back for each node as a hash of status elements and the corresponding values, including return (success or failure), reboot, packages_updated, etc. You can extract the logs from the Console or pipe CLI logs directly to jq (json query) to analyze as necessary.

Summary

Patching for many of us requires additional automation and reporting. The relatively new puppet module os_patching provides helpful auditing and compliance information alongside orchestration tasks for patching. Applying a little Puppet Query Language allows you to update the appropriate agents on your schedule, or to pull the compliance information for any reporting needs, always in the same format regardless of the (supported) OS. Currently, this is restricted to Red Hat/Debian-based Linux distributions, but there are plans to expand support to other OSes soon. Many thanks to Tony Green for his efforts in creating this module!

Enterprise Linux 7.3 makes some backwards-incompatible changes to interface names

Today, I was caught off guard by a change in Enterprise Linux 7.3. Apparently, systemd was assigning interface names like eno16780032 based on “garbage” data. I’m not really a fan of ANY of the names the modern schemes generate, what was the problem with eth#? But that’s beside the point. What hit me was that starting in 7.3 the garbage data is now being discarded and this results in a change in interface names. All this, in a point release. Here’s KB 2592561 that details the change. This applies to both Red Hat EL and CentOS, and presumably other members of the family based on RHEL 7.

The good news is that existing nodes that are updated are left alone. A file is generated to preserve the garbage data and your interface name. Unlike other udev rules, this one ONLY applies to existing systems that want to preserve the naming convention:

[root@kickstarted ~]# cat /etc/udev/rules.d/90-eno-fix.rules
# This file was automatically generated on systemd update
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:50:56:95:de:4e", NAME="eno16780032"

As you can see, it’s based on the MAC. That’s the bad news. If you plan to use the node as a template to deploy other VMs, the resulting VMs will effectively receive “new” interfaces based on the new MAC, resulting in ens192 instead of eno16780032. This definitely introduces at least one minor issue: the eno16780032 configuration is left intact and the interface is missing, so every call to systemctl restart network generates an error. It can also cause other issues for you if you have scripts, tools, provisioning systems, etc., that are predicting your nodes will have interface names like ens192. This is not difficult to remedy, thankfully.

Continue reading

Updating Linux Puppet Enterprise agent versions with puppet_agent

Happy New Year, everyone! I know that when I last blogged before the holiday, I was getting settled in with Jenkins, but I really needed to relax during the break so I had to hold off on that. I promise, I’ll get back to that soon, but in the meantime I encountered an issue that needed fixed and required a little more than the docs provided.

I ran into an issue recently with Puppet Enterprise agents that weren’t able to purge some resource types properly due to the use of Anchors. This is an issue with Puppet, not the module in question, and it was fixed in Puppet 4.4.0 but affected nodes were running older versions. Of course, rather than upgrade the agents by hand, I decided to automate it. Puppet has an Upgrading PE agents: *nix page that describes how to do this in a few ways. The latter options require manual effort, but the first option, via the module puppet_agent, can do this automatically for us. I diverged from the instructions a bit because I do my classifying with hiera rather than the PE Classifier and because there’s a missing step! (I’m trying to get that document updated, as well)

The guide says to install the module on the master and add some classification. To do this in a traditional roles and profiles setup using hiera as the classifier and with a monolithic controlrepo, we need to add the module to Puppetfile and .fixtures.yml, then add either a profile for puppet_agent or add it to a base profile. I chose the latter. Let’s add the module first:

Continue reading

Change your default linux shell without root access

A friend recently mentioned his hatred of ksh and inability to change his default shell because he is not root and root will not make the change. I concur, ksh is turrible, that needs fixed. Here’s a little trick for anyone else who has this issue. Let’s say you want to use bash and the shell forced upon you is ksh. The Korn shell uses .profile to set up your environment when you log in, so append these two lines to your profile:

# run bash damnit
[ ! "`which bash | egrep "^no"`" ] && [ -z $BASH ] && exec bash --login

We look for the output of which bash and ensure it does not say no bash in …, and if that file is non-zero, we exec bash as a login shell. You can read man exec for details on specifically how it works in a given shell, but it essentially replaces the running shell with the new command. If the first two checks are positive, then bash is executed and replaced the ksh process. Once the files are added to your profile, log in again in a new session. Voila, you now have a bash login shell!

rnelson0@dumbserver ~ $ ps -ef | grep bash
  rnelson0  5927  5298   0 19:56:42 pts/1       0:00 grep bash
  rnelson0  5298  5292   0 19:47:48 pts/1       0:00 bash --login

If you’re forced to use a different shell, you may need to locate a different profile file. If you want to use a different shell, just sub out bash for the name of your preferred shell. Enjoy!

Ruby net/https debugging and modern protocols

I ran into a fun problem recently with Zabbix and the zabbixapi gem. During puppet runs, each puppetdb record for a Zabbix_host resource is pushed through the zabbixapi, to create or update the host in the Zabbix system. When this happened, an interesting error crops up:

Error: /Stage[main]/Zabbix::Resources::Web/Zabbix_host[kickstart.example.com]: Could not evaluate: SSL_connect SYSCALL returned=5 errno=0 state=SSLv2/v3 read server hello A

If you google for that, you’ll find a lot of different errors and causes described across a host of systems. Puppet itself is one of those systems, but it’s not the only one. All of the systems have something in common: Ruby. What they rarely have is actual resolution, though. Possible causes include time out of sync between nodes, errors with the certificates and stores on the client or server side, and of course a bunch of “it works now!” with no explanation what changed. To confuse matters even more, the Zabbix web interface works just fine in the latest browsers, so the SSL issue seems restricted to zabbixapi.

To find the cause, we looked at recent changes. The apache SSLProtocols were changed recently, which shows up in a previous puppet run’s output:

Continue reading

Tcpdump: When and How?

A tool I rely on heavily for network debugging is tcpdump. This tool naturally comes to mind when I run into issues, but it may not for others. I thought I’d take a moment and describe when I reach for tcpdump and give a quick primer on how to use it.

If you’re using windows, windump/wireshark are the cli/gui equivalents. I’ll stick to tcpdump in this article, but many of the CLI options are the same and the filters are pretty similar if not the same.

When should you use tcpdump?

Whenever you’re troubleshooting an application, you hopefully have some sort of application-level logging to help you figure out what’s going on. Sometimes, you don’t have that – or what does exist provides inadequate detail or appears to be lying to you. You may also not have access to a device that you think is affecting the traffic, and you need to ensure that the traffic flow meets your expectations. As long as your application talks on the network, even locally, tcpdump may be able to help you!

You may have users from the internet who need to reach your application who are not able to, and they’re only receiving a timeout, but other users have no issues. You look in your web server logs and you don’t see any logs for the user complaining. There are log entries for the users who are not complaining. You can use tcpdump to listen on the webserver’s port for the customer’s IP and see if the connection attempts are seen. You can also see the packet contents in cleartext (as opposed to binary format – encrypted content is not decrypted, it’s just more easily visible) if that helps diagnose the issue.

Many applications also rely on local connections, typically on the loopback interface, and may be affected by the local firewall (iptables or the Windows Firewall Service, for example). Using tcpdump, you can see if the packets are immediately rejected, which is likely to be the firewall service, or if it completes a three-way handshake before closing the connection. In almost all cases, if a three-way handshakes is observed, the application has received the connection.

Given the name tcpdump, it’s worth nothing that you can see almost anything on the wire, not just TCP packets. UDP, GRE, even IPX are visible with the right filters.

How do you use tcpdump?

Let’s look at how you use tcpdump. In the examples below, I’m using a Linux VM with one interface, eth0, and the address 10.0.0.8. It has ssh, apache, and postfix services running. Tcpdump requires root access to see the raw packets on the wire, which I will gain with sudo. Be extremely careful who you grant this access to for two reasons. 1) Zombied tcpdump sessions can gobble all the CPU. 2) Since packet contents can be inspected, sensitive information can be seen by anyone with the permission to run tcpdump. This is a security risk, when you must meet PCI-DSS audit requirements. I’ll be using my unprivileged user rnelson0.

Tcpdump by default will try and resolve IP and service names. This can be slow, as it relies on DNS and file lookups, and confusing as most people will search by the IP addresses. We can disable these lookups by adding the n flag to the CLI, adding one instance for IPs and one for services, -nn. We also want to specify the interface, even on a single-NIC node, as it may default to the loopback instead of the ethernet interface, using -i <interface>. This gives us a default argument string of: -nni eth0 or -nni lo, depending on which we are looking for.

Next, we need to generate a filter to look at traffic. The tcpdump man page provides a lengthy list of filter components. One of the most common components is src|dst|host <scope>, which filters for packets from, to, or bi-directionally for the specified IP or network. Others are port <portnumber> and <protocol>, like icmp or gre. We can combine individual components with standard logical operators like and, or, and not: filter for non-ssh traffic to/from 10.0.0.200 with host 10.0.0.0.200 and not port 22.

As a “bonus”, when you run tcpdump with a bad filter, it will exit immediately. It doesn’t offer hints on how to fix the error, but it does let you know right away.

We put this together with the full command tcpdump -nni eth0 host 10.0.0.200 and not port 22. If we ssh to our node and just run this, we won’t see anything happen right away, but we’ll eventually see some ARP packets:

[rnelson0@kickstart ~]$ sudo tcpdump -nni eth0 host 10.0.0.200 and not port 22
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
22:05:50.309737 ARP, Request who-has 10.0.0.1 tell 10.0.0.200, length 46
22:05:50.589052 ARP, Request who-has 10.0.0.253 tell 10.0.0.200, length 46
22:05:59.934464 ARP, Request who-has 10.0.0.200 tell 10.0.0.253, length 46
22:06:51.315637 ARP, Request who-has 10.0.0.1 tell 10.0.0.200, length 46
22:06:51.519754 ARP, Request who-has 10.0.0.8 tell 10.0.0.200, length 46
22:06:51.519807 ARP, Reply 10.0.0.8 is-at 00:50:56:ac:f2:f7, length 28

Now if we view a file on the web server, we’ll see a three way handshake followed by a few PSH packets:

22:17:29.686840 IP 10.0.0.200.59916 > 10.0.0.8.80: Flags [S], seq 1113320281, win 8192, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
22:17:29.687041 IP 10.0.0.8.80 > 10.0.0.200.59916: Flags [S.], seq 741099373, ack 1113320282, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 5], length 0
22:17:29.690439 IP 10.0.0.200.59916 > 10.0.0.8.80: Flags [.], ack 1, win 256, length 0
22:17:29.690475 IP 10.0.0.200.59916 > 10.0.0.8.80: Flags [P.], seq 1:412, ack 1, win 256, length 411
22:17:29.690540 IP 10.0.0.8.80 > 10.0.0.200.59916: Flags [.], ack 412, win 490, length 0
22:17:29.693772 IP 10.0.0.8.80 > 10.0.0.200.59916: Flags [P.], seq 1:151, ack 412, win 490, length 150
22:17:29.694090 IP 10.0.0.8.80 > 10.0.0.200.59916: Flags [F.], seq 151, ack 412, win 490, length 0
22:17:29.696030 IP 10.0.0.200.59916 > 10.0.0.8.80: Flags [.], ack 152, win 256, length 0
22:17:29.700858 IP 10.0.0.200.59916 > 10.0.0.8.80: Flags [F.], seq 412, ack 152, win 256, length 0
22:17:29.700893 IP 10.0.0.8.80 > 10.0.0.200.59916: Flags [.], ack 413, win 490, length 0

For comparison, here’s what HTTPS looks like when HTTPS is not enabled. You see the SYN packet from the client, and the RST packet comes from the OS since there’s no service listening there:

22:18:50.057972 IP 10.0.0.200.59917 > 10.0.0.8.443: Flags [S], seq 825112119, win 8192, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
22:18:50.058088 IP 10.0.0.8.443 > 10.0.0.200.59917: Flags [R.], seq 0, ack 825112120, win 0, length 0
22:18:50.558200 IP 10.0.0.200.59917 > 10.0.0.8.443: Flags [S], seq 825112119, win 8192, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
22:18:50.558264 IP 10.0.0.8.443 > 10.0.0.200.59917: Flags [R.], seq 0, ack 1, win 0, length 0
22:18:51.060995 IP 10.0.0.200.59917 > 10.0.0.8.443: Flags [S], seq 825112119, win 8192, options [mss 1460,nop,nop,sackOK], length 0
22:18:51.061065 IP 10.0.0.8.443 > 10.0.0.200.59917: Flags [R.], seq 0, ack 1, win 0, length 0

Summary

I hope this short tutorial helps you figure out when and how to use tcpdump. If you have specific questions, post them in a comment or ask on twitter and I’ll respond.