This pilot fish wakes up around 1:30 a.m. to go to the bathroom and checks his work cellphone. Lo and behold, there are a lot of emails about a very important filesystem that’s full. This filesystem contains all the user home directories for an entire facility, and according to the emails, even though several people had deleted large numbers of big files they no longer needed, the filesystem filled up again, and rather quickly.
Although the emails stopped around 10 p.m., fish knows he’s going to have a rowdy mob of incensed users clamoring at his cubicle in the morning. This particular filesystem lives on a network that is not accessible from the internet, so he gets dressed and goes to work, arriving at about 2:45 a.m.
Fish figures that some process has lost its mind and is writing incessantly to the storage. To find it, he runs a du -k on the home directory filesystem, which lives on a RAID platform, so it is a tad slow. Two hours later, he identifies a user home directory that is at 1.1TB, up from 120GB a couple of weeks earlier. From there, he runs an ls -lR and looks for a mind-bogglingly big file or files. But none show up, and when fish sums the file sizes from the ls output, they come to only about 120GB.
This makes no sense at all. Because fish is tired and hasn’t had his normal dose of caffeine, his mind isn’t functioning at full speed, but he eventually realizes he has seen this before. It means the file has been deleted, but the process that had it open is still writing to the storage. The file no longer has a name for ls to list, yet its space is not released until the process lets go of it.
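For readers who have not run into this, the behavior fish remembers can be reproduced in a few lines. The following is only a minimal sketch, assuming a POSIX system (on Windows, deleting an open file typically fails); the function name and the payload size are made up for illustration and have nothing to do with the user's actual test process.

```python
import os
import tempfile


def demo_deleted_but_open(payload: bytes = b"x" * 4096) -> dict:
    """Write to a file, delete its name, then keep writing through the open descriptor."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)                   # same effect as rm: the directory entry is gone
        visible = os.path.exists(path)    # ls would no longer show the file
        os.write(fd, payload)             # the process can still write to it
        info = os.fstat(fd)               # the kernel still tracks the growing file
        return {"visible": visible, "size": info.st_size, "links": info.st_nlink}
    finally:
        os.close(fd)                      # only now can the space be released


if __name__ == "__main__":
    print(demo_deleted_but_open())
```

The file keeps growing after its name is removed, which is why the directory's usage and the sum of the listed file sizes can disagree so badly.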
When this has happened before, it was during work hours, when it was possible to identify the user and machine. With no idea which machine is hosting the runaway process, fish has to search through them all and run lsof to look for suspicious open files. He does finally find it, but there is no process assigned to the entry, which has no size or inode and a status of “unknown.” Fish has to reboot the machine to clear it.
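The hunt fish performs with lsof can be approximated on a Linux machine by walking /proc, where the kernel marks the target of a descriptor whose file has been unlinked with a trailing " (deleted)". This is a hedged sketch, not what fish actually ran: it assumes Linux with /proc mounted, it needs root to see other users' processes, and the function name is hypothetical. The lsof option +L1, which lists open files with a link count below one, is the more usual route.

```python
import os


def find_deleted_open_files(proc_root: str = "/proc") -> list:
    """Return (pid, fd, target, size) for open files whose names have been deleted."""
    results = []
    if not os.path.isdir(proc_root):      # not Linux, or /proc is not mounted
        return results
    for pid in os.listdir(proc_root):
        if not pid.isdigit():             # skip non-process entries such as "meminfo"
            continue
        fd_dir = os.path.join(proc_root, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:                   # process exited, or permission denied
            continue
        for fd in fds:
            link = os.path.join(fd_dir, fd)
            try:
                target = os.readlink(link)
                if not target.endswith(" (deleted)"):
                    continue
                size = os.stat(link).st_size   # follows the link to the still-open file
            except OSError:
                continue
            results.append((int(pid), int(fd), target, size))
    # Largest first, since the goal is to find whatever is eating the disk.
    return sorted(results, key=lambda r: r[3], reverse=True)


if __name__ == "__main__":
    for pid, fd, target, size in find_deleted_open_files()[:10]:
        print(pid, fd, size, target)
```

A scan like this only covers the machine it runs on, so on a shared home filesystem it would still have to be repeated on every client, as fish did. It also would not have helped with an entry like the one he found, which had no process attached to it.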
He finishes just about the time the user of the machine in question shows up. User explains he had a test process go crazy and refuse to die, even when he issued kill -9. So he just deleted the output file and went home.
Fish then gives a rundown on how he has spent the past six or seven hours and explains that the user’s actions meant that both the process and file were left in almost undetectable states.
To which the user makes amends with the words, “Oh, sorry about that.”
Sharky wants you to ls your true tales of IT life for me. Send them to email@example.com. You can also subscribe to the Daily Shark Newsletter.
Copyright © 2019 IDG Communications, Inc.