Thursday, August 14, 2014

No space left on device /tmp

Three days ago, our Nagios server stopped working and none of our servers could be monitored. The syslog said Nagios was "Unable to create temp file for writing status data!". It looked like the disk was full, so I ran df -h:
shiop:/var/log# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1              28G  1.9G   25G   8% /
tmpfs                 252M     0  252M   0% /lib/init/rw
udev                   10M  608K  9.5M   6% /dev
tmpfs                 252M     0  252M   0% /dev/shm
/dev/sda3              43G  2.7G   38G   7% /home


Ummm... there is plenty of free space on the Nagios server, so creating more files should be no problem.
shiop:/var/log# touch /tmp/test
Unable to create file /tmp/test: No space left on device

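After spending two hours googling, I finally found the problem: inode usage on the root filesystem had reached 100%. Every file needs an inode, so once they run out no new file can be created, no matter how much disk space is free.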
shiop:/var/log# df -i
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/sda1            1831424 1831424       0  100% /
tmpfs                  64464       6   64458    1% /lib/init/rw
udev                   64464    1157   63307    2% /dev
tmpfs                  64464       1   64463    1% /dev/shm
/dev/sda3            2815344   14859 2800485    1% /home
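
To catch this earlier next time, the same df -i check can be scripted. This is only a rough sketch: it assumes GNU df, and the helper names parse_iuse and check_inodes and the 90% default threshold are my own illustrative choices, not anything Nagios ships.

```shell
#!/bin/sh
# Hypothetical helpers; the names and the 90% default are illustrative.

# Extract the IUse% number from `df -Pi` output read on stdin.
parse_iuse() {
    awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Print a warning when inode usage for a path reaches the limit (in percent).
check_inodes() {
    path=$1
    limit=${2:-90}
    # -P keeps each filesystem on one line; -i reports inodes, not blocks
    pct=$(df -Pi "$path" | parse_iuse)
    case "$pct" in
        ''|*[!0-9]*)
            # Some filesystems do not report inode counts at all
            echo "inode usage not reported for $path" ;;
        *)
            if [ "$pct" -ge "$limit" ]; then
                echo "WARNING: $path inode usage at ${pct}%"
            else
                echo "OK: $path inode usage at ${pct}%"
            fi ;;
    esac
}

check_inodes /
```

Running check_inodes / from cron, or wrapping it as a monitoring check, would have flagged the root filesystem well before it hit 100%.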

There must be millions of files on the root filesystem, so I needed to find where they were.
shiop:/var/log# for i in /*; do echo "$i"; find "$i" | wc -l; done
/bin
97
/boot
18
/cdrom
1
/dev
...
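
A quicker way to narrow this down is to count entries per parent directory in a single pass. This is a sketch that assumes GNU find (for -printf); top_dirs is a hypothetical helper name, and file names containing newlines would be miscounted.

```shell
#!/bin/sh
# top_dirs DIR [N]: list the N directories under DIR that hold the most entries.
# Hypothetical helper; requires GNU find for -printf.
top_dirs() {
    # -xdev keeps find on DIR's own filesystem (so / does not descend into /home);
    # %h prints the parent directory of every entry found.
    find "$1" -xdev -printf '%h\n' | sort | uniq -c | sort -rn | head -n "${2:-10}"
}
```

Calling top_dirs / 10 prints the ten directories on the root filesystem with the most entries, largest first, so the directory eating the inodes should show up at the top.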

After a few minutes, I found millions of files in the directory /usr/local/nagios/var.
shiop:/var/log# rm -rf /usr/local/nagios/var/*
/bin/rm: Argument list too long

The rm command above fails because the shell expands * into millions of file names, and the resulting argument list exceeds the kernel's limit on the total size of arguments that can be passed to a new process (ARG_MAX). So I tried another way to delete all these files:
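shiop:/var/log# find /usr/local/nagios/var/ -type f -delete

Here find is given the directory itself rather than a shell glob, so no huge argument list is ever built. With -delete, find removes each matching file itself as it walks the directory; rm is never invoked.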
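
For reference, the argument size limit can be queried with getconf, and if your find lacks -delete, the POSIX "-exec ... {} +" form does the same job in batches. This is a minimal sketch; purge_files is a hypothetical helper name, and it removes every regular file under the given directory, so point it at the right place.

```shell
#!/bin/sh
# Maximum combined size (in bytes) of arguments and environment for a new process
getconf ARG_MAX

# purge_files DIR: delete all regular files under DIR, leaving directories in place.
# Hypothetical helper for illustration.
purge_files() {
    # "{} +" makes find pack as many paths into each rm call as fit under the
    # limit, so rm runs a handful of times instead of once per file.
    find "$1" -type f -exec rm -f {} +
}
```

Calling purge_files /usr/local/nagios/var would clear the files while keeping the directory tree itself.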