Three days ago, the Nagios server stopped working and none of the servers could be monitored. The syslog said Nagios was "Unable to create temp file for writing status data!". It looked like the disk was full, so I ran df -h:
shiop:/var/log# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 28G 1.9G 25G 8% /
tmpfs 252M 0 252M 0% /lib/init/rw
udev 10M 608K 9.5M 6% /dev
tmpfs 252M 0 252M 0% /dev/shm
/dev/sda3 43G 2.7G 38G 7% /home
Hmm... there is plenty of free disk space on the Nagios server, so creating more files should not be a problem.
shiop:/var/log# touch /tmp/test
Unable to create file /tmp/test: No space left on device
After spending two hours googling, I finally found the problem: inode usage on the root filesystem had reached 100%.
shiop:/var/log# df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/sda1 1831424 1831424 0 100% /
tmpfs 64464 6 64458 1% /lib/init/rw
udev 64464 1157 63307 2% /dev
tmpfs 64464 1 64463 1% /dev/shm
/dev/sda3 2815344 14859 2800485 1% /home
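To catch this earlier next time, the IUse% column can be checked automatically. The following is only a minimal sketch: inode_alert is a made-up helper name, the default threshold of 90 is an arbitrary assumption, and it expects df -i style output on standard input with mount points that contain no spaces.

```shell
# inode_alert is a hypothetical helper, not a standard command.
# It reads `df -i` style output on stdin and prints every mount point
# whose inode usage is at or above the given percentage (default 90).
inode_alert() {
    threshold="${1:-90}"
    awk -v t="$threshold" 'NR > 1 && $5 ~ /%$/ {
        use = $5
        sub(/%$/, "", use)          # strip the trailing percent sign
        if (use + 0 >= t) print $6, $5
    }'
}

# Typical use on a live system:
#   df -i | inode_alert 90
```

Fed the df -i output above, this would print only the root filesystem, since it is the only one at or above 90%.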
There must be millions of files on the root filesystem, so I needed to find them.
shiop:/var/log# for i in /*; do echo "$i"; find "$i" | wc -l; done
/bin
97
/boot
18
/cdrom
1
/dev
...
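The loop above also walks into other filesystems such as /home and only reports totals for the top-level directories. A possible alternative, sketched here under the assumption that GNU find is available (for -printf), stays on one filesystem and lists the directories that hold the most entries. The function name top_dirs_by_entries is made up for this example, and file names containing newlines would throw the counts off.

```shell
# top_dirs_by_entries is a hypothetical helper, not a standard command.
# Usage: top_dirs_by_entries START_DIR [HOW_MANY]
top_dirs_by_entries() {
    start="${1:-/}"
    how_many="${2:-10}"
    # -xdev keeps find on the filesystem of START_DIR.
    # -printf '%h\n' (GNU find) prints the parent directory of every
    # entry, so counting identical lines gives entries per directory.
    find "$start" -xdev -printf '%h\n' 2>/dev/null |
        sort | uniq -c | sort -rn | head -n "$how_many"
}

# Typical use on a live system:
#   top_dirs_by_entries / 10
```

Each output line is a count followed by a directory, largest first, which points straight at the directory that is eating the inodes.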
After a few minutes, I found millions of files in the directory /usr/local/nagios/var.
shiop:/var/log# rm -rf /usr/local/nagios/var/*
/bin/rm: Argument list too long
The rm command failed because the shell expands * into every file name before running rm, and with millions of files the resulting argument list exceeds the size the kernel allows for a new process. So I tried another way to delete all these files:
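shiop:/var/log# find /usr/local/nagios/var -type f -delete
Here find is given the directory itself rather than a * glob, so the shell never has to build a huge argument list (with the glob, find would hit the same "Argument list too long" error as rm). The -delete action makes find remove each file itself as it walks the directory; rm is not called at all.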
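For trying this out safely before touching a real directory, here is a small sketch of two ways to remove a very large number of files. Both function names are made up for this example, and -delete and -print0 are assumed to be available, as they are in GNU find. The second variant really does use rm, but lets xargs split the names into batches that fit within the argument limit.

```shell
# purge_files and purge_files_xargs are hypothetical helper names.

# Variant 1: find unlinks every regular file itself, no rm involved.
purge_files() {
    find "$1" -xdev -type f -delete
}

# Variant 2: find prints NUL-separated names and xargs runs rm on
# batches small enough to stay under the argument size limit.
purge_files_xargs() {
    find "$1" -xdev -type f -print0 | xargs -0 rm -f
}

# Typical use on a scratch directory first:
#   purge_files /path/to/scratch
```

Both variants leave the directories themselves in place and only remove regular files.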