niscache storage creates an artificial I/O bottleneck limiting probe capacity.

Discussion created by comfortably_nim on Feb 5, 2013

This is re-posted as an idea to get some steam behind it and I also have case 99181 open with Nimsoft regarding the defect.  Optimizations to work around the flaw are at the bottom.


Niscache storage causes an I/O bottleneck on robots using standard filesystems.

This problem is similar to the early sendmail queue I/O issues caused by storing an excessive number of files in a single directory: every lookup has to scan one huge directory file, creating I/O contention.


This shows up most clearly with interface_traffic, which creates several metric and CI objects per monitored interface. An interface_traffic probe monitoring 7,000 ports can create as many as 70,000 files in niscache in our experience. It is unclear whether these include unmonitored interfaces on the devices, but that is inconsequential to the I/O issue caused by the niscache design flaw.
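As a quick check, the following sketch counts the niscache entries and shows the size of the directory file itself (the path is an assumption based on a default Linux install); a directory file that has grown to multiple megabytes is a symptom of the lookup contention described above.

```shell
# Hypothetical diagnostic; /opt/nimsoft/niscache is the assumed default path.
NISCACHE=${NISCACHE:-/opt/nimsoft/niscache}

# Number of cache files in the flat directory.
find "$NISCACHE" -maxdepth 1 -type f | wc -l

# The size field here is the directory file itself, which every
# lookup must scan on filesystems without indexed directories.
ls -ld "$NISCACHE"
```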


This can be worked around somewhat with esoteric filesystem optimizations: limiting the metadata stored, enabling b-tree indexes on directories, turning off filesystem journaling, and creating a filesystem with a high inode ratio. This immediately and drastically increases interface_traffic capacity and eliminates the I/O wait bottleneck on the box.


We would prefer an option to turn off niscache where it is not needed, and storage of that volume of data should be moved into an appropriate storage mechanism such as SQLite. If you must use the filesystem, directory hashing needs to be implemented to reduce the number of files per directory.


The simple form of this is:


niscache/asdfjklp gets stored in niscache/a/s/d/f/j/asdfjklp


This allows the programmatic finding of niscache files without creating an I/O bottleneck on a single huge directory file.
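A minimal sketch of that scheme in shell (the function name is hypothetical; it simply nests the first five characters of the file name as directory levels, per the asdfjklp example above):

```shell
# Hypothetical helper: map a flat niscache file name to a hashed path,
# nesting the first five characters as directory levels.
nis_hash_path() {
  name=$1
  prefix=""
  for i in 1 2 3 4 5; do
    prefix="$prefix$(printf '%s' "$name" | cut -c"$i")/"
  done
  printf '%s%s\n' "$prefix" "$name"
}

nis_hash_path asdfjklp   # prints a/s/d/f/j/asdfjklp
```

Because each directory level then holds at most a few dozen entries, a lookup walks five tiny directories instead of scanning one 70,000-entry directory file.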


FYI: For users looking to optimize around this flaw, you'll find the following helpful on Linux.
# Create a dedicated 1 GB logical volume for niscache.
lvcreate -n nimbus_niscache -L +1G /dev/system
# ext2 (no journal) with high inode density (-T news), directory
# b-tree indexes (dir_index), and minimal metadata features.
mke2fs -T news -L niscache -O dir_index,^filetype,sparse_super,resize_inode /dev/system/nimbus_niscache
# /etc/fstab entry -- mount with noatime and no ACL/xattr overhead.
/dev/system/nimbus_niscache /opt/nimsoft/niscache ext2      noatime,noacl,nouser_xattr,defaults    0 0


And this optimizes for queue throughput on hubs.
# Create a dedicated 2 GB logical volume for the hub queues.
lvcreate -n nimbus_queue -L +2G /dev/system
# ext2 tuned for a small number of large files (-T largefile4).
mke2fs -L nimbus_queue  -O ^filetype,resize_inode,sparse_super  -T largefile4 /dev/system/nimbus_queue
# /etc/fstab entry for the hub queue directory.
/dev/system/nimbus_queue    /opt/nimsoft/hub/q    ext2       noatime,noacl,nouser_xattr,defaults 0 0