Does anyone have experience with how many tunnels a hub can handle? I guess it depends on CPU power and memory. Are there any restrictions imposed by the operating system (e.g. Windows, Linux)?
I believe there is a theoretical maximum of 30 for Windows and 50 for Linux, but it is usually recommended to add another hub at around 15-20 for Windows and 35-40 for Linux
Can you elaborate on "it is usually recommended to add another hub at around 15-20 for Windows"? How does one add another primary hub without disrupting the existing infrastructure? Our environment has close to 20 remote hubs using tunnels right now, and they're all going through one primary hub. Aside from how to migrate the environment, I'd also like to know what the symptoms are of having too many tunnels. I've been noticing some issues but haven't been able to pinpoint the cause...
You can put a front-end hub in place to act as a tunnel server. It only needs to act as the termination point for the tunnels. The queues can still be fetched directly by the primary hub.
There is also a limit on the number of queues one hub can handle that you could run into. It is around 100 queues for a Windows box; beyond that you will see queues building up faster than they can be emptied. At that point you may want to add multiple front-end hubs and terminate the queues and tunnels there. From each front-end hub you then only need one queue to get everything to the primary. If you have a high-availability setup, you will need to think carefully about what happens to message flow in the various failure modes.
I actually think the queue limit for Windows is much lower than 100. Maybe something has changed since I encountered the problem, but it happened at 60-70 queues for me. From Nimsoft I had heard that 64 should be the magic number because of some way that the polling code works in Windows. I do not think it was quite so exact when I ran into it, but maybe I was counting incorrectly (including or excluding the wrong types of queues).
I would suggest putting the hubs on physical hardware first. We are currently running our SSL tunnel termination on a VM and have run into speed problems with approximately 20 tunnels.
cheers from Frankfurt :-)
This is a really confusing topic with multiple variables.
The well-known limitation is 64 subscribers on Windows, as everyone has stated. Subscribers are get queues, subscribe queues, etc. Play with the callbacks to see what is included, but generally you want to leave a few free for the temporary subscribers that pop up when you do something like tail a log via Infrastructure Manager. This 64 limitation has nothing to do with tunnels; tunnels and subscribers are separate things. Also, this limit does not apply to Linux. The source of the limitation is a Winsock limitation, and when you surpass it, you will get an alert letting you know about it, and subscriber/queue processing will go from parallel to serial.
We have also seen limitations when using any sort of SSL on the tunnels. By default the tunnels are implemented as OpenSSL tunnels that are certificate-authenticated with NULL encryption as the cipher, which means exactly what it sounds like. We've noticed that turning on encryption greatly reduces how many tunnels you can have, and this is not limited by CPU on the host as you would expect if it were designed to scale. Without encryption, you can easily do upwards of 60 on Windows, which corresponds to that Winsock limitation if you're using get queues. You can get much cheaper encryption if you implement some other type of tunnel to encapsulate your tunnel, which is the old recommendation to work around the encryption deficiencies in the hub. I believe the docs may still recommend this.
We have also seen a bizarre problem on Linux hubs where a TCP timeout to a tunnel destination, combined with a large number of tunnels on a single hub, will cause timeouts and repeated hub crashes. This is allegedly fixed in newer hubs, but we haven't retested to confirm. This was of course a shame, since running Linux at the distribution layer seemed like a nice workaround to the queue-collection limitations of Winsock.
We also noticed an even more bizarre issue with downstream hub-to-robot encryption affecting hub performance at the distribution layer, even when tunnel encryption is off and hub-to-robot encryption is off at the upstream level. We never did get an explanation for that one, and it's been some time since we tested enabling hub-to-robot encryption.
It seems that a lot of the development, QA, and scalability testing was traditionally done with encryption disabled, and the encryption options are somewhat risky and very inefficient. To Nimsoft's credit, they have put more emphasis on hub performance and security options within the underlying Nimbus infrastructure lately, and some of my observations are based on experience with older versions, so you may find the situation improved.
The port must be open for the tunnel client to initiate the connection to the tunnel server. As long as the return traffic is allowed (which is the case in nearly every firewall available), nothing needs to be explicitly open from server to client.
It sounds like your setup should work fine because the telnet test from the client hub to the server hub on TCP 48003 worked.
thanks for the replies and assistance ....have a nice day
The magic number is 64 subscribers/queues; anything greater and you will receive an alarm:
severity = warning
message contents = The Hub X has to many active subscribers (some number over 64). You should consider to offload some of the subscribers to another Hub
Performance is severely impacted when this happens. We keep our Windows servers in the 40-50 range (subscribers/queues, not tunnel connections).
The hub can only handle a maximum of 64 subscribers in a Windows environment due to Windows OS (Winsock) limitations on the number of open sockets. So, if a hub has more than 64 subscribers (i.e. more than 64 queues, which includes temp queues), then the hub will start to service the queues in a "round-robin" fashion: one queue gets temporarily dropped while another one is picked up, then that one gets dropped and another is picked up, and so on. This has two undesirable effects. One is that the hub's overall performance is seriously degraded. The other is that if the queue from a probe gets dropped and the probe notices, the probe will restart itself in an attempt to re-establish its queue.
Windows only allows you to WaitForMultipleObjects on 64 sockets, after which you have to split the handles into two or more lists and call WaitForMultipleObjects on each of them. This causes an unacceptable delay in the flow of messages on sockets not in the first list. Unix select() does not have this particular limitation, so the 64-subscriber limit is not relevant on Linux/UNIX.
The number of subscribers for Linux/UNIX hubs is only limited by available resources. Note that on Linux/UNIX, once again, temp queues count as subscribers. Also note that subscriber_max_threshold doesn't really matter on UNIX/Linux, as those OSes do not suffer from the Windows limitation of WaitForMultipleObjects on 64 sockets. That said, Linux hubs have been seen to run out of sockets and/or open files, in which case the open-file limit must be increased to mitigate.
Windows uses an OS call to monitor handles - the limit there is 64 handles to monitor for changes. Therefore, the hub can monitor only a combination of 64 tunnels and queues.
Linux uses the select() call, which is limited to monitoring file descriptors 0-1023. So you can run as many as you want, as long as you never use a file descriptor over 1023. In reality this works out to close to 50 tunnels, each with a corresponding get queue.
1000 is the hard-coded file descriptor upper limit in the Linux hub; this is due to the select() limitation Garin described.
You are not able to increase this value, so raising the file descriptor limit on the OS side has no effect.
Back to the original topic - it is safe to say:
A Windows tunnel server can handle about 30 tunnel clients.
A Linux tunnel server can handle about 50 tunnel clients.
And one other thing that is really hard to get a handle on (pun intended) is the number of handles/file descriptors needed when probes like discovery_server cycle through and retrieve information from a hub, or when name_to_ip calls are made. You can have a hub with only a couple of active tunnels that still runs out of resources.
And in the Linux case, the select() call takes a bitmap of up to 1024 consecutive file descriptors. As such, it is possible for the hub to end up in a situation where it has only two file descriptors active, say 1 and 2000, and still be unable to operate correctly.