ldms_core performance

Posted by Jack Coates on Jul 15, 2008 6:39:10 PM

I've done a few installs of ldms_core on servers where XDD or scheduled UDD has been running for a good long while in 5,000- to 10,000-node environments, and I've found a bit of a performance issue. Eventually it all works itself out, but that first run takes a day or two to complete. Gradually it gets faster and faster, and within a week it runs the way it does on a fresh LANDesk core. This post isn't announcing a fix; it's just acknowledging that I see the problem and know what needs to be done.

This is what's happening: if NMAP's binary is found in the configured place, ldms_core goes to the database and looks for targets with no OS Name, sorted by lastscantime descending for a LIFO scan order. So far so good, except that a few months ago I realized that some NMAP scans finish in a few seconds while others take several minutes. To get the most results up front, I first send a single ping packet to each target and see whether it responds. If it answers, it goes to the front of the list; if it doesn't, it goes to the back.
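
For readers who want to see the ordering logic without digging through the source, here is a rough Python sketch of the select-then-ping step described above. It is not the DoNMAP code (that ships with the binary); the table and column names (unmanaged_nodes, ip, os_name, last_scan_time) are hypothetical stand-ins, ping() is passed in as a stub, and treating an empty string as "no OS Name" is an assumption.

```python
import sqlite3
from typing import Callable, List

# Hypothetical schema: the real LANDesk table and column names are not given
# in the post, so these are illustrative stand-ins only.
QUERY = """
    SELECT ip FROM unmanaged_nodes
    WHERE os_name IS NULL OR os_name = ''
    ORDER BY last_scan_time DESC
"""


def fetch_targets(conn: sqlite3.Connection) -> List[str]:
    """Return targets with no OS name, most recently seen first (LIFO)."""
    return [row[0] for row in conn.execute(QUERY)]


def order_by_ping(targets: List[str], ping: Callable[[str], bool]) -> List[str]:
    """Move hosts that answer a single ping to the front of the list.

    Order within each group is preserved, so the LIFO ordering from the
    query still holds among responders and among non-responders.
    """
    answered, silent = [], []
    for ip in targets:  # one ping per target, sent serially
        (answered if ping(ip) else silent).append(ip)
    return answered + silent


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE unmanaged_nodes (ip TEXT, os_name TEXT, last_scan_time INTEGER)")
    conn.executemany(
        "INSERT INTO unmanaged_nodes VALUES (?, ?, ?)",
        [("10.0.0.1", None, 100), ("10.0.0.2", "Windows XP", 300),
         ("10.0.0.3", "", 200), ("10.0.0.4", None, 400)],
    )
    targets = fetch_targets(conn)  # ['10.0.0.4', '10.0.0.3', '10.0.0.1']
    up = {"10.0.0.1"}              # pretend only this host answers ping
    print(order_by_ping(targets, lambda ip: ip in up))
    # ['10.0.0.1', '10.0.0.4', '10.0.0.3']
```

Note that every target still gets pinged before any NMAP scan starts, which is exactly the cost that bites on long lists.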

That works really well when the target list is a few dozen machines, but when it's a few hundred machines the performance is terrible. As NMAP fills in the OS Names, the target list shrinks, the pings finish sooner, and each run takes less time. I'm not parallelizing any of this traffic, so a long time spent on step 1 (pinging) means step 2 (NMAP) has to cool its heels in the corner. This all fits with my design goal of "low and slow", under-the-radar information gathering, but nobody can tell that it's doing anything: there's an icon in the system tray and a couple of entries in Event Viewer, but no CPU, no network, no nothing.
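
A quick back-of-the-envelope model shows why the serial ping pass hurts so much on large lists. Assuming each responding host costs about one round trip and each silent host costs the full ping timeout, the wait before step 2 can start is roughly responders × RTT + silent × timeout. The function name and the numbers below are illustrative, not measurements from any real core.

```python
def serial_sweep_seconds(responders: int, silent: int,
                         rtt_s: float, timeout_s: float) -> float:
    """Rough time for one serial, one-ping-per-host pass before NMAP can start.

    Responding hosts cost about one round trip each; silent hosts cost the
    full timeout each, so on a large list the silent hosts dominate.
    """
    return responders * rtt_s + silent * timeout_s


if __name__ == "__main__":
    # Illustrative numbers only: 500 targets, 400 of which never answer.
    print(serial_sweep_seconds(100, 400, rtt_s=0.01, timeout_s=4.0))  # 1601.0
    print(serial_sweep_seconds(100, 400, rtt_s=0.01, timeout_s=1.0))  # 401.0
```

That is about 27 minutes versus under 7 minutes of pure waiting per pass. Shrinking the list (as NMAP fills in OS Names) and shrinking the timeout both cut the silent-host term.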

I considered batching targets into small groups that I could then process in parallel, but decided it was too complex and wasteful of resources. So the planned fix is a two-parter:

  1. First, I'm going to log and balloon-tip what's happening so that the end user knows it's okay to leave it alone. Killing the program and starting it over just makes it take longer, and all the statistics they installed it for are produced at the end.

  2. Second, I'm going to radically decrease the ping timeout: if the node doesn't send an ICMP echo reply within a second or two, I know what I came for. Better still, if the machine responds to ping I'll run NMAP on it right away, but if it doesn't I'll push it to the end of the line.
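
The two-part fix could look something like the following Python sketch. It is hypothetical, not the ldms_core implementation: ping(ip, timeout) and nmap(ip) are injected stand-ins, the one-second default timeout just echoes "a second or two" above, and the standard logging calls stand in for the planned log entries and balloon tips.

```python
import logging
from typing import Callable, List

log = logging.getLogger("ldms_core_sketch")  # hypothetical logger name


def scan_targets(targets: List[str],
                 ping: Callable[[str, float], bool],
                 nmap: Callable[[str], None],
                 timeout_s: float = 1.0) -> List[str]:
    """Ping each target once with a short timeout and NMAP responders at once.

    Silent hosts are deferred to the end of the line instead of blocking
    hosts that are known to be up. Returns the order in which NMAP ran.
    """
    scanned: List[str] = []
    deferred: List[str] = []
    total = len(targets)
    for i, ip in enumerate(targets, start=1):
        if ping(ip, timeout_s):
            # Progress message so the user can see work is happening.
            log.info("%d/%d %s answered ping, running NMAP now", i, total, ip)
            nmap(ip)
            scanned.append(ip)
        else:
            log.info("%d/%d %s silent, deferring", i, total, ip)
            deferred.append(ip)
    for ip in deferred:  # slow or unreachable hosts go last
        log.info("deferred NMAP of %s", ip)
        nmap(ip)
        scanned.append(ip)
    return scanned


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    up = {"10.0.0.2"}  # pretend only this host answers ping
    order = scan_targets(["10.0.0.1", "10.0.0.2", "10.0.0.3"],
                         ping=lambda ip, t: ip in up,
                         nmap=lambda ip: None)
    print(order)  # ['10.0.0.2', '10.0.0.1', '10.0.0.3']
```

Because responders are scanned as soon as they answer, NMAP work starts on the first live host instead of waiting for the whole ping pass to finish.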

If you really care a lot, the source is installed with the binary; the relevant code is the DoNMAP subroutine beginning at line 1119.

Tags: ldms_core, nmap


LANDesk Community powered by Jive Software's Clearspace ® © 2009 LANDesk Software