Jack Coates' Blog

10 Posts tagged with the health tag

log monitoring tool

Posted by Jack Coates Jul 14, 2009

Here's a useful little tool that aggregates the various logfiles that LANDesk produces: http://www.droppedpackets.org/scripts/ldms_log

 

You'll find core and client programs, no installer necessary.

ldms_log_core.png


ldms_client 2.4.5 news

Posted by Jack Coates Apr 6, 2009

ldms_client has picked up a few new tricks, particularly for the Dell shops. Version 2.4.5 adds code to detect the system hardware, so that DCCU will not be run on non-Dell machines. This version also reports on the number of crashes a workstation has enjoyed over the last seven days, so you can get some early warning (or refute your user's claim that it's crashed a dozen times since Friday... see? only 11 crashes).

 

I'm also splitting Mac support back out of the main package, so that everyone doesn't have to download it. This is preparatory to removing Macintosh support altogether... I just don't like working on Macs, and I haven't updated ldms_client_mac since July of 2008. If anyone would like to take over this section of ldms_client or wants some help doing their own inventory extension, you know where to find me.

 

The ldms_client_core UI is getting unwieldy and will be using tabs in the near future. I'm also starting to toy with a wild-card-capable registry reader, which would allow for some of those recursive tricks like mounted PSTs and mapped network printers.


I just posted ldms_core 3.4.7, which integrates some of the feedback and discoveries of the last few weeks... thanks for all the help, folks. Debugging help and feature suggestions from LANDesk admins around the world are making this utility into a very useful tool indeed.

tall-tree.jpg

Here are a few notes about where it might (or might not) go next:

  • Scheduled Tasks
    • Check scheduled tasks and policies for RRD stats -- jobs without start times, jobs in success-level buckets, duplicated jobs...
    • Delete ghost devices from scheduled tasks (stuck in active because they reported status). If they were from a query they should be deleted from the list, but if they were from a static targeting they should be moved to pending. http://community.landesk.com/support/message/17222#17222
  • Import ldms_delete_users, auto-reassign to single user or delete objects, give the user an option to decide what should be done. Alternatively, rewrite ldms_delete_users as a standalone tool...
  • NMAP as an XDD client add-on instead of a core-side piece... this implies some command-channel use and data-passing which are non-trivial, but entirely possible. On the plus side, it will also produce a much higher level of accuracy in OS fingerprinting.
  • Email
    • Be smart about hysteresis... maybe it should not send another email within a day unless the new email is more urgent than the last one it sent? Users going from daily runs to hourly runs are having trouble sorting the important emails from the repetitive info.
    • Maybe it's email-worthy when the unmanaged nodes data isn't fresh...
  • Web pages and reports
    • In RRD pages, give textual data supporting the graph. That'll probably push it over the edge to needing templated data instead of straight HTML.
    • Support proxy servers (nice to have for update check, will need for geo-location)
    • Give links to non-RFC1918 addresses on maps: GeoIP2Location
    • Drill-down from topology map with per-subnet listings of computers, including inventory and remote control links for them
  • Auto-import email from domain controller into ConsoleUser table. If UserName is like Directory and Email is blank, then import from AD. Requires AD credential input in UI.
  • Count duplicate serial number records and show a count before the number... e.g. "34 machines with serial number SystemSerialNumb, 2 machines with LYAC12"
  • More options, more smarts, more feedback, more efficiency...
  • Find why McAfee silently stops it from working properly when it's run as a scheduled task (error 0x9 in Windows scheduled task, immediate "success" as a LANDesk scheduled task, works great when run interactively from the start menu).
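
The email hysteresis idea from the list above can be sketched roughly as follows. This is only an illustration in Python, not code from ldms_core: the `EmailGate` class, the three-level severity scale, and the one-day quiet period are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical severity scale; higher means more urgent.
INFO, WARNING, CRITICAL = 1, 2, 3


@dataclass
class EmailGate:
    """Suppresses repeat emails unless they are more urgent than the last one."""
    quiet_period: timedelta = timedelta(days=1)
    last_sent: Optional[datetime] = None
    last_severity: int = 0

    def should_send(self, severity: int, now: datetime) -> bool:
        """Return True (and record the send) if this message should go out."""
        in_quiet_period = (
            self.last_sent is not None
            and now - self.last_sent < self.quiet_period
        )
        if in_quiet_period and severity <= self.last_severity:
            return False  # equal or lesser urgency inside the quiet period
        self.last_sent = now
        self.last_severity = severity
        return True
```

With hourly runs, a repeated warning would go out once per day, while a critical message would still get through immediately. A real implementation would also need to persist `last_sent` and `last_severity` between runs, which this sketch leaves out.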

ldms_core 3.1.5

Posted by Jack Coates Nov 3, 2008

I've posted a new ldms_core which has a lot of changes (good, bad, and otherwise).

 

  • added ability to report on stale vuln data (greater than 7 days will produce a warning)

  • installation directory changes -- the Nullsoft Scriptable Install System (NSIS) uninstallation routine assumes a separate directory per program, and was deleting things that shouldn't be deleted when users removed a program. To correct this, uninstall all Monkeynoodle programs and delete Program Files\Monkeynoodle before installing ldms_core.

  • detect dual boot systems via serial number

  • fixed &CullIPs again -- I had a function which seemed to do the right thing, but was actually deleting the oldest IP -- the downfall of using a small test set is that the expected result might happen for the wrong reasons.

  • Always check that what's supposed to be an IP is one -- failure to do so was causing spurious calls to DoNMAP and CullIPs

  • LDMS statistics graphing and trending via RRD. This is pretty cool; I'm just generating the graphics and putting them into ldmain\reports\ldms_core for now, but I'll throw together a nice index.html for it in a bit. LDSS stats are not being gathered yet.

  • hourglass cursor when setup is doing things

  • Unmanaged nodes culling (&CullUDD) failed when the discovered node was a WAP; skipping the attempt for now.
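
The "always check that what's supposed to be an IP is one" fix could look something like the sketch below. It is written in Python for illustration and is not the actual ldms_core code; `parse_ip` and `valid_ips` are hypothetical helper names, and the idea is simply to filter values before handing them to expensive routines such as DoNMAP or CullIPs.

```python
import ipaddress


def parse_ip(value):
    """Return an ip_address object, or None if value is not a valid IP."""
    try:
        return ipaddress.ip_address(value.strip())
    except (ValueError, AttributeError):
        # ValueError: malformed address; AttributeError: value is not a string
        return None


def valid_ips(values):
    """Keep only the entries that parse as IP addresses, normalized to strings."""
    parsed = (parse_ip(value) for value in values)
    return [str(ip) for ip in parsed if ip is not None]
```

Anything that fails to parse (blank fields, hostnames, placeholder text) is dropped up front, so the downstream calls only ever see real addresses.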

 

I'm still trying to decide if I want to spend time on a more formalized test procedure and/or beta period... if anyone has thoughts or would like to volunteer as a tester, please let me know.

 

I'm also having some difficulty with the Right Way(TM) to schedule repeated runs... in the past, I've asked the user to create a Windows scheduled task, but those quit working when the service account password changes. Currently I'm creating a LANDesk scheduled task, but those are finicky and are least likely to work on the cores which most need an automatic maintenance program. I could go to a long-running service, but memory consumption is high and that introduces a whole new set of potential problems. Ideas are welcome.


ldms_core ate ldms_status

Posted by Jack Coates Oct 17, 2008

LANDesk cores are getting more stable all the time, and I'm just not seeing the need to check that everything's working every 10 minutes any more. I've taken the service checking routines from ldms_status and put them into ldms_core 3.1.2 so that it can check and restart services once, when it runs, instead of all the time. I'll leave ldms_status online of course, but I don't see a lot of future for it at this time and will mark it obsolete real soon now.

Here's some roadmap for ldms_core, with discussion of the items:

  • It's been 7 days or more since you downloaded content -- I still need to find the best way to filter this. The publishdate column in vulnerabilities is one option, but it could produce false positives. select count(*) from vulnerability where publishdate > getdate()-7; if that value is 0, complain. Maybe even run vaminer.exe instead of just complaining?
  • List uninstallable patches? This will probably not be possible, at least without introducing a lot of bugs. XML blobs need to be extracted from the database and parsed, which seems like a lot of work for a little gain.
  • More progress indications, for instance after clicking Authorization button in setup. Not exciting.
  • Group patches by vendor? Probably not something I can do in this context.
  • Topology map. Gateways become nodes, devices sharing gateways are grouped in clouds around them, subnet masks decide size of circle. Core's gateway is in the center and traceroute hops to the other gateways are used to define the map. Use Perl::Graph to generate HTML? This is the one I want to work on next. select defgtwyaddr,count(address) from tcp where nullif(address,'') is not null group by defgtwyaddr
  • Kick out a Google Earth file to plot non-RFC1918 addresses on a map. The topology map and this map would both be dumped into the LANDesk reports file share, I suppose.
  • Check scheduled tasks and policies for status, alert if lots of jobs have bad status
  • Switch to MIME email so I can send multipart messages with attachments, such as those maps. That would also allow reworking of some of the log messages into a table format, using HTML. I prefer the retro look of Text::Table, but it probably can't be displayed properly by the average LANDesk admin's email program.
  • The new alerting system might be a better way to look for sync scan issues than using event viewer. If it ain't broke, don't fix it, but it's possible that database lookup would be faster.
  • Keep old information and show trend lines on a purty chart, http://search.cpan.org/src/CHARTGRP/Chart-2.4.1/README. This is going to involve storing data and managing time and I'm just not in a big hurry to reinvent that wheel.
  • Find a way to detect stuck LPM Event Listeners. Not even sure if this is a database or filesystem issue, but it needs to be found and fixed.
  • Convert into a long-running service? Probably not going to happen, I just can't come up with enough justifications to explain why I'd want to go through the hassle.
  • Cull Automatically Gathered software definitions with no installations. This is going to be hard, and may not be compatible with the changes planned for LDMS 9.0, but it also would be a lot of bang for console performance. Tempting.
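
As a starting point for the topology map item above, here is a small Python sketch of the grouping step that the SQL query performs: counting devices per default gateway while skipping blank addresses. The input is assumed to be (address, default gateway) pairs already fetched from the database; `devices_per_gateway` is a made-up name for illustration.

```python
from collections import Counter


def devices_per_gateway(rows):
    """Count devices with a non-empty address behind each default gateway."""
    counts = Counter()
    for address, gateway in rows:
        if address:  # same intent as: nullif(address, '') is not null
            counts[gateway] += 1
    return counts
```

Each gateway in the result would become a node on the map, with its count deciding how large a cloud of devices to draw around it.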

ldms_core 3.0.5

Posted by Jack Coates Oct 3, 2008

ldms_core home page

 

The new alert system in version 8.8 can get stacked up on low-performance cores, and it doesn't purge records unless you tell it to. ldms_core will now check that queue and purge records older than X days. There's also an email test button in the setup window, so you can make sure you've got email right.

 

I've also updated the manual.

 

 


ldms_status bug fix

Posted by Jack Coates Aug 2, 2008

http://www.droppedpackets.org/scripts/ldms_status

 

I was at a customer Friday, and discovered that when I hovered over ldms_status, it would turn red, flail madly, and keel over dead, sort of like Bowker holding a ball at first and trying to decide what to do with it. Turns out, their server had 1 Inventory Service thread and 4500 clients, so LDSCAN contained a backlog of 4300 inventory scans, and growing. Whenever CountPendingScans ran, ldms_status was correctly deciding to restart the inventory service. It was incorrectly ignoring several instructions to wait a few seconds, and it was incorrectly doing this over and over as long as I hovered over the icon, causing a flickering stream of balloon tips and event viewer messages and doing nothing for the backlog of scans.

 

Partially, this is a discovery of unexpected behavior from the poorly documented PerlTray... I'm discovering that the ToolTip subroutine runs repeatedly as long as you're hovering on the icon, and I suspect that it sets the Timer too, overriding my timer settings. But the other part is that I was sloppy about the tooltip in the first place, and was calling code on demand. So, I decoupled the CountPendingScans subroutine from ToolTip. I'd been meaning to do that anyway as a matter of good practice, so finding a bug is just good impetus to do it right.
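
The shape of that decoupling can be sketched as follows, in Python rather than the tray framework's own callbacks. Everything here is an assumption for illustration: the `ScanMonitor` class, the threshold of 200 pending scans, and the 600-second cooldown are not taken from ldms_status. The point is that the expensive count and any restart happen only on a timer, while the tooltip just reads a cached number.

```python
import time


class ScanMonitor:
    """Caches the pending-scan count so the tooltip never triggers work itself."""

    def __init__(self, count_pending, restart_service,
                 threshold=200, cooldown=600, clock=time.monotonic):
        self.count_pending = count_pending      # hypothetical: expensive scan count
        self.restart_service = restart_service  # hypothetical: restarts the service
        self.threshold = threshold              # assumed backlog limit
        self.cooldown = cooldown                # assumed seconds between restarts
        self.clock = clock
        self.pending = 0
        self.last_restart = None

    def on_timer(self):
        """Runs on a fixed schedule; the only place the expensive work happens."""
        self.pending = self.count_pending()
        now = self.clock()
        cooled_down = (self.last_restart is None
                       or now - self.last_restart >= self.cooldown)
        if self.pending > self.threshold and cooled_down:
            self.restart_service()
            self.last_restart = now

    def tooltip(self):
        """May be called repeatedly while hovering; reads the cached value only."""
        return f"{self.pending} scans pending"
```

Hovering over the icon can then call `tooltip()` as often as it likes without restarting anything, and the cooldown keeps a persistent backlog from causing a restart on every timer tick.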


ldms_core updated

Posted by Jack Coates Jun 14, 2008

 

ldms_core version 2.9 has been uploaded to http://www.droppedpackets.org/scripts/ldms_core

 

 

This release adds self-version checking, so that it will let you know when ldms_core is out of date. It also takes a first stab at providing download instructions for manual-download vulnerabilities that are detected, but haven't had their patches downloaded. This feature is still very stabby... while the instructions are always somewhere in the vulnerability or rule definition, they aren't always in a predictable place. I'm told that will improve as instructions are moved to the community, and eventually the output will be a list of community articles.
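
For the self-version check, the comparison step is the part that is easy to get wrong, because version strings do not sort correctly as plain text. The sketch below is a Python illustration with hypothetical function names; it assumes purely numeric dotted versions and does not cover how the latest version number is fetched.

```python
def parse_version(text):
    """Turn '3.1.5' into (3, 1, 5) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def is_out_of_date(installed, latest):
    """True if the installed version is older than the latest published one."""
    return parse_version(installed) < parse_version(latest)
```

Comparing tuples of integers means 2.10 is correctly treated as newer than 2.9, which a plain string comparison would get backwards.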

 

 

I also started to parse the core event viewer, assuming that ldms_core is running on the core; right now that's just warning you if you have a lot of full-scan-forced synchronization messages in a short period, but it will soon be used to inform of more exotic Bad Things(TM).
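
The "a lot of messages in a short period" test can be expressed as a sliding window over event timestamps. This Python sketch is an illustration only; `has_burst`, the limit, and the window length are assumptions, and reading the timestamps out of the event viewer is not shown.

```python
from datetime import datetime, timedelta


def has_burst(timestamps, limit, window):
    """True if more than `limit` events fall inside any span of length `window`."""
    ordered = sorted(timestamps)
    start = 0
    for end, current in enumerate(ordered):
        # Slide the left edge forward until the span fits inside the window.
        while current - ordered[start] > window:
            start += 1
        if end - start + 1 > limit:
            return True
    return False
```

For example, calling it with `limit=3` and `window=timedelta(minutes=10)` would flag four or more forced full-scan messages arriving within any ten minutes, while the same number spread across several hours would pass quietly.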

 

 

 

 

 


ldms_status updated

Posted by Jack Coates Jun 2, 2008

I found a stupid bug in the service stopping routine when I applied the 5-08 rollup to my core, so this fixes that. I also added a "check for the latest version" routine that I'd been working on anyway.


ldms_core updated

Posted by Jack Coates May 22, 2008

 

http://www.droppedpackets.org/scripts/ldms_core

 

 

This version includes the LANDesk version detection fix mentioned here: http://community.landesk.com/support/blogs/jack/2008/04/21/ldmscore-updated#comments-1370

 

 

As well as a sprightly and attractive system tray icon (well, how about the same LANDesk server icon I always use, only this time it's tinted blue). Now you'll know more easily if it's taking 40 hours to NMAP everything in the known world.

 

 

There's duplicate device name detection, based on a SQL statement Steve Dubish sent me some years ago. Thanks to Eric Hill again for checking it on Oracle.
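
The SQL statement itself isn't reproduced here, but the general idea of duplicate name detection can be sketched in Python as below. This is not the query ldms_core uses; `duplicate_names` is a hypothetical helper, and treating names as case-insensitive is an assumption.

```python
from collections import Counter


def duplicate_names(device_names):
    """Return {name: count} for names that appear more than once, ignoring case."""
    counts = Counter(
        name.strip().upper()
        for name in device_names
        if name and name.strip()
    )
    return {name: n for name, n in counts.items() if n > 1}
```

Blank names are skipped rather than reported as duplicates of each other, since an empty name usually points to a different problem.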

 

 

Next on the todo list is to read the statistics it's been recording into the Event Viewer and spit out some trend analysis... maybe even with a GRAPH.

 

 

I've also been goofing around with a database reindexing subroutine. It works as advertised, but there's a nasty little problem... database reindexing takes your database offline for X minutes (hours?). Worse yet, I can't seem to find a way to identify if the database needs reindexing (at least, one which is any less intrusive than just reindexing). The goal is to make the computer take care of itself when the people who are supposed to take care of it can't/don't/won't. In order to do that, I'll need to be able to identify that it needs reindexing and how long it's expected to take, and both queries need to be low impact. If anyone knows how to do either of those things on MSSQL or Oracle, I'd like to hear about it. 

 

 
