Runaway Job

zanni
I'm having problems with the EventLog check. I've set it up to look for a specific string in the Windows EventLog and have activated it in the ServersCheck web interface. However, the check never seems to run. I've put ServersCheck into debug mode and I'm running only the EventLog check. It appears that the monitoring_rule.exe process continuously dies and restarts. Also, my EventLog check is labeled as a runaway job. Here are a few lines of the debug output:



# M Mon Dec 5 10:16:00 2005 Monitoring_manager instances allowed: 1

# M Mon Dec 5 10:16:00 2005 Monitoring_manager instances counted: 1

#

# M Mon Dec 5 10:16:00 2005 Loading language file: EN.lang

# M Mon Dec 5 10:16:05 2005 ServersCheck Monitoring Manager

# M Mon Dec 5 10:16:05 2005 ENTERPRISE version 5.11.4

# M Mon Dec 5 10:16:05 2005 Started OK

# 1 Mon Dec 5 10:16:06 2005 Starting Monitoring Rule Thread 1

# 1 Mon Dec 5 10:16:06 2005 ServersCheck Monitoring Component

# 1 Mon Dec 5 10:16:06 2005 ENTERPRISE version 5.12.0

# 1 Mon Dec 5 10:16:06 2005 monitoring_rule instances allowed: 2

# 2 Mon Dec 5 10:16:06 2005 Starting Monitoring Rule Thread 2

# 2 Mon Dec 5 10:16:06 2005 ServersCheck Monitoring Component

# 2 Mon Dec 5 10:16:06 2005 ENTERPRISE version 5.12.0

# 2 Mon Dec 5 10:16:06 2005 monitoring_rule instances allowed: 2

# 1 Mon Dec 5 10:16:06 2005 monitoring_rule instances counted: 2

# 2 Mon Dec 5 10:16:06 2005 monitoring_rule instances counted: 2

# M Mon Dec 5 10:16:22 2005 Zanni_App_EventlogEVENTLOG job queued

# 2 Mon Dec 5 10:16:22 2005 Skipping D:\Program Files\ServersCheck_Monitoring\jobs\AZanni_App_EventlogEVENTLOG.0.31658935546875

# 1 Mon Dec 5 10:16:22 2005 Zanni_App_EventlogEVENTLOG - Starting check

# M Mon Dec 5 10:16:25 2005 Runaway job: AZanni_App_EventlogEVENTLOG.0.5950927734375

# M Mon Dec 5 10:16:59 2005 Cleaning job: AZanni_App_EventlogEVENTLOG.0.5950927734375

# M Mon Dec 5 10:17:01 2005 Zanni_App_EventlogEVENTLOG job queued

# 2 Mon Dec 5 10:17:02 2005 Zanni_App_EventlogEVENTLOG - Starting check

# M Mon Dec 5 10:17:04 2005 Runaway job: AZanni_App_EventlogEVENTLOG.0.17095947265625

# M Mon Dec 5 10:17:37 2005 Cleaning job: AZanni_App_EventlogEVENTLOG.0.17095947265625

# M Mon Dec 5 10:17:39 2005 Zanni_App_EventlogEVENTLOG job queued

# M Mon Dec 5 10:17:55 2005 Monitoring Rule Process Watcher: 0 found

# M Mon Dec 5 10:17:55 2005 Monitoring_rule.exe seems to have died; it will now be restarted

# R2 Mon Dec 5 10:17:55 2005 Starting Monitoring Rule Thread R2

# R2 Mon Dec 5 10:17:56 2005 ServersCheck Monitoring Component

# R2 Mon Dec 5 10:17:56 2005 ENTERPRISE version 5.12.0

# R2 Mon Dec 5 10:17:56 2005 monitoring_rule instances allowed: 2

# R2 Mon Dec 5 10:17:56 2005 monitoring_rule instances counted: 1

# R2 Mon Dec 5 10:17:56 2005 keyfile1 AZanni_App_EventlogEVENTLOG.0.233154296875 Zanni_App_EventlogEVENTLOG 1

# R2 Mon Dec 5 10:17:56 2005 Zanni_App_EventlogEVENTLOG - Starting check

# R3 Mon Dec 5 10:18:01 2005 Starting Monitoring Rule Thread R3

# R3 Mon Dec 5 10:18:01 2005 ServersCheck Monitoring Component

# R3 Mon Dec 5 10:18:01 2005 ENTERPRISE version 5.12.0

# R3 Mon Dec 5 10:18:01 2005 monitoring_rule instances allowed: 2

# R3 Mon Dec 5 10:18:01 2005 monitoring_rule instances counted: 1

# M Mon Dec 5 10:18:02 2005 Runaway job: AZanni_App_EventlogEVENTLOG.0.233154296875

# M Mon Dec 5 10:18:35 2005 Cleaning job: AZanni_App_EventlogEVENTLOG.0.233154296875

# M Mon Dec 5 10:18:37 2005 Zanni_App_EventlogEVENTLOG job queued

# R3 Mon Dec 5 10:18:38 2005 Zanni_App_EventlogEVENTLOG - Starting check

# M Mon Dec 5 10:18:41 2005 Runaway job: AZanni_App_EventlogEVENTLOG.0.365753173828125

# M Mon Dec 5 10:19:14 2005 Cleaning job: AZanni_App_EventlogEVENTLOG.0.365753173828125

# M Mon Dec 5 10:19:16 2005 Zanni_App_EventlogEVENTLOG job queued

# M Mon Dec 5 10:19:34 2005 Monitoring Rule Process Watcher: 0 found

# M Mon Dec 5 10:19:34 2005 Monitoring_rule.exe seems to have died; it will now be restarted

....

....

....



Any ideas on what might be causing this?

Thanks
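For anyone triaging similar output, here is a minimal sketch (not a ServersCheck tool) that tallies the events in a debug log like the one above and measures how long after "Starting check" each job is flagged as a runaway. It assumes every log line has the form `# <thread> <timestamp> <message>` as in the excerpt, and that month and weekday names are in English; the function names `parse_line` and `summarize` are made up for this example.

```python
import re
from datetime import datetime

# Assumed line layout, taken from the excerpt: "# <thread> <timestamp> <message>"
LINE_RE = re.compile(
    r"^#\s+(?P<thread>\S+)\s+"
    r"(?P<ts>\w{3} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4})\s+"
    r"(?P<msg>.*)$"
)


def parse_line(line):
    """Return (thread, timestamp, message), or None for non-matching lines."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    # Collapse any repeated spaces before parsing the timestamp.
    ts = datetime.strptime(" ".join(m["ts"].split()), "%a %b %d %H:%M:%S %Y")
    return m["thread"], ts, m["msg"]


def summarize(lines):
    """Count key events and collect start-to-runaway delays in seconds."""
    counts = {"started": 0, "runaway": 0, "restarts": 0}
    delays = []
    last_start = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        _thread, ts, msg = parsed
        if msg.endswith("- Starting check"):
            counts["started"] += 1
            last_start = ts
        elif msg.startswith("Runaway job:"):
            counts["runaway"] += 1
            if last_start is not None:
                delays.append((ts - last_start).total_seconds())
                last_start = None
        elif "seems to have died" in msg:
            counts["restarts"] += 1
    return counts, delays


# Three lines copied from the debug output above.
sample = """\
# 1 Mon Dec 5 10:16:22 2005 Zanni_App_EventlogEVENTLOG - Starting check
# M Mon Dec 5 10:16:25 2005 Runaway job: AZanni_App_EventlogEVENTLOG.0.5950927734375
# M Mon Dec 5 10:17:55 2005 Monitoring_rule.exe seems to have died; it will now be restarted
""".splitlines()

counts, delays = summarize(sample)
print(counts, delays)  # {'started': 1, 'runaway': 1, 'restarts': 1} [3.0]
```

To run it against a full capture, pass the lines of the saved debug file to `summarize` instead of `sample`.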


Comments

  • Administrator
    Indeed, the EventLog check does not seem to respond within the maximum response time of 30 seconds.



    In the current version, the timeout cannot be changed.
  • Administrator
    One of our engineers has made a special build for you with a timeout of 60 seconds instead of 30 seconds.



    The component in question can be downloaded here:

    http://www.serverscheck.com/files/monitoring_manager.zip



    Let us know if that solves the issue.
  • I tried the new build, but received more of the same in the debugging output. Here's an excerpt:



    # M Mon Dec 5 13:49:08 2005 Monitoring_rule.exe seems to have died; it will now be restarted

    # R17 Mon Dec 5 13:49:09 2005 Starting Monitoring Rule Thread R17

    # R17 Mon Dec 5 13:49:09 2005 ServersCheck Monitoring Component

    # R17 Mon Dec 5 13:49:09 2005 ENTERPRISE version 5.12.0

    # R17 Mon Dec 5 13:49:09 2005 monitoring_rule instances allowed: 2

    # R17 Mon Dec 5 13:49:09 2005 monitoring_rule instances counted: 1

    # R18 Mon Dec 5 13:49:14 2005 Starting Monitoring Rule Thread R18

    # R18 Mon Dec 5 13:49:14 2005 ServersCheck Monitoring Component

    # R18 Mon Dec 5 13:49:14 2005 ENTERPRISE version 5.12.0

    # R18 Mon Dec 5 13:49:14 2005 monitoring_rule instances allowed: 2

    # R18 Mon Dec 5 13:49:14 2005 monitoring_rule instances counted: 2

    # R19 Mon Dec 5 13:49:19 2005 Starting Monitoring Rule Thread R19

    # R19 Mon Dec 5 13:49:19 2005 ServersCheck Monitoring Component

    # R19 Mon Dec 5 13:49:19 2005 ENTERPRISE version 5.12.0

    # R19 Mon Dec 5 13:49:19 2005 monitoring_rule instances allowed: 2

    # R19 Mon Dec 5 13:49:19 2005 monitoring_rule instances counted: 3

    # M Mon Dec 5 13:49:50 2005 Cleaning job: AZanni_App_EventlogEVENTLOG.0.928741455078125

    # M Mon Dec 5 13:49:52 2005 Zanni_App_EventlogEVENTLOG job queued

    # R17 Mon Dec 5 13:49:52 2005 Zanni_App_EventlogEVENTLOG - Starting check

    # M Mon Dec 5 13:49:55 2005 Runaway job: AZanni_App_EventlogEVENTLOG.0.597259521484375

    # M Mon Dec 5 13:50:54 2005 Monitoring Rule Process Watcher: 1 found

    # M Mon Dec 5 13:50:54 2005 Monitoring_rule.exe seems to have died; it will now be restarted



    Also, I can see from the security event logs on the target system (the one whose EventLogs I'm trying to check) that the domain user is successfully logging on to and off the machine during the checks. Usually the ID is logged in for only a second. This occurs about once a minute.



    The ~ServersCheck\checklogs\Zanni_app_eventlog.log file contains the following entries, which seem to indicate it is actually getting a "DOWN" status, which is what I'm expecting. (The check is looking for the Symantec startup message in the application event log.)



    Mon Dec 5 13:48:48 2005 DOWN - Information event on zanni at 12/5/2005 12:15:39 PM (GMT-5) - Source: "Symantec AntiVirus". Event: Symantec AntiVirus services startup was successful.



    Mon Dec 5 13:49:53 2005 DOWN - Information event on zanni at 12/5/2005 12:15:39 PM (GMT-5) - Source: "Symantec AntiVirus". Event: Symantec AntiVirus services startup was successful.



    Mon Dec 5 13:51:07 2005 DOWN - Information event on zanni at 12/5/2005 12:15:39 PM (GMT-5) - Source: "Symantec AntiVirus". Event: Symantec AntiVirus services startup was successful.



    Mon Dec 5 13:52:14 2005 DOWN - Information event on zanni at 12/5/2005 12:15:39 PM (GMT-5) - Source: "Symantec AntiVirus". Event: Symantec AntiVirus services startup was successful.



    One last thing: the "All Rules View" of ServersCheck shows the status as "DOWN?". The "Group View" shows it as "DOWN".
  • Administrator
    Do you get any messages in the black console window that do not appear in the debug log file?
  • From the monitoring_manager, I am receiving these error messages:



    D:\Program Files\ServersCheck_Monitoring>monitoring_manager > debug2.txt

    # Error: More monitoring_rule instances running then licensed for. Killing current.

    Socket could not be created : Unknown error

    Socket could not be created : Unknown error



    I'm not receiving any messages from the s-alerts.exe command window.
  • Administrator
    What other checks do you have defined? Do you run a personal firewall or similar?



    ServersCheck fails because of a Winsock issue, meaning that it cannot create a socket-based connection.



    What is your OS and Service Pack?
  • ServersCheck is running on Windows 2003, SP1.

    It's attempting to read the event logs of a Win XP Pro, SP2.



    I'm not sure if this helps, but in an attempt to figure out where things are going wrong, I created an additional rule to check Ping status, so only the EventLog and Ping rules were running. Both log files in the checklogs directory (for the EventLog and Ping checks) are updated continuously with the correct information, but the status is not reflected on the ServersCheck page.
  • Administrator
    The PING check is causing the issue reported:

    "Socket could not be created : Unknown error"



    Please disable the PING check at this stage and run only the event log check with the new build of the monitoring manager. Let it run for 5 minutes in Debug Mode and send me the output again.
  • The Ping check has been turned off so that only the EventLog check is running. I've sent an e-mail to techsupport@serverscheck.com with the logging information.
  • Administrator
    Thread closed online and continued in the Helpdesk system.
  • The problem turned out to be an issue with the enterprise.conf file. Using the web interface, I had configured ServersCheck to send events to a central syslog server. I was just testing this functionality and removed the address for the syslog server shortly afterward. When I blanked out the address in the interface, it updated the enterprise.conf file, but did not put the trailing <X>'s at the end of the Syslog: line entry. I edited the file manually and put the <X>'s back. After that, it worked fine. No more socket problems or monitoring rules that don't respond.
  • Administrator
    Thanks for reporting this very useful information.



    I will forward it to the development team.
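
For readers who hit the same symptom, a minimal sketch of how one might check for the condition described in the resolution. It rests on assumptions the thread does not confirm: that enterprise.conf is line-oriented text, that the entry starts with `Syslog:`, and that a healthy entry ends with at least one `<X>`. The exact number of `<X>` markers is not stated in the thread, so the sketch only reports suspect lines and changes nothing. The function name `suspect_syslog_lines` is hypothetical.

```python
def suspect_syslog_lines(conf_text, key="Syslog:", terminator="<X>"):
    """Return (line_number, line) pairs for entries that start with `key`
    but do not end with `terminator`. Both defaults are assumptions based
    on the description in this thread, not on documented file syntax."""
    suspects = []
    for number, line in enumerate(conf_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith(key) and not stripped.endswith(terminator):
            suspects.append((number, stripped))
    return suspects


# Hypothetical contents; the real layout of enterprise.conf may differ.
healthy = "Syslog: <X><X>"
broken = "Syslog: "

print(suspect_syslog_lines(healthy))  # []
print(suspect_syslog_lines(broken))   # [(1, 'Syslog:')]
```

Back up enterprise.conf before editing it by hand, and compare the entry against a known-good copy rather than relying on this check alone.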
This discussion has been closed.