Monitoring Checks - Stopped

[Deleted User]
Hi,

I'm running ServersCheck 5.12.0, which had been running fine for the past few months.



However yesterday when I came into the office, all the checks had stopped over the weekend.

I rebooted the server and it appeared to return to its working state.





Overnight, however, we had a real server failure but were not notified because, like before, ServersCheck had simply stopped running the checks!



I've checked it today (and haven't rebooted it, in case you need logs or want me to verify anything), and found the following:



1. The time stamp on the main rule page shows the current time.



2. If I choose a manual 'Test Settings' on each rule, it DOES reply with the correct check details.

The checks that are down do reply with a status of down, but only on the manual check.



3. The CPU and Pagefile etc on the monitoring server are running at normal thresholds, and there are no excessive processes.



4. The Serverscheck Services are both started.



5. The Monitoring Manager is on the desktop and showing the current time and 'Process watching 6 rules'



6. It's not an IE caching issue. Caching is disabled, and even from the server console the rules and alerts are not working.



7. There are no Event Log entries reporting any ServersCheck problems.

(There are some relating to DCOM connections, but I'm sure they are unrelated.)



This has happened several times in the past.



Thanks

Comments

  • Administrator
    ServersCheck uses DCOM for Windows-based checks (see the knowledge base), so DCOM issues cannot be ruled out.

    As with any Windows-based computer, we recommend rebooting the machine periodically.


  • Administrator
    A periodic reboot can be set up as follows:



    * Start the Task Scheduler. Under Windows 2000/XP this is located in Start Menu > Programs > Accessories > System Tools.



    * Check that the Task Scheduler is running. You can do this by opening the Advanced menu and seeing whether it lists 'Stop Using Task Scheduler'. If it does, the Task Scheduler is running. If it says 'Start Using Task Scheduler', the Task Scheduler is not running; click on that option to start it.



    * Now create a new task by selecting File > New > Scheduled Task. Rename the new task to something like Reboot, then double-click on it.



    * First, let's set the time. Click on the Schedule tab and select Weekly for 'Schedule Task'. Now select the day and time: normally a weekend day (say, Sunday) and a time at night (say, 1am), to ensure no one is using the system at the time.



    * Now set what is to run. Click back on the Task tab. Click 'Set password' and enter the password you use to log on to Windows. This ensures the reboot happens even if you are not logged on at the time.



    * In the field 'Run' you now enter:

    For Windows 2000/XP/2003 Server - SHUTDOWN.EXE -r -f -t 01



    The shutdown utility can be downloaded from the following URL:

    http://www.serverscheck.net/files/shutdown.zip



    A new service is being developed that will watch the refresh status of checks and if needed restart the service to overcome Windows related issues.
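The Task Scheduler steps above could also be scripted. The following is a minimal sketch, assuming the `schtasks` command-line tool is present on the monitoring server; the helper name `build_reboot_task_command` is hypothetical, and the task name, day, and time are just the example values from the steps. It only builds the argument list so it can be reviewed first; nothing is scheduled unless the list is passed to `subprocess.run` on the server, and the account and password options would still have to be added.

```python
def build_reboot_task_command(task_name="Reboot", day="SUN", start_time="01:00:00"):
    """Return a schtasks argument list for a weekly forced reboot.

    Assumption: the accepted start-time format differs between Windows
    versions, so check `schtasks /Create /?` on the target machine
    before using the default value.
    """
    # Same command line as entered in the 'Run' field in the steps above.
    reboot_command = "SHUTDOWN.EXE -r -f -t 01"
    return [
        "schtasks", "/Create",
        "/TN", task_name,        # task name
        "/TR", reboot_command,   # program to run
        "/SC", "WEEKLY",         # schedule type
        "/D", day,               # day of the week
        "/ST", start_time,       # start time
    ]


if __name__ == "__main__":
    # Print the command for review instead of running it.
    print(" ".join(build_reboot_task_command()))
```

Returning a list rather than a single string avoids quoting problems with the embedded `SHUTDOWN.EXE` command line if the list is later handed to `subprocess.run`.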
  • [Deleted User]
    Hi,

    Thanks for the reply - however planning a scheduled reboot of our monitoring server doesn't exactly fill me with confidence. :(



    This actual server (and other servers) should be able to run for weeks and months without a reboot. If this software is requiring regular reboots then surely there is an issue with the software or the processes it's running that needs to be resolved.



    Currently the monitoring screen is showing incorrect checks but no notification has indicated to me that the actual monitoring server has a problem.




  • Administrator
    You are telling us that there is an issue with DCOM.



    DCOM is a Windows component that ServersCheck uses for Windows based checks (transport layer for WMI).



    The reboot option is a tip only.



    We have an optional fail-over module which monitors a primary installation; if that fails, the backup module takes over.
  • Administrator
    Also note that the advice to reboot a Windows server weekly does not originate with us; it is generally accepted within the Windows server admin community.
  • Administrator
    I raised your issue with development and they told me that a developer is currently working on a separate service called the "ServersCheck Monitoring Watcher".



    As you know, the software already contains quite a few watchers that detect potential issues and correct them so that the software can continue normally.

    The issue you described cannot be tackled inside the service itself. Therefore the free add-on "ServersCheck Monitoring Watcher" is going to be released. This service will watch the monitoring service and the configuration service. If the built-in rules fail, the service will be restarted automatically.



    In terms of release schedule: this will be part of the 6.0.3 release and is planned for end of next week.
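The idea of an external watcher can be illustrated with a short sketch. This is not the ServersCheck Monitoring Watcher itself, whose internals are not described in this thread; it only shows one way a separate process might decide that checks have stopped refreshing. The function names, the rule names, and the 10-minute threshold are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def check_is_stale(last_refresh, now, max_age=timedelta(minutes=10)):
    """Return True when a rule has not refreshed within max_age."""
    return now - last_refresh > max_age


def decide_action(last_refreshes, now, max_age=timedelta(minutes=10)):
    """Return 'restart' if every rule is stale, otherwise 'ok'.

    last_refreshes maps a (hypothetical) rule name to the time of its
    last completed check. An empty mapping is treated as 'ok' because
    there is nothing to judge.
    """
    if last_refreshes and all(
        check_is_stale(ts, now, max_age) for ts in last_refreshes.values()
    ):
        return "restart"
    return "ok"


if __name__ == "__main__":
    now = datetime.now()
    refreshes = {"ping-server1": now - timedelta(hours=3)}
    print(decide_action(refreshes, now))
```

Requiring every rule to be stale, rather than any single one, keeps one slow check from triggering a restart of the whole service.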
  • [Deleted User]
    OK thanks, that would be ideal.



    I have even more problems today! :(



    I installed Windows Server 2003 SP1 on my monitoring server last night, and since the reboot ALL my performance monitor checks fail.

    Each check states: "Performance counter retrieval failed with error code: ERROR: 800007D3 - The data item has been added to the Query but has not been validated"



    Any ideas how I can resolve this?



    Thanks
  • Administrator
    We checked and the error returned by Windows' PDH is not documented. It is one of those errors that Microsoft returns but does not tell you why and what to do.



    Can you access the performance counters through Performance Monitor (Perfmon)? Add the one you want to monitor in there and then try again.


  • [Deleted User]
    Hello,



    Indeed, when I select the object through Performance Monitor I do get a valid response.

    I've even triple-checked the check's 'Performance counter' setting and it is correct.

    If I deliberately enter the wrong counter I get an 'ERROR C0000BC0 - Performance counter retrieval failed'

    As soon as I enter the correct counter, it returns to ERROR 800007D3.



    I've also upgraded my Serverscheck to 6.0.2 hoping it would help...it didn't but the new interface looks cool.



    Any more ideas?



    Thanks
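The two error codes quoted above follow the usual Windows status-code layout, where the top two bits carry the severity. The following is a minimal sketch that decodes them, assuming the symbolic names match the PDH definitions in the Windows SDK header `pdhmsg.h` (worth verifying against your own SDK copy); the helper name `describe_pdh_status` is hypothetical.

```python
# Symbolic names assumed to match the Windows SDK header pdhmsg.h.
PDH_STATUS_NAMES = {
    0x800007D3: "PDH_CSTATUS_ITEM_NOT_VALIDATED",
    0xC0000BC0: "PDH_CSTATUS_BAD_COUNTERNAME",
}

# The top two bits of a Windows status code encode its severity.
SEVERITY_NAMES = {0: "success", 1: "informational", 2: "warning", 3: "error"}


def describe_pdh_status(code):
    """Return a readable description of a 32-bit PDH status code."""
    severity = SEVERITY_NAMES[(code >> 30) & 0x3]
    name = PDH_STATUS_NAMES.get(code, "unknown")
    return f"0x{code:08X}: {name} ({severity})"


if __name__ == "__main__":
    for code in (0x800007D3, 0xC0000BC0):
        print(describe_pdh_status(code))
```

Read this way, C0000BC0 is an error about the counter name itself, while 800007D3 is a warning-level status on a counter whose path was accepted, which is consistent with the behaviour described above.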
  • Administrator
    Can you check on a remote computer to see if that works?

    Has SP1 changed anything in your security settings?
  • [Deleted User]
    Hi again,



    Some progress which I hope will help us diagnose the problem.



    Every check I run through Windows Performance Monitor runs fine - whether the check is to a local or remote machine.



    When I choose the same check through Serverscheck; the LOCAL check returns a value - so is successful.

    The REMOTE check returns the 800007D3 error - so fails.



    I've tried different logins (Domain Admin, Local Admin) with no difference.

    I've also tried different checks; they fail too.



    One interesting point is... If I choose the '% Processor Time' counter for a REMOTE machine - the check fails.

    However, if I choose the built-in CPU check, the same machines report back successfully.

    I checked and rechecked the format for typing mistakes or incorrect spacing, and it is all fine.



    What is different between the serverscheck CPU check and the PERFMON check?

    I really want to get this working again.



    Thanks
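Since spacing and backslashes in counter paths are easy to get wrong, the following is a small sketch that assembles a path in the standard Windows performance counter syntax, `\\Machine\Object(Instance)\Counter`. The helper name `build_counter_path` and the machine name `SERVER01` are hypothetical, and whether ServersCheck expects exactly this form in its 'Performance counter' field is an assumption.

```python
def build_counter_path(counter, perf_object, instance=None, machine=None):
    """Build a Windows performance counter path.

    A machine name is only prefixed for remote counters; the instance
    part is omitted for objects that have no instances.
    """
    path = ""
    if machine:
        path += "\\\\" + machine          # leading \\Machine for remote paths
    path += "\\" + perf_object
    if instance is not None:
        path += "(" + instance + ")"
    path += "\\" + counter
    return path


if __name__ == "__main__":
    print(build_counter_path("% Processor Time", "Processor", "_Total", "SERVER01"))
```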
  • Administrator
    The CPU check relies on the WMI protocol to retrieve the values for remote systems.



    The performance counters are accessed through a different Windows layer. I need to check with development on what protocols are involved for the performance counters.
  • Administrator
    The Performance Counter makes a Netresource connection to the remote computer as follows:

    \\Computername\ipc$



    Verify that you can make the above connection to the remote computer under the account of the service.
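One way to try that connection from a script is sketched below, assuming the standard Windows `net use` command is available; the helper names and the machine name `SERVER01` are hypothetical. Note that the script connects as whichever account runs it, so to test the service's account it would have to be started under that account.

```python
import subprocess


def ipc_share_path(computer_name):
    """Return the UNC path of the IPC$ share on computer_name."""
    return "\\\\" + computer_name + "\\ipc$"


def can_connect_to_ipc(computer_name):
    """Attempt a 'net use' connection to the IPC$ share.

    Returns True when the command exits with code 0. Only meaningful
    on Windows, and only for the account running this script.
    """
    result = subprocess.run(
        ["net", "use", ipc_share_path(computer_name)],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


if __name__ == "__main__":
    # SERVER01 is a placeholder; replace it with a real remote machine.
    print(ipc_share_path("SERVER01"))
```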
  • [Deleted User]
    Hi,

    Connection to the IPC$ is not enabled on Windows Server 2003 by default, it has to be enabled by a registry change.



    Anyway, having tried your suggestion and found that connecting to IPC$ failed for servers both with and without SP1, I decided to remove the Service Pack from my monitoring server.





    After the uninstall was complete and I'd rebooted the server ALL PERFCOUNT checks have started to work again!

    I've verified them against the System Monitor graph, and indeed I'm back in business.



    Whether or not 'Stronger defaults and privilege reduction on services' that SP1 applies had anything to do with the DCOM connections to the remote servers I don't know - but maybe your support desk can look into these issues?

    Surely I'm not the only person wanting to run Serverscheck on a Win2K3 server with SP1...?



    Finally, even though my PERFCOUNT checks are all running again, I still cannot connect to the IPC$ share of each remote server.



    Has Serverscheck been tested with Windows Server 2003 + SP1?



    Thanks
  • Administrator
    Yes, it was tested. Even our demo server runs Windows 2003 SP1.



    (it's the one you can see when going to http://www.serverscheck.com/livecapture.asp)



    ServersCheck also runs on Windows Vista (though we still verify it against every Candidate Release)
This discussion has been closed.