HUGE problems with 6.1.4

[Deleted User]
Hi,

I have just upgraded to Serverscheck 6.1.4 (Enterprise Edition MX2).

After restarting the server following the upgrade (as I normally do), things looked OK.

But after about 15 minutes, the CPU utilisation and pagefile size on the monitoring server went crazy!

The CPU sits constantly at 100%, and I have HUNDREDS of 'rrdtool.exe' processes running; more seem to be created every minute.

This server is a dual Xeon 2 GHz with 2 GB RAM, and it has been grinding to a halt since the upgrade.

Please can you help ASAP?



Thanks
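
One way to put a number on the runaway 'rrdtool.exe' processes described above is to count them at intervals. The following is only a minimal sketch and is not part of Serverscheck: it assumes the standard Windows `tasklist /FO CSV /NH` command is available, and the helper names `count_image` and `count_running` are hypothetical.

```python
import csv
import io
import subprocess


def count_image(tasklist_csv: str, image_name: str) -> int:
    """Count rows of `tasklist /FO CSV /NH` output whose image name matches."""
    rows = csv.reader(io.StringIO(tasklist_csv))
    # The first CSV column is the image name; compare case-insensitively.
    wanted = image_name.lower()
    return sum(1 for row in rows if row and row[0].lower() == wanted)


def count_running(image_name: str = "rrdtool.exe") -> int:
    """Run tasklist (Windows only) and count the matching processes."""
    out = subprocess.run(
        ["tasklist", "/FO", "CSV", "/NH"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return count_image(out, image_name)
```

Calling `count_running()` once a minute on the monitoring server and noting the result would show whether the process count really keeps climbing.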

Comments

  • Administrator
    See the knowledge base:

    http://kb.serverscheck.com/index.php?page=index_v2&id=14&c=5

    Make sure that RRDTool.exe has been updated (as part of the new upgrade; see the release notes).
  • [Deleted User]
    Hi,

    I've stopped the service, deleted the .graph file and restarted the service, but the CPU still remains at 100%.

    I've not updated RRDTool.exe as you specified, because I can't find the update or the release notes. (Can you post a link to them, please?)

    Maybe it will return to normal after that update.

    Thanks
  • Administrator
    You can download RRDTool.exe from the RRD website or from the following URL:

    http://www.serverscheck.net/files/rrdtool.exe
  • [Deleted User]
    Hi again,

    I downloaded the file you linked and copied it into the Serverscheck_databases folder.

    I stopped the services, deleted the .graph files and finally restarted the server for good measure.

    It has now been running for about 1.5 hours, during which the CPU has remained at 100% and the pagefile has continued to grow (currently at 1.2 GB).

    Any advice appreciated.

    Thanks
  • Administrator
    And is the 100% CPU related to RRDTOOL.exe?

    If so, you might try the following:

    http://kb.serverscheck.com/index.php?page=index_v2&id=34&c=6
  • Administrator
    If it still happens, it might be related to one of the following:

    - the increased number of threads in MX2 (compared to release 5)
    - an error in the generated graph files

    To test the second reason:

    - Kill the s-graphs.exe process
    - Open a graph file in the queue folder
    - From a command prompt in the serverscheck_databases directory, run the commands in that file

    (especially the lines starting with "graph")
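
The manual test above could be scripted roughly as follows. This is a sketch under assumptions, not a Serverscheck tool: the thread does not show the queue file format, so the code assumes each line holds the arguments for one rrdtool call, and the names `graph_lines` and `run_graph_lines` are hypothetical.

```python
import subprocess
from pathlib import Path


def graph_lines(queue_text: str) -> list[str]:
    """Return the non-empty lines that start with 'graph' (case-insensitive)."""
    return [
        line.strip()
        for line in queue_text.splitlines()
        if line.strip().lower().startswith("graph")
    ]


def run_graph_lines(queue_file: Path, db_dir: Path, tool: str = "rrdtool") -> None:
    """Run each 'graph' line through the tool from inside db_dir.

    Assumption: each line is the argument string for one rrdtool invocation.
    """
    for line in graph_lines(queue_file.read_text()):
        # shell=True so the quoting inside the line is interpreted by the shell.
        result = subprocess.run(
            f"{tool} {line}",
            shell=True,
            cwd=db_dir,
            capture_output=True,
            text=True,
        )
        # Print the exit code and any error text so a failing line stands out.
        print(result.returncode, line[:60], result.stderr.strip())
```

Running the lines one at a time like this makes it easier to spot which graph file, if any, produces an error or hangs.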
  • [Deleted User]
    OK, I've tried to carry out the diagnostic steps you advised.

    1. I added the <X>graph entry to the conf file (although, to be honest, I would like these 2 additional graphs). This made no difference to the CPU utilisation.

    I killed the s-graphs.exe process, and this has made a big difference to the CPU load; it is now running at approx. 40-60%.

    I tried to perform the graph command, but had no luck: it claimed the update file could not be found. :(
    (If you have detailed instructions, I can try again.)

    I'll leave the s-graphs.exe process off overnight to see how the CPU load is, hoping that in the meantime you may have a fix.

    Thanks in advance.
  • Administrator
    No other users have reported this, and without knowing what the issue is, we can't release a fix.

    You might want to try a manual upgrade (that is, without using the upgrade.exe tool) as described in the knowledge base. A version conflict might be the cause of this issue.
  • Administrator
    "I tried to perform the Graph command, but had no luck - it claimed the update file could not be found. :("

    => Just to verify:

    1/ You were at the command prompt in the serverscheck_databases directory

    2/ You typed the command

    rrdtool update .....

    If so, please log in to your helpdesk account and send screenshots via email to our tech support team.
  • [Deleted User]
    Morning,

    After I killed the s-graphs.exe process and left the server overnight, the CPU utilisation stayed constant at around 20% (as it was before the upgrade).

    Today I have tried the 'rrdtool graph' command for several of the graph files. The CPU still remains around 20%.

    Obviously, since I stopped the s-graphs.exe process, I don't have any graphs from yesterday afternoon or overnight.

    I can raise a support call with your helpdesk if that's preferred. Do you have any documentation on how to perform a manual upgrade (or reinstall) without losing all my current checks?

    Thanks
  • [Deleted User]
    Hi,

    I followed your instructions for uninstalling Serverscheck and reinstalling it from scratch.

    It appears to have fixed the CPU issue (I haven't left it overnight yet), but I've now only got the MX1 edition.

    I copied the files I was sent for upgrading to the MX2 edition onto the server, but it's not working. It's still showing as MX1.

    Can you help?

    Thanks
  • Administrator
    You forgot to copy the serverscheck-ins.lic file.
  • [Deleted User]
    Good morning.

    Thanks - after I copied that file back and restarted the server, I now have the MX2 edition. :)

    I'm still having some issues after the reinstall, though.

    Most of the checks are not being performed. The last up time shows the time and date from 2 days ago (before the reinstall), BUT if I open each check and click the 'Test Settings' button, I DO get a correct response from each check.

    For example, a power outage turned off a server overnight. I got no alert, and Serverscheck shows it as OK and green; however, when I click 'Test Settings' on that check, it correctly reports that it is DOWN.

    Strangely enough, only the CPU checks are being performed correctly and updating the main rules view.

    Any idea where the problem is?

    Thanks
  • Administrator
    Please run it in Debug mode. Then log in to the helpdesk and send the debug file to tech support.



    http://kb.serverscheck.com/index.php?page=index_v2&id=33&c=6
This discussion has been closed.