SNMPv2c not stable

mrbigdog
edited July 2013 in InfraSensing Sensors
We have purchased three v5 gateways to monitor the temperature in our server rooms.



I've configured them as SNMPv2c and provided our standard read/write communities. The SNMP "process" on the sensor does not seem very stable. After a reboot we can poll the temperature but after a random amount of time it stops working.



The sensors were delivered with 3.0 firmware and I've tried upgrading to 3.03 & 3.06 but they don't help.



Any suggestions on how I can troubleshoot this further?


Comments

  • Administrator
    Please try a factory reset and update to 3.06.
  • mrbigdog
    That was the first thing I tried. I've repeated this a few times and it's always the same issue.



    Right now all three of my devices are on 3.03 and the SNMP polling has been stable for 12 hours.
  • Administrator
    How exactly does it stop working? Does it stop returning sensor values?
  • mrbigdog
    We only poll .1.3.6.1.4.1.17095.3.2.0 to retrieve the temperature, using a mix of get and getnext queries (i.e. not discoveries or walks).



    When it's working normally, tcpdump shows SNMP packets being sent to and received from the unit. Once the process has stalled, we get no replies back. During this condition the web interface and ping both work fine. Rebooting the device fixes the issue.



    As a further test I changed one of the three devices (running 3.03) to our custom communities. Since then this device has stalled while the other two have continued to work.
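A minimal sketch of the kind of single-OID poll described above, assuming the Net-SNMP command-line tools (`snmpget`, `snmpgetnext`) are installed on the polling host. The function names `build_cmd` and `poll`, the 2-second timeout, and the zero-retry setting are illustrative assumptions, not what the poster's monitoring system actually runs. Retries are disabled so that a single missing reply is visible rather than masked.

```python
import subprocess

# OID quoted in this thread for the temperature reading.
TEMP_OID = ".1.3.6.1.4.1.17095.3.2.0"


def build_cmd(tool, host, community, oid=TEMP_OID, timeout_s=2, retries=0):
    """Build a Net-SNMP command line for one SNMPv2c request."""
    if tool not in ("snmpget", "snmpgetnext"):
        raise ValueError("tool must be 'snmpget' or 'snmpgetnext'")
    return [
        tool,
        "-v", "2c",            # SNMP version
        "-c", community,       # community string under test
        "-t", str(timeout_s),  # seconds to wait for a reply
        "-r", str(retries),    # 0 retries: one lost reply counts as a failure
        "-Oqv",                # print only the value
        host,
        oid,
    ]


def poll(tool, host, community, oid=TEMP_OID, run=subprocess.run):
    """Return (ok, text); ok is False when no usable reply came back."""
    cmd = build_cmd(tool, host, community, oid)
    try:
        result = run(cmd, capture_output=True, text=True, timeout=10)
    except subprocess.TimeoutExpired:
        return False, "no reply (command timed out)"
    if result.returncode != 0:
        return False, result.stderr.strip()
    return True, result.stdout.strip()
```

Calling `poll("snmpget", host, community)` once with the default community and once with the custom one, from the same machine, gives a quick way to compare the two cases. Note that a getnext request returns the object that follows the OID it is given, so it needs a different starting OID to land on the same value.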
  • Administrator
    While it is stalled, does the web interface still show values being retrieved from the sensors?
  • mrbigdog
    When SNMP is stalled, the web interface reports the temperatures correctly.



    Over the weekend I did more testing and discovered that changing the SNMP communities from public/private to custom strings causes the SNMP process to become unstable. This is repeatable in 3.0, 3.03 beta and 3.06.



    To make things work:



    1) reset the device to factory defaults

    2) upgrade firmware to 3.06

    3) change the network IP address settings to suit

    4) poll the device using the "public" community and everything works fine



    To make the device unstable:

    1) reset the device to factory defaults

    2) upgrade firmware to 3.06

    3) change the network IP address settings to suit

    4) change the communities and wait for device to reboot

    5) poll the device using the new community. It works fine for a while and then dies.



    This situation can only be fixed by resetting the device to factory defaults. Changing the community back to public does not help.



    I suspect that the function which changes the communities is buggy and inadvertently overwrites part of the firmware?
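Since ping and the web interface keep working while SNMP is stalled, measuring "works fine for a while and then dies" needs an SNMP-level check. The sketch below times how long an agent keeps answering. Everything in it is an assumption for illustration: `time_to_stall` is a hypothetical helper, `probe` is any callable you supply that returns True when an SNMP reply arrives, and the interval and failure threshold are arbitrary defaults.

```python
import time


def time_to_stall(probe, interval_s=60.0, max_failures=3,
                  max_duration_s=3600.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll until `max_failures` consecutive probes fail.

    Returns the number of seconds from the start of the run to the first
    failure of that final failing streak, or None if the agent kept
    answering for `max_duration_s`.
    """
    start = clock()
    failures = 0
    first_failure_at = None
    while clock() - start < max_duration_s:
        if probe():
            # A reply arrived: any earlier miss was a one-off, not a stall.
            failures = 0
            first_failure_at = None
        else:
            failures += 1
            if first_failure_at is None:
                first_failure_at = clock() - start
            if failures >= max_failures:
                return first_failure_at
        sleep(interval_s)
    return None
```

Requiring several consecutive misses separates a stalled agent from a single dropped UDP packet. Running the same measurement after step 4 of each procedure above would put a number on the difference between the two configurations.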


  • Administrator
    We are following your procedure:



    - changed IP

    - community string to publ



    How long after does it fail for you?



    The test we ran on the above configuration was 20,000 SNMP GET requests in 118 seconds, or 169 SNMP requests per second. This is the equivalent of one SNMP GET request every minute for 14 days.



    ** UPDATE **

    We ran 1,076,554 SNMP calls in 86 minutes and 29 seconds. Not a single SNMP call failed, at a rate of 200.1 SNMP GET requests per second. This is the equivalent of one SNMP GET request every minute for 747 days.
  • mrbigdog2
    Hi, normally the process dies within 15 minutes.
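The "one request every minute for N days" comparison is plain arithmetic on the request count. A small sketch, with hypothetical helper names, using only the figures quoted in this post:

```python
def requests_per_second(total_requests, elapsed_seconds):
    """Average request rate over a test run."""
    return total_requests / elapsed_seconds


def equivalent_days(total_requests, requests_per_minute=1):
    """Days needed to send the same number of requests at a slow polling rate."""
    minutes = total_requests / requests_per_minute
    return minutes / (60 * 24)


# First run: 20,000 requests in 118 seconds.
print(round(requests_per_second(20_000, 118), 1))  # 169.5
print(round(equivalent_days(20_000), 1))           # 13.9

# Second run: 1,076,554 requests.
print(round(equivalent_days(1_076_554), 1))        # 747.6
```

The comparison covers request volume only. It says nothing about elapsed time, which matters if a fault depends on how long the device has been up rather than on how many requests it has served.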



    Could you please try a community string longer than 6 characters, e.g. abc2000?
  • Administrator
    Thank you.



    We tried to contact you, but we couldn't because you registered with a fake email address.



    We have been able to replicate it and have passed it on to our engineering division for a firmware fix.
  • mrbigdog2
    Thank you for the update and confirmation that you can reproduce the problem.



    Any ETA on the new firmware?


  • Administrator
    No ETA on the fix.
  • mrbigdog2
    no probs



    I'll revert the test device to a known good state and wait for the update.


  • Administrator
    This issue has now been fixed in our latest firmware version 3.20.



    You can download firmware 3.20 from

    http://www.serverscheck.com/sensors/firmware.asp
This discussion has been closed.