[SFtrack] Updated: (XECS-59) Overnight LoadTest failure - sipXvxml memory hog
[ http://track.sipfoundry.org/browse/XECS-59?page=all ]
Mark Gertsvolf updated XECS-59:
-------------------------------
Attachment: leak.4.patch
Another source of memory leak in the media server. It leaks roughly 180 octets
per message retrieval call to VM. In a test with 13646 messages deposited plus
26471 VM login/logout calls, the overall leak was 18.5M.
Technically it is not a leak, but an ever-growing cache. It occurs in the
libwww library, in the HTAnchor module. This module maintains a cache of all
Anchors ever used by the application. Anchors are URLs referenced by the
application. The cache is implemented as a hash with 599 buckets (on my system)
and a linked list hanging off each bucket. Since the cache keeps growing and is
never cleaned or aged, the lists become longer and the performance of the media
server suffers over time. The cache is searched for each URL accessed during
each VM call.
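To illustrate why lookups slow down, here is a minimal C++ sketch of a chained
hash that is never cleaned. It is not the libwww code: the class AnchorCache
and its methods are made up for illustration, and only the 599-bucket figure
comes from the observation above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <vector>

// Hypothetical stand-in for a never-evicted anchor cache (not libwww code).
class AnchorCache {
public:
    explicit AnchorCache(std::size_t bucketCount) : buckets_(bucketCount) {
        assert(bucketCount > 0);
    }

    // Looks up url and inserts it if missing. Returns how many chain
    // entries were compared before the lookup finished.
    std::size_t findOrInsert(const std::string& url) {
        const std::size_t index =
            std::hash<std::string>{}(url) % buckets_.size();
        std::list<std::string>& chain = buckets_[index];
        std::size_t compared = 0;
        for (const std::string& entry : chain) {
            ++compared;
            if (entry == url) {
                return compared;  // already cached
            }
        }
        chain.push_back(url);  // never removed unless clear() is called
        ++size_;
        return compared;
    }

    // Drops every cached entry.
    void clear() {
        for (std::list<std::string>& chain : buckets_) {
            chain.clear();
        }
        size_ = 0;
    }

    std::size_t size() const { return size_; }

    // Mean number of entries per bucket; grows linearly with size().
    double averageChainLength() const {
        return static_cast<double>(size_) /
               static_cast<double>(buckets_.size());
    }

private:
    std::vector<std::list<std::string>> buckets_;
    std::size_t size_ = 0;
};
```

With a fixed bucket count, every URL that is unique per call lengthens some
chain for good, so the cost of each later search grows with the total number of
calls handled.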
For a single message deposit the following URLs are searched, then inserted
into the cache if not found:
1. 'https://localhost:8091/cgi-bin/voicemail/mediaserver.cgi?action=deposit&mailbox=19801&from=sipp%3Csip%3A19811%40markg-sipx2.example.com%3E%3Btag%253D1'
2. 'https://localhost:8091/cgi-bin/voicemail/mediaserver.cgi'
3. 'http://localhost:8090/vm_vxml/root.vxml'
4. 'http://localhost:8090/vm_vxml/savemessage.vxml'
5. 'file:/tmp/fileUk6Ekj'
Note that two of these URLs are unique per call. One is the temporary file
(presumably the message being recorded). The other is the URL used to call the
CGI script. This URL contains the From header of the incoming call, including
its tag parameter, which means that it is unique for each message deposit call.
When VXIInterpreter tries to load a document it invokes the Open method on
SBinetInterface/SBinetChannel and eventually creates an instance of the
SBinetHttpStream class. This class interfaces with libwww and the HTAnchor
module. It does the HTTP GET/POST of the document and, along with that,
accesses the HTAnchor cache via calls to HTAnchor_findAddress.
SBinetHttpStream seems to have some dead code that used to rely on HTCache -
libwww cache. I am guessing the use of HTAnchor module may have been done in
conjunction with the HTCache and may not be needed anymore.
I implemented a relatively low-risk solution, which counts the number of
instances of SBinetHttpStream objects and clears the cache when the number of
these objects reaches zero for the n-th time (default n=1000). The
SBinetHttpStream module seems to be the only module using the HTAnchor cache. I
tested the solution and there do not seem to be any side effects. The leak in
the media server has been contained with this fix.
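The instance-counting idea can be sketched roughly as follows. This is not the
code in leak.4.patch: IdleCleanupCounter and its method names are hypothetical,
the cleanup callback stands in for whatever clears the HTAnchor cache, and the
sketch is single-threaded, so a real media server would need locking around the
counters.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical helper (not from the patch): tracks live stream objects and
// runs a cleanup callback every period-th time the live count returns to
// zero. A period of 0 disables the cleanup entirely.
class IdleCleanupCounter {
public:
    IdleCleanupCounter(unsigned period, std::function<void()> cleanup)
        : period_(period), cleanup_(std::move(cleanup)) {}

    // Call from the stream constructor.
    void onStreamCreated() { ++live_; }

    // Call from the stream destructor.
    void onStreamDestroyed() {
        assert(live_ > 0);
        --live_;
        if (live_ != 0 || period_ == 0) {
            return;  // streams still active, or the fix is disabled
        }
        ++idleEvents_;
        if (idleEvents_ >= period_) {
            idleEvents_ = 0;
            if (cleanup_) {
                cleanup_();  // safe point: no stream is using the cache
            }
        }
    }

    unsigned liveStreams() const { return live_; }

private:
    unsigned period_;
    std::function<void()> cleanup_;
    unsigned live_ = 0;        // currently existing stream objects
    unsigned idleEvents_ = 0;  // times live_ hit zero since last cleanup
};
```

Clearing only when the live count is zero avoids pulling cache entries out from
under a stream that is still in use, and the period keeps the cleanup from
running on every single call.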
Just because I am paranoid and I don't feel comfortable that I do not fully
understand what is going on, I added a configuration parameter to VXI
configfile to control the patch externally. Hopefully I have not violated VXI
_architecture_ by passing config params all the way to the inet module
initialization and by using SBClient utility code in inet module.
The patch is attached to the issue in JIRA.
Here is the comment I added to the code:
/**
May 2007 Mark Gertsvolf
Disclaimer: I do not pretend to understand this code well. All I know is that
it was originally designed to work with the W3C wwwlib library and hence this
API is used extensively. The original code may have been designed to use
HTCache to prevent repetitive HTTP gets.
Comments in the code suggest that the original designers struggled a bit and
decided to first disable the cache; then the code may have been patched to use
sipXtacklib HTTP transfers.
However, the wwwlib calls have been left all over the place and they do serve
some purpose I reckon; at least the HTChunk carries the result of the HTTP get
to be retrieved by the Read call later on.
There is a memory leak, or rather an ever-growing HTAnchor cache. It is about
180 octets per message retrieval.
The size of the leak depends on the SIP domain name of the calling user.
For a single message deposit the following URLs are searched, then inserted
into the cache if not found.
1. 'https://localhost:8091/cgi-bin/voicemail/mediaserver.cgi?action=deposit&mailbox=19801&from=sipp%3Csip%3A19811%40markg-sipx2.example.com%3E%3Btag%253D1'
2. 'https://localhost:8091/cgi-bin/voicemail/mediaserver.cgi'
3. 'http://localhost:8090/vm_vxml/root.vxml'
4. 'http://localhost:8090/vm_vxml/savemessage.vxml'
5. 'file:/tmp/fileUk6Ekj'
Note that two of these URLs are unique per call. One is the temporary file
(presumably the message being recorded). The other is the URL used to call the
CGI script. This URL contains the From header of the incoming call, including
its tag parameter, which means that it is unique for each message deposit call.
My preference would be to rewrite this whole module, but at this point I am
just here to fix the leak.
The code seems fragile and I am afraid to make large-scale changes. Instead, I
am adding a band-aid, i.e. instance-counting logic and a call to clear the
cache whenever the number of SBinetHttpStream objects drops to zero, at a
period controlled via a config parameter.
Modules other than SBinetHttpStream do not seem to be using the HTAnchor cache.
The configuration is there to allow external control over this fix in case it
does damage.
To configure the patch, add the following line to the VXI config file:
inet.htAnchorPeriod VXIInteger <n>
where n is an integer equal to or greater than zero. Zero disables the patch;
otherwise n controls how many times the number of active instances of the
SBinetHttpStream class has to drop to zero before the HTAnchor cache is
cleaned.
**/
> Overnight LoadTest failure - sipXvxml memory hog
> ------------------------------------------------
>
> Key: XECS-59
> URL: http://track.sipfoundry.org/browse/XECS-59
> Project: sipXecs
> Issue Type: Bug
> Affects Versions: 3.7
> Environment: [sipx@markg-sipx2 sipxpbx]$ uname -a
> Linux markg-sipx2 2.6.20-1.2933.fc6 #1 SMP Mon Mar 19 11:38:26 EDT 2007 i686
> i686 i386 GNU/Linux
> [sipx@markg-sipx2 bin]$ ~/sipxecs/out/main/bin/sipx-config --version
> sipX version information:
> sipxportlib 3.7.0-010003 2007-04-15T14:20:51 markg-sipx2
> sipxtacklib 3.7.0-010003 2007-04-15T14:23:22 markg-sipx2
> sipxmedialib 3.7.0-010003 2007-04-15T14:26:05 markg-sipx2
> sipxcalllib 3.7.0-010003 2007-04-15T14:29:46 markg-sipx2
> sipxcommserverlib 3.7.0-010003 2007-04-15T14:35:02 markg-sipx2
> sipxregistry 3.7.0-010003 2007-04-15T14:38:49 markg-sipx2
> sipxpublisher 3.7.0-010003 2007-04-15T14:38:38 markg-sipx2
> sipxproxy 3.7.0-010003.sipx 2007-04-15T14:37:56 markg-sipx2
> sipxconfig 3.7.6-010003 2007-04-15T14:46:39 markg-sipx2
> sipxvxml 3.7.0-010003 2007-04-15T14:40:12 markg-sipx2
> sipxpbx 3.7.0-010003 2007-04-15T14:48:34 markg-sipx2
> XECS-47 fix is applied on top.
> Reporter: Mark Gertsvolf
> Assigned To: Mark Gertsvolf
> Fix For: 3.8
>
> Attachments: leak.4.patch, mem.leak.1.patch, mem.leak.3.patch
>
>
> A rather heavy test consisting of 6 SIPP scripts stalled completely overnight.
> The sipXvxml process is 1.5G in size. All new requests receive 408 Request
> Timeout. I forgot to turn on the debug logs, but even at NOTICE level the logs
> are 95M.
> The test consisted of the following:
> 1. register-unregister test. 20 simultaneous registrations from 20 different
> users followed by a deregistration after 30 seconds followed by a pause of 32
> seconds.
> 2. MWI subscribe-unsubscribe test. 20 simultaneous MWI subscription dialogs
> set up for 30 seconds followed by dialog termination and a pause of 40 seconds.
> 3. Simple call test. User registered from EyeBeam with AutoAnswer function
> turned on. The test makes a single call, EyeBeam answers, the test waits 2
> seconds, plays a 7-second pre-recorded message and disconnects, then waits 2
> more seconds.
> 4. VM call test. 20 simultaneous calls to VM. 14 seconds after VM answers the
> test disconnects.
> 5. VM message deposit. 20 simultaneous calls to an unregistered user, calls
> answered by VM and a 7-second pre-recorded message is deposited to the VM box.
> 6. XECS-47 test. This test sends 20 simultaneous requests with an unsupported
> extension in the Proxy-Require header, expects to receive a 420 response, then
> waits 5 seconds.
>