usage - ruby release memory




Finding the cause of a memory leak in Ruby (3)

I've discovered a memory leak in my Rails code - that is to say, I've found what code leaks but not why it leaks. I've reduced it down to a testcase that doesn't require Rails:

require 'csspool'
require 'ruby-mass'

def report
    puts 'Memory ' + `ps ax -o pid,rss | grep -E "^[[:space:]]*#{$$}"`.strip.split.map(&:to_i)[1].to_s + 'KB'
    Mass.print
end

report

# note I do not store the return value here
CSSPool::CSS::Document.parse(File.new('/home/jason/big.css'))

ObjectSpace.garbage_collect
sleep 1

report

ruby-mass supposedly lets me see all the objects in memory. CSSPool is a CSS parser based on racc. /home/jason/big.css is a 1.5MB CSS file.

This outputs:

Memory 9264KB

==================================================
 Objects within [] namespace
==================================================
  String: 7261
  RubyVM::InstructionSequence: 1151
  Array: 562
  Class: 313
  Regexp: 181
  Proc: 111
  Encoding: 99
  Gem::StubSpecification: 66
  Gem::StubSpecification::StubLine: 60
  Gem::Version: 60
  Module: 31
  Hash: 29
  Gem::Requirement: 25
  RubyVM::Env: 11
  Gem::Specification: 8
  Float: 7
  Gem::Dependency: 7
  Range: 4
  Bignum: 3
  IO: 3
  Mutex: 3
  Time: 3
  Object: 2
  ARGF.class: 1
  Binding: 1
  Complex: 1
  Data: 1
  Gem::PathSupport: 1
  IOError: 1
  MatchData: 1
  Monitor: 1
  NoMemoryError: 1
  Process::Status: 1
  Random: 1
  RubyVM: 1
  SystemStackError: 1
  Thread: 1
  ThreadGroup: 1
  fatal: 1
==================================================

Memory 258860KB

==================================================
 Objects within [] namespace
==================================================
  String: 7456
  RubyVM::InstructionSequence: 1151
  Array: 564
  Class: 313
  Regexp: 181
  Proc: 113
  Encoding: 99
  Gem::StubSpecification: 66
  Gem::StubSpecification::StubLine: 60
  Gem::Version: 60
  Module: 31
  Hash: 30
  Gem::Requirement: 25
  RubyVM::Env: 13
  Gem::Specification: 8
  Float: 7
  Gem::Dependency: 7
  Range: 4
  Bignum: 3
  IO: 3
  Mutex: 3
  Time: 3
  Object: 2
  ARGF.class: 1
  Binding: 1
  Complex: 1
  Data: 1
  Gem::PathSupport: 1
  IOError: 1
  MatchData: 1
  Monitor: 1
  NoMemoryError: 1
  Process::Status: 1
  Random: 1
  RubyVM: 1
  SystemStackError: 1
  Thread: 1
  ThreadGroup: 1
  fatal: 1
==================================================

You can see the memory going way up. Some of the counters go up, but no objects specific to CSSPool are present. I used ruby-mass's "index" method to inspect the objects that have references like so:

Mass.index.each do |k,v|
    v.each do |id|
        refs = Mass.references(Mass[id])
        puts refs if !refs.empty?
    end
end

But again, this doesn't give me anything related to CSSPool, just gem info and such.

I've also tried outputting "GC.stat"...

puts GC.stat
CSSPool::CSS::Document.parse(File.new('/home/jason/big.css'))
ObjectSpace.garbage_collect
sleep 1
puts GC.stat

Result:

{:count=>4, :heap_used=>126, :heap_length=>138, :heap_increment=>12, :heap_live_num=>50924, :heap_free_num=>24595, :heap_final_num=>0, :total_allocated_object=>86030, :total_freed_object=>35106}
{:count=>16, :heap_used=>6039, :heap_length=>12933, :heap_increment=>3841, :heap_live_num=>13369, :heap_free_num=>2443302, :heap_final_num=>0, :total_allocated_object=>3771675, :total_freed_object=>3758306}

As I understand it, if an object is not referenced and garbage collection happens, then that object should be cleared from memory. But that doesn't seem to be what's happening here.

I've also read about C-level memory leaks, and since CSSPool uses Racc which uses C code, I think this is a possibility. I've run my code through Valgrind:

valgrind --partial-loads-ok=yes --undef-value-errors=no --leak-check=full --fullpath-after= ruby leak.rb 2> valgrind.txt

Results are here. I'm not sure if this confirms a C-level leak, as I've also read that Ruby does things with memory that Valgrind doesn't understand.

Versions used:

  • Ruby 2.0.0-p247 (this is what my Rails app runs)
  • Ruby 1.9.3-p392-ref (for testing with ruby-mass)
  • ruby-mass 0.1.3
  • CSSPool 4.0.0 from here
  • CentOS 6.4 and Ubuntu 13.10

Building on @mudasobwa's explanation, I finally tracked down the cause. The code in CSSPool was checking the very long data URI for escape sequences. It would call scan on the URI with a regexp that matched an escape sequence or a single character, map those results to unescape, and then join it back into a string. This was effectively allocating a string for every character in the URI. I modified it to gsub the escape sequences, which seems to have the same results (all the tests pass) and greatly reduces the ending memory used.

Using the same test case as originally posted (minus the Mass.print output) this is the result before the change:

Memory 12404KB
Memory 292516KB

and this is the result after the change:

Memory 12236KB
Memory 19584KB

It looks like you are entering The Lost World here. I don’t think the problem is with c-bindings in racc either.

Ruby memory management is both elegant and cumbersome. It stores objects (named RVALUEs) in so-called heaps of size of approx 16KB. On a low level, RVALUE is a c-struct, containing a union of different standard ruby object representations.

So, heaps store RVALUE objects, which size is not more than 40 bytes. For such objects as String, Array, Hash etc. this means that small objects can fit in the heap, but as soon as they reach a threshold, an extra memory outside of the Ruby heaps will be allocated.

This extra memory is flexible; is will be freed as soon as an object became GC’ed. That’s why your testcase with big_string shows the memory up-down behaviour:

def report
  puts 'Memory ' + `ps ax -o pid,rss | grep -E "^[[:space:]]*#{$$}"`
          .strip.split.map(&:to_i)[1].to_s + 'KB'
end
report
big_var = " " * 10000000
report
big_var = nil 
report
ObjectSpace.garbage_collect
sleep 1
report
# ⇒ Memory 11788KB
# ⇒ Memory 65188KB
# ⇒ Memory 65188KB
# ⇒ Memory 11788KB

But the heaps (see GC[:heap_length]) themselves are not released back to OS, once acquired. Look, I’ll make a humdrum change to your testcase:

- big_var = " " * 10000000
+ big_var = 1_000_000.times.map(&:to_s)

And, voilá:

# ⇒ Memory 11788KB
# ⇒ Memory 65188KB
# ⇒ Memory 65188KB
# ⇒ Memory 57448KB

The memory is not released back to OS anymore, because each element of the array I introduced suits the RVALUE size and is stored in the ruby heap.

If you’ll examine the output of GC.stat after the GC was run, you’ll find that GC[:heap_used] value is decreased as expected. Ruby now has a lot of empty heaps, ready.

The summing up: I don’t think, the c code leaks. I think the problem is within base64 representation of huge image in your css. I have no clue, what’s happening inside parser, but it looks like the huge string forces the ruby heap count to increase.

Hope it helps.


This could be due to the "Lazy Sweeping" feature in Ruby 1.9.3 and above.

Lazy sweeping basically means that, during garbage collection, Ruby only "sweeps" away enough objects to create space for the new objects it needs to create. It does this because, while the Ruby garbage collector runs, nothing else does. This is known as "Stop the world" garbage collection.

Essentially, lazy sweeping reduces the time that Ruby needs to "stop the world." You can read more about lazy sweeping here.

What does your RUBY_GC_MALLOC_LIMIT environment variable look like?

Here is an excerpt from Sam Saffron's blog concerning lazy sweeping and the RUBY_GC_MALLOC_LIMIT:

The GC in Ruby 2.0 comes in 2 different flavors. We have a "full" GC that runs after we allocate more than our malloc_limit and a lazy sweep (partial GC) that will run if we ever run out of free slots in our heaps.

The lazy sweep takes less time than a full GC, however only performs a partial GC. It's goal is to perform a short GC more frequently thus increasing overall throughput. The world stops, but for less time.

The malloc_limit is set to 8MB out of the box, you can raise it by setting the RUBY_GC_MALLOC_LIMIT higher.

Is your RUBY_GC_MALLOC_LIMIT extremely high? Mine is set to 100000000 (100MB). The Default is around 8MB, but for rails apps they recommend it to be quite a bit higher. If yours is too high it could be preventing Ruby from deleting garbage objects, because it thinks it has plenty of room to grow.





valgrind