Creating a memory leak with Java





15 Answers

A static field holding an object reference (especially a final field)

class MemorableClass {
    static final ArrayList list = new ArrayList(100);
}
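
For illustration, a minimal sketch (the demo class and sizes are hypothetical, and it assumes it lives in the same package as MemorableClass above) of how such a field becomes a leak: everything added to the static list stays strongly reachable for as long as the class is loaded, whether or not anyone ever reads it again.

public class MemorableClassDemo {
    public static void main(String[] args) {
        for (int i = 0; i < 1000000; i++) {
            // ~10 KB each, added and never removed: reachable via the static field forever
            MemorableClass.list.add(new byte[10240]);
        }
    }
}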

Calling String.intern() on a lengthy String

String str = readString(); // read a lengthy string from any source: db, textbox/JSP, etc.
// This places the string in the interned-string pool, from which it cannot be removed
str.intern();

Unclosed open streams (file, network, etc.)

try {
    BufferedReader br = new BufferedReader(new FileReader(inputFile));
    ...
    ...
} catch (Exception e) {
    e.printStackTrace();
}

Unclosed connections

try {
    Connection conn = ConnectionFactory.getConnection();
    ...
    ...
} catch (Exception e) {
    e.printStackTrace();
}
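
A minimal sketch of the fix for both patterns above, using try-with-resources (Java 7+); inputFile and ConnectionFactory are the hypothetical names from the snippets. Both resources are closed automatically, even when an exception is thrown:

try (BufferedReader br = new BufferedReader(new FileReader(inputFile));
     Connection conn = ConnectionFactory.getConnection()) {
    // ... use br and conn ...
} catch (Exception e) {
    e.printStackTrace();
}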

Areas that are unreachable from the JVM's garbage collector, such as memory allocated through native methods

In web applications, some objects are stored in application scope until the application is explicitly stopped or removed.

getServletContext().setAttribute("SOME_MAP", map);
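
A minimal sketch (hypothetical listener name, javax.servlet API assumed) of the matching cleanup: remove application-scoped attributes when the context is destroyed so they do not outlive the application:

import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;

public class CleanupListener implements ServletContextListener {
    public void contextInitialized(ServletContextEvent sce) { }

    public void contextDestroyed(ServletContextEvent sce) {
        // Drop the reference so the map can be garbage collected on undeploy.
        sce.getServletContext().removeAttribute("SOME_MAP");
    }
}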

Incorrect or inappropriate JVM options, such as the noclassgc option on the IBM JDK, which prevents garbage collection of unused classes

See the IBM JDK settings.


I just had an interview, and I was asked to create a memory leak with Java. Needless to say, I felt pretty dumb, having no clue how to even start creating one.

What would an example be?




Below are some non-obvious cases where Java leaks, besides the standard cases of forgotten listeners, static references, bogus/modifiable keys in hashmaps, or just threads stuck without any chance to end their life cycle.

  • File.deleteOnExit() - always leaks the string; if the string is a substring, the leak is even worse (the underlying char[] is also leaked). In Java 7, substring also copies the char[], so the latter no longer applies. @Daniel, no need for votes, though.

I'll concentrate mostly on threads, to show the danger of unmanaged threads; I don't even wish to touch Swing.

  • Runtime.addShutdownHook without removing it... and then, even with removeShutdownHook, due to a bug in the ThreadGroup class regarding unstarted threads, it may not get collected, effectively leaking the ThreadGroup. JGroups has this leak in GossipRouter.

  • Creating, but not starting, a Thread goes into the same category as above.

  • A created thread inherits the ContextClassLoader and AccessControlContext, plus the ThreadGroup and any InheritableThreadLocal; all of those references are potential leaks, along with the entire set of classes loaded by the classloader and all static references, and so on. The effect is especially visible with the entire j.u.c.Executor framework, which features a super-simple ThreadFactory interface, yet most developers have no clue about the lurking danger. Also, a lot of libraries start threads on request (way too many industry-popular libraries).

  • ThreadLocal caches; those are evil in many cases. I am sure everyone has seen quite a few simple caches based on ThreadLocal. The bad news: if the thread keeps running longer than the life of the context ClassLoader, it is a pure, nice little leak. Do not use ThreadLocal caches unless really needed.

  • Calling ThreadGroup.destroy() when the ThreadGroup has no threads itself but still keeps child ThreadGroups. A bad leak that prevents the ThreadGroup from being removed from its parent, while all the children become un-enumerable.

  • Using WeakHashMap when the value (in)directly references the key. This is a hard one to find without a heap dump. The same applies to any extended Weak/SoftReference that might keep a hard reference back to the guarded object. (A minimal sketch follows this list.)

  • Using java.net.URL with the HTTP(S) protocol and loading the resource from it(!). This one is special: the KeepAliveCache creates a new thread in the system ThreadGroup, which leaks the current thread's context classloader. The thread is created upon the first request when no alive thread exists, so you may either get lucky or just leak. The leak is already fixed in Java 7, and the code that creates the thread properly removes the context classloader. There are a few more cases of similar thread creation (like ImageFetcher, also fixed).

  • Using InflaterInputStream, passing new java.util.zip.Inflater() in the constructor (PNGImageDecoder, for instance), and not calling end() on the inflater. Well, if you pass it in the constructor with just new, no chance... And yes, calling close() on the stream does not close the inflater if it was manually passed as a constructor parameter. This is not a true leak, since it would be released by the finalizer... when it deems it necessary. Until that moment it eats native memory so badly that it can cause the Linux oom_killer to kill the process with impunity. The main issue is that finalization in Java is very unreliable, and G1 made it worse until 7.0.2. Moral of the story: release native resources as soon as you can; the finalizer is just too poor.

  • The same goes for java.util.zip.Deflater. This one is far worse, since Deflater is memory-hungry in Java, i.e. it always uses 15 bits (the max) and 8 memory levels (9 is the max), allocating several hundred KB of native memory. Fortunately, Deflater is not widely used and, to my knowledge, the JDK contains no misuses. Always call end() if you manually create a Deflater or Inflater. The best part of the last two: you can't find them via the normal profiling tools available.
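
As promised above, a minimal sketch (the Holder class is hypothetical) of the WeakHashMap case: the value holds a hard reference back to the key, so the weak key can never be cleared and the entry stays forever:

import java.util.Map;
import java.util.WeakHashMap;

public class WeakMapLeak {
    // Hypothetical value type that keeps a hard reference back to its key.
    static class Holder {
        final Object key;
        Holder(Object key) { this.key = key; }
    }

    public static void main(String[] args) throws InterruptedException {
        Map<Object, Holder> map = new WeakHashMap<Object, Holder>();
        Object key = new Object();
        map.put(key, new Holder(key)); // value -> key hard reference
        key = null;                    // drop our own reference to the key

        System.gc();
        Thread.sleep(100);
        // The entry is never evicted: the value keeps the key strongly reachable,
        // so the weak reference to the key is never cleared. Prints 1.
        System.out.println("entries after GC: " + map.size());
    }
}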

(I can add some more time wasters I have encountered upon request.)

Good luck and stay safe; leaks are evil!




The answer depends entirely on what the interviewer thought they were asking.

Is it possible in practice to make Java leak? Of course it is, and there are plenty of examples in the other answers.

But there are several meta-questions that may have been asked:

  • Is a theoretically "perfect" Java implementation vulnerable to leaks?
  • Does the candidate understand the difference between theory and reality?
  • Does the candidate understand how garbage collection works?
  • Or how garbage collection is supposed to work in an ideal case?
  • Do they know they can call other languages through native interfaces?
  • Do they know how to leak memory in those other languages?
  • Does the candidate even know what memory management is, and what is going on behind the scenes in Java?

I'm reading your meta-question as "What answer could I have used in this interview situation?" Hence, I'm going to focus on interview skills instead of Java. I believe you're more likely to repeat the situation of not knowing the answer to a question in an interview than to find yourself needing to know how to make Java leak. So, hopefully, this will help.

One of the most important skills you can develop for interviewing is learning to actively listen to the questions and to work with the interviewer to extract their intent. Not only does this let you answer their question the way they want, but it also shows that you have some vital communication skills. And when it comes down to a choice between many equally talented developers, I'll hire the one who listens, thinks, and understands before responding, every time.




Probably one of the simplest examples of a potential memory leak, and how to avoid it, is the implementation of ArrayList.remove(int):

public E remove(int index) {
    RangeCheck(index);

    modCount++;
    E oldValue = (E) elementData[index];

    int numMoved = size - index - 1;
    if (numMoved > 0)
        System.arraycopy(elementData, index + 1, elementData, index,
                numMoved);
    elementData[--size] = null; // (!) Let gc do its work

    return oldValue;
}

If you were implementing it yourself, would you have thought to clear the array element that is no longer used (elementData[--size] = null)? That reference might keep a huge object alive ...
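
A minimal sketch of the corresponding mistake in user code (a simplified, hypothetical stack): pop() hands back the element but leaves the slot populated, so the array keeps an obsolete reference alive until the slot happens to be overwritten:

import java.util.Arrays;

public class LeakyStack {
    private Object[] elements = new Object[16];
    private int size;

    public void push(Object e) {
        if (size == elements.length)
            elements = Arrays.copyOf(elements, 2 * size);
        elements[size++] = e;
    }

    public Object pop() {
        // Leaks: the slot still references the popped object.
        // Correct: Object e = elements[--size]; elements[size] = null; return e;
        return elements[--size];
    }
}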




You can create a memory leak with the sun.misc.Unsafe class. In fact, this service class is used in various standard classes (for example, in the java.nio classes). You can't create an instance of this class directly, but you can use reflection to obtain one.

The code may not compile in the Eclipse IDE; compile it with the javac command instead (you'll get warnings during compilation):

import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class TestUnsafe {

    public static void main(String[] args) throws Exception {
        // Grab the singleton Unsafe instance via reflection; its constructor is private.
        Class<?> unsafeClass = Class.forName("sun.misc.Unsafe");
        Field f = unsafeClass.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Unsafe unsafe = (Unsafe) f.get(null);
        System.out.print("4..3..2..1...");
        try {
            // Native memory allocated here is never freed (no matching freeMemory call),
            // so the process's footprint grows until allocation fails.
            for (;;)
                unsafe.allocateMemory(1024 * 1024);
        } catch (Error e) {
            System.out.println("Boom :)");
            e.printStackTrace();
        }
    }
}



Here's a simple/sinister one via http://wiki.eclipse.org/Performance_Bloopers#String.substring.28.29.

public class StringLeaker
{
    private final String muchSmallerString;

    public StringLeaker()
    {
        // Imagine the whole Declaration of Independence here
        String veryLongString = "We hold these truths to be self-evident...";

        // The substring here maintains a reference to the internal char[]
        // representation of the original string.
        this.muchSmallerString = veryLongString.substring(0, 1);
    }
}

Because the substring refers to the internal representation of the original, much longer string, the original stays in memory. Thus, as long as you have a StringLeaker in play, you have the whole original string in memory, too, even though you might think you're just holding on to a single-character string.

The way to avoid storing an unwanted reference to the original string is to do something like this:

...
this.muchSmallerString = new String(veryLongString.substring(0, 1));
...

For added badness, you might also .intern() the substring:

...
this.muchSmallerString = veryLongString.substring(0, 1).intern();
...

Doing so will keep both the original long string and the derived substring in memory even after the StringLeaker instance has been discarded.




Take any web application running in any servlet container (Tomcat, Jetty, GlassFish, whatever...). Redeploy the app 10 or 20 times in a row (it may be enough to simply touch the WAR in the server's autodeploy directory).

Unless somebody has actually tested this, chances are high that you'll get an OutOfMemoryError after a couple of redeployments, because the application did not take care to clean up after itself. You may even find a bug in your server with this test.

The problem is, the lifetime of the container is longer than the lifetime of your application. You have to make sure that all references the container might have to objects or classes of your application can be garbage collected.

If there is just one reference surviving the undeployment of your web app, the corresponding classloader and by consequence all classes of your web app cannot be garbage collected.

Threads started by your application, ThreadLocal variables, and logging appenders are some of the usual suspects for causing classloader leaks.
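
A minimal sketch (hypothetical listener, javax.servlet API assumed) of the thread variant: a thread started on deployment and never stopped keeps the webapp's classloader as its context classloader, pinning every class of the application after undeployment:

import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;

public class LeakyContextListener implements ServletContextListener {
    private Thread worker;

    public void contextInitialized(ServletContextEvent sce) {
        worker = new Thread(new Runnable() {
            public void run() {
                while (!Thread.currentThread().isInterrupted()) {
                    try {
                        Thread.sleep(60000);
                    } catch (InterruptedException e) {
                        return;
                    }
                }
            }
        });
        worker.start(); // the thread's context classloader is the webapp classloader
    }

    public void contextDestroyed(ServletContextEvent sce) {
        // Forgetting to stop the worker here leaks the classloader.
        // Correct: worker.interrupt();
    }
}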




I once had a nice "memory leak" in relation to PermGen and XML parsing. The XML parser we used (I can't remember which one it was) did a String.intern() on tag names to make comparison faster. One of our customers had the great idea of storing data values not in XML attributes or text, but as tag names, so we had a document like:

<data>
   <1>bla</1>
   <2>foo</2>
   ...
</data>

In fact, they did not use numbers but longer textual IDs (around 20 characters), which were unique and came in at a rate of 10-15 million a day. That makes 200 MB of rubbish a day, which is never needed again and never GCed (since it is in PermGen). We had PermGen set to 512 MB, so it took around two days for the OutOfMemoryError (OOME) to arrive...
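
A minimal sketch (hypothetical) of the same failure mode in isolation: interning a stream of unique strings. On pre-Java-7 JVMs the interned strings land in PermGen and are effectively never reclaimed, so with a small -XX:MaxPermSize this dies fairly quickly:

public class InternLeak {
    public static void main(String[] args) {
        long i = 0;
        while (true) {
            // Each ID is unique, so every call adds a new entry to the intern pool.
            ("tag-id-" + i++).intern();
        }
    }
}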




I thought it was interesting that no one used the inner class examples. If you have an inner class, it inherently maintains a reference to the instance of the containing class. Of course it is not technically a memory leak, because Java WILL eventually clean it up, but it can cause instances to hang around longer than anticipated.

public class Example1 {
  public Example2 getNewExample2() {
    return this.new Example2();
  }
  // Example2 is a non-static inner class, so every instance implicitly
  // holds a reference to its enclosing Example1 instance.
  public class Example2 {
    public Example2() {}
  }
}

Now if you create an Example1, obtain an Example2 from it, and discard the Example1, you will inherently still have a link to the Example1 object.

public class Referencer {
  public static Example2 GetAnExample2() {
    Example1 ex = new Example1();
    return ex.getNewExample2();
  }

  public static void main(String[] args) {
    Example2 ex = Referencer.GetAnExample2();
    // As long as ex is reachable; Example1 will always remain in memory.
  }
}

I've also heard a rumor that if a variable exists for longer than a certain amount of time, Java assumes it will always exist and never tries to clean it up, even if it can no longer be reached in code. But that is completely unverified.




Create a static Map and keep adding hard references to it. Those will never be GC'd.

public class Leaker {
    private static final Map<String, Object> CACHE = new HashMap<String, Object>();

    // Keep adding until failure.
    public static void addToCache(String key, Object value) { Leaker.CACHE.put(key, value); }
}



You can create a moving memory leak by creating a new instance of a class in that class's finalize() method. Bonus points if the finalizer creates multiple instances. Here's a simple program that leaks the entire heap in somewhere between a few seconds and a few minutes, depending on your heap size:

class Leakee {
    public void check() {
        if (depth > 2) {
            Leaker.done();
        }
    }
    private int depth;
    public Leakee(int d) {
        depth = d;
    }
    protected void finalize() {
        new Leakee(depth + 1).check();
        new Leakee(depth + 1).check();
    }
}

public class Leaker {
    // volatile, so the change made by the finalizer thread is visible to main()
    private static volatile boolean makeMore = true;
    public static void done() {
        makeMore = false;
    }
    public static void main(String[] args) throws InterruptedException {
        // make a bunch of them until the garbage collector gets active
        while (makeMore) {
            new Leakee(0).check();
        }
        // sit back and watch the finalizers chew through memory
        while (true) {
            Thread.sleep(1000);
            System.out.println("memory=" +
                    Runtime.getRuntime().freeMemory() + " / " +
                    Runtime.getRuntime().totalMemory());
        }
    }
}



I came across a more subtle kind of resource leak recently. We open resources via the class loader's getResourceAsStream, and it happened that the input stream handles were not closed.

Uhm, you might say, what an idiot.

Well, what makes this interesting is that this way you can leak memory in the underlying process's native heap, rather than in the JVM's heap.

All you need is a jar file with a file inside which will be referenced from Java code. The bigger the jar file, the quicker memory gets allocated.

You can easily create such a jar with the following class:

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class BigJarCreator {
    public static void main(String[] args) throws IOException {
        ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(new File("big.jar")));
        zos.putNextEntry(new ZipEntry("resource.txt"));
        zos.write("not too much in here".getBytes());
        zos.closeEntry();
        zos.putNextEntry(new ZipEntry("largeFile.out"));
        for (int i=0 ; i<10000000 ; i++) {
            zos.write((int) (Math.round(Math.random()*100)+20));
        }
        zos.closeEntry();
        zos.close();
    }
}

Just paste it into a file named BigJarCreator.java, then compile and run it from the command line:

javac BigJarCreator.java
java -cp . BigJarCreator

Et voilà: you find a jar archive in your current working directory with two files inside.

Let's create a second class:

public class MemLeak {
    public static void main(String[] args) throws InterruptedException {
        int ITERATIONS=100000;
        for (int i=0 ; i<ITERATIONS ; i++) {
            MemLeak.class.getClassLoader().getResourceAsStream("resource.txt");
        }
        System.out.println("finished creation of streams, now waiting to be killed");

        Thread.sleep(Long.MAX_VALUE);
    }

}

This class basically does nothing but create unreferenced InputStream objects. Those objects will be garbage collected immediately and thus do not contribute to heap size. It is important for our example to load an existing resource from a jar file, and size does matter here!

If you're doubtful, try to compile and start the class above, but make sure to choose a small heap size (2 MB):

javac MemLeak.java
java -Xmx2m -classpath .:big.jar MemLeak

You will not encounter an OOM error here; as no references are kept, the application will keep running no matter how large you choose ITERATIONS in the above example. The memory consumption of your process (visible in top (RES/RSS) or Process Explorer) grows until the application gets to the wait command. In the setup above, it allocates around 150 MB of memory.

If you want the application to play safe, close the input stream right where it's created:

MemLeak.class.getClassLoader().getResourceAsStream("resource.txt").close();

and your process will not exceed 35 MB, independent of the iteration count.

Quite simple and surprising.




As a lot of people have suggested, resource leaks are fairly easy to cause, like the JDBC examples. Actual memory leaks are a bit harder, especially if you aren't relying on broken bits of the JVM to do it for you...

Creating objects with a very large footprint and then losing the ability to access them doesn't produce a real memory leak either. If nothing can access an object, it will be garbage collected; if something can access it, it's not a leak...

One way that used to work, though (and I don't know if it still does), is to have a three-deep circular chain: Object A has a reference to Object B, Object B has a reference to Object C, and Object C has a reference to Object A. The GC was clever enough to know that a two-deep chain (A <--> B) could safely be collected if A and B weren't accessible by anything else, but couldn't handle the three-way chain...




Threads are not collected until they terminate. They serve as roots of garbage collection. They are one of the few objects that won't be reclaimed simply by forgetting about them or clearing references to them.

Consider: the basic pattern for terminating a worker thread is to set some condition variable that the thread watches. The thread can check the variable periodically and use it as a signal to terminate. If the variable is not declared volatile, the change might never be seen by the thread, so it won't know to terminate. Or imagine some threads wanting to update a shared object, but deadlocking while trying to lock it.
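
A minimal sketch (hypothetical class name) of the stop-flag pattern just described; drop the volatile and the worker may never observe the change, keeping the thread, and everything it references, alive indefinitely:

public class StoppableWorker implements Runnable {
    private volatile boolean running = true; // without volatile the flag change may never be seen

    public void run() {
        while (running) {
            // ... do a unit of work ...
        }
        // Once run() returns, the thread terminates and stops being a GC root.
    }

    public void stop() {
        running = false;
    }
}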

If you only have a handful of threads, these bugs will probably be obvious because your program will stop working properly. If you have a thread pool that creates more threads as needed, then the obsolete/stuck threads might not be noticed and will accumulate indefinitely, causing a memory leak. Threads are likely to use other data in your application, so they will also prevent anything they directly reference from ever being collected.

As a toy example:

static void leakMe(final Object object) {
    new Thread() {
        public void run() {
            // The running thread is a GC root and captures `object`,
            // so it can never be collected.
            Object o = object;
            for (;;) {
                try {
                    sleep(Long.MAX_VALUE);
                } catch (InterruptedException e) {}
            }
        }
    }.start();
}

Call System.gc() all you like, but the object passed to leakMe will never die.





The interviewer might have been looking for a circular reference solution:

    // A minimal Element type with a self-referencing field:
    static class Element {
        Element next;
    }

    public static void main(String[] args) {
        while (true) {
            Element first = new Element();
            first.next = new Element();
            first.next.next = first;
        }
    }

This is a classic problem with reference-counting garbage collectors. You would then politely explain that JVMs use a much more sophisticated algorithm that doesn't have this limitation.

-Wes Tarle



