
Java handles sub-strings in a way that isn't immediately obvious and which can have some nasty side effects if you don't realize what is going on under the hood. When the substring method is used it appears to return a new String, but it doesn't copy any characters. What the method does is create a new String descriptor that holds a reference to the original String's character data together with a start offset and a length; in other words, it points at the relevant part of the old String. This is all well and good most of the time, and it works because Strings are immutable. It can, however, lead to memory exhaustion problems in some cases. Note that this describes the String implementation in older JVMs: from Java 7 update 6 onward, substring copies the characters it needs instead.

The cause of the potential memory exhaustion problem is simple. The newly created sub-string keeps the old String's character data reachable, so even when the old String appears to have no more references, that data won't be garbage collected while there are references to the sub-string. Imagine for a moment that the old String was several megabytes in size and the sub-string was just the first few characters. Those few characters are actually consuming many megabytes of memory! The example program at the bottom of the page shows this in action.
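
To make the mechanism concrete, here is a minimal sketch of a descriptor that shares its backing array. SharedSlice and its method names are hypothetical and exist only for illustration; this is a simplified model of the layout described above, not the real java.lang.String source.

```java
import java.util.Arrays;

// Hypothetical model of the shared-array layout; not the real java.lang.String.
public final class SharedSlice {
    private final char[] value;  // backing array, possibly shared with other slices
    private final int offset;    // index of the first character of this slice
    private final int count;     // number of characters in this slice

    public SharedSlice(String source) {
        this(source.toCharArray(), 0, source.length());
    }

    private SharedSlice(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // Creates a new descriptor over the same array; no characters are copied.
    public SharedSlice substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException();
        }
        return new SharedSlice(value, offset + beginIndex, endIndex - beginIndex);
    }

    public int length() {
        return count;
    }

    // Number of characters kept reachable by this slice, whatever its length.
    public int retainedChars() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        char[] chars = new char[1000000];
        Arrays.fill(chars, 'x');
        SharedSlice huge = new SharedSlice(new String(chars));
        SharedSlice little = huge.substring(0, 5);

        // Prints "5 characters visible, 1000000 retained"
        System.out.println(little.length() + " characters visible, "
                + little.retainedChars() + " retained");
    }
}
```

As long as little is reachable, the whole million-character array is reachable too, which is exactly the situation described above.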

Interestingly, the behaviour described above occurs even with a completely empty String (for example, when substring is called with the same start and end index), so it is quite easy to end up with an apparently empty String holding on to vast chunks of memory. One way to avoid the problem is to intern the sub-string by calling intern() on it. This returns a reference to the String held in the internal String pool, which is created if it does not already exist. The String in the pool is a real String with its own character data, so it only takes up the amount of space you would expect.
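
The following sketch shows one way the advice above might be applied. The class and helper names (SubstringDetach, internedSubstring, copiedSubstring) are made up for this example. The second helper uses the String copy constructor, which is an alternative not covered in this article; it assumes a JVM on which that constructor copies only the characters the sub-string needs, and it avoids adding an entry to the intern pool.

```java
public final class SubstringDetach {

    // Returns the pooled copy of the sub-string, as suggested in the text.
    public static String internedSubstring(String source, int begin, int end) {
        return source.substring(begin, end).intern();
    }

    // Alternative (an assumption, not from the article): make an explicit
    // copy without touching the intern pool.
    public static String copiedSubstring(String source, int begin, int end) {
        return new String(source.substring(begin, end));
    }

    public static void main(String[] args) {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < 1000000; i++) {
            builder.append('x');
        }
        String huge = builder.toString();

        String interned = internedSubstring(huge, 0, 5);
        String copied = copiedSubstring(huge, 0, 5);
        huge = null; // neither result is intended to keep the large String alive

        System.out.println(interned == "xxxxx");     // true: both refer to the pooled String
        System.out.println(copied.equals(interned)); // true: same characters
        System.out.println(copied == interned);      // false: copied is a separate object
    }
}
```

Interning suits short values that recur often; for one-off sub-strings an explicit copy keeps the pool from growing.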

It is also worth pointing out that sub-strings aren't interned automatically, even if the original String was interned. You might have expected that sub-strings of an interned String would be interned as well, but that is not the case. The reason for this behaviour is probably that intern always creates a complete new String in the pool, whereas substring can simply refer into the old String. For long Strings and their sub-strings the current approach is probably the most efficient one. It is even more efficient if neither the original String nor its sub-string descendants are long lived, which is often the case in real world applications.

package com.crazysquirrel.examples;

public class Substring {

    public static void main( String[] args ) {
        Substring substring = new Substring();
        substring.memoryUsageExample();
        substring.internExample();
    }

   
    private void memoryUsageExample() {
        Runtime runtime = Runtime.getRuntime();

        System.gc();
        // Show the base memory load
        System.out.println( "Memory Usage (1): " + ( runtime.totalMemory() - runtime.freeMemory() ) );

        // Build a huge String
        StringBuilder builder = new StringBuilder();
        for( int i = 0, n = 10000000; i < n; i++ ) {
            builder.append( "x" );
        }

        String hugeString = builder.toString();
        // At this point we have the StringBuilder and the String in memory
        System.out.println( "Memory Usage (2): " + ( runtime.totalMemory() - runtime.freeMemory() ) );

        builder = null;
        System.gc();
        // We should now only have hugeString in memory
        System.out.println( "Memory Usage (3): " + ( runtime.totalMemory() - runtime.freeMemory() ) );

        String littleString = hugeString.substring( 0, 5 );

        // Uncomment the next line to intern littleString, which removes its
        // reference to hugeString's character data
        // littleString = littleString.intern();

        hugeString = null;
        System.gc();

        // Even though hugeString is dereferenced the memory usage is still high.
        // Try commenting out the display of littleString below; you will notice
        // that the memory drops as you would expect.
        System.out.println( "Memory Usage (4): " + ( runtime.totalMemory() - runtime.freeMemory() ) );

        System.out.println( "Little String: " + littleString );
    }

   
    private void internExample() {
        String test = "This is a test string";
        test = test.intern();
        String sub = test.substring( 10, 14 );

        // False as you would expect
        System.out.println( "Are the test string and sub-string the same? " + ( test == sub ? "True" : "False" ) );

        // False but you might expect it to be true. After all the original test
        // String was interned so you would be forgiven for expecting it to
        // intern sub-strings as well.
        System.out.println( "Is the sub-string internalized? " + ( "test" == sub ? "True" : "False" ) );

        sub = sub.intern();

        // True. String literals are automatically interned and now our sub-string is
        // the same String.
        System.out.println( "Is the sub-string internalized now? " + ( "test" == sub ? "True" : "False" ) );
    }

}
