Web applications get more complex every year. A couple of years ago nobody cared about memory leaks on web pages (yes, they really were sets of web pages, not web applications). Even if you forgot to clean up some memory, the user was not expected to spend much time on the same page. There were a lot of navigation links, and following one removed the entire page from memory and loaded a new page.
But now you cannot rely on this behavior, because web sites have turned into web applications. The user loads one small HTML file, one script file and some other stuff (like CSS and images), and that’s it. While the browser makes requests to servers, the user can stay on the same “page” for ages. In the worst scenario you will receive a report from a production user: “Oh, your application is crashing after 60 hours of use”.
It looks like you have leaking memory in the application.
There are two main types of garbage collector (GC) implementations:
- Reference-counting GC;
- Mark-and-sweep GC;
I don’t want to dig deep into the implementation details; you can look them up on MDN, for example. But I need to mention a couple of important points.
The first one was used in IE6 and IE7. These browsers are not widely used now, but that was not the case 5 years ago. The main issue with this mechanism is circular references. For example, you may have something like this in the code:
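A minimal sketch of such a cycle, using hypothetical names (createCycle, parent, child) that are not taken from the original sample:

```javascript
// Hypothetical illustration: two objects that reference each other.
function createCycle() {
  const parent = { name: 'parent' };
  const child = { name: 'child' };

  parent.child = child;  // parent -> child
  child.parent = parent; // child -> parent (closes the cycle)

  // Each object is referenced by the other one, so a pure reference
  // counter never sees either count reach zero after this function returns.
  return parent.child.parent === parent;
}

const cycleExists = createCycle();
```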
These two objects cannot be collected by a reference-counting GC, even after they go out of scope, because their reference counts never drop to zero.
The good news is that this mechanism is no longer used in modern browsers. Currently all browsers use the mark-and-sweep mechanism and its improvements (e.g. generational collection). The main idea of this mechanism:
- Determine the list of GC roots;
- Mark all objects that are reachable from these GC roots as live;
- Clean up all other, non-reachable objects.
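The three steps above can be sketched as a toy simulation. This is not how a real engine is implemented: the "heap" here is assumed to be a plain Map from an object id to the ids it references, and markAndSweep is a hypothetical helper.

```javascript
// Toy model: the "heap" maps an object id to the ids it references.
function markAndSweep(heap, rootIds) {
  const marked = new Set();
  const stack = [...rootIds];

  // Mark phase: walk everything reachable from the roots.
  while (stack.length > 0) {
    const id = stack.pop();
    if (marked.has(id) || !heap.has(id)) continue;
    marked.add(id);
    for (const ref of heap.get(id)) stack.push(ref);
  }

  // Sweep phase: drop every object that was not marked.
  const collected = [];
  for (const id of [...heap.keys()]) {
    if (!marked.has(id)) {
      heap.delete(id);
      collected.push(id);
    }
  }
  return collected;
}

const heap = new Map([
  ['window', ['app']],
  ['app', ['cache']],
  ['cache', []],
  ['a', ['b']], // a and b reference each other...
  ['b', ['a']], // ...but nothing reachable from a root points at them
]);

const collected = markAndSweep(heap, ['window']);
```

In this model the cycle between a and b does not protect them: neither is reachable from the root, so both are swept.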
The cool thing is that it works as expected. If a group of objects is no longer reachable (even if they have circular references to each other), they will be removed from memory.
GC-roots and Retained trees
The most important part of the GC logic, which you definitely should understand, is what GC roots are and how the browser determines them. A GC root is an object in memory that is currently available to the application (yes, I know it’s not quite a correct definition, but I cannot express it better at the moment). There are a couple of things that can be taken as GC roots:
- Global Object (in browsers it’s the window object). I believe that’s not a surprise. Every field of the global object is available to the entire application at any moment of the page life cycle. And basically that’s one of the reasons (not the only one) to avoid global variables where possible;
- Document DOM tree;
- Local variables including local variables throughout the entire call stack;
- Functions in the message queue.
Starting from these roots, the GC mechanism marks all available objects. The problem is that when we want some object to be removed, we set the reference to it to null, but we cannot guarantee that it’s the only path to this particular object.
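As a small sketch of that problem, assume a hypothetical long-lived appState object and a loadReport function (neither comes from the original text). Clearing the local reference does not help while a second path to the object exists:

```javascript
// Hypothetical long-lived container, reachable from a GC root.
const appState = { cache: [] };

function loadReport() {
  let report = { rows: new Array(1000).fill('row') };

  appState.cache.push(report); // second path to the same object
  report = null;               // clears only the local reference
}

loadReport();

// The object is still reachable through appState.cache.
const stillReachableRows = appState.cache[0].rows.length;
```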
Also, I want to mention a couple more terms:
- Object’s Retaining Path: basically a path from the roots that keeps this object from being collected. Most of the time objects have multiple retaining paths;
- Object’s Shallow Size: the size of the object itself in the heap;
- Object’s Retained Size: the amount of memory that can be freed after deleting this particular object.
Now let’s look at the main weak spots where investigations should start.
Memory leaks sources
It’s definitely not a complete list, but based on my experience I can highlight the following points:
- DOM (Document Object Model);
- Events and Pub/Sub Pattern;
- Misuse of libraries and frameworks;
So, let’s go down the list, and I’ll try to explain which places in the code I personally consider dangerous and why.
Why do I think it’s dangerous? The problem with the DOM is that it’s always a doubly-linked tree, so holding a reference to any node in such a tree retains the entire tree from garbage collection. Let’s take a look at a small, easy sample (it’s not production code; in a real application things can be much more complicated):
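The sample relies on a real browser DOM. As a rough sketch of the same idea that also runs outside a browser, assume a hypothetical FakeNode class standing in for DOM nodes (real nodes have the same two-way parent and child links). Only the myDemoObject.child.element name is taken from the text; everything else is an assumption.

```javascript
// FakeNode is a hypothetical stand-in for a DOM node.
class FakeNode {
  constructor(tag) {
    this.tag = tag;
    this.parentNode = null;
    this.childNodes = [];
  }
  appendChild(node) {
    node.parentNode = this; // child -> parent link
    this.childNodes.push(node); // parent -> child link
    return node;
  }
  removeChild(node) {
    this.childNodes = this.childNodes.filter((n) => n !== node);
    node.parentNode = null;
    return node;
  }
}

const body = new FakeNode('body');
const myDemoObject = { child: { element: null } };

function buildAndDetach() {
  const container = body.appendChild(new FakeNode('div'));
  const list = container.appendChild(new FakeNode('ul'));

  // Application code keeps a reference to one deep node only.
  myDemoObject.child.element = list.appendChild(new FakeNode('li'));

  // Detach the whole sub-tree from the "document".
  body.removeChild(container);
}

buildAndDetach();

// Walking up from the single retained node still reaches the detached root,
// so the whole sub-tree stays reachable.
let top = myDemoObject.child.element;
while (top.parentNode) top = top.parentNode;
const retainedRootTag = top.tag;
```

Setting myDemoObject.child.element to null would remove the last path to the detached sub-tree in this sketch.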
So now, if you open developer tools and take a look at the heap snapshot, you’ll see that the DOM nodes are not completely removed: they are marked as “Detached DOM”, but are still available through window.myDemoObject.child.element. And the biggest problem is that in this situation you are retaining not one DIV element, but the entire tree. Here is a kind of visual presentation of what’s going on:
In red I’ve marked references that were successfully removed, in black regular references, and in green the most important reference, which is the source of the leak. This reference retains the entire sub-tree starting from
At the top of the screenshot you can see that HTMLDivElement appears twice. The first one is marked in yellow, the second one in red. The difference between these two items is in the retaining path. The
window global object). It’s the most important information, which can help you find the code to fix. In this section you can see the objects that hold the reference to the DOM element you are trying to collect.
To be honest, fixing a leaking DOM is the easiest task in my opinion, because the heap snapshot does a lot of work for you. You just need to determine which parts of the Detached DOM are really not needed any more (in some cases you intentionally detach part of the DOM depending on the user’s actions), find the retaining objects and remove the references.
- Which objects will be retained by closures;
- When these objects can be removed;
Let’s take a look at a small code sample (assuming that the HTML markup contains a div element with the id myId):
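A rough sketch of this kind of sample, with the DOM part left out so it also runs outside a browser. The collectedObject name comes from the text; createHandler and usedObject are hypothetical.

```javascript
function createHandler() {
  // Never referenced by the returned closure, so the engine may collect it.
  const collectedObject = { payload: new Array(1000).fill(0) };

  // Referenced by the closure, so it lives as long as the handler does.
  const usedObject = { clicks: 0 };

  return function handler() {
    debugger; // with developer tools open, inspect which variables are in scope
    usedObject.clicks += 1;
    return usedObject.clicks;
  };
}

const handler = createHandler();
handler();
const clicks = handler();
```

Without an attached debugger the debugger; statement does nothing, so the sketch runs normally.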
Using this sample I want to highlight two important points:
- collectedObject will not be available when you hit the debugger; breakpoint. It was collected by the GC, because it’s not used in any closure and can be safely removed;
- You’ll not see any Detached DOM in the heap snapshot, because the reference to div#myId was removed after the GC cycle, when
But, there is one very important exception:
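A minimal sketch of the exception, assuming a hypothetical createEvaluator function (only the collectedObject name is taken from the text):

```javascript
function createEvaluator() {
  const collectedObject = { payload: 'still reachable' };

  // A direct eval call can name any local variable at run time,
  // so the engine has to keep all of them alive for this closure.
  return function evaluate(code) {
    return eval(code);
  };
}

const evaluate = createEvaluator();

// The closure never mentions collectedObject in its source,
// yet the evaluated string can still reach it.
const leaked = evaluate('collectedObject.payload');
```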
Yeah, “eval is Evil”. The virtual machine cannot collect any object in this case, because it cannot predict what the code passed to the eval function will reference. As a result, collectedObject will not be collected and div#myId will remain in the application as a Detached DOM.
So, what I wanted to show is that closures themselves work predictably and do not retain unnecessary objects in memory. But each time you create a closure you need to keep in mind that the objects in this closure will stay alive as long as the function is alive. In the simplest scenario the function is a member of one object. But the situation can be more complicated when you pass this function as a parameter to other functions or make it a member of other objects. It’s a pretty common situation: you pass a callback function to another object and also “store” the context of the object (e.g. by using bind or a closure). After doing that you are basically creating an implicit reference from one object to another:
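A minimal sketch of such an implicit reference. The _eventListeners and local names follow the text; longLived, addListener and run are hypothetical, and the setTimeout call is left out to keep the sketch synchronous.

```javascript
// Hypothetical long-lived object, similar to something stored on window.
const longLived = {
  _eventListeners: [],
  addListener(callback) {
    this._eventListeners.push(callback);
  },
};

function run() {
  const local = { data: new Array(1000).fill('x') };

  // The callback closes over `local`, so the listener array now holds
  // an implicit reference to it through the function's context.
  longLived.addListener(function () {
    return local.data.length;
  });
}

run();

// `local` went out of scope, but it is still reachable via the listener.
const retainedLength = longLived._eventListeners[0]();
```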
Now you can collect two heap snapshots: one before the setTimeout callback call and one after. You’ll see that the object local is still alive and its retaining path is: window -> global -> _eventListeners -> (index in the _eventListeners Array) -> context of anonymous function. And despite the fact that it was a local object and it’s not available anywhere else, it still cannot be collected.
Events and Pub/Sub Pattern
In my experience, the existing events mechanism has basically never been an issue. Let’s imagine you have an object with an event handler and a reference to a DOM node. By subscribing the event handler to an event with the addEventListener function, you are creating doubly-linked objects: the DOM node is on one side and your object is on the other. But as we discussed earlier, all modern browsers can easily resolve this circular reference once the DOM is detached and the reference to the object is removed.
What I really want to discuss in this section is the Pub/Sub pattern. I have seen a lot of articles on the Internet where Pub/Sub is presented as something like a panacea, some new super-awesome pattern that can resolve all your problems. But if you know the Gang of Four book at all, you should know that it’s not something really new. It’s two well-known patterns mixed together: Observer and Singleton. Sometimes it’s extended with additional functionality (for example “channels”, where you can listen not to a single event but to a bunch of events, which is called a channel). But the main idea is always a sort of mix: Observer + Singleton.
As a result we have an object that lives during the entire page life cycle and contains dozens of event listeners. So if you do not remove event listeners once they are no longer needed, you’ll get a memory leak. And removing them is not always an easy task.
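A minimal sketch of how such a leak appears, assuming a hypothetical eventBus singleton and createWidget helper (none of these names come from a real library):

```javascript
// Hypothetical minimal Pub/Sub singleton.
const eventBus = {
  _listeners: new Map(), // event name -> Set of handlers
  subscribe(event, handler) {
    if (!this._listeners.has(event)) this._listeners.set(event, new Set());
    this._listeners.get(event).add(handler);
    // Return an unsubscribe function so cleanup is easy to do.
    return () => this._listeners.get(event).delete(handler);
  },
  publish(event, payload) {
    for (const handler of this._listeners.get(event) || []) handler(payload);
  },
  count(event) {
    return (this._listeners.get(event) || new Set()).size;
  },
};

function createWidget(name) {
  const widget = { name, received: [] };
  // The handler closes over the widget, so the bus retains the widget too.
  widget.dispose = eventBus.subscribe('data', (d) => widget.received.push(d));
  return widget;
}

let leaky = createWidget('leaky');
let tidy = createWidget('tidy');

leaky = null;   // forgotten: the bus still holds the handler and the widget
tidy.dispose(); // unsubscribed: the bus no longer references this widget
tidy = null;

const remaining = eventBus.count('data');
```

Returning an unsubscribe function from subscribe is one possible design choice; it keeps the cleanup next to the subscription, which makes it harder to forget.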
Personally I really don’t like this “pattern” and try to avoid it. And based on my experience, memory leaks are not the biggest problem with it. Pub/Sub is a global eventing mechanism, something really similar to global variables. After some time you’ll end up with dozens of events tied together in some really crazy chains. Use it at your own risk.
Misuse of frameworks and libraries
Nowadays it’s almost impossible to start a new project without using a library (jQuery, Dojo) or a framework (Angular, Backbone, etc.). The good thing about this stuff is that you don’t need to reinvent the wheel time and time again. The bad thing is that frameworks and libraries change each time you change projects (sometimes even during the same project), and you need to learn how all this code works and how it should be used. Most libraries are just a huge set of syntax sugar, and if you misuse this sugar you can run into problems.
Note: sometimes libraries and frameworks contain real bugs. For example, a long time ago (yeah, in a galaxy far, far away) I found a memory leak in Dojo Toolkit itself. But that does not happen very often, so this section is not about these kinds of issues.
I want to show you a code sample with a memory leak caused by misuse of the jQuery library:
What’s the issue here? The on and off functions of jQuery are full of syntax sugar and are basically just wrappers for addEventListener and removeEventListener. If you replace the call of the on function with addEventListener, you’ll not get a memory leak, but jQuery saves all references to DOM nodes and handlers internally. Why is this needed? If you don’t have a reference to the handler function, removeEventListener will not work, but off will work just fine: jQuery saved the reference to the handler for you and can easily remove it. The problem is that the GC cannot remove the DOM after you detach it (if you didn’t use jQuery for this action), because jQuery is still holding a reference to it.
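The mechanism can be illustrated with a toy model. MiniLib below is a hypothetical stand-in written for this sketch, not jQuery’s actual implementation, and the button object is a plain-object substitute for a DOM node so the code runs outside a browser.

```javascript
// MiniLib is a hypothetical wrapper that keeps its own handler bookkeeping.
const MiniLib = {
  _cache: new Map(), // element -> array of { type, handler }
  on(element, type, handler) {
    if (!this._cache.has(element)) this._cache.set(element, []);
    this._cache.get(element).push({ type, handler });
  },
  off(element, type) {
    // Works without the handler reference, because the library stored it.
    const kept = (this._cache.get(element) || []).filter((e) => e.type !== type);
    if (kept.length > 0) this._cache.set(element, kept);
    else this._cache.delete(element);
  },
  remove(element) {
    // Library-aware removal also clears the bookkeeping.
    this._cache.delete(element);
    element.detached = true;
  },
};

const button = { tag: 'button', detached: false }; // stand-in for a DOM node
MiniLib.on(button, 'click', () => {});

// Detaching "by hand" bypasses the library, so its cache keeps the node.
button.detached = true;
const leakedEntries = MiniLib._cache.size;

// Removing through the library releases the cached reference as well.
MiniLib.remove(button);
const entriesAfterRemove = MiniLib._cache.size;
```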
It’s only one sample; I believe there are dozens of others. Each time you remove something, you need to think about the other places where a reference to it may still exist.
I wasn’t trying to scare you, but I hope that after reading this short article you will no longer think of the GC mechanism as “some black box, which is just working”…