ARC in Depth: Part I
As I’ve mentioned before, when I use a system, I like to know why it works the way it does. About half the time, this leads me down the road of reading specifications and design documents, the other half it takes me into the depths of the code that makes everything I write run. Knowing how things work gives me a guide when things go wrong, as well as giving me context when I’m writing code. I try not to make assumptions in my code about how the runtime is implemented, but understanding how the runtime operates makes debugging and optimization a great deal easier. It’s also a lot of fun at very particular types of parties.
With all that in mind, today we’re going to take a look at ARC and figure out exactly how it works. We’re going to be diving into some of the runtime source code and looking at why certain operations do or don’t perform a certain way. We will also be making a quick detour through the runtime to help us understand how ARC uses it to accomplish its goals.
This post, due to length, will be split into two parts - I’ll be publishing the second part soon.
Why ARC Internals?
So maybe you buy into the idea that looking into ‘why’ things work the way they do is a good idea, but maybe you don’t quite buy why ARC is interesting. I can understand that point of view. ARC, at the base, is really just a reference counting system tied to a static analyzer - both of those components, while individually interesting, don’t seem all that complicated in terms of understanding how they work together.
I think it is glossing over the beauty of the Objective-C runtime not to understand these details, as well as missing out on a great opportunity to understand how to debug and optimize your own code. There are a number of surprising optimizations and implementation details you may have never even considered in the implementation of ARC and reference counting, and even lessons to be learned. Lastly, understanding the platform and runtime in which our code operates can help us to have more context as we approach software development as a whole, and leave us more enlightened when we run into impasses.
An obvious place to start our exploration is to look at how reference counts themselves are implemented. To that end, let’s consider how some simple reference counting calls work. Objects have a retain and a release call available, and when called, they increment or decrement the retain count, which itself is accessible in a property called retainCount available on anything conforming to the NSObject protocol. Let’s start by focusing on where the retain count itself is stored.
The obvious place to store the retain count would be as a property of the object itself - perhaps a private member on NSObject and other root objects? That’s one way you could do it, but it isn’t exactly what Objective-C does. There are a few issues with using a property, and the most notable is that we don’t want the overhead of invoking objc_msgSend, which is considerably more expensive than a simple C function call.
Instead, we want to get performance close to the speed of the aforementioned function call and have it modify the retain count of the object in question as quickly as possible. To facilitate this, Objective-C stores the actual retain count of an object in one of two places. We’ll get into what those two places are in a minute as we unwind the code.
Let’s see if we can figure out how retain and release work. Before we really get into this, I’d like to note that the files we’re talking about sometimes contain several definitions of the same function or method. I’ll be using the variants defined for tagged pointer support, since they handle the non-tagged case as well.
Understanding Retain and Release
When ARC generates a call to retain or release, it’s actually invoking the C functions objc_retain and objc_release - or a variant thereof. By doing this, we stay away from the time sink that is objc_msgSend for most cases, and we rely on good old fashioned function calls that in many cases can be inlined.
Let’s have a look at these two functions, and see if we can figure out how they work. Let’s look at how retain works:
[Listing: objc_retain]
So this is pretty simple, but let’s break it down a little.
- We verify that the object pointer is not nil
- We check if the pointer itself is a tagged pointer and if so we don’t actually retain it - we’ll get to why that is done in a minute
- Assuming our other checks passed, we go ahead and invoke the method retain on the Objective-C object (it turns out that in the current runtime Objective-C objects are C++ objects - the C++ frames in the stack traces of many crash logs already hint at this)
The release call looks just about the same, so I won’t include it here. So based on this, we know that we’re avoiding invoking retain or release on any tagged pointers, whatever those are, and that we invoke a C++ retain or release method. Let’s see where these two methods lead.
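The three checks above can be sketched as a small, self-contained C++ model. This is not the runtime’s code: ModelObject, model_retain, and the choice of the low bit as the tag marker are all assumptions made for illustration, and the real tag layout is private to the runtime.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a runtime object; not the real objc_object.
struct ModelObject {
    int retainCount = 1;
    ModelObject* retain() { ++retainCount; return this; }
};

// Assumption for this sketch: the low bit marks a tagged pointer.
constexpr std::uintptr_t kTagBit = 1;

inline bool isTaggedPointer(const void* p) {
    return (reinterpret_cast<std::uintptr_t>(p) & kTagBit) != 0;
}

// Mirrors the three steps in the list: nil check, tagged check, retain.
inline ModelObject* model_retain(ModelObject* obj) {
    if (obj == nullptr) return obj;        // nothing to retain
    if (isTaggedPointer(obj)) return obj;  // tagged pointers are never retained
    return obj->retain();                  // ordinary object: forward to its retain method
}
```

Note that the tagged branch returns before the pointer is ever dereferenced, which is what makes it safe to pass a value that is not a real address.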
You Got Your C++ in My Objective-C!
[Listing: objc_object::retain]
Here we have the definition of objc_object::retain. Now we’re cooking - this looks promising. So what does this code do? Good question - the comments will help us out a little as we try to figure it out, so let’s go through section by section like we did earlier.
- We assert that we’re not using the defunct garbage collector, or if we are that this class has a “custom RR” whatever that is
- We assert that this is not a tagged pointer.
- We check to be sure there isn’t a “custom RR”. If there isn’t, we call rootRetain, otherwise we invoke objc_msgSend and call retain on ourselves.
The obvious question is: what is a ‘custom RR’? Custom RR denotes a custom retain-release implementation. If your class overrides retain or release, the runtime calls objc_class::setHasCustomRR, which tags that Class and all child classes as having a custom implementation, and therefore opts them out of the fast retain/release behavior. This is one of several reasons why it’s a bad idea to override retain and release in just about every case, and why ARC steers you away from it through compiler diagnostics.
Okay - so that makes sense - we’re using the actual retain or release message when we have them overridden. Why does the objc_msgSend look so weird though? If you’ve read the 64-bit migration guide, you’ll know that objc_msgSend has to be cast in this way or ‘Bad Things’ happen. Because it doesn’t necessarily know the types of the parameters, you can end up performing a narrowing implicit cast followed by a widening implicit cast in some cases - see my references for more information on this.
The release call looks identical to the retain call except that it invokes rootRelease for the fast path and sends the SEL_release selector to the object for the custom retain/release path.
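The split between the fast path and the custom retain/release path can be modeled in a few lines of C++. Everything here is hypothetical: ModelClass, ModelObject, and sendRetainMessage are illustrative names, and sendRetainMessage merely stands in for a dynamic message send rather than calling objc_msgSend.

```cpp
#include <cassert>

// Hypothetical model of a class; only the flag matters for this sketch.
struct ModelClass {
    bool hasCustomRR = false;  // set when a class overrides retain/release
};

struct ModelObject {
    const ModelClass* cls = nullptr;
    int fastCount = 0;     // bumped by the direct (fast) path
    int messageSends = 0;  // counts trips through the dynamic (slow) path

    ModelObject* rootRetain() { ++fastCount; return this; }

    // Stands in for sending the retain message so an override can run.
    ModelObject* sendRetainMessage() { ++messageSends; return this; }

    ModelObject* retain() {
        assert(cls != nullptr);
        if (!cls->hasCustomRR) return rootRetain();  // fast path: plain call
        return sendRetainMessage();                  // slow path: honor the override
    }
};
```

The point of the flag is that the check costs one branch, so classes that never override retain or release pay almost nothing for the existence of the slow path.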
The Root of The Matter
Let’s have a look at what rootRetain does and try to understand how it works. This one is a bit of a doozy, so get ready…
[Listing: rootRetain]
Phew, that’s more code than I generally include in my blog posts, but it’s worth every line! Let’s dig in and start figuring out what this thing does.
Starting at the top, we can see that we do what has become our standard preamble - we assert no GC and return the object itself if we are a tagged pointer. The next thing we do is load some variables, and check if our newisa is indexed. If it is not, we perform the unindexed retain path by doing something called a sidetable_retain(), unless we were asked to do a tryRetain, in which case we try to use a sidetable retain, and if we fail, we return nil.
If we are indexed, we check if we were trying a retain, and if the object is already deallocating, we go to the try fail logic. If not, we continue and add RC_ONE to the newisa value, and store the carry. If we have a carry and we are handling carries, we execute an operation to move half of the extra_rc to the sidetable. Afterwards, regardless of what we did, we store the newisa value into our isa.
Phew, that’s pretty complicated, but we now know the answer to our question, even if we don’t know what it means yet: retain counts are stored either in something called a sidetable if a pointer is unindexed, in the isa if it is indexed, or in both if it’s indexed and overflows the newisa. Now we just need to know what all those words like sidetable and indexed mean.
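To make the overflow behavior concrete, here is a simplified single-threaded model of the logic described above. It is a sketch under loud assumptions: the inline count is given an 8-bit capacity so overflow is easy to trigger, the side table is a plain hash map, and the tryRetain and deallocating branches are left out. The real routine updates the isa word atomically; this model does not.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Assumption: an 8-bit inline count. The real field width is a runtime detail.
constexpr std::uint32_t kInlineMax = 255;

struct ModelObject {
    bool indexed = true;         // does the isa word carry an inline count?
    std::uint32_t extra_rc = 0;  // inline count (models the bits inside the isa)
};

// Hypothetical side table: object address -> overflow count.
using SideTable = std::unordered_map<const ModelObject*, std::uint64_t>;

inline void model_rootRetain(ModelObject& obj, SideTable& table) {
    if (!obj.indexed) {  // unindexed: the count lives only in the side table
        ++table[&obj];
        return;
    }
    if (obj.extra_rc == kInlineMax) {
        // Adding one would carry out of the field, so move half to the side table.
        const std::uint32_t half = (kInlineMax + 1) / 2;
        table[&obj] += half;
        obj.extra_rc = half;  // the other half stays inline
        return;
    }
    ++obj.extra_rc;  // common case: bump the inline count
}

inline std::uint64_t model_totalCount(const ModelObject& obj, const SideTable& table) {
    auto it = table.find(&obj);
    return obj.extra_rc + (it == table.end() ? 0 : it->second);
}
```

Moving half of the count rather than all of it leaves headroom in both directions, so an object hovering near the boundary does not touch the side table on every retain and release.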
What’s in An ISA (or how I learned to stop worrying and love bitfields)
Objects have to know what class they belong to for a number of reasons, but the most obvious is to invoke methods. In Objective-C, method invocation goes through objc_msgSend, which handles message passing by looking up the IMP that implements a given SEL - but where does it look these up from?
ISA is the answer - it is the first pointer on every objc_object and it points to the Class object which that object is an instance of.
What does this have to do with retain counts? On 32-bit platforms, nothing. It all comes down to those 32 extra bits you get on x86_64 or ARM64. 64 bits is more than we need to address all the memory most MMUs can support - in fact, on x86_64, only 48 bits are used. That means that even though the pointer width of the architecture is 64 bits we only need 48 of them, which leaves 16 bits left over - that’s two bytes! And as it turns out, ISA is a pointer - which means ISA has two extra bytes.
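The arithmetic is easy to check with a small sketch. It assumes, as the paragraph above does, that only the low 48 bits of a 64-bit word carry an address; pack, addressOf, and payloadOf are hypothetical helper names, not runtime functions.

```cpp
#include <cassert>
#include <cstdint>

// Assumption from the text: only the low 48 bits of a 64-bit pointer hold an address.
constexpr unsigned kPointerBits = 64;
constexpr unsigned kAddressBits = 48;
constexpr unsigned kSpareBits = kPointerBits - kAddressBits;  // 16 bits, i.e. 2 bytes
constexpr std::uint64_t kAddressMask = (std::uint64_t{1} << kAddressBits) - 1;

// Pack a 16-bit payload into the otherwise unused high bits.
inline std::uint64_t pack(std::uint64_t address, std::uint16_t payload) {
    return (address & kAddressMask) | (std::uint64_t{payload} << kAddressBits);
}

// Recover the address by masking off the payload bits.
inline std::uint64_t addressOf(std::uint64_t word) {
    return word & kAddressMask;
}

// Recover the payload by shifting the address bits away.
inline std::uint16_t payloadOf(std::uint64_t word) {
    return static_cast<std::uint16_t>(word >> kAddressBits);
}
```

A word built this way must be masked before it is used as an address, which is why code outside the runtime should go through accessors instead of reading the raw value.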
Wonder what would be a good way to use those two bytes? Fill them with zeroes? Fill them with ones? Use them for storing values that are commonly accessed and need low overhead? The last one sounds pretty awesome.
A tagged (or indexed) pointer is a pointer with the pattern I just described - only some of the bits in it are actually used to store the memory location it points to, and the remaining bits are used to store metadata associated with the object or memory pointed to. In Objective-C, tagged pointers are currently supported on x86_64 and ARM64, and are created by the default alloc implementation. And herein lies another of the gotchas: if you override allocWithZone in your class, you will by default lose tagged pointers and all the wonderful optimizations they provide - we’ll talk about how to avoid this a bit later.
At this point I should also mention that you should never make assumptions about the values stored in the tagged pointers Objective-C uses, or about their internal structure. The internal details of tagged pointers, and even the presence of them, can change between runtime releases in unpredictable ways. There are generally accessors to get everything you would want out of a tagged pointer, so there’s no real reason to access the data directly.
In the case of objc_object we don’t have the individual object pointers themselves act as tagged pointers - if we did, people writing code would have to use special accessors to retrieve the pointer to memory out of the tagged pointer. Instead, we use something that every object has to have, as we mentioned before: the isa pointer.
One thing stored in this pointer, among other things, is the retain count of the object. Since it is stored here, we can access and modify it very quickly - it’s only manipulating a bitfield on an object from a C function. For further optimization, the C function can often be inlined. It turns out this is much faster than calling objc_msgSend, but what happens when the limited space in the tag is filled? Where does the count go?
A SideTable to Put Your RetainCount On
When the space for retain counts in a tagged pointer runs out, they must be stored somewhere else. If we look at the code, we can see it stores them in the place we talked about briefly before - the side table - but the code does not make it entirely clear what the side table is or how it works. To answer that question, we have to look a little further down, where we handle cases where pointers are not tagged and we call sidetable_retain. Let’s have a look at sidetable_retain and see if we can get more detail on how side tables work.
[Listing: sidetable_retain]
When we call sidetable_retain we can see that it asserts if tagged pointers are enabled and our isa is a tagged pointer. If not, we look up the side table for this pointer, and then we try to lock the table. If we succeed, we get the reference count element for this particular pointer, and increase it. If we didn’t acquire the lock, we fall through to a slow version of sidetable_retain.
This makes sense. If we look into what a SideTable is defined as, we’ll find that it’s an optimized hash table, with a hash function that is, well, have a look for yourself:
[Listing: side table hash function]
One more thing to note here is that side tables appear to be segmented and it appears there is more than one - and that’s true. To optimize performance, we first get a side table associated with a given range of addresses, and then we look up that particular address in what amounts to a fancy optimized hash table.
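The two-level lookup can be sketched as follows. The stripe count of 8 and the shift-and-xor mixing function are assumptions picked for illustration only and are not the runtime’s actual values; StripedSideTables and stripeIndex are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

struct SideTable {
    std::unordered_map<std::uintptr_t, std::size_t> refcounts;  // address -> count
};

// Assumptions: 8 stripes and a simple shift-and-xor mix; both are illustrative.
constexpr std::size_t kStripeCount = 8;

inline std::size_t stripeIndex(std::uintptr_t address) {
    // Discard low bits that are always zero for aligned objects, then mix.
    return ((address >> 4) ^ (address >> 9)) % kStripeCount;
}

struct StripedSideTables {
    std::array<SideTable, kStripeCount> tables;

    // Step one: pick the table from the address. Step two happens in the
    // caller, which looks the address up inside the table it gets back.
    SideTable& tableFor(const void* object) {
        return tables[stripeIndex(reinterpret_cast<std::uintptr_t>(object))];
    }
};
```

Splitting the storage this way means two threads retaining unrelated objects will usually land in different tables and therefore contend on different locks.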
We don’t really need to get into the slow version of sidetable_retain, as it is very similar to sidetable_retain, except that instead of trying to lock optimistically it spins on the spinlock until it becomes available. This is a performance optimization for the optimistic, and common, case where nothing is locking the retain count. As an aside - this also explains why you don’t have to synchronize retain and release - the runtime does this for you by either performing the operation on the tagged pointer or spin locking on the side table entry.
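The optimistic-then-spin pattern looks roughly like this in portable C++. It is a minimal sketch, not the runtime’s implementation: SpinLock is a toy lock built on std::atomic_flag, and the model_ function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <unordered_map>

// Toy spinlock for illustration; real spinlocks add backoff and fairness.
struct SpinLock {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
    bool try_lock() { return !flag.test_and_set(std::memory_order_acquire); }
    void lock() { while (!try_lock()) { /* spin until the holder unlocks */ } }
    void unlock() { flag.clear(std::memory_order_release); }
};

struct SideTable {
    SpinLock lock;
    std::unordered_map<const void*, std::size_t> refcounts;
};

// Slow path: spin until the lock is ours, then bump the count.
inline void model_sidetable_retain_slow(SideTable& table, const void* object) {
    std::lock_guard<SpinLock> guard(table.lock);
    ++table.refcounts[object];
}

// Fast path: try the lock once and fall back to the slow path on contention.
// Returns true if the fast path was taken (exposed only for illustration).
inline bool model_sidetable_retain(SideTable& table, const void* object) {
    if (table.lock.try_lock()) {
        std::lock_guard<SpinLock> guard(table.lock, std::adopt_lock);
        ++table.refcounts[object];
        return true;
    }
    model_sidetable_retain_slow(table, object);
    return false;
}
```

Either way the count is only modified while the lock is held, which is the property that lets callers skip their own synchronization.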
Putting it all together…
So it seems like we have an answer to our original question of where the retain count is stored, as well as how retain and release work. In short, the retain count is stored in a sidetable (an optimized set of hash tables segmented by memory address), in a tagged pointer, or in both.
This only answers our first questions about how retain and release work and how they are optimized, though, and we still have a little more to look at before we fully understand ARC. It does give us a lot of information about why ARC encourages you to stay away from certain behaviors that break these optimizations, and provides a good base for understanding additional internals in ARC. Before we close out this post, let’s review the high points of “things not to do” we learned from diving into the source code.
Custom Retain/Release Implementations
ARC pretty heavily discourages implementing retain and release calls on classes, but there are still ways you can get an implementation in. If you have any custom retain or release operations, though, your code is going to go through the slow path of using objc_msgSend which, depending on usage patterns, could result in a pretty big performance hit. Apple spends a lot of time optimizing memory accesses, and between runtimes there are often a lot of changes, so it doesn’t make a lot of sense not to take advantage of these optimizations.
The documentation explicitly says that unless you are implementing your own memory management scheme separately from ARC you should not override these methods - and that’s exactly what you should abide by here. If you’re counting on using retain or release to signal or log something in your code, that’s a really bad practice. Even if you’re willing to accept the optimization penalty, counting on retain or release being called is still semantically incorrect. As mentioned earlier in this document, ARC makes all kinds of optimizations to avoid calling retain and release, so relying on them being called in any way is just flat out incorrect.
The only case where it might be valid to override retain and release is the case the documentation mentions: if you are actually using an alternative custom memory management system. Perhaps you have to explicitly cooperate with another reference counting system, and the retain and release calls don’t end up calling super but instead ask your implementation to increment or decrement the reference count. That might be fine, but just keep in mind this will make your code slower on these sorts of operations.
Custom Alloc Implementations
One of the responsibilities of allocWithZone: is to set the isa pointer. As a result, if you override allocWithZone: you may end up not having a tagged pointer present for the isa field, and end up having to use the sidetable to store reference count information.
To avoid this, as my references note, you should explicitly use the object_setClass function to set the isa pointer rather than setting it directly. This is of particular concern for any codebase that is migrating to 64-bit support (as mandated by Apple for all iOS applications recently), as it is an easy change that will get you a decent boost in performance for most codebases. If you can avoid overriding alloc altogether, all the better.
In this post, we have looked at how retain and release work in a great deal of depth, and now understand how choices we make in writing our own code can influence the behavior of ARC and its speed. We’ve also looked at some of the optimizations made in these parts of the reference counting implementation, and discussed how ARC takes advantage of them.
In the next post, part II of our in-depth ARC investigation, we will start digging into even more ARC internals, focusing this time on weak pointers, how ARC gets away with optimizing and eliminating some calls to objc_release through use of objc_storeStrong and friends, and the types of optimizations ARC is allowed to make at various points in the code.