ARC in Depth: Part I

As I’ve mentioned before, when I use a system, I like to know why it works the way it does. About half the time, this leads me down the road of reading specifications and design documents; the other half, it takes me into the depths of the code that makes everything I write run. Knowing how things work gives me a guide when things go wrong, as well as context when I’m writing code. I try not to make assumptions in my code about how the runtime is implemented, but understanding how the runtime operates makes debugging and optimization a great deal easier. It’s also a lot of fun at very particular types of parties.

With all that in mind, today we’re going to take a look at ARC and figure out exactly how it works. We’re going to be diving into some of the runtime source code and looking at why certain operations do or don’t perform a certain way. We will also be making a quick detour through the runtime to help us understand how ARC uses it to accomplish its goals.

This post, due to length, will be split into two parts - I’ll be publishing the second part soon.

Let’s begin!

Why ARC Internals?

So maybe you buy into the idea of looking into ‘why’ things work the way they do, but you don’t quite see why ARC is interesting. I can understand that point of view. ARC, at its base, is really just a reference counting system tied to a static analyzer - both of those components, while individually interesting, don’t seem all that complicated in terms of understanding how they work together.

Not digging into these details glosses over the beauty of the Objective-C runtime, and misses a great opportunity to learn how to debug and optimize your own code. There are a number of surprising optimizations and implementation details in ARC and reference counting that you may never have considered, and lessons to be learned from them. Lastly, understanding the platform and runtime our code operates in gives us more context as we approach software development as a whole, and leaves us more enlightened when we run into impasses.

Reference Counts

An obvious place to start our exploration is to look at how reference counts themselves are implemented. To that end, let’s consider how some simple reference counting calls work. Objects have a retain and a release call available, and when called, they increment or decrement the retain count, which itself is accessible in a property called retainCount available on anything conforming to the NSObject protocol. Let’s start by focusing on where the retain count itself is stored.

The obvious place to store the retain count would be as a property of the object itself - perhaps a private member on NSObject and other root objects? That’s one way you could do it, but it isn’t exactly what Objective-C does. There are a few issues with using a property, the most notable being that we don’t want the overhead of invoking objc_msgSend - a very expensive operation compared to a simple C function call.

Instead, we want performance close to the speed of the aforementioned function call, modifying the retain count of the object in question as quickly as possible. To facilitate this, Objective-C stores the actual retain count of an object in one of two places. We’ll get into where those are and how they work in a minute as we unwind the code.

Let’s see if we can figure out how retain and release work. Before we really get into this, I’d like to note that the files we’re talking about sometimes have several definitions of the same function or method; I’ll be using the variants defined for tagged pointer support, since they handle the non-tagged case as well.

Understanding Retain and Release

When ARC generates a call to retain or release, it’s actually invoking the C function objc_retain or objc_release - or a variant thereof. By doing this, we stay away from the time sink that is objc_msgSend in most cases, and rely on good old-fashioned function calls that can often be inlined.

Let’s have a look at these two functions, and see if we can figure out how they work. Let’s look at how retain works[1]:

__attribute__((aligned(16)))
id
objc_retain(id obj)
{
    if (!obj) return obj;
    if (obj->isTaggedPointer()) return obj;
    return obj->retain();
}

So this is pretty simple, but let’s break it down a little.

  1. We verify that the object pointer is not nil.
  2. We check if the pointer itself is a tagged pointer, and if so we don’t actually retain it - we’ll get to why in a minute.
  3. Assuming our other checks passed, we go ahead and invoke the method retain on the Objective-C object (it turns out that in the current runtime, Objective-C objects are C++ objects - for the observant, the C++ frames in the stack traces of many crash logs already hinted at this).

The release call looks just about the same, so I won’t include it here. So based on this, we know that we’re avoiding invoking retain or release on any tagged pointers, whatever those are, and that we invoke a C++ retain or release method. Let’s see where these two methods lead.

You Got Your C++ in My Objective-C!

// Equivalent to calling [this retain], with shortcuts if there is no override
inline id
objc_object::retain()
{
    // UseGC is allowed here, but requires hasCustomRR.
    assert(!UseGC  ||  ISA()->hasCustomRR());
    assert(!isTaggedPointer());

    if (! ISA()->hasCustomRR()) {
        return rootRetain();
    }

    return ((id(*)(objc_object *, SEL))objc_msgSend)(this, SEL_retain);
}

Here we have the definition of objc_object::retain[2]. Now we’re cooking - this looks promising. So what does this code do? Good question - the comments will help us out a little as we try to figure it out, so let’s go through it section by section like we did earlier.

  1. We assert that we’re not using the defunct garbage collector - or, if we are, that this class has a “custom RR”, whatever that is.
  2. We assert that this is not a tagged pointer.
  3. We check whether there is a “custom RR”; if there isn’t, we call rootRetain, otherwise we invoke objc_msgSend and call retain on ourselves.

The obvious question is: what is a ‘custom RR’? Custom RR denotes a custom retain-release implementation. If your class overrides retain or release, the runtime calls objc_class::setHasCustomRR, which tags that Class and all child classes as having a custom implementation, and therefore opts them out of the fast retain/release behavior. This is one of several reasons why it’s a bad idea to override retain and release in just about every case, and why ARC dissuades you from doing so through compiler messages.

Okay - so that makes sense - we’re using the actual retain and release messages when we have them overridden. Why does the objc_msgSend look so weird, though? If you’ve read the 64-bit migration guide, it turns out that objc_msgSend has to be cast in this way or ‘Bad Things’ happen: because the compiler doesn’t necessarily know the types of the parameters, you can end up performing a narrowing implicit conversion followed by a widening implicit conversion in some cases - see my references for more information on this.
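If the casting rule seems abstract, the same issue can be shown in plain C, entirely outside the runtime - the function names below are made up for illustration. A call through a function pointer only behaves predictably when the pointer is cast to the callee’s true signature first:

```c
#include <assert.h>

/* A "generic" function pointer type, standing in for how objc_msgSend
 * is declared without knowledge of each message's real signature. */
typedef void (*generic_fn)(void);

/* The implementation we actually want to reach (made-up example). */
long add_tag(long value, short tag) {
    return value + (long)tag;
}

long call_through_cast(generic_fn fn, long value, short tag) {
    /* Cast to the true signature before calling.  Without this, the
     * compiler may apply the wrong argument conversions - the 'Bad
     * Things' the migration guide warns about. */
    return ((long (*)(long, short))fn)(value, tag);
}
```

This mirrors the `((id(*)(objc_object *, SEL))objc_msgSend)(...)` pattern above: the cast tells the compiler exactly how to pass each argument.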

The release call looks identical to the retain call except that it invokes rootRelease for the fast path and sends the SEL_release selector to the object for the custom retain/release path.

The Root of The Matter

Let’s have a look at what rootRetain does and try to understand how it works. This one is a bit of a doozy, so get ready…[3]

ALWAYS_INLINE id
objc_object::rootRetain(bool tryRetain, bool handleOverflow)
{
    assert(!UseGC);
    if (isTaggedPointer()) return (id)this;

    bool sideTableLocked = false;
    bool transcribeToSideTable = false;

    isa_t oldisa;
    isa_t newisa;

    do {
        transcribeToSideTable = false;
        oldisa = LoadExclusive(&isa.bits);
        newisa = oldisa;
        if (!newisa.indexed) goto unindexed;
        // don't check newisa.fast_rr; we already called any RR overrides
        if (tryRetain && newisa.deallocating) goto tryfail;
        uintptr_t carry;
        newisa.bits = addc(newisa.bits, RC_ONE, 0, &carry);  // extra_rc++

        if (carry) {
            // newisa.extra_rc++ overflowed
            if (!handleOverflow) return rootRetain_overflow(tryRetain);
            // Leave half of the retain counts inline and
            // prepare to copy the other half to the side table.
            if (!tryRetain && !sideTableLocked) sidetable_lock();
            sideTableLocked = true;
            transcribeToSideTable = true;
            newisa.extra_rc = RC_HALF;
            newisa.has_sidetable_rc = true;
        }
    } while (!StoreExclusive(&isa.bits, oldisa.bits, newisa.bits));

    if (transcribeToSideTable) {
        // Copy the other half of the retain counts to the side table.
        sidetable_addExtraRC_nolock(RC_HALF);
    }

    if (!tryRetain && sideTableLocked) sidetable_unlock();
    return (id)this;

 tryfail:
    if (!tryRetain && sideTableLocked) sidetable_unlock();
    return nil;

 unindexed:
    if (!tryRetain && sideTableLocked) sidetable_unlock();
    if (tryRetain) return sidetable_tryRetain() ? (id)this : nil;
    else return sidetable_retain();
}

Phew, that’s more code than I generally include in my blog posts, but it’s worth every line! Let’s dig in and start figuring out what this thing does.

Starting at the top, we can see that we do what has become our standard preamble - we assert no GC and return the object itself if we are a tagged pointer. Next we load some variables and check if our newisa is indexed; if it isn’t, we take the unindexed path, which performs a sidetable_retain() - unless we were asked to tryRetain, in which case we attempt a side table retain and return nil if it fails.

If we are indexed, we check whether we were trying a retain on an object that is already deallocating, and if so we go to the try-fail logic. If not, we continue, add RC_ONE to the newisa value, and store the carry. If we have a carry and we are handling overflow, we prepare to move half of extra_rc to the side table. Afterwards, regardless of what we did, we store the newisa value into our isa, retrying the whole loop if another thread changed it underneath us, and finally transcribe the other half of the count to the side table if needed.

Phew, that’s pretty complicated, but we now know the answer to our question, even if we don’t know what it means yet: retain counts are stored either in something called a side table if a pointer is unindexed, in the isa if it is indexed, or in both if it’s indexed and the inline count overflows. Now we just need to know what all those words like side table and indexed mean.
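To make the overflow handoff concrete, here is a minimal single-threaded C sketch of the idea - my own simplification with made-up names and field widths, not the runtime’s code. The inline counter is bumped until it overflows; then half the count stays inline and half moves to an external count, just as rootRetain does with extra_rc and the side table:

```c
#include <assert.h>
#include <stdint.h>

#define RC_BITS 8                         /* pretend extra_rc is 8 bits */
#define RC_MAX  ((1u << RC_BITS) - 1)
#define RC_HALF (1u << (RC_BITS - 1))

typedef struct {
    uint32_t extra_rc;        /* inline count, like the isa bitfield  */
    int      has_sidetable_rc;
    uint64_t sidetable_rc;    /* stand-in for the side table entry    */
} fake_object;

/* One retain: bump the inline count; on overflow, leave half inline
 * and move the other half out, as rootRetain does. */
void fake_retain(fake_object *obj) {
    if (obj->extra_rc == RC_MAX) {
        obj->extra_rc = RC_HALF;          /* half stays inline...      */
        obj->sidetable_rc += RC_HALF;     /* ...half goes to the table */
        obj->has_sidetable_rc = 1;
    } else {
        obj->extra_rc++;
    }
}

uint64_t fake_total_rc(const fake_object *obj) {
    return (uint64_t)obj->extra_rc + obj->sidetable_rc;
}
```

Note that no retains are lost at the overflow boundary: the inline count plus the side table count always equals the number of retains performed.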

What’s in An ISA (or how I learned to stop worrying and love bitfields)

Objects have to know what class they belong to for a number of reasons, but the most obvious is to invoke methods. In Objective-C, method invocation means looking up the IMP for a SEL so objc_msgSend can handle message passing - but where do you look these up? The ISA is the answer - it is the first pointer on every objc_object, and it points to the Class object of which that object is an instance.

What does this have to do with retain counts? On 32-bit platforms, nothing - it all comes down to those 32 extra bits you get on x86_64 or ARM64. 64 bits is more than we need to address all the memory most MMUs can support - in fact, on x86_64, only 48 bits are used. That means that even though the pointer width of the architecture is 64 bits, we only need 48 of them, which leaves 16 bits over - that’s two bytes! And as it turns out, ISA is a pointer - which means ISA has two extra bytes.

Wonder what would be a good way to use those two bytes? Fill them with zeroes? Fill them with ones? Use them for storing values that are commonly accessed and need low overhead? The last one sounds pretty awesome.

A tagged (or indexed) pointer is a pointer with the pattern I just described - only some of the bits in it are actually used to store the memory location it points to, and the remaining bits are used to store metadata associated with the object or memory pointed to. In Objective-C, tagged pointers are currently supported on x86_64 and ARM64, and are created by the default alloc implementation. And herein lies another of the gotchas: if you override allocWithZone in your class, you will by default lose tagged pointers and all the wonderful optimizations they provide - we’ll talk about how to avoid this a bit later.
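To make the idea concrete, here is a C sketch of packing metadata into the unused high bits of a 64-bit pointer. The 48-bit address width matches the x86_64 discussion above, but the layout is my own illustration - the real isa bit layout is different and, as noted below, not something to rely on:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: low 48 bits are the address, high 16 bits are free.
 * Illustration only - not the runtime's actual isa layout. */
#define ADDR_BITS 48
#define ADDR_MASK ((UINT64_C(1) << ADDR_BITS) - 1)

/* Pack 16 bits of metadata into the otherwise-unused high bits. */
uint64_t tag_pointer(uint64_t addr, uint16_t metadata) {
    return (addr & ADDR_MASK) | ((uint64_t)metadata << ADDR_BITS);
}

uint64_t tagged_addr(uint64_t tagged) { return tagged & ADDR_MASK; }
uint16_t tagged_meta(uint64_t tagged) { return (uint16_t)(tagged >> ADDR_BITS); }
```

Both the address and the metadata survive a round trip through pack and unpack, which is all a tagged pointer needs.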

At this point I should also mention that you should never make assumptions about the values the tagged pointers Objective-C uses store, or about their internal structure. The internal details of tagged pointers, and even their presence, can change between runtime releases in unpredictable ways. There are generally accessors to get everything you would want out of a tagged pointer, so there’s no real reason to access the data directly.

In the case of objc_object, we don’t have the individual object pointers themselves act as tagged pointers - if we did, people writing code would have to use special accessors to retrieve the pointer to memory out of the tagged pointer. Instead, we use something that every object has to have, as we mentioned before: the ISA pointer.

Among other things, this pointer stores the retain count of the object. Since it is stored here, we can access and modify it very quickly - it’s only manipulating a bitfield on an object from a C function, and for further optimization, that C function can often be inlined. This turns out to be much faster than calling objc_msgSend. But what happens when the limited space in the tag fills up - where does the count go?

A SideTable to Put Your RetainCount On

When the space in a tagged pointer for retain counts runs out, the count must be stored somewhere else. If we look at the code, we can see it goes in the place we talked about briefly before - the side table - but the code does not make it entirely clear what the side table is or how it works. To answer that question, we have to look a little further down, where we handle pointers that are not tagged and call sidetable_retain. Let’s have a look at sidetable_retain and see if we can get more detail on how side tables work[4].

id
objc_object::sidetable_retain()
{
#if SUPPORT_NONPOINTER_ISA
    assert(!isa.indexed);
#endif
    SideTable *table = SideTable::tableForPointer(this);

    if (spinlock_trylock(&table->slock)) {
        size_t& refcntStorage = table->refcnts[this];
        if (! (refcntStorage & SIDE_TABLE_RC_PINNED)) {
            refcntStorage += SIDE_TABLE_RC_ONE;
        }
        spinlock_unlock(&table->slock);
        return (id)this;
    }
    return sidetable_retain_slow(table);
}

When we call sidetable_retain, we can see that it first asserts (when non-pointer isa support is compiled in) that our isa is not indexed. We then query for the side table for this pointer and try to lock it. If we succeed, we get the reference count entry for this particular pointer and increase it (unless it has been pinned at its maximum value). If we didn’t acquire the lock, we go to something called sidetable_retain_slow.

This makes sense. If we look into what a SideTable is defined as, we’ll find that it’s an optimized hash table, with a hash function that is, well, have a look for yourself[5]:

// Pointer hash function.
// This is not a terrific hash, but it is fast
// and not outrageously flawed for our purposes.

// Based on principles from http://locklessinc.com/articles/fast_hash/
// and evaluation ideas from http://floodyberry.com/noncryptohashzoo/

One more thing to note here: side tables appear to be segmented, and there appears to be more than one - and that’s true. To optimize performance, we first get the side table associated with a given range of addresses, and then look up that particular address in what amounts to a fancy optimized hash table.
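As an illustration of the segmentation idea - with a made-up table count and a deliberately simple hash, nothing like the runtime’s actual function - mapping an address to a table might look like:

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_COUNT 64   /* assumed; the real count is a runtime detail */

/* Map an object's address to one of the side tables.  Shifting off the
 * low bits first matters: allocations are aligned, so those bits are
 * mostly zero and would bias the distribution toward table 0. */
unsigned table_index_for(const void *ptr) {
    uintptr_t addr = (uintptr_t)ptr;
    return (unsigned)((addr >> 4) % TABLE_COUNT);
}
```

The key property is that the same address always maps to the same table, so a retain and its matching release agree on where the count lives.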

We don’t really need to get into sidetable_retain_slow, as it is very similar to sidetable_retain, except that instead of locking optimistically it spins on the spinlock until it becomes available. This is a performance optimization for the optimistic, and common, case where nothing is holding the lock on the retain count. As an aside - this also explains why you don’t have to synchronize retain and release yourself: the runtime does it for you, either by performing the operation on the tagged pointer or by locking the side table entry.
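The try-fast-then-fall-back pattern is easy to demonstrate. Here is a hedged sketch using a pthread mutex as a stand-in for the runtime’s spinlock; the struct and function names are mine, not the runtime’s:

```c
#include <assert.h>
#include <pthread.h>

/* A pthread mutex stands in for the runtime's spinlock here. */
typedef struct {
    pthread_mutex_t lock;
    unsigned long   refcnt;
} table_entry;

void entry_retain(table_entry *e) {
    /* Fast path: try the lock without blocking. */
    if (pthread_mutex_trylock(&e->lock) == 0) {
        e->refcnt++;
        pthread_mutex_unlock(&e->lock);
        return;
    }
    /* Slow path: the entry is contended, so wait our turn. */
    pthread_mutex_lock(&e->lock);
    e->refcnt++;
    pthread_mutex_unlock(&e->lock);
}
```

In the common uncontended case the trylock succeeds immediately and we never pay the cost of blocking.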

Putting it all together…

So it seems like we have an answer to our original question of where the retain count is stored, as well as how retain and release work. In short, the retain count is stored in a side table - an optimized set of hash tables segmented by memory address - or in a tagged pointer, or in both.

This only answers our first questions about how retain and release work and how they are optimized - we still have a little more to look at before we fully understand ARC. It does, however, give us a lot of information about why ARC encourages you to stay away from certain behaviors that break these optimizations, and provides a good base for understanding additional ARC internals. Before we close out this post, let’s review the high points of “things not to do” we learned from diving into the source code.

Custom Retain/Release Implementations

ARC pretty heavily discourages implementing retain and release on your classes, but there are still ways you can get an implementation in. If you have any custom retain or release operations, though, your code is going to go through the slow path of using objc_msgSend, which, depending on usage patterns, could result in a pretty big performance hit. Apple spends a lot of time optimizing memory accesses, and these optimizations often change between runtime versions, so it doesn’t make a lot of sense not to take advantage of them.

The documentation explicitly says that unless you are implementing your own memory management scheme separately from ARC, you should not override these methods - and that’s exactly what you should abide by here. If you’re counting on retain and release to signal or log something in your code, that’s a really bad practice. Even if you’re willing to accept the performance penalty, counting on retain and release being called is still semantically incorrect: as mentioned earlier, ARC makes all kinds of optimizations to avoid calling retain and release, so relying on them being called at all is just flat out incorrect.

The only case where it might be valid to override retain and release is the one the documentation mentions: when you are actually using an alternative, custom memory management system. Perhaps you have to cooperate explicitly with another reference counting system, and your retain and release overrides don’t call super but instead ask that system to increment or decrement the reference count. That might be fine - just keep in mind it will make these operations slower.

Custom Alloc Implementations

One of the responsibilities of allocWithZone: is to set the isa pointer. As a result, if you override alloc or allocWithZone:, you may end up without a tagged isa, and thus end up having to use the side table to store reference count information.

To avoid this, as my references note, you should explicitly use the object_setClass function to set the isa pointer rather than setting it directly. This is of particular concern for any codebase migrating to 64-bit support (as recently mandated by Apple for all iOS applications), since it is an easy change that will get most codebases a decent boost in performance. If you can avoid overriding alloc altogether, all the better.

Conclusion

In this post, we have looked at how retain and release worked in a great deal of depth, and now understand how choices we make in writing our own code can influence the behavior of ARC and its speed. We’ve also looked at some of the optimizations made in these parts of the reference counting implementation, and discussed how ARC takes advantage of them.

In the next post, part II of our in depth ARC investigation, we will start digging into even more ARC internals, focusing this time on weak pointers, how ARC gets away with optimizing and eliminating some calls to objc_retain and objc_release through use of objc_storeStrong and friends, and talk about the types of optimizations ARC is allowed to make at various points in the code.

References

  • https://developer.apple.com/library/ios/documentation/General/Conceptual/CocoaTouch64BitGuide/ConvertingYourAppto64-Bit/ConvertingYourAppto64-Bit.html#//apple_ref/doc/uid/TP40013501-CH3-SW26
  • http://opensource.apple.com/source/objc4/objc4-646/runtime/NSObject.mm
  • http://www.opensource.apple.com/source/objc4/objc4-646/runtime/objc-object.h?txt
  • http://opensource.apple.com/source/objc4/objc4-646/runtime/objc-references.mm
  • http://opensource.apple.com/source/objc4/objc4-646/runtime/objc-weak.mm
  • http://opensource.apple.com/source/objc4/objc4-646/runtime/objc-private.h
  • http://opensource.apple.com/source/objc4/objc4-646/runtime/objc-runtime-new.mm
  • http://www.sealiesoftware.com/blog/archive/2013/09/24/objc_explain_Non-pointer_isa.html

  1. NSObject.mm (Under APSL License)  ↩

  2. objc-object.h (Under APSL License)  ↩

  3. objc-object.h (Under APSL License)  ↩

  4. NSObject.mm (Under APSL License)  ↩

  5. objc-private.h (Under APSL License)  ↩

Mar 31st, 2015

The Ruby Rabbit Hole

I like playing with different programming languages. I find that, much like spoken languages, using different programming languages broadens your horizons, and how you would express a concept in one language is different than another. I find that not only my style of programming, but also my thought processes, shift a bit as I use other languages.

This penchant for learning many languages, paired with my desire to understand how things work at as low a level as possible, has led me down many rabbit holes that often took me far afield of just learning a language or using it to be productive. That said, these diversions have also been immensely useful: rarely do you understand something as well as when you really dive into how it works and how it is implemented, and once you do that, programming at a higher level becomes far easier and a lot more fun.

Today, I’m going to explore one of these rabbit holes in Ruby. Oh, and for clarity: I’m only going to dive into the implementation by Matz, and I’m going to assume you have no or very limited familiarity with how Ruby is implemented.

What’s in an object?

Let’s start simply. In Ruby, everything is conceptually an “object”. This lets you do fun things like this:

2.bit_length     # This is sending a message to a number literal (a Fixnum instance)
2 + 4            # This is sending the + message to this Fixnum instance
2.+(4)           # This is sending the + message to this Fixnum as well, identical to above

If we’re really going to understand Ruby, the first thing we should probably ask is how objects are implemented, so let’s do that! An object, in Ruby’s implementation, looks something like[1]:

struct RObject {
    struct RBasic basic;
    union {
      struct {
        long numiv; /* only uses 32-bits */
        VALUE *ivptr;
        struct st_table *iv_index_tbl; /* shortcut for RCLASS_IV_INDEX_TBL(rb_obj_class(obj)) */
      } heap;
      VALUE ary[ROBJECT_EMBED_LEN_MAX];
    } as;
};

Wow, that’s a lot of stuff for a simple object. Let’s have a look and figure out what it does.

  • basic - This is an RBasic structure; we won’t get into everything it does in this post, but from a high level, almost every Ruby structure contains one, and it helps identify what kind of object this is. So even if you don’t know what object you have, you can cast it to an RBasic, ask it what it is, then cast it to the appropriate type.
  • as - The as union is used to store either a VALUE array or a struct describing heap-allocated instance variables, depending on how much and what kind of data needs to be stored.
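The embed-versus-heap trick in the as union can be sketched in plain C - a simplification with made-up names and sizes, not Ruby’s actual code. Small value sets live inside the struct itself; larger ones spill to a heap allocation:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define EMBED_MAX 3   /* stand-in for ROBJECT_EMBED_LEN_MAX */

/* Simplified RObject-style storage. */
typedef struct {
    long count;
    union {
        struct { long *ptr; } heap;   /* spilled storage  */
        long ary[EMBED_MAX];          /* embedded storage */
    } as;
} small_object;

void object_store(small_object *obj, const long *vals, long count) {
    obj->count = count;
    if (count <= EMBED_MAX) {
        memcpy(obj->as.ary, vals, (size_t)count * sizeof(long));
    } else {
        obj->as.heap.ptr = malloc((size_t)count * sizeof(long));
        memcpy(obj->as.heap.ptr, vals, (size_t)count * sizeof(long));
    }
}

long object_get(const small_object *obj, long i) {
    return obj->count <= EMBED_MAX ? obj->as.ary[i] : obj->as.heap.ptr[i];
}
```

The payoff is that small objects need no second allocation at all - the common case stays in one cache-friendly struct.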

Let’s look at a simple example:

my_var = MyClass.new

For this bit of code, Ruby will internally create an RObject for this object, with an RBasic member saying this is of class MyClass.

Now, there is one thing we should also mention here, let’s return to a modified version of our previous example:

i = 2.2

So, what do you think Ruby does internally here? If you said “store an RObject”, you’re wrong - but it’s not your fault. Let’s step back for a minute and actually look at RBasic[2]:

struct RBasic {
    VALUE flags;
    const VALUE klass;
};

RBasic includes two real elements: a set of flags indicating what the ‘internal’ object type is (flags), and the class of the object (klass). This is a really important distinction, and in my mind the key to understanding Ruby in real depth: the internal Ruby representation of an object and the type it appears as in userland are both stored by RBasic. That’s why it’s so important.

Why have different types of structs at all? Easy: optimization. In this case, you’re going to get an RFloat struct internally, with an RBasic whose flags say it is an RFloat and whose klass is Float.

What other structures exist you ask? Plenty! For details you can check out some of my sources below, but the one we’ll be focusing on today is RClass.

What’s in a class

So if in Ruby-land, everything is an object, then that must mean classes are objects, right? On the nose my friend! RClass is the struct type Ruby uses to store Class objects. So what actually is a class object, both from an external and internal perspective?

From an external perspective, a class is an instance of the class Class. Wow, that’s really confusing, huh? Maybe looking at it from the other side of the two-way mirror will clear things up.

Internally, a class in Ruby is an RClass struct whose RBasic has a klass of Class.

Great, so that brings us to our next question…

What’s in an RClass

So why do classes need their own separate structure? Well, let’s have a look at ruby.h and see what an RClass actually is[3].

struct RClass {
    struct RBasic basic;
    VALUE super;
    rb_classext_t *ptr;
    struct method_table_wrapper *m_tbl_wrapper;
};

So there’s one part of this that’s probably pretty obvious, and a few more that are a bit mysterious, so let’s break it down a bit. From a high level, this is what each component of RClass is used for:

  • basic - The same thing as in RObject: the internal type and the class.
  • super - The RClass that is the superclass of this one; it comes into play primarily during message sending.
  • ptr - rb_classext_t sounds confusing. Is it used for class extensions? Is it used for extensibility? As it turns out, it isn’t - this is far more mundane. It’s used to store a bunch of extended metadata for the class, including things like constants, subclasses, etc. It’s also used by modules, but we’ll get to that in a bit.
  • m_tbl_wrapper - This one is important, and is probably what caught your eye. In addition to having a super class, what distinguishes a class from an object is that it has methods. So here you go - this is the method table for this class!

So how do we actually pass a message to an object? Let’s look at a quick example, and then discuss it from a high level. If you’d like more low level details, the source files are pretty readable and great resources, but some of the other links I’ve given at the end of this article have some walkthroughs as well.

class MyClass
  def test
    puts "I tested something!"
  end
end

class MyClass2 < MyClass
end

my_var = MyClass2.new
my_var.test

Okay, so let’s go through what Ruby is doing:

  1. Ruby instantiates a new object of type Class, so Ruby creates an RClass with a klass of Class and a superclass of Object, and creates a global constant called MyClass to store the RClass.
  2. Ruby adds a new method named test to the m_tbl_wrapper of the constant MyClass.
  3. Ruby instantiates another object of type Class, creating an RClass with a klass of Class and a superclass of MyClass, and creates a global constant called MyClass2 to store it - note that the class doing the inheriting just has a different super; it is still of type Class.
  4. Ruby creates a new RObject with a klass of MyClass2.
  5. Ruby reads my_var, reads the RBasic on it, and finds that its klass is MyClass2.
  6. Ruby accesses MyClass2 and invokes an implementation-searching function - it looks in a cache first; failing that, it searches m_tbl_wrapper for the method. It can’t find it there, so it moves to the super of MyClass2, which is MyClass, searches again, finds it, and invokes it, passing the my_var RObject as a parameter so instance variables can be used.

Based on this, we not only now see how method invocation works, but also how it works in the case of inheritance. There’s one other gem of knowledge here that will be really useful when we get to understanding how class methods and singleton classes work: Ruby always searches an object’s class for methods, then goes up the inheritance hierarchy. In other words, it resolves methods using: klass -> super -> super -> super until it hits the top level superclass, BasicObject.
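The lookup just described - check the object’s klass, then walk super pointers - can be sketched in C with heavily simplified stand-ins for Ruby’s structures (a one-entry method table and made-up names):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef const char *(*method_fn)(void);

/* A stripped-down RClass: a one-entry "method table" plus a super
 * pointer.  The names are mine; the real structures are far richer. */
typedef struct fake_class {
    struct fake_class *super;
    const char *method_name;
    method_fn   method_impl;
} fake_class;

typedef struct { fake_class *klass; } fake_object;

const char *test_impl(void) { return "I tested something!"; }

/* Resolve a method as described above: start at the object's class,
 * then walk super pointers until we find it or run out of classes. */
method_fn lookup(fake_object *obj, const char *name) {
    for (fake_class *c = obj->klass; c != NULL; c = c->super)
        if (c->method_name != NULL && strcmp(c->method_name, name) == 0)
            return c->method_impl;
    return NULL;
}
```

With MyClass defining test and MyClass2 inheriting from it, looking up test on a MyClass2 instance misses in MyClass2 and is found one step up the super chain - exactly the inheritance behavior in the walkthrough above.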

But wait, aren’t we forgetting something? Crap…

What’s in a module?

Well shoot, we handled classes so elegantly, and now we have modules. Luckily for us, modules are handled internally the same way classes are! In other words, the flags element on RBasic is different for a module, but it’s still stored as an RClass struct. This means the same process for method invocation, etc., still applies.

That’s good news, I hate repeating myself.

Classy Class Methods

The elephant in the room from what I’ve explained so far, for those familiar with Ruby, is probably going to be how mix-ins and class methods fit into everything we’ve been discussing, so let’s get to it.

I’ve shown you how Ruby dispatches instance methods on an Object, so let’s look at a regular invocation of a class method in Ruby:

class MyClass
end

my_var = MyClass.new

We already discussed how Ruby creates a constant called MyClass that stores the RClass structure for MyClass - so what happens when I say MyClass.new?

The same thing as before.

Mind blown.

How is this possible? Well, when you say MyClass.new, Ruby sees that you’re invoking a method - in this case new - on an object, in this case an object represented by an RClass. So what does Ruby do? Easy: it checks klass on the RClass for MyClass, which is, of course, Class. It then searches Class - which is itself, you guessed it, an RClass - for the method new. It finds it, invokes it, and the rest, as they say, is history.

So what does this really mean? I’m not the first to say it, but I’m definitely excited to say it: class methods in Ruby are actually instance methods on the klass object of a particular class.
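A tiny C sketch of that claim, again with made-up, heavily simplified structures: dispatch always starts at the receiver’s klass, so when the receiver is itself a class object, its “class methods” are just instance methods found on its klass:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Everything carries a klass, including class objects themselves - a
 * simplification of RBasic's klass field with made-up names. */
typedef struct rclass_sketch {
    struct rclass_sketch *klass;      /* the class of this object */
    const char *method_name;          /* one-entry method table   */
    int (*method_impl)(void);
} rclass_sketch;

int new_impl(void) { return 1; }      /* stand-in for Class#new */

/* Message dispatch starts at the receiver's klass, so a "class
 * method" is just an instance method found on the class's class. */
int send_message(rclass_sketch *receiver, const char *name) {
    rclass_sketch *c = receiver->klass;
    if (c != NULL && c->method_name != NULL &&
        strcmp(c->method_name, name) == 0)
        return c->method_impl();
    return -1;                        /* method_missing territory */
}
```

Sending new to a class object finds it on Class - the class of the class - with no special casing for "class methods" anywhere.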

Now I can hear what you’re saying - you agree this works all fine and dandy for new, but what about when you define class methods yourself? What about:

class MyClass
  def self.my_class_method
    puts "Invoked my class method"
  end
end

Certainly, my_class_method is not a method defined on the Class object, so how do we handle this? Exploring this question pushes us deeper into the rabbit hole, into the fun and wacky world of singleton classes.

Singleton Classes

Now we get to a topic that confuses many people, even seasoned Ruby folks: singleton classes. Before we begin though, we have a very key question: what is a singleton class? And also, what is a metaclass and an eigenclass? Do they all mean the same thing? Unfortunately, a lot of different places have abused these terms or treated them interchangeably, but as usual, if we just look in the source, we find the answer[4]:

/*!
 * \defgroup class Classes and their hierarchy.
 * \par Terminology
 * - class: same as in Ruby.
 * - singleton class: class for a particular object
 * - eigenclass: = singleton class
 * - metaclass: class of a class. metaclass is a kind of singleton class.
 */

Now we know that a singleton class is the class for a particular object. We also know that a metaclass is just the class of a Class object, and eigenclass is just another word for singleton class. Hm. That still isn’t very useful, is it? Let’s try to decode this a bit.

To rephrase the above, a singleton class is the Class object pointed to by a particular RBasic klass element - in other words, it’s the klass for a particular object. But why would we ever need to distinguish this from a normal class? Can we have classes that aren’t ‘normal’ classes?

Yup.

ICLASS

Remember how modules are RClass structs that have the flags set to represent a module? Well as it turns out, there’s another type of RClass struct, an ICLASS. An RClass with ICLASS flags represents an internal class, or, as we might call it on the Ruby side, an anonymous class. As we’ll see, these anonymous classes are generally singleton classes, but can be used for other things as well. This class was not explicitly defined in the Ruby world, and won’t ever be returned explicitly to it, though you can trick Ruby into giving it to you in some ways.

For those of you coming from Java, it’s helpful to think of these as anonymous classes - because that’s very nearly exactly what they are: classes that are not assigned to a global constant, and that are created implicitly by Ruby to implement a few core extensibility features.

So now that we know what a singleton class or ICLASS is, let’s return to the question at hand, how does it help us solve the problem with class methods? Let’s look at our code example again.

class MyClass
  def self.my_class_method
    puts "Invoked my class method"
  end
end

Let’s dissect what Ruby’s doing as it encounters this all just as we did before:

  1. Ruby creates a new constant MyClass and stores in it an RClass struct, with a klass of Class, and a super of Object.
  2. Ruby finds def self.my_class_method - it looks at what self is, in this case the MyClass class, and finds it needs to declare a method on that Class object. Since we only want this method available on this instance of Class, and not on all instances of Class, Ruby can’t simply add it to Class itself.
  3. Ruby creates a singleton class for MyClass and creates a new method in the m_tbl_wrapper on it called my_class_method. Once this is complete, the RClass for MyClass has klass pointing to this nameless singleton class. In turn, the singleton class has a klass of Class and a super of Class.

I realize this is a bit confusing, but what I’m telling you here is that when you define a class method, Ruby creates a separate singleton class to store class methods for this instance of Class, and puts the methods there. It follows up by making the new singleton class inherit from Class, ensuring that any method that could be called before can still be called.
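We can watch this happen from Ruby itself: singleton_class returns exactly the nameless class described above, and Method#owner confirms that’s where the method was stored. A small sketch using standard introspection:

```ruby
class MyClass
  def self.my_class_method
    "Invoked my class method"
  end
end

# the method lives on MyClass's singleton class, not on Class itself
p MyClass.singleton_class.instance_methods(false)  # [:my_class_method]
p MyClass.method(:my_class_method).owner           # #<Class:MyClass>

# other instances of Class are unaffected
p Class.new.respond_to?(:my_class_method)          # false
```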

I encourage you to keep returning to the rule we established earlier: method resolution works by going to klass, looking there, and then following super until it finds something. As we get into more convoluted examples, if you keep this in mind, it’ll be quite easy to see where a method must be placed.
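That rule is simple enough to sketch in Ruby itself. This hypothetical find_owner helper is not anything from the runtime - it just leans on the fact that singleton_class.ancestors already exposes the klass -> super chain - and walks that chain looking for the first entry that defines the method:

```ruby
# hypothetical helper: walk the receiver's klass/super chain (exposed in
# Ruby as singleton_class.ancestors) and return the first class or module
# that defines the method itself.
def find_owner(receiver, name)
  receiver.singleton_class.ancestors.find do |mod|
    mod.instance_methods(false).include?(name)
  end
end

p find_owner("hello", :upcase)  # String
p find_owner([], :push)         # Array
```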

There’s one other impact of what I’ve noted as well, that you may have caught on to. Ruby makes no distinction for RClass in terms of singleton classes. This is how Ruby adds a method to any instance of a particular class, instead of the entire class. Let’s look at another example to make this clear:

my_var = MyClass.new
my_var_again = MyClass.new
def my_var.new_method
  puts "A method only on my_var"
end

After this, my_var.new_method will print the text we placed there, but my_var_again.new_method will raise a NoMethodError since the method is not found. Why? my_var has a klass pointing to a singleton class - an RClass containing our new_method - and that singleton class has a super of MyClass. my_var_again, on the other hand, has a klass of MyClass.

So this means class methods are not only instance methods, they’re not special at all: they’re just a case of defining a method on an instance instead of on a class.
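Ruby exposes all of this directly, so we can confirm the asymmetry between the two instances. A runnable version of the example above (returning the string rather than printing it, so we can inspect it):

```ruby
class MyClass
end

my_var       = MyClass.new
my_var_again = MyClass.new

def my_var.new_method
  "A method only on my_var"
end

p my_var.new_method                      # "A method only on my_var"
p my_var.singleton_methods               # [:new_method]
p my_var_again.respond_to?(:new_method)  # false

# the singleton class slots in above MyClass, just as described
p my_var.singleton_class.superclass      # MyClass
```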

With that, I think we’re ready to return to mix-ins…those of you thinking ahead may already have guessed how those get implemented.

Mixing it Up With Mix-Ins

What are mix-ins at their core? Given a module, you either import the instance methods on that module as class methods (using extend) or as instance methods (using include).

Based on this, and our knowledge of how method definitions on instances work, we can see that this must use singleton classes somehow. Let’s start by looking at how we would use extend. Let’s check out some code:

module MyModule
  def test
    puts "Test from MyModule"
  end
end

class MyClass1
  include MyModule
end

class MyClass2
  extend MyModule
end

MyClass2.test          # Prints out "Test from MyModule"
my_var = MyClass1.new
my_var.test            # Prints out "Test from MyModule"

When we use include, we expect to call the methods as though they were instance methods defined within the class itself. Based on this, we can use the rule we discussed earlier to break down what Ruby must be doing: method resolution works by klass -> super -> repeat. So, when invoking test on my_var, Ruby looks at klass for the RObject referenced by my_var and finds MyClass1. Since we can probably assume Ruby wouldn’t just add methods willy-nilly to MyClass1, that means Ruby must be doing something with the super on MyClass1.

And that would be right. Ruby sees the include, and creates a new anonymous class using the same flag we use for singleton classes, ICLASS, and inserts it as the super for MyClass1. This anonymous class points to the same m_tbl_wrapper as the module MyModule - thereby sharing all of its methods.

It’s important to note that while this uses the same flag used for singleton classes, it isn’t technically a singleton class. By the definition we saw earlier from Ruby’s source, we know that a singleton class is the klass for a particular object - not just any old anonymous internal class.

So, how does this work for multiple include statements? Simple: more anonymous classes, more inheritance.
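Ruby’s ancestors method lets us see the result: the anonymous class appears in the lookup chain between the class and its original superclass (Ruby displays the ICLASS as the module whose method table it shares):

```ruby
module MyModule
  def test
    "Test from MyModule"
  end
end

class MyClass1
  include MyModule
end

# the anonymous ICLASS sits between MyClass1 and Object in the super chain
p MyClass1.ancestors   # [MyClass1, MyModule, Object, Kernel, BasicObject]
p MyClass1.new.test    # "Test from MyModule"
```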

Now that we’ve tackled how include works, let’s consider extend. When we use extend we want to be able to call the module methods on the actual constant, in this case MyClass2, that refers to the RClass we created. Based on what we know, that means the methods need to be in the klass for MyClass2, or somewhere in its inheritance hierarchy. We know we can’t change the inheritance hierarchy for Class, which is the klass for MyClass2, since that would add this module’s methods to every class.

That leaves us with the singleton class for MyClass2. And indeed, that is how Ruby deals with extend - we augment the super the same way we did for include, but we do it on the singleton class for MyClass2 - so now the klass for MyClass2 references the singleton class, which in turn has a super that is the anonymous class with the method table of MyModule, which has a super of Class.
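Again we can verify the chain Ruby built by asking the singleton class for its ancestors - a quick check in plain MRI:

```ruby
module MyModule
  def test
    "Test from MyModule"
  end
end

class MyClass2
  extend MyModule
end

# the module was spliced into the super chain of MyClass2's singleton
# class, not into MyClass2 itself
p MyClass2.singleton_class.ancestors.include?(MyModule)  # true
p MyClass2.ancestors.include?(MyModule)                  # false
p MyClass2.test                                          # "Test from MyModule"
```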

Phew, that was a lot of words, but if you get through it, it’s pretty clear why singleton classes and anonymous classes play such a central role in Ruby: they’re how Ruby can be as dynamic as it is, and how so many of the most powerful and popular features of Ruby really work.

Conclusion

Hopefully after reading this, you have a much deeper, and better, understanding of things like “eigenclass” and “singleton class” and understand a bit how Ruby handles the everyday operations that you do. To bring this back to what I originally said, different languages force me to think in different ways, and for me, exploring how these languages really work at their core helps me relate new things I do to the existing things I’ve done. Understanding how you can use design patterns in one language to implement design features in another is a really profound thing, at least for me.

There’s a lot of things I haven’t covered here today, and should you want to go even deeper into this rabbit hole, my references are all quite excellent and provide a wonderful look at how all of this works in even greater detail.

Thanks

I’d like to call out that explaining Ruby by looking at its implementation is hardly a new idea - it has probably been done most notably and thoroughly by Burke Libbey, and I often referred to his material to sanity-check my own work as I read through the source code. I have included links to it, as well as my other sources, below; if you’d like more detail, I’d refer you to those.

References

  • https://github.com/ruby/ruby/blob/aaed10716a55d659309a8636a41a8e159347a32c/include/ruby/ruby.h
  • https://github.com/ruby/ruby/blob/aacc35e144f2a3d5c145f85e337accd55a8acc90/internal.h
  • https://github.com/ruby/ruby/blob/6115f65d7dd29561710c3e84bb27180e5bab4380/class.c
  • https://ruby-hacking-guide.github.io
  • http://www.slideshare.net/burkelibbey/learn-ruby-by-reading-the-source
  • http://www.slideshare.net/burkelibbey/rubys-object-model-metaprogramming-and-other-magic

  1. from ruby.h (Under Ruby License)  ↩

  2. from ruby.h (Under Ruby License)  ↩

  3. from ruby.h (Under Ruby License)  ↩

  4. from class.c (Under Ruby License)  ↩

Dec 9th, 2014

ARC Exploration and Pitfalls

In the good old days we managed memory directly - and by good old days, I really mean before mainstream garbage collection, shared_ptr<T> and its cohorts, and our beloved Objective-C retain/release. In those days, memory errors were a lot more common, but also a lot better understood by most than today - after all, if you were manually managing your memory you understood quite clearly how memory management operations work.

These days, most working programmers don’t have a full understanding of the memory allocation and management scheme used by their language of choice, and honestly, for the most part, this doesn’t impact them. They may produce code that makes lots of unnecessary temporary copies, and probably isn’t as efficient in terms of latency or memory footprint as it could be if they had a better understanding, but for most programmers, that doesn’t matter most of the time I’d wager. That said - for those willing there is a lot to be gained by jumping into the deep end of the memory management pool and having a look at how it all works under the covers. To that end, let’s discuss how ARC works in Objective-C, and look at some common pitfalls.

Reference Counting

One step above manually managing your allocations and deallocations is the concept of reference counting. Most of you are familiar with reference counting, but for those who aren’t - the concept is that you store, along with a pointer, a counter that indicates how many references exist to this memory address, and when it reaches zero, you deallocate the object. This means that objects are only deallocated, at least in theory, when they are unreachable and unreferenced. The benefit of reference counting is that you avoid the overhead associated with garbage collectors, and also retain a lot of control over object lifetimes by controlling the reference count.

Let’s automate it!

Originally in Objective-C, you had to manually call retain and release to increment and decrement the reference count of objects, respectively, which led to numerous errors where people would over-release an object or under-release an object, which led to crashes and memory leaks.

With the switch to LLVM and the addition of a powerful static analyzer to Objective-C’s toolbox, it was realized that the compiler could warn programmers when they likely forgot to insert retains or releases in their code. Once this was achieved, the question became whether it could be taken one step further: could the compiler automatically insert retains and releases where necessary? And hence we arrive at Automatic Reference Counting (ARC).

Ownership

One thing I’m going to be repeating over and over throughout this article is that the key principle behind memory management, and ARC in particular, is ownership. If you don’t clearly understand which objects create and own each other, you’re going to run into memory management issues sooner or later. ARC is not a magic bullet, and it’s important to understand what ownership means for your codebase as well as how it determines ARC behavior.

Ownership Qualifiers

For ARC to be able to automatically manage retain and release calls, it must understand how objects are owned, and which objects own each other, in your application. For this purpose, it provides four ownership qualifiers. Keep in mind that while these influence the retain and release behavior of objects, it’s still best to think about them as denoting ownership.

Strong

Strong pointers denote a form of ownership. A strong pointer retains the object it is assigned and releases the previously held object when the pointer is reassigned. Most people are very familiar with this qualifier, as it is the most commonly used, so I won’t go into any more detail.

Weak

Weak pointers denote a non-ownership reference, the most common example probably being delegates. They are used to break retain cycles, which are discussed later, and make several guarantees that make them very useful. They are nilled out when the objects they point to are deallocated.

Autorelease

Autoreleased pointers have a similar effect to calling [obj autorelease] in non-ARC code - they guarantee that an object will live across a call boundary and are primarily used for out parameters.

Unsafe Unretained

Unsafe unretained pointers are ‘primitive’ pointers - they have no retain or release operations associated with them and should rarely be used. One thing to note is that several older Objective-C APIs, including KVO, use __unsafe_unretained to keep references to objects, so it is important to check the older APIs for that type of issue and read the documentation very carefully while using them.

Inferring Ownership

Generally speaking, we rarely explicitly qualify the ownership of objects, and so ARC uses inference to determine what ownership qualifier to use. There are a few special cases, but most of the time, you can assume ARC will use __strong unless you tell it otherwise. I’ve outlined the one case you most likely need to know about below.

Out Pointers

Indirect pointers, that is, pointers of the form T**, have a different inference rule for their ownership qualification - in the case of method parameters, indirect pointers are inferred to possess the ownership qualifier __autoreleasing. The big reason for this is to avoid deallocation of “out” parameters like the common NSError** before the caller can use them.

Corner Cases

Losing ownership while executing

One of the most common points of confusion I’ve found involves __weak pointers: if a __weak pointer is used to invoke a method, can the object it points to suddenly be deallocated while the method is running, leaving the method mid-invocation with an invalid self pointer?

In a word: No.

ARC explicitly tells us that __weak pointers are retained for the duration of the fully qualified expression they are used in - this means that a method call on a __weak pointer either has a valid self the entire time it executes, or it is invoked on the nil pointer and does nothing - but it cannot be nilled out halfway through execution. Let’s look at some sample code:

- (void) testWeak
{
  __weak TestClass* test = _test;
  [test callMethod];
}

So you can see here we’re sending a message to a weak pointer - let’s see how the compiler handles this and look at the assembly:

blx _objc_loadWeakRetained
blx _objc_msgSend
bl  _objc_release

I’ve trimmed the assembly to remove some superfluous code and make it more compact - but the important parts remain. The compiler calls objc_loadWeakRetained which loads and retains the weak pointer, to ensure that for the duration of the expression it remains valid, in this case, that expression being a method call. Afterwards, objc_release is called to balance the previous retain.

However - there is more to this.

__strong pointers do not make this guarantee - if you invoke a method on a __strong pointer, and then its retain count reaches zero from being nilled in another thread, the method invocation will suddenly have an invalid self pointer. This is an important thing to remember, and I’ll say it several times, __strong implies you know something about an object’s ownership. ARC counts on you to make sure this situation can’t occur.

Bridged Casts

Bridging casts should be used when you need to convert between object types that are reference counted and toll-free bridged - that is, between NS and CF variants of the same underlying object, for example between NSDictionary* and CFDictionaryRef. The way you do this is important: Core Foundation types, while reference counted, are not managed through ARC, so when they are cast to or from a type that is managed by ARC, we need to let ARC know what to do.

  • __bridge casts between CF and NS types without transferring ownership - whichever object is currently the “owner” remains the owner. In other words, if you cast from a CFDictionaryRef to an NSDictionary* and then release the CFDictionaryRef, the object will be deallocated even though the NSDictionary* still references it.
  • __bridge_retained casts an NS object to a CF object, and increments the retain count by one prior to the cast - this transfers ownership so the original object pointer can be set to nil without destroying the object.
  • __bridge_transfer casts a CF object to an NS object and tells ARC that it now has ownership of the object - so the object will be released at the end of the current scope automatically - you don’t need to call CFRelease on the original object handle.

It’s also worth noting that core foundation has macros that do similar things - CFBridgingRelease and CFBridgingRetain.

Consumed Parameters and Retained Returns

As discussed earlier, ARC really centers around the concept of ownership, and to this end it provides several attributes to signal that a method will assume ownership of a parameter, or give ownership of a return back to the caller. These impact how ARC inserts retains, subject to its optimization rules, and have some other technical impacts we’ll get into, but I encourage you to think of these primarily as denoting ownership instead of just modifying the retain count of the given parameters.

Normally, when passing parameters, ARC uses the same inference rules we discussed earlier, so most non-explicitly-qualified parameters end up __strong. This results in the method prologue generally having retain operations inserted during assignment in the form of objc_storeStrong - you can see an example of this below.

mov r0, r3
mov r1, r2
bl  _objc_storeStrong
.loc  1 15 0 prologue_end

From a high level, what this means, is that the original caller still owns the object that it passes in, it remains responsible for its lifetime and semantically speaking the function is simply using the parameter for an operation - not taking over ownership or lifetime determination.

At a lower level, the impact of this is that once a method is called, before the method body begins, parameters (with the exception of the implicit self parameter) are stored. One thing to note: though incredibly unlikely, it is possible that this could result in an access to an invalid memory location, should a race condition exist between the objc_storeStrong and a release on the original object. In practice, this should not occur, particularly if you manage object lifetimes well, but it is worth mentioning.

To change this behavior, you can attribute a parameter with the ns_consumed attribute. At a high level, this attribute tells ARC that the parameter being passed in is actually being given to the function, with the function now taking ownership. From a lower-level perspective, while ARC may still insert the objc_storeStrong call, you are guaranteed the parameter will be retained before the method is called instead of during the prologue - in other words, the object will be retained before being passed as a parameter. This guarantees the pointer is in a valid state, and means the method is entered with a retain count one higher than usual. ARC cleans this up with a release prior to leaving the method. A small example of the generated assembly is below:

mov r0, r1
bl  _objc_retain
[method invocation]

It’s worth noting, as we go into detail in the optimization section, ARC may actually not even perform the objc_storeStrong on a parameter that is consumed, because it can rely on the fact that it comes in already retained.

The most notable method that has a consumed parameter is init which also is special since it consumes the implicit parameter: self. To consume the self pointer you attribute the entire method with the ns_consumes_self attribute.

In addition to consumed parameters, methods can also have an attribute that denotes that they surrender ownership of the object they return - this has the converse effect to the one we went into before - an object will be retained prior to it being returned, and the caller will release it at the end of the full expression it is a part of. The attribute that denotes this behavior is ns_returns_retained. Once again, while it does influence the behavior of ARC, it’s most useful to remember this as surrendering ownership to the caller.

Retain Cycles

Probably the most common, and hopefully best understood, aspect of ARC, and indeed reference counting in general, is the retain cycle. The general concept is simple: we figure out when to deallocate objects based on how many references (the retain count) they have, so what happens if object A references object B, which also references A? They will eventually reference only each other and be unreachable, but they cannot be deallocated since their reference counts never fall to zero.

The classic approach to this is to use __weak pointers for these sorts of cases - so A references B, but B references A only weakly, thereby not incrementing the reference count of A and permitting A, and subsequently B, to be deallocated. This solution is fine, and works great in the general case, but there’s a few specific points I’d like to bring up.

Using __weak everywhere to avoid having to manage your object lifetimes is not a solution: it’s an even worse problem. Abuse of __weak to avoid thinking about object lifetimes is probably one of the worst problems I’ve seen in a lot of Objective-C code in terms of code quality. So, why is this so important? Glad you asked.

__weak in Objective-C implies that the object holding the reference does not know - and in fact can’t know - about the pointed-to object’s lifetime in a meaningful way, because it isn’t an owner of the object. Given this, ARC makes a different set of guarantees around it, including that method calls invoked on weak pointers will have a valid self parameter for the duration of the method - a guarantee not made for __strong pointers, whose lifetimes ARC assumes you understand.

To this end, generally think about where you’re using __weak, and ask not only are you breaking a retain cycle, but why are you having to break a retain cycle here. Are you using __weak because the object you’re passing the object to really shouldn’t be an owner, or are you doing it because it breaks a retain cycle that is being caused by poorly thought out or evolved architecture?

Imprecise Lifetimes and Interior Pointers

One thing that can occasionally get people is using interior pointers with imprecise lifetimes. Wow, that was a lot of strange words in one sentence.

Interior pointers are pointers to objects within another object - for instance, if you have an Objective-C object shown below

@interface MyObject : NSObject
{
  int* myInt;
}
@end

myInt would be an interior pointer in this object. The lifetime of an interior pointer is tied to its enclosing object - when the enclosing object is deallocated, the memory the interior pointer refers to generally goes away as well.

So how does this end up causing a problem? Let’s say I want to store that interior pointer…like…this:

- (int*) doSomeStuff
{
  MyObject* obj = [[MyObject alloc] init];
  [obj setupMyInt];
  int* importantInteger = obj.myInt;
  *importantInteger += 3;
  return importantInteger;
}

This shouldn’t be a problem - I have a strong pointer to a MyObject that I create and it, as well as its interior pointer, live until the end of the doSomeStuff method.

Nope. importantInteger may point to invalid memory. Note I said may, not will, as it all really depends on how ARC optimizes this.

ARC does not guarantee precise lifetime semantics for every strong pointer, it only guarantees that the object won’t be deallocated until at least the last usage. In this case, we aren’t using the object, we’re using an interior pointer, and ARC has no idea that the int* in question will become invalid memory when it deallocates the enclosing object.

So, how can we get around this? Well, there’s a few things we can do. The first is, we can request precise lifetime semantics using the objc_precise_lifetime attribute. Using this disables ARC’s optimization for this variable and will prevent the problem, but really, the issue is probably not with the caller, but with the callee not properly letting ARC know that an interior pointer is being exposed - which we can handily let ARC know about by using the objc_returns_inner_pointer annotation on the method returning the interior pointer.

One thing to note about this approach is that this is of particular concern for non-object pointer types - that is, types not managed by ARC. In the case of a type managed by ARC the object will be retained once it is returned, so it won’t be deallocated even if the object that owns it is.

Unexpected Optimizations

The last gotcha I really want to bring up is something people get caught up in when they forget that ARC, at its core, is always trying to be fast: operations may or may not be pruned based on guarantees ARC can make at compile time. I have alluded to this before. ARC makes guarantees about the lifetimes of your objects, it doesn’t really guarantee every “theoretical” retain/release pair will end up in your code, nor does it guarantee that autorelease will ever actually be called - indeed, we’ve seen in a lot of cases ARC explicitly avoids this for efficiency.

This isn’t a big deal, and doesn’t cause problems unless you assume ARC is going to call retain, or release, or autorelease at certain times, instead of just assuming that ARC is going to make sure your objects stay alive in the ways it promises.

So that’s it…

Hopefully, this little document has helped you understand how ARC works a bit more at a higher level, as well as in a bit more technical depth. If you have anything you think I’ve missed or gotten wrong, please let me know.

I know there’s also a lot I haven’t covered - including method families, and some details I’ve skimmed over how autorelease and weak pointers work, and plenty of other low level details we can get into. If you liked this post, let me know, and if enough people are interested I may write a second one going into more details.

References

  • http://clang.llvm.org/docs/AutomaticReferenceCounting.html
  • http://www.bignerdranch.com/blog/arc-gotcha-unexpectedly-short-lifetimes/

Fun With Function Pointers

One fun aspect of working in a variety of languages is being exposed to a variety of approaches, and syntaxes, for passing functions around. It has gotten syntactically easier with the increasing acceptance of lambda expression syntax across more classically imperative languages. Passing closures around, almost as though they were first-class objects (perish the thought!), can lead to a lot of questions though, particularly from folks not familiar with functional programming approaches. In particular, closures raise interesting questions because they capture scope. In this post, I’m going to address some of the most common questions, and pitfalls, I’ve found. By no means is it comprehensive.

Oh, I’m also going to be focusing on C-based derivatives (C, C++, and Objective-C) with a notable bias towards Mac technologies.

Function Pointers

Function pointers are the oldest and least technically complicated way to pass around functions. It works something like this:

void doSomethingAndCallMeBack(void (*callback)(int,int))
{
  callback(2,2);
}

In this case, you have a function that takes as a parameter a function pointer. This technique works in C, C++, and Objective-C, and while a bit heavy-weight syntactically (usually covered up by typedefs) gets the job done.

The benefits of this approach are pretty clear:

  • Separation of concerns: you don’t need specialized functions for everything; you can use callbacks to do the heavy lifting (think non-OOP strategy pattern).
  • Use of functional programming tools like map-filter-fold, with a limited scope (lacking closure support).
  • Hot-swappable behavior without recompiling, which, combined with modular programming, permits you to write very dynamic programs - the C standard library includes this ability and accepts function pointers for functions like qsort.

If you’re sticking to pure C, things stay pretty simple. One important thing to note: you should avoid casting function pointers to void * (you probably shouldn’t be doing this anyway…) because the compiler may represent function pointers entirely differently than it represents data pointers - so this cast can cause loss of data, even if you cast back appropriately.

Blocks

Thanks to clang, we get another way to handle passing functions around: the blocks extension. Blocks are probably most notable for being used heavily in Objective-C for asynchronous programming, and they are basically lambdas by any other name.

Clang’s blocks are similar to the C++ lambda expressions we’ll be discussing a bit later on, but they capture anything from their surrounding scope that they read or write. Here’s an example:

int five = 5;

int(^addFive)(int) = ^int(int a) { return five + a; };

printf("addFive(3) = %d", addFive(3));

As you can see here, we declare a block, which is basically a function, capture parts of the surrounding scope in it, and then invoke it later. It is worth noting that a block captures only the elements of the surrounding scope that it actually uses. This can have some interesting implications in terms of memory management, which we’ll tackle in a few.

One thing worth mentioning, for reasons I dig into a little more in the “advanced” section below: if you pass blocks around, it’s important that the callee copy the block, or the variables captured within it will become invalid in most cases.

Blocks are useful because they provide us an easy way to create anonymous methods and implement functions such as generators. As noted previously, they are used extensively by the Cocoa and other frameworks used on clang and with Objective-C to provide a way to enable asynchronous programming.

Memory In Blocks

As we noted previously, blocks capture anything they reference, which can kind of be problematic when it comes to object lifetimes and memory management. In the case of Objective-C based objects, this isn’t too much of a problem, because we’re generally talking about heap allocated objects and we’re generally using automated reference counting - this doesn’t remove all problems, but does simplify the question of figuring out when a block is finished with an object.

One thing to note regarding blocks: unless you use the __block storage class specifier to declare a variable before a block captures it, the variable is copied by value when the block leaves its parent scope. If you use __block, the block captures it by reference, and both you and the original scope can make changes that affect each other. These changes are also not guaranteed to be atomic, so be careful with that (it’s actually slightly more complex than this; see the section below for more…).

One last thing - just because I said the captured values are copied by value, not by reference, remember that a pointer being copied by value does not mean the object it points to is immutable or cannot have messages sent to it: just that you can’t modify what the original pointer you copied from points to.

Am I weak?

One of the most common uses for blocks is asynchronous execution of a completion handler, e.g.: run this operation that loads data for me, and alert me when complete. In this case, generally an object will create the block, and the block will in turn capture self so it can invoke a completion handler on the object that created it. There is a problem here, though: if the block has a strong reference to self, a processor object has a reference to the block, and self has a reference to the processor object, you have a retain cycle - and should the processor never release the block, none of these objects will ever be deallocated, which in turn leaks anything they hold strong references to.

Remember, ARC does what you tell it, not what you want.

We get around this by creating a weak version of self to use within the block - this permits the block to reference self without retaining it, which breaks the retain cycle. Here’s how that looks:

- (void) sendRequest
{
  __weak typeof(self) weakRef = self;
  [request sendWithBlock:^(int responseCode){
    [weakRef doSomeWork];
  }];
}

But there’s still a problem…

Or am I strong?

In this solution, we still run into issues, because each method call you make on the weak version of self will independently either execute on the correct object or on nil, depending on whether something has deallocated the object weakRef points to. This means one call could succeed and the next could fail. Potentially even worse, you could run into a situation where you’re passing nil as a parameter to a method that doesn’t expect it, and crash.

One thing to note here, which I’ve had regular debates with others about, is whether an object you hold a weak reference to could be deallocated in the middle of a method call. If you read the ARC specification, under the section on ownership semantics, you’ll see that reading a __weak variable causes ARC to insert a retain before the read and a release after it. This means that when you call a method on a weak pointer, the method will atomically either execute with a valid instance or go to nil.

The problem, as noted, is that this does not hold across multiple method calls - each individually gets a retain/release around it, but there is no guarantee a dealloc won’t happen between calls should the last strong reference be released on another thread.

To solve this, you have a few options:

  • You can cast the weak reference to a strong one and verify that the strong reference is non-nil before using it - this is also what you need to do if you’re passing self as a parameter to a method, to guarantee it is not nil (unless the method you’re calling is fine with nil being passed in).

  • You can create one method on the self object that you invoke, which is guaranteed to either succeed or fail as a unit, and which in turn calls any other methods you need.

  • You can understand your object lifecycles well enough to determine whether a weak self is even necessary, and subsequently only worry about retain cycles in cases where they can actually arise.

I strongly recommend the third option.

Dig a bit deeper…

As I mentioned earlier, how blocks manage their memory and are allocated and passed around isn’t quite as simple as I explained. It’s true that you can consider __block as marking a variable to be passed by reference, but what it actually does is tell the compiler to store that variable (potentially) on the heap instead of the stack - or that if it is on the stack, to move it to the heap when a referencing block is copied.

When a block resides on the stack, captured variables of any storage class other than __block are copied as though by a const copy constructor - so you end up with variables copied by value. Variables declared __block are instead captured without the const-copy rule, giving the effect of pass by reference. Oh, one other note: if a captured variable’s type does not have an accessible copy constructor, the compiler will emit an error.

I also noted above that you need to copy blocks when you pass them around - when a block “leaves” its parent scope (read: outlives its current stack frame) it risks having captured variables destroyed at the end of the enclosing scope, which corrupts the block. Copying moves the block and all of its captured variables to the heap, where they are only destroyed when the reference count reaches 0.

C++ Function Pointers

Luckily, as noted, function pointers are fairly straightforward in C. When you get into C++, however, things get a bit more complicated. In C++ we have the additional concept of member functions, which are associated with an instance of an object. This means that for non-static member functions of a class, we end up with something more like…

void doSomethingAndCallback(void (MyClass::*callback)(int,int), MyClass& instance)
{
  (instance.*callback)(2,2);
}

Notice how we have to pass in an instance to invoke this method on. This can be problematic and lead to less-than-elegant solutions. You can use templating for some of this, but that can also lead to more problems than maintainable solutions. There are also some tricks I won’t get into here (but will reference later) involving templates that can be used to break encapsulation in conjunction with function pointers.

Put them together and you get…

When you intermix C++ and C function pointers you can get some fun results. The big issues are that calling conventions for invoking the functions may vary, and that C++ has (compiler-specific) name mangling used to implement all those lovely namespaces - this means you need to be extra careful to use extern “C” correctly throughout. Note I didn’t say to be careful about passing member function pointers, because you shouldn’t be passing member function pointers to C, remember? How would you even cast them to something C understands? There are techniques to get around this, but you should generally be reviewing your architecture if you find yourself in this situation.

Functors and Functionoids

Another way we can pass functions around in C++ is with functors and functionoids. Functors are just objects that act as functions by overriding operator(); functionoids do the same thing, but instead of overriding operator() they expose an alternative member function. Why you’d choose one over the other is a bit outside the scope of this document, but I will bring up one important case: because a functor overrides operator(), you can use it with templated functions that take “function-like objects”, which gives you the flexibility of passing in a function pointer, functor, or lambda interchangeably, because they are all called the same way syntactically (though behind the scenes there are important differences). Let’s look at an example!

#include <iostream>

struct between
{
  between(int low, int high) : low(low), high(high) { }
  bool operator()(int value) const { return value < high && value > low; }
private:
  int low;
  int high;
};

int main()
{
  between between_4_and_8(4,8);
  if (between_4_and_8(5))
  {
    std::cout << "It works!" << std::endl;
  }
}

You can see here that each between instance encapsulates a logical function based on the parameters to its constructor - in a way, the constructor is generating functions for you. This storing of state is critical to what a functor is. When designed correctly, this state between calls can also be used safely in multi-threaded applications, something that is traditionally difficult to do in C and, as we’ll see moving forward, a concern with closures as well.

C++ Closures

As I noted in the introduction, closures by many names have been making their way into more imperatively oriented languages as of late, and as such we are starting to see programmers using traditionally functional techniques (map, filter, and fold come to mind) in “traditionally” non-functional languages such as C++. The way C++ has dealt with closures is flexible, and it fits into C++’s template functionality in such a way that it becomes relatively easy to template a function so it can accept a functor, function pointer, or closure.

But let’s not put the cart before the proverbial horse (only the horse is proverbial, the cart is real). Let’s look at an example.

int low = 4;
int high = 8;
auto between_4_and_8 = [=](int val) -> bool {
  return val < high && val > low;
};

So in this example you can see that the closure is not only defining a function, but is also capturing the state around it. We’ll return to this state capture in a minute, but before going on, I’d like to bring us back to functors for a second.

State: Captured or Created?

It’s important to understand that a functor is actually an object which then has a function, and lambdas are (effectively) implemented in the same way: as a small class that overrides the function call operator. That similarity noted, how they keep state is slightly different. A functor explicitly declares its own state in its class definition, whereas a lambda’s state is captured from the enclosing scope and either brought into the implicit class definition as a reference or copied by value, depending on how you specify capture should work.

Does this mean you can’t keep state in between executions with a lambda expression? No, but it does mean you have to explicitly declare a variable outside of the lambda expression for this purpose. This means that if you find yourself keeping track of complex state in a lambda expression and polluting the surrounding scope with variables that are unused, it may be worth considering using a functor instead.

Capture: Honey or Vinegar?

One aspect of closures that makes them so powerful is that they can capture the scope around them - the same way functors have local state. C++ permits you to specify several options for how capture is performed, including capturing by reference or by value, and capturing only referenced variables, all variables, or an explicit list of variables.

One thing to note about this capture: if you capture by reference and you’re running in a multithreaded environment, the values you’re using could change out from under you, and accesses are not guaranteed to be atomic - just like any other access in C/C++. When doing asynchronous programming, this is an often-overlooked problem.

Another thing we run into here is capturing stack-allocated variables that can become invalid. For instance, if a lambda captures by reference a local integer from the function in which it is defined, that reference is no longer valid once the function’s frame is popped off the stack. Avoid doing this, or if you have to capture things from a scope that is about to become invalid, put them on the heap instead.

STL Stuff

In addition to giving us beautiful, magnificent lambdas, C++ has also given us a number of structures in the standard template library to help encapsulate the concepts of functions, function pointers, functors, functionoids, and lambdas. I’m going to go over a few of these briefly, but not in too much depth - some of them get into areas I’d rather cover in another article, such as how some of the implicit conversions between functions and pointers work.

std::function

From a high level, an std::function can wrap anything that can be invoked - this includes pointers to functions, Callable objects, lambda expressions, member function pointers, functors, functionoids, etc. The value in this object is that it is a very high-level representation of a function and presents a consistent, polymorphic way of handling callables, making it easier to pass them around as arbitrary parameters. Combined with bind, you also get some very powerful magic we’ll get to in a minute.

std::bind

Bind does exactly what it says it does - given a Callable object, you then specify where each parameter of this function will come from, and it generates a forwarding wrapper. In other words, you can take a Callable (function pointers, lambda expression, functor, etc) and then get an std::function that takes fewer parameters, or re-maps parameters. Let’s look at a simple example:

#include <functional>

int add(int a, int b)
{
  return a+b;
}

std::function<int(int)> generateAddFunction(int toAdd)
{
  using namespace std::placeholders;
  return std::bind(add, _1, toAdd);
}

In this example, the generateAddFunction returns a function that accepts an integer, and adds whatever you have specified to it.

This is incredibly powerful, and can permit you to create new functions simply by building on those already existing and augmenting them. This also permits you to simply and effectively provide alternative API surfaces (adapters, facades, etc) in a very lightweight way.

std::mem_fun

mem_fun is a way to wrap the ugly member function pointer syntax we discussed earlier into an object. Once wrapped in this way, it’s a lot easier to deal with member function pointers, and they can be passed around to anything that takes a function object.

One big happy family

Now that we’ve talked about how C++ and Objective-C do lambdas, it’s worth noting that clang supports equivalency between the two. This is great news, because it means something expecting an std::function can accept a block, and you can use things like std::bind on blocks as well. This can make working across language boundaries easier when working with libraries written in C++.

References

Below are the references I used while writing this article; I strongly recommend checking them out if you want additional details.

  • https://isocpp.org/wiki/faq/pointers-to-members
  • http://clang.llvm.org/docs/BlockLanguageSpec.html
  • http://clang.llvm.org/docs/AutomaticReferenceCounting.html
  • http://clang.llvm.org/docs/LanguageExtensions.html
  • http://c-faq.com/ptrs/
  • http://en.cppreference.com/w/cpp/language/lambda
  • http://en.cppreference.com/w/cpp/utility/functional/function
  • http://en.cppreference.com/w/cpp/utility/functional/bind
  • http://en.cppreference.com/w/cpp/utility/functional/mem_fun
Nov 11th, 2014