ARC in Depth: Part II

In my last post, I had a look at how the reference counting implementation in the Objective-C runtime works and talked about some lessons we can learn from the implementation. Reference counting is only half of ARC, though; the other half is the “automated” portion. Today, I’m going to cover how automated reference counting ends up automated at a variety of levels. We’ll start with a brief tour of Clang, focusing on AST generation and semantic analysis, and then step into code generation, where all those lovely retain and release calls actually get generated.

It’s worth noting that, as in the prior post, all of this is based upon the version of Clang you use, and could be changed at any time. This also only applies to Objective-C code - once the Swift compiler is open sourced I plan on having a look at its semantic analyzer and AST generator to see how it handles ARC as well.

Let’s get started.

Clang: Modular Compilation

Before we can understand how automated reference counting instructions are generated, we first have to understand how the compiler that generates ARC works, and how its design fundamentally enables this behavior.

Compilation with Clang

Clang is a modular compiler front end built around a recursive descent parser, and it generates LLVM IR (bitcode is the binary serialization of that IR). Clang is modular in that each step is tied to, but relatively contained from, the other steps. This design makes it relatively easy to include Clang in a variety of products including IDEs, debuggers, and other code generators.

We’ll use a slightly simplified view of Clang’s phases to help us understand ARC.

  1. Lexing - The Clang lexer handles tokenizing the input. It is important to note that, unlike many compilers/parsers, the lexer here does not distinguish between user identifiers, types, etc. - this is the job of the next step of the process.
  2. Parsing - The parser acts on the token stream generated by the lexer and begins building an AST from the input. Throughout this, it regularly calls on the semantic analyzer to validate the input and check things like types. The AST generation here plays a big part in how ARC ownership and lifetime qualifiers are applied: most of the qualifiers and ownership information are added at this step.
  3. Semantic Analysis - The semantic analyzer works with the parser to validate types and make transformations and additions to the AST as necessary. The semantic analyzer is another place where ARC ownership and lifetime qualifiers can be added or modified, and is also where a lot of ARC rules are enforced as we will see.
  4. Code Generation - The code generation library takes an AST and uses it to generate an LLVM IR representation of the AST. The LLVM IR representation can then be passed to LLVM to generate platform specific assembly code which can then be assembled into a final binary representation. This step is where Clang actually emits ARC instructions like retain and release based on the ownership qualifiers and lifetime data created during AST generation and semantic analysis.
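
To make the first two steps above a bit more concrete, here is a minimal sketch of a lexer that treats every name as a plain identifier and leaves the “is this a type?” question to a later phase. This is a toy model and not Clang’s code: TokKind, Token, lex, and namesType are all hypothetical names invented for the illustration, and the set of known types stands in for whatever declarations the later phases have seen so far.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Hypothetical token kinds for this sketch; not Clang's real token set.
enum class TokKind { Identifier, Punct };

struct Token {
  TokKind kind;
  std::string text;
};

// Lexing: split the input into tokens. Every name becomes a plain
// Identifier; the lexer makes no attempt to tell types from variables.
std::vector<Token> lex(const std::string &src) {
  std::vector<Token> out;
  std::size_t i = 0;
  while (i < src.size()) {
    unsigned char c = static_cast<unsigned char>(src[i]);
    if (std::isspace(c)) {
      ++i;
      continue;
    }
    if (std::isalpha(c) || c == '_') {
      std::size_t start = i;
      while (i < src.size() &&
             (std::isalnum(static_cast<unsigned char>(src[i])) ||
              src[i] == '_'))
        ++i;
      out.push_back({TokKind::Identifier, src.substr(start, i - start)});
    } else {
      // Anything else is treated as a one-character punctuation token.
      out.push_back({TokKind::Punct, std::string(1, src[i])});
      ++i;
    }
  }
  return out;
}

// A later phase (standing in for the parser consulting the semantic
// analyzer) decides whether an identifier names a type, based on the
// declarations seen so far.
bool namesType(const Token &tok, const std::set<std::string> &knownTypes) {
  assert(tok.kind == TokKind::Identifier && "only identifiers can name types");
  return knownTypes.count(tok.text) != 0;
}
```

Lexing `NSString *s = foo;` with this sketch gives identifier tokens for NSString, s, and foo alike; only the lookup in namesType tells them apart.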

Semantic Analyzer and AST Generation in Depth

As I mentioned earlier, the semantic analysis and AST generation phases are the components we really want to focus on when looking at ARC ownership and rule verification. We’re going to look at each, and a sample of how each does its part in making ARC work.

AST Generation

The AST, or abstract syntax tree, is the internal tree representation of the code you’ve written, and is generated by the parser and semantic analyzer as they step through the token stream the lexer has generated. The AST is important in that it contains all the information necessary to generate code for your program, and is a simplified representation of your program itself. The AST provides numerous services to Clang in terms of generating nodes and parsing the state of the token string, but we’re going to focus on one specific part: ownership and lifetime.

The AST itself can generally determine its context and add the necessary implicit ownership and lifetime qualifiers based on the ARC rule set. In this way, the AST and the parser that generates it play the key role in generating the annotations the code generator requires to make ARC work.

Let’s have a look at an example of the AST handling ownership.

Climbing the AST

void ObjCMethodDecl::createImplicitParams(ASTContext &Context,
                                          const ObjCInterfaceDecl *OID) {
  QualType selfTy;
  if (isInstanceMethod()) {
    // There may be no interface context due to error in declaration
    // of the interface (which has been reported). Recover gracefully.
    if (OID) {
      selfTy = Context.getObjCInterfaceType(OID);
      selfTy = Context.getObjCObjectPointerType(selfTy);
    } else {
      selfTy = Context.getObjCIdType();
    }
  } else // we have a factory method.
    selfTy = Context.getObjCClassType();

  bool selfIsPseudoStrong = false;
  bool selfIsConsumed = false;

  if (Context.getLangOpts().ObjCAutoRefCount) {
    if (isInstanceMethod()) {
      selfIsConsumed = hasAttr<NSConsumesSelfAttr>();

      // 'self' is always __strong.  It's actually pseudo-strong except
      // in init methods (or methods labeled ns_consumes_self), though.
      Qualifiers qs;
      qs.setObjCLifetime(Qualifiers::OCL_Strong);
      selfTy = Context.getQualifiedType(selfTy, qs);

      // In addition, 'self' is const unless this is an init method.
      if (getMethodFamily() != OMF_init && !selfIsConsumed) {
        selfTy = selfTy.withConst();
        selfIsPseudoStrong = true;
      }
    }
    else {
      assert(isClassMethod());
      // 'self' is always const in class methods.
      selfTy = selfTy.withConst();
      selfIsPseudoStrong = true;
    }
  }

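This method [1] is pretty large, and the excerpt above stops before the end of the function, so let’s break it down a little bit. The part shown here works out the type of self for a given Objective-C method declaration and annotates it with ownership qualifiers. Before anything else, we resolve the type of self used in this method: a pointer to the interface type for an instance method (falling back to id if the interface is missing), or Class for a class method. This is important for later when we have to annotate self.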

With this done, we next check if ARC is enabled. If it is, and the method is an instance method, we see if it is attributed to consume self and record that in a temporary. Based on the comments, we can get a pretty good understanding of what happens next. We take the type of self that was resolved in the first block of code in the method and qualify it as strong, since every instance method has a strong self. We also check if this is in the “init” method family (e.g. init, initWith..., or a method attributed in such a way as to be in this family) - if it is not, and the method does not consume self, we add an implicit const to self to prevent modifying the value and mark it as pseudo-strong. We avoid this in the case of init to allow programmers to use the self = [super ...] idiom we are so familiar with.

If we are not an instance method, we must be a class method, and the code asserts that this is the case. In class methods self is actually the Class object, so it is always const.
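
The decision logic walked through above can be boiled down to a small standalone function. This is only a sketch of the rules as they appear in the excerpt: MethodKind, MethodFamily, SelfQualifiers, and qualifySelf are hypothetical names I made up for the illustration, and none of them are part of Clang.

```cpp
#include <cassert>

// Hypothetical stand-ins for the facts the real method consults.
enum class MethodKind { Instance, Class };
enum class MethodFamily { None, Init };

struct SelfQualifiers {
  bool isStrong = false;       // carries the __strong lifetime qualifier
  bool isConst = false;        // cannot be reassigned in the method body
  bool isPseudoStrong = false; // mirrors selfIsPseudoStrong in the excerpt
};

// Decide how 'self' is qualified, following the branches of the excerpt.
SelfQualifiers qualifySelf(bool arcEnabled, MethodKind kind,
                           MethodFamily family, bool consumesSelf) {
  SelfQualifiers q;
  if (!arcEnabled)
    return q; // without ARC no ownership qualifiers are added

  if (kind == MethodKind::Instance) {
    // 'self' is always __strong in an instance method.
    q.isStrong = true;
    // It is also const unless this is an init method or consumes self.
    if (family != MethodFamily::Init && !consumesSelf) {
      q.isConst = true;
      q.isPseudoStrong = true;
    }
  } else {
    assert(kind == MethodKind::Class);
    // 'self' is always const in class methods.
    q.isConst = true;
    q.isPseudoStrong = true;
  }
  return q;
}
```

Calling qualifySelf for an ordinary instance method gives a strong, const self, while the same call with MethodFamily::Init leaves self assignable.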

We can see here that we’re not adding much to the code itself, but rather annotating it in the way the Clang ARC specification requires. The AST’s job is to do exactly this: provide a code-level representation of what the language standard specifies.

The Semantic Analyzer

The semantic analyzer performs several crucial operations for Clang. Most notably, it validates the types used, generates implicit casts, and applies a variety of annotations and modifications to the generated AST based on the context of the code. In other words, if the parser determines whether a given line of code makes sense syntactically, then the semantic analyzer checks whether it actually makes sense in terms of meaning, and makes various corrections and small changes. Compare them to a writer and an editor.

As far as our analysis of ARC is concerned, the semantic analyzer’s primary purpose is enforcing ARC rules on the code, checking for inconsistencies, and applying qualifiers and lifetime annotations in certain situations where the AST generation phase may not have done so. To this end, we’ll be looking at a few cases where the semantic analyzer provides enforcement for ARC rules.

Looking Inside Sema

All of the code for the semantic analyzer is in the Sema subdirectory in Clang’s source. For this example, let’s have a look at SemaExprObjC.cpp.

If we look at the start of the file, we can see it describes what it does [2]:

//===----------------------------------------------------------------------===//
//
//  This file implements semantic analysis for Objective-C expressions.
//
//===----------------------------------------------------------------------===//

Let’s look a little deeper and see if we can find some examples of some of the rules this semantic analysis module has.

ExprResult Sema::ParseObjCStringLiteral(SourceLocation *AtLocs,
                                        Expr **strings,
                                        unsigned NumStrings) {
  StringLiteral **Strings = reinterpret_cast<StringLiteral**>(strings);

  // Most ObjC strings are formed out of a single piece.  However, we *can*
  // have strings formed out of multiple @ strings with multiple pptokens in
  // each one, e.g. @"foo" "bar" @"baz" "qux"   which need to be turned into one
  // StringLiteral for ObjCStringLiteral to hold onto.
  StringLiteral *S = Strings[0];

By reading this [3], we can see a pretty clear explanation of what this function does - it parses an Objective-C string literal and will combine multiple string literals into one if necessary. This is not ARC related, but it will definitely help us understand how the semantic analyzer works. Let’s look a little lower and see how it performs this operation [4].

  // If we have a multi-part string, merge it all together.
  if (NumStrings != 1) {
    // Concatenate objc strings.
    SmallString<128> StrBuf;
    SmallVector<SourceLocation, 8> StrLocs;

    for (unsigned i = 0; i != NumStrings; ++i) {
      S = Strings[i];

      if (!S->isAscii()) {
        Diag(S->getLocStart(), diag::err_cfstring_literal_not_string_constant)
          << S->getSourceRange();
        return true;
      }

      // Append the string.
      StrBuf += S->getString();

      // Get the locations of the string tokens.
      StrLocs.append(S->tokloc_begin(), S->tokloc_end());
    }

So we can see the semantic analyzer does exactly what you’d think for this case - it loops through all the string pieces present, reports an error and bails out if a piece is not a plain ASCII string, and otherwise appends each piece to a buffer. After this step, as you can imagine, it combines them into a single Obj-C string literal and returns this object.
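
If it helps to see the shape of that loop without any of Clang’s types around it, here is a rough standalone sketch. It assumes the pieces are plain std::string values; isAscii and mergePieces are hypothetical helpers written for this post, not functions from Clang.

```cpp
#include <cassert>
#include <string>
#include <vector>

// True when every byte of the piece is 7-bit ASCII.
bool isAscii(const std::string &piece) {
  for (unsigned char c : piece)
    if (c > 0x7F)
      return false;
  return true;
}

// Merge the pieces of a multi-part literal into 'out'. Returns false
// (the analogue of emitting a diagnostic) when a piece is not ASCII.
bool mergePieces(const std::vector<std::string> &pieces, std::string &out) {
  assert(!pieces.empty() && "a literal has at least one piece");

  // A single-piece literal needs no merging.
  if (pieces.size() == 1) {
    out = pieces[0];
    return true;
  }

  std::string buffer;
  for (const std::string &piece : pieces) {
    if (!isAscii(piece))
      return false;
    buffer += piece; // append the piece to the running buffer
  }
  out = buffer;
  return true;
}
```

With the four pieces from the comment in the Clang source, this produces the single string foobarbazqux.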

It’s worth noting how well documented this code was. Clang is generally very good about this, and if you decide to go code spelunking you’ll usually be helped by a multitude of well written and helpful comments.

Now that we have a feel for how Clang’s semantic analyzer works, let’s see if we can find something related to ARC. Let’s look for ‘ARC’ in this file and see what we get [5].

  // In ARC, forbid the user from using @selector for
  // retain/release/autorelease/dealloc/retainCount.
  if (getLangOpts().ObjCAutoRefCount) {
    switch (Sel.getMethodFamily()) {
    case OMF_retain:
    case OMF_release:
    case OMF_autorelease:
    case OMF_retainCount:
    case OMF_dealloc:
      Diag(AtLoc, diag::err_arc_illegal_selector) <<
        Sel << SourceRange(LParenLoc, RParenLoc);
      break;

This code snippet is from Sema::ParseObjCSelectorExpression. As you might guess, this snippet is responsible for the error you get in Xcode if you try to use @selector(retain) or the selector of any of the other memory management methods listed in the switch. This enforces section 7.1.1 in the Clang ARC specification. As we noted earlier, as far as ARC is concerned most of the semantic analyzer’s work is of this type: making sure the source AST obeys ARC rules.
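
The same kind of rule check is easy to model outside the compiler. The sketch below is a simplification under stated assumptions: it classifies selectors by exact name only, and MethodFamily, familyOf, and selectorExpressionAllowed are hypothetical names for this illustration, not Clang’s API.

```cpp
#include <cassert>
#include <string>

// Hypothetical subset of method families relevant to this check.
enum class MethodFamily {
  None,
  Retain,
  Release,
  Autorelease,
  RetainCount,
  Dealloc
};

// Simplified classification: match on the exact selector name only.
MethodFamily familyOf(const std::string &selector) {
  assert(!selector.empty() && "a selector needs a name");
  if (selector == "retain") return MethodFamily::Retain;
  if (selector == "release") return MethodFamily::Release;
  if (selector == "autorelease") return MethodFamily::Autorelease;
  if (selector == "retainCount") return MethodFamily::RetainCount;
  if (selector == "dealloc") return MethodFamily::Dealloc;
  return MethodFamily::None;
}

// Returns true when @selector(<selector>) is accepted in this model:
// always without ARC, and only for non-memory-management selectors
// when ARC is enabled.
bool selectorExpressionAllowed(const std::string &selector, bool arcEnabled) {
  if (!arcEnabled)
    return true;
  return familyOf(selector) == MethodFamily::None;
}
```

So in this model @selector(retain) is rejected with ARC on and accepted with ARC off, while something like @selector(description) is fine either way.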

Code Generation

Once the AST has been created by the parser and has been updated, annotated, and verified by the semantic analyzer, we end up with what we want to send to the code generator. The source code for the code generator is, unsurprisingly, stored in the CodeGen subdirectory of the Clang source code. The code generator takes an AST generated by Clang and walks it to generate LLVM IR.

As part of this evaluation process, the code generator takes the annotations, lifetime information, and ownership qualifiers previously inserted as part of the AST into account to generate our well known and loved retain and release calls.

Let’s have a look at an example of how the code generator performs this work. I’m going to snip out the “primitive” case of a non-ARC type to save us a little space here [6].

void CodeGenFunction::EmitStoreThroughLValue(RValue Src, LValue Dst,
                                             bool isInit) {
  if (!Dst.isSimple()) {
    ...
  }

  // There's special magic for assigning into an ARC-qualified l-value.
  if (Qualifiers::ObjCLifetime Lifetime = Dst.getQuals().getObjCLifetime()) {
    switch (Lifetime) {
    case Qualifiers::OCL_None:
      llvm_unreachable("present but none");

    case Qualifiers::OCL_ExplicitNone:
      // nothing special
      break;

    case Qualifiers::OCL_Strong:
      EmitARCStoreStrong(Dst, Src.getScalarVal(), /*ignore*/ true);
      return;

    case Qualifiers::OCL_Weak:
      EmitARCStoreWeak(Dst.getAddress(), Src.getScalarVal(), /*ignore*/ true);
      return;

    case Qualifiers::OCL_Autoreleasing:
      Src = RValue::get(EmitObjCExtendObjectLifetime(Dst.getType(),
                                                     Src.getScalarVal()));
      // fall into the normal path
      break;
    }
  }

So we can see what this piece of code does pretty clearly. If the destination l-value has an Objective-C lifetime qualifier, we have to decide what to do with it. We look at the lifetime, and since we’re doing an assignment, we emit the code necessary to handle each case. If our lifetime is strong, we call EmitARCStoreStrong which, as you might have guessed, emits a call to objc_storeStrong. Similarly, for a weak pointer, we call EmitARCStoreWeak. For an autoreleasing destination, the value’s lifetime is extended first and the code then falls through to the normal store path.

Using these simple rules, the code generator is able to understand what it needs to emit for each case of assignment. Similar, though sometimes a lot more complex, rules exist for a variety of cases throughout the code generator, always verifying if a given object has an Objective-C lifetime or ownership qualifier, and then acting on that.
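
As a rough mental model of that dispatch, here is a toy function that maps a destination’s lifetime to a description of what gets emitted. It is a sketch and not Clang code: Lifetime and describeStore are hypothetical, and apart from objc_storeStrong the runtime function names in the strings (objc_storeWeak, objc_retainAutorelease) are my assumption about what the corresponding helpers end up calling.

```cpp
#include <cassert>
#include <string>

// Hypothetical mirror of the lifetime qualifiers seen in the excerpt.
enum class Lifetime { None, ExplicitNone, Strong, Weak, Autoreleasing };

// Returns a textual description of what a store into a destination with
// the given lifetime would emit in this toy model.
std::string describeStore(Lifetime lifetime) {
  switch (lifetime) {
  case Lifetime::None:
  case Lifetime::ExplicitNone:
    // Nothing special: an ordinary store.
    return "store";
  case Lifetime::Strong:
    // The runtime call takes care of retaining and releasing.
    return "call objc_storeStrong";
  case Lifetime::Weak:
    return "call objc_storeWeak";
  case Lifetime::Autoreleasing:
    // Extend the value's lifetime, then take the normal store path.
    return "call objc_retainAutorelease; store";
  }
  assert(false && "unknown lifetime");
  return "";
}
```

The point of the model is only that the choice is driven entirely by the qualifier attached to the destination, which is exactly the information the earlier phases put into the AST.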

The elimination of duplicate retain and release calls, then, is not really an ‘elimination’ per se in the way we would think about it if we were doing it manually. It is actually just the compiler having absolute knowledge of its own rules, and being able to act on those rules to generate the minimal set of retains and releases required. In many cases, as we saw above, no separate retain or release calls are emitted at all - the compiler can simply emit a call to objc_storeStrong (which handles the retain of the new value and the release of the old one itself) and friends.
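
To show why a single objc_storeStrong call can stand in for a retain/release pair, here is a toy model of its commonly described behavior: retain the new value, store it, then release the old one. Everything here is hypothetical (Object, retain, release, and storeStrong are made up for the sketch), and the toy release never frees anything; it only tracks the count.

```cpp
#include <cassert>

// Toy reference-counted object; starts with one owner.
struct Object {
  int retainCount = 1;
};

// Null-safe retain, like messaging nil in Objective-C.
void retain(Object *obj) {
  if (obj)
    ++obj->retainCount;
}

// Null-safe release. This toy version never deallocates.
void release(Object *obj) {
  if (obj) {
    assert(obj->retainCount > 0 && "over-release");
    --obj->retainCount;
  }
}

// Model of a strong store: retain the new value, write it into the
// slot, then release whatever the slot held before.
void storeStrong(Object **location, Object *value) {
  Object *previous = *location;
  if (previous == value)
    return; // storing the same pointer changes nothing
  retain(value);
  *location = value;
  release(previous);
}
```

Retaining the new value before releasing the old one matters: if the order were reversed and the old value were the only thing keeping the new one alive, the new value could be destroyed before it was ever stored.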

One other thing that is worth noting here is at no point do we care if this assignment is happening in C++ or Objective-C or anywhere else - we simply check if the AST has the necessary annotations. The code generator, at this step, doesn’t really care - it is acting solely on what is present in the AST, which includes no real notions of “source format” - only the tree that specifies the program’s state.

Conclusion

We have looked into the depths of Clang to understand how ARC is implemented at each level in the compiler stack: the parser, the semantic analyzer, and the code generator. We have looked at cases where ARC rules are enforced, where ARC annotations are generated, and where ARC calls are emitted into LLVM IR.

With this understanding it’s a lot easier to appreciate what ARC does for us when it manages our retain count, in terms of both optimizing calls to retain and release as well as helping us achieve the minimum set of retain and release calls required. In addition to this appreciation we can also learn some things from the design and implementation of ARC.

Lessons

  • A lot of what ARC does is done during AST generation and semantic analysis. If we aren’t careful in how we manage our objects, and aren’t aware of how implicit lifetimes and ownership qualifiers impact ARC’s behavior, we are always going to be at the mercy of rules we may not completely understand. It really is worth taking the time to understand the ARC specification and how it works.
  • The ARC system is pervasive throughout Clang’s architecture, and though Clang is modular, it’s difficult to really untangle ARC from the system it is in. Because of this, it’s sometimes hard to predict exactly what ARC will do for complex code, and often the only way to tell is to look at either the LLVM IR or the final assembly generated by LLVM. We should not be afraid to do this if it helps us understand our code.
  • The architecture for ARC, though spread throughout Clang, is actually based on very simple principles: lifetimes and ownership qualifiers. Using these primitives in a consistent manner has allowed Clang to have this almost magical ability that we all use every day, and we should take that to heart when we consider the designs of our own programs.

Thanks for Reading

Thanks for taking this journey with me - ARC is a fascinating look at compiler architecture and implementation, as well as an interesting solution to a complicated problem. As always, if you have any corrections please let me know - while I do my best to check the accuracy of my articles, there’s always opportunity for improvement, so feel free to let me know via email or Twitter.

References and Reading Materials

Source Attributions