The Ruby Rabbit Hole

I like playing with different programming languages. I find that, much like spoken languages, using different programming languages broadens your horizons, since the way you express a concept in one language differs from how you would express it in another. I find that not only my style of programming, but also my thought processes, shift a bit as I use other languages.

This penchant for learning many languages, paired with my desire to understand how things work at as low a level as possible, has led me down many rabbit holes that often took me far afield of just learning a language or using it to be productive. That said, these diversions have also been immensely useful. Rarely do you understand something as well as when you really dive into how it works and is implemented, and once you do that, programming at a higher level becomes far easier, less challenging, and a lot more fun.

Today, I’m going to explore one of these rabbit holes in Ruby. Oh, and for clarity, I’m really only going to dive into the implementation by Matz and I’m going to assume you have no or very limited familiarity with how Ruby is implemented.

What’s in an object?

Let’s start simply. In Ruby, everything is conceptually an “object”. This lets you do fun things like this:

2.bit_length     # This is sending a message to a number literal (a Fixnum instance)
2 + 4            # This is sending the + message to this Fixnum instance
2.+(4)           # This is sending the + message to this Fixnum as well, identical to above
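
To convince yourself that the operator form really is just a message send, here is a small sketch using send and respond_to?, both standard Ruby methods. The numbers simply mirror the example above:

```ruby
# Operators are ordinary method calls, so they can also be sent by name.
sum_via_operator = 2 + 4
sum_via_send     = 2.send(:+, 4)   # the same message, spelled out explicitly

puts sum_via_operator == sum_via_send   # true
puts 2.respond_to?(:bit_length)         # true
puts 2.bit_length                       # 2, since 2 is 0b10
```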

If we’re really going to understand Ruby, the first thing we should probably ask is how objects are implemented, so let’s do that! An object, in Ruby’s implementation, looks something like[1]:

struct RObject {
    struct RBasic basic;
    union {
      struct {
        long numiv; /* only uses 32-bits */
        VALUE *ivptr;
        struct st_table *iv_index_tbl; /* shortcut for RCLASS_IV_INDEX_TBL(rb_obj_class(obj)) */
      } heap;
      VALUE ary[ROBJECT_EMBED_LEN_MAX];
    } as;
};

Wow, that’s a lot of stuff for a simple object. Let’s have a look and figure out what it does.

  • basic - This is an RBasic structure. We won’t get into everything it does in this post, but from a high level almost every Ruby structure contains this, and it helps identify what kind of object this is. So even if you don’t know what the object you have is, you can cast it to an RBasic, ask it what it is, then cast it to the appropriate type.
  • as - The as union is used to either store a VALUE array or a struct for instance variables, depending on how much and what kind of data needs to be stored.

Let’s look at a simple example:

my_var = MyClass.new

For this bit of code, Ruby will internally create an RObject for this object, with an RBasic member saying this is of class MyClass.
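
From the Ruby side you can't see the RObject itself, but you can see the two things it carries: the class and the instance variables. A minimal sketch, assuming an empty MyClass and a made-up @name variable purely for illustration:

```ruby
class MyClass   # empty class, as in the example above
end

my_var = MyClass.new
puts my_var.class                         # MyClass (what klass records)
puts my_var.instance_variables.inspect    # [] (nothing stored yet)

# Instance variables live on the object itself, not on the class.
my_var.instance_variable_set(:@name, "rabbit")   # @name is a hypothetical variable
puts my_var.instance_variables.inspect    # [:@name]
```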

Now, there is one more thing we should mention here. Let’s return to a modified version of our previous example:

i = 2.2

So, what do you think Ruby does internally here? If you said store an RObject, you’re wrong, but it’s not your fault. Let’s step back for a minute and actually look at RBasic[2]:

struct RBasic {
    VALUE flags;
    const VALUE klass;
};

RBasic includes two real elements: a set of flags indicating what the ‘internal’ object type is (flags), and the class of the object (klass). This is a really important distinction and is, in my mind, the key to understanding Ruby in real depth: the internal Ruby representation of an object and the type it appears as in userland are both recorded in RBasic. That’s why it’s so important.

Why have different types of structs at all? Easy: optimization. So in this case, you’re going to get an RFloat struct internally, with an RBasic that says it is a struct of RFloat and of class Float.

What other structures exist you ask? Plenty! For details you can check out some of my sources below, but the one we’ll be focusing on today is RClass.

What’s in a class

So if in Ruby-land, everything is an object, then that must mean classes are objects, right? On the nose my friend! RClass is the struct type Ruby uses to store Class objects. So what actually is a class object, both from an external and internal perspective?

From an external perspective, a class is an instance of the class Class. Wow, that’s really confusing, huh? Maybe if we look at it from the other side of the two-way mirror it will clear it up.

Internally, a class in Ruby is an RClass struct that has an RBasic whose klass is Class.
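
The external view is easy to check from Ruby itself. This is just a quick sketch using an empty MyClass:

```ruby
class MyClass
end

puts MyClass.class                 # Class
puts MyClass.instance_of?(Class)   # true
puts MyClass.is_a?(Object)         # true: a class is an object like any other
puts Class.class                   # Class: Class is itself an instance of Class
```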

Great, so that brings us to our next question…

What’s in an RClass

So why do classes need their own separate structure? Well, let’s have a look at ruby.h and see what an RClass actually is[3].

struct RClass {
    struct RBasic basic;
    VALUE super;
    rb_classext_t *ptr;
    struct method_table_wrapper *m_tbl_wrapper;
};

So there’s one part of this that’s probably pretty obvious, and a few more that are a bit mysterious, so let’s break it down a bit. From a high level, this is what each component of RClass is used for:

  • basic - The same thing as in RObject: the internal type and the class.
  • super - This one is the RClass that is the superclass of this one, and comes into play primarily during message sending.
  • ptr - rb_classext_t sounds confusing. Is it used for class extensions? Is it used for extensibility? As it turns out, it isn’t. This is far more mundane: it’s used to store a bunch of extended metadata for the class, including things like constants, subclasses, etc. It’s also used by modules, but we’ll get to that in a bit.
  • m_tbl_wrapper - This one is important, and is probably what caught your eye. In addition to having a superclass, what distinguishes a class from an object is that it has methods. So here you go: this is the method table for this class!
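
Most of these fields have a visible counterpart in Ruby. The sketch below uses two made-up classes, Animal and Dog, and standard reflection methods. The mapping from each call to a struct field is my own reading, not something Ruby states:

```ruby
class Animal   # hypothetical classes, for illustration only
  LEGS = 4     # constants are part of the class's extended metadata
  def speak
    "..."
  end
end

class Dog < Animal
  def fetch
    "stick"
  end
end

puts Dog.superclass                        # Animal    (roughly: super)
puts Dog.instance_methods(false).inspect   # [:fetch]  (roughly: this class's own method table)
puts Animal.constants.inspect              # [:LEGS]   (roughly: metadata behind ptr)
```

Note that Dog's own method table only holds fetch. speak is found later by following super.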

So how do we actually pass a message to an object? Let’s look at a quick example, and then discuss it from a high level. If you’d like more low level details, the source files are pretty readable and great resources, but some of the other links I’ve given at the end of this article have some walkthroughs as well.

class MyClass
  def test
    puts "I tested something!"
  end
end

class MyClass2 < MyClass
end

my_var = MyClass2.new
my_var.test

Okay, so let’s go through what Ruby is doing:

  1. Ruby instantiates a new object of type class, so Ruby creates an RClass, with klass of Class, and superclass of Object, and creates a global constant called MyClass to store the RClass
  2. Ruby adds a new method to the m_tbl_wrapper for test for the constant MyClass
  3. Ruby instantiates a new object of type class, so Ruby creates an RClass, with klass of Class and superclass of MyClass, and creates a global constant called MyClass2 to store the RClass - note that the class that does the inheriting just has a different super - it is still of type Class
  4. Ruby creates a new RObject with klass of MyClass2
  5. Ruby reads my_var, reads the RBasic on it, and finds out its klass is MyClass2
  6. Ruby accesses MyClass2 and invokes an implementation searching function. It looks in a cache first; failing that, it searches m_tbl_wrapper for the method. It can’t find it there, so it looks in the super for MyClass2, which is MyClass, and searches again. There it finds the method and invokes it, passing the my_var RObject as a parameter so instance variables can be used.

Based on this, we not only now see how method invocation works, but also how it works in the case of inheritance. There’s one other gem of knowledge here that will be really useful when we get to understanding how class methods and singleton classes work: Ruby always searches an object’s class for methods, then goes up the inheritance hierarchy. In other words, it resolves methods using: klass -> super -> super -> super until it hits the top level superclass, BasicObject.
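
The klass -> super -> super walk can be sketched in plain Ruby. find_method_owner below is a hypothetical helper I made up for illustration. The real lookup happens in C, checks a method cache first, and also considers singleton classes, which this sketch ignores:

```ruby
class MyClass
  def test
    "I tested something!"
  end
end

class MyClass2 < MyClass
end

# Hypothetical helper: walk the ancestor chain and return the first class or
# module whose own method table defines the method (nil if none does).
def find_method_owner(obj, name)
  obj.class.ancestors.find do |klass|
    klass.instance_methods(false).include?(name) ||
      klass.private_instance_methods(false).include?(name)
  end
end

my_var = MyClass2.new
puts find_method_owner(my_var, :test)   # MyClass
puts my_var.method(:test).owner         # MyClass (Ruby's own answer)
puts MyClass2.ancestors.last            # BasicObject
```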

But wait, aren’t we forgetting something? Crap…

What’s in a module?

Well shoot, we handled classes so elegantly, and now we have modules. Luckily for us, modules are handled internally the same way classes are! In other words, the flags element on RBasic is different for a module, but it’s still stored as an RClass struct! This means that the same process for method invocation, etc., is still the case.
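
You can see the family resemblance from the Ruby side too. A small sketch, using a made-up greet method:

```ruby
module MyModule
  def greet   # hypothetical method, for illustration only
    "hello"
  end
end

puts MyModule.class                            # Module
puts Class.superclass                          # Module: every class is also a module
puts MyModule.instance_methods(false).inspect  # [:greet], a method table just like a class has
puts MyModule.respond_to?(:new)                # false: a module can't be instantiated
```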

That’s good news, I hate repeating myself.

Classy Class Methods

The elephant in the room from what I’ve explained so far, for those familiar with Ruby, is probably going to be how mix-ins and class methods fit into everything we’ve been discussing, so let’s get to it.

I’ve shown you how Ruby dispatches instance methods on an Object, so let’s look at a regular invocation of a class method in Ruby:

class MyClass
end

my_var = MyClass.new

We already discussed how Ruby creates a constant called MyClass that stores the RClass structure for MyClass, so what happens when I say MyClass.new?

The same thing as before.

Mind blown.

How is this possible? Well, when you say MyClass.new, what Ruby sees is that you’re invoking a method, in this case new, on an object, in this case an object represented by an RClass. So what does Ruby do? Easy: it checks klass on the RClass for MyClass which is, of course, Class. It then searches Class (which is itself, you guessed it, an RClass) for the method new. It finds it, invokes it, and the rest, as they say, is history.

So what does this really mean? I’m not the first to say it, but I’m definitely excited to say it: class methods in Ruby are actually instance methods on the klass object of a particular Class.
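
Ruby will happily confirm this if you ask where new comes from. A quick sketch with an empty MyClass:

```ruby
class MyClass
end

# new is an ordinary instance method defined on Class...
puts Class.instance_methods(false).include?(:new)   # true

# ...so calling MyClass.new resolves to that method, found via MyClass's klass.
puts MyClass.method(:new).owner                     # Class
```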

Now I can hear what you’re saying - you agree this works all fine and dandy for new, but what about when you define class methods yourself? What about:

class MyClass
  def self.my_class_method
    puts "Invoked my class method"
  end
end

Certainly, my_class_method is not a method defined on the Class object, so how do we handle this? Exploring this question pushes us deeper into the rabbit hole, into the fun and wacky world of singleton classes.

Singleton Classes

Now we get to a topic that confuses many people, even seasoned Ruby folks: singleton classes. Before we begin though, we have a very key question: what is a singleton class? And also, what are a metaclass and an eigenclass? Do they all mean the same thing? Unfortunately, a lot of different places have abused these terms or treated them interchangeably, but as usual, if we just look in the source, we find the answer[4]:

/*!
 * \defgroup class Classes and their hierarchy.
 * \par Terminology
 * - class: same as in Ruby.
 * - singleton class: class for a particular object
 * - eigenclass: = singleton class
 * - metaclass: class of a class. metaclass is a kind of singleton class.

Now we know that a singleton class is the class for a particular object. We also know that a metaclass is just the class of a Class object, and eigenclass is just another word for singleton class. Hm. That still isn’t very useful, is it? Let’s try to decode this a bit.

To rephrase the above, a singleton class is the Class object pointed to by a particular RBasic klass element. In other words, it’s the klass for one particular object. But why would we ever need to distinguish this from a normal class? Can we have classes that aren’t ‘normal’ classes?

Yup.

ICLASS

Remember how modules are RClass structs that have the flags set to represent a module? Well, as it turns out, there’s another type of RClass struct, an ICLASS. An RClass with ICLASS flags represents an internal class: a hidden class that was not explicitly defined in the Ruby world, and won’t ever be returned explicitly to it, though you can trick Ruby into giving it to you in some ways. Singleton classes are close cousins. In MRI’s source they are regular class structs marked with a singleton flag (FL_SINGLETON) rather than ICLASS, and you can ask Ruby for one with singleton_class, but they too are anonymous classes that Ruby creates implicitly behind the scenes. We’ll see what ICLASS itself is used for when we get to mix-ins.

For those of you from Java, it’s kind of helpful to think of these as anonymous classes, because that’s very nearly exactly what they are: they’re classes that are not assigned to a global constant, and are created implicitly by Ruby to implement a few core extensibility features.

So now that we know what a singleton class or ICLASS is, let’s return to the question at hand, how does it help us solve the problem with class methods? Let’s look at our code example again.

class MyClass
  def self.my_class_method
    puts "Invoked my class method"
  end
end

Let’s dissect what Ruby’s doing as it encounters this all just as we did before:

  1. Ruby creates a new constant MyClass and stores in it an RClass struct, with a klass of Class, and a super of Object.
  2. Ruby finds def self.my_class_method - it looks at what self is, in this case the MyClass class, and finds it needs to declare a method on that Class object. We only want this method available for this instance of Class, and not for all instances of Class, so it can’t go in the method table of Class itself.
  3. Ruby creates a singleton class for MyClass and creates a new method in the m_tbl_wrapper on it called my_class_method. Once this is complete, the RClass for MyClass has klass pointing to this nameless singleton class. In turn, the singleton class has a super chain that leads back to Class (in MRI it first passes through the singleton classes of Object and BasicObject).

I realize this is a bit confusing, but what I’m telling you here is that when you try to define a class method, Ruby really creates a separate singleton class to store class methods for this instance of Class, and then puts the methods there. It follows up by making the new singleton class inherit, ultimately, from Class, making sure that existing methods such as new can still be called.

I encourage you to return to the previous rule we established, method resolution works by going to klass, looking there, and then following super until it finds something. As we get into more convoluted examples, if you keep this in mind, it’ll make it quite easy to see where a method must be placed.
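
Here is a small sketch of that rule applied to a class method, using singleton_class and a few other standard reflection methods to peek at where my_class_method actually lives:

```ruby
class MyClass
  def self.my_class_method
    "Invoked my class method"
  end
end

singleton = MyClass.singleton_class
puts singleton.inspect                                     # #<Class:MyClass>
puts MyClass.singleton_methods(false).inspect              # [:my_class_method]
puts MyClass.method(:my_class_method).owner == singleton   # true: it lives on the singleton class
puts singleton.ancestors.include?(Class)                   # true: lookup still reaches Class
puts Class.instance_methods.include?(:my_class_method)     # false: other classes are unaffected
```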

There’s one other impact of what I’ve noted as well, which you may have caught on to. Ruby doesn’t reserve singleton classes for RClass structs: ordinary objects can have them too. This is how Ruby adds a method to a single instance of a particular class, instead of to the entire class. Let’s look at another example to make this clear:

my_var = MyClass.new
my_var_again = MyClass.new
def my_var.new_method
  puts "A method only on my_var"
end

After this, my_var.new_method will print the text we placed there, but my_var_again.new_method will raise an exception since the method is not found. Why? my_var now has a klass pointing to a singleton class, an RClass with our new_method in its method table, and that singleton class has a super of MyClass. my_var_again, on the other hand, still has a klass of MyClass.
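
If you'd like to check that for yourself, this sketch repeats the example and then inspects both objects. I've changed new_method to return its string rather than print it, just to make the result easier to look at:

```ruby
class MyClass
end

my_var = MyClass.new
my_var_again = MyClass.new

def my_var.new_method
  "A method only on my_var"
end

puts my_var.new_method                        # A method only on my_var
puts my_var.singleton_methods.inspect         # [:new_method]
puts my_var.singleton_class.superclass        # MyClass
puts my_var_again.respond_to?(:new_method)    # false

begin
  my_var_again.new_method
rescue NoMethodError => e
  puts "NoMethodError: #{e.message}"
end
```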

So this means class methods are not only instance methods, they’re not special at all: they’re just a case of defining a method on an instance instead of on a class.

With that, I think we’re ready to return to mix-ins…those of you thinking ahead may already have guessed how those get implemented.

Mixing it Up With Mix-Ins

What are mix-ins at their core? Given a module, you either import the instance methods on that module as class methods (using extend) or as instance methods (using include).

Based on this, and our knowledge of how method definitions on instances work, we can see that this must use singleton classes somehow. So let’s start by looking at how we would use a mix-in. Let’s check out some code:

module MyModule
  def test
    puts "Test from MyModule"
  end
end

class MyClass1
  include MyModule
end

class MyClass2
  extend MyModule
end

MyClass2.test          # Prints out "Test from MyModule"
my_var = MyClass1.new
my_var.test            # Prints out "Test from MyModule"

When we use include, we expect to call the methods as though they were instance methods defined within the class itself. Based on this, we can use the rule we discussed earlier to break down what Ruby must be doing: method resolution works by klass -> super -> repeat. So, invoking test on my_var, Ruby looks at klass for the RObject referenced by my_var and finds MyClass1. Since we can probably assume Ruby wouldn’t just add methods willy-nilly to MyClass1, that means Ruby must be doing something with the super on MyClass1.

And that would be right. Ruby sees the include, creates a new anonymous class, this time flagged as an ICLASS, and inserts it as the super for MyClass1. This anonymous class points to the same m_tbl_wrapper as the module MyModule, thereby sharing all of its methods.

It’s important to note that while this is an anonymous internal class, it isn’t a singleton class. By the definition we saw earlier from Ruby’s source, a singleton class is the klass for a particular object, not just any old anonymous internal class.

So, how does this work for multiple include statements? Simple: more anonymous classes, more inheritance.
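
The hidden proxy classes never show up directly, but ancestors reports the module each one stands in for. A minimal sketch, where OtherModule is a second, made-up mix-in added to show the ordering:

```ruby
module MyModule
  def test
    "Test from MyModule"
  end
end

module OtherModule   # hypothetical second mix-in
end

class MyClass1
  include MyModule
  include OtherModule
end

# The most recently included module sits closest to the class.
puts MyClass1.ancestors.take(4).inspect   # [MyClass1, OtherModule, MyModule, Object]
puts MyClass1.superclass                  # Object: superclass skips the proxy classes
puts MyClass1.new.method(:test).owner     # MyModule
puts MyClass1.new.test                    # Test from MyModule
```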

Now that we’ve tackled how include works, let’s consider extend. When we use extend, we want to be able to call the module methods on the actual constant, in this case MyClass2, that refers to the RClass we created. Based on what we know, that means the methods need to be in the klass for MyClass2, or somewhere in its inheritance hierarchy. We know we can’t change the inheritance hierarchy for Class, which is the klass for MyClass2, since that would add this module’s methods to every class.

That leaves us with the singleton class for MyClass2. And indeed, that is how Ruby deals with extend: we augment the super the same way we did for include, but we do it on the singleton class for MyClass2. So now the klass for MyClass2 references the singleton class, which in turn has a super that is the anonymous class with the method table of MyModule, whose own super chain continues on to Class.
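
Again, Ruby lets us verify this from the outside. In this sketch, extend shows up as an include on the singleton class, while MyClass2 itself and its instances are left alone:

```ruby
module MyModule
  def test
    "Test from MyModule"
  end
end

class MyClass2
  extend MyModule
end

singleton = MyClass2.singleton_class
puts MyClass2.test                         # Test from MyModule
puts singleton.include?(MyModule)          # true: the module sits in the singleton's chain
puts singleton.ancestors.take(2).inspect   # [#<Class:MyClass2>, MyModule]
puts MyClass2.include?(MyModule)           # false: the class's own chain is untouched
puts MyClass2.new.respond_to?(:test)       # false: instances don't get the method
```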

Phew, that was a lot of words, but if you get through it, it’s pretty clear why singleton classes and anonymous classes play such a central role in Ruby: they’re how Ruby can be as dynamic as it is, and how so many of the most powerful and popular features of Ruby really work.

Conclusion

Hopefully after reading this, you have a much deeper, and better, understanding of things like “eigenclass” and “singleton class” and understand a bit how Ruby handles the everyday operations that you do. To bring this back to what I originally said, different languages force me to think in different ways, and for me, exploring how these languages really work at their core helps me relate new things I do to the existing things I’ve done. Understanding how you can use design patterns in one language to implement design features in another is a really profound thing, at least for me.

There are a lot of things I haven’t covered here today, and should you want to go even deeper into this rabbit hole, my references are all quite excellent and provide a wonderful look at how all of this works in even greater detail.

Thanks

I’d like to call out that this post is my own approach to explaining Ruby by looking at its implementation, but it is hardly a new idea. It has probably been done most notably and thoroughly by Burke Libbey, and I often referred to his material to sanity check my own work as I went through the source code. I have included links to it, as well as my other sources, below; if you’d like more detail, I’d refer you to those.

References

  • https://github.com/ruby/ruby/blob/aaed10716a55d659309a8636a41a8e159347a32c/include/ruby/ruby.h
  • https://github.com/ruby/ruby/blob/aacc35e144f2a3d5c145f85e337accd55a8acc90/internal.h
  • https://github.com/ruby/ruby/blob/6115f65d7dd29561710c3e84bb27180e5bab4380/class.c
  • https://ruby-hacking-guide.github.io
  • http://www.slideshare.net/burkelibbey/learn-ruby-by-reading-the-source
  • http://www.slideshare.net/burkelibbey/rubys-object-model-metaprogramming-and-other-magic

  1. from ruby.h (Under Ruby License)
  2. from ruby.h (Under Ruby License)
  3. from ruby.h (Under Ruby License)
  4. from class.c (Under Ruby License)

Dec 9th, 2014