The Ruby Rabbit Hole
I like playing with different programming languages. I find that, much like spoken languages, using different programming languages broadens your horizons, because the way you express a concept in one language differs from how you would express it in another. I find that not only my style of programming but also my thought processes shift a bit as I use other languages.
This penchant for learning many languages, paired with my desire to understand how things work at as low a level as possible, has led me down many rabbit holes that often took me far afield from just learning a language or using it to be productive. That said, these diversions have also been immensely useful, as rarely do you understand something as well as when you really dive into how it works and is implemented. Once you do that, programming at a higher level becomes far easier and a lot more fun.
Today, I’m going to explore one of these rabbit holes in Ruby. Oh, and for clarity, I’m really only going to dive into the implementation by Matz (MRI), and I’m going to assume you have little or no familiarity with how Ruby is implemented.
What’s in an object?
Let’s start simply. In Ruby, everything is conceptually an “object”. This lets you do fun things like this:
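As a minimal sketch of what that means in practice (the particular literals are arbitrary; any value would do):

```ruby
# Literals are objects too: each one has a class and responds to methods.
puts "hi".class          # String
puts nil.class           # NilClass
puts 3.is_a?(Object)     # true
puts 3.respond_to?(:+)   # true

# A method that takes a block can be called directly on an integer literal.
3.times { |i| puts "call #{i}" }
```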
If we’re really going to understand Ruby, the first thing we should probably ask is how objects are implemented, so let’s do that! An object, in Ruby’s implementation, looks something like:
Wow, that’s a lot of stuff for a simple object. Let’s have a look and figure out what it does.
basic - This is an RBasic structure. We won’t get into everything it does in this post, but from a high level almost every Ruby structure contains this, and it helps identify what kind of object this is. So even if you don’t know what the object you have is, you can cast it to an RBasic, ask it what it is, then cast it to the appropriate type.
as - The as union is used to either store a VALUE array or a struct for instance variables, depending on how much and what kind of data needs to be stored.
Let’s look at a simple example:
For this bit of code, Ruby will internally create an RObject for this object, with an RBasic member saying this is of class MyClass.
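A minimal sketch of the kind of code this describes, assuming an empty class named MyClass and a variable I’ve called my_var for illustration. The internal RObject itself isn’t visible from Ruby, so this only shows the userland side of what it stores:

```ruby
# Hypothetical empty class standing in for the MyClass discussed above.
class MyClass
end

my_var = MyClass.new

puts my_var.class                       # MyClass (the class recorded for the object)
puts my_var.instance_variables.inspect  # [] (nothing stored on the object yet)

# Instance variables are per-object data, which is what the object has to hold.
my_var.instance_variable_set(:@name, "rabbit")
puts my_var.instance_variables.inspect  # [:@name]
```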
Now, there is one thing we should also mention here, let’s return to a modified version of our previous example:
So, what do you think Ruby does internally here? If you said store an RObject, you’re wrong, but it’s not your fault. Let’s step back for a minute and actually look at RBasic:
RBasic includes two real elements: a set of flags indicating what the ‘internal’ object type is (flags) and the class of the object (klass). This is a really important distinction and is, in my mind, the key to understanding Ruby in real depth: the internal Ruby representation of an object and the type it appears as in userland are both stored by RBasic. That’s why it’s so important.
Why have different types of structs at all? Easy: optimization. So in this case, you’re going to get an RFloat struct internally, with an RBasic that says it is a struct of type RFloat and of class Float.
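A rough sketch of the float case, assuming the example simply assigns a float literal to a variable (my_float is a name made up for illustration). The internal struct can’t be inspected from Ruby code, so this only shows the userland side, plus one observable hint that floats get special treatment: they refuse singleton methods.

```ruby
my_float = 1.5

puts my_float.class          # Float
puts my_float.is_a?(Object)  # true

# Floats look like ordinary objects, but Ruby will not attach a
# per-object method to one.
begin
  def my_float.shout; end
rescue TypeError => e
  puts "TypeError: #{e.message}"
end
```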
What other structures exist, you ask? Plenty! For details you can check out some of my sources below, but the one we’ll be focusing on today is RClass.
What’s in a class
So if in Ruby-land, everything is an object, then that must mean classes are objects, right? On the nose my friend!
RClass is the struct type Ruby uses to store Class objects. So what actually is a class object, both from an external and internal perspective?
From an external perspective, a class is an instance of the class Class. Wow, that’s really confusing, huh? Maybe if we look at it from the other side of the two-way mirror it will clear it up.
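The external side of that statement can be checked directly from Ruby. A small sketch, assuming an empty MyClass and a hypothetical anonymous class for comparison:

```ruby
class MyClass
end

puts MyClass.class          # Class
puts MyClass.is_a?(Object)  # true: a class is itself an object
puts Class.class            # Class: Class is an instance of itself

# Classes can be created like any other object, without a constant.
anonymous = Class.new
puts anonymous.class        # Class
```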
Internally, a class in Ruby is an RClass struct that has an RBasic with a klass of Class.
Great, so that brings us to our next question…
What’s in an RClass
So why do classes need their own separate structure? Well, let’s have a look at ruby.h and see what an RClass actually is.
So there’s one part of this that’s probably pretty obvious, and a few more that are a bit mysterious, so let’s break it down a bit. From a high level, this is what each component of RClass is used for:
basic - The same thing as in RObject: the internal type and the class
super - This one is the RClass that is the superclass of this one, and comes into play primarily during message sending
ptr - rb_classext_t sounds confusing. Is it used for class extensions? Is it used for extensibility? As it turns out, it isn’t. This is far more mundane - it’s used to store a bunch of extended metadata for the class, including things like constants, subclasses, etc. It’s also used by modules, but we’ll get to that in a bit.
m_tbl_wrapper - This one is important, and is probably what caught your eye. In addition to having a superclass, what distinguishes a class from an object is that it has methods. So here you go - this is the method table for this class!
So how do we actually pass a message to an object? Let’s look at a quick example, and then discuss it from a high level. If you’d like more low level details, the source files are pretty readable and great resources, but some of the other links I’ve given at the end of this article have some walkthroughs as well.
Okay, so let’s go through what Ruby is doing:
- Ruby instantiates a new object of type class, so Ruby creates an RClass with a super of Object, and creates a global constant called MyClass to store the RClass
- Ruby adds a new method to the m_tbl_wrapper called test for the constant MyClass
- Ruby instantiates a new object of type class, so Ruby creates an RClass with a super of MyClass, and creates a global constant called MyClass2 to store the RClass - note that the class that does the inheriting just has a different super - it is still of type Class
- Ruby creates a new RObject with a klass of MyClass2 and stores a reference to it in my_var
- Ruby reads my_var, reads the RBasic on it and finds out its klass is MyClass2
- Ruby accesses MyClass2 and invokes an implementation searching function - it looks in a cache first; failing that, it searches m_tbl_wrapper for the method - it can’t find it there, so it looks in the super of MyClass2, which is MyClass. It searches again there, finds it, and invokes it, passing the my_var RObject as a parameter so instance variables can be used.
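The steps above can be followed from Ruby itself. This is a sketch that assumes the class, method and variable names used in the walkthrough (MyClass, MyClass2, test, my_var); the method body is made up for illustration:

```ruby
class MyClass
  def test
    "test called on #{self.class}"
  end
end

class MyClass2 < MyClass
end

my_var = MyClass2.new

puts MyClass2.superclass                       # MyClass
puts MyClass2.instance_methods(false).inspect  # [] (nothing in MyClass2's own method table)
puts MyClass.instance_methods(false).inspect   # [:test]
puts my_var.method(:test).owner                # MyClass (where the search succeeded)
puts my_var.test                               # test called on MyClass2
```

Note that self inside test is still the MyClass2 instance, which is the visible effect of passing the object along to the method that was found.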
Based on this, we not only now see how method invocation works, but also how it works in the case of inheritance. There’s one other gem of knowledge here that will be really useful when we get to understanding how class methods and singleton classes work: Ruby always searches an object’s class for methods, then goes up the inheritance hierarchy. In other words, it resolves methods using:
klass -> super -> super -> super, until it hits the top-level superclass.
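To make the rule concrete, here is a deliberately simplified model of that walk written in Ruby. The helper find_method_owner and the Animal and Dog classes are hypothetical names for illustration; this is a sketch of the idea, not how the interpreter implements it:

```ruby
# Simplified model of the lookup: start at the object's class and follow
# superclass links. Real Ruby also consults a method cache, singleton classes
# and included modules, which Class#superclass skips over.
def find_method_owner(object, name)
  klass = object.class
  while klass
    own = klass.instance_methods(false) + klass.private_instance_methods(false)
    return klass if own.include?(name)
    klass = klass.superclass
  end
  nil
end

class Animal
  def speak
    "..."
  end
end

class Dog < Animal
end

puts find_method_owner(Dog.new, :speak).inspect    # Animal
puts find_method_owner(Dog.new, :missing).inspect  # nil
```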
But wait, aren’t we forgetting something? Crap…
What’s in a module?
Well shoot, we handled classes so elegantly, and now we have modules. Luckily for us, modules are handled internally the same way classes are! In other words, the flags element on RBasic is different for a module, but it’s still stored as an RClass struct! This means that the same process for method invocation, etc., still applies.
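The shared internals show through in userland as well. A small sketch, using a hypothetical MyModule with a made-up greet method:

```ruby
module MyModule
  def greet
    "hello"
  end
end

puts MyModule.class                            # Module
puts Class.superclass                          # Module (every class is also a kind of module)
puts MyModule.instance_methods(false).inspect  # [:greet] (modules carry a method table too)
puts MyModule.respond_to?(:new)                # false: a module cannot be instantiated
```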
That’s good news, I hate repeating myself.
Classy Class Methods
The elephant in the room from what I’ve explained so far, for those familiar with Ruby, is probably going to be how mix-ins and class methods fit into everything we’ve been discussing, so let’s get to it.
I’ve shown you how Ruby dispatches instance methods on an Object, so let’s look at a regular invocation of a class method in Ruby:
We already discussed how Ruby creates a constant called MyClass that stores the RClass structure for MyClass - so what happens when I say MyClass.new?
The same thing as before.
How is this possible? Well, when you say MyClass.new, what Ruby sees is that you’re invoking a method, in this case new, on an object, in this case an object represented by an RClass. So what does Ruby do? Easy, it checks klass on the RClass for MyClass, which is, of course, Class. It then searches Class, which is itself - you guessed it - an RClass, for the method new. It finds it, invokes it, and the rest, as they say, is history.
So what does this really mean? I’m not the first to say it, but I’m definitely excited to say it: class methods in Ruby are actually instance methods on the klass object of a particular Class.
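Ruby will confirm this if you ask where new lives. A minimal sketch, assuming an empty MyClass:

```ruby
class MyClass
end

puts MyClass.method(:new).owner                    # Class
puts Class.instance_methods(false).include?(:new)  # true: new is an instance method of Class
puts MyClass.new.class                             # MyClass
```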
Now I can hear what you’re saying - you agree this works all fine and dandy for new, but what about when you define class methods yourself? What about:
my_class_method is not a method defined on the Class object, so how do we handle this? Exploring this question pushes us deeper into the rabbit hole, into the fun and wacky world of singleton classes.
Now we get to a topic that confuses many people, even seasoned Ruby folks: singleton classes. Before we begin, though, we have a very key question: what is a singleton class? And also, what are a metaclass and an eigenclass? Do they all mean the same thing? Unfortunately, a lot of different places have abused these terms or treated them interchangeably, but as usual, if we just look in the source, we find the answer:
Now we know that a singleton class is the class for a particular object. We also know that a metaclass is just the class of a Class object, and eigenclass is just another word for singleton class. Hm. That still isn’t very useful, is it? Let’s try to decode this a bit.
To rephrase the above, a singleton class is the Class object pointed to by a particular klass element - so in other words, it’s the klass for a particular object. But why would we ever need to distinguish this from a normal class? Can we have classes that aren’t ‘normal’ classes?
Remember how modules are RClass structs that have the flags set to represent a module? Well, as it turns out, there’s another type of RClass struct, an ICLASS. The ICLASS flag represents an internal class, or, as we might call it on the Ruby side, an anonymous class. As we’ll see, these anonymous classes are generally singleton classes, but can be used for other things as well. Such a class was not explicitly defined in the Ruby world, and won’t ever be returned explicitly to it, though you can trick Ruby into giving it to you in some ways.
For those of you from Java, it’s kind of helpful to think of these as anonymous classes - because that’s very nearly exactly what they are - they’re classes that are not assigned to a global constant, and are created implicitly by Ruby to implement a few core extensibility features.
So now that we know what a singleton class or ICLASS is, let’s return to the question at hand: how does it help us solve the problem with class methods? Let’s look at our code example again.
Let’s dissect what Ruby’s doing as it encounters this all just as we did before:
- Ruby creates a new constant MyClass and stores in it an RClass struct, with a klass of Class, and a super of Object
- Ruby finds def self.my_class_method - it looks at what self is, in this case, the MyClass class, and finds it needs to declare a method on the Class object. Since we only want this method available for this instance of Class and not all instances of Class, the method can’t go on Class itself
- Ruby creates a singleton class for MyClass and creates a new method in the m_tbl_wrapper on it called my_class_method. Once this is complete, the RClass for MyClass has its klass pointing to this nameless singleton class. In turn, the singleton class has a klass of Class and a super of Class.
I realize this is a bit confusing, but what I’m telling you here is that when you try to define a class method, Ruby really creates a separate singleton class to store class methods for this instance of Class, and then puts the methods there. It follows up by making the new singleton class inherit from Class, making sure that the existing methods that could be called can still be called.
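You can watch this happen from Ruby using Object#singleton_class. A minimal sketch, assuming the MyClass and my_class_method names from the example; the method body is made up:

```ruby
class MyClass
  def self.my_class_method
    "called on #{self}"
  end
end

singleton = MyClass.singleton_class

puts MyClass.my_class_method                              # called on MyClass
puts MyClass.method(:my_class_method).owner == singleton  # true: the method lives on the singleton class
puts singleton.instance_methods(false).inspect            # [:my_class_method]
puts Class.method_defined?(:my_class_method)              # false: Class itself is untouched
puts singleton.ancestors.include?(Class)                  # true: Class is still reachable further up
puts String.respond_to?(:my_class_method)                 # false: other classes are unaffected
```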
I encourage you to return to the rule we established earlier: method resolution works by going to klass, looking there, and then following super until it finds something. As we get into more convoluted examples, if you keep this in mind, it’ll make it quite easy to see where a method must be placed.
There’s one other impact of what I’ve noted that you may have caught on to. Ruby doesn’t reserve singleton classes for RClass objects: any object can have one. This is how Ruby adds a method to a single instance of a class instead of to the entire class. Let’s look at another example to make this clear:
my_var.new_method will print the text we placed there, but my_var_again.new_method will raise an exception since the method is not found. Why? my_var has a klass pointing to a singleton class, an RClass with our new_method in its method table, and that singleton class has a super of MyClass. my_var_again, on the other hand, has a klass of MyClass.
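Here is a sketch of that situation, assuming the my_var, my_var_again and new_method names from the text; the method body is made up for illustration:

```ruby
class MyClass
end

my_var = MyClass.new
my_var_again = MyClass.new

# Define a method on one instance only.
def my_var.new_method
  "only on this instance"
end

puts my_var.new_method                  # only on this instance
puts my_var.singleton_methods.inspect   # [:new_method]
puts my_var.singleton_class.superclass  # MyClass
puts my_var.class                       # MyClass (Object#class skips the singleton class)

begin
  my_var_again.new_method
rescue NoMethodError
  puts "NoMethodError raised for my_var_again"
end
```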
So this means class methods are not only instance methods, they’re not special at all: they’re just a case of defining a method on an instance instead of on a class.
With that, I think we’re ready to return to mix-ins…those of you thinking ahead may already have guessed how those get implemented.
Mixing it Up With Mix-Ins
What are mix-ins at their core? Given a module, you either import the instance methods on that module as class methods (using extend) or as instance methods (using include).
Based on this, and our knowledge of how method definitions on instances work, we can see that this must use singleton classes somehow. So let’s start by looking at how we would use an extension. Let’s check out some code:
When we use include, we expect to call the methods as though they were instance methods defined within the class itself. Based on this, we can use the rule we discussed earlier to break down what Ruby must be doing: method resolution works by klass -> super -> repeat. So, invoking a module method on my_var, Ruby looks at klass for the RObject referenced by my_var and finds MyClass1. Since we can probably assume Ruby wouldn’t just add methods willy-nilly to MyClass1, that means Ruby must be doing something with the super.
And that would be right. Ruby sees the include, and creates a new anonymous class using the same flag we use for singleton classes, ICLASS, and inserts it as the super of MyClass1. This anonymous class points to the same m_tbl_wrapper as the module MyModule - thereby sharing all of its methods.
It’s important to note that while this uses the same flag used for singleton classes, it isn’t technically a singleton class. By the definition we saw earlier from Ruby’s source, we know that a singleton class is the klass for a particular object - not just any old anonymous internal class.
So, how does this work for multiple include statements? Simple: more anonymous classes, more inheritance.
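The effect of include on the lookup chain is visible through Module#ancestors. A sketch, assuming the MyClass1 and MyModule names from the example; OtherModule and the method bodies are hypothetical additions to show a second include:

```ruby
module MyModule
  def hello
    "hello from MyModule"
  end
end

module OtherModule
  def goodbye
    "goodbye from OtherModule"
  end
end

class MyClass1
  include MyModule
  include OtherModule
end

my_var = MyClass1.new

puts my_var.hello                         # hello from MyModule
puts MyClass1.ancestors.first(3).inspect  # [MyClass1, OtherModule, MyModule]
puts MyClass1.superclass                  # Object (the inserted internal classes stay hidden)
puts my_var.method(:hello).owner          # MyModule
```

The most recently included module is searched first, which is what you would expect if each include slots a new entry in directly above the class.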
Now that we’ve tackled how include works, let’s consider extend. When we use extend we want to be able to call the module methods on the actual constant, in this case MyClass2, that refers to the RClass we created. Based on what we know, that means the methods need to be in the klass of MyClass2, or somewhere in its inheritance hierarchy. We know we can’t change the inheritance hierarchy for Class, which is the klass of MyClass2, since that would add this module’s methods to every class.
That leaves us with the singleton class for MyClass2. And indeed, that is how Ruby deals with extend - we augment the super the same way we did for include, but we do it on the singleton class for MyClass2 - so now the klass of MyClass2 references the singleton class, which in turn has a super that is the anonymous class with the method table of MyModule, which has a super of Class.
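The same inspection works for extend, this time looking at the singleton class. A sketch, assuming the MyClass2 and MyModule names from the example and a made-up hello method:

```ruby
module MyModule
  def hello
    "hello from MyModule"
  end
end

class MyClass2
  extend MyModule
end

puts MyClass2.hello                                       # hello from MyModule
puts MyClass2.singleton_class.include?(MyModule)          # true: the module sits behind the singleton class
puts MyClass2.singleton_class.ancestors.first(2).inspect  # [#<Class:MyClass2>, MyModule]
puts MyClass2.new.respond_to?(:hello)                     # false: instances are unaffected
puts Class.include?(MyModule)                             # false: Class itself was not changed
```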
Phew, that was a lot of words, but if you get through it, it’s pretty clear why singleton classes and anonymous classes play such a central role in Ruby: they’re how Ruby can be as dynamic as it is, and how so many of the most powerful and popular features of Ruby really work.
Hopefully after reading this, you have a much deeper, and better, understanding of things like “eigenclass” and “singleton class” and understand a bit how Ruby handles the everyday operations that you do. To bring this back to what I originally said, different languages force me to think in different ways, and for me, exploring how these languages really work at their core helps me relate new things I do to the existing things I’ve done. Understanding how you can use design patterns in one language to implement design features in another is a really profound thing, at least for me.
There are a lot of things I haven’t covered here today, and should you want to go even deeper into this rabbit hole, my references are all quite excellent and provide a wonderful look at how all of this works in even greater detail.
I’d like to call out the fact that this is my approach to explaining Ruby through looking at its implementation, but this is hardly a new idea, and has been done probably most notably and thoroughly by Burke Libbey - and I often referred to his material to sanity check my own work as I worked through the source code. I have included links to it, as well as my other sources, below. If you’d like more detail, I’d refer you to those.