I like playing with different programming languages. I find that, much like spoken languages, using different programming languages broadens your horizons, and how you would express a concept in one language differs from how you would express it in another. I find that not only my style of programming, but also my thought processes, shift a bit as I use other languages.
This penchant for learning many languages, paired with my desire to understand how things work at as low a level as possible, has led me down many rabbit holes that often took me far afield of just learning a language or using it to be productive. That said, these diversions have also been immensely useful, as rarely do you understand something as well as when you really dive into how it works and is implemented, and once you do that, programming at a higher level becomes far easier and a lot more fun.
Today, I’m going to explore one of these rabbit holes in Ruby. Oh, and for clarity, I’m really only going to dive into the implementation by Matz and I’m going to assume you have no or very limited familiarity with how Ruby is implemented.
What’s in an object?
Let’s start simply. In Ruby, everything is conceptually an “object”. This lets you do fun things like this:
2.bit_length # This is sending a message to a number literal (a Fixnum instance)
2 + 4 # This is sending the + message to this Fixnum instance
2.+(4) # This is sending the + message to this Fixnum as well, identical to above
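Since these really are message sends, you can also spell them out with send, which takes the message name as a symbol. A minimal sketch (the variable names are mine, purely for illustration):

```ruby
# Operators are just methods, so all three lines dispatch the same message.
sum_infix  = 2 + 4
sum_method = 2.+(4)
sum_send   = 2.send(:+, 4)

puts sum_infix   # prints 6
puts sum_method  # prints 6
puts sum_send    # prints 6

# Even a number literal knows which messages it understands.
puts 2.respond_to?(:bit_length)  # prints true
puts 2.bit_length                # prints 2
```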
If we’re really going to understand Ruby, the first thing we should probably ask is how objects are implemented, so let’s do that! An object, in Ruby’s implementation, looks something like this [1]:
struct RObject {
    struct RBasic basic;
    union {
        struct {
            long numiv; /* only uses 32-bits */
            VALUE *ivptr;
            struct st_table *iv_index_tbl; /* shortcut for RCLASS_IV_INDEX_TBL(rb_obj_class(obj)) */
        } heap;
        VALUE ary[ROBJECT_EMBED_LEN_MAX];
    } as;
};
Wow, that’s a lot of stuff for a simple object. Let’s have a look and figure out what it does.
- basic: This is an RBasic structure. We won’t get into everything it does in this post, but from a high level almost every Ruby structure contains this, and it helps identify what kind of object this is. So even if you don’t know what the object you have is, you can cast it to an RBasic, ask it what it is, then cast it to the appropriate type.
- as: The as union stores either a VALUE array or a struct for instance variables, depending on how much and what kind of data needs to be stored.
Let’s look at a simple example:
my_var = MyClass.new
For this bit of code, Ruby will internally create an RObject for this object, with an RBasic member saying this is of class MyClass.
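You can’t see the RObject itself from Ruby, but you can see the two things it carries: the class recorded in its RBasic and the instance variables kept in the as union. Here is a small sketch, assuming a hypothetical MyClass with two instance variables (@name and @count are names I made up for illustration):

```ruby
class MyClass
  def initialize
    @name = "example"
    @count = 3
  end
end

my_var = MyClass.new

# The reported class is derived from the klass field of the object's RBasic.
puts my_var.class  # prints MyClass

# The instance variables are the data the `as` union is there to hold.
p my_var.instance_variables              # prints [:@name, :@count]
p my_var.instance_variable_get(:@count)  # prints 3
```

Nothing here tells you whether the variables were embedded in the struct or placed on the heap; that choice stays hidden inside the implementation.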
Now, there is one thing we should also mention here, let’s return to a modified version of our previous example:
i = 2.2
So, what do you think Ruby does internally here? If you said store an RObject, you’re wrong, but it’s not your fault. Let’s step back for a minute and actually look at RBasic [2]:
struct RBasic {
    VALUE flags;
    const VALUE klass;
};
RBasic includes two real elements: a set of flags indicating what the ‘internal’ object type is (flags), and the class of the object (klass). This is a really important distinction and is in my mind the key to understanding Ruby in real depth: both the internal Ruby representation of an object and the type it appears as in userland are recorded in RBasic. That’s why it’s so important.
Why have different types of structs at all? Easy: optimization. So in this case, you’re going to get an RFloat struct internally, with an RBasic that says it is a struct of type RFloat and of class Float.
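The internal struct type never leaks into Ruby code; all you can observe from userland is the klass side of RBasic. A quick sketch showing that values with quite different internal representations all answer the same message (the values array is just an illustration):

```ruby
values = [2.2, "two", [2], { two: 2 }, Object.new]

# Each value reports its userland class, whatever struct sits behind it.
classes = values.map(&:class)
p classes  # prints [Float, String, Array, Hash, Object]

# As far as Ruby code is concerned, they are all objects.
puts values.all? { |v| v.is_a?(Object) }  # prints true
```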
What other structures exist, you ask? Plenty! For details you can check out some of my sources below, but the one we’ll be focusing on today is RClass.
What’s in a class
So if in Ruby-land, everything is an object, then that must mean classes are objects, right? On the nose, my friend! RClass is the struct type Ruby uses to store Class objects. So what actually is a class object, both from an external and internal perspective?
From an external perspective, a class is an instance of the class Class. Wow, that’s really confusing, huh? Maybe if we look at it from the other side of the two-way mirror it will clear it up.
Internally, a class in Ruby is an RClass struct that has an RBasic whose klass is Class.
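The external half of that claim is easy to check in irb. A minimal sketch with a throwaway class (Example is a name I made up):

```ruby
class Example
end

puts Example.class                # prints Class
puts Example.instance_of?(Class)  # prints true
puts Example.is_a?(Object)        # prints true, a class is an object too

# Class is itself a class, so its class is also Class.
puts Class.class                  # prints Class
```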
Great, so that brings us to our next question…
What’s in an RClass
So why do classes need their own separate structure? Well, let’s have a look at ruby.h and see what an RClass actually is [3]:
struct RClass {
    struct RBasic basic;
    VALUE super;
    rb_classext_t *ptr;
    struct method_table_wrapper *m_tbl_wrapper;
};
So there’s one part of this that’s probably pretty obvious, and a few more that are a bit mysterious, so let’s break it down a bit. From a high level, this is what each component of RClass is used for:
- basic: The same thing as in RObject: the internal type and the class.
- super: This one is the RClass that is the superclass of this one, and comes into play primarily during message sending.
- ptr: rb_classext_t sounds confusing. Is it used for class extensions? Is it used for extensibility? As it turns out, it isn’t. This is far more mundane: it’s used to store a bunch of extended metadata for the class, including things like constants, subclasses, etc. It’s also used by modules, but we’ll get to that in a bit.
- m_tbl_wrapper: This one is important, and is probably what caught your eye. In addition to having a superclass, what distinguishes a class from an object is that it has methods. So here you go: this is the method table for this class!
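Ruby’s reflection methods give a rough userland view of these fields. The mapping in the comments below is my own loose correspondence, not something the source spells out, and Animal and Dog are made-up classes:

```ruby
class Animal
  LEGS = 4

  def speak
    "..."
  end
end

class Dog < Animal
end

# super: the parent class
puts Dog.superclass               # prints Animal

# m_tbl_wrapper: only the methods defined directly on each class
p Animal.instance_methods(false)  # prints [:speak]
p Dog.instance_methods(false)     # prints []

# ptr: extended metadata such as constants
p Animal.constants(false)         # prints [:LEGS]
```

Notice that Dog’s own method table is empty; it can still respond to speak only because lookup continues through super.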
So how do we actually pass a message to an object? Let’s look at a quick example, and then discuss it from a high level. If you’d like more low level details, the source files are pretty readable and great resources, but some of the other links I’ve given at the end of this article have some walkthroughs as well.
class MyClass
  def test
    puts "I tested something!"
  end
end

class MyClass2 < MyClass
end

my_var = MyClass2.new
my_var.test
Okay, so let’s go through what Ruby is doing:
- Ruby instantiates a new object of type class, so Ruby creates an RClass, with klass of Class and super of Object, and creates a global constant called MyClass to store the RClass.
- Ruby adds a new method, test, to the m_tbl_wrapper of the RClass stored in the constant MyClass.
- Ruby instantiates a new object of type class, so Ruby creates an RClass, with klass of Class and super of MyClass, and creates a global constant called MyClass2 to store the RClass. Note that the class that does the inheriting just has a different super; it is still of type Class.
- Ruby creates a new RObject with klass of MyClass2.
- Ruby reads my_var, reads the RBasic on it, and finds out its klass is MyClass2.
- Ruby accesses MyClass2 and invokes an implementation-searching function. It looks in a cache first; failing that, it searches m_tbl_wrapper for the method. It can’t find it there, so it looks in the super for MyClass2, which is MyClass. It searches again there, finds it, and invokes it, passing the my_var RObject as a parameter so instance variables can be used.
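The search in the last step can be imitated in plain Ruby. This is only a sketch of the idea, not how MRI actually does it: find_method_owner is a hypothetical helper of mine, it ignores the method cache entirely, and superclass skips over hidden internal classes that the real lookup visits.

```ruby
class MyClass
  def test
    "I tested something!"
  end
end

class MyClass2 < MyClass
end

# Hypothetical helper: walk klass -> super -> super until some class
# defines the method directly, or the chain runs out.
def find_method_owner(object, name)
  klass = object.class
  while klass
    return klass if klass.instance_methods(false).include?(name)
    klass = klass.superclass
  end
  nil
end

my_var = MyClass2.new
p find_method_owner(my_var, :test)     # prints MyClass
p my_var.method(:test).owner           # prints MyClass
p find_method_owner(my_var, :missing)  # prints nil
```

The second call asks Ruby itself who owns the method, which agrees with the hand-rolled walk.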
Based on this, we not only now see how method invocation works, but also how it works in the case of inheritance. There’s one other gem of knowledge here that will be really useful when we get to understanding how class methods and singleton classes work: Ruby always searches an object’s class for methods, then goes up the inheritance hierarchy. In other words, it resolves methods using klass -> super -> super -> super until it hits the top-level superclass, BasicObject.
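If you want to see that chain for yourself, you can collect it by following superclass until it runs out. A minimal sketch, reusing the two class names from the example above:

```ruby
class MyClass
end

class MyClass2 < MyClass
end

# Follow super from MyClass2 until there is nothing left.
chain = []
klass = MyClass2
while klass
  chain << klass
  klass = klass.superclass
end

p chain                   # prints [MyClass2, MyClass, Object, BasicObject]
p BasicObject.superclass  # prints nil
```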
But wait, aren’t we forgetting something? Crap…
What’s in a module?
Well shoot, we handled classes so elegantly, and now we have modules. Luckily for us, modules are handled internally the same way classes are! In other words, the flags element on RBasic is different for a module, but it’s still stored as an RClass struct! This means that the same process for method invocation, etc., still applies.
That’s good news, I hate repeating myself.
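The family resemblance shows up on the Ruby side too. A small sketch, using a made-up module called Walkable:

```ruby
module Walkable
  def walk
    "walking"
  end
end

# A module carries a method table, just like a class does.
p Walkable.instance_methods(false)  # prints [:walk]
puts Walkable.class                 # prints Module

# Class inherits from Module, which fits the idea of one shared struct.
puts Class.superclass               # prints Module

# What a module lacks is the ability to be instantiated.
puts Walkable.respond_to?(:new)     # prints false
```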
Classy Class Methods
The elephant in the room from what I’ve explained so far, for those familiar with Ruby, is probably going to be how mix-ins and class methods fit into everything we’ve been discussing, so let’s get to it.
I’ve shown you how Ruby dispatches instance methods on an Object, so let’s look at a regular invocation of a class method in Ruby:
class MyClass
end

my_var = MyClass.new
We already discussed how Ruby creates a constant called MyClass that stores the RClass structure for MyClass, so what happens when I say MyClass.new?
The same thing as before.
Mind blown.
How is this possible? Well, when you say MyClass.new, what Ruby sees is you invoking a method, in this case new, on an object, in this case an object represented by an RClass. So what does Ruby do? Easy, it checks klass on the RClass for MyClass which is, of course, Class. It then searches Class, which is itself (you guessed it) an RClass, for the method new. It finds it, invokes it, and the rest, as they say, is history.
So what does this really mean? I’m not the first to say it, but I’m definitely excited to say it: class methods in Ruby are actually instance methods on the klass object of a particular Class.
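You can ask Ruby where new lives and get exactly that answer. A minimal sketch; the last part pulls new out of Class as an unbound instance method and calls it on MyClass by hand, just to show nothing special is going on:

```ruby
class MyClass
end

# `new` is found on Class, the klass of MyClass, as an ordinary instance method.
p MyClass.method(:new).owner                    # prints Class
p Class.instance_methods(false).include?(:new)  # prints true

# Binding that instance method to MyClass and calling it builds an instance.
my_var = Class.instance_method(:new).bind(MyClass).call
p my_var.class                                  # prints MyClass
```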
Now I can hear what you’re saying: you agree this works all fine and dandy for new, but what about when you define class methods yourself? What about:
class MyClass
  def self.my_class_method
    puts "Invoked my class method"
  end
end
Certainly, my_class_method is not a method defined on the Class object, so how do we handle this? Exploring this question pushes us deeper into the rabbit hole, into the fun and wacky world of singleton classes.
Singleton Classes
Now we get to a topic that confuses many people, even seasoned Ruby folks: singleton classes. Before we begin though, we have a very key question: what is a singleton class? And also, what is a metaclass and eigenclass? Do they all mean the same thing? Unfortunately, a lot of different places have abused these terms or treated them interchangeably, but as usual, if we just look in the source, we find the answer [4]:
/*!
* \defgroup class Classes and their hierarchy.
* \par Terminology
* - class: same as in Ruby.
* - singleton class: class for a particular object
* - eigenclass: = singleton class
* - metaclass: class of a class. metaclass is a kind of singleton class.
Now we know that a singleton class is the class for a particular object. We also know that a metaclass is just the class of a Class object, and eigenclass is just another word for singleton class. Hm. That still isn’t very useful, is it? Let’s try to decode this a bit.
To rephrase the above, a singleton class is the Class object pointed to by a particular RBasic klass element. In other words, it’s the klass for a particular object. But why would we ever need to distinguish this from a normal class? Can we have classes that aren’t ‘normal’ classes?
Yup.
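One way to see such a class from the Ruby side is to ask an object for its singleton class and compare it with an ordinary one. A small sketch (the variable name single is mine):

```ruby
class MyClass
end

my_var = MyClass.new

# Asking for an object's singleton class creates one if needed.
single = my_var.singleton_class

puts single.singleton_class?     # prints true
puts MyClass.singleton_class?    # prints false, a 'normal' class
puts single.name.inspect         # prints nil, it has no constant name
puts single.instance_of?(Class)  # prints true, it is still a Class object
```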
ICLASS
Remember how modules are RClass structs that have the flags set to represent a module? Well, as it turns out, there’s another type of RClass struct: an ICLASS. An RClass with ICLASS flags represents an internal class, or, as we might call it on the Ruby side, an anonymous class. Strictly speaking, MRI reserves the ICLASS type for the hidden classes it creates when a module is included, and marks singleton classes with a separate singleton flag on an otherwise ordinary class, but both are classes that Ruby creates behind your back, so I’ll treat them together here. Such a class was not explicitly defined in the Ruby world, and normally won’t be returned to it, though you can trick Ruby into giving it to you in some ways.
For those of you from Java, it’s kind of helpful to think of these as anonymous classes - because that’s very nearly exactly what they are - they’re classes that are not assigned to a global constant, and are created implicitly by Ruby to implement a few core extensibility features.
So now that we know what a singleton class or ICLASS is, let’s return to the question at hand: how does it help us solve the problem with class methods? Let’s look at our code example again.
class MyClass
  def self.my_class_method
    puts "Invoked my class method"
  end
end
Let’s dissect what Ruby’s doing as it encounters this all just as we did before:
- Ruby creates a new constant MyClass and stores in it an RClass struct, with a klass of Class and a super of Object.
- Ruby finds def self.my_class_method. It looks at what self is, in this case the MyClass class, and finds it needs to declare a method on the Class object. Since we only want this method available for this instance of Class and not all instances of Class…
- Ruby creates a singleton class for MyClass and creates a new method in the m_tbl_wrapper on it called my_class_method. Once this is complete, the RClass for MyClass now has klass pointing to this nameless singleton class. In turn, the singleton class has a klass of Class and a super that leads back to Class.
I realize this is a bit confusing, but what I’m telling you here is that when you try to define a class method, Ruby really creates a separate singleton class to store class methods for this instance of Class, and then puts the methods there. It follows up by making the new singleton class inherit from Class, making sure that the existing methods that could be called can still be called.
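Ruby will hand you that singleton class if you ask for it with singleton_class, so you can check where the method ended up. A sketch (meta is a name I picked). Note that current Ruby reports the singleton class of Object as the direct superclass here, with Class further up the chain, so the sketch checks ancestors rather than superclass:

```ruby
class MyClass
  def self.my_class_method
    "Invoked my class method"
  end
end

meta = MyClass.singleton_class

# The method lives in the singleton class's own method table...
p meta.instance_methods(false)                    # prints [:my_class_method]
p MyClass.method(:my_class_method).owner == meta  # prints true

# ...and not on Class, so other classes don't see it.
p Class.method_defined?(:my_class_method)         # prints false
p String.respond_to?(:my_class_method)            # prints false

# Class is still reachable by following super from the singleton class.
p meta.ancestors.include?(Class)                  # prints true
```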
I encourage you to return to the rule we established earlier: method resolution works by going to klass, looking there, and then following super until it finds something. As we get into more convoluted examples, if you keep this in mind, it’ll make it quite easy to see where a method must be placed.
There’s one other implication of what I’ve noted as well, that you may have caught on to. Ruby doesn’t reserve singleton classes for RClass structs: ordinary objects can have one too. This is how Ruby adds a method to a single instance of a particular class, instead of to the entire class. Let’s look at another example to make this clear:
my_var = MyClass.new
my_var_again = MyClass.new
def my_var.new_method
  puts "A method only on my_var"
end
After this, my_var.new_method will print the text we placed there, but my_var_again.new_method will raise an exception since the method is not found. Why? my_var has a klass pointing to a singleton class, an RClass with our new_method in its method table, and that singleton class has a super of MyClass. my_var_again, on the other hand, has a klass of MyClass.
So this means class methods are not only instance methods, they’re not special at all: they’re just a case of defining a method on an instance instead of on a class.
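Here is the same example with the claims above turned into checks. It is a sketch rather than anything from the Ruby source; the method returns its string instead of printing it so the result is easy to inspect:

```ruby
class MyClass
end

my_var = MyClass.new
my_var_again = MyClass.new

def my_var.new_method
  "A method only on my_var"
end

p my_var.singleton_methods               # prints [:new_method]
p my_var.singleton_class.superclass      # prints MyClass
p my_var.class                           # prints MyClass, `class` skips the singleton

p my_var_again.respond_to?(:new_method)  # prints false
begin
  my_var_again.new_method
rescue NoMethodError
  puts "NoMethodError raised"            # this branch runs
end
```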
With that, I think we’re ready to return to mix-ins…those of you thinking ahead may already have guessed how those get implemented.
Mixing it Up With Mix-Ins
What are mix-ins at their core? Given a module, you import the instance methods on that module either as class methods (using extend) or as instance methods (using include).
Based on this, and our knowledge of how method definitions on instances work, we can see that this must use singleton classes somehow. So let’s start by looking at how we would use an extension. Let’s check out some code:
module MyModule
  def test
    puts "Test from MyModule"
  end
end

class MyClass1
  include MyModule
end

class MyClass2
  extend MyModule
end

MyClass2.test # Prints out "Test from MyModule"
my_var = MyClass1.new
my_var.test # Prints out "Test from MyModule"
When we use include, we expect to call the methods as though they were instance methods defined within the class itself. Based on this, we can use the rule we discussed earlier to break down what Ruby must be doing: method resolution works by klass -> super -> repeat. So, invoking test on my_var, Ruby looks at klass for the RObject referenced by my_var and finds MyClass1. Since we can probably assume Ruby wouldn’t just add methods willy-nilly to MyClass1, that means Ruby must be doing something with the super on MyClass1.
And that would be right. Ruby sees the include, creates a new anonymous class flagged as an ICLASS, and inserts it as the super for MyClass1. This anonymous class points to the same m_tbl_wrapper as the module MyModule, thereby sharing all of its methods.
It’s important to note that while this is a hidden, internal class much like a singleton class, it isn’t technically a singleton class. By the definition we saw earlier from Ruby’s source, we know that a singleton class is the klass for a particular object, not just any old anonymous internal class.
So, how does this work for multiple include statements? Simple: more anonymous classes, more inheritance.
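You can’t get hold of these hidden classes directly, but their effect is visible if you compare superclass with ancestors. A sketch with two made-up modules, A and B, that define the same method so we can see which one wins:

```ruby
module A
  def who
    "A"
  end
end

module B
  def who
    "B"
  end
end

class Widget
  include A
  include B
end

# `superclass` skips the hidden include classes...
p Widget.superclass         # prints Object

# ...but `ancestors` shows the modules they stand in for, most recent first.
p Widget.ancestors.take(3)  # prints [Widget, B, A]

# Lookup reaches B's methods before A's.
p Widget.new.who            # prints "B"
```

Each include slots its module in directly above the class, so the module included last is searched first.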
Now that we’ve tackled how include works, let’s consider extend. When we use extend, we want to be able to call the module methods on the actual constant, in this case MyClass2, that refers to the RClass we created. Based on what we know, that means the methods need to be in the klass for MyClass2, or somewhere in its inheritance hierarchy. We know we can’t change the inheritance hierarchy for Class, which is the klass for MyClass2, since that would add this module’s methods to every class.
That leaves us with the singleton class for MyClass2. And indeed, that is how Ruby deals with extend: we augment the super the same way we did for include, but we do it on the singleton class for MyClass2. So now the klass for MyClass2 references the singleton class, which in turn has a super that is the anonymous class with the method table of MyModule, which has a super of Class.
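Put another way, extend behaves like an include performed on the singleton class. The sketch below shows both spellings side by side; MyClass3 is a hypothetical third class I added for the comparison, and the module method returns its string rather than printing it:

```ruby
module MyModule
  def test
    "Test from MyModule"
  end
end

class MyClass2
  extend MyModule
end

# Hypothetical third class: include the module into the singleton class by hand.
class MyClass3
  class << self
    include MyModule
  end
end

p MyClass2.test                                # prints "Test from MyModule"
p MyClass3.test                                # prints "Test from MyModule"
p MyClass2.singleton_class.include?(MyModule)  # prints true
p Class.include?(MyModule)                     # prints false, other classes are untouched
```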
Phew, that was a lot of words, but if you get through it, it’s pretty clear why singleton classes and anonymous classes play such a central role in Ruby: they’re how Ruby can be as dynamic as it is, and how so many of the most powerful and popular features of Ruby really work.
Conclusion
Hopefully after reading this, you have a much deeper, and better, understanding of things like “eigenclass” and “singleton class” and understand a bit how Ruby handles the everyday operations that you do. To bring this back to what I originally said, different languages force me to think in different ways, and for me, exploring how these languages really work at their core helps me relate new things I do to the existing things I’ve done. Understanding how you can use design patterns in one language to implement design features in another is a really profound thing, at least for me.
There are a lot of things I haven’t covered here today, and should you want to go even deeper into this rabbit hole, my references are all quite excellent and provide a wonderful look at how all of this works in even greater detail.
Thanks
I’d like to call out the fact that this is my approach to explaining Ruby through looking at its implementation, but this is hardly a new idea, and has been done probably most notably and thoroughly by Burke Libbey - and I often referred to this material to sanity check my own work as I worked through the source code. I have included links to it, as well as my other sources, below; if you’d like more detail, I’d refer you to those.
References and Reading Materials
- ruby.h from Ruby Source
- internal.h from Ruby source
- class.c from Ruby source
- Ruby Hacking Guide
- Burke Libbey’s Blog Post on Ruby’s Internals
- Burke Libbey’s Ruby Internals Slides
Source Attributions
[1] From ruby.h (Under Ruby License)
[2] From ruby.h (Under Ruby License)
[3] From ruby.h (Under Ruby License)
[4] From class.c (Under Ruby License)