Served three ways
A feature I definitely wanted when making this website was commenting. Initially, I was going to just use Disqus on this journey of not reinventing the wheel, but I wanted to write a little more Ruby, so I decided to roll my own.
Since I was using Mongoid, I thought this would be the best scenario to make use of embedded documents, but there were some issues. Principal among them, embedded comments can't declare belongs_to relations to other documents. This is problematic from a design perspective because comments belong to both the post they were written for and the user who wrote them.
While it wouldn't be too difficult to code the behavior, it would lead to maintenance nightmares down the road. Furthermore, we would have to be extra careful about coding callbacks and deleting documents. All in all, complicating the codebase wouldn't be worth the advantage of embedding comments. Thus, we had to settle for belongs_to, basically the straightforward design of a site with users, posts, and comments.
Not Eager Enough
We can achieve basically the same effect as embedded documents (not having to hit the database again when loading the comments) by using eager loading. One thing to note, though, is that eager loading doesn't cascade, which makes perfect sense: if you eager load one kind of document, those documents don't automatically bring in whatever other relations they may have. This is bad for us, though, because Comments have other Comments (straightforward nested comments). Thus, if we tried to access the comments field in a Comment, we would trigger another database query, which is a waste because we already loaded all comments when we fetched the post!
However, there are two fields that don't require a database access and that give us all the information we need: the id of the comment and the parent_comment_id. We can identify which comments belong to which other comments by looking at the parent_comment_id, and the id of each comment lets us identify which comments are which. Thus, we can reconstruct our nested comments without having to touch the database again!
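To make the idea concrete, here is a minimal sketch in plain Ruby. It is not one of the PostsHelper implementations discussed in this post: FlatComment and nest_comments are hypothetical names, and a Struct stands in for the loaded Mongoid documents. It only assumes that each comment exposes id and parent_comment_id, and that top-level comments have a nil parent_comment_id.

```ruby
# Hypothetical stand-in for an already-loaded Comment document; only the
# two fields discussed above plus a body are modelled here.
FlatComment = Struct.new(:id, :parent_comment_id, :body)

# Groups the flat list by parent id, then walks down from the top level.
# Returns nested hashes of the form { comment: ..., children: [...] }.
def nest_comments(comments)
  children_of = comments.group_by(&:parent_comment_id)
  build = lambda do |parent_id|
    (children_of[parent_id] || []).map do |comment|
      { comment: comment, children: build.call(comment.id) }
    end
  end
  build.call(nil) # assumption: top-level comments have no parent id
end

flat = [
  FlatComment.new(1, nil, "first"),
  FlatComment.new(2, 1, "reply to first"),
  FlatComment.new(3, 2, "reply to the reply"),
  FlatComment.new(4, nil, "second")
]
tree = nest_comments(flat)
```

Everything here works on data that is already in memory, so no further queries are needed once the post's comments have been loaded.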
Served Three Ways
So, now we have our plan, but what's the most efficient way to go about it? Well, remembering that premature optimization is the root of all evil, I decided to try out any way that came to mind, with some help from a friend. There were three unique takes, all in PostsHelper. The first way is basically to figure out what comments are there, and then parent them (tree_comments_2_pass). The second is to use an Array as a placeholder for parent comments we haven't found yet, and then when we find the comment, we parent the children in that array. The final way is basically the second approach, but instead of using an array, we stub the comment class to avoid one of the
Finally, there's nothing to do but to benchmark them, for which we use Ruby's Benchmark module. You can see the details of the benchmark in the PostsHelperSpec, but for the results, we used the following modifications (to really stress each implementation):
itrs = 10000
comment_list: [3, 2, 1, 2, 1, 2, 2, 4]
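For readers without the spec at hand, a comparison of this kind can be sketched with Benchmark.bmbm, which runs a rehearsal pass before the timed pass. The code below is a rough stand-in, not the actual PostsHelper code: Node, two_pass, array_placeholder, and make_comments are hypothetical names, the data is a tiny hand-written list, and the iteration count is deliberately much smaller than the one above.

```ruby
require 'benchmark'

# Hypothetical stand-in for a comment; the real helpers work on Mongoid documents.
Node = Struct.new(:id, :parent_comment_id, :children)

# Two passes: index every comment by id, then attach each one to its parent.
def two_pass(comments)
  by_id = {}
  comments.each { |c| by_id[c.id] = c }
  roots = []
  comments.each do |c|
    parent = by_id[c.parent_comment_id]
    parent ? parent.children << c : roots << c
  end
  roots
end

# One pass: children whose parent hasn't been seen yet wait in an Array
# keyed by the parent's id and are adopted once the parent shows up.
def array_placeholder(comments)
  seen = {}
  waiting = Hash.new { |h, k| h[k] = [] }
  roots = []
  comments.each do |c|
    seen[c.id] = c
    c.children.concat(waiting.delete(c.id) || [])
    if c.parent_comment_id.nil?
      roots << c
    elsif (parent = seen[c.parent_comment_id])
      parent.children << c
    else
      waiting[c.parent_comment_id] << c
    end
  end
  roots
end

# Builds a fresh flat list each time, since both functions mutate `children`.
# Comment 3 is listed before its parent (2) to exercise the placeholder path.
make_comments = lambda do
  [[1, nil], [3, 2], [2, 1], [4, nil]].map { |id, pid| Node.new(id, pid, []) }
end

itrs = 1_000 # far smaller than the 10000 used for the numbers in this post

results = Benchmark.bmbm do |x|
  x.report("two_pass:") { itrs.times { two_pass(make_comments.call) } }
  x.report("array_placeholder:") { itrs.times { array_placeholder(make_comments.call) } }
end
```

Timings from a toy input like this say little about the real helpers; the point is only the shape of the comparison.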
And the results (ruby 2.3.0dev (2015-10-26 trunk 52281) [x86_64-linux] on an Intel Core i7 4810MQ):
Rehearsal --------------------------------------------------------------------
tree_comments_array_placeholder:  53.620000   0.120000  53.740000 ( 53.954377)
tree_comments_2_pass:             34.900000   0.010000  34.910000 ( 34.882461)
tree_comments_stub_placeholder:   40.190000   0.010000  40.200000 ( 40.184618)
--------------------------------------------------------- total: 128.850000sec

                                       user     system      total        real
tree_comments_array_placeholder:  41.440000   0.010000  41.450000 ( 41.413919)
tree_comments_2_pass:             37.550000   0.050000  37.600000 ( 37.576457)
tree_comments_stub_placeholder:   39.230000   0.010000  39.240000 ( 39.211102)
.

Finished in 4 minutes 7.7 seconds (files took 1.86 seconds to load)
1 example, 0 failures
if statements are pretty expensive. While there isn't much of a difference in performance, our two-pass method actually outperforms the other two, which was surprising. From a computer architecture standpoint, though, we're likely taking advantage of branch prediction more than in the other two implementations. As we suspected, stubbing out the comment class improved performance over the array placeholder, though not considerably. We could of course avoid this work entirely by using embedded comments, but that would make our relations and callbacks more complicated than they should be. With modest enough comment counts, there won't really be an effect on user experience. And if a post gets really popular, I can always implement caching (which is currently used to serve you the body of posts).