Friday, February 26, 2010

Teh Thread Everlasting at One Year: A Graphical Retrospective

At 10:28 PM Pharyngula timestamp time (=EST) on February 24, 2010, teh Eternal Thread reached exactly one year in age. From shortly after its humble beginning as an argument with delugionists in a thread ostensibly about some comicbook movie to its current status as damn-near permanent Most Active comment thread of all of ScienceBorg, I have assiduously copied and pasted the date and time of every hundredth comment. This ridiculous time-wasting hobby has now yielded a number of sniny graphs and a few hard-won Truths; because I'm not really sure what the latter are, exactly, here I share the former.

First, the raw data, exactly one year of teh Thread (all figures are clickable and thereby enlargeable):
Fig. 1: One year's worth of teh Thread Everlasting. See current update for details.

So, there it is, in all its glory, such as it is. Just the facts, ma'am. 365 days; 28049 comments.

On January 28, I had ventured the following forecasts of the incipient month's commenting:
Fig. 2: Predictions by extrapolation of best-fit polynomial equations for the whole data set and two pretty much arbitrary subsets (see linked post for details)

A few weeks later (February 14), in the last update before the Threadiversary, I essayed this update calculated from the entire Thread:

Fig. 3: In aqua, a fifth-order polynomial model of the whole Thread, forecast from Feb. 14 ahead to the first anniversary on Feb. 24, 2010.

At some point in there (2/01/10), I was also goaded into making a more precise (and therefore more likely to be WOTI) prediction for the count a few days later (see here). At the time, I noted, however, that the prediction was based on the red "hockey stick" equation (Fig. 2) and therefore "anything less than that and I will claim success."
Which I now so claim.
No real hockey stick, though, IMO. Yet.
I felt I had hedged my bets pretty well there with a spread from approx. 25 to 40 K and indeed, the final total ended up satisfactorily bracketed, and remarkably (and entirely coincidentally) close to the babybear green curve in Fig. 2 and not far from the aqua whole-Thread curve in Fig. 3 either.

Now, one of my commenters has opined that my "methods for prediction are far too ad-hoc to be considered scientific or to make accurate predictions...I do not believe the rates can be predicted." Of course I found this both highly insulting and entirely correct. There is absolutely no reason at all that the relationship between time and blog-comments should or ought to be expected to fit any particular curve, and further, no justification whatsoever for extrapolating a nearly arbitrary polynomial equation from the data to which it was illegitimately fit into the unknowable future.
This does not stop me from continuing the practice, however, in what follows directly. Why? Because for some stupid reason it entertains me to do so. It also occupies much time that would otherwise be in danger of being well-spent on grading and writing and all the other shit that I do this instead of. Anyway. More on the kibitzer "qbsmd"--if that is her or his real name--below.

OK but anyway so I thought about subdividing teh Thread into like segments in various ways for analysis and shit, and I squinted hard at various instars of Fig. 1 with that in mind, but with the whole year of data in hand (if you had Fig. 1 in your hand) there is one clear landmark breakpoint.
I refer, of course, to the horror--the HORROR--of which threadizens whisper still, the Anastomisation. [!!!!!!!!!!!!] Look at Fig. 1: that red threadmarker is the fulcrum on which the lever of teh Thread pivots, more-or-less clearly separating and delineating phlegmatic Before from manic After. Lifting and separating.

[Exclusive!!! IJTS researchers investigating the anastomisation catastrophe have recently uncovered shocking smoking-gun evidence of premeditation on the part of teh ECO, as well as an unsettling, disquietingly prescient warning of anastomisation foretold. We now return you to your regularly scheduled retrospective.]

For reference, therefore, here is a representation of teh Thread's post-anastomisation Modern Era:
Fig. 4: Teh Thread's Modern Era, post-anastomisation, described by linear and quadratic regressions.
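
For anyone playing along at home, here is a minimal sketch of the kind of regression shown in Fig. 4. It is not the actual work behind the figure: it assumes ordinary least squares, the helper names (`polyfit`, `polyval`, `rate`) are invented for illustration, and the data are synthetic rather than real Thread counts. The useful bit is `rate`, the derivative of the fitted curve: for a quadratic with a positive squared term, the modelled comments per day climb with time.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest order first."""
    m = degree + 1
    # Normal equations A c = b, with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    A = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, m):
            f = A[row][col] / A[col][col]
            for k in range(col, m):
                A[row][k] -= f * A[col][k]
            b[row] -= f * b[col]
    # Back substitution.
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        s = sum(A[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (b[i] - s) / A[i][i]
    return coeffs


def polyval(coeffs, x):
    """Evaluate the fitted polynomial (cumulative comments) at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


def rate(coeffs, x):
    """Derivative of the fitted polynomial: modelled comments per day at x."""
    return sum(k * c * x ** (k - 1) for k, c in enumerate(coeffs) if k > 0)


# Synthetic data (NOT the real Thread): comments = 5 + 100*t + 2*t^2.
days = list(range(11))
comments = [5 + 100 * t + 2 * t * t for t in days]

c0, c1, c2 = polyfit(days, comments, 2)
forecast_day_20 = polyval([c0, c1, c2], 20)  # extrapolation beyond the data
rate_day_10 = rate([c0, c1, c2], 10)         # slope at the last data point
```

The extrapolated value is only as trustworthy as the assumption that the curve keeps its shape outside the fitted range, which is to say not very.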

The impressive fit of the quadratic curve suggests that not just comment counts but commenting rates are still on the increase. We can examine this apparent trend in more detail by looking at the integrated, or derived, or whatever data on subThread duration and commenting rate. I took a stab at this kind of descriptive analysis once before, and since I haven't had any better ideas since, here's the same kind of thing again:

Fig. 5: SubThread duration. Each datum represents a subThread, plotted on the date that each ended. Pink line is a 3-point moving average. Vertical red line marks anastomisation, the fulcrum on which the lever etc.
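
A 3-point moving average of the sort drawn in pink can be sketched as below. The caption does not say whether the window trails or is centered, so this assumes a trailing window (each point averages itself and the two before it); the function name and the durations are made up for illustration.

```python
def moving_average(values, window=3):
    """Trailing moving average; returns len(values) - window + 1 points."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Made-up subThread durations in days, NOT the real data.
durations_days = [30.0, 24.0, 18.0, 6.0, 3.0, 2.0, 2.0]
smoothed = moving_average(durations_days)
```

The smoothed series starts two points late because the first full window needs three subThreads.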

Well, it's no secret that subThreads have gotten briefer. The post-anastomisation mess is easier to follow on semi-log coordinates:

Fig. 6: Same as Fig. 5, only plotted on a logarithmic ordinate axis.

Duration seems to have stabilized, after a precipitous drop, at about 2 days. My guess is that it will remain right there for a while. I sincerely doubt teh CO will be bothered to spawn a daily subThread (nor should he), and longer sTs (in time) would mean at this point longer sTs (in comments and kilobytes).

Thread duration is not, of course, a truly random variable, because it depends in part on an arbitrary limit on length (in cs and kBs) imposed from On High by teh ECO PZ.
Traditionally, teh CO would slam down the portcullis on one subThread and spawn another whenever he noticed that the comment count had exceeded 1000. On the day before Squidmas Eve, PZ had the class to inquire among us hoi-polloi about the possibility of a lower target. Subsequent subThreads were (and are) portcullised when he notices that more than 666 comments have accumulated (as confirmed here).

So subThreads are shorter (in time) partly because of an imposed change in the arbitrary limit, but also, of course, because of a relatively recent increase in commenting rate:

Fig. 7: SubThread-wise commenting rates, conventions as in Figs. 5-6.
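
The rates in Fig. 7 are just comments divided by elapsed days. A minimal sketch of that arithmetic follows; the function name is hypothetical. The first check uses the whole-Thread totals quoted above (365 days; 28049 comments). The second is an illustrative assumption: a subThread closed at exactly 666 comments after exactly 2 days, which real subThreads only approximate.

```python
from datetime import datetime


def comments_per_day(start, end, n_comments):
    """Average commenting rate between two timestamps, in comments per day."""
    days = (end - start).total_seconds() / 86400.0
    if days <= 0:
        raise ValueError("end must be later than start")
    return n_comments / days


# Whole first year of teh Thread, using the totals given in this post.
year_rate = comments_per_day(
    datetime(2009, 2, 24, 22, 28), datetime(2010, 2, 24, 22, 28), 28049
)

# Illustrative assumption: 666 comments accumulated over exactly 2 days.
recent_rate = comments_per_day(
    datetime(2010, 2, 22, 22, 28), datetime(2010, 2, 24, 22, 28), 666
)
```

On those numbers the year-long average is a little under 77 comments/d, while the idealized recent subThread runs at 333/d.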

So in my previous metathreadual analysis, only four sTs post-A, I wondered, "Can the ridiculous current rates, approaching 200/d, be sustained?" They were not; they were greatly exceeded instead. 300 comments/d and more is not unusual recently.
Before asking the obvious causal question, though, I was curious about the relationship between commenting rate and subThread duration. Clearly the faster the commenting the shorter the subThread, but is the negative relationship linear? tight? diffuse? Was there a change in the relationship corresponding to the change in limit? Here are the data:
Fig. 8: Bivariate relationship of commenting rate and subThread duration. Green data: pre-anastomisation; blue data: post-A.

I did not predict a power relationship, but the fit is tight. Of course, one of the best things about a power relationship is that it's linear on log-log coordinates:

Fig. 9: Like Fig. 8, but log-log coordinates and separate pre- & post-A regressions.

It's hard to be sure without running the ANCOVA (which I am not so nutty as to bother with for this), but to my eye the blue post-A line is probably not significantly shallower than the green pre-A line. The single point of overlap is consistent with homogeneity of both slopes and intercepts. If so, then we are led to the startling conclusion that the pink-boxed equation should provide robust predictions of future subThread durations from knowledge of commenting rate even if the portcullis-target is changed again! Also, I have a beautiful piece of land in Florida and/or a bridge in Brooklyn I can let you have cheap!
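
A power relationship like the one in Figs. 8-9 can be fitted as a straight line through the logged data. The sketch below assumes ordinary least squares on (log rate, log duration); `fit_power_law` is a hypothetical helper, and the data are synthetic, built on the idealization that every subThread is closed at exactly 1000 comments. It does not reproduce the pink-boxed equation, whose coefficients are not given here.

```python
import math


def fit_power_law(x, y):
    """Fit y = a * x**b by least squares on (log x, log y); returns (a, b)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx                       # slope on log-log axes = exponent
    a = math.exp(mean_y - b * mean_x)   # intercept on log-log axes = log(a)
    return a, b


# Synthetic subThreads (NOT real data), each closed at exactly 1000 comments.
rates = [10.0, 25.0, 50.0, 100.0, 200.0]   # comments per day
durations = [1000.0 / r for r in rates]     # days
a, b = fit_power_law(rates, durations)
```

Under that idealization duration is simply limit divided by rate, so the exponent comes out at -1 and the coefficient equals the limit. If the real subThreads behaved this tidily, lowering the target from 1000 to 666 would shift the log-log line downward without changing its slope; since the portcullis actually falls whenever teh CO notices, the real data need not oblige.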

It is important to note, however, that the relationship between commenting rate and thread duration is not simple. For example, it has been noted (by qbsmd, as well as SEF and I think some other people in the Thread itself someplace) that one effect of arbitrarily turning over sTs sooner (by changing the target-limit) is that the Thread spends more time on the Pharyngula front page, and this probably increases commenting rates (a "frontpage" effect of transiently increased rates is clear in many subThreads illustrated in Fig. 1 and my anastomisation post). I'll add that much of teh Thread's burgeoning post-A growth occurred during the month of January, when teh CO was travelling in other realms and posting a bit less often than usual. The point being that shorter subThreads cause higher commenting rates as higher commenting rates cause shorter subThreads. Perhaps shorter subThreads simply do not permit the frontpage effect to peter out (this is similar in concept to summation of electrical potentials and muscle tension).
Another idea is that there may be some positive feedback involving the widgets and gizmos in the sidebars. High commenting rates will cause more frequent appearance of teh Thread in the "recent comments" list (which I rely on) and the "Most Active" Top 5 as well, which might be expected to increase pagehits and commenting rates in turn. A vicious cycle, a spiralling out of control, a blithering bleak bloated blast of bad-blood-borne bubonic blech. Eschaton immanentized. End to all things. All.


PZ Myers said...

There's also the possibility that, if the rate of growth increases, I could try splitting -- branching into two or more threads and allowing people to choose between them. It might be an interesting experiment in speciation.

It would also be a little bit evil.

Rorschach said...

Uhm, how's that Assistant Professorship coming along, btw ?

Chas Peterson said...

Ooh, PZ Myers commented on my "weblog"! I shall cherish this thread like a Marv Throneberry autographed baseball.

I agree that the cladogenesis experiment is well worth trying.

By the way, I'd like to thank PZ for paying attention enough to end a subThread right at the one-year mark, almost to the minute. I appreciated that very much.

Rorschach--oh, you know, pretty good. (If I wasn't doing this in the middle of the night I'd be reading novels. This is the Story Of My Life.)