Hacker News has been my favorite news site for close to 10 years now, and I’ve always loved ‘meta’ posts that aggregate statistics about HN. About 7 months ago my co-founder and I started gathering and tagging data from HN for our custom newsletter company, Morning Brief, so now I have enough data to write some HN analysis posts of my own!
In this first post, I want to look at the HN users who have submitted the highest number of ‘popular’ posts and see if we can learn anything about the HN community from the investigation.
Finding more effective submitters
One obvious way to find more effective submitters is to look at how often a user submits links compared to how many “popular” submissions that user has. We’ll define a ‘popular’ submission as anything above 50 upvotes – this is somewhat arbitrary, but 50 upvotes generally ensures that a post stays on the front page for at least a few hours. My guess is that the number of a user’s popular submissions will increase linearly with their total number of posts, but maybe not!
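To make the 'popular' definition concrete, here is a minimal Python sketch of the counting step. The sample rows, the `submissions` list, and the `count_submissions` helper are made up for illustration; they are not the actual Morning Brief data or pipeline.

```python
from collections import Counter

# "Anything above 50 upvotes" counts as popular, per the definition in the text.
POPULAR_THRESHOLD = 50

# Hypothetical sample rows of (username, upvotes). Not real HN data.
submissions = [
    ("alice", 120),
    ("alice", 3),
    ("alice", 51),
    ("bob", 50),   # exactly 50 is not "above 50", so this one is not popular
    ("bob", 7),
    ("carol", 300),
]

def count_submissions(rows, threshold=POPULAR_THRESHOLD):
    """Return (total_per_user, popular_per_user) as Counters."""
    total = Counter()
    popular = Counter()
    for user, upvotes in rows:
        total[user] += 1
        if upvotes > threshold:
            popular[user] += 1
    return total, popular

total, popular = count_submissions(submissions)
for user in sorted(total):
    print(user, total[user], popular[user])
```

Each user ends up with one (total, popular) pair, which is the kind of point a scatter plot of popular versus total submissions would show.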
Here’s a chart that plots the number of links each user submitted against the number of their ‘popular’ submissions (over the ~7 months that we’ve been collecting data). The line drawn over the points shows the ‘median’ ratio of popular submissions to total submissions:
There are some interesting things about this chart! You can see that while popular submissions do increase linearly with total number of posts, some people have a much higher percentage of popular submissions than others. If we split submitters into two groups, it should be clear that each point above the median line represents a more effective submitter and each point below the line represents a less effective submitter.
The slope of the line is about 0.09, which means that the median user has 9 popular posts for every 100 posts they submit. The submitters above the line average more than 9 popular posts per 100 submissions, and the submitters below the line average fewer than 9.
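As a rough sketch of the split, assume the line passes through the origin with a slope equal to the median per-user ratio (my reading of the description, not a confirmed detail of how the chart was built). The `user_counts` values below are invented for illustration.

```python
from statistics import median

# Hypothetical per-user counts: user -> (total submissions, popular submissions)
user_counts = {
    "alice": (100, 12),
    "bob": (200, 10),
    "carol": (50, 25),
    "dave": (40, 2),
    "erin": (300, 27),
}

# Ratio of popular submissions to total submissions for each user.
ratios = {user: popular / total for user, (total, popular) in user_counts.items()}

# Assumed slope of the line: the median of the per-user ratios.
median_ratio = median(ratios.values())

# Users strictly above the line versus strictly below it.
more_effective = sorted(u for u, r in ratios.items() if r > median_ratio)
less_effective = sorted(u for u, r in ratios.items() if r < median_ratio)

print(f"median ratio: {median_ratio:.2f}")
print("more effective:", more_effective)
print("less effective:", less_effective)
```

A user sitting exactly on the line (like `erin` in this toy data) lands in neither group, which matches the "more than" and "less than" wording of the group definitions.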
A few of these data points stand out. For instance, the user with the highest number of popular posts (145 over this ~7 month time frame!) had a hit ratio of about 11 per 100, over twice the ~5 per 100 of the most prolific submitter.
Their ratios pale in comparison to the most effective submitters, though, who have a 1 in 2 hit rate! Those users obviously have the pulse of HN.
What are these submitters doing differently? Can we glean some insights from their aggregate behavior?
Optimal number of submissions
Let's see if there's a 'hit rate' sweet spot with respect to how much a user submits. We'll define 'hit rate' as the ratio between a user's popular posts and their total posts – i.e. how effective are they at submitting popular posts.
Here's the 'hit rate' charted against the total number of submissions for a user:
There's not really a good 'sweet spot' here, but there is an interesting phenomenon – when a user gets above about 100 total posts, the hit ratio 'floor' rises significantly. I assume this is because the only submitters that are willing to get to that point are the ones that get good positive feedback, and the rest just self-select out (or are removed by mods for spamming).
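One way to check the rising 'floor' is to compare the lowest hit rate among users below and above roughly 100 total posts. This is only a sketch: the `users` list and the `hit_rate_floor` helper are hypothetical, and the cutoffs of 10 and 100 are taken loosely from the text.

```python
# Hypothetical per-user counts as (total submissions, popular submissions).
users = [(12, 0), (15, 3), (40, 1), (80, 0), (120, 9), (150, 30), (400, 24)]

def hit_rate(total, popular):
    """Ratio of a user's popular posts to their total posts."""
    return popular / total

def hit_rate_floor(rows, min_total, max_total=None):
    """Lowest hit rate among users with min_total <= total < max_total.

    Returns None if no user falls in the range.
    """
    rates = [
        hit_rate(total, popular)
        for total, popular in rows
        if total >= min_total and (max_total is None or total < max_total)
    ]
    return min(rates) if rates else None

low_floor = hit_rate_floor(users, 10, 100)   # users with 10 to 99 posts
high_floor = hit_rate_floor(users, 100)      # users with 100 or more posts

print("floor below 100 posts:", low_floor)
print("floor at 100+ posts:", high_floor)
```

In this toy data the floor is zero for the lighter submitters and above zero for the heavy ones, which is the shape of the effect described above.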
Top ten domains posted to by more effective submitters
I’m defining a ‘more effective submitter’ as a submitter who has a ‘popular submission’ to ‘total submission’ ratio of more than 0.09 (above the regression line in the first chart).
Let’s look at all the submissions made by more effective submitters. Here are the top ten domains that the more effective submitters post to the most:
So the users that have the highest proportion of popular submissions tend to very strongly favor GitHub as a source of information! They post over twice as many GitHub links as links to any other site.
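A top-ten domain count like this can be sketched with the standard library. The `urls` list and the `domain_of` helper are illustrative assumptions. In particular, stripping a leading `www.` is one possible normalization and may not match how the original data was grouped.

```python
from collections import Counter
from urllib.parse import urlparse

def domain_of(url):
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host

# Hypothetical submitted URLs. Not real HN data.
urls = [
    "https://github.com/example/repo-one",
    "https://www.github.com/example/repo-two",
    "https://arxiv.org/abs/0000.00000",
    "https://www.nytimes.com/some-article",
    "https://GitHub.com/example/repo-three",
]

# Count submissions per domain and keep the ten most common.
top_domains = Counter(domain_of(u) for u in urls).most_common(10)

for domain, count in top_domains:
    print(domain, count)
```

Running the same count separately over the submissions of each group gives the two lists that can then be compared side by side.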
Top ten domains posted to by less effective submitters
“Less effective” might be a bit of a misnomer, since we’re still talking about users with lots of popular posts. They just don’t have as high a hit rate as the more effective posters – their ‘popular submission’ to ‘total submission’ ratio is less than 0.09 (below the regression line in the first chart).
Let’s look at all the submissions made by less effective submitters. Here are the top ten domains that the less effective submitters post the most:
The distribution looks different, although many of the top sites are the same. One notable difference is the fact that arxiv.org is in the top 10 domains for more effective submitters, but doesn’t make the cut for the less effective submitters. The differences between the two groups aren’t all that stark, though, presumably because both groups are still getting lots of popular posts to the front page (just in different ratios).
Top ten domains across all popular submissions
Now that we know that the more effective submitters submit a higher proportion of GitHub posts than the less effective submitters, my guess is that if we chart the total number of popular submissions by domain we’ll see that GitHub is overrepresented in the top ten domains. Let’s check:
Yep! GitHub dominates popular submissions, by even more than we would expect. So if you want to get a higher number of your submissions to the front page, perhaps submitting valuable GitHub links is the way to do it!
Submission time of day for more effective submitters
Now that we know who the more effective submitters are, let’s see if they’re doing anything different with respect to the times of day and the days of the week that they’re submitting on.
Cool! So hour 16 of the day in UTC is the most popular time for more effective submitters to submit links. Since most of this data was collected during daylight saving time in the US, this corresponds to 9am on the west coast of the US and 12pm on the east coast.
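The hour bucketing and the time zone arithmetic can be sketched as follows. The timestamps are invented, the dates are arbitrary summer dates, and the fixed UTC-7 and UTC-4 offsets assume US daylight saving time is in effect, as the text says it was for most of the data.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Fixed offsets for US daylight saving time (an assumption that only holds
# for part of the year).
PDT = timezone(timedelta(hours=-7), "PDT")
EDT = timezone(timedelta(hours=-4), "EDT")

# Hypothetical submission times in UTC. Not real HN data.
submitted_at = [
    datetime(2018, 7, 2, 16, 5, tzinfo=timezone.utc),
    datetime(2018, 7, 2, 16, 40, tzinfo=timezone.utc),
    datetime(2018, 7, 2, 14, 10, tzinfo=timezone.utc),
    datetime(2018, 7, 3, 3, 30, tzinfo=timezone.utc),
]

# Bucket submissions by UTC hour and find the busiest hour.
by_hour = Counter(ts.hour for ts in submitted_at)
peak_hour, peak_count = by_hour.most_common(1)[0]

# Express the peak hour in US Pacific and Eastern daylight time.
peak = datetime(2018, 7, 2, peak_hour, tzinfo=timezone.utc)
pacific_hour = peak.astimezone(PDT).hour
eastern_hour = peak.astimezone(EDT).hour

print(f"peak UTC hour: {peak_hour} ({peak_count} submissions)")
print(f"Pacific: {pacific_hour}:00, Eastern: {eastern_hour}:00")
```

For data that spans the daylight saving switch, a real time zone database (for example Python's `zoneinfo` module) would be a safer choice than fixed offsets.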
Is this different from the distribution of submissions from the less effective submitters? Let’s see:
The distribution is definitely different, although the differences are subtle. Hour 16 is still the most-submitted hour, barely edging out hour 14. The other subtle differences here are:
1. The peak around hour 16 is more pronounced for the more effective submitters – maybe there are more west coast submitters in the ‘more effective submitters’ group?
2. There are proportionally more submitters in the ‘more effective submitters’ group in hours 0-8. This is a little weird to me, but might have something to do with having a higher number of more effective submitters in East Asia?
Without a more extensive analysis we can’t really say anything definitive here, but I still thought it was interesting data to look at.
Submission day of week for more effective submitters
Here’s another place where the differences between the groups are fairly negligible. The more effective submitters have a higher proportion of posts on Tuesday and the less effective submitters have a higher proportion of posts on Thursday, but I wouldn’t read too much into this.
My co-founder saw the data above, and mentioned that he generally finds more success posting on the weekends, so he was curious about whether the data generally supports the conclusion that weekend submissions have a higher hit rate.
Monday ratio: 0.083
Tuesday ratio: 0.081
Wednesday ratio: 0.075
Thursday ratio: 0.076
Friday ratio: 0.075
Saturday ratio: 0.096
Sunday ratio: 0.099
It does seem to be the case! Sunday has the highest ratio of popular posts to total submissions, while Wednesday and Friday are tied for the lowest. I didn’t test the variance or significance of these differences, but I did notice that the trend also holds when ‘popular’ is redefined as 10 or 100 upvotes (in addition to the 50-upvote definition that we’ve been using).
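A per-day ratio like the one listed above can be computed with a small grouping function, and re-running it with different thresholds is one way to do the robustness check. The `rows` data and the `hit_rate_by_day` helper are hypothetical, and the sketch assumes submission times are in UTC.

```python
from collections import defaultdict
from datetime import datetime, timezone

# datetime.weekday() returns 0 for Monday through 6 for Sunday.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def hit_rate_by_day(rows, threshold=50):
    """Return {day name: popular / total} for days with at least one submission."""
    totals = defaultdict(int)
    popular = defaultdict(int)
    for submitted_at, upvotes in rows:
        day = DAY_NAMES[submitted_at.weekday()]
        totals[day] += 1
        if upvotes > threshold:
            popular[day] += 1
    return {day: popular[day] / totals[day] for day in DAY_NAMES if day in totals}

# Hypothetical (submission time, upvotes) rows. 2018-01-01 was a Monday.
rows = [
    (datetime(2018, 1, 1, 16, 0, tzinfo=timezone.utc), 120),  # Monday
    (datetime(2018, 1, 1, 18, 0, tzinfo=timezone.utc), 3),    # Monday
    (datetime(2018, 1, 3, 14, 0, tzinfo=timezone.utc), 10),   # Wednesday
    (datetime(2018, 1, 3, 15, 0, tzinfo=timezone.utc), 4),    # Wednesday
    (datetime(2018, 1, 6, 16, 0, tzinfo=timezone.utc), 75),   # Saturday
    (datetime(2018, 1, 7, 16, 0, tzinfo=timezone.utc), 51),   # Sunday
    (datetime(2018, 1, 7, 20, 0, tzinfo=timezone.utc), 2),    # Sunday
]

ratios_50 = hit_rate_by_day(rows, threshold=50)

# Repeat with the alternative 'popular' thresholds mentioned in the text.
for threshold in (10, 50, 100):
    print(threshold, hit_rate_by_day(rows, threshold))
```

With only a handful of rows per day the ratios are very noisy, so a comparison like this only means something on a large sample.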
Some HN submitters have a much better hit ratio than others when it comes to submitting popular articles.
Submitting good GitHub links seems like a hallmark of more effective submitters.
There’s not a lot of difference between more effective and less effective submitters when it comes to the time of day or day of the week they submit.
To be continued
Next week I’m going to keep going with this line of inquiry, but incorporate data about our specialty, semantic tags! It will be interesting to see what the most commonly upvoted (and ignored) topics are on HN.
I’ll also investigate whether or not long-time HN users with large amounts of karma have better submission ratios than average.
If you liked this post and want to start receiving your own custom newsletter that collects the best articles from HN, Twitter, and Reddit about the topics YOU choose, try Morning Brief today!
There are a couple of additional pieces of information you should know about this chart:
1. It only shows counts for users that have submitted more than 10 times over the last ~7 months, and
2. It only counts submissions that contain ‘articles’, and so excludes submissions that contain only video and audio, links to SaaS homepages, etc.
Thanks to David Baur for making some charting suggestions and identifying some data extraction issues prior to publication.