In this post I investigate the engagement and retention of active users on Hacker News. Hacker News (http://news.ycombinator.com) is a user-driven link sharing and commentary site with a bent towards technology and issues of interest to nerds. Officially, “anything that gratifies one’s intellectual curiosity” is on-topic for discussion. There is an open API (https://github.com/HackerNews/API) that lets the public easily interact with the site’s underlying public data. The data for all items (posts, comments, and a few other types of objects) are also scraped into BigQuery, so you don’t have to go through the trouble of enumerating all the item IDs yourself and sending a few million GET requests to Firebase. You can query it directly in BigQuery or export the data and query it in the tool of your choice (in my case ClickHouse, with R/ggplot2/plotly for visualization).
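As a rough illustration of what the item-by-item route looks like, here is a minimal Python sketch against the public Firebase endpoint documented in the API repository. The helper names `item_url` and `fetch_item` are my own for illustration, and nothing here is the pipeline actually used for this post (which relied on the BigQuery export).

```python
import json
from typing import Optional
from urllib.request import urlopen

# Base URL as documented in the HackerNews/API README.
BASE_URL = "https://hacker-news.firebaseio.com/v0"


def item_url(item_id: int) -> str:
    """Build the URL for a single item (story, comment, job, poll, ...)."""
    return f"{BASE_URL}/item/{item_id}.json"


def fetch_item(item_id: int, timeout: float = 10.0) -> Optional[dict]:
    """Fetch one item as a dict; the API returns null (None) for missing IDs."""
    with urlopen(item_url(item_id), timeout=timeout) as response:
        return json.load(response)


# Example (requires network access), left commented out on purpose:
# item = fetch_item(8863)
# print(item.get("by"), item.get("type"), item.get("time"))
```

Fetching items one at a time like this is exactly the few-million-requests chore that the BigQuery dataset saves you from.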
Paul Graham, aka pg, the founder of the site, noted over 10 years ago that he was concerned about the dilution of the site’s character that comes with growth in users and traffic[1]. The data indicate that the community is very seasoned and that the dilution of Hacker News is likely an illusion: new users are NOT overrunning the site.
In February 2020 there were 38,335 distinct users[2] who engaged with the site publicly. They had been active for a mean tenure of 48.8 months and a median tenure of 41 months. The median active HN user therefore has roughly 3.4 years of tenure, which strongly suggests that the community drifts away from the original HN ethos much more slowly than might otherwise be expected of such a popular site. The average tenure of active users has risen steadily over time, driven by the impressive retention of active users over long periods. As more and more users have become engaged in the community, users from older cohorts have not fled the site but have continued to contribute submissions and comments.
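To make the tenure statistic concrete, here is a minimal Python sketch on toy data. It assumes that tenure is the number of months between an account’s first public activity and the month in question; that definition, the `(account, "YYYY-MM")` event format, and the function names are my assumptions for illustration, not necessarily how the figures above were computed.

```python
from statistics import mean, median


def month_index(ym: str) -> int:
    """Convert 'YYYY-MM' to a running month count so months can be subtracted."""
    year, month = ym.split("-")
    return int(year) * 12 + int(month) - 1


def tenures_in_month(events, target):
    """Map each account active in `target` to its tenure in months."""
    first_seen = {}
    for user, ym in events:
        m = month_index(ym)
        first_seen[user] = min(m, first_seen.get(user, m))
    t = month_index(target)
    active = {user for user, ym in events if month_index(ym) == t}
    return {user: t - first_seen[user] for user in active}


# Hypothetical toy data: one row per (account, month with a public post or comment).
events = [
    ("alice", "2019-02"), ("alice", "2020-02"),
    ("bob", "2019-12"), ("bob", "2020-02"),
    ("carol", "2020-02"),
    ("dave", "2019-06"),  # not active in the target month, so excluded
]

tenures = tenures_in_month(events, "2020-02")
print(mean(tenures.values()), median(tenures.values()))
```

A mean well above the median, as in the real data (48.8 versus 41 months), points to a long tail of very old accounts pulling the average up.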
Alex Schultz (VP Growth Marketing, Analytics and i18n at Facebook) describes a condition for validating product-market fit based on user retention: if a plot of the percentage of engaged users against acquisition age asymptotes above the X-axis, then you have product-market fit, at least for a subset of users, and a viable business[3].
Based on the above definition, Hacker News has product-market fit for contributing users, given their long-term retention. Even though some cohorts of users have been on the site for more than ten years, there are still users from those cohorts who are active in story submission or commenting! In the above figure the % active doesn’t drop to 0, except for the very oldest cohort (pg and some other early testers [n very small]). How far above the X-axis this asymptote lies is a function of the nature of the business. For something transactional like Airbnb, one wouldn’t expect users to have a good reason to rent accommodation every month, so the company might instead look for annual engagement as a sign that it is doing a good job retaining users. A different type of service like Twitter may be looking for multiple engagements per day per user. For Hacker News, only public engagement data is available, not logged-in views or up/downvoting.
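The retention curve behind this test can be sketched in a few lines of Python. This is a toy version under my own assumptions: a cohort is defined by the month of an account’s first public activity, “engaged” means at least one public post or comment in a month, and the function name `retention_by_age` is hypothetical.

```python
from collections import defaultdict


def month_index(ym: str) -> int:
    """Convert 'YYYY-MM' to a running month count."""
    year, month = ym.split("-")
    return int(year) * 12 + int(month) - 1


def retention_by_age(events, last_month):
    """Fraction of accounts active at each age (months since first activity).

    Only accounts old enough to have reached a given age by `last_month`
    count towards the denominator for that age.
    """
    first = {}
    active = set()
    for user, ym in events:
        m = month_index(ym)
        active.add((user, m))
        first[user] = min(m, first.get(user, m))
    end = month_index(last_month)
    eligible = defaultdict(int)
    retained = defaultdict(int)
    for user, start in first.items():
        for age in range(end - start + 1):
            eligible[age] += 1
            if (user, start + age) in active:
                retained[age] += 1
    return {age: retained[age] / eligible[age] for age in sorted(eligible)}


# Hypothetical toy data: (account, month with public activity).
events = [
    ("a", "2020-01"), ("a", "2020-02"), ("a", "2020-03"),
    ("b", "2020-01"), ("b", "2020-03"),
    ("c", "2020-02"),
]
curve = retention_by_age(events, "2020-03")
print(curve)
```

On real data the thing to look for is whether the tail of this curve flattens out at a value clearly above zero rather than decaying towards it.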
The increasing tenure of users could instead suggest that acquisition of new users is relatively weak or slowing over time, rather than that the trend is driven predominantly by retention. However, there appears to be fairly strong, consistent acquisition of new active users.
The excellent long-term retention of active users, combined with a steady stream of new members of the community, has led to significant growth in the total number of contributors to HN.
In order to understand retention better we can look at the activity of each cohort in each month. The triangular figures below show one cell for each pair of monthly cohort (x-axis) and calendar month for the relevant statistic. The bottom diagonal in these figures shows the initial month for each cohort, in which all users in the cohort are by definition active. These plots allow us to assess the consistency of retention trends across time for various cohorts. Each parallel diagonal line represents cohorts at the same age. By looking at all the cohort-time data together, trends based on cohort age as well as on the actual calendar month can be identified. Horizontal bands in the figure indicate some sort of monthly effect, where in a particular month all cohorts did relatively worse or better (the site was overall up or down in activity). Vertical lines indicate differing behavior by cohort, where some cohorts may have been particularly strong or weak relative to adjacent cohorts.
We can see that outside of some of the very early cohorts, the number of active users in each cohort drops rapidly in the first few months but stays consistently above 0. The above figure shows the fraction of users in each cohort that are active in each month, so the diagonal line is 1.00 for each cohort as all users are active in the first month (by definition of the cohort). The top horizontal line of pixels shows the same data as a previous figure in this post.
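The data behind such a triangular figure is just a table keyed by (cohort month, calendar month). Here is a minimal Python sketch of how it could be built, assuming cohorts are defined by the month of first public activity; the function name `cohort_matrix` and the toy events are hypothetical.

```python
from collections import defaultdict


def cohort_matrix(events):
    """Return {(cohort, month): fraction of the cohort active in that month}.

    Months are 'YYYY-MM' strings, which sort chronologically as plain text.
    Months with no activity at all simply do not appear in the result.
    """
    first = {}
    active = defaultdict(set)  # calendar month -> accounts active that month
    for user, ym in events:
        first[user] = min(ym, first.get(user, ym))
        active[ym].add(user)
    members = defaultdict(set)  # cohort month -> accounts first seen that month
    for user, ym in first.items():
        members[ym].add(user)
    matrix = {}
    for cohort, group in members.items():
        for month, users in active.items():
            if month >= cohort:  # a cohort cannot be active before it exists
                matrix[(cohort, month)] = len(group & users) / len(group)
    return matrix


# Hypothetical toy data: (account, month with public activity).
events = [
    ("a", "2020-01"), ("a", "2020-02"), ("a", "2020-03"),
    ("b", "2020-01"), ("b", "2020-03"),
    ("c", "2020-02"),
]
matrix = cohort_matrix(events)
for key in sorted(matrix):
    print(key, matrix[key])
```

The cells where cohort equals month form the diagonal and are always 1.0, which matches the by-definition property described above.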
On a month by month basis, we can determine the relative impact of each cohort to the total user population.
Brand-new users have a noticeable impact each month, but as the site has aged, the user population has come to reflect the contributions of a large number of cohorts, and no single cohort tends to dominate. We can also see this directly by looking at the absolute numbers of users from each cohort active each month.
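The per-month cohort mix can be computed by counting, for a given month, how many of that month’s active accounts come from each cohort. A minimal Python sketch, again assuming cohorts are defined by first public activity and using a hypothetical function name `cohort_share`:

```python
from collections import Counter


def cohort_share(events, month):
    """Share of the accounts active in `month` that belong to each cohort."""
    first = {}
    for user, ym in events:
        first[user] = min(ym, first.get(user, ym))  # 'YYYY-MM' sorts as text
    active = {user for user, ym in events if ym == month}
    counts = Counter(first[user] for user in active)
    return {cohort: n / len(active) for cohort, n in counts.items()}


# Hypothetical toy data: (account, month with public activity).
events = [
    ("a", "2020-01"), ("a", "2020-02"), ("a", "2020-03"),
    ("b", "2020-01"), ("b", "2020-03"),
    ("c", "2020-02"),
]
print(cohort_share(events, "2020-02"))
```

Replacing the division with the raw `counts` gives the absolute-numbers view of the same breakdown.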
Impact of Power Users
The analysis so far has considered only the population of active users (those engaging with the site through comments, submissions, etc.) and has given each active user in a month equal weight. On social sites, even among contributors, there is likely a broad range of contribution intensity: some users make many comments, others few.
Unlike Reddit, Facebook, Twitter or Stack Overflow, Hacker News has only one front page of stories at a time, which attracts the vast majority of the user base’s attention. This means there are not many different sub-niches; instead there is one more cohesive overall community that users engage in. This suggests that power users may dominate the entire site, rather than only a particular niche or topic.
After filtering out dead comments (as well as some basic data cleaning, which hopefully accounts for spam that has been deeply removed), ~1% of users who ever posted a comment (4,704 out of 415,641 users in my dataset) are responsible for 50% of the total comments on Hacker News over all time, each having posted between 722 and more than 50,000 comments.
For comments in February 2020 (the last full month available in my dataset) the most active commenters also dominated overall comment volume, but the effect is less severe: the top ~6.5% of commenters (the top 2,059 out of 31,380 users, who posted between 21 and 629 comments each) were collectively responsible for 50% of the total comment volume for the month.
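This kind of concentration figure comes from sorting users by comment count and walking down the list until half of all comments are covered. A minimal Python sketch on made-up counts; the function name `top_users_for_share` is hypothetical and this is not necessarily the exact query used for the numbers above.

```python
def top_users_for_share(comment_counts, share=0.5):
    """Find the smallest group of top commenters covering `share` of comments.

    Returns (number of users, fraction of all users, smallest count in the group).
    """
    counts = sorted(comment_counts.values(), reverse=True)
    total = sum(counts)
    running = 0
    for n, count in enumerate(counts, start=1):
        running += count
        if running >= share * total:
            return n, n / len(counts), count
    return 0, 0.0, 0  # only reached when there are no users at all


# Hypothetical per-account comment counts.
comment_counts = {"a": 40, "b": 15, "c": 15, "d": 10, "e": 10, "f": 5, "g": 5}
n_users, user_fraction, min_count = top_users_for_share(comment_counts)
print(n_users, round(user_fraction, 3), min_count)
```

The third return value corresponds to the lower bound quoted in the text (722 comments all-time, 21 comments for February 2020).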
Rather than analyzing the population of users directly, we can instead look at an activity-weighted view and demonstrate the impact of long-tenured power users. The average comment on HN in the last month was written by an account with 5 years of tenure!
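The difference between the per-user view and the activity-weighted view is just a change in what gets averaged: one tenure value per account versus one per comment. A minimal Python sketch, assuming tenure is measured from an account’s first observed comment and using hypothetical function names and toy data:

```python
from statistics import mean


def month_index(ym: str) -> int:
    """Convert 'YYYY-MM' to a running month count."""
    year, month = ym.split("-")
    return int(year) * 12 + int(month) - 1


def tenure_views(comments, target):
    """Return (mean tenure per active account, mean tenure per comment).

    `comments` holds one (account, 'YYYY-MM') row per comment. Raises
    statistics.StatisticsError if nobody commented in `target`.
    """
    first = {}
    for user, ym in comments:
        m = month_index(ym)
        first[user] = min(m, first.get(user, m))
    t = month_index(target)
    per_comment = [t - first[user] for user, ym in comments if month_index(ym) == t]
    authors = {user for user, ym in comments if month_index(ym) == t}
    per_user = [t - first[user] for user in authors]
    return mean(per_user), mean(per_comment)


# Hypothetical toy data: a veteran who comments a lot and a brand-new account.
comments = [
    ("alice", "2019-02"),
    ("alice", "2020-02"), ("alice", "2020-02"), ("alice", "2020-02"),
    ("carol", "2020-02"),
]
per_user_mean, per_comment_mean = tenure_views(comments, "2020-02")
print(per_user_mean, per_comment_mean)
```

When long-tenured accounts also comment the most, the per-comment average sits above the per-user average, which is the pattern described above.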
The quality of engagement on Hacker News and the retention of active users can be seen in how often users engage with the site. For all users who engaged at least once in the last twelve months and whose accounts are 12 months or older, we can calculate the number of distinct months in which they engaged with the site. In examining this graph we see an initially decreasing trend, and then an increasing trend for the number of users with 11 and 12 months active in the last year. An uptick at the end of such a graph is a sign of great product retention, indicating that there is a steady population of users consistently returning and engaging with your product.
As a service run by one of the premier startup investors in the world, it is perhaps not surprising that HN has excellent metrics for growth and retention. It did surprise me that the retention was so strong over such long periods of time. Perhaps I shouldn’t have been surprised, as I’ve read the site for at least 8 years and have no plans to stop any time soon!
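The months-active distribution can be sketched as follows in Python. I am assuming that account age is approximated by the month of first public activity (the real account creation date could be used instead), and the function name and toy data are hypothetical.

```python
from collections import Counter, defaultdict


def month_index(ym: str) -> int:
    """Convert 'YYYY-MM' to a running month count."""
    year, month = ym.split("-")
    return int(year) * 12 + int(month) - 1


def months_active_histogram(events, last_month):
    """Count accounts by distinct active months in the 12 months up to `last_month`.

    Only accounts first seen at least 12 months before `last_month` and active
    at least once inside the window are included.
    """
    end = month_index(last_month)
    start = end - 11
    first = {}
    in_window = defaultdict(set)
    for user, ym in events:
        m = month_index(ym)
        first[user] = min(m, first.get(user, m))
        if start <= m <= end:
            in_window[user].add(m)
    hist = Counter()
    for user, months in in_window.items():
        if end - first[user] >= 12:
            hist[len(months)] += 1
    return dict(hist)


# Hypothetical toy data: (account, month with public activity).
vet_months = ["2019-03", "2019-04", "2019-05", "2019-06", "2019-07", "2019-08",
              "2019-09", "2019-10", "2019-11", "2019-12", "2020-01", "2020-02"]
events = (
    [("vet", "2018-01")] + [("vet", m) for m in vet_months]
    + [("occasional", "2018-06"), ("occasional", "2019-07"), ("occasional", "2020-01")]
    + [("newbie", "2019-10"), ("newbie", "2019-11")]  # too young, excluded
    + [("lapsed", "2017-05")]                         # inactive in window, excluded
)
hist = months_active_histogram(events, "2020-02")
print(hist)
```

On real data, the uptick described above would show up as the counts for 11 and 12 months being larger than those for the months just before them.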
Thanks to Rachel, Nathan, and Daniel for reading early versions of this post.
[2] Based on the available data, we are technically tracking accounts, not people, as it is common practice on Hacker News to use multiple accounts, such as throwaways to preserve anonymity.