Attention Is All You Need:
What 4 Million Posts Reveal About Going Viral on HN & Product Hunt
contact@memvid.com
We analyze 4,010,957 Hacker News story submissions (Oct 2006 to Jun 2023) with scores and timestamps. The score distribution is highly skewed: the median score is 2, the mean is 15.18, and 93.2% of posts remain below 50 points. The top 1% begins at 270 points (top 0.1% at 735). Keyword lift analysis on title unigrams (n >= 100, Welch tests) finds large positive lifts for a small set of niche terms (e.g., "youtube-dl" +1222%, p = 3.4e-05) and large negative lifts for non-English spam tokens (around -93%). Day-of-week effects are statistically significant but tiny (ANOVA F = 3.95, p = 0.003, eta^2 = 0.00047), while hour-of-day is not significant (p = 0.47). Title length correlates weakly with score (Pearson r = -0.017, Spearman r = 0.048, n = 100k); very short titles (0-19 chars) have the highest mean score (18.8). Format patterns show significant shifts: "Show HN:" +12%, "Ask HN:" -16%, question titles -22%, and "Tell HN:" +209% on a comparatively small sample (n = 3,884). Results are descriptive and reflect platform dynamics rather than causal effects.
1 Introduction
Hacker News (HN) and Product Hunt are high-signal platforms where launches compete for a small number of front-page slots. Scores are extremely heavy-tailed: a minority of posts receive the majority of attention. For builders, the question is not only what to build, but how to present it in a way that resonates with the community.
This paper provides a reproducible, data-driven look at HN story submissions. We quantify distributional properties, keyword lift, timing effects, and title structure using a dataset of 4,010,957 stories from Oct 2006 to Jun 2023. Our focus is on descriptive statistics with transparent significance testing.
We make the following contributions:
- A cleaned summary of 4,010,957 HN stories with timestamps and scores
- Keyword lift analysis for unigrams (n >= 100) with significance tests
- Day-of-week and hour-of-day timing patterns in US Eastern time
- Title length and format pattern analysis with confidence intervals
- Year-over-year trend estimates for mean score evolution
2 Related Work
Prior work on online popularity focuses on social platforms and large-scale diffusion models. In this analysis we restrict ourselves to HN stories and cite only the data sources used to build the dataset. The intent is to provide a reproducible baseline rather than a new predictive model.
3 Dataset Construction
3.1 Data Collection
We downloaded the HuggingFace dataset "julien040/hacker-news-posts" [1] (stories only) and exported it to CSV with the script download_hn.py. The dataset contains titles, URLs, scores, timestamps, comment counts, and submitter usernames.
3.2 Data Cleaning
We performed minimal cleaning. Rows with missing title, score, or timestamp were removed (none in this dataset). We did not deduplicate URLs or adjust scores for time-on-site.
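The cleaning rule above can be sketched as a simple row filter. This is only an illustration: the column names `title`, `score`, and `time` are assumptions (the exported CSV schema is not listed here), and the three-row CSV is synthetic.

```python
import csv
import io

# Hypothetical column names; the real export may use different ones.
REQUIRED = ("title", "score", "time")

def clean_rows(rows):
    """Keep only rows where every required field is present and non-empty."""
    kept = []
    for row in rows:
        if all((row.get(col) or "").strip() for col in REQUIRED):
            kept.append(row)
    return kept

# Synthetic CSV for illustration only, not real data.
sample_csv = """title,score,time
Show HN: A toy project,12,2021-05-01T14:00:00Z
,3,2021-05-02T09:30:00Z
Another story,,2021-05-03T18:45:00Z
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
cleaned = clean_rows(rows)
print(len(rows), "rows read,", len(cleaned), "kept")
```

No deduplication or score adjustment is attempted, mirroring the minimal cleaning described above.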
| Statistic | Value |
|---|---|
| Total submissions | 4,010,957 |
| Date range | Oct 2006 to Jun 2023 |
| Unique submitters | 364,400 |
| External links | 3,767,011 (93.9%) |
| Self posts | 243,946 (6.1%) |
| Mean score | 15.18 |
| Median score | 2 |
| Standard deviation | 61.09 |
| Max score | 6,015 |
| Mean comments | 7.34 |
3.3 Score Distribution
Scores are heavily right-skewed:
- 79.9% of posts score 5 points or less
- 85.8% score 10 points or less
- 93.2% score under 50 points
- 6.8% reach 50 points or more
- 0.28% reach 500 points or more
We define "viral" as the top 1% of scores (>= 270 points). For reference, the top 5% begins at 76 points and the top 0.1% at 735 points.
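Summary figures of this kind can be computed with a few lines of standard-library Python. The sketch below uses a synthetic list of 100 scores, and the rule for the threshold (the score of the k-th highest post, with k the rounded fraction of the sample) is an assumption; the exact percentile convention behind the reported thresholds is not stated.

```python
from statistics import fmean, median

def share_at_most(scores, limit):
    """Fraction of posts whose score is at most `limit`."""
    return sum(1 for s in scores if s <= limit) / len(scores)

def top_fraction_threshold(scores, fraction):
    """Score of the k-th highest post, where k = round(fraction * n), at least 1."""
    ranked = sorted(scores, reverse=True)
    k = max(1, round(fraction * len(ranked)))
    return ranked[k - 1]

# Synthetic heavy-tailed scores for illustration only, not real data.
scores = [1] * 80 + [10] * 15 + [100] * 4 + [1000]

print("median:", median(scores))
print("mean:", fmean(scores))
print("share <= 5:", share_at_most(scores, 5))
print("top 1% starts at:", top_fraction_threshold(scores, 0.01))
print("top 5% starts at:", top_fraction_threshold(scores, 0.05))
```

Even in this toy sample the mean sits far above the median, which is the same qualitative pattern as the real distribution.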
4 Methodology
4.1 Engagement Lift
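For any feature F (keyword, time window, title pattern), we define lift as the percent difference between the mean score of posts with F and the overall mean score:

$$\mathrm{lift}(F) = \frac{\bar{s}_F - \bar{s}}{\bar{s}} \times 100\%$$

where $\bar{s}_F$ is the mean score of posts with feature F and $\bar{s} = 15.18$ is the overall mean. For example, Sunday posts have a mean score of 18.4, which gives a lift of about +21%.

To test significance, we compare titles that contain F against titles that do not using Welch's t-test and a normal approximation for p-values. We only report results with n >= 100 and p < 0.05.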
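A minimal sketch of the lift and Welch comparison follows. It assumes lift is measured against the overall mean score, an assumption that matches the lift values reported in the tables, and it uses the normal approximation for the two-sided p-value. The function names and the two score lists are illustrative, not taken from the repository.

```python
import math
from statistics import fmean, variance

def lift_percent(feature_scores, all_scores):
    """Percent difference between the feature-group mean and the overall mean."""
    overall = fmean(all_scores)
    return (fmean(feature_scores) - overall) / overall * 100.0

def welch_test(a, b):
    """Welch's t statistic with a two-sided p-value from the normal approximation."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = (fmean(a) - fmean(b)) / se
    p = math.erfc(abs(t) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|t|))
    return t, p

# Synthetic scores for illustration only, not real data.
with_f = [30, 50, 40, 60, 20]
without_f = [10, 12, 8, 11, 9, 10, 13, 7]

lift = lift_percent(with_f, with_f + without_f)
t_stat, p_value = welch_test(with_f, without_f)
print(f"lift = {lift:+.1f}%, t = {t_stat:.2f}, p = {p_value:.2g}")
```

The normal approximation is reasonable for the group sizes used in this paper (n >= 100); for samples as small as the toy lists here, a t distribution with Welch-Satterthwaite degrees of freedom would be the more careful choice.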
4.2 Temporal Analysis
Timestamps are converted to America/New_York for day-of-week and hour-of-day comparisons. We evaluate timing effects with one-way ANOVA using a 50k sample and 300 permutations to estimate p-values, and report eta^2 as effect size.
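The ANOVA step can be sketched as follows. The code computes the one-way F statistic, eta^2 as the between-group share of the total sum of squares, and a permutation p-value. The add-one estimator, the fixed seed, and the toy groups are assumptions made for illustration; the repository's exact procedure may differ.

```python
import random
from statistics import fmean

def anova_f_and_eta2(groups):
    """One-way ANOVA F statistic and eta^2 = SS_between / SS_total."""
    all_values = [x for g in groups for x in g]
    grand = fmean(all_values)
    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        m = fmean(g)
        ss_between += len(g) * (m - grand) ** 2
        ss_within += sum((x - m) ** 2 for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta2 = ss_between / (ss_between + ss_within)
    return f_stat, eta2

def permutation_p_value(groups, n_permutations=300, seed=0):
    """Share of label shuffles whose F is at least the observed F (add-one estimator)."""
    rng = random.Random(seed)
    observed, _ = anova_f_and_eta2(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        f_perm, _ = anova_f_and_eta2(shuffled)
        if f_perm >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)

# Synthetic groups for illustration only, not real data.
groups = [[1, 2, 3], [2, 3, 4], [10, 11, 12]]
f_stat, eta2 = anova_f_and_eta2(groups)
p_value = permutation_p_value(groups)
print(f"F = {f_stat:.2f}, eta^2 = {eta2:.3f}, p = {p_value:.4f}")
```

With 300 permutations and this estimator, the smallest p-value the sketch can return is 1/301, roughly 0.0033, so the resolution of the test is bounded by the permutation count.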
4.3 Correlation and Bucketing
Title length correlation is measured with Pearson and Spearman coefficients using a random 100k sample. Length buckets are fixed by character count and reported with 95% confidence intervals for the mean.
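The correlation and bucketing steps can be illustrated with standard-library code. The sketch below assumes 20-character buckets with a final open-ended "140+" bucket (matching the labels in the length table) and a normal-approximation 95% interval (mean plus or minus 1.96 standard errors). The helper names and sample values are hypothetical.

```python
import math
from statistics import fmean, stdev

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def length_bucket(title):
    """Bucket label by character count: '0-19', '20-39', ..., '140+'."""
    n = len(title)
    if n >= 140:
        return "140+"
    low = (n // 20) * 20
    return f"{low}-{low + 19}"

def mean_ci95(values):
    """Mean with a normal-approximation 95% confidence interval."""
    m = fmean(values)
    half = 1.96 * stdev(values) / math.sqrt(len(values))
    return m, m - half, m + half

# Synthetic values for illustration only, not real data.
lengths = [10, 20, 30, 40, 50]
toy_scores = [1, 2, 4, 8, 16]
print(pearson(lengths, toy_scores), spearman(lengths, toy_scores))
print(length_bucket("A short title"), mean_ci95([10, 12, 14]))
```

Pearson and Spearman can disagree in sign on heavy-tailed data, as they do in this paper, because Pearson is sensitive to a few extreme scores while Spearman depends only on rank order.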
5 Results: Lexical Features
5.1 High-Impact Keywords
The table below shows the strongest positive and negative lifts among unigrams with n >= 100 and p < 0.05. Negative tokens are dominated by non-English spam keywords; we show ASCII-only examples for readability.
| Keyword | Mean | Lift | n | p |
|---|---|---|---|---|
| youtube-dl | 200.6 | +1222% | 130 | 3.4e-05 |
| turbotax | 135.0 | +790% | 115 | 4.1e-05 |
| ublock | 109.0 | +619% | 174 | 5.7e-06 |
| factorio | 93.5 | +516% | 106 | 4.8e-04 |
| sci-hub | 91.8 | +505% | 265 | 2.6e-11 |
| s21 | 91.5 | +503% | 127 | <1e-16 |
| chanel | 1.0 | -93% | 103 | <1e-16 |
| een | 1.0 | -93% | 103 | <1e-16 |
| voor | 1.0 | -93% | 111 | <1e-16 |
| terbaru | 1.0 | -93% | 119 | <1e-16 |
| kanker | 1.0 | -93% | 154 | <1e-16 |
| obat | 1.0 | -93% | 587 | <1e-16 |
High-lift tokens are often specific tools, products, or batch tags (e.g., S21). Strong negative tokens are concentrated in non-English spam-like titles, which likely receive fewer votes rather than being penalized causally by the words themselves.
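A per-keyword table of this kind could be built roughly as follows. The tokenizer is an assumption: it lowercases titles and keeps inner hyphens so that hyphenated tool names survive as single tokens; the actual tokenization rule is not specified here. Each title counts a token at most once, and the titles and scores are synthetic.

```python
import re
from collections import defaultdict
from statistics import fmean

# Assumed tokenizer: lowercase alphanumeric runs, optionally joined by hyphens.
TOKEN_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")

def unigrams(title):
    """Distinct lowercase tokens in a title; inner hyphens are kept."""
    return set(TOKEN_RE.findall(title.lower()))

def keyword_table(posts, min_count=100):
    """posts: iterable of (title, score). Returns {token: (n, mean, lift_percent)}."""
    posts = list(posts)
    overall = fmean(score for _, score in posts)
    by_token = defaultdict(list)
    for title, score in posts:
        for tok in unigrams(title):
            by_token[tok].append(score)
    return {
        tok: (len(s), fmean(s), (fmean(s) - overall) / overall * 100.0)
        for tok, s in by_token.items()
        if len(s) >= min_count
    }

# Synthetic posts for illustration only, not real data.
toy_posts = [
    ("toolx-cli is back", 100),
    ("New toolx-cli release", 300),
    ("Unrelated headline", 1),
    ("Some other story", 3),
]
table = keyword_table(toy_posts, min_count=2)
print(table)
```

The `min_count` filter plays the role of the n >= 100 threshold; it is lowered to 2 here only so the toy data produces a row.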
6 Results: Temporal Patterns
6.1 Day-of-Week Effects
Weekend posts have higher mean scores (17.3 on Saturday and 18.4 on Sunday, versus 14.4 to 14.9 on weekdays), but the effect size is very small (ANOVA F = 3.95, p = 0.003, eta^2 = 0.00047).
| Day | Posts | Mean | Lift |
|---|---|---|---|
| Monday | 645,475 | 14.9 | -1.8% |
| Tuesday | 690,746 | 14.4 | -5.1% |
| Wednesday | 682,895 | 14.5 | -4.6% |
| Thursday | 668,107 | 14.4 | -4.8% |
| Friday | 585,556 | 14.6 | -3.5% |
| Saturday | 363,641 | 17.3 | +13.8% |
| Sunday | 374,537 | 18.4 | +21.5% |
6.2 Hour-of-Day Effects
Hour-of-day differences are not statistically significant (ANOVA p = 0.47). The best hour by mean score is 07:00 ET (mean 16.9), while the lowest is 02:00 ET (mean 13.9), but the effect size is negligible.
| Hour (ET) | Posts | Mean | Lift |
|---|---|---|---|
| 07:00 (highest) | 156,245 | 16.9 | +11.1% |
| 02:00 (lowest) | 106,270 | 13.9 | -8.5% |
7 Results: Title Structure
7.1 Length Effects
Title length has a weak relationship with score (Pearson r = -0.017, Spearman r = 0.048, n = 100k). Short titles (0-19 characters) have the highest mean score, while very long titles are rare and tend to underperform.
| Length | Mean | Lift | n |
|---|---|---|---|
| 0-19 chars | 18.8 | +24.1% | 232,743 |
| 20-39 chars | 15.9 | +4.9% | 1,098,009 |
| 40-59 chars | 14.1 | -7.2% | 1,489,189 |
| 60-79 chars | 15.2 | +0.1% | 1,119,888 |
| 80-99 chars | 14.2 | -6.4% | 70,607 |
| 100-119 chars | 5.7 | -62.2% | 379 |
| 120-139 chars | 4.8 | -68.1% | 106 |
| 140+ chars | 11.0 | -27.7% | 36 |
7.2 Format Patterns
| Format | Mean | Lift | n | p |
|---|---|---|---|---|
| Show HN: | 17.0 | +12.0% | 118,813 | <1e-16 |
| Ask HN: | 12.8 | -15.8% | 158,838 | <1e-16 |
| Tell HN: | 46.9 | +209.3% | 3,884 | <1e-16 |
| Question mark | 11.9 | -21.6% | 386,965 | <1e-16 |
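The format flags in the table can be derived from simple string checks. The sketch below is an assumption about how such flags might be computed: prefixes are matched case-insensitively at the start of the title, and "Question mark" is read as a title ending in "?". The repository may use different rules.

```python
def format_flags(title):
    """Boolean format features for a single title (assumed matching rules)."""
    stripped = title.strip()
    lower = stripped.lower()
    return {
        "show_hn": lower.startswith("show hn:"),
        "ask_hn": lower.startswith("ask hn:"),
        "tell_hn": lower.startswith("tell hn:"),
        "question": stripped.endswith("?"),
    }

# Synthetic titles for illustration only.
for example in ["Show HN: A toy project", "Ask HN: What do you use for backups?"]:
    print(example, format_flags(example))
```

Note that a single title can set more than one flag, as the second example does, so the rows of the format table are not mutually exclusive under these rules.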
8 Predictive Model
No supervised prediction model is trained in this repository, so we do not report accuracy metrics. This analysis is descriptive and intended to summarize empirical patterns.
9 Discussion
Several effects are statistically significant but practically small. Weekend posting shows a measurable lift, yet the effect size (eta^2) is near zero. Hour-of-day differences are not significant in aggregate.
Title length has a weak relationship with score. Very short titles slightly outperform the mean, while very long titles are rare and underperform, but correlations are close to zero.
Format signals matter: "Show HN:" is modestly positive, while "Ask HN:" and question titles trend lower. "Tell HN:" has a large lift but small sample size. External links also score slightly higher than self posts (mean 15.39 vs 11.91).
Year-over-year mean scores rise by roughly one point per year (Pearson r = 0.98 across yearly means), which is consistent with platform growth. Cross-year comparisons should account for this drift.
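A trend estimate of this form can be obtained with an ordinary least-squares fit over yearly means. The sketch below is illustrative: the five yearly values are synthetic, and the function name is hypothetical.

```python
import math
from statistics import fmean

def linear_trend(years, yearly_means):
    """Least-squares slope (points per year) and Pearson r of yearly means."""
    mx, my = fmean(years), fmean(yearly_means)
    sxy = sum((x - mx) * (y - my) for x, y in zip(years, yearly_means))
    sxx = sum((x - mx) ** 2 for x in years)
    syy = sum((y - my) ** 2 for y in yearly_means)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)

# Synthetic yearly means for illustration only, not real data.
toy_years = [2010, 2011, 2012, 2013, 2014]
toy_means = [5.0, 6.0, 7.5, 8.0, 9.5]

slope, r = linear_trend(toy_years, toy_means)
print(f"slope = {slope:.2f} points/year, r = {r:.3f}")
```

One way to account for the drift when comparing posts across years is to express each score relative to its own year's mean before computing lift.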
10 Limitations
Correlation is not causation. Keyword effects may capture topical differences rather than causal boosts. We do not control for author reputation or submission quality.
Snapshot scores. The dataset reflects scores at collection time, not necessarily final scores, and does not include comment velocity or moderation effects.
Multiple testing. We test many keywords. Although we require n >= 100 and p < 0.05, false discoveries remain possible without formal correction.
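One standard way to address this limitation is the Benjamini-Hochberg procedure, which controls the false discovery rate across many tests. It is not applied in this paper; the sketch below only shows what such a correction could look like, using made-up p-values.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans: True where the hypothesis is rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that p_(k) <= alpha * k / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected

# Made-up p-values for illustration only, not results from this paper.
toy_p = [0.001, 0.045, 0.035, 0.2, 0.012]
print(benjamini_hochberg(toy_p))
```

In this toy example an uncorrected p < 0.05 filter would keep four of the five tests, while the corrected procedure keeps two, which illustrates how a correction could shrink the list of reported keywords.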
11 Conclusion
Across 4,010,957 HN stories, we observe a heavy-tailed score distribution with a small fraction of posts capturing most attention. A few niche keywords show large lifts on small samples, format patterns and day-of-week timing show measurable but modest effects, and title length and hour-of-day are weak predictors. Presentation matters, but this analysis does not measure content quality, which likely remains the largest driver of success.
References
[1] HuggingFace dataset: julien040/hacker-news-posts.
[2] Hacker News API: github.com/HackerNews/API.
Code & Data: github.com/memvid/memvid
© 2025 Memvid