Attention Is All You Need:
What 4 Million Posts Reveal About Going Viral on HN & Product Hunt

Memvid Research Team

Memvid
contact@memvid.com

Abstract

We analyze 4,010,957 Hacker News story submissions (Oct 2006 to Jun 2023) with scores and timestamps. The score distribution is highly skewed: the median score is 2, the mean is 15.18, and 93.2% of posts remain below 50 points. The top 1% begins at 270 points (top 0.1% at 735). Keyword lift analysis on title unigrams (n >= 100, Welch's t-tests) finds large positive lifts for a small set of niche terms (e.g., "youtube-dl" +1222%, p = 3.4e-05) and large negative lifts for non-English spam tokens (around -93%). Day-of-week effects are statistically significant but tiny (ANOVA F = 3.95, p = 0.003, eta^2 = 0.00047), while hour-of-day effects are not significant (p = 0.47). Title length correlates weakly with score (Pearson r = -0.017, Spearman r = 0.048, n = 100k); very short titles (0-19 chars) have the highest mean score (18.8). Format patterns show significant shifts: "Show HN:" +12%, "Ask HN:" -16%, question titles -22%, and "Tell HN:" +209% on a comparatively small sample (n = 3,884). Results are descriptive and reflect platform dynamics rather than causal effects.

1 Introduction

Hacker News (HN) and Product Hunt are high-signal platforms where launches compete for a small number of front-page slots. Scores are extremely heavy-tailed: a minority of posts receive the majority of attention. For builders, the question is not only what to build, but how to present it in a way that resonates with the community.

This paper provides a reproducible, data-driven look at HN story submissions. We quantify distributional properties, keyword lift, timing effects, and title structure using a dataset of 4,010,957 stories from Oct 2006 to Jun 2023. Our focus is on descriptive statistics with transparent significance testing.

We make the following contributions:

A cleaned summary of 4,010,957 HN stories with timestamps and scores
Keyword lift analysis for unigrams (n >= 100) with significance tests
Day-of-week and hour-of-day timing patterns in US Eastern time
Title length and format pattern analysis with confidence intervals
Year-over-year trend estimates for mean score evolution

2 Related Work

Prior work on online popularity focuses on social platforms and large-scale diffusion models. In this analysis we restrict ourselves to HN stories and cite only the data sources used to build the dataset. The intent is to provide a reproducible baseline rather than a new predictive model.

3 Dataset Construction

3.1 Data Collection

We downloaded the HuggingFace dataset "julien040/hacker-news-posts" (stories only) and exported it to CSV via download_hn.py. The dataset contains titles, URLs, scores, timestamps, comment counts, and submitter usernames.

3.2 Data Cleaning

We performed minimal cleaning. Rows with missing title, score, or timestamp were removed (none in this dataset). We did not deduplicate URLs or adjust scores for time-on-site.

Statistic	Value
Total submissions	4,010,957
Date range	Oct 2006 to Jun 2023
Unique submitters	364,400
External links	3,767,011 (93.9%)
Self posts	243,946 (6.1%)
Mean score	15.18
Median score	2
Standard deviation	61.09
Max score	6,015
Mean comments	7.34

Table 1: Dataset summary statistics.

3.3 Score Distribution

Scores are heavily right-skewed:

79.9% of posts score 5 points or less
85.8% score 10 points or less
93.2% score under 50 points
6.8% reach 50 points or more
0.28% reach 500 points or more

We define "viral" as the top 1% of scores (>= 270 points), with top 5% at 76 points and top 0.1% at 735 points.
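The cutoffs above can be reproduced from a plain list of scores. The following is a minimal sketch, not the repository's code: the function names `top_threshold` and `share_at_most` are hypothetical, and the "minimum score among the top k posts" convention for a percentile cutoff is an assumption about how the thresholds were defined.

```python
from statistics import mean, median


def top_threshold(scores, top_fraction):
    """Lowest score among the top `top_fraction` of posts.

    Assumption: the cutoff is the minimum score of the k highest-scoring
    posts, where k = round(top_fraction * n), with at least one post.
    """
    ordered = sorted(scores, reverse=True)
    k = max(1, round(top_fraction * len(ordered)))
    return ordered[k - 1]


def share_at_most(scores, cutoff):
    """Fraction of posts whose score is <= cutoff."""
    return sum(1 for s in scores if s <= cutoff) / len(scores)


if __name__ == "__main__":
    # Toy data only; the real input would be the score column of the dataset.
    toy_scores = [1, 1, 1, 2, 2, 3, 5, 8, 40, 300]
    print("mean:", mean(toy_scores), "median:", median(toy_scores))
    print("top 10% cutoff:", top_threshold(toy_scores, 0.10))
    print("share scoring <= 5:", share_at_most(toy_scores, 5))
```

On the full dataset, `top_threshold(scores, 0.01)` would correspond to the "viral" cutoff defined above, provided the same convention was used.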

4 Methodology

4.1 Engagement Lift

For any feature F (keyword, time window, title pattern), we define lift as:

\text{Lift}(F) = \frac{\bar{x}_F - \bar{x}_{\neg F}}{\bar{x}_{\neg F}} \times 100\% \tag{1}

We compare titles that contain F against titles that do not using Welch's t-test and a normal approximation for p-values. We only report results with n >= 100 and p < 0.05.
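As an illustration of Equation (1) and the test described above, here is a minimal standard-library sketch. The function names are hypothetical, and the two-sided p-value refers the Welch statistic to a standard normal distribution, which is our reading of "a normal approximation for p-values".

```python
import math
from statistics import mean, variance


def lift_percent(with_f, without_f):
    """Equation (1): relative difference in mean score, in percent."""
    base = mean(without_f)
    return (mean(with_f) - base) / base * 100.0


def welch_normal_p(with_f, without_f):
    """Welch's t statistic with a two-sided normal-approximation p-value.

    Uses unequal-variance standard errors; the p-value is
    2 * (1 - Phi(|t|)), computed as erfc(|t| / sqrt(2)).
    """
    se = math.sqrt(
        variance(with_f) / len(with_f) + variance(without_f) / len(without_f)
    )
    t = (mean(with_f) - mean(without_f)) / se
    p = math.erfc(abs(t) / math.sqrt(2.0))
    return t, p


def report(with_f, without_f, min_n=100, alpha=0.05):
    """Return (lift, p) only if the feature passes both reporting filters."""
    if len(with_f) < min_n:
        return None
    _, p = welch_normal_p(with_f, without_f)
    return (lift_percent(with_f, without_f), p) if p < alpha else None
```

With group sizes in the hundreds or more, the normal approximation and the exact t distribution give nearly identical p-values, which is presumably why the approximation is acceptable under the n >= 100 filter.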

4.2 Temporal Analysis

Timestamps are converted to America/New_York for day-of-week and hour-of-day comparisons. We evaluate timing effects with one-way ANOVA on a 50k-post sample, estimate p-values from 300 label permutations, and report eta^2 as the effect size.
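The permutation ANOVA described above can be sketched as follows. This is an illustrative standard-library version with hypothetical function names; it assumes group labels (weekday or hour) have already been derived from the converted timestamps, and the `(hits + 1) / (n_perm + 1)` p-value convention is an assumption, not something the text specifies.

```python
import random
from statistics import mean


def anova_f_eta2(values, labels):
    """One-way ANOVA F statistic and eta^2 = SS_between / SS_total."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_within = ss_total - ss_between
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, ss_between / ss_total


def permutation_p(values, labels, n_perm=300, seed=0):
    """Share of label shuffles whose F is at least the observed F."""
    rng = random.Random(seed)
    observed, _ = anova_f_eta2(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real link between label and score
        f_perm, _ = anova_f_eta2(values, shuffled)
        if f_perm >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Under this convention the smallest attainable p-value with 300 permutations is 1/301, so very small p-values cannot be resolved without more permutations.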

4.3 Correlation and Bucketing

Title length correlation is measured with Pearson and Spearman coefficients using a random 100k sample. Length buckets are fixed by character count and reported with 95% confidence intervals for the mean.
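A minimal sketch of these measurements, using only the standard library. The helper names are hypothetical; the 20-character bucket width and the 140+ cap mirror the buckets reported in the results, and the confidence interval uses the normal-approximation form mean +/- 1.96 * s / sqrt(n), which is an assumption about how the intervals were computed.

```python
import math
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def length_bucket(title, width=20, cap=140):
    """Map a title to a fixed character-count bucket such as '20-39'."""
    n = len(title)
    if n >= cap:
        return f"{cap}+"
    low = (n // width) * width
    return f"{low}-{low + width - 1}"


def mean_ci95(scores):
    """Mean with a normal-approximation 95% confidence interval."""
    m = mean(scores)
    half = 1.96 * stdev(scores) / math.sqrt(len(scores))
    return m, m - half, m + half
```

Because scores contain many ties (the median is 2), average ranks matter for the Spearman coefficient; a rank routine that ignores ties would give a different value.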

5 Results: Lexical Features

5.1 High-Impact Keywords

Table 2 shows the strongest positive and negative lifts among unigrams with n >= 100 and p < 0.05. Negative tokens are dominated by non-English spam keywords; we show ASCII-only examples for readability.

Keyword	Mean	Lift	n	p
youtube-dl	200.6	+1222%	130	3.4e-05
turbotax	135.0	+790%	115	4.1e-05
ublock	109.0	+619%	174	5.7e-06
factorio	93.5	+516%	106	4.8e-04
sci-hub	91.8	+505%	265	2.6e-11
s21	91.5	+503%	127	<1e-16
chanel	1.0	-93%	103	<1e-16
een	1.0	-93%	103	<1e-16
voor	1.0	-93%	111	<1e-16
terbaru	1.0	-93%	119	<1e-16
kanker	1.0	-93%	154	<1e-16
obat	1.0	-93%	587	<1e-16

Table 2: Top keyword lifts (unigrams, n >= 100, p < 0.05). Mean is average score.

High-lift tokens are often specific tools, products, or batch tags (e.g., S21). Strong negative tokens are concentrated in non-English, spam-like titles. These posts likely attract few votes because of what they are, not because the words themselves cause a penalty.

6 Results: Temporal Patterns

6.1 Day-of-Week Effects

Weekend posts have higher mean scores, but the effect size is very small (ANOVA F = 3.95, p = 0.003, eta^2 = 0.00047).

Day	Posts	Mean	Lift
Monday	645,475	14.9	-1.8%
Tuesday	690,746	14.4	-5.1%
Wednesday	682,895	14.5	-4.6%
Thursday	668,107	14.4	-4.8%
Friday	585,556	14.6	-3.5%
Saturday	363,641	17.3	+13.8%
Sunday	374,537	18.4	+21.5%

Table 3: Engagement by day of week (America/New_York).

6.2 Hour-of-Day Effects

Hour-of-day differences are not statistically significant (ANOVA p = 0.47). The best hour by mean score is 07:00 ET (mean 16.9), while the lowest is 02:00 ET (mean 13.9), but the effect size is negligible.

Hour (ET)	Posts	Mean	Lift
07:00 (highest)	156,245	16.9	+11.1%
02:00 (lowest)	106,270	13.9	-8.5%

Table 4: Hour-of-day extremes (descriptive only; overall test not significant).

7 Results: Title Structure

7.1 Length Effects

Title length has a weak relationship with score: Pearson r = -0.017 and Spearman r = 0.048 (n = 100k) are both close to zero and differ in sign. Short titles (0-19 characters) have the highest mean score, while titles of 100 characters or more are rare (521 posts in total) and tend to underperform.

Length	Mean	Lift	n
0-19 chars	18.8	+24.1%	232,743
20-39 chars	15.9	+4.9%	1,098,009
40-59 chars	14.1	-7.2%	1,489,189
60-79 chars	15.2	+0.1%	1,119,888
80-99 chars	14.2	-6.4%	70,607
100-119 chars	5.7	-62.2%	379
120-139 chars	4.8	-68.1%	106
140+ chars	11.0	-27.7%	36

Table 5: Engagement by title length bucket (characters).

7.2 Format Patterns

Format	Mean	Lift	n	p
Show HN:	17.0	+12.0%	118,813	<1e-16
Ask HN:	12.8	-15.8%	158,838	<1e-16
Tell HN:	46.9	+209.3%	3,884	<1e-16
Question mark	11.9	-21.6%	386,965	<1e-16

Table 6: Engagement by title format pattern.
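The format patterns in Table 6 can be detected with simple string checks. The sketch below is illustrative only: `title_formats` is a hypothetical helper, the prefixes are matched case-sensitively at the start of the title, and treating any title that contains "?" as a question title is an assumption about how the "Question mark" pattern was defined.

```python
PREFIXES = ("Show HN:", "Ask HN:", "Tell HN:")


def title_formats(title):
    """Return the set of format patterns that a title matches.

    A title can match more than one pattern, for example an
    "Ask HN:" post that is also phrased as a question.
    """
    text = title.strip()
    matched = {prefix for prefix in PREFIXES if text.startswith(prefix)}
    if "?" in text:  # assumption: any question mark counts
        matched.add("Question mark")
    return matched
```

Since the patterns overlap, the rows of Table 6 are not mutually exclusive groups; each row compares titles with the pattern against titles without it.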

8 Predictive Model

No supervised prediction model is trained in this repository, so we do not report accuracy metrics. This analysis is descriptive and intended to summarize empirical patterns.

9 Discussion

Several effects are statistically significant but practically small. Weekend posting shows a measurable lift, yet the effect size (eta^2) is near zero. Hour-of-day differences are not significant in aggregate.

Title length has a weak relationship with score. Titles under 20 characters outperform the overall mean by about 24%, while very long titles are rare and underperform, but the correlations are close to zero.

Format signals matter: "Show HN:" is modestly positive, while "Ask HN:" and question titles trend lower. "Tell HN:" has a large lift but a comparatively small sample (n = 3,884). External links also score higher than self posts (mean 15.39 vs 11.91).

Year-over-year mean scores rise roughly one point per year (Pearson r = 0.98 across yearly means), which is consistent with platform growth. Cross-year comparisons should account for this drift.
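The yearly trend can be estimated by averaging scores per calendar year and fitting a least-squares line to those means. This is a minimal sketch with hypothetical function names; it assumes timestamps are Unix seconds and groups by UTC year, neither of which is confirmed by the text.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean


def yearly_means(rows):
    """Mean score per calendar year from (unix_seconds, score) pairs."""
    by_year = defaultdict(list)
    for ts, score in rows:
        year = datetime.fromtimestamp(ts, tz=timezone.utc).year
        by_year[year].append(score)
    return {year: mean(scores) for year, scores in sorted(by_year.items())}


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (score points per year here)."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

Calling `ols_slope(list(means), list(means.values()))` on the output of `yearly_means` gives the drift in points per year that cross-year comparisons would need to account for.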

10 Limitations

Correlation is not causation. Keyword effects may capture topical differences rather than causal boosts. We do not control for author reputation or submission quality.

Snapshot scores. The dataset reflects scores at collection time, not necessarily final scores, and does not include comment velocity or moderation effects.

Multiple testing. We test many keywords. Although we require n >= 100 and p < 0.05, false discoveries remain possible without formal correction.
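For readers who want to apply a formal correction to the keyword p-values, one standard option is the Benjamini-Hochberg procedure, which controls the false discovery rate. This is not part of the analysis reported here. The sketch below is a generic implementation with a hypothetical function name.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Sort the p-values, find the largest rank k with
    p_(k) <= alpha * k / m, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected
```

The procedure needs the p-values of all tested keywords, including those that failed the p < 0.05 filter, so it would have to run before that filter rather than on the reported table alone.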

11 Conclusion

Across 4,010,957 HN stories, we observe a heavy-tailed score distribution with a small fraction of posts capturing most attention. Keyword choices, format patterns, and day-of-week timing show measurable but modest effects, while title length and hour-of-day are weak predictors. Presentation is associated with measurable differences, but these results are descriptive: we do not measure content quality, which plausibly remains the largest driver of success.

References

[1] HuggingFace dataset: julien040/hacker-news-posts.

[2] Hacker News API: github.com/HackerNews/API.

Code & Data: github.com/memvid/memvid
