
Attention Is All You Need:
What 4 Million Posts Reveal About Going Viral on HN & Product Hunt

Memvid Research Team
Memvid
contact@memvid.com
Abstract

We analyze 4,010,957 Hacker News story submissions (Oct 2006 to Jun 2023) with scores and timestamps. The score distribution is highly skewed: the median score is 2, the mean is 15.18, and 93.2% of posts remain below 50 points. The top 1% begins at 270 points (top 0.1% at 735). Keyword lift analysis on title unigrams (n >= 100, Welch tests) finds large positive lifts for a small set of niche terms (e.g., "youtube-dl" +1222%, p = 3.4e-05) and large negative lifts for non-English spam tokens (around -93%). Day-of-week effects are statistically significant but tiny (ANOVA F = 3.95, p = 0.003, eta^2 = 0.00047), while hour-of-day is not significant (p = 0.47). Title length correlates weakly with score (Pearson r = -0.017, Spearman r = 0.048, n = 100k); very short titles (0-19 chars) have the highest mean score (18.8). Format patterns show significant shifts: "Show HN:" +12%, "Ask HN:" -16%, question titles -22%, and "Tell HN:" +209% with small n (3,884). Results are descriptive and reflect platform dynamics rather than causal effects.

1  Introduction

Hacker News (HN) and Product Hunt are high-signal platforms where launches compete for a small number of front-page slots. Scores are extremely heavy-tailed: a minority of posts receive the majority of attention. For builders, the question is not only what to build, but how to present it in a way that resonates with the community.

This paper provides a reproducible, data-driven look at HN story submissions. We quantify distributional properties, keyword lift, timing effects, and title structure using a dataset of 4,010,957 stories from Oct 2006 to Jun 2023. Our focus is on descriptive statistics with transparent significance testing.

We make the following contributions:

  • A cleaned summary of 4,010,957 HN stories with timestamps and scores
  • Keyword lift analysis for unigrams (n >= 100) with significance tests
  • Day-of-week and hour-of-day timing patterns in US Eastern time
  • Title length and format pattern analysis with confidence intervals
  • Year-over-year trend estimates for mean score evolution

2  Related Work

Prior work on online popularity focuses on social platforms and large-scale diffusion models. In this analysis we restrict ourselves to HN stories and cite only the data sources used to build the dataset. The intent is to provide a reproducible baseline rather than a new predictive model.

3  Dataset Construction

3.1  Data Collection

We downloaded the HuggingFace dataset "julien040/hacker-news-posts" [1] (stories only) and exported it to CSV via download_hn.py. The dataset contains titles, URLs, scores, timestamps, comment counts, and submitter usernames.

3.2  Data Cleaning

We performed minimal cleaning. Rows with missing title, score, or timestamp were removed (none in this dataset). We did not deduplicate URLs or adjust scores for time-on-site.

Statistic             Value
Total submissions     4,010,957
Date range            Oct 2006 to Jun 2023
Unique submitters     364,400
External links        3,767,011 (93.9%)
Self posts            243,946 (6.1%)
Mean score            15.18
Median score          2
Standard deviation    61.09
Max score             6,015
Mean comments         7.34
Table 1: Dataset summary statistics.

3.3  Score Distribution

Scores are heavily right-skewed:

  • 79.9% of posts score 5 points or less
  • 85.8% score 10 points or less
  • 93.2% score under 50 points
  • 6.8% reach 50 points or more
  • 0.28% reach 500 points or more

We define "viral" as the top 1% of scores (>= 270 points). For reference, the top 5% begins at 76 points and the top 0.1% at 735 points.
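
The cutoffs and cumulative shares above can be reproduced from a plain list of scores. The following is a minimal sketch using only the Python standard library; the function names and the toy scores are hypothetical and are not taken from the repository.

```python
import math


def percentile_threshold(scores, top_fraction):
    """Lowest score that still falls inside the top `top_fraction` of posts."""
    ordered = sorted(scores, reverse=True)
    # Number of posts in the top group, at least one.
    k = max(1, math.ceil(top_fraction * len(ordered)))
    return ordered[k - 1]


def share_at_most(scores, cutoff):
    """Fraction of posts whose score is <= cutoff."""
    return sum(s <= cutoff for s in scores) / len(scores)


# Toy, heavily right-skewed sample (illustrative values only).
toy_scores = [1] * 80 + [8] * 6 + [30] * 7 + [100] * 6 + [600]

viral_cutoff = percentile_threshold(toy_scores, 0.01)
share_5_or_less = share_at_most(toy_scores, 5)
```

On the real data, `percentile_threshold(scores, 0.01)` would correspond to the "viral" cutoff as defined here, under the assumption that ties at the boundary are included in the top group.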

4  Methodology

4.1  Engagement Lift

For any feature F (keyword, time window, title pattern), we define lift as:

\[
\text{Lift}(F) = \frac{\bar{x}_F - \bar{x}_{\neg F}}{\bar{x}_{\neg F}} \times 100\% \tag{1}
\]

Here $\bar{x}_F$ is the mean score of titles that contain F and $\bar{x}_{\neg F}$ is the mean score of titles that do not. We compare the two groups with Welch's t-test, using a normal approximation for p-values, and report only results with n >= 100 and p < 0.05.
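
The lift definition and the Welch comparison can be sketched in a few lines of Python. This is an illustration of the stated method using only the standard library, not the repository's implementation; the name `lift_and_welch` and the toy score lists are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def lift_and_welch(with_f, without_f):
    """Return (lift in percent, Welch t statistic, two-sided p-value).

    The p-value uses a standard normal approximation to the t distribution,
    which is reasonable when both groups are large (n >= 100 in the paper).
    """
    m_f, m_not = mean(with_f), mean(without_f)
    lift = (m_f - m_not) / m_not * 100.0
    # Welch standard error: the two variances are not pooled.
    se = math.sqrt(variance(with_f) / len(with_f)
                   + variance(without_f) / len(without_f))
    t = (m_f - m_not) / se
    p = 2.0 * NormalDist().cdf(-abs(t))
    return lift, t, p


# Toy groups (illustrative values only, far smaller than n >= 100).
lift, t_stat, p_value = lift_and_welch([10, 20, 30, 40], [5, 10, 15, 10])
```

Computing the tail as `cdf(-abs(t))` rather than `1 - cdf(abs(t))` avoids losing precision for very large t, although extremely small p-values still underflow to zero in floating point.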

4.2  Temporal Analysis

Timestamps are converted to America/New_York for day-of-week and hour-of-day comparisons. We evaluate timing effects with one-way ANOVA using a 50k sample and 300 permutations to estimate p-values, and report eta^2 as effect size.
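
A rough sketch of this procedure follows, using only the Python standard library. It assumes timestamps are Unix seconds in UTC, which the source does not state, and all function names are hypothetical. The permutation test shuffles group labels and counts how often the shuffled F statistic reaches the observed one.

```python
import random
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def eastern_weekday(unix_seconds):
    """Weekday (0 = Monday) in America/New_York; input assumed to be UTC seconds."""
    utc = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    return utc.astimezone(ZoneInfo("America/New_York")).weekday()


def anova_f_eta2(scores, groups):
    """One-way ANOVA F statistic and eta^2 = SS_between / SS_total."""
    grand = sum(scores) / len(scores)
    by_group = {}
    for s, g in zip(scores, groups):
        by_group.setdefault(g, []).append(s)
    ss_between = sum(len(v) * (sum(v) / len(v) - grand) ** 2
                     for v in by_group.values())
    ss_total = sum((s - grand) ** 2 for s in scores)
    ss_within = ss_total - ss_between
    df_between = len(by_group) - 1
    df_within = len(scores) - len(by_group)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, ss_between / ss_total


def permutation_p(scores, groups, n_perm=300, seed=0):
    """Permutation p-value for the ANOVA F statistic."""
    rng = random.Random(seed)
    observed, _ = anova_f_eta2(scores, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        f_perm, _eta = anova_f_eta2(scores, shuffled)
        if f_perm >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

With 300 permutations and the add-one correction used in this sketch, the smallest attainable p-value is 1/301, roughly 0.0033.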

4.3  Correlation and Bucketing

Title length correlation is measured with Pearson and Spearman coefficients using a random 100k sample. Length buckets are fixed by character count and reported with 95% confidence intervals for the mean.
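
The correlation and bucketing steps can be illustrated as follows. This is a standard-library sketch under stated assumptions, not the repository's code: the function names are hypothetical, buckets are 20 characters wide with everything from 140 characters upward merged, and the interval uses the normal approximation (mean plus or minus 1.96 standard errors).

```python
import math
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x)
                           * sum((b - my) ** 2 for b in y))


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def bucket_means(titles, scores, width=20, cap=140):
    """Map bucket start -> (mean, ci_low, ci_high, n) by title length."""
    buckets = {}
    for title, score in zip(titles, scores):
        start = min(len(title) // width * width, cap)
        buckets.setdefault(start, []).append(score)
    result = {}
    for start, vals in sorted(buckets.items()):
        m = mean(vals)
        # A single observation has no standard error.
        half = 1.96 * stdev(vals) / math.sqrt(len(vals)) if len(vals) > 1 else float("nan")
        result[start] = (m, m - half, m + half, len(vals))
    return result
```

Pearson and Spearman can disagree in sign on heavy-tailed data because a few extreme scores dominate the former while the latter depends only on ranks.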

5  Results: Lexical Features

5.1  High-Impact Keywords

Table 2 shows the strongest positive and negative lifts among unigrams with n >= 100 and p < 0.05. Negative tokens are dominated by non-English spam keywords; we show ASCII-only examples for readability.

Keyword       Mean     Lift      n     p
youtube-dl    200.6    +1222%    130   3.4e-05
turbotax      135.0    +790%     115   4.1e-05
ublock        109.0    +619%     174   5.7e-06
factorio      93.5     +516%     106   4.8e-04
sci-hub       91.8     +505%     265   2.6e-11
s21           91.5     +503%     127   <1e-16
chanel        1.0      -93%      103   <1e-16
een           1.0      -93%      103   <1e-16
voor          1.0      -93%      111   <1e-16
terbaru       1.0      -93%      119   <1e-16
kanker        1.0      -93%      154   <1e-16
obat          1.0      -93%      587   <1e-16
Table 2: Top keyword lifts (unigrams, n >= 100, p < 0.05). Mean is average score.

High-lift tokens are often specific tools, products, or batch tags (e.g., S21). Strong negative tokens are concentrated in non-English, spam-like titles. These posts likely receive few votes because of what they are, not because the words themselves causally depress scores.

6  Results: Temporal Patterns

6.1  Day-of-Week Effects

Weekend posts have higher mean scores, but the effect size is very small (ANOVA F = 3.95, p = 0.003, eta^2 = 0.00047).

Day          Posts      Mean    Lift
Monday       645,475    14.9    -1.8%
Tuesday      690,746    14.4    -5.1%
Wednesday    682,895    14.5    -4.6%
Thursday     668,107    14.4    -4.8%
Friday       585,556    14.6    -3.5%
Saturday     363,641    17.3    +13.8%
Sunday       374,537    18.4    +21.5%
Table 3: Engagement by day of week (America/New_York).

6.2  Hour-of-Day Effects

Hour-of-day differences are not statistically significant (ANOVA p = 0.47). The best hour by mean score is 07:00 ET (mean 16.9), while the lowest is 02:00 ET (mean 13.9), but the effect size is negligible.

Hour (ET)          Posts      Mean    Lift
07:00 (highest)    156,245    16.9    +11.1%
02:00 (lowest)     106,270    13.9    -8.5%
Table 4: Hour-of-day extremes (descriptive only; overall test not significant).

7  Results: Title Structure

7.1  Length Effects

Title length has a weak relationship with score (Pearson r = -0.017, Spearman r = 0.048, n = 100k); the two coefficients differ in sign, and both are close to zero. The shortest titles (0-19 characters) have the highest mean score, while titles of 100 characters or more are rare (521 posts in total) and underperform.

Length           Mean    Lift      n
0-19 chars       18.8    +24.1%    232,743
20-39 chars      15.9    +4.9%     1,098,009
40-59 chars      14.1    -7.2%     1,489,189
60-79 chars      15.2    +0.1%     1,119,888
80-99 chars      14.2    -6.4%     70,607
100-119 chars    5.7     -62.2%    379
120-139 chars    4.8     -68.1%    106
140+ chars       11.0    -27.7%    36
Table 5: Engagement by title length bucket (characters).

7.2  Format Patterns

Format           Mean    Lift       n          p
Show HN:         17.0    +12.0%     118,813    <1e-16
Ask HN:          12.8    -15.8%     158,838    <1e-16
Tell HN:         46.9    +209.3%    3,884      <1e-16
Question mark    11.9    -21.6%     386,965    <1e-16
Table 6: Engagement by title format pattern.
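
One plausible way to derive these format flags from a title is shown below. The source does not specify the exact matching rules, so this sketch assumes case-insensitive prefix matching for the "HN:" formats and treats any title containing "?" as a question title. The function name is hypothetical.

```python
def title_formats(title):
    """Return boolean format flags for a single title (assumed matching rules)."""
    stripped = title.strip()
    lowered = stripped.lower()
    return {
        "show_hn": lowered.startswith("show hn:"),
        "ask_hn": lowered.startswith("ask hn:"),
        "tell_hn": lowered.startswith("tell hn:"),
        # Assumption: a question mark anywhere in the title counts.
        "question": "?" in stripped,
    }


flags = title_formats("Ask HN: What are you working on?")
```

Note that under these rules a title can carry several flags at once, as in the example, so the rows of Table 6 need not be mutually exclusive.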

8  Predictive Model

No supervised prediction model is trained in this repository, so we do not report accuracy metrics. This analysis is descriptive and intended to summarize empirical patterns.

9  Discussion

Several effects are statistically significant but practically small. Weekend posting shows a measurable lift, yet the effect size (eta^2) is near zero. Hour-of-day differences are not significant in aggregate.

Title length has a weak relationship with score. The shortest titles (0-19 characters) outperform the mean by about 24%, while very long titles are rare and underperform, but the correlations are close to zero.

Format signals matter: "Show HN:" is modestly positive, while "Ask HN:" and question titles trend lower. "Tell HN:" has a large lift but a small sample size. External links also score higher than self posts (mean 15.39 vs 11.91).

Year-over-year mean scores rise by roughly one point per year (Pearson r = 0.98 across yearly means), which is consistent with platform growth. Cross-year comparisons should account for this drift.
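
The yearly trend can be estimated by averaging scores per year and fitting a least-squares line to the yearly means. The sketch below is illustrative only: the function names and the toy data are hypothetical, and the source does not state how the trend was actually computed.

```python
import math
from collections import defaultdict


def yearly_means(years, scores):
    """Mean score per calendar year, keyed by year in ascending order."""
    totals = defaultdict(lambda: [0.0, 0])
    for year, score in zip(years, scores):
        totals[year][0] += score
        totals[year][1] += 1
    return {year: total / count for year, (total, count) in sorted(totals.items())}


def linear_trend(year_to_mean):
    """Least-squares slope (points per year) and Pearson r across yearly means."""
    xs = list(year_to_mean.keys())
    ys = list(year_to_mean.values())
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


# Toy data (illustrative values only).
toy_years = [2010, 2010, 2011, 2012, 2012, 2013, 2014]
toy_scores = [4, 6, 6, 7, 8, 8, 9]
slope, r = linear_trend(yearly_means(toy_years, toy_scores))
```

Because the correlation is computed over a handful of yearly means rather than individual posts, a value near 1 describes the smoothness of the trend, not how well year predicts any single post's score.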

10  Limitations

Correlation is not causation. Keyword effects may capture topical differences rather than causal boosts. We do not control for author reputation or submission quality.

Snapshot scores. The dataset reflects scores at collection time, not necessarily final scores, and does not include comment velocity or moderation effects.

Multiple testing. We test many keywords. Although we require n >= 100 and p < 0.05, false discoveries remain possible without formal correction.
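
For readers who want to apply a formal correction, one common option is the Benjamini-Hochberg procedure, which controls the false discovery rate. It is not part of this paper's analysis; the sketch below is a generic standard-library illustration with a hypothetical function name and toy p-values.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Sort p-values ascending, find the largest rank k with
    p_(k) <= alpha * k / m, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=p_values.__getitem__)
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Toy p-values (illustrative only).
decisions = benjamini_hochberg([0.001, 0.04, 0.03, 0.2])
```

In this toy case, two p-values that pass an uncorrected p < 0.05 filter (0.03 and 0.04) are no longer rejected after correction.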

11  Conclusion

Across 4,010,957 HN stories, we observe a heavy-tailed score distribution in which a small fraction of posts captures most of the attention. A few niche keywords show large lifts, format patterns and day-of-week timing show measurable but mostly modest effects, and title length and hour-of-day are weak predictors. Presentation matters at the margin; content quality, which this analysis does not measure, likely remains the larger driver of success.

References

[1] HuggingFace dataset: julien040/hacker-news-posts.

[2] Hacker News API: github.com/HackerNews/API.

Code & Data: github.com/memvid/memvid

© 2025 Memvid