Self-host Reddit – 2.38B posts, works offline, yours forever

  • 19_84@lemmy.dbzer0.com wrote (#1):

    Reddit's API is effectively dead for archival. Third-party apps are gone. Reddit has threatened to cut off access to the Pushshift dataset multiple times. But 3.28TB of Reddit history exists as a torrent right now, and I built a tool to turn it into something you can browse on your own hardware.

    The key point: This doesn't touch Reddit's servers. Ever. Download the Pushshift dataset, run my tool locally, get a fully browsable archive. Works on an air-gapped machine. Works on a Raspberry Pi serving your LAN. Works on a USB drive you hand to someone.

    What it does: Takes compressed data dumps from Reddit (.zst), Voat (SQL), and Ruqqus (.7z) and generates static HTML. No JavaScript, no external requests, no tracking. Open index.html and browse. Want search? Run the optional Docker stack with PostgreSQL – still entirely on your machine.
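
    A minimal sketch of the dump-to-static-HTML step, not the tool's actual code. It assumes the dump has already been decompressed into newline-delimited JSON and that each record carries the usual Pushshift submission fields (id, title, selftext, author, subreddit). The names render_post and build_archive are hypothetical.

```python
import html
import json
from pathlib import Path

# Page template: plain HTML only, no scripts and no external assets.
PAGE = """<!doctype html>
<html><head><meta charset="utf-8"><title>{title}</title></head>
<body>
<h1>{title}</h1>
<p>r/{subreddit} · u/{author}</p>
<pre>{body}</pre>
<p><a href="index.html">Back to index</a></p>
</body></html>
"""


def render_post(post: dict) -> str:
    """Render one submission as a standalone HTML page."""
    return PAGE.format(
        title=html.escape(post.get("title", "(untitled)")),
        subreddit=html.escape(post.get("subreddit", "unknown")),
        author=html.escape(post.get("author", "[deleted]")),
        body=html.escape(post.get("selftext", "")),
    )


def build_archive(lines, out_dir) -> int:
    """Write one page per NDJSON line plus an index.html; return the post count."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    links = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the dump
        post = json.loads(line)
        name = f"{post['id']}.html"
        (out / name).write_text(render_post(post), encoding="utf-8")
        label = html.escape(post.get("title", name))
        links.append(f'<li><a href="{name}">{label}</a></li>')
    index = (
        '<!doctype html>\n<html><head><meta charset="utf-8">'
        "<title>Archive</title></head><body><ul>\n"
        + "\n".join(links)
        + "\n</ul></body></html>\n"
    )
    (out / "index.html").write_text(index, encoding="utf-8")
    return len(links)
```

    Any iterable of text lines works as input, so a real .zst dump could be fed in through a streaming decompressor (for example the third-party zstandard package) without unpacking it to disk first.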

    API & AI Integration: Full REST API with 30+ endpoints – posts, comments, users, subreddits, full-text search, aggregations. Also ships with an MCP server (29 tools) so you can query your archive directly from AI tools.

    Self-hosting options:

    • USB drive / local folder (just open the HTML files)
    • Home server on your LAN
    • Tor hidden service (2 commands, no port forwarding needed)
    • VPS with HTTPS
    • GitHub Pages for small archives
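
    For the home-server option, a static archive only needs a plain file server. The sketch below uses nothing but the Python standard library; the function name serve_archive, the folder name and the port are assumptions for illustration, not part of the tool.

```python
import functools
import http.server
import threading


def serve_archive(directory: str, host: str = "0.0.0.0", port: int = 8080):
    """Serve a static archive folder over HTTP from a background thread.

    Returns the server object; call .shutdown() on it to stop serving.
    """
    # Bind the handler to the archive folder instead of the current directory.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer((host, port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

    Calling serve_archive("archive") on the machine that holds the generated files makes them reachable from other devices on the LAN at that machine's address on port 8080. The process has to stay alive for the server to keep running, since the serving thread is a daemon thread.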

    Why this matters: Once you have the data, you own it. No API keys, no rate limits, no ToS changes can take it away.

    Scale: Tens of millions of posts per instance. The PostgreSQL backend keeps memory usage constant regardless of dataset size. For the full 2.38B-post dataset, split the data across multiple instances by topic.
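
    A rough illustration of the two ideas in the scale note: processing records in fixed-size batches so memory stays bounded, and routing posts to per-topic instances. This is a general sketch, not how the tool itself is implemented. The TOPIC_OF mapping and the instance names are made up for the example.

```python
from itertools import islice
from typing import Iterable, Iterator


def batched(items: Iterable, size: int) -> Iterator[list]:
    """Yield lists of at most `size` items without loading the whole input."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# Hypothetical topic map: which instance holds which subreddit.
TOPIC_OF = {"selfhosted": "tech", "linux": "tech", "history": "humanities"}


def instance_for(post: dict, default: str = "misc") -> str:
    """Pick the instance a post belongs to, based on its subreddit name."""
    return TOPIC_OF.get(post.get("subreddit", "").lower(), default)
```

    With this shape, only one batch is held in memory at a time, so peak usage depends on the batch size rather than on the size of the dump.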

    How I built it: Python, PostgreSQL, Jinja2 templates, Docker. Used Claude Code throughout as an experiment in AI-assisted development. Learned that the workflow is "trust but verify" – it accelerates the boring parts but you still own the architecture.

    Live demo: https://online-archives.github.io/redd-archiver-example/
    GitHub: https://github.com/19-84/redd-archiver (Public Domain)

    Pushshift torrent: https://academictorrents.com/details/1614740ac8c94505e4ecb9d88be8bed7b6afddd4

  • medicpigbabysaver@lemmy.world wrote (#2):

    Fuck Reddit and Fuck Spez.
