This seems like a good time for a PSA:
If in the future you see something on a public-facing webpage you want to make a durable record of for use as evidence, don't take a screenshot. Those are -- understandably -- widely considered too easy to fabricate.
Instead, snapshot the page with the Internet Archive. It'll log a timestamped copy of the page to their servers. Highly tamper-resistant.
https://archive.org/web/ ("save page now", bottom-right)
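If you'd rather script it, the same snapshot can be triggered by requesting the Save Page Now endpoint directly. A rough Python sketch, assuming the public https://web.archive.org/save/ endpoint and no API key:

```python
import requests

def save_page_now(url: str) -> str:
    """Ask the Wayback Machine to snapshot `url`; return where the archived copy lives.

    Sketch only: assumes the public Save Page Now endpoint with no authentication;
    heavy use may get rate-limited.
    """
    resp = requests.get(f"https://web.archive.org/save/{url}", timeout=120)
    resp.raise_for_status()
    # The Content-Location header (or the final URL after redirects) typically points
    # at the /web/<timestamp>/<url> copy that was just created.
    location = resp.headers.get("Content-Location")
    return "https://web.archive.org" + location if location else resp.url

print(save_page_now("https://example.com/"))
```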
if you just want a backup and not something to present as unfakable evidence, you can probably use httrack.
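Roughly like this if you want to drive it from a script (a sketch only; it assumes the httrack CLI is installed and on your PATH):

```python
import subprocess
from pathlib import Path

def mirror_site(url: str, out_dir: str = "mirror") -> None:
    """Make a local offline copy of `url` with HTTrack -- fine as a backup, not as evidence."""
    Path(out_dir).mkdir(exist_ok=True)
    # -O sets HTTrack's output directory; everything else is left at defaults here.
    subprocess.run(["httrack", url, "-O", out_dir], check=True)

mirror_site("https://example.com/")
```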
i'm pretty sure there are also Selenium-based crawlers for this stuff; those might work better for JS-heavy sites.
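Something in that direction, as a minimal sketch with Selenium and headless Chrome (driver setup and wait strategy will vary per site):

```python
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def snapshot_rendered_html(url: str, out_path: str = "page.html") -> None:
    """Load `url` in headless Chrome, let client-side JS run, and save the rendered DOM."""
    options = Options()
    options.add_argument("--headless")        # no visible browser window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)                       # blocks until the initial page load finishes
        time.sleep(5)                         # crude pause so JS-driven content can render
        with open(out_path, "w", encoding="utf-8") as f:
            f.write(driver.page_source)       # the DOM as currently rendered, not the raw HTML
    finally:
        driver.quit()

snapshot_rendered_html("https://example.com/")
```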
@starkatt though if the owner of the website adds a robots.txt, archive.org will retroactively apply it
@sir Oh, that's a good caveat, thank you!
@sir Not going forward.
"...A few months ago we stopped referring to robots.txt files on U.S. government and military web sites for both crawling and displaying web pages (though we respond to removal requests sent to email@example.com). As we have moved towards broader access it has not caused problems, which we take as a good sign. We are now looking to do this more broadly...."
@starkatt I also just heard about Rhizome, who have something similar for archiving web pages. Not sure how it compares.
Russian hackers faked bad Joy Reid blog posts and put them in the Internet Archive.
@starkatt until the owner of the web page adds a robots.txt, which causes the Internet Archive to hide its cached copies.
@starkatt Even better, the Archive has extensions for all major browsers IIRC, which make it even easier to do this. Plus, they detect 404s and can redirect you to archived versions of missing pages!
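The 404-to-archive redirect those extensions do can also be done by hand against the Wayback Machine's availability API; a small sketch, assuming the public https://archive.org/wayback/available endpoint:

```python
import requests

def latest_snapshot(url: str):
    """Return the URL of the newest archived copy of `url`, or None if there isn't one."""
    resp = requests.get("https://archive.org/wayback/available",
                        params={"url": url}, timeout=30)
    resp.raise_for_status()
    closest = resp.json().get("archived_snapshots", {}).get("closest")
    return closest["url"] if closest and closest.get("available") else None

print(latest_snapshot("https://example.com/") or "no archived copy found")
```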
@starkatt Counter PSA is that pages can also ask to NOT be a part of the archive :|
So do both.