katt, sky-guided 🌈🌟

This seems like a good time for a PSA:

If in the future you see something on a public-facing webpage you want to make a durable record of for use as evidence, don't take a screenshot. Those are -- understandably -- widely considered too easy to fabricate.

Instead, snapshot the page with the Internet Archive. It'll log a timestamped copy of the page to their servers. Highly tamper-resistant.

archive.org/web/ ("save page now", bottom-right)

Yeah though, it's an understandable limitation that this only works for public-facing pages. I don't have a good solution offhand for non-public locales.
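
For scripted saves, here's a minimal sketch (Node 18+ run as an ES module, so fetch and top-level await are available; the target URL is a placeholder, and this assumes Save Page Now accepts a plain GET at web.archive.org/save/<url>, which has historically worked for public pages):

// Ask Save Page Now for a fresh snapshot of a public page.
const target = 'https://example.com/page-to-preserve'; // placeholder
const resp = await fetch('https://web.archive.org/save/' + target);
// When the capture succeeds, the response generally redirects to (or
// points at, via Content-Location) the new web.archive.org/web/ snapshot.
console.log(resp.status, resp.headers.get('content-location') ?? resp.url);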

@starkatt
if you just want a backup and not something to present as unfakable evidence, you can probably use httrack.
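e.g. something like: httrack 'https://example.com/' -O ./site-backup mirrors the whole site into ./site-backup (URL and output dir are placeholders; check httrack's docs for the exact flags your version wants).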

i'm pretty sure there are also selenium-based crawlers for this stuff; those might work better for JS-heavy sites.
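
a rough sketch of that idea with the selenium-webdriver npm package (assumes firefox plus geckodriver on your PATH; the URL is a placeholder):

// Render the page in a real browser so JS-built content ends up in the
// DOM, then write the resulting HTML to disk.
import { Builder } from 'selenium-webdriver';
import { writeFileSync } from 'node:fs';

const driver = await new Builder().forBrowser('firefox').build();
try {
  await driver.get('https://example.com/js-heavy-page'); // placeholder
  writeFileSync('snapshot.html', await driver.getPageSource());
} finally {
  await driver.quit();
}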

@starkatt Here's a bookmarklet that sends the URL of the current tab to the Archive:

javascript:location.href='web.archive.org/save/'+locatio

Fire and forget.

@drwho @starkatt unfortunately, the way mastodon collapses urls makes it impossible to copy this bookmarklet

@LogicalDash @drwho @starkatt

Good news: Mastodon actually does a little bit of fancy footwork so that you *can* copy the URL even though it looks like you can't. Right-clicking and choosing "copy link" should still work. Here's the page explaining the HTML/CSS wizardry that makes that possible: github.com/tootsuite/documenta

@codesections @drwho @starkatt the bookmarklet is more than just the link tho, it includes the javascript: bit

@LogicalDash @drwho @starkatt

True. You'd have to type out the JavaScript bit, that's a fair point.
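
(If you do type it out: the collapsed tail is almost certainly location.href, so the full bookmarklet presumably reads

javascript:location.href='https://web.archive.org/save/'+location.href

with the caveat that I'm guessing at the https:// prefix, since Mastodon's display strips the scheme.)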

@starkatt though if the owner of the website adds a robots.txt, archive.org will retroactively apply it

@sir @starkatt didn’t they change their mind about that recently? I feel like I remember seeing some posts

@sir Not going forward.

"...A few months ago we stopped referring to robots.txt files on U.S. government and military web sites for both crawling and displaying web pages (though we respond to removal requests sent to info@archive.org). As we have moved towards broader access it has not caused problems, which we take as a good sign. We are now looking to do this more broadly...."

blog.archive.org/2017/04/17/ro

@starkatt I also just heard about Rhizome, who have something similar for archiving web pages. Not sure how it compares.

@starkatt I'd also like to note that perma.cc is an excellent tool used by law libraries in this regard. (And I perma.cc all links in my papers because bitrot is a thing.)

@starkatt Selling points: 1) a 9-character URL, easily typeable from dead tree. 2) It screenshots the page, so it can archive *Google Docs* and other esoteric stuff that the Archive breaks on.

Downsides: it's limited to 10 links/month unless your library subscribes.

@starkatt
Russian hackers fabricated offensive Joy Reid blog posts and planted them in the Internet Archive.

@starkatt until the owner of the web page adds a robots.txt, which causes the Internet Archive to stop serving its cached copies.

@VamptVo @starkatt thanks for that information. That's really great. I never understood why they used robots.txt in the first place.

@starkatt Even better, the Archive has extensions for all major browsers IIRC, which make it even easier to do this. Plus, they detect 404s and can redirect you to archived versions of missing pages!

@starkatt Very useful, and I use this.

You might also make use of archive.is or archive.fo (both point at the same service), which can grab snapshots on demand.

Whilst you can submit a page to the Internet Archive, I'm not certain it will /update/ an archived page on demand.

@starkatt Caveat: the owner of the site can tell the Internet Archive to take it down by email or robots.txt, and they will stop serving all their copies archive.org/about/faqs.php#2

@starkatt Counter PSA is that pages can also ask to NOT be a part of the archive :|
So do both.
