48cd8e No.318649
ARCHIVE.IS MAY BE COMPROMISED, OR NEARLY COMPROMISED.
>All 8chan archive links were pulled off the site for a few hours today.
>During that time, we exposed a severe vulnerability in relying on Archive.is.
>The cited reason for the pulldown was that archived pages containing CP kept showing up.
>After much arguing and drama, the 8chan archives were reinstated.
BUT WE DON'T KNOW HOW LONG THAT WILL BE THE CASE!
http://s000.tinyupload.com/index.php?file_id=07149261893554013542
Contains a small script I wrote that will download a local copy of all the /v/ #GG thread archives. Use it to download your own backup copy.
I want to do something similar for /gamergate/ and GGHQ, as well as for Deepfreeze links and wiki citations. We can worry about hosting later. If we lose Archive.is, we lose everything.
Help in this Herculean effort would be incredibly appreciated!
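The script itself isn't reproduced in this thread. For a rough idea of what such a downloader does, here is a minimal Python sketch. It assumes archive.is serves each snapshot's .zip at `https://archive.is/download/<ID>.zip` (an assumption, not confirmed here) and that you already have a list of five-character archive IDs; `zip_url` and `download_all` are hypothetical names.

```python
import os
import time
import urllib.request

# ASSUMPTION: the .zip for snapshot <ID> lives at /download/<ID>.zip.
# Adjust this template if the site uses a different layout.
DOWNLOAD_TEMPLATE = "https://archive.is/download/{}.zip"


def zip_url(archive_id: str) -> str:
    """Build the assumed .zip download URL for one archive ID."""
    return DOWNLOAD_TEMPLATE.format(archive_id)


def download_all(ids, dest_dir: str, delay_seconds: float = 5.0) -> None:
    """Save each archive's .zip into dest_dir, one at a time.

    Files that already exist are skipped, so an interrupted run can be resumed.
    """
    os.makedirs(dest_dir, exist_ok=True)
    for archive_id in ids:
        path = os.path.join(dest_dir, archive_id + ".zip")
        if os.path.exists(path):
            continue  # backed up on an earlier run
        with urllib.request.urlopen(zip_url(archive_id), timeout=60) as resp:
            data = resp.read()  # read fully first, so a failed read leaves no partial file
        with open(path, "wb") as out:
            out.write(data)
        time.sleep(delay_seconds)  # pause between requests to go easy on the server
```

Usage would be something like `download_all(["1kOrY", "naHEC"], "gg_backup")`. Skipping existing files also skips existing duds, so delete bad files before re-running.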
____________________________
Disclaimer: this post and the subject matter and contents thereof - text, media, or otherwise - do not necessarily reflect the views of the 8kun administration.
48cd8e No.318652
Thanks. Doing it right now.
48cd8e No.318659
I have a massive amount of corrupted archives. So far: about 1200 corrupted files (616 bytes each) against 630 proper archives. These corrupted archives seem otherwise available on archive.is; any idea why it's happening?
48cd8e No.318660
>>318659
Is this happening with the script, or a general observation?
If the former, give me an example link and I'll look into it.
48cd8e No.318662
>>318660
General observation. The file contains some JavaScript code; looks like Google Analytics.
I rewrote "archlinks" so it contains only the failed downloads, and tried again. The second time around some finally downloaded, others wouldn't. Maybe it's just archive.is' way of saying "not now honey, mommy's busy"
48cd8e No.318702
>>318662
I want porn of this.
48cd8e No.318709
>>318702
Original Character, Do Not Steal
48cd8e No.318711
Semi-related: Deepfreeze uses archive(dot)is links, so their articles are affected too. There are some alternatives for archiving (e.g. OpenWayback).
48cd8e No.318727
script to pull archive.is urls out of a text file: https://pastee.org/gecp3
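The pastee script isn't reproduced here. A minimal Python sketch of the same task follows, assuming archive links look like `http://archive.is/1kOrY` or `https://archive.today/naHEC` with a five-character ID (as in every example in this thread). The regex and `extract_archive_ids` are illustrative, and timestamp-style archive.is URLs are deliberately ignored.

```python
import re

# Matches archive.is / archive.today links whose path is a five-character ID,
# e.g. http://archive.is/1kOrY. Longer path segments (like /newest/...) do not match.
ARCHIVE_RE = re.compile(r"https?://(?:www\.)?archive\.(?:is|today)/([A-Za-z0-9]{5})\b")


def extract_archive_ids(text: str) -> list[str]:
    """Return the unique archive IDs found in text, in order of first appearance."""
    seen = set()
    ids = []
    for match in ARCHIVE_RE.finditer(text):
        archive_id = match.group(1)
        if archive_id not in seen:  # IDs are case-sensitive: 1kOrY and 1kOry differ
            seen.add(archive_id)
            ids.append(archive_id)
    return ids
```

Feeding it the contents of a saved page or bookmarks export gives a de-duplicated list you can write out one ID per line.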
48cd8e No.318731
>>318727
Thanks, have some archive links https://pastee.org/34wz7
HTH
48cd8e No.318739
>>318662
Interesting. /pol/ talked about that script at one point, saying there was no reason why traffic from archive.is should be redirecting there.
Educated guess: if the script tries to load on a page, the grabber gets a corrupted copy. If it doesn't try to load, you get the clean .zip of the archive. Plausible?
48cd8e No.318740
>>318739
It's getting weirder… I launched the script again with the missing files, and this time all of the downloaded zips are 131.3kB. They are technically valid, yet there's only one file inside: index.html. If I extract this file and open it with a hex editor, it contains the opening and closing "html" tags, but starting at address 0x0D it seems to be binary?! It's not a program, but it doesn't look like a compressed stream either. I have no clue what's going on.
Is anybody else having issues?
48cd8e No.318742
>>318740
Give me a couple suffixes for known bad files pls, I'd like to look at this. By suffix I mean the five letter "xDyNM" from the archive link.
I'll download em manually and see what happens.
48cd8e No.318743
>>318742
1keQV
1kOrY
1lPT4
1mPAr
1N9tv
1nCzA
1Oa5H
1ob2Z
1oXlb
1peXJ
1ptmQ
1SDVw
1sIN8
1UHVr
1XupV
21j3o
23p93
24mMx
24tvv
2HJuj
2ia1s
2Jico
2KZL4
2lgQn
2lii3
2m4aX
2N0yZ
2nhKw
2QEUH
2sfbf
2StTm
2yc27
48cd8e No.318745
>>318743
This is really fucking weird.
I'm going to town on 1kOrY from that list and NOTHING can download it. Every time, you get a zip with a corrupted file beginning with HTML headers.
Yet if you go to the page http://archive.is/1kOrY and you click the "Download .zip" link, you get the EXACT same source file, but everything is 100% perfect.
Happens with wget, happens with curl… Just what the fug is going on here?
48cd8e No.318746
>>318745
To make it clear: yesterday these files downloaded as 616-byte files. It's always these same files I'm trying to download. Yesterday I had a little more luck and would sometimes manage to grab files that previously wouldn't download properly…
48cd8e No.318747
>>318746
I have a possible solution. Stand by.
48cd8e No.318748
>>318746
>>318747
Alright, I think I've got it!
Actually this is probably better, since it opens archiving up to non-Linuxfags too. Here we go:
Download the "DownThemAll!" extension for Firefox or Pale Moon from here:
https://addons.mozilla.org/en-US/firefox/addon/downthemall/
Install it and restart your browser.
Take the 'archlinks' file and open it in a text editor, and resave it in your GG archive folder as 'archlinks.txt'
Now, in Firefox or Pale Moon, go to:
>Tools
>DownThemAll Tools
>Manager
This will open the DownThemAll raw file interface. You'll see some bullshit files there that you don't want in the list, so just click - delete them.
Now right click anywhere on that screen and choose
>Advanced
>Import from File
Select your archlinks.txt file as the source (change the filetype from Meta to .txt using the little dropdown box in the bottom right corner to make it show up)
Now at the main Manager screen again choose your destination folder for the downloads from the box on the left. It will default to your home folder or desktop if you let it, lol. Get them in the right spot.
At this point you should see the massive list of downloads, all from archive.is, filling the manager window.
Right click one and hit "Select All", then right click one again and say "Check all Selected." That tells it to download every fucking one of the files.
Press the Start! button at the bottom and let it go to work.
I tested it on 5 of the problem files above and it worked like a charm.
48cd8e No.318749
>>318748
It works. Thanks!
48cd8e No.318751
Scratch that. Some of them work, but I still get 616-byte files more than half of the time:
m7x5b
M81dB
MbGja
mcv5h
MeCrT
qXYx0
…
48cd8e No.318752
>>318751
I just tested all of those you listed, DLs completed without issue.
Running a selection of 50 right now for a bigger test.
48cd8e No.318753
>>318752
To be fair, I have a list of 999 (yeah, it's frustrating) that won't download.
48cd8e No.318754
>>318752
I'll just keep forcing download until I get the proper thing I guess.
48cd8e No.318755
>>318751
Alright, I'm beginning to suspect it's a bandwidth issue.
I did 50 random files and got about 7 corrupted. A good example would be
naHEC
Came in at 8.7 MB the first time, corrupt, and over 33 MB on the redo that gave me the proper archive. Most of the problem files are similar: they're really big in their non-corrupt state.
I'll play with more options tonight and tomorrow and see if we can compensate for this somehow without having to go full turbonerd. If we can throttle it on our end such that archive.is' servers don't choke on it we can probably make this work.
Duds are easy to find in *nix because you can automate an integrity check on the files very easily, but we'd need some kind of recursive program to keep downloading them til everything was correct.
Oy fucking Vey.
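A sketch of that integrity check in Python, using only the standard-library `zipfile` module. The rules are heuristics drawn from the symptoms reported in this thread: a 616-byte dud, a file that isn't a zip at all, a member that fails its CRC check, or an `index.html` containing NUL bytes (assumed here to indicate the "binary after the html tag" corruption). `is_good_archive` is a hypothetical name.

```python
import os
import zipfile
import zlib

DUD_SIZE = 616  # byte size of the corrupted downloads reported in this thread


def is_good_archive(path: str) -> bool:
    """Heuristic check that a downloaded archive .zip is usable."""
    if os.path.getsize(path) == DUD_SIZE:
        return False  # the known 616-byte dud
    if not zipfile.is_zipfile(path):
        return False  # not a zip at all (e.g. an HTML/JavaScript page)
    try:
        with zipfile.ZipFile(path) as zf:
            if zf.testzip() is not None:
                return False  # at least one member fails its CRC check
            # ASSUMPTION: NUL bytes in index.html mean the "binary after the
            # <html> tag" corruption; a normal saved page is plain text.
            if "index.html" in zf.namelist() and b"\x00" in zf.read("index.html"):
                return False
    except (zipfile.BadZipFile, zlib.error, EOFError):
        return False  # damaged or truncated compressed data
    return True
```

Run it over your download folder and the failures become the list to retry.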
48cd8e No.318758
>>318755
Think you can get some backups going on the archives I sent you in an email?
48cd8e No.318761
>>318758
Yes, but we need to solve this file integrity issue first and foremost.
Downloading all the archives won't do a lot of good if half of them come up busted. I have a plan and I'll keep working on it tomorrow and updating this thread regularly.
48cd8e No.318762
>>318755
>>318746
Someone I know who downloaded a bunch of the zips manually said they stopped being able to access archive.is for a while, and I've personally had the same thing happen after manually archiving too many pages in a short time.
Are you just hitting the limit from abusing their servers too much and then getting a bunch of dummy files until enough time passes that the servers let you download them again? Maybe try putting a significant delay between each download so you don't download too many in an hour (or whatever timeframe they use) and then let them slowly download overnight.
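The "keep downloading until it's correct" idea from >>318755, combined with the delay suggested here, might look like the Python sketch below. It uses exponential backoff (waiting 1x, 2x, 4x ... a base delay), a standard retry technique swapped in for illustration; nobody in the thread confirmed what archive.is's actual limit or time window is. `fetch_until_good`, `fetch` and `is_good` are hypothetical names, and the last two are passed in so any downloader and integrity check can be plugged in.

```python
import time
from typing import Callable, Optional


def fetch_until_good(
    archive_id: str,
    fetch: Callable[[str], bytes],
    is_good: Callable[[bytes], bool],
    max_attempts: int = 5,
    base_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Retry one download until it passes is_good, backing off between attempts.

    Returns the good payload, or None if every attempt produced a dud.
    """
    for attempt in range(max_attempts):
        data = fetch(archive_id)
        if is_good(data):
            return data
        if attempt < max_attempts - 1:
            # Wait base_delay * 1, 2, 4, ... so a rate-limited client backs off
            # further after each dud instead of hammering the server.
            sleep(base_delay * 2 ** attempt)
    return None
```

IDs that still come back as `None` can be set aside and retried later, e.g. in an overnight run.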
48cd8e No.318763
>>318761
Excellent.
Hopefully this shit with archive.is is a one time thing, but better safe than sorry.
48cd8e No.318764
>>318762
It's likely. I'll try to break the initial list into smaller chunks, and download one chunk every thirty minutes maybe?
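Splitting the list and pausing between chunks can be sketched in Python as follows. The chunk size of 50 and the 30-minute pause are placeholder values taken loosely from this thread, not known limits; `chunked`, `run_in_chunks` and `handle_chunk` are hypothetical names, and `handle_chunk` would be whatever downloads one batch.

```python
import time
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: List[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_in_chunks(
    ids: Iterable[str],
    handle_chunk: Callable[[List[str]], None],
    size: int = 50,
    pause_seconds: float = 30 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Hand ids to handle_chunk in batches, pausing between batches (not after the last)."""
    chunks = list(chunked(list(ids), size))
    for index, chunk in enumerate(chunks):
        handle_chunk(chunk)
        if index < len(chunks) - 1:
            sleep(pause_seconds)
```

With 999 IDs and the defaults, that is 20 chunks and 19 pauses, so roughly nine and a half hours of waiting: a good overnight job.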
48cd8e No.318765
>>318762
My theory is that their server throttles you with that redirect if you have too much bandwidth taken up.
I configured DownThemAll to use no more than one connection per server and to download one file at a time with the speed capped at 50kbps. I also activated the setting that downloads the last part of a file first, to keep the checksum from getting messed with. I'm running a batch of 200 archives with those settings right now.
Currently at ~21 files complete with no corruptions yet. Cross your fingers.
EDIT:
At 68 files downloaded, zero corrupt files. I have to get to bed, but I ~think~ we've figured it out? Knock on wood. I'll be back late tomorrow to check it out again, but this seems very promising.
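Capping the speed on our own end, roughly what the DownThemAll limit does, can be sketched in Python as a throttled copy loop: after each block it sleeps just long enough to keep the average rate at or below the cap. It assumes the cap is in bytes per second, so "50kbps" is read here as about 50 KB/s (an assumption, since the thread doesn't say whether kilobits or kilobytes were meant). `throttled_copy` is a hypothetical name.

```python
import time
from typing import BinaryIO, Callable


def throttled_copy(
    src: BinaryIO,
    dst: BinaryIO,
    max_bytes_per_sec: float,
    chunk_size: int = 4096,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Copy src to dst, keeping the average rate at or below max_bytes_per_sec.

    Returns the number of bytes copied.
    """
    start = clock()
    copied = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return copied
        dst.write(chunk)
        copied += len(chunk)
        # Time the copy should have taken at the capped rate vs. time actually taken.
        expected_elapsed = copied / max_bytes_per_sec
        actual_elapsed = clock() - start
        if expected_elapsed > actual_elapsed:
            sleep(expected_elapsed - actual_elapsed)  # ahead of the cap: wait it out
```

With the response from `urllib.request.urlopen(url)` as `src` and a file opened in `"wb"` mode as `dst`, `throttled_copy(src, dst, 50_000)` would pull one archive at roughly 50 KB/s.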
48cd8e No.318782
Hilarious. Absolutely top kek.
48cd8e No.318804
Okay, now will people pay attention to my archive.is batch downloader? Just feed it a text file or export of your bookmarks and you get all the .zip backups from archive.is or archive.today links.
https://gitgud.io/MetalUpa/BackUpa
48cd8e No.318812
I don't understand half the technobabble in this thread, but keep at it and let me know whenever it's finished.
48cd8e No.318863
>>318765
CAN CONFIRM
200 archives downloaded without error.
SETTINGS
In the DownThemAll manager, click the Preferences link at the bottom and go to the Advanced tab.
>Set 'Max number of segments per download' to 1
>Timeout to 15 minutes
>Download last few kilobytes first to somewhere around 2800
In the 'Network' tab:
>Concurrent downloads to 8
>Downloads per server to 1
That fixed the file corruption for me. Can anyone else confirm?
48cd8e No.318874
>>318863
At 50 files now with no corruptions using those settings.
48cd8e No.318893
Close, but I still had some files come out at 616 bytes. I capped the transfer rate in the manager window. It takes a fuck ton of time, but that'll do it, I hope.
48cd8e No.318975
>>318893
What did you set the limit to? 50kbps?
48cd8e No.319006
>>318975
100Kb. Finished downloading everything a couple of hours ago. I still need to copy the files to another computer and check that everything unpacks properly.
48cd8e No.319761
My mum is always involved in some shit with the council / old people's homes; I told her to save their pages with archive.is so they can't change things later.
I believe the Finnish government has used a national firewall to block the archive. It's surely only a matter of time before other countries follow suit. No doubt claiming they're only doing it to "stop the hate speech from Gamergate".
48cd8e No.319822
>>319761
Using the good ol' GamerGate boogeyman could work, though governments have other possible outs for blocking the service:
The "right to be forgotten", which allows someone to request that search engines remove links to pages deemed private even if the pages themselves remain on the internet. It has been in practice in the EU and Argentina since 2006. Granted, archive.is isn't a search engine, but I'm still (probably legitimately) concerned about the slippery slope.
The Digital Millennium Copyright Act, which tilts strongly in favor of copyright holders, could become an issue, since there's a push to revise the way copyright works on the Internet (everything and anything you publish would be protected by a copyright, held either by you, the service you use, or other private registration services).
48cd8e No.319873
You still need some kind of trusted broker; that's the point of an archive: you can't tamper with it. A flat file under your control on your local host can be tampered with, which compromises its trustworthiness.