>>11260
Download all images from a thread:
wget -nd -nc -r -l 1 -H -e robots=off -D 8kun.top,media.8kun.top -A png,gif,jpg,jpeg,webm http://8kun.top/url/to/thread.html
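-nd tells wget not to recreate the site's directory tree, so everything lands flat in the current folder. If you want the files in a folder of their own, use -P foldername (capital P; lowercase -p is something else, it fetches page requisites)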
-nc tells it not to overwrite files that are already there
-r means recursive
-l NUMBER is how many levels of links it follows from the page in question
-H is span hosts
-e robots=off tells it to ignore robots.txt, don't know if that is even needed on 8kun
-D is the domains you're working with, obviously. 8kun has images hosted on a different subdomain so just following links to 8kun.top won't get you the images.
-A is the comma-separated list of file extensions to accept, everything else gets discarded
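For reference, here is the same command with the long option names spelled out, wrapped in a small shell function. Treat it as a sketch: the function name grab_thread, the DRY_RUN switch and the default folder name are made up for illustration. The wget options are just the long forms of the flags above, plus --directory-prefix (-P) for the target folder.

```shell
# grab_thread URL [FOLDER]: download all images linked from one thread page.
# Hypothetical helper; set DRY_RUN=1 to print the command instead of running it.
# The body runs in a subshell ( ... ) so its variables stay local.
grab_thread() (
    url=$1
    dir=${2:-thread_images}                 # assumed default folder name
    set -- --no-directories                 # -nd: flat output, no directory tree
    set -- "$@" --no-clobber                # -nc: keep files that already exist
    set -- "$@" --recursive                 # -r
    set -- "$@" --level=1                   # -l 1: only links found on the thread page
    set -- "$@" --span-hosts                # -H: allow leaving the starting host
    set -- "$@" --execute robots=off        # -e robots=off: ignore robots.txt
    set -- "$@" --domains=8kun.top,media.8kun.top   # -D: but stay on these domains
    set -- "$@" --accept=png,gif,jpg,jpeg,webm      # -A: keep only these extensions
    set -- "$@" --directory-prefix="$dir"   # -P: put everything into this folder
    set -- "$@" "$url"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' wget "$@"
    else
        wget "$@"
    fi
)
```

Usage would be something like: grab_thread http://8kun.top/url/to/thread.html pics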
You can alter it for different imageboards too. If some site doesn't want to comply you can add arbitrary shit like some commonly used user agent, waits and a rate limit:
--user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0" --wait=1 --random-wait --limit-rate=1M
to make it seem more natural (--random-wait only varies the delay set by --wait, so it needs --wait to do anything). Of course this didn't work while vanwanet was going, but after they changed it to be less obnoxious it works well again.
You could probably make it ignore thumbnails somehow (wget has -R to reject filename patterns), but honestly, it's easier to just remove them manually afterwards. They are pretty obvious due to their small file size, and some imageboards even prefix them with something like t_, which makes it even easier.
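If you'd rather not pick them out by hand, something like this lists the likely thumbnails first so you can look before deleting. It's only a sketch: list_thumbs is a made-up name and the 20480-byte cutoff is a guess, so adjust it per board. It assumes a find that supports -maxdepth (GNU and BSD find do).

```shell
# list_thumbs [DIR] [MAX_BYTES]: print files directly inside DIR that look
# like thumbnails: name starts with t_, or size is below MAX_BYTES bytes.
# Hypothetical helper; the default cutoff of 20480 bytes is an assumption.
list_thumbs() (
    dir=${1:-.}
    max=${2:-20480}
    # "-size -Nc" matches files smaller than N bytes
    find "$dir" -maxdepth 1 -type f \( -name 't_*' -o -size "-${max}c" \)
)
```

Run list_thumbs . and check the output. Once the list looks right, the same find expression with -delete appended removes the files.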
Also, there is a really good tool called httrack for downloading whole websites and creating local mirrors, you may want to try that too.
When it comes to archiving whole pages, always pay attention to the recursion levels, as most sites pull in tons of external stuff like scripts from CDNs. 8kun not so much, but you still don't want whatever tool you use to follow every damn link, which at some point would lead to downloading the whole fucking internet. Put a limit on it, either by capping the recursion depth, by restricting it to certain domains, or both.
On imageboards in particular it's annoying when your download tool probes all the links to other boards first. They're at the top of the page so they're processed first.
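One way to keep wget away from the other boards might be -I (--include-directories), which restricts recursion to the listed paths on top of the depth and domain limits. The sketch below only builds and prints the argument list. board_only_args is a made-up name, and the board name and media folder you pass in are assumptions you would have to check against the actual links in the thread. Untested against 8kun itself.

```shell
# board_only_args BOARD MEDIA_DIR URL: print wget arguments, one per line,
# that keep the recursion inside /BOARD and /MEDIA_DIR.
# Hypothetical helper; both directory names are assumptions, check real links.
board_only_args() (
    board=$1
    media_dir=$2
    url=$3
    set -- -nd -nc -r -l 1 -H -e robots=off   # same base flags as above
    set -- "$@" -D 8kun.top,media.8kun.top    # domain limit
    set -- "$@" -A png,gif,jpg,jpeg,webm      # accepted extensions
    set -- "$@" -I "/$board,/$media_dir"      # only descend into these paths
    printf '%s\n' "$@" "$url"
)
```

As long as none of the values contain spaces or quotes, you can feed it straight to wget: board_only_args someboard somemediadir http://8kun.top/url/to/thread.html | xargs wget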