
/a/ - Animu & Mango


File: ba6e7783c67691f⋯.jpg (35.4 KB, 480x360, 4:3, endangeredTOES.jpg)

File: a5b6856c973e186⋯.png (21.97 KB, 287x347, 287:347, UnfortunatelySANNSE.png)

 No.906489

Anime feet project needs help archiving files

This is a followup to this archived November thread >>878519 ( https://web.archive.org/web/20190215190723/https://8ch.net/a/res/878519.html )

Back then a very helpful anon >>879299 responded within a day saying they wrote a script to download all the images. According to >>880116 it came to 51,386 files, and >>880420 estimated the total at 15.3 GB, though when >>880657 finally delivered it at https://mega.nz/#!vFZlnISK!ayYpoBkrRSDYNJSc2C8-LE_JdLIY4-pDRAcUiTYdt0E the size was 14.3 GB, possibly due to compression.

I come with a much tougher endeavour. https://animefeet.fandom.com/wiki/Special:Statistics shows the project has 151,328 files, nearly 3x as many. I don't know if it would be 3x the data, because I think WaterMaiden always added top-quality hi-res images of bathing, whereas AF included some lower-res images. I believe this will take more than one Mega account (they are limited to 15 GB), but it might fit into 2 instead of 3. It should definitely fit within 3, I'm sure of that.
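
Rough scaling from the old dump, assuming the average file size here matches the bath wiki's (it's probably a bit smaller, per the above):

# 14.3 GB for 51,386 files, scaled up to 151,328 files
echo "scale=1; 14.3 * 151328 / 51386" | bc    # roughly 42.1

So somewhere around 42 GB, which would be 3 accounts at 15 GB each at worst, or 2 if the files really are smaller on average.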

I don't know if super-helpful-script-god anon is still around (you never left an e-mail address), but I can be contacted at Clawfootking@gmail.com if you have any advice to offer. I don't know if downloading/Mega-backing this project is something you'd want to do, but if not, perhaps you'd be willing to share instructions on how to operate your script for downloading these images. I know that once someone is able to do that, uploading the files to Mega or other hosts for others to back up would be easy.

The problem I'm facing is a 12-year-old desktop that lately tends to spontaneously kill itself after 5-10 minutes of browsing. I switched from Vista to Lubuntu and it helped for a couple of months, but now even that is failing. I don't know if this is something I could do myself with my present resources, but if I'm able to share your instructions, hopefully someone else can do it.

 No.906499

File: 400bf5cba5c6b83⋯.jpg (72.19 KB, 800x914, 400:457, 288d2c7dd23f7513afcdeb7f57….jpg)

I'm still here.

I just need to fix the script because apparently the AllPages special page can consist of a recursive tree of page ranges.

In the meantime, please enjoy this:

http://asciiazumanga.ytmnd.com/


 No.906508

First they came for the bath scenes, and I did not speak up, because I don't really care about bath scenes.

Then they came for the feet, and I did not speak up, because I was not a footfag.


 No.906518

File: f26cf0153b15286⋯.jpg (Spoiler Image, 87.96 KB, 950x1267, 950:1267, b514886e8a005f0f3a32cb2757….jpg)

The script seems to be working, but it's going to take a very long time to run because of the sheer number of files.

I'll post code and a writeup later.

>>906508

I'm not really into either, but I have weird fetishes of my own so I feel a certain camaraderie with other perverts.

Wikis like these deserve to be preserved, if only as a monument to the supreme autism of their creators.


 No.906519

File: 9b4b74b2a9bd975⋯.jpg (63.89 KB, 479x660, 479:660, 46280254628b308f457f101e2a….jpg)

>Fetish wikis

It's just the way of the world. They need to prepare their anus to get fucked by China.


 No.906525

File: b6754444f654dc7⋯.mp4 (260.65 KB, 800x450, 16:9, Gets.mp4)


 No.906526

File: 3a363dc6a665441⋯.jpg (53.52 KB, 720x610, 72:61, cheers faggot.jpg)

>>906518

>if only as a monument to the supreme autism of their creators.

It is the autism I respect. I commend the OP for his autism conservation initiative.


 No.906530

>>906526

Is autism a finite resource? Will we one day run out of autism? Will we need to launch autism recycling drives and dream of a perpetual autism machine?


 No.906532

First the bathfags, then the footfags. When will this horror end?


 No.906537

>>906489

I want to fuck 2D girls' (clean) feet.


 No.906540

>>906518

Do it and I'll cross-check. This connection I'm on is F.A.S.T.


 No.906541

>>906489

Wasn't this a problem like 4 months ago?


 No.906544

File: fe41afdf10be5af⋯.jpg (8.98 KB, 300x222, 50:37, stab.jpg)

FIRST THEY CAME FOR THE BATHFAGS AND I DID NOT SPEAK OUT BECAUSE I WAS NOT A BATHFAG

THEN THEY CAME FOR THE FOOTFAGS AND I DID NOT SPEAK OUT BECAUSE I WAS NOT A FOOTFAG


 No.906545

>>906537

I want to fuck 2D girls' (sweaty after a long day in pantyhose and shoes) feet.


 No.906550

>>906489

>The problem I'm facing is a 12-year-old desktop that lately tends to spontaneously kill itself after 5-10 minutes of browsing. I switched from Vista to Lubuntu and it helped for a couple of months, but now even that is failing.

It sounds like your hard drive is failing. That would explain why things improved when you switched to a less resource-intensive OS, and why even that only bought you time. Download Auslogics Disk Defrag or any defrag program that includes a S.M.A.R.T. test and run that; it's about the closest you can get to diagnosing your hard drive short of a magic wand. Compare the results with the attribute definitions online and the recommended "safe" ranges.

Polite sage for /tech/ support. Felt that this could help.


 No.906551

>>906545

Is this worth dying for?


 No.906553

>>906541

That was when they decided to destroy the anime bath pictures wiki.


 No.906554

>>906550

Seconding this. OP, don't fuck about, your computer is about to fail forever and you will lose all data you don't have backups of. Back up your data now, or you will lose everything.


 No.906555

I am not the original script person, but I thought I would contribute in the hope that someone helps save my fetish one day.


#!/bin/sh

echo "Get AllPages to get page groups"
curl -sS 'https://animefeet.fandom.com/wiki/Special:AllPages' | grep -oE 'href="[^"]+?"' | grep AllPages | cut -d '"' -f 2 > targets

echo "getting all page groups"
for i in $(cat targets)
do
echo "Scraping page group ${i}"
curl -sS "https://animefeet.fandom.com${i}" | grep -oE 'href="/wiki/[^"]+?"' | cut -d '"' -f 2 >> pagelist
done

echo "Sorting pagelist"
sort -u pagelist -o sorted_list

echo "getting all pages"
for i in $(cat sorted_list)
do
echo "scraping page $i"
curl -sS "https://animefeet.fandom.com${i}" | grep -oE 'href="/wiki/File:[^"]+"' | cut -d '"' -f 2 >> filelist
done

echo "sorting filelist"
sort -u filelist -o sorted_files
mkdir outdir

echo "getting all files"
for i in $(cat sorted_files)
do
echo "Getting $i"
urla=$(curl -sS "https://animefeet.fandom.com${i}" | grep -oE 'fullMedia"><a href="https://vignette.wikia.nocookie.net/animatedfeetscenes/images/[^"]+"' | cut -d '"' -f 3)
urlb=${urla%?cb=*}
echo "got full image link, fetching image file: $urlb"
fetch -o outdir $urlb
done

I'm not 100% sure the last bit works; I'm still on B for the filelist. Frankly it's pretty shit, and could use some xargs or coprocesses to run some curls in parallel, but w/e.
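
Something like this would probably do it, if you split the per-file curl/fetch steps into their own script (get_one.sh here is hypothetical, just the loop body from above taking the page path as $1):

# run four downloads at a time instead of one after another
xargs -n 1 -P 4 sh get_one.sh < sorted_files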

>>906550

>>906489

OP, yeah, don't fuck around with hard drives, you're lucky it's giving you any warning at all.

If you're on Linux, get the smartmontools package and run

smartctl -a /dev/sda
and pastebin the output here.
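
The attributes worth eyeballing for a dying drive are roughly these (assuming the drive reports the usual SATA set):

# non-zero reallocated/pending/uncorrectable sector counts = back up and replace it
smartctl -a /dev/sda | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'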

Also, post the original script please.


 No.906556

>>906532

According to the site one of the next ones is anime shoes. Seems some of the other sites like pantsu and oppai and even a Kodomo no Jikan wikia have been purged already.


 No.906561

>>906555

>Also, post the original script please.

Here you go. You'll need a recent version of python 3 (I'm on 3.6).

I still plan to add some functionality, such as the ability to resume in the event of a crash. All the paths are hardcoded in main(), so you might want to edit those. Sorry for the lack of comments.

#!/usr/bin/env python3

from requests_html import HTMLResponse, HTMLSession
from pathlib import Path
from urllib.parse import unquote
from typing import Set


def recursively_traverse_allpages(session: HTMLSession, url: str, depth: int = 0) -> Set[str]:
    response: HTMLResponse = session.get(url)

    pages = set()

    allpageslist = response.html.find(".allpageslist", first=True)
    if allpageslist is not None:
        page_ranges = allpageslist.absolute_links
        num = len(page_ranges)
        for ii, link in enumerate(page_ranges):
            if depth == 0:
                print(f"{ii}/{num}")
            pages = pages.union(recursively_traverse_allpages(session, link, depth+1))

    allpagestable = response.html.find(".mw-allpages-table-chunk", first=True)
    if allpagestable is not None:
        pages = pages.union(allpagestable.absolute_links)

    return pages


def get_files(session: HTMLSession, wikiname: str) -> Set[str]:
    url = "https://{}.wikia.com/wiki/Special:AllPages?namespace=6".format(wikiname)
    pages = recursively_traverse_allpages(session, url)
    return pages


def download_image(session: HTMLSession, url: str, folder: Path):
    response: HTMLResponse = session.get(url)
    original = response.html.find("div.fullMedia", first=True).absolute_links.pop()

    filename = folder / Path(unquote(original.split("/")[-3])).name

    response = session.get(original, stream=True)
    with filename.open("wb") as fp:
        for chunk in response.iter_content(2**13):
            fp.write(chunk)


def main():
    ss = HTMLSession()
    files = get_files(ss, "animefeet")

    folder = Path("images_feet")
    if not folder.exists():
        folder.mkdir(parents=True)
    total = len(files)
    print("found {} files".format(total))
    progwidth = 40
    errors = []
    for ii, img in enumerate(files):
        filled = int(progwidth * ii/total)
        bar = '▉'*filled + ' '*(progwidth-filled)
        print("\r[{}] {}/{}".format(bar, ii, total), end='')
        try:
            download_image(ss, img, folder)
        except:
            print("\nError downloading image:", img)
            errors.append(img)
    with open("errors.txt", "w") as fp:
        fp.write("\n".join(errors))
    print("\r[{0}] {1}/{1}".format('▉'*progwidth, total))


if __name__ == "__main__":
    main()


 No.906562

>>906489

What's the deal with these wikijackings and purgings? Those chinks again?


 No.906563

Christ, what the fuck is wrong with the world right now?

Everything, ANYTHING, that can be, or is, remotely lewd is getting attacked and purged from everywhere, more often than not because muh sponsors, despite "sex sells" being one of the oldest phrases in the world and the world revolving around pussy and dick. It's like we're back in the eighties and the Christian moralists are at full force again, except this time people are somehow paying attention to them. What the hell is going on? There has to be a deeper reason than advertising money for all this bullshit.


 No.906567

>>906561

Ok, thanks. Looks like I'm using roughly the same technique.

Had to fix a few things, gotta specify the filename for fetch, since they all end in /latest:


#!/bin/sh

echo "Get AllPages to get page groups"
curl -sS 'https://animefeet.fandom.com/wiki/Special:AllPages' | grep -oE 'href="[^"]+?"' | grep AllPages | cut -d '"' -f 2 > targets

echo "getting all page groups"
for i in $(cat targets)
do
echo "Scraping page group ${i}"
curl -sS "https://animefeet.fandom.com${i}" | grep -oE 'href="/wiki/[^"]+?"' | cut -d '"' -f 2 >> pagelist
done

echo "Sorting pagelist"
sort -u pagelist -o sorted_list

echo "getting all pages"
for i in $(cat sorted_list)
do
echo "scraping page $i"
curl -sS "https://animefeet.fandom.com${i}" | grep -oE 'href="/wiki/File:[^"]+"' | cut -d '"' -f 2 >> filelist
done

echo "sorting filelist"
sort -u filelist -o sorted_files
mkdir outdir

echo "getting all files"
for i in $(cat sorted_files)
do
echo "Getting $i"
urla=$(curl -sS "https://animefeet.fandom.com${i}" | grep -oE 'fullMedia"><a href="https://vignette.wikia.nocookie.net/animatedf
eetscenes/images/[^"]+"' | cut -d '"' -f 3)
urlb=${urla%?cb=*}
echo "got full image link, fetching image file: $urlb"
fname=$(basename $i)
fetch -o outdir/${fname} $urlb
done


 No.906571

File: 6f2d255954b77ad⋯.jpg (1.74 MB, 1295x1812, 1295:1812, nue.jpg)

>>906567

>animefeet

>Fritz the Cat

>Cinderella

>Batman

Running now though.

And since feet came up: if anyone likes the artist Oouso (Usotsukiya, mostly a /jp/ in-joke who draws mostly feet and watersports stuff of Touhou, but is now into mobileshit), a while ago I uploaded a scrape of his website, because he made a grave error and left source files to a bunch of his works in a listable directory. Internet Archive even has a few of the directory pages archived, but none of the contents.

https://mega.nz/#!cZlnWYyL!MXuXWlj-G9WM22B3d-6zjPgFf2YK3HiRsnGyqzdsk1o


 No.906572

>>906571

Nice, I'd been meaning to go through various galleries of his work.


 No.906573

File: 974d02bc046ec73⋯.png (251.51 KB, 748x1575, 748:1575, list.png)

>>906572

Well there's some images but it doesn't include his Pixiv or anything like that. Only his now defunct main website, and again, because it had things that were never meant to be public. Like, PSDs of some doujin works and two full-res dakimakuras.


 No.906577

>>906567

It's been running for several hours now and I'm at 8k out of 130k images. Looks like this is going to run all weekend while I'm at my parents'.


 No.906587

File: d83379e22b94950⋯.png (564 KB, 553x1000, 553:1000, 793805569509384192_Q_CwQpy….png)

Mine's only pulled links up to the Ss so far; ~120,000 lines in filelist. It should go without saying that whoever intends to mirror it should also upload the lists of URLs and such for validation purposes. Sadly, I thought of recording a console log a bit too late.
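
Dropping a checksum manifest next to the images would also make cross-checking mirrors easy, something like this (outdir from the script above, GNU coreutils assumed):

# build the manifest after downloading, verify a mirror against it later
find outdir -type f -exec sha256sum {} + > manifest.sha256
sha256sum -c manifest.sha256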


 No.906592

>>906561

>>906567

Is this getting only the images, or the page content too? Because there's more to this wiki than just the pages, there's been a lot of care put into categorization and descriptions that would be lost with just the images.


 No.906594

>>906587

139797 lines in the list of files to get on mine.

I should really do a duplicate check on it to see how many files are actually the same.
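
Probably just something like this once it finishes (assuming GNU md5sum/uniq are around, which aren't the BSD defaults; -w32 groups on the hash column):

# count identical files; top of the list = most duplicated content
find outdir -type f -exec md5sum {} + | sort | uniq -c -w32 | sort -rn | head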

>>906592

The script I posted only gets the files, but you can use this to get the content out of the pages:


#!/bin/sh

echo "Getting text of pages"
mkdir pagedir
for i in $(cat sorted_list)
do
echo "Getting page $i"
pagename=$(basename $i).html
fetch -o pagedir/${pagename} "https://animefeet.fandom.com${i}"
a=$(grep -n '<article' pagedir/$pagename | cut -d : -f 1)
b=$(grep -n '</article>' pagedir/$pagename | cut -d : -f 1)
sed -I '' -e "1,${a}d" -e "${b},99999d" pagedir/$pagename
done

Later I'll beat it up with sed to replace the image URLs with file:// URLs so it can be browsed locally, and clean up the HTML.

13904 pages FYI, but I'll post a tarball somewhere once it's cleaned up; it's only 111 MB.


 No.906603

>>906594

Ideally we'd get a snapshot of the entire wiki as it is today that can be browsed locally as a static website. If anyone can get me that then I'll host it via IPFS.
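
For reference, hosting it is basically just this once go-ipfs is installed (the directory name is a placeholder for whatever the snapshot ends up being called):

ipfs init                       # first run only
ipfs add -r animefeet_static/   # prints the CID for the directory root on the last line
ipfs daemon                     # keep this running so peers can fetch it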


 No.906604

Archiveteam have their own set of tools for archiving wikis which might be helpful here. They can be found at https://github.com/WikiTeam/wikiteam
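
If I remember right, the basic usage is something like this (dumpgenerator.py handles both the XML and the images):

git clone https://github.com/WikiTeam/wikiteam
cd wikiteam
python dumpgenerator.py --api=https://animefeet.fandom.com/api.php --xml --images
# an interrupted run can be continued with --resume --path=<the existing dump directory>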


 No.906605

>>906594

What is "fetch" I can't find it in my list of programs. I subbed in wget for now with some random delaying since this connection will probably see it flagged as a bot.

Thankfully I can just cut the rest of the script out without re-downloading all that.

Also I ended up with 147255 total images, confirm?


 No.906607

OK, this script's downloading is not working right. People who are downloading, confirm it's not generating 404 links! If it is, abort it.


 No.906610

File: 62bfa44b31f75c3⋯.png (225.76 KB, 953x894, 953:894, downloader.png)

I gave >>906604 a spin, seems to work perfectly. I suggest giving that a shot.


 No.906611

File: 226005d24fae818⋯.png (743.6 KB, 688x1000, 86:125, __graf_zeppelin_kantai_col….png)

File: 3759c5f93689463⋯.jpg (139.31 KB, 1920x1080, 16:9, __oumae_kumiko_hibike_euph….jpg)

OK, I fixed it. I'll point out that when I did a Twitter image scraper I used Python, because BeautifulSoup makes it so easy to scrape webpage content. Assuming none of the URLs have a " in their filename (which is bad practice, but you never know) it should be OK. The original problem with links not being found was a newline in the script breaking the URL up.

#!/bin/sh
echo "getting all files"
for i in $(cat sorted_files)
do
echo "Getting $i"
urla=$(curl -sS "https://animefeet.fandom.com${i}" | grep -oE 'fullMedia"><a href="https://vignette.wikia.nocookie.net/animatedfeetscenes/images/[^"]+"' | cut -d '"' -f 3)
urlb=${urla%?cb=*}
echo "got full image link, fetching image file: $urlb"
fname=$(basename $i)
wget -q -O outdir/${fname} $urlb
sleep 1
done


 No.906612

Actually there looks to be a lot of crap on this wiki, so I'm not gonna do all that downloading. I'll at least throw up the list of URLs though and the images I did get.

https://mega.nz/#!FZlgVYwa!21RwynOq5_6Gc0WivrRoB28q2tm40COFP6eWMR-R530


 No.906680

>>906592

It's only designed to save the images.

In the case of the animebaths wiki, the Wikia staff had offered to provide a full copy of the text database but wouldn't do the same for the image files. I don't know if the same is true here.


 No.906689

>>906605

>>906611

Sorry, fetch is a freebsd thing, wget should work fine on linux.

I got 139797 files in the sorted list.

And yes, I understand that this is a bit shitty, but I made it in about 15 minutes, which is about all the time I had to spare last night. If I cared more about speed, I'd do it with LWP and some parallelism.

Regarding the " in href part, I'm just working under the assumption that they'll be urlencoded.


 No.906731


 No.906734

File: 305630663580646⋯.jpg (92.52 KB, 960x720, 4:3, Urusei Yatsura - 009&010 [….jpg)

>>906544

WHAT'S NEXT, ARMPITS?


 No.907927

>According to the site one of the next ones is anime shoes.

>>906532

My guess is they'll go after animegrooming next: somehow they'll find a way to perceive the documentation of rare scenes in illustrated fiction, where characters trim their nose hairs or brush their teeth or wash their faces or comb their hair, as inherently fetishistic. Mark my words.

It's small enough I might be able to do that manually.

https://animeunderwear.fandom.com may have flown under the radar, but it has only 9 pages, so fuck that place; it never came close to emulating what pantsu.wikia used to be.

I wouldn't be surprised if they lumped https://paswg.fandom.com/wiki/Panty_and_Stocking_with_Garterbelt_Wiki in there for good measure.

>>906556

>According to the site one of the next ones is anime shoes.

Sannse sent out a Feb 13 warning of 2 weeks left (so death by Feb 25) for BOTH animefeet and animeshoes; they are due to be deleted on the same day.

I'm not sure of the complete list, but it seems these three are in the same boat:

>https://animatedmusclemen.fandom.com/wiki/Thread:8508

>https://animatedmusclewomen.fandom.com/wiki/Thread:25015

>https://animated-video-games-muscular.fandom.com

You only get notified of announcements in communities you've actually contributed to. I had made a token "WTF is this" edit on the 3rd one, and found the 1st/2nd by coincidence while trying to google the 3rd.

> Seems some of the other sites like pantsu and oppai and even a Kodomo no Jikan wikia have been purged already.

I can't remember oppai. KNJ was a tragedy; I was a co-mod on that. There was also the rather generic fanservice.wikia, which was axed years ago. My project 'animeslaps' also got killed (apparently it is a fetish to show animated GIFs of cute tsunderes slapping guys in the face), and back then all of this was done with absolutely no warning. Animeslaps was small enough that I could have backed it up manually; I had probably only covered 12-20 instances at the time. I had wanted to build it into something bigger, but it was killed in the cradle.

https://webcache.googleusercontent.com/search?q=cache:d6fWpUxLhL0J:https://cartoonfatness.fandom.com/wiki/Thread:63459+&cd=1&hl=en&ct=clnk&gl=ca

A Google cache shows that a 3 January 2019 warning was issued to "CartoonFatness", and the discussion formerly at https://cartoonfatness.fandom.com/wiki/Thread:63459 only goes as far as January 10th, so it might have only lasted ONE week.

Sannse said "It's a matter of reputation. We just don't want to be the wiki host that contains fetish sites." despite admitting earlier "I would call this an "ecchi" wiki rather than an obvious fetish one, but we aren't going to host them either"

The thing is... what is "ecchi" is entirely subjective. Pretty much any picture of flanks (plot) is lewd to bronies, for example, but they wouldn't DARE go after MLP.wikia

>>906562

Probably has something to do with all the recent expansion you can see described at the end of https://en.wikipedia.org/wiki/Wikia#History

Autistic projects like this probably made Wikia loads of cash via advertisements, but now they probably see them as a PR risk to all the other side projects they've invested in, as the power of social activism becomes more apparent. I swear Twitter wasn't this powerful years ago when we started stuff like this...

>>906563

I think we need a 'Wikia for adults', because even though you can check a box when creating your project to indicate you don't want kids coming there, it's still one collective network, so I expect they realize that checkbox is just a token measure and doesn't stop cross-networking between kid projects and mature ones.

Not that there was ever anything actually mature about these targeted projects. You can see feet or muscles or shoes hitting pedals on all kinds of toons still airing. What they're objecting to is the memetic focus and documentation of it. They are sexualizing it by calling this encyclopedia topic a 'fetish'.

>>906577

I remember when I used to do stuff like that with my wife back when she was a pre-teen and I'd have her torrenting batches as I slept. Now that she's a teenager she's become frigid and can't stay active more than ten minutes at a time. Replacing Vista with Lubuntu was only a band-aid solution.

>>906592

The source code of the pages is supposedly found in the Database dumps you can download from https://animefeet.fandom.com/wiki/Special:Statistics and even after a wiki is closed they will usually keep a link up to the dump for the next month. That's why the main priority is the files. If I was able to find a new host, the dump would be the first thing I would add, and then we would get back all the pages, but all the image links would just be dead/red and not display anything.


 No.907930

I have finished accruing a complete dump of the site as it was last week (took a while to complete) in the format produced by wikiteam's scripts and demanded by archive.org. I will try to work out how to actually upload it to archive.org, and from there anyone can download the dump later on. This dump is of a format that can be imported into any mediawiki instance (hopefully), though I have not tested it and don't really know if the dump is correct. It is however 61GB when compressed, so most likely it's all in there.


 No.907961

>>907930

I have 64 GB of images, and I think I've missed a few. The page text was only a few hundred MB, even before stripping off most of the Wikia boilerplate; their HTML is aids.

I'm going to do a second pass this weekend, after I'm done with a bunch of stuff that has been keeping me too busy, to make sure I haven't missed any and do some hashing to see what the duplication rate looks like. God damn, I wish there was more time.


 No.908075

I have 150,912 image files with a total size of 68.5 GB (= 63.8 GiB).

There were 994 files which couldn't be downloaded. I think most of those were probably embedded Youtube videos.

There are also a handful of instances (~281) where multiple "/wiki/File:..." URLs point to a single file.

I think the best way to share them is via a torrent. I can't be bothered screwing around with multiple Mega accounts. I'm going away for a week, but once I get back my computer will be online 24/7. I can spare the HDD space to seed it for at least a month or two - probably much longer.
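
Creating the torrent itself is the easy part; roughly this, with transmission-create (tracker and file names are just examples):

transmission-create -o animefeet_images.torrent \
    -t udp://tracker.opentrackr.org:1337 \
    -c "animefeet.fandom.com image mirror, 150912 files" \
    animefeet_images/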


 No.908076

Also, here's the updated version of the code with caching, resuming, and nice progress bars.

Delete filecache.txt in order to refresh the cache.


#!/usr/bin/env python3

from requests_html import HTMLResponse, HTMLSession
from pathlib import Path
from urllib.parse import unquote, quote
from typing import Set
import shutil


def recursively_traverse_allpages(session: HTMLSession, url: str, depth: int = 0) -> Set[str]:
    response: HTMLResponse = session.get(url)
    pages = set()

    # here we handle non-leaf nodes in the tree
    allpageslist = response.html.find(".allpageslist", first=True)
    if allpageslist is not None:
        page_ranges = allpageslist.absolute_links
        num = len(page_ranges)
        for ii, link in enumerate(page_ranges):
            if depth == 0:
                progress_bar(num, ii)
            pages = pages.union(recursively_traverse_allpages(session, link, depth+1))
        progress_bar(num, num)
        if depth == 0:
            print()

    # and here we handle the leaf nodes
    allpagestable = response.html.find(".mw-allpages-table-chunk", first=True)
    if allpagestable is not None:
        pages = pages.union(allpagestable.absolute_links)

    return pages


def get_files(session: HTMLSession, wikiname: str) -> Set[str]:
    url = "https://{}.wikia.com/wiki/Special:AllPages?namespace=6".format(wikiname)
    pages = recursively_traverse_allpages(session, url)
    return pages


def download_image(session: HTMLSession, url: str, folder: Path):
    response: HTMLResponse = session.get(url)
    original = response.html.find("div.fullMedia", first=True).absolute_links.pop()

    # this _should_ hopefully eliminate any path-traversal exploits
    # the [-3] is because the filename is not the last element in the url
    name_sanitized = Path(unquote(original.split("/")[-3])).name

    # we download to a temp file to avoid getting incomplete files
    # the file is only moved to its final location once we have it all
    path_temp = Path("/tmp") / name_sanitized
    path_final = folder / name_sanitized

    response = session.get(original, stream=True)
    with path_temp.open("wb") as fp:
        for chunk in response.iter_content(2**13):
            fp.write(chunk)

    shutil.move(str(path_temp), str(path_final))


def progress_bar(total: int, value: int):
    width = 40
    filled = int(width * value / total)
    bar = '▉' * filled + ' ' * (width - filled)
    print("\r[{}] {}/{}".format(bar, value, total), end='')
    # don't forget to call print() after the final call to this function to move to the next line.


def main():
    ss = HTMLSession()

    path_cache = Path("filecache.txt")
    path_errors = Path("errors.txt")
    dir_images = Path("animefeet")

    if path_cache.is_file():
        print("loading cached file list...")
        with path_cache.open('r') as fp_cache:
            files = set((unquote(line.strip()) for line in fp_cache))
    else:
        print("retrieving file list from wiki...")
        files = set((unquote(x) for x in get_files(ss, "animefeet")))
        with path_cache.open('w') as fp_cache:
            fp_cache.write("\n".join(files))

    # The files on disk only have the filename
    # The variable 'files' contains full URLs
    # To make 'files' and 'already_have' have the same format, we need to know the url prefix
    file_prefix, _, _ = next(iter(files)).partition("/wiki/File:")
    file_prefix += "/wiki/File:"

    if dir_images.is_dir():
        already_have = set((file_prefix + img.name for img in dir_images.glob("*")))
    else:
        dir_images.mkdir(parents=True)
        already_have = set()

    # There should be no urls in 'already_have' which aren't also in 'files'
    errors = already_have.difference(files)
    assert errors == set(), f"Mismatched URLs found:\n {errors}"

    total = len(files)
    offset = len(already_have)
    print("\nfound {} files".format(total))
    print(f"{len(already_have)} already downloaded")
    files.difference_update(already_have)
    print(f"{len(files)} remaining to download\n")
    print("downloading images...")
    with path_errors.open('a') as fp_errors:
        for ii, img in enumerate(files):
            progress_bar(total, ii+offset)
            try:
                download_image(ss, img, dir_images)
            except AttributeError as err:
                fp_errors.write(f"{img}\n")
                if str(err) != "'NoneType' object has no attribute 'absolute_links'":
                    print(err)
    progress_bar(total, total)
    print()


if __name__ == "__main__":
    main()


 No.908293

File: f877d8a9d307ade⋯.jpg (1.4 MB, 1322x1920, 661:960, f877d8a9d307ade5a381c700bb….jpg)

File: 86fd14a33ca7667⋯.png (2.19 MB, 1302x1842, 217:307, 86fd14a33ca7667a26c64718c8….png)

File: 16c5498b4a7a754⋯.jpg (47.1 KB, 626x1000, 313:500, 16c5498b4a7a754a8033d37375….jpg)

File: 2b05b566296dadb⋯.png (755.25 KB, 747x1058, 747:1058, 2b05b566296dadb47abbed0587….png)

File: 4e1d863c1b0141c⋯.jpg (1.48 MB, 1600x1283, 1600:1283, 4e1d863c1b0141c9cd354e04f7….jpg)

>>906489

I can see the need for lewd alternatives to fandom.


 No.908534

Alrighty then, I believe I have successfully archived the wiki. It is up at https://archive.org/details/wiki-animefeetfandomcom and can be downloaded from there directly (~60GB). It is in the format that mediawiki accepts and can be directly imported. The IA has also made it available via torrent, which would be the best way to grab it.


 No.910330

>>907930

>>908534

thank you "Feetlord 3000", I'll take a look at this to see if I can figure out how to navigate it. Is there an easy way to ask archive.org to back up pretty much any wiki? 60gigs is a lot more than I would have guessed...

>>908075

Dang, I'm not sure if I can get a working computer up by then. If you stop torrenting will it still be possible to view it on archive.org afterward?

I was hoping for megas but at 15gigs each that would take 5 accounts just to store it all...

I'm thinking that if we got it back up and then went on a purging spree of repetitive, low-quality, or size-bloated images, the total size could be brought down.


 No.910618

>>910330

>Is there an easy way to ask archive.org to back up pretty much any wiki?

No, you have to use archiveteam's tools to do it locally and then use their tool to upload it.


 No.912245

File: b4b7e57cf0f6e09⋯.jpg (208.52 KB, 1920x1090, 192:109, vlcsnap-2016-11-30-20h48m1….jpg)

>>908075

Any updates on the torrent, anon?


 No.912246

>>912245

I'm sorry, I'm retarded and skipped over >>908534

Thank you anon!


 No.912247

>>912245

Sorry, I'm ready to create the torrent but other people seem to have already archived it and >>910330 said he wasn't ready to download it.

If you want the torrent, let me know and I'll set it up.


 No.913754

File: bea95d3489f2cda⋯.png (1020.84 KB, 1300x1922, 650:961, bea95d3489f2cda9031a34168c….png)

>>906499

This is great. Would you happen to know what the song is called anon?


 No.913760

>>908075

>>912247

I'm ready to download it if you're willing to set it up.

Currently in the middle of a massive data backup, so I can't vouch for consistent seeding for a couple of weeks, but it should be fine.

Can it be posted on Nyaa for maximum exposure?


 No.913783

It's always inspiring seeing men come together for a common and noble cause.


 No.913784

>>913783

Through dick, unity.


 No.913808

>>913760

Here you go.

magnet:?xt=urn:btih:175ce1a6778721232db5bbdeb7a376fbd6c0b65a&dn=animefeet_images.zip&tr=udp%3A%2F%2Fipv6.leechers-paradise.org%3A6969%2Fannounce&tr=udp%3A%2F%2Fexplodie.org%3A6969&tr=udp%3A%2F%2Ftracker.zer0day.to%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.internetwarriors.net%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.leechers-paradise.org%3A6969&tr=http%3A%2F%2Fmgtracker.org%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.uw0.xyz%3A6969&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337

Let me know if there are any problems. 8chan wouldn't let me post the torrent file itself.

>Can it be posted on Nyaa for maximum exposure?

Would it be allowed on Nyaa? There's A LOT of western shit mixed in.

Anyway, I've used the same tracker list as nyaa.pantsu.cat so it will be easy enough to upload the torrent there if you want to.


 No.915335

File: 844a7edda8e3e35⋯.gif (1.22 MB, 500x281, 500:281, donuts.gif)

OP here, so I went to https://archive.org/download/wiki-animefeetfandomcom and downloaded the .torrent file.

Running it in Transmission now, which is telling me it is 65.9 GB. I assume this has something to do with whether it's counting in multiples of 1000 or 1024.

So far I've downloaded about 900 MB worth (1.4% done), but when I was looking at the contents of the torrent, I noticed...

>animefeetfandomcom-20190216-wikidump.7z.part 60.9GB

.7z files are compressed... my question to "Footlord 3000" who made this happen... do you have any idea just how big a folder I'm looking at when I decompress this?

I hope it's not too big, my hard drive only has about 200 GB more free space.

When I unzipped the database dump that Wikia supplied (page source code only, no pictures), the 17.3 MB file became 461.9 MB. That's actually beyond the 250 MB upload cap at https://animatedfeet.miraheze.org, so I am unable to Special:Import it.

Does anyone know how to split these XML batches into smaller bits? A pair of 230 MB XMLs might be accepted.


 No.915340

Ok I'll defend the footfags but if they come for the fartposters I'm sitting by and watching them get purged.


 No.915343

>>915335

I just checked.

$ du -h animefeetfandomcom-20190216-wikidump
63G animefeetfandomcom-20190216-wikidump/images
64G animefeetfandomcom-20190216-wikidump

In your first example the XML compressed super well, but images are already in a compressed form of sorts, so it's once again about 65G. I probably could've turned up the compression ratio, but it was taking forever to pack as it was.
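
As for splitting the XML for Special:Import: splitting at </page> boundaries and copying the dump's header onto each chunk should work. A rough, untested sketch (the dump filename is a placeholder, and it assumes the dump is pretty-printed with each <page> tag on its own line, which Wikia's dumps are):

#!/bin/sh
dump=pages_current.xml   # placeholder: whatever your Wikia dump is called

# the header is everything before the first <page> (the <mediawiki ...> tag and <siteinfo>)
sed -n '1,/<page>/p' "$dump" | sed '$d' > head.xml

# ~230 MB per chunk, safely under the 250 MB Special:Import cap
awk -v max=230000000 '
BEGIN { n = 0; size = max + 1 }
/<page>/ && size > max {
    if (out) { print "</mediawiki>" > out; close(out) }
    n++; out = sprintf("chunk%02d.xml", n); size = 0
    while ((getline line < "head.xml") > 0) print line > out
    close("head.xml")
}
/<page>/, /<\/page>/ {
    print > out
    size += length($0) + 1
}
END { if (out) { print "</mediawiki>" > out; close(out) } }
' "$dump"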


 No.915346

>>915340

Have you actually done anything to help?

Personally I'm happy to help archive any anime/fetish wikis that need it. I have my own fucked up fetishes so I'd be a hypocrite to judge others.



