No.42797
As many of you are probably well aware, GeoCities JP is set to go onto the chopping block within the next year. You can help Archive Team with that; there are probably a dozen redundant backups of the entire site by now, but I'm sure more would always be appreciated. The thing is, archival isn't some transient issue that gets resolved every time people are called into action to solve whatever the current crisis is. There's a CONSTANT crisis going on, and people can hardly be alerted to it every other day of the week. The real issue is that we aren't prepared for data erasure unless we're given prior warning. You can't always just assume that somebody else has already got the job done.
How can you help? There are two simple ways:
Download and seed torrents of things you like. Easy as pie. Just remember to always try to keep at least the versions that most closely reflect the original media, even if that somehow detracts from the experience for you. For anime, for example, downloading raws and using your own external subtitle files means nothing if you haven't got an actual ISO. Manga are kind of tricky, since in cases where they do have digital editions, some content is often cut from the printed version, and there's no real definitive way to perfectly scan paper, as it's an analogue medium. As for games, downloading ISOs rather than loose directories with these shitty pre-applied patches is kind of just a given, so please just do that.
Another way which might turn out to be substantially more useful to everybody if you have some money to burn (better the paper of money than the paper of books), is to buy media which isn't currently available anywhere you've searched on the Internet and scan it yourself. This methodology is less accessible to anyone living outside of Japan (as you can't really rent items from overseas, and renting is much less expensive than buying), but it's still possible through online marketplaces like eBay or Yahoo (or even Amazon, if it comes down to that). If you do this, it's very important that you remember not to make any alterations to the media before uploading it. Anime seems to get treated the worst, as transcodes will typically somehow appear much earlier than actual ISO images. And PLEASE use ISOs, every last bit of your disc must be captured or else it'll be absolutely worthless.
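If every bit really has to be captured, the only way to know you got there is to compare your image against a known-good dump. Here's a minimal sketch, assuming you have a reference SHA-1 for the same disc from a source you trust (Redump lists hashes for many discs) and a single-file image such as a DVD ISO; multi-track CD dumps are hashed per track, so this won't fit them as-is. The function name and arguments are made up for illustration.

```shell
#!/bin/sh
# Sketch: check a finished single-file disc image against a known-good SHA-1.
# Assumption: the reference hash comes from a database entry for the same
# disc; verify_image is a hypothetical helper name.
verify_image() {
    image=$1
    # Normalise the expected hash to lowercase, which is what sha1sum prints
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    # Read the file on stdin so sha1sum's output is just "<hash>  -"
    actual=$(sha1sum < "$image" | cut -d ' ' -f 1)
    if [ "$actual" = "$expected" ]; then
        echo "OK: $image"
        return 0
    fi
    echo "MISMATCH: $image (got $actual, expected $expected)" >&2
    return 1
}
```

Something like `verify_image game.iso <hash from the database entry>`; a mismatch means a bad read or a different pressing, so it's worth sorting out before you upload.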
It should be kept in mind that paying for things that are already freely available to pirate is generally not only shameful, but also somewhat harmful to the archival effort. If you own whatever it is on a disc, in book form or whatever, chances are you're not going to download and seed any torrents for it. If you've already got it, and it's already widely available, then what's the point in keeping it on your hard drive as well, right? I guess that kind of attitude can be deliberately avoided, but I think the money would be better spent on some obscure crap that you won't necessarily enjoy, but that just happens not to be on the net.
Thanks for reading these foolish words.
____________________________
Disclaimer: this post and the subject matter and contents thereof - text, media, or otherwise - do not necessarily reflect the views of the 8kun administration.
No.42798
Nice, the email field doesn't even have enough space to fit a full email address. Very nice.
No.42834
No.42836
No.42839
I'd argue that improved versions of existing material are worth preserving over whatever shitty original exists. Especially with video games: having the original ISO is great and all, but if it doesn't fucking work without patches and cracks, what's the point? Hell, the reason more GOOD archives haven't been uploaded to trackers is the autistic rules policing quality. Obscure media is obscure for a reason, and people won't preserve files they don't deem worth preserving. Tragic, I know, but I believe the best I can do for the cause is keep whatever I have and offer it to others on request.
No.42841
No.42842
>>42839
Actually, my exact problem with torrent trackers lately is that they often refuse to distribute the original ISO at all. Gazelle Games, for example, has this strict policy where undub patches, translations, and cracks are STRICTLY meant to be applied to the game beforehand. What the FUCK is up with that! Do they really think I have the space on my hard drive to keep 3 different versions of a game when I shouldn't have to, instead of just keeping one and two really, really small files that'll give those guys what they want? It's fucking ridiculous. Probably one of the worst things about it is that emulators nowadays can often do this thing called "live patching", so those retards who can't even choose two files in a patching program don't even need to go THAT far.
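The one-original-plus-small-patches setup is easy enough to script. A minimal sketch, assuming the patches are xdelta (VCDIFF) files and xdelta3 is installed; `apply_patch` and the `PATCHER` override are hypothetical names, and the file names are placeholders. The pristine dump is only ever read, and the patched copy is a throwaway you can delete after playing.

```shell
#!/bin/sh
# Sketch: keep only the untouched original plus small patch files, and make a
# patched copy only when you actually want to play it.
# Assumption: xdelta3-style patches; PATCHER can point at another binary.
PATCHER=${PATCHER:-xdelta3}

apply_patch() {
    original=$1   # pristine dump, never modified
    patch=$2      # e.g. an undub or translation patch
    output=$3     # throwaway patched copy
    if [ -e "$output" ]; then
        echo "refusing to overwrite $output" >&2
        return 1
    fi
    # -d = decode (apply the patch), -s = the source file it was made against
    "$PATCHER" -d -s "$original" "$patch" "$output"
}
```

Usage would be something like `apply_patch game.iso undub.xdelta play.iso`, then delete `play.iso` when you're done.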
Anyway, I'll keep pretending to myself that I DO have the space to keep a billion different versions of every game, but space is running thin. Fuck you.
No.42843
i really like modern jaypee
No.42844
No.42899
>>42843
Then copy it, while I continue to act oblivious to your harsh criticism!
No.43400
>every last bit of your disc must be captured or else it'll be absolutely worthless
Holy hyperbole, dude. All data has worth, even if it's not the same worth. I'd be happy to get encoded video too. I'd even prefer it as an end user for convenience and space consumption. Certainly better than nothing regardless, as that's the difference in seeing/playing it or not.
If people would ONLY back up disc images or nothing, there would be almost no PC98 game rips at all. By the time people willing to archive them got around to it, only installed copies existed for most of them. So they pretty much had to back up the hard drives or nothing. To call that worthless just because it's impure is beyond absurd.
Ideally I'd like disc backups as well, yes. But they're a distant secondary goal. Sharing the media in a form people can easily store and use is more important, especially since that makes more people willing to hold and seed stuff, which means more lively torrents and whatnot.
No.43601
No.43602
I got a Senran Kagura art book; am I supposed to scan that, my dudes? Well, I don't have a scanner.
No.43603
>>43602
Commercial scanners are no good, you've got to get that into a professional bootlegger.
No.43604
>>43400
It is worthless. You don't collect games and other media just for the temporary, superficial purpose of enjoying them.
No.43631
>>42797
It would be very nice if a soft translation format emerged for manga. Nobody hardsubs anime any more. It's about time the same transition happened for manga.
No.43632
>>43604
To insert a bizarre philosophical claim in your otherwise sensible post is a classic technique for keeping the conversation going!
No.43633
>>43603
Woah. I think you are right. The people need to be able to see this.
No.43639
>>42797
Nothing, because all things must come to an end. The Internet is very superficial, which is a real damn shame.
No.43791
Nothing, but a 2ch poster is lurking uboachan right now, apparently, if any of you have input.
https://uboachan.net/fg/res/14021.html
No.43953
No.43954
>>43953
calm down dude I pirated more games okay
No.43955
>>43954
An entire website needs to be pirated and it seems like nobody could really be assed to do any manual work up until this point.
Of course, at this point, saving the entire thing would probably be comparable to holding up the moon.
>Yahoo! GeoCities will end its service on March 31, 2019.
30 hours remain.
No.43961
No.43969
All I know is that the Japanese make love through a hole in their tatami mat. Is that enough?
No.43971
No.43974
>>43955
>>43961
Is GeoCities Japan closed?
No.43976
>>43974
Yeah. Whatever was recovered from it will probably be posted about here eventually, one day: https://archiveteam.org/index.php?title=GeoCities_Japan
No.43981
I thought namefagging for just a single thread might be kind of cool, but it ended up just feeling really lame. Apologies to anybody I annoyed doing this.
No.43987
http://www.asahi-net.or.jp/~AD8y-hys/index.htm Copied!
I can't ignore the voices of any poor website! Can't you hear them calling?
No.44112
YouTube/any video hosting site in existence:
youtube-dl https://github.com/ytdl-org/youtube-dl
Typical command:
youtube-dl -o "/yourfavouriteabsolutepathtoadirectory/%(uploader)s/%(upload_date)s - %(title)s.%(ext)s" --netrc --write-info-json --write-thumbnail --write-annotations --write-description --download-archive "/yourfavouriteabsolutepathtoadirectory/downloads list" {CHANNEL/VIDEO URL}
o: Output path. Variables like %(uploader)s are documented in the "Output template" section of their README.
netrc: Reads login details for various sites from your .netrc file. You'll need one for Niconico Douga.
write-annotations: Never mind, YouTube got rid of annotations. Rest in peace.
download-archive: A list of video IDs that have already been downloaded to read from and write to, for skipping redundant downloads. You can avoid using it if you prefer, or put it in another directory.
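To run the same thing over a whole list of channels or videos, youtube-dl's --batch-file option reads URLs from a file, one per line. A rough sketch, assuming the placeholder paths from the command above; `archive_list`, `ARCHIVE_ROOT` and `YTDL` are hypothetical names (the last one lets you point it at a fork), and --write-annotations is dropped since there's nothing left to write.

```shell
#!/bin/sh
# Sketch: the typical command above, run over a file of URLs.
# Assumptions: ARCHIVE_ROOT is a placeholder path; YTDL can name another binary.
ARCHIVE_ROOT=${ARCHIVE_ROOT:-/yourfavouriteabsolutepathtoadirectory}
YTDL=${YTDL:-youtube-dl}

archive_list() {
    url_list=$1   # one channel/video URL per line
    # --ignore-errors keeps going past dead or blocked videos
    "$YTDL" \
        --ignore-errors \
        -o "$ARCHIVE_ROOT/%(uploader)s/%(upload_date)s - %(title)s.%(ext)s" \
        --netrc \
        --write-info-json --write-thumbnail --write-description \
        --download-archive "$ARCHIVE_ROOT/downloads list" \
        --batch-file "$url_list"
}
```

`archive_list channels.txt`; because of the download archive, re-running it later should only grab new uploads.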
Twitter:
Twint https://github.com/twintproject/twint
Typical command (BE GRATEFUL THIS TOOK FUCKING AGES):
A shell script made of spaghetti code attached to this post, by yours truly. Twint in itself isn't actually intended for this kind of function, so I had to include it in a string of commands. It doesn't have anything helpful like a manual or a "help" option, so here's the rundown:
You give it a Twitter username (that's their @), and it'll download all of their tweets to a file, sorted into chronological order. Then the script downloads all the images and videos they've posted. It even continues where it left off, as long as the username hasn't changed (I think this works, but all my tests say it doesn't). One word of warning, though: if you're scraping an account for the first time, you have to let the script get all the way back to their first tweet. Otherwise, next time it'll pick up from the most recent tweet it's scraped and continue from there, and you'll have to get the rest manually.
Defective though it may be, I'm going to upload it anyway. Actually, I can't attach it, so I'll just paste it here: https://pastebin.com/P5j1Ru4C
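The continuing part is basically a merge problem: combine the old and new scrapes, drop duplicates, keep it chronological. This is not the pastebin script, just a sketch of that one step, assuming one JSON object per line with a numeric "id" field, which is roughly what twint's JSON output looks like (check yours). It leans on tweet IDs being time-ordered snowflakes, so sorting by ID is sorting by time. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: merge tweet files, dedupe by tweet ID, keep chronological order.
# Assumption: one JSON object per line, each with an "id" field.
TAB=$(printf '\t')

# Print "<id><TAB><original line>" for each input line that has an "id" field.
# The leading quote in the pattern keeps "conversation_id" from matching.
with_ids() {
    awk 'match($0, /"id": *"?[0-9]+/) {
        id = substr($0, RSTART, RLENGTH)
        gsub(/[^0-9]/, "", id)    # keep just the digits
        print id "\t" $0
    }' "$@"
}

# Old file may not exist yet on the first run, hence the 2>/dev/null.
merge_tweets() {
    cat "$@" 2>/dev/null | with_ids | sort -t "$TAB" -k1,1n -u | cut -f 2-
}

# Largest ID already saved, i.e. where the last run stopped.
newest_tweet_id() {
    with_ids < "$1" | sort -t "$TAB" -k1,1n | tail -n 1 | cut -f 1
}
```

For example `merge_tweets tweets.json fresh.json > merged.json && mv merged.json tweets.json`; don't redirect straight into a file you're reading from.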
Mediawiki/Probably any kind of wiki software:
WikiTeam https://github.com/WikiTeam/wikiteam
Typical command:
dumpgenerator.py {LINK TO WIKI} --xml --images --path . --resume
(Move dumpgenerator.py somewhere convenient first, like /usr/bin/wikiteam-dump.)
Downloads every page of a wiki, along with each page's full history, into a single file. This isn't very convenient for reading, but there are tools to restore the pages to their normal form. It also downloads all images (not sure about audio and video), but only their current versions.
Unless you're batshit insane, don't even think of trying to dump Wikipedia. They already have their own, regularly updated dumps you can download from.
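For a pile of wikis, a loop saves some typing. A minimal sketch, assuming dumpgenerator.py has been moved or linked to `wikiteam-dump` on your PATH as suggested above, and that it creates the --path directory when it doesn't exist yet; `dump_wikis`, `dump_dir_for` and the directory naming scheme are made up here. A directory that already exists gets --resume; a new one starts a fresh dump.

```shell
#!/bin/sh
# Sketch: dump several wikis, each into its own directory, resuming old ones.
# Assumptions: WIKIDUMP is dumpgenerator.py under another name; the naming
# scheme below is hypothetical.
WIKIDUMP=${WIKIDUMP:-wikiteam-dump}

# "https://example.org/w/api.php" -> "example.org_w_api.php"
dump_dir_for() {
    printf '%s\n' "$1" | sed -e 's|^[a-z]*://||' -e 's|/*$||' -e 's|/|_|g'
}

dump_wikis() {
    # Read the list on fd 3 so a prompt from the dumper can't eat the list
    while IFS= read -r wiki <&3; do
        [ -n "$wiki" ] || continue    # skip blank lines
        dir=$(dump_dir_for "$wiki")
        if [ -d "$dir" ]; then
            "$WIKIDUMP" "$wiki" --xml --images --path "$dir" --resume
        else
            "$WIKIDUMP" "$wiki" --xml --images --path "$dir"
        fi
    done 3< "$1"
}
```

`dump_wikis wikis.txt`, with one wiki URL per line, run from whatever directory should hold the dumps.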
Websites:
GNU Wget https://www.gnu.org/software/wget/
Typical command:
wget -e robots="off" --mirror {SITE URL}
Please be careful to only download small, static sites this way. If it's a site that generates pages, you're going to give yourself and the webmaster a very hard time unless you get a little creative, especially with robots.txt checking turned off (so many of them just seem to think they know what's better for their site than I do, the GALL of them!). In those situations, the --include-directories and --exclude-directories options will probably be helpful to you. Avoid the --accept and --reject options: while they don't clutter up your filesystem, for some reason they still download the entire file before deciding to delete it.
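One way to get a little creative is to slow wget down and fence it in. A sketch, not a recipe: the one-second random wait and --no-parent aren't in the command above, the include/exclude lists are comma-separated directory prefixes (which is what wget expects), and `polite_mirror`/`WGET` are hypothetical names.

```shell
#!/bin/sh
# Sketch: a gentler --mirror for sites that generate pages.
# Assumptions: wait times are arbitrary; WGET can name another binary.
WGET=${WGET:-wget}

polite_mirror() {
    url=$1
    include=$2    # e.g. "/diary,/images" (empty = no restriction)
    exclude=$3    # e.g. "/cgi-bin,/search" (empty = nothing excluded)
    # Rebuild the argument list: base options first, then optional filters
    set -- -e robots=off --mirror --no-parent --page-requisites \
        --wait=1 --random-wait
    [ -n "$include" ] && set -- "$@" --include-directories="$include"
    [ -n "$exclude" ] && set -- "$@" --exclude-directories="$exclude"
    "$WGET" "$@" "$url"
}
```

For example `polite_mirror http://example.com/ /diary,/images /cgi-bin`.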
Flash and JavaScript are a major pain in the ass, because wget doesn't follow links embedded in them. For those, you'll have to go onto the pages yourself, open your web browser's developer tools (usually F12), and watch the list of assets in the Network tab as they load. Then you're going to have to wave your cursor around, click on the Flash files and stuff, and manually download the files as they appear in the log. It's a massive pain in the ass, but I guess it gives you something to do.
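Instead of saving assets one at a time from the log, both Firefox and Chrome can export the whole Network tab as a HAR file (right-click the request list, "Save all as HAR" or similar), which is just JSON. A rough sketch of pulling the URLs out of it with grep and sed, assuming no escaped quotes inside URLs (jq would be sturdier); `har_urls` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: list every http(s) URL requested in an exported HAR file, once each.
# Assumption: URLs contain no escaped quotes.
har_urls() {
    grep -o '"url": *"[^"]*"' "$1" |
        sed -e 's/^"url": *"//' -e 's/"$//' |
        grep -E '^https?://' |
        sort -u
}
```

Then `har_urls page.har > assets.txt && wget -x -i assets.txt`. You still have to click around first so the Flash files actually request everything.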
No.44113
Pixiv galleries/Sadpanda:
Gallery-DL https://github.com/mikf/gallery-dl
Typical command:
gallery-dl {URL}
Configuring this program is very, very annoying, so I'll just let you figure that out on your own. With a proper setup, you can download galleries into pre-zipped, pre-cbz'd archives, named according to the Japanese titles.
https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst
https://github.com/mikf/gallery-dl/blob/master/docs/gallery-dl-example.conf
https://github.com/mikf/gallery-dl/blob/master/docs/gallery-dl.conf
These files are going to be interesting to you, but they'll get much less interesting once you realise you've been spending two hours trying to figure everything out. Hang in there.
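As a starting point only: a sketch of one config aimed at the cbz-with-Japanese-titles setup, written out from a shell function so it stays next to the command that uses it. Assumptions, all worth checking against the docs linked above for your version: the exhentai extractor exposes the Japanese title as `title_jpn`, `{a|b}` falls back to the second field when the first is empty, and the zip post-processor's `extension` option makes it write .cbz. The paths, `fetch_gallery`, `GDL_CONF` and `GALLERY_DL` are placeholders.

```shell
#!/bin/sh
# Sketch: one possible gallery-dl setup, not a tested recipe.
# Assumptions: key names as described above; paths are placeholders.
GALLERY_DL=${GALLERY_DL:-gallery-dl}
GDL_CONF=${GDL_CONF:-$HOME/.config/gallery-dl/archive.conf}

write_gdl_config() {
    mkdir -p "$(dirname "$GDL_CONF")"
    # One folder per gallery named after its title, zipped to .cbz when done
    cat > "$GDL_CONF" <<'EOF'
{
    "extractor": {
        "base-directory": "~/archive/gallery-dl/",
        "exhentai": {
            "directory": ["{category}", "{title_jpn|title}"],
            "postprocessors": [
                {"name": "zip", "extension": "cbz", "keep-files": false}
            ]
        }
    }
}
EOF
}

fetch_gallery() {
    [ -f "$GDL_CONF" ] || write_gdl_config
    # --ignore-config skips your default config so only this file applies
    "$GALLERY_DL" --ignore-config --config "$GDL_CONF" "$@"
}
```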
Messageboards (like /jp/!):
GNU Wget https://www.gnu.org/software/wget/ (or a specialised program, if you can find one; that's probably better)
Alright, these can be tricky, and it differs on a case-by-case basis.
A lot of boards will have a /boardname/res/ directory, and if it's publicly browsable, then things are going to be very easy for you. You just wget -x -i those directories, and you'll be graciously given all of the threads on that board, without having to worry about the site generating a web page for every single post, and every single combination of every single post (seriously, Kareha does this).
You'll generally want to wget --page-requisites the index of each board too, just to make sure that you've got the page's CSS, extra images, and stuff.
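If the /res/ listing is a plain autoindex page, turning it into a URL list for wget -x -i is mostly grep. A sketch, assuming relative links of the form href="12345.html" and that you've already saved the listing locally; `thread_urls_from_listing` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: extract thread pages from a saved /boardname/res/ directory listing.
# Assumption: links look like href="12345.html".
thread_urls_from_listing() {
    res_url=${1%/}/      # make sure the base ends in exactly one slash
    listing=$2           # saved copy of the directory listing page
    grep -o 'href="[0-9][0-9]*\.html"' "$listing" |
        sed -e 's/^href="//' -e 's/"$//' |
        sort -u |
        sed "s|^|$res_url|"
}
```

For example `wget -O listing.html https://example.org/jp/res/`, then `thread_urls_from_listing https://example.org/jp/res/ listing.html > threads.txt && wget -x -i threads.txt`.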
Here's a specialised shell script I made for 8chan; it only works on a thread-by-thread basis. I think you can find a list of threads at boardname/threads.json or similar API endpoints, but to exploit that I'd have to do something like I just did for the Twitter script, and I really don't feel like that right now. Here's the incomplete version anyway: you just feed it thread URLs, and it accepts several at once.
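#!/bin/sh
# Save 8chan threads plus their full-size media. Usage: pass one or more thread URLs.
cd "/yourfavouriteabsolutepathtoadirectory/" || exit 1
# Page requisites pull in CSS, scripts and thumbnails; --timestamping skips unchanged files
wget --page-requisites --timestamping "$@"
# Full-size files, named by the {tim} field gallery-dl reports for each post's file
gallery-dl --ignore-config --option "base-directory=./media.8ch.net/file_store/" --option "filename={tim}{ext}" --option "directory=" "$@"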
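The threads.json idea could look something like this. A sketch, assuming a vichan-style threads.json (an array of pages, each with a "threads" array whose entries carry the thread number as "no") and thread pages at res/<no>.html; `threads_from_json` is a hypothetical name, and the URLs in the usage line are placeholders.

```shell
#!/bin/sh
# Sketch: build thread URLs from a saved threads.json.
# Assumption: vichan-style format with thread numbers under "no".
threads_from_json() {
    board_url=${1%/}    # e.g. https://8ch.net/jp
    json=$2             # saved copy of $board_url/threads.json
    grep -o '"no": *[0-9][0-9]*' "$json" |
        grep -o '[0-9][0-9]*$' |
        sed "s|^|$board_url/res/|; s|\$|.html|"
}
```

For example `wget -O threads.json https://8ch.net/jp/threads.json`, then `threads_from_json https://8ch.net/jp threads.json | xargs ./8chan-threads.sh`, using whatever name you saved the script above under.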
Anything else you need, there's a good chance you can find it by running a search for "[x] scraper" in Github.
>>43632
If you don't take your philosophy to its logical extreme, then it's open to attack.
No.44120
>>44112
oh shit, I forgot to give one of the @s a dollar sign
that's probably why continuing didn't work
don't use twitscrape's continue feature unless you fix it yourself, otherwise it won't continue
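That bug class is worth seeing once. A small illustration with made-up function names: a bare @ is just a literal character, unquoted $@ gets split on whitespace, and only "$@" hands each argument through exactly as given.

```shell
#!/bin/sh
# Sketch: how many arguments actually arrive for each spelling.
count_args() { echo $#; }

show_quoting() {
    printf 'literal @:   %s\n' "$(count_args @)"
    printf 'unquoted $@: %s\n' "$(count_args $@)"
    printf 'quoted "$@": %s\n' "$(count_args "$@")"
}
```

`show_quoting "my user" other` prints 1, 3 and 2: the literal @ is one argument, the unquoted form splits "my user" in two, and the quoted form keeps both arguments intact.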
No.44226
>>43604
It's not worthless. And I do. Enjoying them is about 40% of the motivation for collecting them.
Having content for review and sharing with others is the majority of the rest of the motivation.
Having absolutely perfect rips of anything at all is but a small portion of what I care about. I only ever go for that when it's actually feasible and reasonable: with console collections small enough to fit in a few dozen GB, like most of No-Intro; when the most perfect rip is already right in front of me, without an exhaustive search; or when it's the only version I can actually find.
Regardless of motivation, having the content accessible in the first place is the absolute goal. Perfect rips are just icing on the cake, so to speak.
Having things in a usable format is inherently desirable, even for archival.
Even if you consider it imperfect, even if you consider it of less worth than perfect data, it's still data, which is miles better than NO data. And it's usable data at that.
But I'd go so far as to say imperfections may actually ADD worth, particularly with the inclusion of subtitles or the removal of anti-piracy messages and advertisements. Or in some cases, like this one rip of Qwaser I have, drastically cleaning up the fucking awful scaling of the Blu-ray release by pre-filtering, so that I don't have to waste electricity doing it myself.
I do not like dealing only in absolutes like all or nothing. Because it's absolutely retarding. Literally so. And also generally impossible.
Worth less? Maybe; arguably at least. Worthless? Fuck no. All discernible data has worth, even if it's relatively worth less than other data.
>>44113
I'd argue that if you ONLY take your philosophy to its logical extreme then it's even more open to attack. Due to having no room for rationality and reason.
Hell, such extremist philosophy deserves to be attacked, and it's begging for it.
Like, I love that you want to archive things, and that you want to archive them as pure as possible. I don't mind that you personally aren't intending to archive anything impure. But to forsake anything impure as absolutely valueless? That gets under my skin. It's like you don't care about the content of the data at all, whatsoever, and just have a fetish for the data itself.
No.44228
>>44226
Information is cheap, data is invaluable.
Well I guess I'm not actually serious about that, it's just part of my character here.
No.44291
>>44228
Information isn't always cheap. In fact, people often work to limit its availability to try to make it expensive. Especially real-life secrets, but also simple media.
Ideally, information for simple media would be cheap and readily available. Maybe not immediately, for the sake of profit and production, but eventually at least.
Ideals aren't realistic though.
Data itself may be considered invaluable, in certain contexts. But data without the information it contains is worthless; simple noise.
Even though the data and format do indeed matter, contents matter more.
Only when the contents are guaranteed to exist and be obtainable do the formats come into play. And even then it is often quite a tradeoff, for space, convenience, and sometimes finance if it must be purchased.
I'd still gladly take a crappy XVID encode of a series from the days of old, if the only other option is not having it at all due to being lost to time.
If I can find a better format then great.
The XVID encode would be worth less, but not worthless, as it still contains the contents however degraded they may be.
I do wish you well though. Archive all you can.
No.44298
>>44291
Thanks, and sorry if my opinions seem unreasonable. The amount of junk rips compared to proper copies makes it seem like absolutely anything is available on the internet, as long as you're happy with getting an inferior or legitimately incomplete (that is, not by my own insane standards) version. But I guess that's probably not the case, I'd probably have an easier time knowing if I actually downloaded things with the intention of using them.
Unless we start travelling through time and start digitising the space around a master film reel/the first performance of a piece of classical music/the really big fish dad caught but didn't take a photo of, I guess there's no point in trying to archive everything, anyway.
Do I want to archive things to make them more accessible, or just so they can keep existing in some form? If it's the second, there's probably some bullshit metaphysics I could come up with to relieve myself. Otherwise, I guess archiving anything I see would be the best I can do. Maybe I should be a little more reasonable about my goals.