
/prog/ - Programming

File: 1453177111172.png (94.86 KB,337x450,337:450,403.png)

c6be61 No.3852

I wrote a Python script to download images from tinyboard/vichan imageboards.

It works on every imageboard I've tried except 8ch, which gives me a 403 Forbidden error. I tried changing my user agent within the script (perhaps unsuccessfully), but I still get a 403. What gives?


#!/usr/bin/env python3

import argparse, bs4, os, urllib.request, urllib.parse

parser = argparse.ArgumentParser()
parser.add_argument("url", help="Link to thread")
parser.add_argument("-d", help="Directory to download to")
args = parser.parse_args()

if args.d:
    if not os.path.exists(args.d):
        os.makedirs(args.d)
    os.chdir(args.d)

soup = bs4.BeautifulSoup(urllib.request.urlopen(args.url))

domain = urllib.parse.urlparse(args.url).netloc
http = urllib.parse.urlparse(args.url).scheme + "://"

for link in soup.find_all("p", class_="fileinfo"):
    image = http + domain + link.next_sibling.get("href")
    filename = image.rsplit("/", 1)[1]
    if not os.path.exists(filename):
        urllib.request.urlretrieve(image, filename
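
On the user agent: urllib identifies itself as Python-urllib/3.x by default, and some sites refuse that. A minimal sketch of one way to change it for both urlopen and urlretrieve is to install a global opener with its own header list. The helper name and the user-agent string here are made up for illustration, and nothing in the thread confirms that this is what causes the 403 on 8ch.

```python
import urllib.request


def install_user_agent(user_agent):
    """Install a global opener that sends the given User-Agent header.

    Hypothetical helper, not part of the original script. urlopen() and
    urlretrieve() both go through the installed global opener.
    """
    opener = urllib.request.build_opener()
    # Replace the default header list, which only holds the Python-urllib agent.
    opener.addheaders = [("User-Agent", user_agent)]
    urllib.request.install_opener(opener)
    return opener


# Example call (the agent string is an arbitrary placeholder):
# install_user_agent("Mozilla/5.0 (X11; Linux x86_64)")
```

Calling this once near the top of the script, before the first urlopen, would cover every later request without touching the download loop.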

____________________________
Disclaimer: this post and the subject matter and contents thereof - text, media, or otherwise - do not necessarily reflect the views of the 8kun administration.

c6be61 No.3853

Note that I accidentally dropped the closing parenthesis on the very last line.

- urllib.request.urlretrieve(image, filename

+ urllib.request.urlretrieve(image, filename)


c6be61 No.3882

>>3852

If I had to guess,

>image = http + domain + link.next_sibling.get("href")

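The href of the link is https://media.8ch.net/prog/src/1453177111172.png for 8chan. Look at it in a page inspector. That is already a full URL on a different host, so prepending http + domain to it gives a broken address.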

I think it's because hotwheels has some weird hacky shit with the servers due to bui and/or site growth.
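
A rough sketch of how the script could cope with both cases is to let urllib.parse.urljoin resolve the href against the thread URL instead of gluing strings together. Relative hrefs get the thread's scheme and host, while absolute ones pass through untouched. The function name and the thread URLs in the comments are assumptions for illustration only.

```python
from urllib.parse import urljoin


def resolve_image_url(thread_url, href):
    """Resolve an image href against the thread URL.

    Hypothetical helper: a relative href such as "/b/src/1.png" inherits the
    thread's scheme and host, and an absolute href is returned unchanged.
    """
    return urljoin(thread_url, href)


# Example calls (thread URLs are placeholders, not taken from the thread):
# resolve_image_url("https://example.org/b/res/1.html", "/b/src/1.png")
# resolve_image_url("https://8ch.net/prog/res/3852.html",
#                   "https://media.8ch.net/prog/src/1453177111172.png")
```

In the original loop this would stand in for the `http + domain + ...` concatenation, with `args.url` as the first argument.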

You might have figured this out by yourself already, since this was a while ago.

I might try writing my own version in JavaScript or bash for 8chan specifically. I'll probably post it here if I do.




