▶ No.1057341>>1057484
I'm seeing a lot of bot posts spammed around here. Be careful anons
▶ No.1057388>>1059015
▶ No.1057484
>>1057341
That could be my next project
▶ No.1057485>>1057486 >>1057501
>>1057235 (OP)
>Are you really a programmer? Show us your own 8ch downloader then.
>Here's mine.
>C:\
I'd be more impressed if you found a toilet to shit in pajeet.
You could read about my downloader with man wget if you had a decent OS installed, which you don't.
▶ No.1057486>>1057494 >>1057776 >>1057817 >>1058015 >>1068493
>>1057485
UI programming is basic sh*t, but I do agree that OP is a faggot for not doing cross-platform software. (and don't give me this "Linux or GTFO" shit)
▶ No.1057491>>1057498 >>1058015 >>1059997 >>1068077
Mods REALLY need to start banning racism on here.
▶ No.1057494
>>1057486
>(and don't give me this "Linux or GTFO" shit
command line or GTFO ;^)
▶ No.1057498
>>1057491
>Mods REALLY need to start banning racism on here.
where does this nigger think his bot is posting?
▶ No.1057501
>>1057485
Why would you want to use a terrible autist os when you could use something that actually works? :^)
▶ No.1057776>>1057835
>>1057486
>not doing cross-platform
>java
maybe you should ask what language it was made in before posting. I suggest you read more about look and feel.
▶ No.1057790>>1057794
>>1057785
You work at Mozilla on Firefox? Wow
▶ No.1057791>>1057835 >>1057836 >>1064592
curl https://8ch.net/tech/res/1057235.html | grep -o '<a href=['"'"'"][^"'"'"']*['"'"'"]' | sed -e 's/^<a href=["'"'"']//' -e 's/["'"'"']$//' | grep file_store | xargs -I{} wget {}
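For comparison, here is a minimal sketch of the same idea (collect every link that points into file_store) done with Python's standard-library HTML parser instead of grep and sed. The function name file_links, the marker default, and the sample file names are illustrative assumptions, not anything from the thread. Unlike the pipeline, it resolves relative links and drops duplicates.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HrefCollector(HTMLParser):
    """Collect the href of every <a> tag, whatever the quoting or attribute order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def file_links(html, base_url, marker="file_store"):
    """Return absolute, de-duplicated links whose URL contains `marker`."""
    parser = HrefCollector()
    parser.feed(html)
    seen, links = set(), []
    for href in parser.hrefs:
        url = urljoin(base_url, href)  # resolve relative hrefs against the page
        if marker in url and url not in seen:
            seen.add(url)
            links.append(url)
    return links


if __name__ == "__main__":
    # Hypothetical markup; the real page would come from curl or urllib.
    page = ('<a href="https://media.8ch.net/file_store/abc.png">a</a>'
            "<a class='x' href='/tech/index.html'>b</a>")
    for link in file_links(page, "https://8ch.net/tech/res/1057235.html"):
        print(link)
```

The printed list can be fed to wget -i just like the output of the shell pipeline.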
▶ No.1057794>>1057811 >>1058071
>>1057790
You work at Microsoft on Windows? Wow
▶ No.1057811>>1059015
>>1057794
>projecting
How embarrassing. I shared no evidence to suggest I use Windows, rookie.
▶ No.1057814
>>1057235 (OP)
like theres anything worth downloading here. just click the save button in the browser if that ever happens or use wget but theres never so much good stuff that its worth using that.
▶ No.1057817>>1057835
>>1057486
whats there even to program on windows.. they have simple gui editors for that and even someone that has never programmed before can use them.
▶ No.1057829>>1057835 >>1057844 >>1057866 >>1057868
>only a downloader
Get on my level.
Sent from my Emacs
▶ No.1057831>>1057835 >>1057837 >>1058026 >>1064592
wget -P thread_media -nd -nc -r -l 1 -H -D i.4cdn.org -A png,gif,jpg,jpeg,webm --reject-regex='s\.jpg$' [Thread URL]
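To make the filtering in that one-liner easier to see, here is a rough Python sketch of the three checks it asks wget for: restrict to one domain (-D), accept only certain suffixes (-A), and reject thumbnails, which on that CDN are named like the full image plus a trailing s before .jpg. This is only an approximation of wget's matching rules, and the function name wanted and the sample URLs are made up for illustration.

```python
import re
from urllib.parse import urlparse

ACCEPT_SUFFIXES = (".png", ".gif", ".jpg", ".jpeg", ".webm")  # mirrors -A
REJECT = re.compile(r"s\.jpg$")                               # mirrors --reject-regex


def wanted(url, domain="i.4cdn.org"):
    """Approximate the -D / -A / --reject-regex combination for a single URL."""
    parts = urlparse(url)
    host = parts.hostname or ""
    # -D: the host must be the domain itself or a subdomain of it
    if host != domain and not host.endswith("." + domain):
        return False
    # --reject-regex is matched against the whole URL
    if REJECT.search(url):
        return False
    # -A: keep only files ending in one of the accepted suffixes
    return parts.path.endswith(ACCEPT_SUFFIXES)


if __name__ == "__main__":
    for candidate in ("https://i.4cdn.org/g/1550000000000.jpg",
                      "https://i.4cdn.org/g/1550000000000s.jpg"):
        print(candidate, wanted(candidate))
```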
▶ No.1057835>>1058083
>>1057776
> Not knowing what UI libraries are
The look says nothing as it is Win10. Could be Java or C# or whatever the fuck OP wants
>>1057533
>>>/hydrus/ is even better
>>1057791
>>1057831
Not everyone is sicko mode
>>1057817
Then it will not be cross-platform.
>>1057829
Source code or GTFO
▶ No.1057836>>1058026 >>1064592
>>1057791
> curl -> grep -> sed -> grep ->wget
# read the list of URLs to fetch from the given page (-i) and force it to be parsed as HTML (-F):
wget -F -i https://8ch.net/tech/res/1057235.html
add accept/reject lists and other options as needed
▶ No.1057837
>>1057831
thanks anon I was hoping someone would post that line too.
▶ No.1057838>>1057839 >>1058155
▶ No.1057839>>1057840
>>1057838
>https://notabug.org/darmor/meme_magic_stats/src/master/src/main.py
>for i, thread in enumerate(downloadBoard("https://8ch.net"+currentBoard+"catalog.html",currentBoard)):
> # get json
> json = downloadThread(thread)
Threads are also served in JSON format, so use
https://8ch.net/tech/res/1057235.json
instead of
https://8ch.net/tech/res/1057235.html
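A minimal sketch of what using the JSON endpoint could look like. Big assumptions here: that the thread JSON follows the 4chan-style layout (a top-level "posts" list whose entries carry "tim" and "ext" for an attached file, with additional attachments under "extra_files"), and that files live under media.8ch.net/file_store/. Check a real response before relying on either. The function name media_urls and the sample values are illustrative.

```python
import json


def media_urls(thread, base="https://media.8ch.net/file_store/"):
    """Build file URLs from a parsed thread dict (field names are assumed)."""
    urls = []
    for post in thread.get("posts", []):
        # The post itself may describe one file; extra files are listed separately.
        attachments = [post] + list(post.get("extra_files", []))
        for item in attachments:
            tim, ext = item.get("tim"), item.get("ext")
            if tim and ext:
                urls.append(f"{base}{tim}{ext}")
    return urls


if __name__ == "__main__":
    # Hypothetical payload; a real one would be fetched with
    # urllib.request.urlopen("https://8ch.net/tech/res/1057235.json").
    sample = json.loads(
        '{"posts": [{"no": 1, "tim": "aaa", "ext": ".png",'
        ' "extra_files": [{"tim": "bbb", "ext": ".webm"}]},'
        ' {"no": 2, "com": "text only"}]}'
    )
    print("\n".join(media_urls(sample)))
```

No regexp or HTML parsing is needed this way, since the data is already structured.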
▶ No.1057840
▶ No.1057844>>1059015
>>1057829
>22 workspaces
This fucking chad.
▶ No.1057848>>1058026 >>1064592
>>1057235 (OP)
# B is the board and T the thread number; set both first, e.g. B=tech T=1057235
wget -bEHkprl 'inf' -So 'wget.log' \
--accept-regex="^https?://(media\.|softserve\.)?8ch\.net/((${B}/(res/${T}\.html|threads\.json)|main\.js)|(${B}/(thumb|src)|file_store(/thumb)?|js|static|stylesheets)/.*)$" \
--warc-cdx --warc-file="8ch.${B}.${T}" -nH -P "8ch.${B}.${T}" -- "https://8ch.net/${B}/res/${T}.html"
▶ No.1057866>>1059015
>>1057829
>22
>default status bar
madman
▶ No.1057868>>1057874
>>1057829
>1280x1024
How comfortable is that resolution nowadays? Is there a reason to keep to it rather than going with e.g. 1920x1200?
▶ No.1057874>>1057881 >>1057883
>>1057868
I mostly use it out of habit to be honest, but it's enough for two windows with 80 chars each. Somewhat annoying for movies obviously. 4:3 master race
▶ No.1057881>>1057882 >>1057884
>>1057874
>4:3
But it's 5:4
▶ No.1057882>>1057884
>>1057881
Yeah, but I'd prefer a 4:3 monitor.
▶ No.1057883
>>1057874
>it's enough for two windows with 80 chars each
Close but no cigar - it would work only without any vertical borders/scrollbars etc. whatsoever (unless the font has char width less than 8 pixels).
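The arithmetic behind that objection, as a tiny sketch. The 8-pixel glyph width is the figure from the post above; the parameter name chrome_px (pixels lost to borders or scrollbars per window) is just an illustrative name.

```python
def columns_per_window(screen_px=1280, windows=2, char_px=8, chrome_px=0):
    """Whole character columns that fit in each of `windows` side-by-side windows."""
    usable_px = screen_px // windows - chrome_px
    return usable_px // char_px


if __name__ == "__main__":
    print(columns_per_window())             # no borders at all
    print(columns_per_window(chrome_px=2))  # one pixel of border on each side
    print(columns_per_window(char_px=7, chrome_px=16))
```

With zero chrome, 640 pixels divided by 8 gives exactly 80 columns; any border at all pushes an 8-pixel font below 80.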
▶ No.1057884
>>1057881
>>1057882
Why did VESA approve that weird 5:4 reso of 1280x1024 (rather than 1280x960 or 1366x1024) anyway?
▶ No.1057948>>1057954 >>1057985 >>1058040 >>1058157
>>1057235 (OP)
it's real simple one i made. just calls a method with an xpath string passed in. no ui. i just run it from the ide whenever i want to hork some shite
▶ No.1057954>>1057959
▶ No.1057958
>>1057955
any sed thing looks disgusting every time
▶ No.1057959
>>1057954
>>ide
>hmmm
????????????????
▶ No.1057985
>>1057948
post your code faggot
▶ No.1057987
>>1057955
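using System;
using System.IO;
using System.Linq;
using System.Net;

public static class Downloader
{
    public static void DownloadAllImages(string url, string xpath)
    {
        using (var client = new WebClient())
        {
            var html = client.DownloadString(url);
            var doc = new HtmlAgilityPack.HtmlDocument();
            doc.LoadHtml(html);
            // SelectNodes returns null when nothing matches the XPath
            var nodes = doc.DocumentNode.SelectNodes(xpath);
            if (nodes == null) return;
            var links = nodes
                .Select(x => x.GetAttributeValue("href", null))
                .Where(href => !string.IsNullOrEmpty(href))
                // resolve relative hrefs against the page URL
                .Select(href => new Uri(new Uri(url), href));
            const string imageDir = "dlimages";
            // standard-library call in place of the unposted IoHelpers helper
            Directory.CreateDirectory(imageDir);
            foreach (var link in links)
            {
                var filename = Path.GetFileName(link.LocalPath);
                Console.WriteLine(filename);
                client.DownloadFile(link, Path.Combine(imageDir, filename));
            }
        }
    }
}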
▶ No.1058015>>1058896
>>1057491
>>1057486
Linux or GTFO
>>1057491
Kike, pls. Next you'll demand blacklisting CIS males next I suppose.
enum class Type { Kike, Dyke, Gayaf, Nigger, Pajeet, Mudshit, XYZ, CIS_male };
bool blacklist_check(const Type applicant) {
return (applicant == Type::CIS_male);
}
int main() {
// Perform initial blacklist check
auto blacklisted = blacklist_check(Type::CIS_male); // blacklisted
blacklisted = blacklist_check(Type::Kike); // OK so far
blacklisted = blacklist_check(Type::Pajeet); // OK so far
}
▶ No.1058026
>>1057955
>I've not read the wget manual
>I don't know about the -i -F -A -R -nc options
RTFM for wget
wget even has regex options, plus a config file option if the command line gets too long
See the following for ideas:
>>1057836
>>1057831
>>1057848
▶ No.1058040>>1058043 >>1058157
>>1057948
>AaronLawson
D-did you just d-dox yourself, pajeet?
▶ No.1058043
>>1058040
stop that retarded meme that is putting the - symbol where it should not be
▶ No.1058059
I have a one-liner I frequently use.
▶ No.1058063>>1058076 >>1058153 >>1064592
>>1058055
...nice design doc?
Here is a working curl post script
curl -A "Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_3 like Mac OS X; en-us) AppleWebKit/533.17.9 (\
KHTML, like Gecko) Version/5.0.2 Mobile/8J2 Safari/6533.18.5" --referer "https://8ch.net" -d "thr\
ead=1031726&board=tech&email=''&body=wtfareyouevensayingtomedude&password&user_flag=&json_respons\
e=1&post='New Reply'" -X POST https://sys.8ch.net/post.php
▶ No.1058071
>>1057794
>"Hello? This is Microsoft. I'm calling about the virus you have on your machine anon."
▶ No.1058076
>>1058063
at least fake a recent version. no one uses those old phones anymore.
▶ No.1058083>>1058953
>>1057835
>>1057835
>the looks says nothing
>hurr durr i am implying that a program it's not cross-platform because it was executed on a windows machine
▶ No.1058088>>1058100 >>1059015 >>1059426
ok this is actually interesting.
so i got it working (i just used braindead wget). how do i make a front end for this? i don't really want a GUI, i'd rather make a TUI. the only scripting language i know is python, though.
▶ No.1058100>>1059015
▶ No.1058153>>1058155 >>1058160
>>1058063
It's not a design doc, it's a usage statement. No I won't post the source, and no your curl script doesn't snipe gets lol
▶ No.1058155>>1058158 >>1059015
>>1058153
>No I won't post the source
common sign of a code baby
someone already posted a 8chan scraper here >>1057838
▶ No.1058157>>1068493
▶ No.1058158>>1058162
>>1058155
I've shared plenty of code snippets on this board, including a nice userscript for the catalog page. I can't share the gets sniper for obvious reasons, think about it.
Also,
>scraper
>reinventing the wheel this badly
Simple browser extension does this trick, fren
▶ No.1058160
>>1058055
>>1058153
>No I won't post the source,
Not even if someone makes a FOI request for the source code, since it was made with public funds glownigger?
▶ No.1058162>>1058164 >>1059015
>>1058158
let me guess your "sniper" algorithm
>get most recently bumped thread
>get last post on bumped thread
>if post number = x post reply
woooow thats crazy fam, how could you ever make something like that!!?!?!?!
let me guess your distinguishing feature is that you do concurrent requests?
▶ No.1058164>>1058167
>>1058162
>trying to bait me into posting the algo
2/10
▶ No.1058167
>>1058164
How can I make you post something which does not exist?
▶ No.1058896
>>1058015
Why are you using C++?
▶ No.1058953
>>1058083
> it's not cross-platform hurr
Unless the OP shows it is cross platform, NO FAG
▶ No.1058990
▶ No.1059015>>1059998
>>1058162
my dubs bot is best dubs bot
DUBS
>>1057955 55's (3 total; 30.00%)
>>1058055
>>1058155
>>1057388 88's (2 total; 20.00%)
>>1058088
>>1058100 00's (1 total; 10.00%)
>>1057811 11's (1 total; 10.00%)
>>1057533 33's (1 total; 10.00%)
>>1057844 44's (1 total; 10.00%)
>>1057866 66's (1 total; 10.00%)
▶ No.1059019
Here's a dubs check with multiple dub types. This thing really shines when there are dubs, trips, quads, and quints all in the same bread.
DUBS
>>1036100 00's (6 total; 16.67%)
>>1036300
>>1036400
>>1036800
>>1037500
>>1040400
>>1040166 66's (6 total; 16.67%)
>>1040566
>>1050166
>>1050966
>>1051266
>>1058466
>>1035788 88's (6 total; 16.67%)
>>1036088
>>1049588
>>1050988
>>1051288
>>1051388
>>1036211 11's (4 total; 11.11%)
>>1036911
>>1037611
>>1049511
>>1035877 77's (4 total; 11.11%)
>>1036377
>>1036677
>>1058477
>>1036322 22's (3 total; 8.33%)
>>1037322
>>1049522
>>1035744 44's (3 total; 8.33%)
>>1035844
>>1051844
>>1036933 33's (2 total; 5.56%)
>>1051233
>>1035755 55's (2 total; 5.56%)
>>1050155
TRIPS
>>1037555 555's (2 total; 50.00%)
>>1051555
>>1036111 111's (1 total; 25.00%)
>>1035777 777's (1 total; 25.00%)
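Nobody posted source for the dubs checkers, so the following is only a guess at the grouping rule, inferred from the two outputs above: take the run of identical digits at the end of each post number, bucket posts by run length (2 = dubs, 3 = trips, and so on) and by the repeated pattern, then order the buckets by count, with ties broken by pattern. The names trailing_run and report are made up for this sketch.

```python
from collections import defaultdict

NAMES = {2: "DUBS", 3: "TRIPS", 4: "QUADS", 5: "QUINTS"}


def trailing_run(post_no):
    """Return (digit, length) of the run of identical digits ending the number."""
    digits = str(post_no)
    last = digits[-1]
    return last, len(digits) - len(digits.rstrip(last))


def report(post_numbers):
    """Format a summary in the same layout as the outputs posted above."""
    groups = defaultdict(lambda: defaultdict(list))
    for number in sorted(set(post_numbers)):
        digit, run = trailing_run(number)
        if run >= 2:
            groups[run][digit * run].append(number)
    lines = []
    for run in sorted(groups):
        lines.append(NAMES.get(run, f"{run}-DIGIT REPEATS"))
        total = sum(len(posts) for posts in groups[run].values())
        # Biggest buckets first; equal-sized buckets in pattern order.
        ordered = sorted(groups[run].items(), key=lambda kv: (-len(kv[1]), kv[0]))
        for pattern, posts in ordered:
            share = 100.0 * len(posts) / total
            lines.append(f">>{posts[0]} {pattern}'s ({len(posts)} total; {share:.2f}%)")
            lines.extend(f">>{p}" for p in posts[1:])
    return "\n".join(lines)


if __name__ == "__main__":
    print(report([1037555, 1051555, 1036111, 1035777, 1036100]))
```

Post numbers would come from the thread itself, for example the "no" field of each post if the JSON endpoint exposes one.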
▶ No.1059058>>1059382
I don't believe that there's really that much worth downloading here that I should program my own downloader for 8ch.
▶ No.1059299
This is mine.
#!/bin/bash
list-boards() {
    i=1
    while :; do
        curl -s "https://8ch.net/board-search.php?page=$i" \
            | jq -rM '.boards [].uri' \
            || break
        (( i += 100 ))
    done
}
list-threads() {
    board=$1
    curl -s "https://8ch.net/$board/catalog.html" \
        | sed 's/>/>\n/g' \
        | awk -v b="$board" -v FS=\" '/goto_thread_catalog/ {printf "%s,%s\n", b, $2}'
}
dl-thread() {
    board=$1
    thread=$2
    mkdir -p board"/${thread%/*}" board"/${thread%.html}"
    curl -s "https://8ch.net$thread" \
        | tee board/"$thread" \
        | sed 's/>/>\n/g' \
        | awk -v b="$board" -v t="$thread" -v FS=\" \
            '/class="fileinfo"/ {getline; printf "%s,%s,%s\n", b, t, $(NF-1)}'
}
dl-image() {
    board=$1
    thread=${2%.html}
    image=$3
    if cd -- board/"$thread"; then
        wget -q "$image"
    fi
}
export -f list-threads dl-thread dl-image
list-boards \
    | parallel --joblog .joblog.boards -r -j 4 --lb -- list-threads \
    | parallel --joblog .joblog.threads -r -j 4 -C , --lb -- dl-thread {1} {2} \
    | parallel --joblog .joblog.images -r -j 4 -C , -- dl-image {1} {2} {3}
▶ No.1059382
>>1059058
Unless you collect frog photos.
▶ No.1059490
function downthread () { # Download all files in a thread
    local link="$1" site trunk
    site="$(cut -d '/' -f '3' <<< "$link")"
    case "$site" in
        boards.4channel.org) trunk="i.4cdn.org" ;;
        boards.4chan.org) trunk="i.4cdn.org" ;;
        8ch.net) trunk="media.8ch.net" ;;
        # bail out instead of running wget with an empty domain list
        *) echo "downthread: unsupported site: $site" >&2; return 1 ;;
    esac
    wget -e robots=off -nc -nd -nv -Rhtml,s.jpg -HErD "$trunk" "$link"
}
▶ No.1059672>>1059685 >>1059700 >>1064592
Guy that's learning Java by going through the java tutorial on Oracle's site here. It's going fine, but I haven't even finished the chapters on the basics yet, so this is out of my league. In the future, I want to code a bot that uses my browser to go to sites and download all the image elements it can find, which I guess is what OP's creature is doing, and other simple tasks on the web. What do I need to learn for this? I'm guessing that Java will be of some use. Do I need to learn a scripting language?
▶ No.1059685>>1059689 >>1064413
>>1059672
Java is a great language for web automation. You could use minimalist scripting methods like the ones people are showing off here for simple tasks like grabbing images, but Java (and also C#, Python, NodeJS... though it's easiest in Java) can launch a real browser instance and drive it interactively through a user-centric API, using the Selenium library.
▶ No.1059689
>>1059685
Cool. I'll just stick to Java then.
▶ No.1059700>>1059701 >>1059792
>>1059672
i think this book is a better tutorial than oracle's website
▶ No.1059701
>>1059700
i forgot to mention its long as fuck but i had to get it for csci class, but hey learning a language isnt easy
▶ No.1059792
>>1059700
Thanks, dude. I checked out a few of the chapters, and it seems concise compared to the stuff on the site.
▶ No.1059806
▶ No.1059997>>1064592
>>1057491
>niggers cannot comprehend command line
GET THE FUCK OUT
▶ No.1059998
>>1059015
but, how is it working with the constant captcha?
use it in /pol/ and quote Hitler constantly
▶ No.1060078
>>1057235 (OP)
>Show us your own 8ch downloader then.
▶ No.1060083>>1060250
>>1057235 (OP)
>downloader
>c:\
Is that just 'wget -r -A '*.jpeg' https://8ch.net/tech/res/1057235.html', but worse in every aspect?
▶ No.1060250>>1060255
>>1060083
Nope anon, that's not just wget *.jpeg; it downloads all kinds of files, not just photos. (You) can also open files by double-clicking an item in the list, and you can move files out of the folder by dragging them from the list. I am working on new features right now to expand my programming knowledge; you can stick to wget and never learn anything new, anon.
▶ No.1060255
>>1060250
but the unixfags do not want one program that does everything
▶ No.1060258>>1060259
Why reinvent the wheel? Unless you're making a game
▶ No.1060259>>1060281 >>1060286
>>1060258
because i'd rather just download videos and not have an entire image suite that i don't need
▶ No.1060263>>1060287 >>1064592
>every one liner ITT is directed at cuckchan's cdn
way to off yourselfs.
Heres is one that isn't cucked:
wget -nd -nc -r -l 1 -H -D media.8ch.net -R html,txt,tmp -p -A.webm,.mp4,.jpg,.jpeg,.png,.gif "$1"
# $1 is where the thread URL goes
▶ No.1060281
▶ No.1060286
>>1060259
why would anyone want to download the cut-up, low-quality videos that get posted on imageboards? i hate them and i hate the people that post them without a source.
▶ No.1060287
>>1060263
>when you don't know how to use the tool so you just copy&paste a line that someone else made
thats the reason why the posts are like that.
▶ No.1064413>>1064433
▶ No.1064433>>1064436
>>1064413
>muh dynamic typing
I honestly just feel sorry for you. Hopefully you can learn a real language some day.
▶ No.1064436>>1064438 >>1064440
>>1064433
>real language
>Java
Java is an interpreted language just like python or jewscript.
I honestly just feel sorry for you. Hopefully you can learn a real language (C/C++) some day.
▶ No.1064438
>>1064436
>whatabouting this hard and still failing to prove a valid point
Java and nodejs are both compiled just-in-time, as is Python if you use the pypy implementation. However only Java provides static typing, which allows for far more powerful IDEs and easy API discovery.
>C/C++
Why would you do web automation in a systems programming language? Or are you just LARPing as hard as you can and throwing every buzzword you've learned against the wall to see what sticks?
▶ No.1064440
>>1064436
>only compiled languages are real languages
Aw shit, here we go again.
▶ No.1064477>>1064480
>parsing html with regular expressions
▶ No.1064480>>1064496
>>1064477
I wish people would stop parroting this epic Stack Overflow meme. You are not parsing HTML in general but the very specific HTML that 8chan outputs, which you can actually do with regular expressions even if it's pretty ugly. Slapping an HTML parser on that doesn't make your code any more robust because you are already married to the specific structure.
▶ No.1064496>>1064504
>>1064480
>Slapping an HTML parser on that doesn't make your code any more robust because you are already married to the specific structure.
Lol are you kidding, that's exactly what it does. It is more robust to change; you're not married to a specific structure at all. Eg, the addition of elements won't affect a query by attribute.
▶ No.1064504>>1064550
>>1064496
>the addition of elements won't affect a query by attribute
Badly chosen example because that doesn't affect a regular expression for attributes either. Point being, the structure does not change very often and an HTML parser can get fucked by internal changes (e.g. a class name change) just as easily. If you're going to do something involved with the HTML, sure, do it properly. But to replace markup tags and extract URLs from an imageboard, this suffices.
▶ No.1064550
>>1064504
>Badly chosen example because that doesn't affect a regular expression for attributes either.
Depends on the regexp.
>HTML parser can get fucked by internal changes (e.g. a class name change) just as easily.
Depends on the query.
A regexp is inherently more brittle. The regexps in this thread don't really parse HTML at all; they're just looking for CDN URLs, and that will silently fail if more CDNs are added later. I'd prefer to grab them semantically and not rely on URLs, but hey that's just me.
>But to replace markup tags and extract URLs from an imageboard, this suffices.
Yes you're probably right, especially since the site is not under active development. In general, web scrapers go out of date quicker than any other software you can think of and rely heavily on functional tests if you need any sort of reliability, regardless of your approach.
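A small sketch of the two points made in this post: a regexp keyed to one CDN host fails without any error when a second host appears, and a functional test against a saved page is what catches it. The host media2.8ch.net, the file names, and the helper names extract and check_fixture are all hypothetical.

```python
import re

# Hard-coded CDN host, in the spirit of the one-liners in this thread.
CDN_URL = re.compile(r"https?://media\.8ch\.net/file_store/[^\s\"'<>]+")


def extract(html):
    """Return every media URL the regexp can see."""
    return CDN_URL.findall(html)


def check_fixture(html, expected_count):
    """Functional test: a known page must yield a known number of links."""
    found = extract(html)
    if len(found) != expected_count:
        raise AssertionError(
            f"expected {expected_count} media links, found {len(found)}")
    return found


TODAY = ('<a href="https://media.8ch.net/file_store/aaa.png">1</a>'
         '<a href="https://media.8ch.net/file_store/bbb.webm">2</a>')
# The same page after a hypothetical second CDN host is introduced.
LATER = TODAY.replace("media.8ch.net/file_store/bbb",
                      "media2.8ch.net/file_store/bbb")

if __name__ == "__main__":
    print(check_fixture(TODAY, 2))
    print(extract(LATER))  # one link is dropped and nothing complains
```

Running check_fixture on the changed page raises instead of quietly returning a shorter list, which is the whole reason to keep such a test around regardless of whether the extraction is done with a regexp or a parser.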
▶ No.1064592
>>1057955
Liking this pipe action.
>>1057848
>>1057836
>>1057831
>>1057791
>>1058063
>>1060263
These are all great.
>>1059672
Try this tutorial series for Java. https://www.youtube.com/watch?v=Hl-zzrqQoSE&list=PLFE2CE09D83EE3E28
Watching videos helps when you are just starting out. Whatever lesson you are doing, you can search video sharing sites for a tutorial or sometimes even a college lecture on the subject.
>>1059997
KEK
wget --recursive --no-clobber --no-parent https://8ch.net
wget --mirror --domains=8ch.net https://8ch.net
I should take all of these commands write a recusive random get flood script. Could be cool.
▶ No.1068077
>>1057491
Found the nigger
▶ No.1068493
Found pic related from 2015 on my computer, wondering if it still works but no Ruby so you tell me.
>>1058157
>Me and Melinda Goofing off, The Chest Pump
Based.
>>1057486
Implying he didn't use a cross-platform toolkit like Tk with whatever his favourite language is. You're a nigger and you don't know shit about software engineering, stop trying to make it like you do.