
/tech/ - Technology


 No.892780>>892786 >>893106 >>893123 >>893155 >>893160 >>893344 >>893345 >>893363 >>893680 >>893986

https://hackernoon.com/how-it-feels-to-learn-javascript-in-2016-d3a717dd577f

And BSD fags think that GNU/Linux is bloated.

>Hey, I got this new web project, but to be honest I haven’t coded much web in a few years and I’ve heard the landscape changed a bit. You are the most up-to date web dev around here right?

>-The actual term is Front End engineer, but yeah, I’m the right guy. I do web in 2016. Visualisations, music players, flying drones that play football, you name it. I just came back from JsConf and ReactConf, so I know the latest technologies to create web apps.

>Cool. I need to create a page that displays the latest activity from the users, so I just need to get the data from the REST endpoint and display it in some sort of filterable table, and update it if anything changes in the server. I was thinking maybe using jQuery to fetch and display the data?

<-Oh my god no, no one uses jQuery anymore. You should try learning React, it’s 2016.

>Oh, OK. What’s React?

<-It’s a super cool library made by some guys at Facebook, it really brings control and performance to your application, by allowing you to handle any view changes very easily.

>That sounds neat. Can I use React to display data from the server?

<-Yeah, but first you need to add React and React DOM as a library in your webpage.

>Wait, why two libraries?

<-So one is the actual library and the second one is for manipulating the DOM, which now you can describe in JSX.

>JSX? What is JSX?

<-JSX is just a JavaScript syntax extension that looks pretty much like XML. It’s kind of another way to describe the DOM, think of it as a better HTML.

>What’s wrong with HTML?

<-It’s 2016. No one codes HTML directly anymore.

>Right. Anyway, if I add these two libraries then I can use React?

<-Not quite. You need to add Babel, and then you are able to use React.

>Another library? What’s Babel?

<-Oh, Babel is a transpiler that allows you to target specific versions of JavaScript, while you code in any version of JavaScript. You don’t HAVE to include Babel to use ReactJS, but unless you do, you are stuck with using ES5, and let’s be real, it’s 2016, you should be coding in ES2016+ like the rest of the cool kids do.

>ES5? ES2016+? I’m getting lost over here. What’s ES5 and ES2016+?

<-ES5 stands for ECMAScript 5. It’s the edition that most people target, since it has been implemented by most browsers nowadays.

>ECMAScript?

<-Yes, you know, the scripting standard JavaScript was based on in 1999 after its initial release in 1995, back when JavaScript was named LiveScript and only ran in Netscape Navigator. It was very messy back then, but thankfully now things are very clear and we have, like, 7 editions of this implementation.

>7 editions. For real. And ES5 and ES2016+ are?

<-The fifth and seventh edition respectively.

>Wait, what happened with the sixth?

<-You mean ES6? Yeah, I mean, each edition is a superset of the previous one, so if you are using ES2016+, you are using all the features of the previous versions.

>Right. And why use ES2016+ over ES6 then?

<-Well, you COULD use ES6, but to use cool features like async and await, you need to use ES2016+. Otherwise you are stuck with ES6 generators with coroutines to block asynchronous calls for proper control flow.

>I have no idea what you just said, and all these names are confusing. Look, I’m just loading a bunch of data from a server, I used to be able to just include jQuery from a CDN and just get the data with AJAX calls, why can’t I just do that?

<-It’s 2016 man, no one uses jQuery anymore, it ends up in a bunch of spaghetti code. Everyone knows that.

>Right. So my alternative is to load three libraries to fetch data and display an HTML table.

<-Well, you include those three libraries but bundle them up with a module manager to load only one file.

The rest is in the link.

 No.892786

>>892780 (OP)

Ironic that the rant/satire is super fucking bloated.


 No.892807>>893077 >>893293

websites were a mistake


 No.893077

>>892807

Not necessarily. Insisting on trying to make everything a website surely was.


 No.893084

>CY+3

>js

>not CGI (fastCGI included)


 No.893104

>ES6

lmao, the cool kids use typescript


 No.893106>>893345 >>893965

>>892780 (OP)

Very true. The absolute mess of web development pushed me back into systems programming and mathematics. I think the idea was to make everything simplified, so that front end programmers could be more like interchangeable cogs, but it has ultimately created a mess of ideas slapped together with justifications changing as fast as fashions.


 No.893121

This article is old by now.


 No.893123>>893132

>>892780 (OP)

>-Ever heard of Python 3?

Very crude and misplaced joke.

Python 2 will be EOL'ed in 2020 and everything will be fine.


 No.893132>>893395 >>893396

>>893123

There's still a shitton of Python 2 code out there. We'll be in a Python 3 / Python 4 mess before 2 can be truly EOL'd.


 No.893142

>caring about web development


 No.893154>>893156 >>893162 >>893163 >>893475 >>893973 >>894114

Web dev sucks because UNIX destroyed everything that was learned in the 70s. On a Lisp machine, web dev wouldn't exist because they would just be normal applications.

Several hours ago, I started a grep over a medium-sized
directory, writing its output into a file. I then
immediately took off to a seminar, leaving it running (from
someone's console in Tech Square), came back from the
seminar, and logged out the job without looking at the
output---I figured I'd do that later.

Suddenly (just after I sent the above bug report), I had a
horrible thought.

I'll bet the more astute among us have already grasped the
problem. The output file was IN the directory I was
searching, and grep merrily infinite-looped, finding targets
in its OUTPUT file and stuffing evidence of them back into
the output file.

File locking? Not seeing output files in "ordinary"
directory listings until their output is finalized? Version
numbers? Common sense in coding? Gee, these concepts are
only 30 years old (some are older). Any one of the above
would have prevented this problem. It's a real pity that
UNIX still hasn't managed to grasp these concepts.

(I know that the deficiencies of UNIX aren't your folks'
problem per se. You've got to deal with what the iron
vendor dishes out, as do we all. I just figured that, as a
member of the Learning and Common Sense group, it sure
doesn't bode very well that our computational environment
show no evidence of either.)

See you on the UNIX-Haters mailing list...


 No.893155

>>892780 (OP)

This is why if I ever need to use JS, I keep it as minimal as possible. I'm lucky, though; most people don't live in a city where there's a huge number of small businesses that just want a simple splash page and aren't aware of shitty services like squarespace.


 No.893156>>893218 >>893226 >>893475

>>893154

What? Just

 ls > output.txt 


 No.893160

>>892780 (OP)

this is surprisingly spot on

web dev is such a cancerfest


 No.893162>>893191 >>893218

>>893154

These UNIX haters posts you keep sharing are due to people not using programs correctly. There's no deficiency here except the usual PEBKAC. Why would your file get locked? Did you ask it to lock it? How would you write to a locked file? Does he expect the entire file to remain in RAM before becoming visible? What an absolute retard. He's a Windows user now, you know.

>On a Lisp machine, web dev wouldn't exist because they would just be normal applications.

This isn't even wrong, it's just nonsensical. What do you mean by "they would just be normal applications"? What isn't normal about rendering Javascript? How would using a LISP make it "more normal"? You're so fucking annoying, please stop posting.


 No.893163

>>893154

Note, to mute most of this guy's posts add some simple filters

>weenie

>weenies

>weenix

>(condition-case)

>"AT&T spawned this virus"

>Learning and Common Sense group

>UNIX-Haters


 No.893191>>893201 >>893209

>>893162

>These UNIX haters posts you keep sharing are due to people not using programs correctly. There's no deficiency here except the usual PEBKAC.

AT&T didn't use the C compiler correctly when making UNIX and those "tool" programs and they didn't use the assembler correctly when making the C compiler. Definitely PEBKAC.

>Why would your file get locked? Did you ask it to lock it? How would you write to a locked file?

It gets locked because it has exclusive write access. You write to it because you created it and have the write access.

>Does he expect the entire file to remain in RAM before becoming visible? What an absolute retard. He's a Windows user now, you know.

Multics maps all files, including "stack" and "memory", into the address space. He would expect that if he's used to Multics. Why would he expect a 90s OS to be worse than a 60s OS?

>This isn't even wrong, it's just nonsensical. What do you mean by "they would just be normal applications"? What isn't normal about rendering Javascript? How would using a LISP make it "more normal"? You're so fucking annoying, please stop posting.

A Lisp machine already has GC, objects, hash tables, and closures, so there's no need for a browser to reinvent wheels; the OS and hardware already support everything JavaScript needs. JavaScript would use the same "normal" types that Lisp and every other language use.

       Raise your hand if you remember when file systems
had version numbers.

Don't. The paranoiac weenies in charge of Unix
proselytizing will shoot you dead. They don't like
people who know the truth.

Heck, I remember when the filesystem was mapped into the
address space! I even re<BANG!>

Subject: why Unix sucks

Some Andrew weenie, writing of Unix buffer-length bugs, says:
> The big ones are grep(1) and sort(1). Their "silent
> truncation" have introduced the most heinous of subtle bugs
> in shell script database programs. Bugs that don't show up
> until the system has been working perfectly for a long time,
> and when they do show up, their only clue might be that some
> inverted index doesn't have as many matches as were expected.


Unix encourages, by egregious example, the most
irresponsible programming style imaginable. No error
checking. No error messages. No conscience. If a student
here turned in code like that, I'd flunk his ass.

Unix software comes as close to real software as Teenage
Mutant Ninja Turtles comes to the classic Three Musketeers:
a childish, vulgar, totally unsatisfying imitation.


 No.893201

>>893191

>Raise your hand if you remember when file systems had version numbers.

What, you mean like ext?

ext

ext2

ext3

ext4

Or maybe you mean ZFS, whose Linux implementation's latest release is 0.77 according to the GitHub.

To be fair, a lot of other filesystems don't really have version numbers today, and are referenced in terms of the kernel version they're associated with.

But ultimately, I don't see the issue with this. Whether they have version numbers or not, who gives a shit? It just seems like your argument here is that it isn't identical to your old lisp machine, therefore it's bad.

>No error checking. No error messages.

>shell script database programs

yeah this right here reveals how outdated your shit is. No error checking or messages? buffer-length bugs?

I ask you, right now, to go ahead and install a GNU/Linux distribution. you can do it in a VM, that's perfectly fine. Just install that, and see if this buffer-length bug exists. see whether error messages exist (hint, they do). Verify whether ANYTHING you are saying has any relevance to this decade.

Also, shell script databases? Ever heard of SQL? A common implementation of that would be PostgreSQL. Maybe you should give that a go sometime. I can't even justify why such an absurd situation would come up, as SQL became a standard long before the publishing of the Unix Haters Handbook.


 No.893209

>>893191

>AT&T didn't use the C compiler correctly when making UNIX and those "tool" programs and they didn't use the assembler correctly when making the C compiler.

Proof.


 No.893218>>893224 >>893475

>>893156

>ls

>>893162

>not using programs correctly

If you are going to defend unix never mention ls or correctness. Fucking hell it's like you're being purposefully retarded just to prove the weenie poster right.


 No.893224>>893240

>>893218

>If you are going to defend unix never mention ls or correctness

Why.


 No.893226

>>893156

>ls does the same as grep


 No.893240>>893241 >>893475

>>893224

ls is objectively incorrect. People consider its incorrectness to be correct, so it's purposefully wrong.

This is not the command of an OS that cares about correctness.


 No.893241>>893243

>>893240

You haven't said anything but opinion, so your use of 'objectively' is ironic.


 No.893243>>893247 >>893475

>>893241

>hurr durr I'm arguing so now you have to explain basic shit to me

fuck off cancer


 No.893247

>>893243

eat your own shit aids


 No.893293>>893308 >>893309

>>892807

I've never used Gopher, but when I read about the way it works, I can't help but feel that we have chosen the inferior option. As usual.


 No.893308

>>893293

gopher was cool, but i bet it would have been ruined just like HTTP was ruined. Instead of JavaScript it would have been something else to screw it all up.


 No.893309

>>893293

> I can't help but feel that we have chosen the inferior option

M8, there's no "we". The reason HTTP won is just that it spread much faster into the hands of the people who would use it most (aka normalfags).

If Gopher had been used instead, it would certainly have turned into the same shit.


 No.893344>>893350

File: serveimage.jpeg (209.67 KB, 489x500)

>>892780 (OP)

>linked site has a massive static bar on top

>24pt font

>massive white space on both sides of a tiny column of text

>yfw hotwheels did UI better than a retard ranting about the state of webdev


 No.893345

>>892780 (OP)

Well Babel and JSX are used for the same purpose, so WHY do you want to use both?

>>893106

Web designers are artists that want things to look like a Video Essay, so here we are.


 No.893350

>>893344

All site devs like that should be sent to korea


 No.893363>>893413

>>892780 (OP)

Is React really that bad? I have a friend who wants to work on a for-fun project with it with me. This sounds like an absolute waste of time.

Not that I disagree that functional programming is the way to go.


 No.893364

When is this web shit going to collapse on itself?


 No.893395>>893731

>>893132

>We'll be in a Python 3 / Python 4

Python 4 won't be like Python 3.

It will be more like 3.10, if they even decide to bump the major version at all, which is unlikely.


 No.893396>>893398 >>893412

>>893132

>There's still a shitton of Python 2 code out there

and it's not that hard to semi-automatically convert it.

and the converted code will benefit from fewer hidden encoding bugs and faster performance (because all new optimizations target Python 3 now)

plus, much of the 2.x code was written with future conversion to 3 in mind, and that code is either already compatible or trivial to convert.


 No.893398>>893409

>>893396

this

I've converted a fair amount of python code by hand and it's usually just changing print syntax and iterators, would imagine it would be very easy to automate


 No.893409

>>893398

btw real life robust code isn't even using print that much

it's mostly about string/bytes literals and their conversions, and a few features which were removed and were mostly used for code golf, but not for real programming.
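For what it's worth, a minimal sketch of the string/bytes change in question (the sample data is made up for illustration):

```python
# The usual 2-to-3 pain point: bytes and str no longer mix implicitly,
# so every boundary crossing needs an explicit encode/decode.
raw = b"caf\xc3\xa9"          # e.g. what comes off a socket or binary file
text = raw.decode("utf-8")    # Python 3 forces the decode to be explicit
assert text == "café"
assert text.encode("utf-8") == raw

# Python 2 would silently coerce and blow up at runtime on non-ASCII data;
# Python 3 refuses at the type level instead:
try:
    raw + text
except TypeError:
    pass  # "can't concat str to bytes"
```

That refusal is exactly the class of hidden encoding bug the conversion flushes out.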


 No.893412>>893420 >>893424

>>893396

Isn't Python still like the slowest scripting language?


 No.893413>>893681

>>893363

React is pretty dogshit, but the complexity of modern web development isn't due to React. It's due to the fact that there is more than one framework, they're all pretty radically different, and people freak the absolute fuck out when they have to learn something even slightly new. Case in point: OP's guy wanted to use jQuery, and threw an autism fit when he had to learn something that was not jQuery. The simple fact of the matter is you shouldn't be in web development if you aren't willing to learn new shit constantly, because the languages, frameworks, and browsers are CONSTANTLY being updated at a pace that makes people used to the slow-burn update schedule of languages like Java and C++ do pic related.


 No.893420

>>893412

How do you even measure the speed which you are talking about?

No it isn't; in fact, some things can be implemented really efficiently. A typical example is https://github.com/magicgoose/simple_dr_meter


 No.893424>>893427 >>893436

>>893412

Python isn't too slow as far as interpreted languages go. Also, I recently learned how to make a C FFI library for CPython, which is surprisingly straightforward. I think my main workhorse for future projects will be Python for the 90% of code that is just glue logic, with the rest optimized in C for performance.
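The post doesn't say which FFI route was used; the stdlib one is ctypes. A minimal sketch calling straight into libc, assuming a Unix-like system where the C library can be located:

```python
import ctypes
import ctypes.util

# Locate and load the system C library (the lookup is platform-dependent).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declaring the signature lets ctypes marshal Python ints correctly.
libc.labs.restype = ctypes.c_long
libc.labs.argtypes = [ctypes.c_long]

assert libc.labs(-42) == 42
```

The same restype/argtypes dance works for functions in your own compiled .so, which is the glue-logic-in-Python, hot-path-in-C split described above.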


 No.893427

>>893424

It's not even necessary to go full C for extensions, there's also Cython.


 No.893436>>893439 >>893444 >>893445 >>893453

>>893424

It's mighty slow, compared to Perl/awk, mate. The real problem, though, is that the python foundation is more focused on creating a "vibrant and diverse" community than finishing pypy.


 No.893439>>893520

>>893436

>perl

yeah I think I'll pass on that one

>awk

o shit nigga what are you doing

Shame about pypy though


 No.893444

>>893436

PyPy isn't free of trade-offs.

Some kinds of code are faster in CPython, even some math-heavy code that uses long arithmetic. It's not obvious whether it's even possible to make PyPy always at least as fast as CPython.

I guess the best way is to gradually port the best features (the ones that don't introduce slowdowns elsewhere) from PyPy to CPython. In fact this is already happening; it could be faster, of course, but it's moving.


 No.893445>>893514

>>893436

>It's mighty slow, compared to Perl/awk, mate

Examples, please.


 No.893453>>893467

>>893436

Diversity was their plan all along. It was designed for social studies majors who can't into Perl or C.


 No.893467>>893664

>>893453

>It was designed for social studies majors who can't into Perl or C

It sounds like you think that if one can use Perl and C, then it's a shame to look into other languages. And that if one can use Perl and C, then it means they want to use them. For every fucking task.

Also, [citation needed].


 No.893475

>>893156

>JS

>>893243

>>893240

>>893240

>>893218

>>893154

Can you just define "correct"? Why is ls's output incorrect?


 No.893514>>893526

>>893445

I don't have any under hand, but I had to find word probabilities for a given text (keeping only words matching or not matching 3 or 4 regexps) at my work.

Mawk did it in 7s, nawk/gawk in 30s and python 3.6 in 50s (for 200 MB of plaintext). Don't know if it was the split or the regexp being slow.


 No.893520>>893532

>>893439

Awk is good, though; at least for text or tabular data processing (what it was made for). You should really learn it.


 No.893526>>893527

>>893514

Gimme the text, perhaps I could make it go faster than this in python 3.6.

6x slowdown seems quite extreme.


 No.893527>>893535

>>893526

… I mean also with the problem statement (which words do you need to find and where to store the result --- dump all words to stdout, or just count them, or what?)


 No.893532>>893535

>>893520

Of course it's good, and I have learned it. But Kernighan himself said it is meant for one-liners; it fits a very narrow problem space. I can't think of a single Python script I have written that I would be able to rewrite in awk without wanting to chop off my dick


 No.893535>>893538

>>893527

From memory, I split by [^A-Za-z']+, then only consider tokens satisfying len >= 2, not ALL CAPS, not starting/ending with a dash, and without a double dash inside. Then I simply count the accepted token occurrence rate (e.g. "word", probability=0.2).

The corpus was the free version of OANC.

>>893532

>But Kernighan himself said himself it is meant for one-liners

It's named awk, not k. As someone who has used it a lot (because I love its incredible expressiveness), awk isn't limited to one-liners but to one-file programs, mainly because it has no import/include capabilities (POSIX awk) and no local variables (other than function parameters). Perl tried to "fix" awk by making it general purpose, but failed because of the same problems as C++.


 No.893538>>893551

>>893535

http://www.anc.org/data/masc/downloads/data-download/

this?

what exactly should I download from there? there are many variants.

>probabilty=0.2

what do you mean by probability here?

the split is strange, if you split by (everything except latin letters), you can't get dashes or any other symbols in the tokens.

something is not right here.

can you explain exactly the problem you're solving instead of your solution which may or may not be correct?


 No.893551>>893552

>>893538

Yeah, it's [^A-Za-z'-]+. The problem is simply getting good word stats for general English. So, if I have

>hello -foo world world

I get:

>"world",0.6667,2

>"hello",0.3333,1

Here's the awk (again, from memory):


#!/usr/bin/mawk -f

BEGIN { RS = "[^A-Za-z'-]+" }

length($1) >= 2 && $1 !~ /^[A-Z]+$/ && $1 !~ /--/ && $1 !~ /^-|-$/ \
{
    ++words[tolower($1)]
    ++wordcount
}

END \
{
    for (w in words)
        printf "\"%s\",%.4f,%d\n", w, words[w] / wordcount, words[w]
}


 No.893552>>893556 >>893557

>>893551

Okay, I think I understand what you need to calculate, now which text exactly do I need to download to test it?

Please give the full URL to the file (and what to use from inside it, if it's an archive). And then I'll try to take a shot.


 No.893556>>893559

>>893552

Are you really bothering m8? After that, there are some html tags to strip and all that shit.

Anyway, it's http://www.anc.org/OANC/OANC-1.0.1-UTF8.zip from http://www.anc.org/data/oanc/download/. It says XML, but it also contains the plaintexts.


 No.893557>>893559

>>893552

By the way, see this interesting blogpost https://brenocon.com/blog/2009/09/dont-mawk-awk-the-fastest-and-most-elegant-big-data-munging-language/

It could be because mawk is a really good awk interpreter (uses a really simple bytecode VM, I think).


 No.893559>>893561 >>893563

>>893557

I am just interested to see how much juice I can squeeze out of CPython 3.6 on a problem that's said to be its weakness.

>>893556

Will get that file in a few minutes.

Do you mean the HTML tags are also present in the plain text?

Or I need to strip them, but it's a completely separate step and won't add to the actual benchmark?

It's not a big deal, I can do this.


 No.893561>>893569 >>893570 >>893575

>>893559

When you get the archive and unzip it, you must just delete everything uninteresting (find -type f ! -iname '*.txt' -delete) then cat everything when running your tests (find OANC -type f -exec cat -- {} + | time ./test.mawk >/dev/null).


 No.893563>>893569 >>893570 >>893575

>>893559

Oh, and I'm wrong, this one doesn't have any tag, you just have to keep the txts


 No.893569

>>893561

>>893563

So:

to get the input, I need to concatenate together all files under "data" with names ending with ".txt". Correct?

Should I also insert a newline when concatenating, or do they all end with a blank line?


 No.893570>>893575 >>893583

>>893561

>>893563

The total size after leaving only *.txt under "data" is

96737202 bytes

(about 100MB)

you were saying about 200 MB, where are the other 100? Or is that okay and there were actually just 100 MB?


 No.893575>>893578 >>893602 >>893610

>>893570

>>893563

>>893561

FWIW this is what I got and I'm gonna use this file (benis.txt) as input. Stay tuned.

cd data

find . -type f -name "*.txt" | xargs cat > benis.txt

SHA256(benis.txt) = c9963649dc5c92d17ca2cc6655216614d5925f88fa749fb89d289c50316fb778


 No.893578

>>893575

Hmm, by the way, I am not sure whether find uses a deterministic directory traversal order and thus will produce exactly the same file on your machine.

Anyway, I will upload the archived file if the need arises. It's only ~20MB when compressed with xz.


 No.893583>>893602

>>893570

Well, I aggregated other corpora, then. Not really important; 100 MB is already a lot (for free plaintext).


 No.893602>>893604 >>893606 >>893607 >>893614 >>893623

>>893583

>>893575

On my machine it runs in

19 seconds with CPython 3.6

9.5 seconds with PyPy 5.7.1-beta0

I didn't test awk on my machine but it looks like CPython could be helped by a faster regex implementation.

A very crude profiling shows that the bottleneck is in regex implementation.


import re
import time
from collections import defaultdict
from operator import itemgetter


def benis(input_file, output_file):
    lowercase_letters_search = re.compile("[a-z]").search
    word_pattern_finditer = re.compile("[A-Za-z'][A-Za-z'\-]*[A-Za-z']").finditer
    word_counts = defaultdict(lambda: 0)
    total_word_count = 0
    for line in input_file:
        for match in word_pattern_finditer(line):
            word = match.group(0)
            if '--' not in word and lowercase_letters_search(word):
                word_counts[word] += 1
                total_word_count += 1
    for k, v in sorted(word_counts.items(), key=itemgetter(1), reverse=True):
        print(f'"{k}",{v/total_word_count},{v}', file=output_file)


with open('results.txt', 'w') as output_file:
    with open('benis.txt', 'r') as input_file:
        time_start = time.time()
        benis(input_file, output_file)
        time_finish = time.time()
        print(time_finish - time_start)

First 20 lines from the output:


"the",0.05420379339420173,727830
"of",0.032709427064645226,439211
"and",0.030057438162336275,403601
"to",0.02491305632002245,334524
"in",0.019135056910147698,256939
"that",0.014633080467196885,196488
"is",0.010782222782260317,144780
"for",0.00939880953178879,126204
"it",0.0076361045239609175,102535
"you",0.007601846874562936,102075
"with",0.007247503622746424,97317
"The",0.006907012376990835,92745
"was",0.006801334975913149,91326
"on",0.006203836886521834,83303
"as",0.00590110352825489,79238
"are",0.005339129131826265,71692
"have",0.005152648362059862,69188
"by",0.0050796646742119885,68208
"be",0.005064770044038953,68008
"uh",0.004980764329863033,66880

Full results: https://files.catbox.moe/u8otXn.xz


 No.893604

>>893602

Some explanation:

a word is not upper case if it contains at least one lowercase letter

the check for length >= 2 and the first and last characters not being a minus is embedded right into the main regex


 No.893606

>>893602

Also, if PyPy support is not needed, one can replace

word = match.group(0)

with

word = match[0]

and shave approximately 1 more second

because PyPy only partially supports Python 3.6 and matches can't be indexed in PyPy.


 No.893607>>893610

>>893602

what data set are you using


 No.893610>>893612


 No.893612

>>893610

ah thanks


 No.893614>>893624

>>893602

A bit faster (~18 seconds) in CPython 3.6 if I quickly skip lines shorter than 2 characters, because the input apparently contains lots of empty lines.


import re
import sys
import time
from collections import defaultdict
from operator import itemgetter


def benis(input_file, output_file):
    lowercase_letters_search = re.compile("[a-z]").search
    word_pattern_finditer = re.compile("[A-Za-z'][A-Za-z'\-]*[A-Za-z']").finditer
    word_counts = defaultdict(lambda: 0)
    total_word_count = 0
    for line in input_file:
        if len(line) >= 2:
            for match in word_pattern_finditer(line):
                word = match[0]
                # word = match.group(0)  # use this on PyPy
                if '--' not in word and lowercase_letters_search(word):
                    word_counts[word] += 1
                    total_word_count += 1
    for k, v in sorted(word_counts.items(), key=itemgetter(1), reverse=True):
        print(f'"{k}",{v/total_word_count},{v}', file=output_file)


with open('results.txt', 'w') as output_file:
    with open('benis.txt', 'r') as input_file:
        time_start = time.time()
        benis(input_file, output_file)
        time_finish = time.time()
        print(time_finish - time_start, file=sys.stderr)

Could be even better if I could read all the text without splitting it into lines, but still lazily. Right now there's no way to do this using only the stdlib. There is mmap, but unless I cheat by ignoring encoding, I would need to decode UTF-8 on the fly as well. Ideally, the 're' module needs functions which accept a file object in place of a string and decode it on the fly.

Doing this by myself would be too much code and it'd be unfair to the simplicity of awk code.

Loading all 100 MB into memory at once is also a bit faster, but it is not scalable and thus also not fair. I presume awk doesn't load all of the text into RAM first either?
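For reference, a rough sketch of the chunked-read middle ground (the chunk size and the boundary character class are illustrative assumptions, not taken from the benchmark above): read fixed-size blocks and carry any trailing partial token over to the next block, so no token is ever split.

```python
import io
import re

# Trailing characters that might be the start of a token cut in half by the
# chunk boundary (same character class as the word regex above).
_partial = re.compile(r"[A-Za-z'-]+$")

def iter_text_chunks(f, size=1 << 20):
    """Yield text chunks from f, never splitting a token across two chunks."""
    carry = ""
    while True:
        block = f.read(size)
        if not block:
            if carry:
                yield carry
            return
        block = carry + block
        m = _partial.search(block)
        if m and m.start() > 0:
            # Hold back the trailing partial token for the next chunk.
            carry = m.group(0)
            block = block[:m.start()]
        else:
            # No trailing token, or the whole block is one giant token
            # (unrealistic at 1 MB chunks): emit as-is.
            carry = ""
        yield block

chunks = list(iter_text_chunks(io.StringIO("hello world foo"), size=8))
assert "".join(chunks) == "hello world foo"
```

Every character is either yielded or carried into the next yield, so concatenating the chunks always reproduces the input.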


 No.893623>>893634

>>893602

Honestly, what I did in Python (to keep it simple and still not that slow) was to split at RS, filter the tokens to keep only the wanted words, then feed the filter output to a Counter. I purposely didn't coalesce the regexps, so as to compare with awk.

Plus, awk may be the fastest for this kind of thing, and it's also really short/intelligible.


 No.893624>>893633

>>893614

>not doing benchmark on tmpfs

What are you doing, nigger?


 No.893633

>>893624

Who said it isn't? There are no absolute paths anywhere; they could just as well be on tmpfs.

But I also have an SSD anyway, and the OS caches the file since it's only 100 MB and I have 16 GB of RAM, so it likely won't even matter.


 No.893634>>893697

>>893623

I don't think that code is any less simple. But maybe that's just because I don't know awk.

Not sure if I should learn it; it seems like a one-trick pony.


 No.893664

>>893467

> citation needed

Sure, okay. https://en.wikipedia.org/wiki/Read_a_history_book,_nigga

It's called de-skilling, and it's to flood the job market. Same old story, only some details are different.

https://archive.org/details/HUMANRESOURCESSocialEngineeringInThe20thCentury

The main difference now is they hate straight white males. Two birds with one stone, motherfucker!


 No.893680>>893735

>>892780 (OP)

>Web Devs are destroying everything.

ftfy

Good article. Makes me glad I do 95% of my browsing with js disabled.


 No.893681>>894025

>>893413

Sounds like you're trying too hard to defend Javascript.

He was freaked out because he had to add many unnecessary libraries, not because he had to learn something different from jQuery.


 No.893697>>893700 >>893800

>>893634

It was almost a one-liner for me. Something like `wcount = Counter(filter(validate, re.split(ERE, file.read())))`, then get the word count with `sum(wcount.values())`. PS: I learned Python one week ago.
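Fleshed out into a runnable form (using re.split for the regex split; validate follows the rules from the earlier problem statement, and the sample input is the one from upthread):

```python
import re
from collections import Counter

def validate(w):
    # Rules from the problem statement: len >= 2, not ALL CAPS
    # (w.isupper() approximates the /^[A-Z]+$/ check), no leading or
    # trailing dash, no double dash inside.
    return (len(w) >= 2 and not w.isupper()
            and not w.startswith("-") and not w.endswith("-")
            and "--" not in w)

# Sample input and expected counts taken from upthread.
text = "hello -foo world world"
wcount = Counter(filter(validate, re.split(r"[^A-Za-z'-]+", text)))
total = sum(wcount.values())
assert wcount["world"] == 2 and wcount["hello"] == 1 and total == 3
```

Dividing each count by `total` then gives the probabilities from the earlier example ("world" at 0.6667, "hello" at 0.3333).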


 No.893700

>>893697

With validate being just a wrapper around the whole regexp story. Honestly, I'd really love Python if:

- it was fucking faster

- it wasn't so hostile to symbols (its ternary is retarded compared to C's)

- its stdlib was a bit more consistent (and the docs better; they're fucking horrible compared to the C manpages I'm used to)


 No.893731>>893821 >>893823

>>893395

They will. Python 3.7 has the first future feature, lazy type annotations, which will become the default in Python 4.
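For context, the feature in question is PEP 563 (postponed evaluation of annotations), opt-in via a __future__ import in 3.7; a minimal sketch of what it changes:

```python
from __future__ import annotations  # PEP 563: opt in to lazy annotations

class Node:
    # Without the future import (or quoting "Node"), this self-reference
    # would raise NameError while the class body is being evaluated.
    def link(self, other: Node) -> Node:
        return other

# With lazy evaluation, annotations are stored as plain strings and only
# evaluated on demand (e.g. by typing.get_type_hints).
assert Node.link.__annotations__["other"] == "Node"
```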


 No.893735

>>893680

There's a reason why I call that language "Pajeetscript".


 No.893800

>>893697

file.read() won't scale if your text doesn't fit in RAM.

Otherwise, yes, it seems awk may be the winner here.
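The scaling complaint is easy to address: iterate over the file line by line instead of slurping it with `file.read()`, so memory stays bounded regardless of input size. A minimal sketch, with the delimiter pattern assumed:

```python
import io
import re
from collections import Counter

# Assumed delimiter pattern: anything that's not a letter or apostrophe.
ERE = r"[^A-Za-z']+"

def count_words(lines):
    # Update the counter one line at a time; never hold the whole file.
    wcount = Counter()
    for line in lines:
        wcount.update(w for w in re.split(ERE, line) if w)
    return wcount

# Stand-in for an open file handle:
fake_file = io.StringIO("one fish two fish\nred fish blue fish\n")
wcount = count_words(fake_file)
```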


 No.893821>>893823

>>893731

Still, it won't be backwards-incompatible the way 3 was with 2.


 No.893823

>>893821

>>893731

> Static type checkers will see no difference in behavior, whereas tools using annotations at runtime will have to perform postponed evaluation.

Oh well, it could break some code that makes heavy use of reflection, but it's not even remotely as dramatic as the 2-to-3 changes were.
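The reflection caveat, concretely: under postponed evaluation, `__annotations__` holds strings, so runtime code has to resolve them explicitly with `typing.get_type_hints()` instead of reading types directly.

```python
from __future__ import annotations
import typing


def scale(x: int, factor: float) -> float:
    return x * factor


# Raw annotation is just the string "int" now:
raw = scale.__annotations__["x"]
# Reflection-heavy code must resolve it explicitly:
resolved = typing.get_type_hints(scale)["x"]
```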


 No.893965>>894143 >>894182 >>894205

>>893106

GUI artists are always cancer. There is no exception to this rule because no one else is stupid enough to want to do it.

>Hired to make a GUI

>Do job

>Get paid

>Job done, find new job

OR

>Make GUI

>Get paid

>Convince them their GUI is out of date in 6 months

>Get paid to make new GUI

>Repeat endlessly so you always have work.

Now you understand why shit just gets worse. If it was good it would be in the first version. But if it's a novelty sidegrade it wouldn't be.


 No.893973>>894015

File: smug019.png (356.47 KB, 1280x720)

>>893154

So why aren't you programming a new LispOS then? Is it because you're too incompetent to fix things you don't like instead of bitching about them?


 No.893986

>>892780 (OP)

I am so glad I am in embedded; at least we don't have remakes of existing technologies with a new wrapper and a new name slapped on them.

Technologies don't change as fast here. I mean, a register is a fucking register, you can't do much with that; maybe build a couple of compatibility layers on top, but that's about it.


 No.894015>>894018

>>893973

>she posts this in a thread about how the internet sucks


 No.894018>>894138

>>894015

>she

If I want to write a webpage that doesn't suck, I can. You can still write html and css yourself, and if you want them the older, less-shitty js libraries are mostly still there. It's not difficult at all to do your own webdev by your own preferred standards.


 No.894025

>>893681

Basically the same.


 No.894114>>894143

>>893154

Basically this. The web went wrong the moment people started saying you can use it for anything more than browsing articles. I mean even then it's completely wrong due to shitty standards (protocols, languages, formats).


 No.894138

>>894018

lol I just got a flashforward of some insufferable retards making minimal webpages.


 No.894143>>894200

File: WorldWideWeb.1.png (88.16 KB, 834x642)

>>893965

That sounds a lot more like the typical programmer or sysadmin MO than a human factors engineer or tech writer to me.

>>894114

I would put the downfall much earlier, to when the viewer and WYSIWYG editor functions separated in 3rd-party "browsers", turning the web from the hypertext version of normal documents it was originally conceived as, into a read-only antiquarian markup format hacked together by hand in text editors by subhumans.


 No.894182

>>893965

Actually what happens is

>Hire someone to make a GUI for a product with XYZ features

>They do so

>Six months later company wants to add ABC features

>The previous UI was poorly thought out and doesn't extend well

>Have to redo work to support the new features and keep them consistent

And that's no different from picking up a codebase that some retard half-assed before you were forced to work on it.


 No.894200

>>894143

>read-only antiquarian markup format

At this point, it's write only


 No.894205

>>893965

Another cause for added complexity is that smart people look for abstractions and extract patterns in the way they develop things. This does have the effect of making them more productive, but the very nasty side effect of lowering the barrier to entry for many who should not be programming in the first place. For example, if we look at old jQuery, I'm sure the creator was reasonably talented. Little did he realize that what probably made him productive was a pure form of cancer. Fast forward a decade, and his cancer looks rather innocuous compared to the mess it has metastasized into these days.




102 replies | 4 images