
/tech/ - Technology

 No.895727>>895728 >>895737 >>895804 >>895846 >>896072 >>897671

https://blog.ircmaxell.com/2015/03/security-issue-combining-bcrypt-with.html

When will this stupidity finally come to an end?

Even in C, it's absolutely possible to use a struct with a pointer and a length, and add a library with replacements for the functions that work with zero terminated strings (a sketch follows below).

Why would anyone still use zero terminated "strings"? They make no fucking sense, almost the worst idea ever.
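
Something like this would already do the job. Just a sketch with made-up names, not any existing library:

#include <stddef.h>
#include <string.h>

/* a counted string: pointer + length instead of a terminator (illustrative) */
struct cstr {
    char   *ptr;
    size_t  len;
};

/* replacement for strlen(): the length is just read back */
size_t cstr_len(struct cstr s) {
    return s.len;
}

/* replacement for strcpy(): bounded by an explicit capacity, truncates
   instead of running off the end, and never scans for a terminator */
size_t cstr_copy(char *dst, size_t cap, struct cstr src) {
    size_t n = src.len < cap ? src.len : cap;
    memcpy(dst, src.ptr, n);
    return n;   /* number of bytes actually written */
}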

 No.895728>>895732 >>896203 >>897515

>>895727 (OP)

But anon, null terminated strings take up less space and are the same format regardless of CPU architecture.


 No.895732>>895734

>>895728

they don't.

a 3-byte difference is irrelevant for long strings.

if you want to make short strings small, there is such a thing as variable-length coding for integers, but the gain would be so small it's not worth it.

CPU architecture is irrelevant, the pointer size would change in both cases. and the internal representation of the length doesn't mean jack shit anyway.

also, zero-terminated strings are less efficient because calculating the length is O(n).
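
spelled out, since people keep missing it (illustrative sketch):

#include <stddef.h>

/* zero-terminated: O(n), every length query walks the buffer to the zero byte;
   a counted string just reads its stored length field back, O(1) */
size_t scan_len(const char *s) {
    const char *p = s;
    while (*p != '\0')
        ++p;
    return (size_t)(p - s);
}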


 No.895734>>895738

>>895732

>they don't.

Ah so that 8 byte size prefix is not a waste of space then?

>there is such a thing as variable length coding for integers

Yeah we will just do a big num hack to size our strings! Good idea.


 No.895735>>895736 >>895744 >>896031

What if my string is more than 4,294,967,295 characters!


 No.895736>>895744 >>896516

>>895735

No one would ever need to store a file bigger than 4 gigabytes. We don't need to design our system to handle that.


 No.895737>>895739

>>895727 (OP)

You know windows internally uses sized strings. Look how well that turned out.


 No.895738>>895740 >>895743

>>895734

4 bytes are enough for strings for all practical purposes. if you have more, obviously it's time to use specialized data structures anyway.

it's not a waste of space, it's only 3 bytes more than the terminating null byte, which is about 1-2 characters on average if you use UTF-8, and less than 1 character if you use fixed-size Unicode.

unless you store single characters in strings, this doesn't fucking matter, and it's a lot better than turning all code dealing with strings into a potential minefield + sacrificing run time wherever you can't reuse length information for some reason.

>>895734

>Yeah we will just do a big num hack to size our strings! Good idea.

it's only big if your brain is small. it's not gonna be used in most of the application-level code.

anyway, in the same sentence I also said that the gain is minimal, so it's not worth it. learn to read. still, it would be better than zero-terminated strings.


 No.895739>>895743

>>895737

which of Windows' problems exactly are a consequence of this?


 No.895740

>>895738

>unless you store single characters in strings a lot of times

a few words escaped, fixed


 No.895742>>895795 >>896253

With any kind of input you usually don't even know the string length beforehand, so some kind of string termination is necessary.

and you don't always need to know the string length either


 No.895743>>895748 >>896509

>>895738

>4 bytes are enough for strings for all practical purposes

I agree, no one could possibly need a hard drive bigger than 16 megabytes. A 200MHz CPU is blazing fast.

>it's not a waste of space, it's only 3 bytes more than the terminating null byte

For strings with a 32-bit max length. For small strings especially it's a waste of space.

>if you use fixed size Unicode.

Well good thing no one uses that for the same reason no one uses size prefixed strings. Because it has a different representation on different machines. Byte order and what not.

>you can't reuse length information

Well that's one particular operation

> it's not gonna be used in the most of application level code

Do you not know what bignum is? You are saying that everyone is going to be using a bignum implementation for basic fucking strings.

>>895739

>windows problems are a consequence of this?

Idk, it's proprietary, they only let us know so much


 No.895744>>895745 >>895803

>>895736

if you read a file that's as big, and need to keep all of the content in RAM for some reason, you aren't using a fucking string type for it. it will be almost useless in this form anyway.

>>895735

a legit use case, please.

also keep in mind that scanning 4 GiB for the zero byte would take a really long time.


 No.895745>>895750

>>895744

>you aren't using a fucking string type for it.

So no one reads files into char* then? lol


 No.895748>>895751

>>895743

>Do you not know what bignum is? You are saying that everyone is going to be using a bignum implementation for basic fucking strings.

I'm saying no one will re-implement it.

And using bignum already implemented in a library doesn't make shit any more complex.

>>895743

>Well thats one particular operation

when it is used somewhere and turns an O(n) algorithm into O(n^2), that's a big deal and a PITA to fix.

>>895743

>Well good thing no one uses that for the same reason no one uses size prefixed strings. Because it has a different representation on different machines. Byte order and what not.

For most use cases it's a bullshit reason. Byte order only matters for data exchange --- files and network. Files do not need to store their size, their size is known. So prefixing the size in files would simply be excessive, just as adding a zero byte to the end. On the network, size counts even more, so it's normal practice to use variable-length coding (protobuf, etc.).
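
for reference, the variable-length coding idea is roughly this (a base-128 varint sketch, not protobuf's actual code):

#include <stdint.h>
#include <stddef.h>

/* encode a length as a varint: 7 payload bits per byte, the high bit means
   "another byte follows"; small lengths cost a single byte on the wire.
   out must have room for up to 10 bytes for a 64-bit value. */
size_t varint_encode(uint64_t value, uint8_t *out) {
    size_t n = 0;
    while (value >= 0x80) {
        out[n++] = (uint8_t)(value | 0x80);   /* low 7 bits + continuation bit */
        value >>= 7;
    }
    out[n++] = (uint8_t)value;                /* final byte, high bit clear */
    return n;                                 /* bytes written */
}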

But we are talking about in-memory representation. And you know, you don't swap a fucking CPU on a running machine while keeping RAM and CPU cache and registers, etc.


 No.895749>>895795 >>895806

>+3 bytes is a waste of space for strings

Meanwhile people are happily making programs in javascript where every variable is some abominable super object.


 No.895750>>895751

>>895745

char* doesn't say anything about whether the referenced content is zero terminated.

it's just a fucking pointer.


 No.895751>>895754 >>895755 >>895795

>>895748

>And using bignum already implemented in a library doesn't make shit any more complex.

You think linking in foreign dependencies to use strings does not increase the complexity? HAHAHA

>Byte order only matters for data exchange

Like uhhhh text files, websites, spreadsheets, literally everything.

>are happily making programs in javascript where every variable is some abominable super object.

And this is a bad thing

>On network, size counts even more, so it's a normal practice to use variable length coding (protobuf, etc.).

Not for strings where the smallest representation is null terminated.

>>895750

>He does not have 10 gigabyte log files he wants to parse


 No.895754>>895757 >>895764

>>895751

>>He does not have 10 gigabyte log files he wants to parse

you never wrote software which was able to do that, obviously.

>>895751

>Like uhhhh text files, websites, spreadsheets, literally everything.

and? they don't use zero terminated strings.

and we are talking about in-memory representation.

>>895751

>You think linking in foreign dependencies to use strings does not increase the complexity? HAHAHA

HAHAHA, go program some stuff without libc.


 No.895755>>895776

>>895751

>And this is a bad thing

Of course. My point was that 3 bytes on a string is nothing. It's the difference between "Hello world!" and "Hello world!gay". You're more likely to waste bytes on shit software design than on a structured string.


 No.895757>>895759 >>895761

>>895754

>go program some stuff without libc.

String copy is like 5 lines of code to write yourself, you want to add on a bunch of big num shit to make it 100.

>and we are talking about in-memory representation.

You think file formats don't use null termination literally everywhere?

>you never wrote software which was able to do that, obviously.

It's really fucking easy, you just mmap it into memory and start reading. The OS will load and unload the pages for you.

>You're more likely to waste bytes on shit software design than a structured string.

People wasting space at the high level is no excuse to waste shit at the low level. Even worse wasting space at the low level is going to make all that high level shit even shittier.


 No.895759

>>895757

>You think file formats don't use null termination literally everywhere?

Yes.

>>895757

>just mmap it into memory and start reading

Then you aren't using any null termination.


 No.895761>>895763

>>895757

>excuse to waste shit at the low level

You're free to null terminate your own esoteric string use cases if you really need to squeeze the fuck out of every byte of memory, but the chances that you do are probably 0. I don't think you understand how meaningless those 3 bytes really are in this case.


 No.895763

>>895761

>I don't think you understand how meaningless those 3 bytes really are in this case.

combining that with the other text, it's pretty much obvious that he doesn't.


 No.895764>>895770

>>895754

>Then you aren't using any null termination.

mmap is not dependent on the file type you dolt

>I don't think you understand how meaningless those 3 bytes really are in this case.

No reason to optimize! Not like computers process hundreds of billions of strings a day! That would not be beneficial at all!


 No.895770

>>895764

>mmap is not dependent on the file type you dolt

which retarded file format are you using that has a zero terminator at the end?


 No.895776>>895798

Null-terminated strings have the advantage of behaving extremely well with recursive functions, as constructing a suffix substring is just a matter of incrementing a pointer.

Think about how Lisp handles lists: they are chained cons cells, with the last cons having an empty (null) list as its CDR. This is how you work with recursion in Lisp.

The fact that you think null-terminated strings are useless proves you still have much to learn, youngling.

>>895755

>3 bytes on a string is nothing

Wrong. First of all, a size_t variable in a 64-bit environment is 64 bits, so we're talking about 7 extra bytes. Second of all, those bytes are nothing on a long text, but when you have a plethora of strings of at most 10 characters (quite frequent), on a 32-bit system it increases their size by about 30%, and on a 64-bit system it's about a 70% increase in memory usage.

Imagine storing an associative array that uses, as keys, 5-char strings with an 8-byte size variable. That's 13 bytes per key. With null-terminated strings, it's 6 bytes. That's about 117% more space used by the size+string solution.


 No.895777>>895778 >>895779 >>895781

>ITT: strings should be null terminated because some filetypes use null characters, also I don't stat my mmaps and EOF aren't a thing, and I prefer to waste cycles scanning for the null terminator rather than wasting an extra 1% of memory that would speed up most use cases and solve 95% of the silliest and most common types of bugs ever

The absolute state of C idiots in this board, everyone.


 No.895778>>896348

>>895777

Go back to your javascript bullshit. Clearly you don't care about efficiency and interchangeability.


 No.895779>>895810

>>895777

Do you know that EOF is an integer value, not an unsigned char, that most strings in a system are very small, that strlen() is rarely needed, and that you can still store a string's length in C, while not losing the advantages of the NULL termination?


 No.895781

>>895777

>C

You mean Unix.


 No.895785>>895788 >>895793 >>895801 >>896156 >>896219

the virgin FOReskin


void map(char *str, size_t str_len, char (func*)(char)) {
    for (size_t i = 0; i < str_len; ++i) {
        str[i] = func(str[i]);
    }
}

The Chad Elegant Recursion


void map(char* str, char (func*)(char)) {
    if (str) {
        *str = func(*str);
        map(str, func);
    }
}



 No.895788>>895790

>>895785

should be map(str++, func), LOL


 No.895790>>895811 >>896038

>>895788

>stack overflow

This is why you don't use languages without proper tail recursion.


 No.895793>>895811

>>895785

>it took less to type therefore it's better as a program

Remove yourself from the premises.


 No.895795>>895797 >>895811

Null-terminated strings suck. C weenies defend it because that's what C uses. Common Lisp strings are arrays, and they can be adjustable (grow and shrink) and have a fill pointer (anything less than it is the currently used part). This covers all the uses of dynamically sized strings, length-prefixed strings, and fixed-length strings. Lisp strings are arrays, so all arrays can have these properties.

>>895742

Bullshit. You always need to know the length. If you really added up all the waste from C and UNIX "comparing characters to zero and adding one to pointers", it would be more efficient to have GC and dynamic typing and store files as arrays of strings. I'm not kidding. C malloc overhead is huge too, but on a Lisp machine, allocating a list only uses one word of memory per element. Allocating a 1D array only uses one header word to store the actual length of the array (which malloc has to do too, but it doesn't provide useful information to you) followed by the words for the array data. Lisp machine overhead is much smaller than C overhead, and the GC compacts to eliminate memory fragmentation.

>>895749

>>+3 bytes is a waste of space for strings

>Meanwhile people are happily making programs in javascript where every variable is some abominable super object.

That's because C sucks. malloc in C has more than 3 bytes of waste. JavaScript is a better language than C even though it sucks too.

>>895751

>>He does not have 10 gigabyte log files he wants to parse

You're going to read an entire 10 GB file into memory (not memory mapping) and stick a 0 byte on the end, but you think an 8 byte length is wasteful? I have no idea why anyone would do things like that.

> Subject: More On Compiler Jibberings... 
>
> ...
> There's nothing wrong with C as it was originally
> designed,
> ...

bullshite.

Since when is it acceptable for a language to incorporate
two entirely diverse concepts such as setf and cadr into the
same operator (=), the sole semantic distinction being that
if you mean cadr and not setf, you have to bracket your
variable with the characters that are used to represent
swearing in cartoons? Or do you have to do that if you mean
setf, not cadr? Sigh.

Wouldn't hurt to have an error handling hook, real memory
allocation (and garbage collection) routines, real data
types with machine independent sizes (and string data types
that don't barf if you have a NUL in them), reasonable
equality testing for all types of variables without having
to call some heinous library routine like strncmp,
and... and... and... Sheesh.

I've always loved the "elevator controller" paradigm,
because C is well suited to programming embedded controllers
and not much else. Not that I'd knowingly risk my life in
an elevator that was controlled by a program written in C,
mind you...

And what can you say about a language which is largely used
for processing strings (how much time does Unix spend
comparing characters to zero and adding one to pointers?)
but which has no string data type? Can't decide if an array
is an aggregate or an address? Doesn't know if strings are
constants or variables? Allows them as initializers
sometimes but not others?

(I realize this does not really address the original topic,
but who really cares. "There's nothing wrong with C as it
was originally designed" is a dangerously positive sweeping
statement to be found in a message posted to this list.)


 No.895797

>>895795

>Spams copy pasta

dropped


 No.895798>>895800 >>895811

>>895776

>max 10 characters strings (quite frequent)

citation needed

>First of all, a size_t variable in a 64 bits environment is 64 bits

what about uint32_t?


 No.895800>>895803

>>895798

>4 gigabyte limit


 No.895801>>895802 >>895811 >>896088

>>895785

>char (func*)(char)

lol, what a retarded syntax


 No.895802>>895808 >>896088

>>895801

I bet you are the type of larper that gets autistic about where parentheses are placed or about spaces vs tabs, wasting all our fucking time.


 No.895803>>895805


 No.895804

>>895727 (OP)

>misuse null-terminated strings

<FUCK NULL TERMINATED STRINGS, IT WASN'T MY OWN STUPIDITY

The code for the hash in your link is shit, and it's not because of the string. It's because the bcrypt writer played with fire and got burned. If you work directly with pointer logic, you need to be very careful. The language does offer you ways of solving the problem with safer, easier to use tools. The problem is, when you need to be efficient, you're going to have to write code closer to the hardware level. You might as well ban chainsaws because idiots get hurt by them.


 No.895805>>895808

>>895803

Who said anything about keeping it in RAM? You can process something without it being in RAM. Ever heard of streams? Ever heard of memory mapped files? Guess not lol.


 No.895806>>895809

>>895749

And there's absolutely nothing wrong with that. Having been in both worlds, it's such a pleasure to write software in the more abstract languages.


 No.895808>>895812 >>896088

>>895802

not.

it's a lot less clear than `char -> char` for example, or even `Function<char, char>`.

try to spell (in C) a type of a variable which is a function which takes a char and returns a function which returns a function which returns a function which returns a char, for example.
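
for the record, and with the caveat that C can only return function pointers, not functions, it comes out something like this (illustration only, names made up):

/* f: pointer to a function taking a char and returning a pointer to a function
   returning a pointer to a function returning a pointer to a function
   returning a char */
char (*(*(*(*f)(char))(void))(void))(void);

/* the same thing with typedefs, which is the only sane way to write it */
typedef char    (*leaf_fn)(void);
typedef leaf_fn (*mid_fn)(void);
typedef mid_fn  (*top_fn)(void);
top_fn (*g)(char);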

>>895805

if you read from a file, you already know the size, because files have a size. adding 1 extra byte is useless and stupid.


 No.895809>>895815 >>895818

>>895806

Javascript is the C of high level languages


 No.895810>>895814 >>895816 >>895827

>>895779

>Do you know that EOF is an integer value

People ITT apparently don't know null terminated strings and reading from files have nothing to do with each other at all, that's my point.

>that most strings in a system are very small

Did you know SQL databases solved this ages ago with fixed size char fields, variable size char fields and text fields? Fuck, we could solve this the same way we solved numeric types of different sizes, with short strings, regular strings, long strings, etc.

<but that's not YOONIKS-y and simple!

It's about as obtuse as integer sizes. Read: not at all if you care the littlest bit about muh autistic efficiency. Not only that, but the compiler could infer the most adjusted type for literals, so you should only worry about user inputs and files, which should have a fixed length anyway.

<but muh length promotion would waste too much!

Go write assembly then, fag.

>that strlen() is rarely needed,

Rarely my ass, unless you use buffers and increase the complexity of your program by doing this.


 No.895811>>895813 >>895819 >>895822 >>896921

>>895793

That alone is a reason it's better, but it's also clearer: the function takes one less argument, and it doesn't need to push a new variable onto the stack.

>>895790

While it is true that ANSI C says nothing about tail call recursion, GCC does it.

>>895795

Mr. Common Lisp[1] here apparently does not understand the value of a null terminator in a linear collection of elements (like a string), even though it is the principle upon which cons cell lists are constructed.

[1] yuck!

>>895798

>citation needed

Look up any software, and see how long most strings are.

>what about uint32_t?

Though there is no reason for it not to be used, it is not recommended to hardcode your size_t. Also, uint32_t is not defined by ANSI standards older than C99.

>>895801

>return type (name) (arguments)

How would you do it, Mr. Smart Man?


 No.895812

>>895808

<Not

>Goes on to larp about syntax


 No.895813>>895827

>>895811

I bet you think a linked list is good too because it doesn't need an iterator variable to loop through.


 No.895814>>895820

>>895810

>People ITT apparently don't know null terminated strings and reading from files have nothing to do with each other at all, that's my point.

People ITT don't know that null terminated strings are used in file formats all the time.


 No.895815>>895817

>>895809

>Javascript is the crap of high level languages

ftfy

although C is crap too, so… not a big difference after all.


 No.895816>>895823 >>896348

>>895810

>Go write assembly then, fag.

Go write in javascript faggot, its where you belong.


 No.895817

>>895815

That's the point, dingus


 No.895818

>>895809

Made my day.


 No.895819>>895827

>>895811

>and it doesn't need to push a new variable onto the stack

who told you so?

>what is registers?


 No.895820

>>895814

>People ITT don't know that null terminated strings are used in file formats all the time.

and the most widely used example is … ?


 No.895821

so many newfag CS undergrads ITT smh


 No.895822>>895827 >>895836

>>895811

>Though there is no reason for it not to be used, is not recommended to hardcode your size_t. Also, uint32_t is not defined by ANSI standards older than C99.

Older standards than C99 belong to the garbage bin.


 No.895823>>895824

>>895816

Your beloved C does size promotion all the time. Fuck, getchar(), which is used to read a single character from a file, which is about as wasteful of a function as it gets, performs promotions with every single call. And it's negligible.

Really, fuck off. You don't even want assembly, your autism should only allow you to use ASICs that waste zero cycles at all.


 No.895824>>895827 >>895828

>>895823

I hate C, I just like null termination.


 No.895827>>895831 >>895833 >>895888

>>895810

SQL databases are much different from C storage. For starters, the length of a VARCHAR is stored only once, in the column definition. When the length is dynamic, we're talking about TEXT, which is large enough that the extra 8 bytes are indeed literally nothing.

><but that's not YOONIKS-y and simple!

I don't like Unix, please do not put words in my mouth.

>some rambling on stuff I haven't mentioned

ok

>Rarely my ass

Rarely indeed. For hard-coded strings, the length is simply sizeof(myString) (which counts the null terminator). For strings that you receive as input, the size is calculated while receiving it, or is pre-given.
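
concretely, a throwaway example:

#include <stdio.h>
#include <string.h>

int main(void) {
    char greeting[] = "hello";           /* hard-coded string in an array */
    printf("%zu\n", sizeof greeting);    /* 6: five characters plus the '\0' */
    printf("%zu\n", strlen(greeting));   /* 5: scans until the '\0' */
    return 0;
}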

Null-terminated dynamic-size strings are good for manipulation, sized dynamic-size strings are good for interchange (databases, network, file formats, etc.)

You should use fixed-length strings as much as possible anyway.

>>895819

If it is not pushed onto the stack, that's a compiler optimization that you shouldn't rely on, or you need to specify the variable as volatile.

>>895813

Linked lists are excellent as lists. If you try to use them when you should use fixed-size arrays or vectors, maybe you should take an IQ test, and based on that, decide if you should kill yourself or retake Data Structures 101.

>>895822

If you're not a LARPer, surely you have heard of legacy codebases.

>>895824

Same. C sucks, but most of its detractors just don't understand the real reasons why.


 No.895828>>895830

>>895824

Well tough shit then…


 No.895830

>>895828

Tough shit for you, attacking a strawman this whole time.


 No.895831>>895832

>>895827

>Linked lists are excellent as lists

>you should take an IQ test

You sure you're not projecting m8?


 No.895832>>895835

>>895831

>I don't understand why you would possibly want a linked list.

How many years of programming do you have on your CV, again?


 No.895833>>895834 >>895844

>>895827

>Linked lists are excellent as lists.

Linked lists waste all that space on pointers though, have terrible cache properties, and jump around to different pages all the time. Big O time complexity has little to do with the real world when constant factors dominate at the sizes of N we actually deal with.


 No.895834>>895837 >>895843

>>895833 (checked)

Is an array of pointers that get reallocated all the time a better solution when the list is not changing often?


 No.895835>>895861

>>895832

Well I've never heard of a situation where a linked list is the best solution, so here's your chance to educate me.


 No.895836>>895839 >>895845 >>896048


>>895822

That's where you're wrong, kiddo. C99 is one of the worst standards to come, and everyone in the industry uses C95 exclusively.


 No.895837>>895841 >>895852 >>895968

>>895834

If that array of pointers fits within a few pages then it's absolutely faster than chasing down pages wherever the list nodes get allocated.


 No.895839

>>895836

>The furry c programmer knows all


 No.895841

>>895837

Thx. Is this the best solution for small lists? Is there a special list type you'd recommend?


 No.895843>>895852

>>895834

Have you ever benchmarked this shit on a relatively modern computer?


 No.895844>>895848 >>895849 >>895850 >>895855

>>895833

You and I must have different definitions of "real world". When you need to constantly resize (queues, lists, stack), using vectors is extremely expensive. When the size of your vector remains constant, or is changed very little, using a vector is better.

You wouldn't cut a steak with a wood saw, or cut a plank with a steak knife. Two different tools serve two different purposes, and so do two different data structures.


 No.895845>>895858

>>895836

Not an argument


 No.895846>>895851

>>895727 (OP) (OP)

That vulnerability mentioned in that blog post is developer error. The function takes in a string, but you pass in a byte array. Why would you expect it to work? If you pass in the wrong type of variable then of course it might not work right.


 No.895848>>895861

>>895844

>You and I must have different definitions of "real world". When you need to constantly resize (queues, lists, stack), using vectors is extremely expensive

Any evidence?


 No.895849>>895853 >>895861

>>895844

> using vectors is extremely expensive

That's just it, it's not extremely expensive. It has an expensive big O cost, but almost every benchmark will show that vectors are faster. This is because cache pages exist. The cache changes how all of this works.


 No.895850>>895861

>>895844

Look, the big O notation from your CS 101 data structures class is not an accurate description of how caches work.


 No.895851>>895985

>>895846

in C, char* is also used for byte arrays.

this is a programmer error, but it could be prevented if the design of the language and the stdlib was less shit.

programmers will always make some errors, but some of them can be prevented entirely as a class.


 No.895852

>>895843

No. But performance always takes priority.

And I think we should listen to

>>895837

's practical advice and not some stupid theory developed by java shitcoders at some university.


 No.895853>>895854

>>895849

>It has an expensive big O cost

it doesn't.

the amortized cost of adding an item is still O(1).
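
to make the amortized point concrete, a bare-bones sketch (made-up names, not any real library): doubling the capacity means any single push may trigger an O(n) copy, but each element only gets moved O(1) times on average.

#include <stdlib.h>

struct vec { char *data; size_t len, cap; };

/* push with capacity doubling; amortized O(1) per push */
int vec_push(struct vec *v, char c) {
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 8;
        char *p = realloc(v->data, new_cap);
        if (p == NULL)
            return -1;              /* out of memory, vector left unchanged */
        v->data = p;
        v->cap = new_cap;
    }
    v->data[v->len++] = c;
    return 0;
}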


 No.895854>>895856 >>895861

>>895853

Adding an item to the middle of a vector is not amortized to O(1).


 No.895855

>>895844

You can change how often a vector reallocates itself, but really, the default behavior is sufficient for most implementations.


 No.895856>>895859

>>895854

neither is it in a linked list if you first need to find the place to insert --- you'll need an O(n) traversal first.


 No.895858>>895862

>>895845

Yours neither, loser. You literally made a bold statement without backing it up or providing proof. Your nodev ass can't even write a reverse polish calculator, LOL.


 No.895859>>895861 >>895867

>>895856

Again you keep using all this fucking big O notation when talking about the speed of these data structures. The real world does not follow big O. Iterating over a vector that's all in one page is thousands of times faster than jumping between the pages where linked list nodes are allocated, despite the same time complexity.


 No.895861>>895864 >>895867

>>895842

Yup. This is why compiler warnings exist when you try to do implicit conversion, and this is why Apps Hungarian Notation is useful.

>>895848

>evidence

>of a math problem

1st year CS theory that you ought to know if you want to be taken seriously here.

>>895849

>benchmarks

That use cases where vectors are indeed better.

>>895850

>cache

Do you think data structures stop existing outside of RAM?

>>895835

Filesystems make extensive use of linked data structures.

>>895854

In the middle, or anywhere besides the end. Dynamic vectors can be used somewhat effectively as stacks because of that, but that's about it.

>>895859

>The real world does not follow big O

L M A O

M M

A A

O O


 No.895862>>895863

>>895858

>Your nodev ass can't even write a reverse polish calculator, LOL

I can write even an infix calculator without any problem.

I actually wrote a compiler for a simple language and a lot of other shit too. Fix your detector.


 No.895863

>>895862

You're still claiming shit you've never done, and don't provide proof.

>>>/reddit/


 No.895864>>895866

>>895861

Look here retard. Iterating over a list and a vector have the same big O cost. In the case of an actual list though you will be chasing down pointers in different pages. big O does not model this cost at all. If you knew more CS theory than an undergrad simpleton you would understand this.


 No.895865>>895870

oh shit watch out there's a troll in here.


 No.895866>>895869

>>895864

>muh iterations

Insert a new value at the head of a 10 million records vector.

Now do it at the head of a 10 million records linked list.

Come back and tell everyone how it went.


 No.895867>>895868 >>895872

>>895861

>1st year CS theory that you ought to know if you want to be taken seriously here.

When you make claims based on an invalid mental model of modern computing hardware, of course you need to prove your bullshit to be taken seriously.

>>895859

Lol, are you a brainlet or what?

>>895861

>Filesystems make extensive use of linked data structures.

For different reasons altogether.

We are talking about in-memory data structures.


 No.895868>>895870

>>895867

>Lol, are you a brainlet or what?

<standard Big O notation always correctly models hardware

what the fuck are you on about


 No.895869>>895872

>>895866

>Insert a new value at the head of a 10 million records vector.

if you need to insert at the head, you use a deque and not a vector.

for a deque, this is not a problem at all and it will be faster than a linked list (amortized)


 No.895870>>895871

>>895865

don't worry I got him right here: >>895868


 No.895871

>>895870

forgot pic


 No.895872>>895873 >>895874

>>895867

>We are talking about in-memory data structures.

Who says so? I defended that linked lists had very valid use cases, and everyone and their nodev asses have come to shit on what is basic knowledge.

>invalid mental model of the modern computing hardware

I know how cache works, thank you.

>>895869

>deque

Not always.


 No.895873>>895875

>>895872

>not always

I see you don't know what amortized means then


 No.895874>>895875 >>895876 >>895881 >>897445

>>895872

TFW your linked list is slower for the one thing it should be better at because of how hardware actually works

https://baptiste-wicht.com/posts/2012/12/cpp-benchmark-vector-list-deque.html


 No.895875>>895878 >>895883

>>895873

If you need to frequently mutate the order of your data, deques can still prove too slow, or their overhead too big.

>>895874

>muh benchmarks

Filesystems, do you understand them?


 No.895876

>>895874

Note that the only case where the list is actually faster is where they happen to store very large values in each node instead of a pointer to them, which is a retarded, contrived use case.


 No.895878

>>895875

>amortized


 No.895881

>>895874

>The random position is found by linear search.

Gee!


 No.895882>>895898

>not just implementing a linked list with a lookup table for fast iteration

Kiss and make up, gentlemen. Try not to touch balls though, that's gay.


 No.895883>>895884

>>895875

Yeah, no one is ever actually going to have to do a linear search on their data to find what they need


 No.895884>>895890

>>895883

Lists are not intended for linear searches.


 No.895887>>895890

>>895877

>they will likely behind a pointer then, so even then it loses.

Only if you're a terrible programmer.


 No.895888

>>895827

>SQL databases are much different than C storage. For starters, the length of VARCHAR is stored only once, in the column definition. When the length is dynamic, we're talking about text, which will indeed make the extra 8 bytes literally nothing.

That really doesn't matter at all. The compiler should be able to handle this, along with the promotion rules. My point is that fixed/limited-size strings are nothing new and people know how to handle them. The reason most modern programming languages use the same type of string for everything is that C hacked strings in as simple pointers to chars, when that type actually has the additional property of being null terminated. So even though the languages that followed knew null-terminated strings were bad because they caused all sorts of problems, they didn't think making a distinction was worth it, and just used a size_t length for every string, be it 2 or 20000 characters long.

Riddle me this: what would be so wrong about using structs for strings, where one of the members is a pointer and the other is an unsigned integer whose width adjusts itself to the minimum number of bytes that can hold the length of the string? This way, strings up to the max unsigned char value occupy the same space as null-terminated strings, and strings up to the maximum unsigned short value would occupy a measly single extra byte. In addition, by manipulating the pointer and length you could generate a view into a string, which is more or less what Rust already does, and save memory in the process.
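
the view part of that would be dead simple. A rough sketch, using a plain size_t for the length instead of the self-adjusting integer, and made-up names:

#include <stddef.h>

/* a non-owning view: pointer + length, like the struct described above */
struct strview { const char *ptr; size_t len; };

/* taking a substring is just pointer and length arithmetic: no copy, no
   terminator, roughly what Rust's &str slicing does */
struct strview strview_slice(struct strview s, size_t start, size_t count) {
    if (start > s.len)
        start = s.len;                 /* clamp so the view stays in bounds */
    if (count > s.len - start)
        count = s.len - start;
    return (struct strview){ s.ptr + start, count };
}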


 No.895890>>895894 >>895895

>>895884

Okay so we can agree then that lists are useless for almost everything?

>>895887

How dare someone store an object bigger than the size of a page behind a pointer!


 No.895894>>895899

>>895890

lists DO have uses, though. Pretending that they don't is cargo cult programming


 No.895895>>895899 >>895900

>>895890

>so we can agree then that lists are useless for almost everything?

They are not useless, they are slower. And sure, they're slower almost every time, but not every time, which is the point I've been making from the beginning.

>How dare someone store and object bigger than the size of a page behind a pointer!

>2048 bytes

>bigger than a page

nigguh


 No.895898

>>895882

this table will need to be updated each time you insert or remove something, defeating the purpose.

what you actually probably want is https://bitbucket.org/astrieanna/bitmapped-vector-trie.


 No.895899>>895901

>>895895

>>895894

Most things have uses, and the less useful should not be the default.


 No.895900>>895901

>>895895

>And sure, they're slower almost every time, but not every time, which is the point I'm making from the beginning

still useless for realtime, as memory allocation is generally unpredictable.


 No.895901>>895903 >>895907

>>895899

I don't think I said they were or should be the default, have I?

>>895900

Man, I've mentioned filesystems three times already.


 No.895903>>895905 >>895922 >>895930

>>895901

They are the default in schema


 No.895904>>895906 >>895913

>hurr durr lets LARP about irrelevant shit

/tech/ in a nutshell. I bet most of you fags haven't even programmed anything except fizzbuzz tier shit.


 No.895905>>895922

>>895903

*scheme


 No.895906>>895910


 No.895907

>>895901

>Man, I've mentioned filesystems three times already.

filesystems like the FAT? :^)

I've seen better filesystems use more clever data structures.


 No.895910>>895911 >>895913

>>895906

not an argument XDDDDDDDDDDDDDDDDDDDDDDD


 No.895911

>>895910

>>>/molyneux/


 No.895913>>895915

>>895910

>>895904

>I-I bet you guyz hasnt even program! L O L

"well reasoned argument"

>n-not an argument

shhh. The grown NEETs are talking.


 No.895915

>>895913

>grown NEETs

LOL. Keep LARPing faggots.


 No.895922>>895924

>>895903

>>895905

They aren't.


 No.895924>>895925

>>895922

yes they are


 No.895925>>895926


 No.895926>>895929


 No.895929>>895932

>>895926

>Ctrl+F

>"default"

>0 results


 No.895930>>895931

>>895903

what does it even mean for them to be "default"?


 No.895931

>>895930

It means "I know nothing about programming and I need to read my SICP".


 No.895932>>895933 >>895935

>>895929

>A more streamlined notation can be used for lists: the elements of the list are simply enclosed in parentheses and separated by spaces. The empty list is written (). For example, the following are equivalent notations for a list of symbols:

>(a b c d e)

>(a . (b . (c . (d . (e . ())))))

LOL. It is the default


 No.895933>>895936

>>895932

>this means they're the default DS


 No.895935>>895937

>>895932

you are confusing the abstract concept of lists with the particular implementation of linked lists. Scheme uses lists heavily but that doesn't mean it's based on linked lists


 No.895936>>895938

>>895933

They are. If you write (a b c) you have a linked list.


 No.895937>>895939

>>895935

>that doesn't mean its based in linked lists

proof?

Mine is here: https://www.gnu.org/software/mit-scheme/documentation/mit-scheme-ref/Lists.html#Lists


 No.895938>>895940

>>895936

You don't, and you're retarded, and you don't understand the difference between an actual list structure, and the concept of list used in the representation of Scheme programs.

If you type (a b c), you are calling the function 'a' with the arguments 'b' and 'c'.


 No.895939>>895942

>>895937

proof of what? I'm just pointing out a distinction.

>Ctrl+F

>link

>0 results


 No.895940>>895943

>>895938

>>A more streamlined notation can be used for lists: the elements of the list are simply enclosed in parentheses and separated by spaces. The empty list is written (). For example, the following are equivalent notations for a list of symbols:

>>(a b c d e)

>>(a . (b . (c . (d . (e . ())))))

https://www.gnu.org/software/mit-scheme/documentation/mit-scheme-ref/Lists.html#Lists


 No.895942>>896308

>>895939

>7 Lists

>A pair (sometimes called a dotted pair) is a data structure with two fields called the car and cdr fields (for historical reasons). Pairs are created by the procedure cons. The car and cdr fields are accessed by the procedures car and cdr. The car and cdr fields are assigned by the procedures set-car! and set-cdr!.

>Pairs are used primarily to represent lists. A list can be defined recursively as either the empty list or a pair whose cdr is a list. More precisely, the set of lists is defined as the smallest set X such that

> The empty list is in X.

> If list is in X, then any pair whose cdr field contains list is also in X.


 No.895943>>895945

>>895940

>I still don't understand Scheme

You could have said "oh, ok, I thought so" when I told you they weren't the default, and you would have just appeared as someone who doesn't know Scheme, which is in itself not a bad thing. Now you're just making an ass out of yourself.


 No.895945>>895946


 No.895946>>895947


 No.895947>>895949

>>895946

>A more streamlined notation can be used for lists: the elements of the list are simply enclosed in parentheses and separated by spaces.

>Pairs are used primarily to represent lists.

>A more streamlined notation can be used for lists: the elements of the list are simply enclosed in parentheses and separated by spaces.

>Pairs are used primarily to represent lists.

>A more streamlined notation can be used for lists: the elements of the list are simply enclosed in parentheses and separated by spaces.

>Pairs are used primarily to represent lists.

>A more streamlined notation can be used for lists: the elements of the list are simply enclosed in parentheses and separated by spaces.

>Pairs are used primarily to represent lists.

>A more streamlined notation can be used for lists: the elements of the list are simply enclosed in parentheses and separated by spaces.

>Pairs are used primarily to represent lists.

>A more streamlined notation can be used for lists: the elements of the list are simply enclosed in parentheses and separated by spaces.

>Pairs are used primarily to represent lists.

you don't understand english.


 No.895948

anyways have fun tripfagging.


 No.895949>>895950

>>895947

You haven't answered my question. Are lists the default data structure of C?


 No.895950>>895955 >>895998

>>895949

there are no built-in data structures in C at all.


 No.895955>>895958 >>895983

>>895950

There's arrays. Now not only do you have to read your SICP, but you also have to read your 2nd edition of "The C Programming Language".


 No.895958>>895959 >>895960

>>895955

Bullshit. There are no arrays.


 No.895959

>>895958

I mean, there's only a joke of an array which must fit into the stack or the constant pool, and it can't be used for anything "big".

What you probably actually mean is just pointers + the ability to request some memory dynamically if your environment has it in the API.


 No.895960>>895961 >>895983

>>895958

But everything is "just pointers". C has arrays, there's a syntax for array initialization, there's a syntax to specify arrays of types, and they are mentioned in the standard.


 No.895961>>895962

>>895960

Pro tip: if you can't query the size of the array, it's not an array.


 No.895962>>895963 >>895966

>>895961

1) That's a definition you pulled out of your ass

2)

sizeof(myArray);


 No.895963>>895972

>>895962

the latter will not work for a pointer to a dynamically allocated "array".


 No.895966


>>895962

Are you a furry?


 No.895968>>895971

>>895837

I still don't understand why it's faster.

An entrypoint pointer and a bunch of structs would have the same performance with paging when cycling through in my brain.

Both have the distance of 1 pointer to the next struct and the contents are allocated dynamically.


 No.895971>>895974

>>895968

then go ahead and learn instead of making a clown out of yourself here


 No.895972>>895973 >>895983

>>895963

Because dynamically allocated memory is not an array as per the C definition of array. Read your books.


 No.895973>>895975

>>895972

yeah that's what I said from the beginning.

arrays which are in C are almost useless, and there are no true arrays.


 No.895974

>>895971

Don't be mad at me. Learning from you is simply faster.


 No.895975>>895979 >>895983

>>895973

My god, kiddo. Read your fucking book.


char *a;       /* a pointer */
char b[512];   /* an array */
sizeof(a);     /* size of the pointer: typically 8 on a 64-bit system */
sizeof(b);     /* size of the whole array: 512 */

Those yield different results, because arrays and pointers in C are, in fact, not the same thing.

https://eli.thegreenplace.net/2009/10/21/are-pointers-and-arrays-equivalent-in-c


 No.895979>>895981 >>896075

>>895975

>not the same thing.

<but they should be

one more piece of c bloat


 No.895981>>895982

>>895979

>I was wrong but I should be right


 No.895982

>>895981

Don't assume that anyone that responds is the same person


 No.895983>>895984

>>895955

>>895960

>>895972

>>895975

C does not have arrays. It has something called arrays, something called strings, and something called unions, but it does not have any of those things. If I built an array processor, languages with arrays would be faster, but C wouldn't be able to run on it.


 No.895984>>896002

>>895983

And I suppose you think lisp cannot exist on x86 because it does not have cons cells at the hardware level?


 No.895985

>>895851

>in C, char* is also used for byte arrays.

Byte arrays should be unsigned char*. When the char datatype is used without being explicitly signed or unsigned, it means that it represents characters (as in characters in a string).

This is a problem with PHP itself and has happened with other functions too.


 No.895998>>896001

>>895950

>there are no built-in data structures in C at all.

wew lad


 No.896001

>>895998

data structures? I think you mean bloat.


 No.896002>>896005 >>896006

>>895984

It's harder to make a computer that can't run Lisp than it is to make a computer that can't run C because ISO C cannot run on certain hardware. There are requirements about the kind of hardware C can run on, like it must use binary integers in one of certain number of formats, and "strings" and "arrays" also have a lot of requirements, even though they're horrible. There's no way to use "arrays" and "strings" in C without pointer arithmetic, so C must specify how data appears in memory, which prevents a lot of optimizations and faster hardware designs.


 No.896005>>896007

>>896002

>It's harder to make a computer that can't run Lisp than it is to make a computer that can't run C because ISO C cannot run on certain hardware.

>Lisp is so shit, you can't build an emulator or compatibility layer with it.

LISPFAGS BTFO


 No.896006

>>896002

Are the existence of C and C++ one of the reasons why progress stopped and all computer architectures are so mediocre?


 No.896007>>896008

>>896005

you can emulate everything but running C with shit speed is pointless, as it's only ever used for speed gains.

if C becomes slower than Python for example, then it's fucking pointless to use, as it sucks donkey balls on everything else too.


 No.896008>>896009

>>896007

>C becomes slower than Python

That wouldn't happen because C is very specific about everything.


 No.896009>>896012

>>896008

it will only make matters worse, dude.


 No.896012

>>896009

No. Things can always be abstracted but reduction is impossible after some point which is why more specific languages are faster by default and that DOESN'T CHANGE.


 No.896031

>>895735

Then you really don't want to scan for that null terminator.


 No.896038

>>895790

Newer revisions of the ISO C standard require the compiler to implement tail recursion properly.


 No.896048

>>895836

you are a humongous faggot


 No.896072>>896082 >>896091

>>895727 (OP)

It's easier to put a NUL byte at the end (just *one* byte) than to calculate the new length every time it's modified and put it into another variable (presumably at least an int, i.e. probably at least four bytes).


 No.896075

>>895979

>arrays and pointers should be the same thing

No. Their underlying concepts are very different. The fact that C arrays are implemented such that an array identifier decays to a pointer to its first element in almost all contexts (using the array identifier as a sizeof operand is a case when it does *not*, and for good reasons) is just because it's convenient that way, nothing else.


 No.896082

>>896072

What data structure could you possibly be using where you wouldn't already know the length?


 No.896088>>896091 >>896093

>>895801

>>895802

>>895808

The syntax is retarded because it's invalid. "func being a pointer to a function taking a char argument and returning char" is written "char (*func)(char)".


 No.896091>>896142

>>896072

>calculate the new length every time it's modified

... which isn't how you're supposed to use this stuff.

>>896088

>asterisk is on the wrong side

>INVALID!!1!!!

Yeah this totally changes how it looks.


 No.896093>>896142

>>896088

Are you seriously complaining that he put the * on the other side?


 No.896095>>896096

>Hey kid, send me the word "hello". That'll be 2 megabytes long, by the way.


 No.896096>>896105 >>896122

>>896095

This tbh.

>Store 50,000 5-char identifiers

>650kB


 No.896105>>896109

>>896096

>Storing 5 char identifiers

>Not mapping them to an enum

smh tbh famalamadingdong. I already told you how to handle that usecase with almost zero overhead, anyway.


 No.896109>>896110

>>896105

>want to store or delete identifier

>need to recompile entire application

>user-submitted identifier

>make script that modifies the sources and recompile

>giant enum {...} with 50 000 entries


 No.896110>>896112

>>896109

You could store them in a config file. Oh wait, that's bloat, amirite.


 No.896112>>896124

>>896110

>get user query based on identifier

>load up 640kB file everytime


 No.896122>>896127

>>896096

I was more so referring to the potential security problems with having an arbitrary length value, but yeah, that too.


 No.896124>>896126 >>896658

>>896112

You could mmap that file and seek it on demand.

<ooh, but seeking these on demand is so expensive

>get user query based on identifier

>search a million entries

>compare a million 5 char long strings one by one

>not expensive

<but my program will only search for 5 entries tops!

Which is why you need 50k 5 char long strings, right. Two can play the "arbitrarily specific specifications" game, too.


 No.896126>>896130

>>896124

>What is a search tree


 No.896127>>896128 >>896144

>>896122

>arbitrary length value,

Null terminated strings are a-okay tho.


 No.896128

>>896127

Did I stutter?


 No.896130>>896137

>>896126

Something you could apply to the file as well. Or something you could apply to enum-tagged entries for extra speed. I dunno, I'm not the one making up the stupid limitations.


 No.896137>>896151

>>896130

>calls it stupid limitations

>when he's the one calling for 8 extra bytes appended to EVERY string.


 No.896142>>896145


>>896091

>>896093

Are you literal brainlets? "putting the asterisk on the wrong side" is invalid syntax and doesn't work.


 No.896144

>>896127

<Null terminated

>A one-L NUL, it ends a string

>A two-L NULL points to no thing

>But I will bet a golden bull

>That there is no three-L NULLL

(char)'\0' != (void *)0


 No.896145>>896146 >>896149

>>896142

You are complaining about such a minor irrelevant typo.


 No.896146>>896150

>>896145

>syntax error (proving him to be a larper who has actually not much of a clue of how function pointers are used correctly) preventing the code from even compiling

>"minor irrelevant typo"

dat desperate damage control of yours tho


 No.896149

>>896145

C isn't nignog transgender studies where nothing that is said actually ever matters. The compiler is autist extraordinaire and is merciless to syntax errors of any kind.


 No.896150>>896156

>>896146

I'm not the one that made the post. I'm calling you a faggot for sperging out about it.


 No.896151>>896155 >>896158

>>896137

>when he's the one calling for 8 extra bytes appended to EVERY string.

Never said that, you just conveniently ignored my post on how to properly handle this, while also adding a feature that would simply save memory in the long run.


 No.896155>>896178

>>896151

The one where you advocated adding a complicated BigNum system to a system that needs to process strings?


 No.896156>>896160 >>896161

>>896150

Desperately playing down a stupid mistake (yea no, it wasn't a "typo" because the larper consistently repeated it, go back to >>895785 and check if unsure) just because (you) yourself didn't notice it is far worse than pointing it out (which you cared to inaptly call "sperging out about it").


 No.896158>>896178

>>896151

>posts non-solutions

>how to properly handle this

Very simple use case, kid. You have a map of arbitrary-but-usually-small length strings to whatever (let's say 1 int), and you have 50,000 of those records. You have to serve numerous update and query requests.

You cannot

>use an enum (lm fucking ao)

>use a file (actually you can but it changes nothing)

>mmap enormous chunks of data

It's simple, you have a map of strings. Do you more than double their size?


 No.896160>>896166

>>896156

It's an asterisk on the wrong side of the "func" identifier, no one cares about it actually, because it doesn't change the look of it overall. You're sperging over it because you absolutely can't stand losing an argument, or maybe just crave validation. Talk about LARPing.


 No.896161

>>896156

>just because (you) yourself didn't notice it is far worse than pointing it out

I guess if you don't autistically rant for multiple posts about a minor syntax error, that means you did not know it was wrong. If you don't correct every little grammar error in a person's post, you must have no concept of proper English usage. The only option is to be a massive faggot about everything.


 No.896166>>896167

>>896160

Why am I even coming back to this place chock-full of bitter and confused larpers


 No.896167


 No.896178>>896179 >>896180

>>896155

Why the fuck would you need bignum when you couldn't hold strings bigger than size_t can count, anyway? And if you could, you would indeed need other access mechanisms. No, you just need unsigned chars, shorts, ints and longs. If this feature, already included in C, is too complicated for you, you should go back to scripting languages.

>>896158

>arbitrary-but-usually-small length strings

>arbitrary length

Oh sure, you never said that before but okay. If what you need is to hold SQL-like text fields you would indeed waste 7 bytes per entry using my method, but if you are short on resources (considering you are complaining about 650k being too much) you shouldn't be doing this anyway, at the very least use a varchar. You are also implying these keys are probably unique, non-system defined and dynamic, which weren't part of the original requirements, while I was assuming your dataset was larger and that these keys were not primary and non-unique, in which case they would benefit from compression if mapped to integers.

Your requirements are stupid anyway. It's about the only use case in which NUL-terminated strings would win, and that's assuming things like users submitting arbitrarily large keys and fucking over your limited resources, which could be solved by offloading your keys onto another table held in a file and organized using a radix tree.


 No.896179>>896182 >>896196

>>896178

Not actual bignum, you dumb fuck. Variable-length encoding of the size prefix number. It's very similar.


 No.896180

>>896178

NULL-terminated strings also win in ease of processing.


 No.896182>>896183

>>896179

Good solution for compact storage and transmission, not so good for processing, as it introduces more branching.


 No.896183>>896191

>>896182

Scheme uses actual bignum for all its calculations in its numeric tower, clearly you don't care about a little extra branching.


 No.896191>>896193

>>896183

Did I just hear a non-argument?


 No.896193

>>896191

>>>/molyneux/


 No.896196>>896198

>>896179

>variable length

Not variable at all, just like the size of a char, a short, an int or a long are not variable and they are simply different types. micro strings (minimum addressable size, equivalent to NUL terminated strings; could also be named short short string if you are into retarded modifier naming schemes), short strings, strings, long strings, macro strings (or long long strings) would be different types, and the language would just know how to promote them when appropriate, just like it already promotes these numeric types.
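
roughly what that family of types could look like, purely as an illustration (nothing standard, padding ignored):

#include <stdint.h>

/* the width of the length field grows with the maximum size the type may
   hold, mirroring char/short/int/long */
struct microstr { uint8_t  len; char *ptr; };   /* up to 255 bytes */
struct shortstr { uint16_t len; char *ptr; };   /* up to 65,535 bytes */
struct str      { uint32_t len; char *ptr; };   /* up to 4,294,967,295 bytes */
struct longstr  { uint64_t len; char *ptr; };   /* effectively unbounded */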


 No.896198>>896200

>>896196

Just what we need, c with even more type coercion, only this time with dynamically allocated values.


 No.896200>>896202

>>896198

>c with even more type coercion,

Okay, you can tell the fucking compiler you really want to cast that short to an int when passing them to functions accepting ints by explicitly stating it, Java pedantry style, but considering it makes no difference to you or the resource allocations required you may as well let the compiler do it for you.

If you don't want the compiler to automatize anything at all, why don't you fucking code it in assembly?

>with dynamically allocated values.

you wot m8


 No.896202>>896214

>>896200

>If you don't want the compiler to automatize anything at all,

I want the compiler to automate all kinds of things. I want it to automate ways to ensure my code is correct. I don't want it to just randomly fuck with the types of the numbers because I am using them. It's like javascript fuckery where ints get turned into strings based on the operation.


 No.896203>>896205

>>895728

That actually depends on the platform (i.e. what is a char actually stored as). Unarguably, though, they take up more processing, because to detect the end of the string, you have to check each and every character as you are parsing it.


 No.896205>>896207 >>896214

>>896203

>they take up more processing

*for a single particular operation

copying a null terminated string takes less processing time


 No.896207>>896208

>>896205

No it doesn't. Consider this, how do you know when you are at the end of the source string? With a null-terminator, you must check every character.

With an integer counting up to the length of the string, you have the potential benefit of loop unrolling. i.e. compare the index to the string length every 16 elements or so instead of comparing each char to '\0'.
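
Roughly the two copy loops being compared here, as a sketch (function names made up; the counted version is the one the compiler is free to unroll by 16 or whatever):

#include <stddef.h>

/* NUL-terminated: every single byte gets tested against '\0' */
void copy_cstr(char *dst, const char *src)
{
    while ((*dst++ = *src++) != '\0')
        ;
}

/* counted: the loop bound doesn't depend on the data, so unrolling
   and vectorization become straightforward */
void copy_counted(char *dst, const char *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = src[i];
}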


 No.896208>>896227

>>896207

>loop unrolling

If you want to blow your instruction cache and slow down your whole program


 No.896214

>>896202

>don't want it to just randomly fuck with the types of the numbers because I am using them

<int fug(long thing) { /* fug the long thing */ }

<unsigned short short benis = 255;

<int colonDDDDD = fug(benis);

<fug(benis) == fug(255)

<(int) benis == (int) 255

No data lost at all, because you just told it to take 8 bytes rather than 1. 255 in 8 bytes is still 255, and the function was already accepting a long anyway. The only alternative would be to introduce some sort of generics into C that allowed you to make one function that worked with short shorts and another that worked with longs, but that's fucking bloat and would help you nothing at all.

Since you can not divide or multiply (possible float or double casting), or add or subtract (possible int promotion) a string, all you could do would be concatenating them, which would be implemented via functions that return appropriately sized strings and raise warnings if you try to assign them to smaller string types. No harmful type coercion at all, miles simpler than any other integer operations system ever devised in a mainstream language.

> Its like javascript fuckery where ints get turned into strings based on the operation.

For what it's for, JS more often than not does what you want it to do, considering it is meant for text processing. I agree JS's type system is retarded, but C can never be as retarded as JS since it is not dynamically typed. Type coercion is really one of your smallest problems there; it's only a problem because the DOM is all retarded and doesn't define a universally enforced type for input values (i.e. numeric fields generally return a Number in modern browsers if the field is supported, but IE returns a string even though it claims to support numeric fields, so you have to cast it to Number anyway, and depending on locales you may not parse it correctly if they use commas instead of dots for decimal positions), so you can't really know which types you are working with.

>>896205

As long as you know the size of the string you are allocating to, which may be smaller or bigger than the string you are getting (in which case, you have to check for being inside your target string bounds and for your source string's NUL terminator). If it is bigger, your source string might get cut, which is undesirable, so you would want to have your target string be at least as big as your source string, which implies malloc and also seeking the last position of your source string. Or you could use buffers and do some retarded realloc-ing or an array of char * to grow your target string as you read your source string, but that's all sorts of retarded and would be more wasteful than if you just knew their lengths.
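
For instance, a sketch of the "just know both lengths" concatenation being alluded to (concat() is a made-up helper, error handling kept minimal):

#include <stdlib.h>
#include <string.h>

char *concat(const char *a, size_t alen, const char *b, size_t blen, size_t *outlen)
{
    char *r = malloc(alen + blen);  /* exact size known up front, no seeking for a NUL */
    if (r) {
        memcpy(r, a, alen);
        memcpy(r + alen, b, blen);
        *outlen = alen + blen;
    }
    return r;
}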


 No.896219>>896222

File (hide): 1c385a2a74721cc⋯.jpg (179.9 KB, 494x1038, 247:519, dictionary idiot splatoon.jpg) (h) (u)

>>895785

Congratulations, you blew the stack.


 No.896222>>896230 >>896236

>>896219

Shouldn't do that since it should trigger tail call optimization, but using function pointers is wasteful if we are talking about autistically optimizing shit.


 No.896227

>>896208

Compilers unroll loops for the exact opposite reason: it increases I-cache hits.


 No.896230>>896238

>>896222

Neat, but I'm not finding any way to guarantee it and with small stacks it'll blow quick. I use function pointers all the time though


 No.896236>>896238

>>896222

>autistically optimizing shit

lmfao how about writing a function that actually works instead.


 No.896238>>896245

>>896230

>I use function pointers all the time though

It's not a bad thing and performance cost is negligible, but we are talking about people feeling the need to install Gentoo to cut 2 MB of total RAM usage here. If it's gonna save you several lines of code, or worse, a gigantic switch of death, by all means do it.

>>896236

Tell the tripfag. I personally wouldn't bother writing a single line of C (or C with syntax errors :^) ) due to hipsterism.


 No.896245>>896348

>>896238

You wouldn't write it because you couldn't. You're just another retarded larper.


 No.896253>>896520 >>896524 >>896535

We won't ever be truly free of this moronic C string business until people stop using stdio.

>>895742

The read system call stops reading when input data is exhausted. It returns exactly how many bytes it has read, so at the end of the process you know exactly how long the data is.
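
A minimal sketch of that, assuming plain Linux/POSIX (slurp() is a made-up name, error handling trimmed):

#include <unistd.h>

ssize_t slurp(int fd, char *buf, size_t cap)
{
    size_t used = 0;
    while (used < cap) {
        ssize_t n = read(fd, buf + used, cap - used);
        if (n <= 0)          /* 0 = end of input, -1 = error */
            break;
        used += (size_t)n;   /* the length is known at every step, no NUL needed */
    }
    return (ssize_t)used;
}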

NUL-terminated strings are, like errno, a C concept. Linux gives approximately zero fucks about your NUL, just like there is no actual errno global variable (the system call interface simply returns the error code like a sane implementation would). This is stdlib garbage.


 No.896308>>896315 >>896352

>>895942

Is there a Lisp out there where the list data structure isn't actually a linked list? Can it be a dynamic array, for example?


 No.896315>>896522 >>896656

>>896308

>list data structure

>but not an actual list

durr


 No.896348>>896513

>>896245

>>895778

>>895816

Feel like a hero yet, Reddit?


 No.896352

>>896308

Clojure kind of applies: they're not the default, but they exist, and you can create a fork where they are the default.

This "list" should be immutable, so it's relatively tough shit to implement if you want something better than the linked list, but Clojure already has a couple of persistent data structures; either use them, or study the algorithms and re-implement them in your language of choice.


 No.896509

>>895743

>200MHz CPU is blazing fast

but anon, a 200MHz CPU is blazing fast. You only need more if you plan to run the latest Macroshit Wangblows craperating cistern wid' Enpantsed Jewgle Crowd Pthtoorage Gapeability.


 No.896513


 No.896516>>896519

>>895736

t. FAT32


 No.896519

>>896516

thats the joke


 No.896520

>>896253

are there any viable alternatives to shill?

getting rid of stdio sounds like a good idea.


 No.896522>>896526

File (hide): 403e9fdb8613429⋯.png (435.51 KB, 2560x2489, 2560:2489, pepe HD.png) (h) (u)

>>896315

>not knowing the difference between an interface and an implementation


 No.896524>>896535

>>896253

>until people stop using stdio

aren't plain "string" literals in C also generating an extra zero byte at the end?

it seems people also need to eschew "string" literals in C and use something like a macro which expands to a constant array of bytes or something else.

I mean it's doable but not really ergonomic in plain C.

it's easier to get rid of this shit in C++ perhaps.


 No.896526>>896527

>>896522

>hurr durr just make the memory a linked list

literal brainlet


 No.896527

>>896526

can you even read plain English?


 No.896535>>896538 >>896546 >>896655

>>896253

>We won't ever be truly free of this moronic C string business until people stop using stdio.

Except null-terminated strings are embedded right in the language, with string literals being null-terminated.

>>896524

>use something like a macro which expands to a constant array of bytes or something else

Alas, I am afraid such a thing is not possible with the C preprocessor.

However, for all the LARPers out there, keep in mind GCC and Clang/LLVM are Free as In Freedom™ software, that anyone can modify. You're all such C expert senior engineers, writing a GNU extension for non-asciiz strings should be TRIVIAL.


 No.896538>>896542

File (hide): a9d7300dcd23150⋯.jpg (56.23 KB, 900x535, 180:107, DYWMuUGVoAEQXsv.jpg) (h) (u)

>>896535

I will stick with LLVM. A compiler that respects my freedom, unlike the restrictive GCC.


 No.896542>>896543 >>896545

>>896538

Cuck, with a cuck license. How's your wife's son?


 No.896543>>896592

>>896542

What's cucked about it? I can go sell it to random fucks and no one can stop me. The developers are cucked. The users are the least cucked possible. This is in comparison to your GPL compiler. Under the GPL the developers are not as cucked as BSD developers, but they are still cucks compared to proprietary. The GPL users are more cucked than BSD users because they are bound by more terms.


 No.896545

>>896542

>Open Source

lol the cucks are fighting again


 No.896546>>896547 >>896634

>>896535

>Except null-terminated strings are embedded right in the language

No. There is no string datatype at all in C. It's a convention to use NUL-terminated arrays of char, and that's what standard library functions expect. You're free to implement your own functions and libraries which handle whatever string datatype equivalent you come up with in whatever way you like.


 No.896547>>896548 >>896554 >>896709

>>896546

>It's a convention to use NUL-terminated arrays of char

So how do you explain the fact that this is null by the compiler?

const char* foo = "bar";


 No.896548

>>896547

*null terminated


 No.896554>>896555 >>896707

>>896547

String literals are syntactic sugar.


 No.896555

>>896554

So no language has anything. Got it. Its all just syntactic sugar on top of machinecode.


 No.896567

>A character string literal has static storage duration and type "array of char", and is initialized with the given characters. A wide string literal has static storage duration and type "array of wchar_t", and is initialized with the wide characters corresponding to the given multibyte characters. Character string literals that are adjacent tokens are concatenated into a single character string literal. A null character is then appended.

>A string is a contiguous sequence of characters terminated by and including the first null character. It is represented by a pointer to its initial (lowest addressed) character and its length is the number of characters preceding the null character.

>A character string literal need not be a string (...), because a null character may be embedded in it by a \0 escape sequence.

https://port70.net/~nsz/c/c89/c89-draft.html


 No.896585

>null

>NULL

This is how fucktarded this guy actually is.


 No.896592>>896593 >>896594 >>896725

>>896543

>I can go sell it to random fucks and no one can stop me.

Can you please remind me of which part of the GNU Public License (version 2 or 3) forbids the user from selling a copy?


 No.896593

>>896592

>the user from selling a copy?

Look, we both know that's bullshit. It's theoretically possible to have someone pay for GPL code, but when they get the source you are gonna have a real hard time charging for it a second time.


 No.896594>>896613

>>896592

>one person buys

>can legally upload it to every other person

lmao


 No.896613

>>896594

>"NOT EVEN MERCHANTABILITY"


 No.896634>>896638

>>896546

>that's what standard library functions expect. You're free to implement your own functions and libraries which handle whatever string datatype equivalent you come up with in whatever way you like.

THIS is what I think we need. Quite frankly, the C stdlib is pure garbage. I'm working on and off on a project of this type, a custom freestanding C library based on Linux. Once it has a reasonable set of features to make it useful, I will publish it under MIT.


 No.896638>>896645 >>896649

>>896634

>cuck license


 No.896645>>896649

>>896638

Yes. I don't particularly care about improvements being sent back to me. I just want to stop using libc and start using Linux directly because frankly the Linux interfaces are a LOT better. If other people think my code is useful, I want them to please use it.


 No.896649>>896706

>>896638

>>896645

In fact, I'm personally rather wary of "improvements" that get sent since they can be a curse in disguise. Have you SEEN glibc source code? It's a mess. Even something as simple as an strlen implementation is huge and needs truckloads of comments to explain what the fuck is happening, all so it can scan lots of data at once to improve performance while looking for the NUL.

I want my code to be simple so that I, and maybe even other people, can immediately understand it when reading it. The license allows you to do whatever you want, so you can just supply your own highly optimized functions if it matters that much.


 No.896655

>>896535

>Except null-terminated strings are embedded right in the language, with string literals being null-terminated.

Just because string literals are NUL terminated doesn't mean that you can't keep count yourself. As far as I/O goes, only stdio requires NUL terminated strings; the kernel interfaces do not.


 No.896656

>>896315

What stops car from returning

array[0]
and cdr from returning the slice
array[1..-1]
?


 No.896658>>896674 >>897048

>>896124

>You could mmap that file and seek it on demand

How to recover from SIGSEGV properly?


 No.896674>>896770

>>896658

You just define a signal handler and keep on reading


 No.896706>>896711 >>896748 >>896773

>>896649

>Even something as simple as an strlen implementation is huge

size_t strlen(char *s)
{
    int i = 0;
    while (s[i++]);
    return (size_t)(i - 1);
}

>huge


 No.896707>>896773 >>897174

>>896554

It's not like "bar" is syntactic sugar for {'b', 'a', 'r', '\0'}, because

char *s = {'b', 'a', 'r', '\0'}
doesn't work. Then what is "bar" exactly syntactic sugar for?


 No.896709

>>896547

>const char* foo = "bar";

>char* foo

In the following line

int* foo, bar, baz;
the retarded style you gave an example of suggests that all three declared variables are pointers to int, which is not the case. That's why the asterisk is supposed to stick to the identifier and not to the type, like this:
int *foo, bar, baz;
so it's obvious what is what.


 No.896711>>896716 >>896773

>>896706

What is the shortest possible strlen implementation that werks? Anything shorter than the one below (52 bytes)?

int strlen(char*s){int i=-1;while(s[++i]);return i;}


 No.896716>>896764 >>896773

>>896711

Here are three alternatives, but all still same length:

int strlen(char*s){int i=0;while(*s++)++i;return i;}
int strlen(char*s){int i=-1;for(;s[++i];);return i;}
int strlen(char*s){int i=0;for(;*s++;++i);return i;}
Looks like 52 byte strlen might be tough to beat.


 No.896725

>>896592

Have fun selling binaries right next to the readable source code.


 No.896748

>>896706

It's huge in glibc, yes. 78 lines not including the license comment at the top.


 No.896764

>>896716

Here's my attempt.

int strlen(char*s){return*s?strlen(s+1)+1:0;}
It clocks in at 45 characters. It's kind of cool how I was able to remove all the white space from the function's body.


 No.896770

>>896674

Nope. Returning from the signal handler means returning to the point in the code that triggered the SEGV.
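
The usual workaround (whether it counts as recovering "properly" is debatable, and it is shaky ground standards-wise) is to not return from the handler at all and siglongjmp out of it instead; a sketch:

#include <setjmp.h>
#include <signal.h>
#include <stdio.h>

static sigjmp_buf recover;

static void on_segv(int sig)
{
    (void)sig;
    siglongjmp(recover, 1);   /* jump back instead of returning to the faulting instruction */
}

int main(void)
{
    signal(SIGSEGV, on_segv);
    if (sigsetjmp(recover, 1) == 0) {
        volatile char *p = NULL;
        *p = 'x';             /* fault on purpose */
    } else {
        puts("survived the SIGSEGV");
    }
    return 0;
}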


 No.896773>>897162

>>896706

>>896711

>>896716

Mine looks similar to that, but it uses pointer difference and checks for NULL.

Now check out the glibc strlen function. It's fucking huge.

>>896707

It's sugar for a const char array.


 No.896779>>896781 >>896830 >>897048

>Have advanced string object with separate size variable

>Check size and be ready to add data at the end

>The actual string is much shorter

>Either segfault or security fun


 No.896781>>896829

>>896779

When would that happen?

Is it a bigger danger than, for example, unterminated classical C strings?

Many languages before and after C have had counted strings.


 No.896829>>896859

>>896781

>When would that happen?

Even less chance than a classic buffer overflow.

Someone has to write exceptionally stupid code to produce an invalid string object (that is, one with an invalid length).


 No.896830

>>896779

>>The actual string is much shorter

This won't happen unless someone deliberately tries to change the length value to some nonsense.

It's no different from producing any other kind of array with an invalid length.

The string in this case is simply a special kind of array.

Do you mean people should not use arrays either, and should use some sentinel to terminate them? And what if there is no value that can serve as a terminator, for example if the array stores bytes and every byte value can legitimately occur in the content?

Do you understand you just added more bullshit to this already drowning in bullshit thread?


 No.896859>>896889 >>896892 >>896896

>>896829

>Someone has to write exceptionally stupid code to produce invalid string object (that is, which has invalid length)

Given that C does not allow for private structure members, it is indeed perfectly possible to indicate a wrong length. A classic case of off-by-one when implementing some kind of concatenation function would do that.


 No.896889

>>896859

The solution is to not touch the struct directly. Just because you can doesn't mean you have to.


 No.896892>>896952 >>896988

>>896859

>Given that C does not allow for private structure members, it is indeed perfectly possible to indicate a wrong length.

Come on, how hard is it not to screw up a size variable? You almost never touch them anyway. You get them as parameters. If your programmers can't stop themselves from accidentally overwriting a pointer + size pair, they should probably be writing Java. Lengths are explicit and really hard to screw up, completely unlike the "hurr I forgot a NUL terminator" bugs.

Strictly speaking, you could easily provide an opaque structure (incomplete, forward-declared type accessible only through pointers) with accessor functions.


/* string.h */

struct string;

size_t string_length(struct string *);
char *string_data(struct string *);

/* string.c */

struct string { size_t s; char *p; };

size_t string_length(struct string *s) { return s->s; }
char *string_data(struct string *s) { return s->p; }

But this would forbid stack allocation of the string structure and force people to use functions to access member variables when it's just not necessary. In fact, this would be nearly indistinguishable from a generic dynamic array library. Indeed, most C stdlib str* functions are pretty much equivalent to their mem* counterparts, save for the NUL terminator handling.


 No.896896>>896898 >>896988

>>896859

>A classic case of off-by-one when implementing some kind of concatenation function would do that.

Yes, but that happens in virtually every other language as well. If you calculated an index or length incorrectly, it's a logical/mathematical error. If you failed to NUL-terminate the C string, it's a simple human forgetfulness error.

Using explicit lengths gets rid of the latter class of errors, while only Haskell might be truly immune to the former.


 No.896898>>896907 >>896911

>>896896

>If you failed to NUL-terminate the C string, it's a simple human forgetfulness error

you know what?

it's also possible to accidentally include an extra NUL where it's not allowed to be, and that is harder to detect (no out-of-bounds access, no segfault) and can have even worse implications; actually the article in the OP is about exactly this.


 No.896907>>896908 >>896910 >>896911

>>896898

wow anon what about the implications of accidentally including NULL in the middle of a linked list. oh shit someone call blackhat.


 No.896908>>896909

>>896907

zero byte is a valid character in many encodings.

now go learn something about computing and programming, ffs.


 No.896909>>896925

>>896908

Yeah which encodings.


 No.896910>>896916

>>896907

Retard. A zero in a cadr doesn't terminate the list.


 No.896911>>896916 >>896922

>>896898

True. That's why I don't like encoding data in-band. Arrays are simple, they are just a memory segment, a pointer to the start and the length, and they're a general data structure that can hold anything. C strings and other similar stuff constrain this simple concept. Suddenly, it can't hold arbitrary data anymore; it can hold anything except a zero byte, because the zero byte is now a special value that encodes the length of the array within the array itself, and if you accidentally put a zero value anywhere you end up cutting the string into pieces.

>>896907

NULL is a pointer. The data contained by the linked list is completely oblivious to the NULL pointer handling.


 No.896916>>897028

>>896910

lol

>>896911

hmm if only strings were more like arrays


 No.896921>>896922

>>895811

>but it's also clearer

So clear that you made an error while writing it.


 No.896922>>896924 >>897028 >>897038

>>896911

>C strings and other similar stuff constrain this simple concept

Like?

>Suddenly, it can't hold arbitrary data anymore

Well no shit, retard, it's a fucking string. Strings are supposed to encode human-readable text, not arbitrary data, unless that data is itself encoded in a human-readable (or at least printable, like base64) format. Want arrays of arbitrary data? Use arrays instead of being a retard. I bet you're the kind of person who complains a stack interface doesn't have a function to access an arbitrary element of it.

>>896921

Autist


 No.896924

>>896922

>Autist

Ouch.


 No.896925>>896930 >>897185

>>896909

ASCII and UTF-8, for example.

C is unusual in having trouble with null bytes, and some parts of C and Unix do support them. Try this to see Python, C/Unix and UTF-8 work with a null byte:

python3 -c 'print("foo\0bar")' | cat -v


 No.896930

>>896925

thank you for freeing me from the burden of explaining this to him. really appreciate that.


 No.896952>>897009 >>897038

>>896892

>Everyone is playing nice

>Nobody will try to put invalid data in something you will have to rely on

I can see CIA rubbing its hands


 No.896988>>896996 >>896998 >>897043

>>896892

>how hard is it not to screw up a size variable?

A single one? Not hard. However, consider the following game of statistics: say there is a 95% chance that you get a line of code right without needing to touch it. If you implement a complete, modern string library, the chance that you will not commit ANY error in 1,000 lines is 0.95^1000, which is to say, essentially 0. On average, it means 50 bugs per 1,000 lines.

That's how easy it is to screw up. Even when writing trivial shit.

>>896896

it happens in every other language, yes, but those languages have bounds checking and hide the internals of the string implementation behind a structure.

Also, not NULL-terminating a string can easily be a logical or mathematical error as well (typical example of appending a string to another, but copying 1 byte less than its length, thus removing the null character).

If you want safe strings, hide them behind an interface that does bounds checking, but that's NOT going to be fast.


 No.896996>>897010

>>896988

>That's how easy it is to screw up.

those statistics would have to include pajeet and the soyfags.


 No.896998>>897010 >>897048

>>896988

If you've got one string library that'll be used in thousands of programs it's reasonable to invest ten times the effort to make sure that it works the way it's supposed to. Making mistakes is easy, but when you use a library instead of C-style strings there's a lot less code using unsafe primitives when you add it all up.

Do you have some sort of benchmark or other source to support the claim that safe strings aren't fast? I'd expect a slight performance loss in most (but not all) cases, but usually nothing significant. Safe strings aren't something modern that people started using only once computers became faster. BCPL had them.


 No.897009

>>896952

we are talking about in-memory data.

when you get the string from somewhere, you of course allocate the right amount of memory and the right length is stored. do you worry about someone overwriting the memory of your process? then you have bigger threats to worry about, and you probably want to look for techniques for radiation-safe software development, that is, software that is resilient to radiation-induced bit flips.

otherwise, it's obvious bullshit.


 No.897010>>897013 >>897015

>>896996

Admittedly, it's a number I pulled out of my ass, but if you've ever done any development beyond a fizzbuzz, you know that getting a program right from the start is impossible. Even if the actual rate was 1 error per 200 lines (99.5% correct lines), it still accumulates pretty fast.

>>896998

>Do you have some sort of benchmark or other source to support the claim that safe strings aren't fast?

Array bounds checking is slower than not checking, although admittedly usually not by much, as modern processors include branch prediction. And the larger the string, the lower the impact of branch prediction misses.


 No.897013

>>897010

>Array bound checkings is slower than not checking

For the most critical parts it can be optimized away, but that's not a reason to eschew checking everywhere; the golden rule of optimization is to optimize at the bottleneck.

also, modern compilers can prove correctness of access in some places and optimize it away. I remember that Java's JIT does that in many cases.


 No.897015>>897018 >>897019

>>897010

lines of code don't mean anything.

you can split and join lines arbitrarily without changing the meaning of code at all.


 No.897018

>>897015

And you could write a book with one fucking sentence. I bet that's meaningless too.


 No.897019>>897022

>>897015

Either you didn't read or don't know what you're talking about.


 No.897022>>897034

>>897019

>1 error per 200 lines

One really long line of code = no errors


 No.897028>>897048

>>896922

>Like?

Other sentinel-using data structures.

>>896916

They are. Just keep their length around in a variable.


 No.897034

>>897022

I think you just hacked programming.


 No.897038>>897040

>>896922

>Well no shit, retard, it's a fucking string. Strings are supposed to encode human-readable text

Kill yourself, brainlet. C strings have absolutely no notion of an encoding, and even if they had, the encodings themselves usually attribute some meaning to code point zero. In ASCII it is a control code which functions as a no-op. It is absolutely valid for an ASCII- or UTF-8-encoded string to have a null byte in it. The fact that including a 0 byte in a string will make software written in C truncate it is a bug.

>>896952

Most exploits happen in the input handling layer, actually. Unless the CIA can subvert the kernel's most fundamental I/O syscalls and make them return bogus values, I'd say you will always know the exact size of your data.


 No.897040>>897046 >>897056

>>897038

>Most exploits happen in the input handling layer

this

>C strings have absolutely no notion of an encoding

Not OP, but I like having a generic Unicode type that is encoding-independent. Haskell does this, for example: you have a type called Text with its own operations, and you can then encode/decode it to whatever you want, while the abstraction is independent of the encoding.


 No.897043

>>896988

>those languages have bounds checking and hide the internals of the string implementation behind a structure.

These internals are essentially just a length and a pointer. Out of bounds access is still a bug, even with bounds-checking; the only difference is the runtime won't actually allow the access to take place, raising an exception instead.

>Also, not NULL-terminating a string can easily be a logical or mathematical error as well (typical example of appending a string to another, but copying 1 byte less than its length, thus removing the null character).

Use explicit lengths and you simply don't have to think about this at all.

>If you want safe strings, hide them behind an interface that does bounds checking, but that's NOT going to be fast.

Obviously, checking your bounds on every single access is going to be safer than not checking. Nobody even said anything about access here. We're talking about the different approaches to encoding the length of the string.


 No.897046>>897048 >>897142

>>897040

>Haskell does this for example where you have a type called Text and you have operations for it and can then encode / decode it to whatever you want, while the abstraction is independent of it.

This is how I would model it as well.

The fact is the folks who made C and Unix made A LOT of assumptions about these things. C strings are supposed to be "text" but are in fact just 0-terminated byte arrays containing arbitrary non-null data. There's no actual string type; C string literals are just array literals with an implicit zero at the end, and to complement that constrained array type C has a whole roster of str* functions that take zero-termination into account and are otherwise equivalent to their mem* counterparts. It's clearly a specialized array. This is why a dynamic array is sufficient for 100% of C string operations.

If I were to write a string library, I'd do something like:

struct memory {
    size_t size;
    char *pointer;
};

enum encoding {
    UTF8 = 0,
    UTF16,
    UTF32,
    ASCII,
    /* ... */
};

struct text {
    struct memory memory;
    enum encoding encoding;
};

The fact is text is just encoded memory. C's entire notion of 0-terminators makes it so only a (large) subset of encodings is supported, and requires memory handling functions to be duplicated as 0-terminator-handling versions. The only advantage of this design is the sheer minimalism of the data structure itself.


 No.897048>>897049 >>897062

>>896658

>How to recover from SIGSEGV properly?

The computer does it properly, so it's the OS's fault. C and UNIX can't do it because they suck. Look up what segmentation fault means in Multics.

>>896779

>Check size and be ready to add data at the end

>The actual string is much shorter

You're talking about data corruption. If you mixed those strings with C strings, you might get a buffer overflow that changes the length, but C buffer overflows could corrupt anything.

>>896998

>I'd expect slight performance loss in most (but not all) cases, but usually nothing significant.

I'd expect a huge performance increase in all cases except one: when you are parsing a string one character at a time, parallelism won't help you in any way, and you don't care how many characters are remaining. Everything else is much faster when you know the length.

>Safe strings aren't something modern they started using when computers became faster. BCPL had them.

FORTRAN had Hollerith strings in source code because knowing the length ahead of time is much faster. Later on, readability and not having to manually count the length of every string became more important than raw speed. Strings in most languages were safer and faster than C strings. I'd say that the acceptance of null-terminated strings is because people don't care as much about efficiency as they used to. Lisp machines were about making dynamic languages faster and simpler by checking type tags in hardware, but people today don't care as much if they're fast.

The source of UNIX stupidity, B, used EOT to terminate a string, so the use of null is totally arbitrary. If you couldn't put ASCII character 04 in a string, UNIX weenies would say it's "stupid" to want to use that character in a string.

>>897028

>Just keep their length around in a variable.

That sounds like a good idea, but you should drop the null or you'll end up with multiple "length" variables and they won't all be equal because someone will put a null character somewhere.

>>897046

>C has a whole roster of str* functions that take zero-termination into account and are otherwise equivalent to their mem* counterparts.

Except there's no way to give many of them a string length at all.

BTW, I had to replace the NUL characters (etc.) in the above
line with caret-atsigns because when I tried to send the
message the first time the line did not appear since some
Berserkely C blabberer with less than two fingers of
forehead decided to write a mailer that reads messages with
gets or some equally braindead substitute for an input
reader and just drop the non-printable characters (rather
than bounce and complain or something semi-reasonable).


 No.897049

>>897048

ffs, again: if you are going to block quote things, include text describing who/what it's from. Quotes with no context are no authority.


 No.897056>>897076

Here's a fun little test:

$ python3 -m timeit -s 'a = "c" * 1_000_000; b = "c" * 1_000_001' 'a == b'
10000000 loops, best of 3: 0.0265 usec per loop

Those are one-megabyte strings. Try that in C.

>>897040

That's also the route Python went with version 3. Strings in Python 2 were just sequences of bytes, but it now has a bytes data type and a string (unicode) data type.

The underlying representation of the text is abstracted away. You can use ord and chr to go to and from unicode code points, if you want, and you can use .encode() and .decode() to go to and from a bytes representation, but unless you explicitly ask for it, you're never confronted with the gory details. It keeps you sane.


 No.897062>>897071

>>897048

>If you mixed those strings with C strings,

>>you should drop the null or you'll end up with multiple "length" variables and they won't all be equal because someone will put a null character somewhere.

One shouldn't mix these different types. It's clear that general arrays don't have the same constraints as 0-terminated char arrays. Custom string types have even more elaborate semantics; they could have separate length and capacity field to track the length of the actual text and of the allocated memory.

In practice, most string libraries out there use that length+capacity design, and allocate an extra byte for a "hidden" null byte at the end of the memory they maintain and also take care to correctly set the null byte after every operation. They do this so you can pass their pointers to C stdlib functions. Personally, I don't care very much about this because I advocate talking to the kernel directly instead of using the extremely limited C stdlib, but I can understand why that'd make their string library better.
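
A bare-bones sketch of that length+capacity+hidden-NUL design (the struct and function names here are invented, not taken from any particular library):

#include <stdlib.h>
#include <string.h>

struct str {
    size_t len;   /* bytes of actual text */
    size_t cap;   /* bytes usable in data */
    char  *data;  /* cap + 1 bytes allocated; data[len] == '\0' at all times */
};

int str_init(struct str *s, const char *src, size_t len)
{
    s->data = malloc(len + 1);   /* the +1 is the hidden NUL */
    if (!s->data)
        return -1;
    memcpy(s->data, src, len);
    s->data[len] = '\0';         /* so C stdlib functions still accept the pointer */
    s->len = len;
    s->cap = len;
    return 0;
}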

>The computer does it properly, so it's the OS's fault. C and UNIX can't do it because they suck.

Yeah, signals in Unix were pretty much a huge mistake. Still, I'd love to see a way to recover from SIGSEGV reliably, at least in Linux. Imagine I'm writing a JIT compiler by mmaping executable pages and the generated code causes a segmentation fault or even illegal instruction errors; I'd like to handle those errors.

I ask this question every single time SIGSEGV is mentioned and to this day nobody answered...

>Except there's no way to give many of them a string length at all.

Yeah. I think it's hilarious how they had to make things like strn* versions of functions so that it would be safer to use those functions. It's just backwards. The mem* functions are the right thing.


 No.897071>>897072 >>897076

>>897062

I am not familiar with the mem* functions. Which functions do you mean?


 No.897072>>897076

>>897071

oh do you mean like memcpy?


 No.897076>>897091

>>897056

God I love Python3, and I used to think Python was PHP tier in the Python2 days. The new version really cleaned up the language.

The new string type is amazing. It uses unicode so it does the right thing by default, and there's even unicode metadata integration. Best of all is how they don't treat strings as just arrays of bytes/code points anymore.

'ほげほげ'[::-1]
# => 'げほげほ'

'čšž'[::-1]
# => 'žšč'

The old string type simply became the bytes type, which is actually appropriate. Lots of people just did I/O and used whatever came in or out as opaque data, and the bytes data type is absolutely appropriate for this use.

>>897071

>>897072

Yes.


 No.897091>>897104 >>897108


 No.897104>>897110


 No.897108>>897163

>>897091

Yeah memcpy has that stupid limitation because muh efficiency. Always use memmove whenever possible.


 No.897110>>897131

>>897104

I thought you asked what memcpy did lmao


 No.897131

>>897110

the literacy of this one wew


 No.897142>>897144

>>897046

>The only advantage of this design is the sheer minimalism of the data structure itself.

You should now be aware that the entire edifice of Unix was built on this.


 No.897144

>>897142

He is, and this is a bad thing. Lisp machines are also shit tho.


 No.897162>>897167

>>896773

>It's sugar for a const char array.

If ptr[n] is sugar for *(ptr+n) and ptr->m is sugar for (*ptr).m then "foo" is sugar for... what exactly?


 No.897163

>>897108

>muh efficiency

Always use asm (specifically MOV) whenever possible.


 No.897164>>897175 >>897548

Speaking of C, why do bit fields "look good on paper" but turn out so bad in practice? The implementation-dependent issues related to endianness/bit order, alignment, padding etc. make them basically a non-contender compared to just using standard datatypes and addressing specific bits with bitmasks etc.
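
For reference, the bitmask route sketched out (the 4-bit/12-bit layout here is made up purely for illustration):

#include <stdint.h>

#define VERSION_SHIFT 12
#define VERSION_MASK  0xF000u
#define LENGTH_MASK   0x0FFFu

static inline uint16_t pack(unsigned version, unsigned length)
{
    return (uint16_t)(((version << VERSION_SHIFT) & VERSION_MASK)
                      | (length & LENGTH_MASK));
}

static inline unsigned get_version(uint16_t w) { return (w & VERSION_MASK) >> VERSION_SHIFT; }
static inline unsigned get_length(uint16_t w)  { return  w & LENGTH_MASK; }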


 No.897167>>897168 >>897175

>>897162

{'f', 'o', 'o', '\0'}, more or less.


 No.897168>>897174 >>897175

>>897167

(it's not equivalent, so it's not quite syntactic sugar, but it's close enough to serve the broader point that C strings are a convention supported by the syntax)


 No.897174>>897176

>>897168

>it's not equivalent

Which was already pointed out here >>896707

Using a string literal such as "foo" puts a const array of char in the heap but itself stands for a pointer to its first element (and thus can be assigned to a pointer variable), while {'f', 'o', 'o', '\0'} represents an actual array and can basically only be used to initialize an array variable on the stack (or a struct).


 No.897175>>897184 >>897196

>>897164

You basically answered your own question.

>>897167

>>897168

It's more like

(const char[]) {'f', 'o', 'o', '\0'};

https://ideone.com/CbWfE2


 No.897176>>897180

>>897174

>Using a string literal such as "foo" puts a const array of char in the heap

>heap

It's allocated in a read-only ELF section for constants.


 No.897180>>897183

>>897176

>read-only ELF section for constants.

And where do you think the segments of the ELF file are loaded?


 No.897183

>>897180

In the process's address space.


 No.897184>>897186

File (hide): 873e42801022a6a⋯.jpg (6.12 KB, 259x194, 259:194, hal.jpg) (h) (u)

>>897175

>casting shit to array type

I'm afraid I cannot do that, Dave


 No.897185>>897343

>>896925

>C has trouble

>cat -v doesn't

hmm


 No.897186

>>897184

It works?


 No.897196>>897198 >>897203 >>897209 >>897217

>>897175

#include <stdio.h>

int main(void)
{
    const char *s1 = "foo";
    const char *s2 = (const char[]){'f', 'o', 'o', '\0'};
    const char s3[] = {'f', 'o', 'o', '\0'};

    printf("%p\n", s1); /* heap */
    printf("%p\n", s2); /* stack */
    printf("%p\n", s3); /* stack */

    return 0;
}
The above shows how "foo" and (const char[]){'f', 'o', 'o', '\0'} are still not the same.


 No.897198>>897200

>>897196

That's an implementation detail, not the standard.


 No.897200>>897201

>>897198

But if you want to call one piece of syntax "sugar" for another piece of syntax then you must be able to rely on them both to work exactly the same in all contexts.


 No.897201

>>897200

I don't call it sugar, not OP.


 No.897203>>897204

>>897196

Brainlet detected, are you the same anon who still thinks const char* literals are in the heap?


 No.897204>>897205

>>897203

They literally are tho.


 No.897205>>897206

File (hide): d63cbf2ca4360ab⋯.png (36.61 KB, 645x773, 645:773, 1510923282420.png) (h) (u)

>>897204

They're in .rodata, maybe learn about C before you try to write code in it


 No.897206>>897207

>>897205

And where do you think that is stored lol?


 No.897207>>897211

>>897206

In the address space, usually contiguous with the other data segments. Do you even know what a heap is lol?


 No.897209>>897218

>>897196

>heap

>stack

Kill yourself, brainlet.


 No.897211

>>897207

Not only that, this kind of process image data is usually allocated at one end of the address space, while the kernel is at the other end, and both the processor stack and the process break grow in opposite directions towards the "middle" and each other.

Memory map lets you assign any part of the address space, though. It's no longer neatly sequential like process break and stack. Virtually all memory allocation libraries use mmap.

Nowhere in this picture does a "heap" exist.


 No.897214>>897216


#include <stdio.h>
#include <string.h>
#include <stdlib.h>

const int x = 0xf00;

int main(void) {
    const char *s1 = "foo",
               *s2 = (const char[]){'f', 'o', 'o', '\0'},
                s3[] = {'f', 'o', 'o', '\0'};

    char *s4 = malloc(sizeof(s3));
    memcpy(s4, s3, sizeof(s3));

    printf("%p\n", &x); /* rodata */
    printf("%p\n", s1); /* rodata */
    printf("%p\n", s2); /* stack */
    printf("%p\n", s3); /* stack */
    printf("%p\n", s4); /* heap */
}

Roast me.


 No.897216>>897217 >>897238

>>897214

Yes, and? They are all 0-terminated arrays of char. Where they are allocated does not matter.


 No.897217

>>897216

I was just illustrating the problem with >>897196, not making an argument, although I find NUL-termination distasteful.


 No.897218>>897221 >>897227 >>897228 >>897292

File (hide): b32f7cc6cea7d77⋯.png (147.98 KB, 1072x801, 1072:801, contents.png) (h) (u)

>>897209

Books should off themselves too?


 No.897221

>>897218

Maybe you should read that book to find out what those words mean :^)


 No.897227>>897233 >>897265 >>897269 >>897271

>>897218

Yes.

The "stack" is just a pointer maintained by the processor in a register. Managing stack memory consists of incrementing or decrementing this pointer. That's it. The memory lies on the address space. It's fast, but ephemeral due to the nature of functions. You can execute multiple threads/functions/coroutines all with separate stacks on one processor; simply adjust the pointer. You can have split stacks by allocating more memory (from the "heap") and adjusting the stack pointer to point there. It's not just "the stack", it can be much more depending on the language.

A "heap" is essentially a buzzword for dynamic memory allocation. As you demonstrated yourself with your confusion, the definition is so vague it engulfs all non-stack memory, even the kernel and process images mapped onto the ends of the address space. People actually believe that all non-local variables are allocated on some "heap". The world doesn't actually work like that. There is no "heap". The "heap" is some kind of elaborate lie told in the name of abstraction. I don't know who invented this garbage but he should be shot.

The real, simple truth is you can allocate memory by extending the process image break (just a pointer, similar to stack) or by mapping new sections of the address space. Most libraries use mmap, so in practice the "heap" is just a big block of memory malloc asked Linux to map onto the process's address space. All sorts of complicated setups can be created, though. For example, you can create an unreadable, unwritable, execute-only memory region for JIT compiled code.
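
A sketch of that mmap route, assuming Linux (grab() is a made-up wrapper, error handling minimal):

#include <stddef.h>
#include <sys/mman.h>

void *grab(size_t size)
{
    /* ask the kernel to map fresh anonymous pages into the address space */
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* the JIT case mentioned above is the same call plus mprotect() to flip
   the pages to PROT_READ | PROT_EXEC once the code has been written */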

You're not alone. This shitty buzzword is firmly entrenched in the minds of most programmers, especially those who use virtualized languages. They use it to refer to this mythical heap thing, which is where memory magically comes out of. It's why the people who wrote your book used the word. Nobody really asks why these things are the way they are.


 No.897228>>897233 >>897263 >>897265

>>897218

Another thing in your book that's ridiculous and a major pain in the ass is the whole dichotomy between values and references. People simply don't understand that pointers are themselves values. They think some magic happens when you "pass-by-reference" and it somehow lets you access things outside your scope.

The simple truth is you're always passing around values. There is no way to call functions with anything but values as arguments. Pointers are values that represent the address of your data. The value of the pointer itself is copied.
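
A tiny illustration of the point, for the record:

#include <stdio.h>

static void f(int *p)
{
    *p = 42;    /* visible to the caller: we wrote through the copied pointer */
    p = NULL;   /* invisible to the caller: only our local copy changed */
}

int main(void)
{
    int x = 0;
    int *q = &x;
    f(q);
    printf("%d %p\n", x, (void *)q);   /* prints 42, and q still points at x */
    return 0;
}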


 No.897233>>897264

>>897227

There's no need for this confusing shit, it's pretty much a given that when someone says "heap" they're referring to the memory pool used by malloc.

>>897228

There's no magic, but it's important to distinguish values from addresses. On an architecture with separate data and address lines the references would be passed using entirely different registers.


 No.897238>>897242

>>897216

>const char s[] = {};

Does not look like it's stored as an array of char at all.


 No.897242

>>897238

nvm looks like they are just stored as stack offset instructions.


 No.897263>>897264

>>897228

>pointers are themselves values

Yes, but values of a different kind, memory addresses (along with element length specific to a given type which enables pointer arithmetic, unless they are (void *) in which case it's just a bare contextless address) rather than things like ints, floats etc. I like to think about pointers as meta-variables, i.e. variables that can store/represent different variables (e.g. a pointer to int can be assigned the addresses of different int variables at different times, a pointer to a function with certain parameters and return type can represent different functions at different times, etc.).


 No.897264>>897266

>>897233

It's a "given" yet the guy above called .rodata the "heap", and even when used to mean malloc memory pools it's still wrong given other data structures are used to keep track of memory.

It's hard enough to name things in computer science, last thing we need is awfully vague terminology consuming all meaning.

>>897263

Not really. Specifically, on x86_64, they are pretty much an unsigned long, which is the 64-bit register type. The number represents an address in the 64-bit address space. Look at the assembly: the instructions are the same, the type just changes the offset of the pointer based on its size. It's just a number.


 No.897265>>897286

>>897227

>>897228

So what books do you recommend that teach proper memory management terminology and practices? Preferably something concise and to the point (no pun intended), rather than the usual 800- or 1300-page mammoth doorstop behemoths where anything interesting starts shining through well after page 150 (after all of the contents, detailed contents, forewords, prefaces to the first and various following editions, introductions, acknowledgments, etc. etc.).


 No.897266>>897286

>>897264

>Specifically, on x86_64

Whoa there. That's implementation-specific. The standard does not imply what the internal representation of a pointer even is (hence the %p format specifier and the NULL macro, among other things), so let's not just assume things, m'kay?


 No.897269>>897310

>>897227

>there is no spoon, the cake is a lie, ur heap a shit


 No.897271>>897286

File (hide): 77d46d1beef3bd3⋯.png (90.08 KB, 517x997, 517:997, hurr.png) (h) (u)

>>897227

So all of these search results are basically mostly confused and misguided people talking out of their asses?


 No.897286

>>897265

Sys V ABI documents and the ELF specification. The Linux Programming Interface. Linux system calls (partly POSIX compatible).

>>897266

Standards don't run code, processors do. Odds are you're running either x86_64 or ARM. I'm very much interested in the semantics of these platforms.

>>897271

>stack variables can't be accessed by other functions

>heap variables are global in scope

>stack is static memory allocation

>I memorized that objects allocated with new go on the heap

Yeah these people are pretty confused. At least they stay on the topic of dynamic memory allocation.


 No.897292>>897299

>>897218

>introduction to a proprietary botnet in the last chapter

>in a fucking book

yes.


 No.897299>>897301

File (hide): b83119795997782⋯.png (26.34 KB, 480x262, 240:131, untitled.PNG) (h) (u)

>>897292

>proprietary botnet

The book's title is "Programming for Engineers - A Foundational Approach to Learning C and Matlab", so there are no surprises I guess. It focuses mostly on C though.

Also, it's not even its final chapter.


 No.897301>>897303

>>897299

you didn't reveal the title until this point.


 No.897303>>897306

>>897301

The point being? The fact it has some chapters on Matlab towards the end was irrelevant to the discussion on memory from a C point of view.


 No.897306>>897309

>>897303

Without knowing the title of the book, it's harder to evaluate if it's worth its salt or not. I know we can search by chapter names, etc., but there can be collisions and whatnot


 No.897309

>>897306

It's certainly unusual for a "foundational approach to learning C" type book in that it immediately introduces pointers and under-the-hood memory concerns. The overwhelming majority of texts talk about pointers no earlier than halfway through the book (though that brings about that revealing moment when the reader finally understands how arrays really work and why he didn't need the & operator with the %s specifier in scanf()), and many books on the language don't talk about the stack or memory organization at all, just how to use pointers etc.


 No.897310

>>897269

>1999: "there is no spoon"

>2007: "the cake is a lie"

>2018: "you're \"\"\"heap\"\"\" a shit"

kek

how will we ever recover


 No.897343

>>897185

>>and some parts of C and Unix do support them

putchar('\0') works fine, to name one. It's a problem if you work with C-style strings but not even everything in the standard library works with C-style strings - it's only a convention.


 No.897347>>897352 >>897465

$ cat test.c
#include <stdio.h>
const char s[] = {'h', 'e', 'l', 'l', 'o', '\0'};
int main (void)
{
puts(s);
}
$ gcc test.c; and md5sum a.out
5f6bdc1973f14a557f104df5e44cb259 a.out
$ cat test.c
#include <stdio.h>
const char s[] = "hello";
int main (void)
{
puts(s);
}
$ gcc test.c; and md5sum a.out
5f6bdc1973f14a557f104df5e44cb259 a.out


 No.897352>>897357

File (hide): 1d4f66559f41f20⋯.gif (1.93 MB, 235x240, 47:48, Jeremiah Johnson nod.gif) (h) (u)

>>897347

Nice post, I now realize that the other anon was trying to demonstrate that C char* literals are syntactic sugar.


 No.897357

>>897352

>1.93 MB

>gif

gbtr


 No.897445>>897452

>>895874

>data from 2012

lol


 No.897452

>>897445

>vs no data

lol


 No.897465

>>897347

>md5'ing the output binary

Just because it happens to compile to the exact same binary doesn't mean the two expressions in the source code are the same. It was already proven above that you cannot directly "assign" things like {'h', 'e', 'l', 'l', 'o', '\0'}; to a char pointer (and if you coerce it by casting to array type then it's going to be allocated on the stack like an array rather than elsewhere as "hello" would). So they're NOT exactly the same.


 No.897515

>>895728

malloc technically records the length of your object somewhere in the heap for free.

Everything being aligned to 4/8-byte offsets means there can be up to 3/7 bytes of padding.
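
On glibc you can even peek at what it recorded via malloc_usable_size(), which is a GNU extension, so this sketch is not portable C:

#include <malloc.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *p = malloc(5);
    if (p)
        /* usually prints something larger than 5 because of alignment padding */
        printf("asked for 5, usable: %zu\n", malloc_usable_size(p));
    free(p);
    return 0;
}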


 No.897547>>897551

>393 replies

And so this larper playground of a thread is slowly drawing to a close.


 No.897548>>897555 >>897805

So you people opposing ^@-terminated strings would like some sort of convoluted clusterfuck of a format in its place (which would likely be clunky, messy and mostly nonportable, like the bit fields mentioned by >>897164 are)?


 No.897551

File (hide): 93dd67fb8e31b54⋯.jpg (18.69 KB, 327x300, 109:100, Dagqf4JWAAICMxf.jpg) (h) (u)


 No.897555>>897558

>>897548

It would be enough to have a struct containing a char array and a length and some functions and syntactic sugar so that you don't have to touch the struct directly in 90% of programs. What's so nonportable about that?

Bitfields are badly portable because they have too little abstraction. This is a proposal to add more abstraction.


 No.897558>>897559

>>897555

structs are very implementation-dependent vs a C string, which you can just push over the network with no extra parsing on the other end


 No.897559>>897567 >>897589

>>897558

Nobody is going to push the actual struct over the network, moron.


 No.897567>>897581

>>897559

Which is why it's shit. You can do that with C strings. BTW people actually do that all the time with structs.


 No.897581

>>897567

Chances are you can do that just fine. Length and pointer are 128 bits total and aligned, so there shouldn't be any padding. You can also just send exactly length bytes of the data referred to by the pointer.


 No.897589>>897592 >>897593

File (hide): 13d722e3d7202d0⋯.png (19.33 KB, 588x428, 147:107, tcp_header.png) (h) (u)

>>897559

How do you think network protocol headers are implemented? And you need a guarantee that any implementation of the protocol stores every data unit (be it a 64-bit int or a 1-bit bitfield) at the exact same place, regardless of implementation-specific shit like endianness, alignment, padding etc.


 No.897592

>>897589

What stops you from simply

send(socket, string.pointer, string.length, 0);
like a normal person?


 No.897593>>897595

>>897589

You know these structures aren't dumped as-is onto the network, right? For starters, there are endianness concerns, as two machines might not share the same, and then there's the smaller problem of C compilers padding structures depending on a lot of architecture-dependent factors, making two structures containing the same data on two different machines potentially different.


 No.897595>>897604 >>897706

>>897593

Yeah, we have compiler directives for creating packed structs that can be sent out on the network, but I don't know of any way to deal with endianness that doesn't involve conditional compilation.


 No.897604

>>897595

>I don't know of any way to deal with endianness that doesn't involve conditional compilation

Because I don't think there is any, unless you limit yourself to using strictly 1-byte data units, such as in a text-based protocol.


 No.897671>>897851

>>895727 (OP)

The difference between sentinel-terminated and length-tagged structures in C is negligible for most cases. The real issue with C is memory safety. PHP had some retarded issues because it had functions that treat input as a zero-terminated string and others that treat input as a length-tagged string. But that's because PHP is, and always was, fucking retarded. No other language which claims to be high level is full of basic issues like this.

>be on tor for 10 years

>no ad blocker

>never seen an ad

>just see stuff like this instead:

>If you're using an adblocker, please consider supporting this site via Patreon or PayPal


 No.897706

>>897595

>deal with endianness that doesn't involve conditional compilation

htonl htons / ntohl ntohs
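
e.g., a sketch of serializing a 32-bit length prefix field by field instead of dumping the struct (put_len() is a made-up helper):

#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

size_t put_len(unsigned char *out, uint32_t len)
{
    uint32_t wire = htonl(len);        /* host order -> big-endian, no #ifdef needed */
    memcpy(out, &wire, sizeof wire);   /* no struct dumping, no padding issues */
    return sizeof wire;
}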


 No.897805>>897851

>>897548

They probably also hate jagged arrays and linked lists terminated by NULL pointers, and would demand those structures store their length/number of elements at all times instead. Go figure.


 No.897851>>897957

>>897805

>linked lists terminated by NULL pointers

pointers in linked lists are not in-band with the data; that's a stupidly bad analogy

(someone already explained it ITT by the way, it's sad that it needs to be repeated)

>jagged arrays

who said they need to use sentinel values to encode the length?

>>897671

>The difference between sentinel-terminated and length-tagged structures in C is negligable for most cases

It is not negligible with regard to correctness and code complexity. You seem to miss the bigger picture.


 No.897957

>>897851

The bigger picture is that non-trivial C programs are absolutely full of memory errors that even experts who have been doing it for 30 years have trouble with.



