▶ No.895727>>895728 >>895737 >>895804 >>895846 >>896072 >>897671
https://blog.ircmaxell.com/2015/03/security-issue-combining-bcrypt-with.html
When will this stupidity finally come to an end?
Even in C, it's absolutely possible to use a struct with pointer and length, and add a library with replacements for the functions which worked with zero terminated strings.
Why would anyone still use zero terminated "strings"? They make no fucking sense, almost the worst idea ever.
▶ No.895728>>895732 >>896203 >>897515
>>895727 (OP)
But anon, null terminated strings take up less space and are the same format regardless of CPU architecture.
▶ No.895732>>895734
>>895728
they don't.
3 bytes difference is irrelevant for long strings.
if you want to make short strings small, then there is such a thing as variable length coding for integers, but this would be so little gain, it's not worth it.
CPU architecture is irrelevant, the pointer size would change in both cases. and the internal representation of the length doesn't mean jack shit anyway.
also zero terminated strings are less efficient because calculating length is O(n).
▶ No.895734>>895738
>>895732
>they don't.
Ah so that 8 byte size prefix is not a waste of space then?
>there is such a thing as variable length coding for integers
Yeah we will just do a big num hack to size our strings! Good idea.
▶ No.895735>>895736 >>895744 >>896031
What if my string is more than 4,294,967,295 characters!
▶ No.895736>>895744 >>896516
>>895735
No one would ever need to store a file more than 4 gigabytes. We don't need to design our system to handle that.
▶ No.895737>>895739
>>895727 (OP)
You know windows internally uses sized strings. Look how well that turned out.
▶ No.895738>>895740 >>895743
>>895734
4 bytes are enough for strings for all practical purposes. if you have more, obviously it's time to use specialized data structures anyway.
it's not a waste of space, it's only 3 bytes more than the terminating null byte. which is about 1-2 characters on average if you use UTF-8, and less than 1 character if you use fixed size Unicode.
unless you store single characters in strings, this doesn't fucking matter, and it's a lot better than turning all code dealing with strings into a potential minefield + sacrificing run time wherever you can't reuse length information for some reason.
>>895734
>Yeah we will just do a big num hack to size our strings! Good idea.
it's only big if your brain is small. it's not gonna be used in most application level code.
anyway, in the same sentence I also said that the gain is minimal so it is not worth it. learn to read. still, that would be better than zero terminated strings.
▶ No.895739>>895743
>>895737
which exactly of windows problems are a consequence of this?
▶ No.895740
>>895738
>unless you store single characters in strings a lot of times
a few words escaped, fixed
▶ No.895742>>895795 >>896253
With any kind of input you usually don't even know the string length beforehand, so some kind of string termination is necessary.
and you don't always need to know the string length either
▶ No.895743>>895748 >>896509
>>895738
>4 bytes are enough for strings for all practical purposes
I agree, no one could possibly need a hard drive more than 16 megabytes. A 200Mhz CPU is blazing fast.
>it's not a waste of space, it's only 3 bytes more than the terminating null byte
For 32bit max length strings. For small strings especially it's a waste of space.
>if you use fixed size Unicode.
Well good thing no one uses that for the same reason no one uses size prefixed strings. Because it has a different representation on different machines. Byte order and what not.
>you can't reuse length information
Well thats one particular operation
> it's not gonna be used in the most of application level code
Do you not know what bignum is? You are saying that everyone is going to be using a bignum implementation for basic fucking strings.
>>895739
>windows problems are a consequence of this?
Idk its proprietary they only let us know so much
▶ No.895744>>895745 >>895803
>>895736
if you read a file that's as big, and need to keep all of the content in RAM for some reason, you aren't using a fucking string type for it. it will be almost useless
in this form anyway.
>>895735
a legit use case, please.
also keep in mind that scanning 4 GiB for the zero byte would really take a lot of time.
▶ No.895745>>895750
>>895744
>you aren't using a fucking string type for it.
So no one reads files into char* then? lol
▶ No.895748>>895751
>>895743
>Do you not know what bignum is? You are saying that everyone is going to be using a bignum implementation for basic fucking strings.
I'm saying no one will re-implement it.
And using bignum already implemented in a library doesn't make shit any more complex.
>>895743
>Well thats one particular operation
when it is used somewhere and turns an O(n) algorithm into O(n^2) that would be a big deal and a PITA to fix.
>>895743
>Well good thing no one uses that for the same reason no one uses size prefixed strings. Because it has a different representation on different machines. Byte order and what not.
For most use cases it's a bullshit reason. Byte order only matters for data exchange --- files and network. Files do not need to store their size, their size is known. So prefixing size in files would be simply excessive, just as adding zero byte to the end. On network, size counts even more, so it's a normal practice to use variable length coding (protobuf, etc.).
But we are talking about in-memory representation. And you know, you don't swap a fucking CPU on a running machine while keeping RAM and CPU cache and registers, etc.
▶ No.895749>>895795 >>895806
>+3 bytes is a waste of space for strings
Meanwhile people are happily making programs in javascript where every variable is some abominable super object.
▶ No.895750>>895751
>>895745
char* doesn't say anything about whether the referenced content is zero terminated.
it's just a fucking pointer.
▶ No.895751>>895754 >>895755 >>895795
>>895748
>And using bignum already implemented in a library doesn't make shit any more complex.
You think linking in foreign dependencies to use strings does not increase the complexity? HAHAHA
>Byte order only matters for data exchange
Like uhhhh text files, websites, spreadsheets, literally everything.
>are happily making programs in javascript where every variable is some abominable super object.
And this is a bad thing
>On network, size counts even more, so it's a normal practice to use variable length coding (protobuf, etc.).
Not for strings where the smallest representation is null terminated.
>>895750
>He does not have 10 gigabyte log files he wants to parse
▶ No.895754>>895757 >>895764
>>895751
>>He does not have 10 gigabyte log files he wants to parse
you never wrote software which was able to do that, obviously.
>>895751
>Like uhhhh text files, websites, spreadsheets, literally everything.
and? they don't use zero terminated strings.
and we are talking about in-memory representation.
>>895751
>You think linking in foreign dependencies to use strings does not increase the complexity? HAHAHA
HAHAHA, go program some stuff without libc.
▶ No.895755>>895776
>>895751
>And this is a bad thing
Of course. My point was that 3 bytes on a string is nothing. It's the difference between "Hello world!" and "Hello world!gay". You're more likely to waste bytes on shit software design than a structured string.
▶ No.895757>>895759 >>895761
>>895754
>go program some stuff without libc.
String copy is like 5 lines of code to write yourself, you want to add on a bunch of big num shit to make it 100.
>and we are talking about in-memory representation.
You think file formats don't use null termination literally everywhere?
>you never wrote software which was able to do that, obviously.
It's really fucking easy, you just mmap it into memory and start reading. The OS will load and unload the pages for you.
>You're more likely to waste bytes on shit software design than a structured string.
People wasting space at the high level is no excuse to waste shit at the low level. Even worse wasting space at the low level is going to make all that high level shit even shittier.
▶ No.895759
>>895757
>You think file formats don't use null termination literally everywhere?
Yes.
>>895757
>just mmap it into memory and start reading
Then you aren't using any null termination.
▶ No.895761>>895763
>>895757
>excuse to waste shit at the low level
You're free to null terminate your own esoteric string use cases if you really need to squeeze the fuck out of every byte of memory, but the chances that you do are probably 0. I don't think you understand how meaningless those 3 bytes really are in this case.
▶ No.895763
>>895761
>I don't think you understand how meaningless those 3 bytes really are in this case.
combining that with the other text, it's pretty much obvious that he doesn't.
▶ No.895764>>895770
>>895754
>Then you aren't using any null termination.
mmap is not dependent on the file type you dolt
>I don't think you understand how meaningless those 3 bytes really are in this case.
No reason to optimize! Not like computers process hundreds of billions of strings a day! That would not be beneficial at all!
▶ No.895770
>>895764
>mmap is not dependent on the file type you dolt
which retarded file format are you using which has zero termination at the end?
▶ No.895776>>895798
Null-terminated strings have the advantage of behaving extremely well with recursive functions, as constructing a suffix substring is just a matter of incrementing a pointer.
Think about how Lisp handles lists: they are chained cons cells, with the last cons having an empty (null) list as its CDR. This is how you work with recursion in Lisp.
The fact that you think null-terminated strings are useless proves you still have much to learn, youngling.
>>895755
>3 bytes on a string is nothing
Wrong. First of all, a size_t variable in a 64 bit environment is 64 bits, so we're talking about 7 extra bytes. Second of all, those bytes are nothing in a text, but when you have a plethora of max 10 character strings (quite frequent), on a 32 bit system it increases their size by about 27%, and on a 64 bit system it's about a 64% memory usage increase.
Imagine storing an associative array that uses, as keys, 5-char strings with an 8 byte size variable. That's 13 bytes per key. With null-terminated strings, it's 6 bytes. That's about 117% more space used by the size+string solution.
▶ No.895777>>895778 >>895779 >>895781
>ITT: strings should be null terminated because some filetypes use null characters, also I don't stat my mmaps and EOF aren't a thing, and I prefer to waste cycles scanning for the null terminator rather than wasting an extra 1% of memory that would speed up most use cases and solve 95% of the silliest and most common types of bugs ever
The absolute state of C idiots in this board, everyone.
▶ No.895778>>896348
>>895777
Go back to your javascript bullshit. Clearly you don't care about efficiency and interchangeability.
▶ No.895779>>895810
>>895777
Do you know that EOF is an integer value, not an unsigned char, that most strings in a system are very small, that strlen() is rarely needed, and that you can still store a string's length in C, while not losing the advantages of the NULL termination?
▶ No.895781
▶ No.895785>>895788 >>895793 >>895801 >>896156 >>896219
the virgin FOReskin
[code]
void map(char *str, size_t str_len, char (*func)(char)) {
    for (size_t i = 0; i < str_len; ++i) {
        str[i] = func(str[i]);
    }
}
[/code]
The Chad Elegant Recursion
[code]
void map(char *str, char (*func)(char)) {
    if (*str) {
        *str = func(*str);
        map(str, func);
    }
}
[/code]
▶ No.895788>>895790
>>895785
should be map(str + 1, func), LOL (str++ would pass the unincremented pointer)
▶ No.895790>>895811 >>896038
>>895788
>stack overflow
This is why you don't use languages without proper tail recursion.
▶ No.895793>>895811
>>895785
>it took less to type therefore it's better as a program
Remove yourself from premises.
▶ No.895795>>895797 >>895811
Null-terminated strings suck. C weenies defend it because that's what C uses. Common Lisp strings are arrays, and they can be adjustable (grow and shrink) and have a fill pointer (anything less than it is the currently used part). This covers all the uses of dynamically sized strings, length-prefixed strings, and fixed-length strings. Lisp strings are arrays, so all arrays can have these properties.
>>895742
Bullshit. You always need to know the length. If you really added up all the waste from C and UNIX "comparing characters to zero and adding one to pointers", it would be more efficient to have GC and dynamic typing and store files as arrays of strings. I'm not kidding. C malloc overhead is huge too, but on a Lisp machine, allocating a list only uses one word of memory per element. Allocating a 1D array only uses one header word to store the actual length of the array (which malloc has to do too, but it doesn't provide useful information to you) followed by the words for the array data. Lisp machine overhead is much smaller than C overhead, and the GC compacts to eliminate memory fragmentation.
>>895749
>>+3 bytes is a waste of space for strings
>Meanwhile people are happily making programs in javascript where every variable is some abominable super object.
That's because C sucks. malloc in C has more than 3 bytes of waste. JavaScript is a better language than C even though it sucks too.
>>895751
>>He does not have 10 gigabyte log files he wants to parse
You're going to read an entire 10 GB file into memory (not memory mapping) and stick a 0 byte on the end, but you think an 8 byte length is wasteful? I have no idea why anyone would do things like that.
> Subject: More On Compiler Jibberings...
>
> ...
> There's nothing wrong with C as it was originally
> designed,
> ...
bullshite.
Since when is it acceptable for a language to incorporate
two entirely diverse concepts such as setf and cadr into the
same operator (=), the sole semantic distinction being that
if you mean cadr and not setf, you have to bracket your
variable with the characters that are used to represent
swearing in cartoons? Or do you have to do that if you mean
setf, not cadr? Sigh.
Wouldn't hurt to have an error handling hook, real memory
allocation (and garbage collection) routines, real data
types with machine independent sizes (and string data types
that don't barf if you have a NUL in them), reasonable
equality testing for all types of variables without having
to call some heinous library routine like strncmp,
and... and... and... Sheesh.
I've always loved the "elevator controller" paradigm,
because C is well suited to programming embedded controllers
and not much else. Not that I'd knowingly risk my life in
an elevator that was controlled by a program written in C,
mind you...
And what can you say about a language which is largely used
for processing strings (how much time does Unix spend
comparing characters to zero and adding one to pointers?)
but which has no string data type? Can't decide if an array
is an aggregate or an address? Doesn't know if strings are
constants or variables? Allows them as initializers
sometimes but not others?
(I realize this does not really address the original topic,
but who really cares. "There's nothing wrong with C as it
was originally designed" is a dangerously positive sweeping
statement to be found in a message posted to this list.)
▶ No.895797
>>895795
>Spams copy pasta
dropped
▶ No.895798>>895800 >>895811
>>895776
>max 10 characters strings (quite frequent)
citation needed
>First of all, a size_t variable in a 64 bits environment is 64 bits
what about uint32_t?
▶ No.895800>>895803
>>895798
>4 gigabyte limit
▶ No.895801>>895802 >>895811 >>896088
>>895785
>char (*func)(char)
lol, what a retarded syntax
▶ No.895802>>895808 >>896088
>>895801
I bet you are the type of larper that gets autistic about where parentheses are placed or using spaces vs tabs, wasting all our fucking time.
▶ No.895803>>895805
▶ No.895804
>>895727 (OP)
>misuse null-terminated strings
<FUCK NULL TERMINATED STRINGS, IT WASN'T MY OWN STUPIDITY
The code for the hash in your link is shit, and it's not because of the string. It's because the bcrypt writer played with fire and got burned. If you work directly with pointer logic, you need to be very careful. The language does offer you ways of solving the problem with safer, easier to use tools. The problem is, when you need to be efficient, you're going to have to write code closer to the hardware level. You might as well ban chainsaws because idiots get hurt by them.
▶ No.895805>>895808
>>895803
Who said anything about keeping it in RAM? You can process something without it being in RAM. Ever heard of streams? Ever heard of memory mapped files? Guess not lol.
▶ No.895806>>895809
>>895749
And there's absolutely nothing wrong with that. Having been in both worlds, it's such a pleasure to write software in the more abstract languages.
▶ No.895808>>895812 >>896088
>>895802
not.
it's a lot less clear than `char -> char` for example, or even `Function<char, char>`.
try to spell (in C) a type of a variable which is a function which takes a char and returns a function which returns a function which returns a function which returns a char, for example.
>>895805
if you read from file, you already know the size, because files have size. adding 1 useless byte is useless and stupid.
▶ No.895809>>895815 >>895818
>>895806
Javascript is the C of high level languages
▶ No.895810>>895814 >>895816 >>895827
>>895779
>Do you know that EOF is an integer value
People ITT apparently don't know null terminated strings and reading from files have nothing to do with each other at all, that's my point.
>that most strings in a system are very small
Did you know SQL databases solved this ages ago with fixed size char fields, variable size char fields and text fields? Fuck, we could solve this the same way we solved numeric types of different sizes, with short strings, regular strings, long strings, etc.
<but that's not YOONIKS-y and simple!
It's about as obtuse as integer sizes. Read: not at all if you care the littlest bit about muh autistic efficiency. Not only that, but the compiler could infer the most adjusted type for literals, so you should only worry about user inputs and files, which should have a fixed length anyway.
<but muh length promotion would waste too much!
Go write assembly then, fag.
>that strlen() is rarely needed,
Rarely my ass, unless you use buffers and increase the complexity of your program by doing this.
▶ No.895811>>895813 >>895819 >>895822 >>896921
>>895793
That alone is a reason to make it better, but it's also clearer, the function takes one less argument, and it doesn't need to push a new variable onto the stack.
>>895790
While it is true that ANSI C says nothing about tail call recursion, GCC does it.
>>895795
Mr. Common Lisp[1] here apparently does not understand the value of a null terminator in a linear collection of elements (like a string), even though it is the principle upon which cons cell lists are constructed.
[1] yuck!
>>895798
>citation needed
Look up any software, and see how long most strings are.
>what about uint32_t?
Though there is no reason for it not to be used, it is not recommended to hardcode the width of your size_t. Also, uint32_t is not defined by ANSI standards older than C99.
>>895801
>return type (name) (arguments)
How would you do it, Mr. Smart Man?
▶ No.895812
>>895808
<Not
>Goes on to larp about syntax
▶ No.895813>>895827
>>895811
I bet you think a linked list is good too because it doesn't need an iterator variable to loop through.
▶ No.895814>>895820
>>895810
>People ITT apparently don't know null terminated strings and reading from files have nothing to do with each other at all, that's my point.
People ITT don't know that null terminated strings are used in file formats all the time.
▶ No.895815>>895817
>>895809
>Javascript is the crap of high level languages
ftfy
although C is crap too, so… not a big difference after all.
▶ No.895816>>895823 >>896348
>>895810
>Go write assembly then, fag.
Go write in javascript faggot, its where you belong.
▶ No.895817
>>895815
Thats the point dingus
▶ No.895819>>895827
>>895811
>and it doesn't need to push a new variable onto the stack
who told you so?
>what is registers?
▶ No.895820
>>895814
>People ITT don't know that null terminated strings are used in file formats all the time.
and the most widely used example is … ?
▶ No.895821
so many newfag CS undergrads ITT smh
▶ No.895822>>895827 >>895836
>>895811
>Though there is no reason for it not to be used, is not recommended to hardcode your size_t. Also, uint32_t is not defined by ANSI standards older than C99.
Older standards than C99 belong to the garbage bin.
▶ No.895823>>895824
>>895816
Your beloved C does size promotion all the time. Fuck, getchar(), which is used to read a single character from a file, which is about as wasteful of a function as it gets, performs promotions with every single call. And it's negligible.
Really, fuck off. You don't even want assembly, your autism should only allow you to use ASICs that waste zero cycles at all.
▶ No.895824>>895827 >>895828
>>895823
I hate C, I just like null termination.
▶ No.895827>>895831 >>895833 >>895888
>>895810
SQL databases are much different from C storage. For starters, the length of a VARCHAR is stored only once, in the column definition. When the length is dynamic, we're talking about text, which will indeed make the extra 8 bytes literally nothing.
><but that's not YOONIKS-y and simple!
I don't like Unix, please do not put words in my mouth.
>some rambling on stuff I haven't mentioned
ok
>Rarely my ass
Rarely indeed. For hard-coded strings, the length is simply sizeof(myString) (which counts the null terminator). For strings that you receive as input, the size is calculated while receiving it, or is pre-given.
Null-terminated dynamic-size strings are good for manipulation, sized dynamic-size strings are good for interchange (databases, network, file formats, etc.)
You should use fixed-length strings as much as possible anyway.
>>895819
If it is not pushed onto the stack, it's a compiler optimization, that you shouldn't rely on, or you need to specify the variable as volatile.
>>895813
Linked lists are excellent as lists. If you try to use them when you should use fixed-size arrays or vectors, maybe you should take an IQ test, and based on that, decide if you should kill yourself or retake Data Structures 101.
>>895822
If you're not a LARPer, surely you have heard of legacy codebases.
>>895824
Same. C sucks, but most of its detractors just don't understand the real reasons why.
▶ No.895828>>895830
>>895824
Well tough shit then…
▶ No.895830
>>895828
Tough shit for you, attacking a strawman this whole time.
▶ No.895831>>895832
>>895827
>Linked lists are excellent as lists
>you should take an IQ test
You sure you're not projecting m8?
▶ No.895832>>895835
>>895831
>I don't understand why you would possibly want a linked list.
How many years of programming do you have on your CV, again?
▶ No.895833>>895834 >>895844
>>895827
>Linked lists are excellent as lists.
Linked lists waste all that space on pointers though, and have terrible cache properties, jumping around to different pages all the time. Big O time complexity has little to do with real-world performance when the constant factors dominate.
▶ No.895834>>895837 >>895843
>>895833 (checked)
Is an array of pointers that get reallocated all the time a better solution when the list is not changing often?
▶ No.895835>>895861
>>895832
Well I've never heard of a situation where a linked list is the best solution, so here's your chance to educate me.
▶ No.895836>>895839 >>895845 >>896048
>>895822
That's where you're wrong, kiddo. C99 is one of the worst standards to come, and everyone in the industry uses C95 exclusively.
▶ No.895837>>895841 >>895852 >>895968
>>895834
If that array of pointers fits within a few pages then it's absolutely faster compared to chasing down pages wherever they get allocated.
▶ No.895839
>>895836
>The furry c programmer knows all
▶ No.895841
>>895837
Thx. Is this the best solution for small lists? Is there a special list type you'd recommend?
▶ No.895843>>895852
>>895834
Have you ever benchmarked this shit on relatively modern computers?
▶ No.895844>>895848 >>895849 >>895850 >>895855
>>895833
You and I must have different definitions of "real world". When you need to constantly resize (queues, lists, stacks), using vectors is extremely expensive. When the size of your vector remains constant, or is changed very little, using a vector is better.
You wouldn't cut a steak with a wood saw, or cut a plank with a steak knife. Two different tools serve two different purposes, and so do two different data structures.
▶ No.895845>>895858
▶ No.895846>>895851
>>895727 (OP)
That vulnerability mentioned in that blog post is developer error. The function takes in a string, but you pass in a byte array. Why would you expect it to work? If you pass in the wrong type of variable then of course it might not work right.
▶ No.895848>>895861
>>895844
>You and I must have different definitions of "real world". When you need to constantly resize (queues, lists, stack), using vectors is extremely expensive
Any evidence?
▶ No.895849>>895853 >>895861
>>895844
> using vectors is extremely expensive
That's just it, it's not extremely expensive. It has an expensive big O cost, but almost every benchmark will show that vectors are faster. This is because caches exist. The cache changes how all of this works.
▶ No.895850>>895861
>>895844
Look your CS 101 data structures class using big O notation is not an accurate description of how caches work.
▶ No.895851>>895985
>>895846
in C, char* is also used for byte arrays.
this is a programmer error, but it could be prevented if the design of the language and the stdlib was less shit.
programmers will always make some errors, but some of them can be prevented entirely as a class.
▶ No.895852
>>895843
No. But performance always takes priority.
And I think we should listen to
>>895837
's practical advice and not some stupid theory developed by java shitcoders at some university.
▶ No.895853>>895854
>>895849
>It have a expensive big O cost
it doesn't.
amortized cost of adding an item is still O(1).
▶ No.895854>>895856 >>895861
>>895853
Adding an item to the middle of a vector is not amortized to O(1).
▶ No.895855
>>895844
You can change how often a vector reallocates itself, but really, the default behavior is sufficient for most implementations.
▶ No.895856>>895859
>>895854
neither is it in a linked list if you first need to find the place to insert --- you'll need an O(n) traversal first.
▶ No.895858>>895862
>>895845
Yours neither, loser. You literally made a bold statement without backing it, or providing proof. Your nodev ass can't even write a reverse polish calculator, LOL.
▶ No.895859>>895861 >>895867
>>895856
Again you keep using this fucking big O notation when talking about the speed of these data structures. The real world does not follow big O. Iterating over a vector that's all in one page is thousands of times faster than jumping between pages where linked list nodes are allocated, despite the same time complexity.
▶ No.895861>>895864 >>895867
>>895842
Yup. This is why compiler warnings exist when you try to do implicit conversion, and this is why Apps Hungarian Notation is useful.
>>895848
>evidence
>of a math problem
1st year CS theory that you ought to know if you want to be taken seriously here.
>>895849
>benchmarks
That use cases where vectors are indeed better.
>>895850
>cache
Do you think data structures stop existing outside of RAM?
>>895835
Filesystems make extensive use of linked data structures.
>>895854
In the middle, or anywhere besides the end. Dynamic vectors can be used somewhat effectively as stacks because of that, but that's about it.
>>895859
>The real world does not follow big O
L M A O
M M
A A
O O
▶ No.895862>>895863
>>895858
>Your nodev ass can't even write a reverse polish calculator, LOL
I can write even infix calculator without any problem.
I actually wrote a compiler for a simple language and a lot of other shit too. Fix your detector.
▶ No.895863
>>895862
You're still claiming shit you've never done, and don't provide proof.
>>>/reddit/
▶ No.895864>>895866
>>895861
Look here retard. Iterating over a list and a vector has the same big O cost. In the case of an actual list though you will be chasing down pointers in different pages. Big O does not model this cost at all. If you knew more about CS theory than an undergrad simpleton you would understand this.
▶ No.895865>>895870
oh shit watch out there's a troll in here.
▶ No.895866>>895869
>>895864
>muh iterations
Insert a new value at the head of a 10 million records vector.
Now do it at the head of a 10 million records linked list.
Come back and tell everyone how it went.
▶ No.895867>>895868 >>895872
>>895861
>1st year CS theory that you ought to know if you want to be taken seriously here.
When you make claims based on your invalid mental model of the modern computing hardware, of course you need to prove your bullshit to be taken seriously.
>>895859
Lol, are you a brainlet or what?
>>895861
>Filesystems make extensive use of linked data structures.
For different reasons altogether.
We are talking about in-memory data structures.
▶ No.895868>>895870
>>895867
>Lol, are you a brainlet or what?
<standard Big O notation always correctly models hardware
what the fuck are you on about
▶ No.895869>>895872
>>895866
>Insert a new value at the head of a 10 million records vector.
if you need to insert at head, you use deque and not vector.
for a deque, this is not a problem at all and it will be faster than a linked list (amortized)
▶ No.895870>>895871
>>895865
don't worry I got him right here: >>895868
▶ No.895872>>895873 >>895874
>>895867
>We are talking about in-memory data structures.
Who says so? I defended that linked lists had very valid use cases, and everyone and their nodev asses have come to shit on what is basic knowledge.
>invalid mental model of the modern computing hardware
I know how cache works, thank you.
>>895869
>deque
Not always.
▶ No.895873>>895875
>>895872
>not always
I see you don't know what amortized means then
▶ No.895874>>895875 >>895876 >>895881 >>897445
>>895872
TFW your linked list is slower for the one thing it should be better at because of how hardware actually works
https://baptiste-wicht.com/posts/2012/12/cpp-benchmark-vector-list-deque.html
▶ No.895875>>895878 >>895883
>>895873
If you need to frequently mutate the order of your data, deques can still prove too slow, or their overhead too big.
>>895874
>muh benchmarks
Filesystems, do you understand them?
▶ No.895876
>>895874
Note that the only case where the list is actually faster is when they happen to store very large values in each node instead of a pointer to them, which is a retarded contrived use case.
▶ No.895878
▶ No.895881
>>895874
>The random position is found by linear search.
Gee!
▶ No.895882>>895898
>not just implementing a linked list with a lookup table for fast iteration
Kiss and make up, gentlemen. Try not to touch balls though, that's gay.
▶ No.895883>>895884
>>895875
Yeah no one is actually going to ever have to do a linear search on their data to find what they need
▶ No.895884>>895890
>>895883
Lists are not intended for linear searches.
▶ No.895887>>895890
>>895877
>they will likely behind a pointer then, so even then it loses.
Only if you're a terrible programmer.
▶ No.895888
>>895827
>SQL databases are much different than C storage. For starters, the length of VARCHAR is stored only once, in the column definition. When the length is dynamic, we're talking about text, which will indeed make the extra 8 bytes literally nothing.
That really matters nothing at all. The compiler should be able to handle this, along with the promotion rules. My point is that fixed/limited size strings are nothing new and people know how to handle them. The reason most modern programming languages use the same type of string for everything is that C hacked strings in as simple pointers to chars, when that type actually has another property: it is null terminated. So even though later languages knew null terminated strings were bad because they caused all sorts of problems, they didn't think making a distinction was worth it, and just used a size_t for every string, be it 2 or 20000 characters long.
Riddle me this: what would be so wrong about using structs for strings, where one member is a pointer and the other is an unsigned integer whose width adjusts itself to the minimum number of bytes that can hold the length of the string? This way, strings up to the maximum unsigned char value occupy the same space as null terminated strings, and strings up to the maximum unsigned short int value would occupy a measly single extra byte. In addition, by manipulating pointer and length you could generate a view into a string, which is more or less what Rust already does, and save memory in the process.
▶ No.895890>>895894 >>895895
>>895884
Okay so we can agree then that lists are useless for almost everything?
>>895887
How dare someone store an object bigger than the size of a page behind a pointer!
▶ No.895894>>895899
>>895890
lists DO have uses, though. Pretending that they don't is cargo cult programming
▶ No.895895>>895899 >>895900
>>895890
>so we can agree then that lists are useless for almost everything?
They are not useless, they are slower. And sure, they're slower almost every time, but not every time, which is the point I'm making from the beginning.
>How dare someone store and object bigger than the size of a page behind a pointer!
>2048 bytes
>bigger than a page
nigguh
▶ No.895898
>>895882
this table will need to be updated each time you insert or remove something, defeating the purpose.
what you actually probably want is https://bitbucket.org/astrieanna/bitmapped-vector-trie.
▶ No.895899>>895901
>>895895
>>895894
Most things have uses, and the less useful should not be the default.
▶ No.895900>>895901
>>895895
>And sure, they're slower almost every time, but not every time, which is the point I'm making from the beginning
still useless for realtime, as memory allocation is generally unpredictable.
▶ No.895901>>895903 >>895907
>>895899
I don't think I said they were or should be the default, did I?
>>895900
Man, I've mentioned filesystems three times already.
▶ No.895903>>895905 >>895922 >>895930
>>895901
They are the default in schema
▶ No.895904>>895906 >>895913
>hurr durr lets LARP about irrelevant shit
/tech/ in a nutshell. I bet most of you fags haven't even programmed anything except fizzbuzz tier shit.
▶ No.895905>>895922
▶ No.895906>>895910
▶ No.895907
>>895901
>Man, I've mentioned filesystems three times already.
filesystems like the FAT? :^)
I've seen better filesystems use more clever data structures.
▶ No.895910>>895911 >>895913
>>895906
not an argument XDDDDDDDDDDDDDDDDDDDDDDD
▶ No.895911
▶ No.895913>>895915
>>895910
>>895904
>I-I bet you guyz hasnt even program! L O L
"well reasoned argument"
>n-not an argument
shhh. The grown NEETs are talking.
▶ No.895915
>>895913
>grown NEETs
LOL. Keep LARPing faggots.
▶ No.895922>>895924
▶ No.895925>>895926
▶ No.895929>>895932
>>895926
>Ctrl+F
>"default"
>0 results
▶ No.895930>>895931
>>895903
what does it even mean for them to be "default"?
▶ No.895931
>>895930
It means "I know nothing about programming and I need to read my SICP".
▶ No.895932>>895933 >>895935
>>895929
>A more streamlined notation can be used for lists: the elements of the list are simply enclosed in parentheses and separated by spaces. The empty list is written (). For example, the following are equivalent notations for a list of symbols:
>(a b c d e)
>(a . (b . (c . (d . (e . ())))))
LOL. It is the default
▶ No.895933>>895936
>>895932
>this means they're the default DS
▶ No.895935>>895937
>>895932
You are confusing the abstract concept of lists with the particular implementation of linked lists. Scheme uses lists heavily, but that doesn't mean it's based on linked lists.
▶ No.895936>>895938
>>895933
They are. If you write (a b c) you have a linked list.
▶ No.895938>>895940
>>895936
You don't, and you're retarded, and you don't understand the difference between an actual list structure, and the concept of list used in the representation of Scheme programs.
If you type (a b c), you are calling the function 'a' with the arguments 'b' and 'c'.
▶ No.895939>>895942
>>895937
proof of what? I'm just pointing out a distinction.
>Ctrl+F
>link
>0 results
▶ No.895940>>895943
>>895938
>>A more streamlined notation can be used for lists: the elements of the list are simply enclosed in parentheses and separated by spaces. The empty list is written (). For example, the following are equivalent notations for a list of symbols:
>>(a b c d e)
>>(a . (b . (c . (d . (e . ())))))
https://www.gnu.org/software/mit-scheme/documentation/mit-scheme-ref/Lists.html#Lists
▶ No.895942>>896308
>>895939
>7 Lists
>A pair (sometimes called a dotted pair) is a data structure with two fields called the car and cdr fields (for historical reasons). Pairs are created by the procedure cons. The car and cdr fields are accessed by the procedures car and cdr. The car and cdr fields are assigned by the procedures set-car! and set-cdr!.
>Pairs are used primarily to represent lists. A list can be defined recursively as either the empty list or a pair whose cdr is a list. More precisely, the set of lists is defined as the smallest set X such that
> The empty list is in X.
> If list is in X, then any pair whose cdr field contains list is also in X.
▶ No.895943>>895945
>>895940
>I still don't understand Scheme
You could have said "oh, ok, I thought so" when I told you they weren't the default, and you would have just appeared as someone who doesn't know Scheme, which is in itself not a bad thing. Now you're just making an ass out of yourself.
▶ No.895946>>895947
▶ No.895947>>895949
>>895946
>A more streamlined notation can be used for lists: the elements of the list are simply enclosed in parentheses and separated by spaces.
>Pairs are used primarily to represent lists.
>A more streamlined notation can be used for lists: the elements of the list are simply enclosed in parentheses and separated by spaces.
>Pairs are used primarily to represent lists.
>A more streamlined notation can be used for lists: the elements of the list are simply enclosed in parentheses and separated by spaces.
>Pairs are used primarily to represent lists.
>A more streamlined notation can be used for lists: the elements of the list are simply enclosed in parentheses and separated by spaces.
>Pairs are used primarily to represent lists.
>A more streamlined notation can be used for lists: the elements of the list are simply enclosed in parentheses and separated by spaces.
>Pairs are used primarily to represent lists.
>A more streamlined notation can be used for lists: the elements of the list are simply enclosed in parentheses and separated by spaces.
>Pairs are used primarily to represent lists.
you don't understand english.
▶ No.895948
anyways have fun tripfagging.
▶ No.895949>>895950
>>895947
You haven't answered my question. Are lists the default data structure of C?
▶ No.895950>>895955 >>895998
>>895949
there are no built-in data structures in C at all.
▶ No.895955>>895958 >>895983
>>895950
There are arrays. Now not only do you have to read your SICP, but you also have to read your 2nd edition of "The C Programming Language".
▶ No.895958>>895959 >>895960
>>895955
Bullshit. There are no arrays.
▶ No.895959
>>895958
I mean, there's only a joke of an array which must fit into the stack or the constant pool, and it can't be used for anything "big".
What you probably actually mean is just pointers plus the ability to request some memory dynamically if your environment has it in the API.
▶ No.895960>>895961 >>895983
>>895958
By that logic everything is "just pointers". C has arrays: there's a syntax for array initialization, there's a syntax to specify arrays of types, and they are mentioned in the standard.
▶ No.895961>>895962
>>895960
Pro tip: if you can't query the size of the array, it's not an array.
▶ No.895962>>895963 >>895966
>>895961
1) That's a definition you pulled out of your ass
2)
sizeof(myArray);
▶ No.895963>>895972
>>895962
the latter will not work for a pointer to a dynamically allocated "array".
▶ No.895968>>895971
>>895837
I still don't understand why it's faster.
An entry-point pointer and a bunch of structs would, in my head, have the same paging performance when cycling through.
Both have the distance of one pointer to the next struct, and the contents are allocated dynamically.
▶ No.895971>>895974
>>895968
then go ahead and learn instead of making a clown out of yourself here
▶ No.895972>>895973 >>895983
>>895963
Because dynamically allocated memory is not an array as per the C definition of array. Read your books.
▶ No.895973>>895975
>>895972
yeah that's what I said from the beginning.
the arrays that are in C are almost useless, and there are no true arrays.
▶ No.895974
>>895971
Don't be mad at me. Learning from you is simply faster.
▶ No.895975>>895979 >>895983
>>895973
My god, kiddo. Read your fucking book.
char *a;
char b[512];
sizeof(a); /* size of a pointer, e.g. 8 on a 64-bit target */
sizeof(b); /* 512: the size of the whole array */
Those yield different results, because arrays and pointers in C are, in fact, not the same thing.
https://eli.thegreenplace.net/2009/10/21/are-pointers-and-arrays-equivalent-in-c
▶ No.895979>>895981 >>896075
>>895975
>not the same thing.
<but they should be
one more piece of c bloat
▶ No.895981>>895982
>>895979
>I was wrong but I should be right
▶ No.895982
>>895981
Don't assume that anyone that responds is the same person
▶ No.895983>>895984
>>895955
>>895960
>>895972
>>895975
C does not have arrays. It has something called arrays, something called strings, and something called unions, but it does not have any of those things. If I built an array processor, languages with arrays would be faster, but C wouldn't be able to run on it.
▶ No.895984>>896002
>>895983
And I suppose you think lisp cannot exist on x86 because it does not have cons cells at the hardware level?
▶ No.895985
>>895851
>in C, char* is also used for byte arrays.
Byte arrays should be unsigned char*. When the char datatype is used without being explicitly signed or unsigned, it means that it represents characters (as in characters in a string).
This is a problem with PHP itself and has happened with other functions too.
▶ No.895998>>896001
>>895950
>there are no built-in data structures in C at all.
wew lad
▶ No.896001
>>895998
data structures? I think you mean bloat.
▶ No.896002>>896005 >>896006
>>895984
It's harder to make a computer that can't run Lisp than it is to make a computer that can't run C because ISO C cannot run on certain hardware. There are requirements about the kind of hardware C can run on, like it must use binary integers in one of certain number of formats, and "strings" and "arrays" also have a lot of requirements, even though they're horrible. There's no way to use "arrays" and "strings" in C without pointer arithmetic, so C must specify how data appears in memory, which prevents a lot of optimizations and faster hardware designs.
▶ No.896005>>896007
>>896002
>It's harder to make a computer that can't run Lisp than it is to make a computer that can't run C because ISO C cannot run on certain hardware.
>Lisp is so shit, you can't build an emulator or compatibility layer with it.
LISPFAGS BTFO
▶ No.896006
>>896002
Are the existence of C and C++ one of the reasons why progress stopped and all computer architectures are so mediocre?
▶ No.896007>>896008
>>896005
you can emulate everything but running C with shit speed is pointless, as it's only ever used for speed gains.
if C becomes slower than Python for example, then it's fucking pointless to use, as it sucks donkey balls on everything else too.
▶ No.896008>>896009
>>896007
>C becomes slower than Python
That wouldn't happen because C is very specific about everything.
▶ No.896009>>896012
>>896008
it will only make matters worse, dude.
▶ No.896012
>>896009
No. Things can always be abstracted but reduction is impossible after some point which is why more specific languages are faster by default and that DOESN'T CHANGE.
▶ No.896031
>>895735
Then you really don't want to scan for that null terminator.
▶ No.896038
>>895790
Newer revisions of the ISO C standard require the compiler to implement tail recursion properly.
▶ No.896048
>>895836
you are a humongous faggot
▶ No.896072>>896082 >>896091
>>895727 (OP)
It's easier to put a NUL byte at the end (just *one* byte) than to calculate the new length every time it's modified and put it into another variable (presumably at least an int, i.e. probably at least four bytes).
▶ No.896075
>>895979
>arrays and pointers should be the same thing
No. Their underlying concepts are very different. The fact that C arrays are implemented such that an array identifier represents a pointer to its first element in almost all contexts (using the array identifier as a sizeof operand is a case where it does *not*, and for good reasons) is just because it's convenient that way, nothing else.
▶ No.896082
>>896072
What data structure could you possibly be using where you wouldn't already know the length?
▶ No.896088>>896091 >>896093
>>895801
>>895802
>>895808
The syntax is retarded because it's invalid. func being a pointer to a function taking a char argument and returning char is "char (*func)(char)".
▶ No.896091>>896142
>>896072
>calculate the new length every time it's modified
... which isn't how you're supposed to use this stuff.
>>896088
>asterisk is on the wrong side
>INVALID!!1!!!
Yeah this totally changes how it looks.
▶ No.896093>>896142
>>896088
Are you seriously complaining that he put the * on the other side?
▶ No.896095>>896096
>Hey kid, send me the word "hello". That'll be 2 megabytes long, by the way.
▶ No.896096>>896105 >>896122
>>896095
This tbh.
>Store 50,000 5-char identifiers
>650kB
▶ No.896105>>896109
>>896096
>Storing 5 char identifiers
>Not mapping them to an enum
smh tbh famalamadingdong. I already told you how to handle that usecase with almost zero overhead, anyway.
▶ No.896109>>896110
>>896105
>want to store or delete identifier
>need to recompile entire application
>user-submitted identifier
>make script that modifies the sources and recompile
>giant enum {...} with 50 000 entries
▶ No.896110>>896112
>>896109
You could store them in a config file. Oh wait, that's bloat, amirite.
▶ No.896112>>896124
>>896110
>get user query based on identifier
>load up a 640kB file every time
▶ No.896122>>896127
>>896096
I was referring more to the potential security problems of having an arbitrary length value, but yeah, that too.
▶ No.896124>>896126 >>896658
>>896112
You could mmap that file and seek it on demand.
<ooh, but seeking these on demand is so expensive
>get user query based on identifier
>search a million entries
>compare a million 5 char long strings one by one
>not expensive
<but my program will only search for 5 entries tops!
Which is why you need 50k 5 char long strings, right. Two can play the "arbitrarily specific specifications" game, too.
▶ No.896126>>896130
>>896124
>What is a search tree
▶ No.896127>>896128 >>896144
>>896122
>arbitrary length value,
Null terminated strings are a-okay tho.
▶ No.896128
▶ No.896130>>896137
>>896126
Something you could apply to the file as well. Or something you could apply to enum-tagged entries for extra speed. I dunno, I'm not the one making up the stupid limitations.
▶ No.896137>>896151
>>896130
>calls it stupid limitations
>when he's the one calling for 8 extra bytes appended to EVERY string.
▶ No.896142>>896145
>>896091
>>896093
Are you literal brainlets? "putting the asterisk on the wrong side" is invalid syntax and doesn't work.
▶ No.896144
>>896127
<Null terminated
>A one-L NUL, it ends a string
>A two-L NULL points to no thing
>But I will bet a golden bull
>That there is no three-L NULLL
(char)'\0' != (void *)0
▶ No.896145>>896146 >>896149
>>896142
You are complaining about such a minor irrelevant typo.
▶ No.896146>>896150
>>896145
>syntax error (proving him to be a larper who has actually not much of a clue of how function pointers are used correctly) preventing the code from even compiling
>"minor irrelevant typo"
dat desperate damage control of yours tho
▶ No.896149
>>896145
C isn't nignog transgender studies where nothing that is said actually ever matters. The compiler is autist extraordinaire and is merciless to syntax errors of any kind.
▶ No.896150>>896156
>>896146
I'm not the one that made the post. I'm calling you a faggot for sperging out about it.
▶ No.896151>>896155 >>896158
>>896137
>when he's the one calling for 8 extra bytes appended to EVERY string.
Never said that, you just conveniently ignored my post on how to properly handle this, while also adding a feature that would simply save memory in the long run.
▶ No.896155>>896178
>>896151
The one where you advocated adding a complicated BigNum system to system that needs to process strings?
▶ No.896156>>896160 >>896161
>>896150
Desperately playing down a stupid mistake (yea no, it wasn't a "typo" because the larper consistently repeated it, go back to >>895785 and check if unsure) just because (you) yourself didn't notice it is far worse than pointing it out (which you cared to inaptly call "sperging out about it").
▶ No.896158>>896178
>>896151
>posts non-solutions
>how to properly handle this
Very simple use case, kid. You have a map of arbitrary-but-usually-small length strings to whatever (let's say 1 int), and you have 50,000 of those records. You have to serve numerous update and query requests.
You cannot
>use an enum (lm fucking ao)
>use a file (actually you can but it changes nothing)
>mmap enormous chunks of data
It's simple: you have a map of strings. Do you more than double their size?
▶ No.896160>>896166
>>896156
It's an asterisk on the wrong side of the "func" identifier, no one cares about it actually, because it doesn't change the look of it overall. You're sperging over it because you absolutely can't stand losing an argument, or maybe just crave validation. Talk about LARPing.
▶ No.896161
>>896156
>just because (you) yourself didn't notice it is far worse than pointing it out
I guess if you don't autistically rant for multiple posts about a minor syntax error, that means you did not know it was wrong. If you don't correct every little grammar error in a person's post, you must have no concept of proper English usage. The only option is to be a massive faggot about everything.
▶ No.896166>>896167
>>896160
Why am I even coming back to this place chock-full of bitter and confused larpers
▶ No.896167
▶ No.896178>>896179 >>896180
>>896155
Why the fuck would you need bignum when you couldn't hold strings bigger than size_t anyway? And if you could, you would indeed need other access mechanisms. No, you just need unsigned chars, shorts, ints and longs. If this feature, already included in C, is too complicated for you, you should go back to scripting languages.
>>896158
>arbitrary-but-usually-small length strings
>arbitrary length
Oh sure, you never said that before, but okay. If what you need is to hold SQL-like text fields, you would indeed waste 7 bytes per entry using my method, but if you are short on resources (considering you are complaining about 650k being too much) you shouldn't be doing this anyway; at the very least use a varchar. You are also implying these keys are probably unique, non-system-defined and dynamic, which weren't part of the original requirements, while I was assuming your dataset was larger and that these keys were not primary and non-unique, in which case they would benefit from compression if mapped to integers.
Your requirements are stupid anyway. That's about the only use case in which NUL terminated strings would win, and it rests on assumptions such as users submitting arbitrarily large keys and fucking over your limited resources, which could be solved by offloading your keys onto another table held in a file and organized as a radix tree.
▶ No.896179>>896182 >>896196
>>896178
Not actual bignum, you dumb fuck. Variable-length encoding of the size-prefix number. It's very similar.
▶ No.896180
>>896178
NULL-terminated strings also win in ease of processing.
▶ No.896182>>896183
>>896179
Good solution for compact storage and transmission, not so good for processing, as it introduces more branching.
▶ No.896183>>896191
>>896182
Scheme uses actual bignum for all its calculations in its numeric tower, clearly you don't care about a little extra branching.
▶ No.896191>>896193
>>896183
Did I just hear a non-argument?
▶ No.896193
▶ No.896196>>896198
>>896179
>variable length
Not variable at all, just like the size of a char, a short, an int or a long are not variable and they are simply different types. micro strings (minimum addressable size, equivalent to NUL terminated strings; could also be named short short string if you are into retarded modifier naming schemes), short strings, strings, long strings, macro strings (or long long strings) would be different types, and the language would just know how to promote them when appropriate, just like it already promotes these numeric types.
▶ No.896198>>896200
>>896196
Just what we need, c with even more type coercion, only this time with dynamically allocated values.
▶ No.896200>>896202
>>896198
>c with even more type coercion,
Okay, you can tell the fucking compiler you really want to cast that short to an int when passing them to functions accepting ints by explicitly stating it, Java pedantry style, but considering it makes no difference to you or the resource allocations required you may as well let the compiler do it for you.
If you don't want the compiler to automatize anything at all, why don't you fucking code it in assembly?
>with dynamically allocated values.
you wot m8
▶ No.896202>>896214
>>896200
>If you don't want the compiler to automatize anything at all,
I want the compiler to automate all kinds of things. I want it to automate ways to ensure my code is correct. I don't want it to just randomly fuck with the types of the numbers because I am using them. It's like JavaScript fuckery where ints get turned into strings based on the operation.
▶ No.896203>>896205
>>895728
That actually depends on the platform (i.e. what is a char actually stored as). Unarguably, though, they take up more processing, because to detect the end of the string, you have to check each and every character as you are parsing it.
▶ No.896205>>896207 >>896214
>>896203
>they take up more processing
*for a single particular operation
copying a null terminated string takes less processing time
▶ No.896207>>896208
>>896205
No it doesn't. Consider this: how do you know when you are at the end of the source string? With a null terminator, you must check every character.
With an integer counting up to the length of the string, you get the potential benefit of loop unrolling, i.e. compare the index to the string length every 16 elements or so instead of comparing each char to '\0'.
▶ No.896208>>896227
>>896207
>loop unrolling
If you want to blow your instruction cache and slow down your whole program
▶ No.896214
>>896202
>don't want it to just randomly fuck with the types of the numbers because I am using them
<int fug(long thing) { /* fug the long thing */ }
<unsigned short short benis = 255;
<int colonDDDDD = fug(benis);
<fug(benis) == fug(255)
<(int) benis == (int) 255
No data is lost at all, because you just told it to take 8 bytes rather than 1: 255 in 8 bytes is still 255, and the function was already accepting a long. You could try to introduce some sort of generics into C that allowed one function to work with short shorts and another with longs, but that's fucking bloat and would help you nothing at all.
Since you cannot divide or multiply (possible float or double casting) or add or subtract (possible int promotion) a string, all you could do is concat them, which would be implemented via functions returning appropriately sized strings that raise warnings if assigned to smaller string types. No harmful type coercion at all, and miles simpler than any other integer operation system ever devised in a mainstream language.
> Its like javascript fuckery where ints get turned into strings based on the operation.
For what it's for, JS more often than not does what you want it to do, considering it is meant for text processing. I agree JS's type system is retarded, but C can never be as retarded as JS since it is not dynamically typed, and type coercion is really one of your smallest problems: it only bites because the DOM doesn't define a universally enforced type for input values (i.e. numeric fields generally return a Number in modern browsers if the field is supported, but IE returns a string even though it claims to support numeric fields, so you have to cast to Number anyway; and depending on the locale you may not parse it correctly if it uses commas instead of dots for decimal positions), so you can't really know which types you are working with.
>>896205
As long as you know the size of the string you are allocating to, which may be smaller or bigger than the string you are getting (in which case you have to check both that you stay inside your target string's bounds and for your source string's NUL terminator). If the source is bigger, it might get cut, which is undesirable, so you would want your target string to be at least as big as your source string, which implies malloc and also seeking the last position of your source string. Or you could use buffers and do some retarded realloc-ing, or an array of char * to grow your target string as you read your source string, but that's all sorts of retarded and would be more wasteful than if you just knew their lengths.
▶ No.896219>>896222
>>895785
Congratulations, you blew the stack.
▶ No.896222>>896230 >>896236
>>896219
Shouldn't do that since it should trigger tail call optimization, but using function pointers is wasteful if we are talking about autistically optimizing shit.
▶ No.896227
>>896208
Compilers insert loop unrolling for the exact opposite reason: it increases I-cache hits.
▶ No.896230>>896238
>>896222
Neat, but I'm not finding any way to guarantee it, and with small stacks it'll blow quickly. I use function pointers all the time though
▶ No.896236>>896238
>>896222
>autistically optimizing shit
lmfao how about writing a function that actually works instead.
▶ No.896238>>896245
>>896230
>I use function pointers all the time though
It's not a bad thing and the performance cost is negligible, but we are talking about people who feel the need to install Gentoo to cut 2 MB of total RAM usage here. If it's gonna save you several lines of code, or worse, a gigantic switch of death, by all means do it.
>>896236
Tell the tripfag. I personally wouldn't bother writing a single line of C (or C with syntax errors :^) )due to hipsterism.
▶ No.896245>>896348
>>896238
You wouldn't write it because you couldn't. You're just another retarded larper.
▶ No.896253>>896520 >>896524 >>896535
We won't ever be truly free of this moronic C string business until people stop using stdio.
>>895742
The read system call stops reading when input data is exhausted. It returns exactly how many bytes it has read, so at the end of the process you know exactly how long the data is.
NUL-terminated strings are, like errno, a C concept. Linux gives approximately zero fucks about your NUL, just like there is no actual errno global variable (the system call interface simply returns the error code like a sane implementation would). This is stdlib garbage.
▶ No.896308>>896315 >>896352
>>895942
Is there a Lisp out there where the list data structure isn't actually a linked list? Can it be a dynamic array, for example?
▶ No.896315>>896522 >>896656
>>896308
>list data structure
>but not an actual list
durr
▶ No.896348>>896513
>>896245
>>895778
>>895816
Feel like a hero yet, Reddit?
▶ No.896352
>>896308
Clojure kind of applies: they're not the default, but they exist, and you can create a fork where they will be the default.
This "list" should be immutable, so it's relatively tough shit to implement if you want something better than a linked list, but Clojure already has a couple of persistent data structures; either use them, or study the algorithms and re-implement them in your language of choice.
▶ No.896509
>>895743
>200MHz CPU is blazing fast
but anon, a 200MHz CPU is blazing fast. You only need more if you plan to run the latest Macroshit Wangblows craperating cistern wid' Enpantsed Jewgle Crowd Pthtoorage Gapeability.
▶ No.896513
▶ No.896516>>896519
▶ No.896519
▶ No.896520
>>896253
are there any viable alternatives to shill?
getting rid of stdio sounds like a good idea.
▶ No.896522>>896526
>>896315
>not knowing the difference between an interface and an implementation
▶ No.896524>>896535
>>896253
>until people stop using stdio
aren't plain "string" literals in C also generating an extra zero byte at the end?
it seems people also need to eschew "string" literals in C and use something like a macro which expands to a constant array of bytes or something else.
I mean it's doable but not really ergonomic in plain C.
it's easier to get rid of this shit in C++ perhaps.
▶ No.896526>>896527
>>896522
>hurr durr just make the memory a linked list
literal brainlet
▶ No.896527
>>896526
can you even read plain English?
▶ No.896535>>896538 >>896546 >>896655
>>896253
>We won't ever be truly free of this moronic C string business until people stop using stdio.
Except null-terminated strings are embedded right in the language, with string literals being null-terminated.
>>896524
>use something like a macro which expands to a constant array of bytes or something else
Alas, I am afraid such a thing is not possible with the C preprocessor.
However, for all the LARPers out there, keep in mind GCC and Clang/LLVM are Free as In Freedom™ software, that anyone can modify. You're all such C expert senior engineers, writing a GNU extension for non-asciiz strings should be TRIVIAL.
▶ No.896538>>896542
>>896535
I will stick with LLVM. A compiler that respects my freedom, unlike the restrictive GCC.
▶ No.896542>>896543 >>896545
>>896538
Cuck, with a cuck license. How's your wife's son?
▶ No.896543>>896592
>>896542
What's cucked about it? I can go sell it to random fucks and no one can stop me. The developers are cucked. The users are the least cucked possible. This is in comparison to your GPL compiler. Under the GPL the developers are not as cucked as BSD developers, but they are still cucks compared to proprietary. The GPL users are more cucked than BSD users because they are bound by more terms.
▶ No.896545
>>896542
>Open Source
lol the cucks are fighting again
▶ No.896546>>896547 >>896634
>>896535
>Except null-terminated strings are embedded right in the language
No. There is no string datatype at all in C. It's a convention to use NUL-terminated arrays of char, and that's what standard library functions expect. You're free to implement your own functions and libraries which handle whatever string datatype equivalent you come up with in whatever way you like.
▶ No.896547>>896548 >>896554 >>896709
>>896546
>It's a convention to use NUL-terminated arrays of char
So how do you explain the fact that this is null-terminated by the compiler?
const char* foo = "bar";
▶ No.896548
▶ No.896554>>896555 >>896707
>>896547
String literals are syntactic sugar.
▶ No.896555
>>896554
So no language has anything. Got it. Its all just syntactic sugar on top of machinecode.
▶ No.896567
>A character string literal has static storage duration and type "array of char", and is initialized with the given characters. A wide string literal has static storage duration and type "array of wchar_t", and is initialized with the wide characters corresponding to the given multibyte characters. Character string literals that are adjacent tokens are concatenated into a single character string literal. A null character is then appended.
>A string is a contiguous sequence of characters terminated by and including the first null character. It is represented by a pointer to its initial (lowest addressed) character and its length is the number of characters preceding the null character.
>A character string literal need not be a string (...), because a null character may be embedded in it by a \0 escape sequence.
https://port70.net/~nsz/c/c89/c89-draft.html
▶ No.896585
>null
>NULL
This is how fucktarded this guy actually is.
▶ No.896592>>896593 >>896594 >>896725
>>896543
>I can go sell it to random fucks and no one can stop me.
Can you please remind me of which part of the GNU Public License (version 2 or 3) forbids the user from selling a copy?
▶ No.896593
>>896592
>the user from selling a copy?
Look, we both know that's bullshit. It's theoretically possible to have someone pay for GPL code, but when they get the source you are gonna have a real hard time charging for it a second time.
▶ No.896594>>896613
>>896592
>one person buys
>can legally upload it to every other person
lmao
▶ No.896613
>>896594
>"NOT EVEN MERCHANTABILITY"
▶ No.896634>>896638
>>896546
>that's what standard library functions expect. You're free to implement your own functions and libraries which handle whatever string datatype equivalent you come up with in whatever way you like.
THIS is what I think we need. Quite frankly, C stdlib is pure garbage. I'm working on and off on a project of this type, a custom freestanding C library based on Linux. Once it has a reasonable set of features to make it useful, I will publish it under MIT.
▶ No.896638>>896645 >>896649
▶ No.896645>>896649
>>896638
Yes. I don't particularly care about improvements being sent back to me. I just want to stop using libc and start using Linux directly because frankly the Linux interfaces are a LOT better. If other people think my code is useful, I want them to please use it.
▶ No.896649>>896706
>>896638
>>896645
In fact, I'm personally rather wary of "improvements" that get sent since they can be a curse in disguise. Have you SEEN glibc source code? It's a mess. Even something as simple as an strlen implementation is huge and needs truckloads of comments to explain what the fuck is happening, all so it can scan lots of data at once to improve performance while looking for the NUL.
I want my code to be simple so that I, and maybe even other people, can immediately understand it when reading it. The license allows you to do whatever you want, so you can just supply your own highly optimized functions if it matters that much.
▶ No.896655
>>896535
>Except null-terminated strings are embedded right in the language, with string literals being null-terminated.
Just because string literals are NUL terminated doesn't mean that you can't keep count yourself. As far as I/O goes, only stdio requires NUL terminated strings; the kernel interfaces do not.
▶ No.896656
>>896315
What stops car from returning
array[0]
and cdr from returning the slice
array[1..-1]
?
▶ No.896658>>896674 >>897048
>>896124
>You could mmap that file and seek it on demand
How to recover from SIGSEGV properly?
▶ No.896674>>896770
>>896658
You just define a signal handler and keep on reading
▶ No.896706>>896711 >>896748 >>896773
>>896649
>Even something as simple as an strlen implementation is huge
size_t strlen(char *s)
{
    int i = 0;
    while (s[i++]);
    return (size_t)(i - 1);
}
>huge
▶ No.896707>>896773 >>897174
>>896554
It's not like "bar" is syntactic sugar for {'b', 'a', 'r', '\0'}, because
char *s = {'b', 'a', 'r', '\0'}
doesn't work. Then what is "bar" exactly syntactic sugar for?
▶ No.896709
>>896547
>const char* foo = "bar";
>char* foo
In the following line
int* foo, bar, baz;
the retarded style you gave an example of suggests that all three declared variables are pointers to int, which is not the case. That's why the asterisk is supposed to stick to the identifier and not to the type, like this:
int *foo, bar, baz;
so it's obvious what is what.
▶ No.896711>>896716 >>896773
>>896706
What is the shortest possible strlen implementation that werks? Anything shorter than the one below (52 bytes)?
int strlen(char*s){int i=-1;while(s[++i]);return i;}
▶ No.896716>>896764 >>896773
>>896711
Here are three alternatives, but all still same length:
int strlen(char*s){int i=0;while(*s++)++i;return i;}
int strlen(char*s){int i=-1;for(;s[++i];);return i;}
int strlen(char*s){int i=0;for(;*s++;++i);return i;}
Looks like 52 byte strlen might be tough to beat.
▶ No.896725
>>896592
Have fun selling binaries right next to the readable source code.
▶ No.896748
>>896706
It's huge in glibc, yes. 78 lines not including the license comment at the top.
▶ No.896764
>>896716
Here's my attempt.
int strlen(char*s){return*s?strlen(s+1)+1:0;}
It clocks in at 45 characters. It's kind of cool how I was able to remove all the white space from the function's body.
▶ No.896770
>>896674
Nope. Returning from the signal handler means returning to the point in the code that triggered the SEGV.
▶ No.896773>>897162
>>896706
>>896711
>>896716
Mine looks similar to that, but it uses pointer difference and checks for NULL.
Now check out the glibc strlen function. It's fucking huge.
>>896707
It's sugar for a const char array.
▶ No.896779>>896781 >>896830 >>897048
>Have advanced string object with separate size variable
>Check size and be ready to add data at the end
>The actual string is much shorter
>Either segfault or security fun
▶ No.896781>>896829
>>896779
When would that happen?
Is it a bigger danger than, for example, unterminated classical C strings?
Many languages before and after C have had counted strings.
▶ No.896829>>896859
>>896781
>When would that happen?
Even less chance than a classic buffer overflow.
Someone has to write exceptionally stupid code to produce an invalid string object (that is, one with an invalid length)
▶ No.896830
>>896779
>>The actual string is much shorter
This won't happen unless someone deliberately tries to change length value to some nonsense.
It's no different than producing any other kind of array with invalid length.
The string in this case is simply a special kind of an array.
Do you mean people should not use arrays as well and use some sentinel to terminate them? And what if there is no value available to act as a terminator, for example, if the array stores bytes and every byte value can validly occur in the content?
Do you understand you just added more bullshit to this already drowning in bullshit thread?
▶ No.896859>>896889 >>896892 >>896896
>>896829
>Someone has to write exceptionally stupid code to produce invalid string object (that is, which has invalid length)
Given that C does not allow for private structure members, it is indeed perfectly possible to indicate a wrong length. A classic case of off-by-one when implementing some kind of concatenation function would do that.
▶ No.896889
>>896859
The solution is to not touch the struct directly. Just because you can doesn't mean you have to.
▶ No.896892>>896952 >>896988
>>896859
>Given that C does not allow for private structure members, it is indeed perfectly possible to indicate a wrong length.
Come on, how hard is it not to screw up a size variable? You almost never touch them anyway. You get them as parameters. If your programmers can't stop themselves from accidentally overwriting a pointer + size pair, they should probably be writing Java. Lengths are explicit and really hard to screw up, completely unlike the "hurr I forgot a NUL terminator" bugs.
Strictly speaking, you could easily provide an opaque structure (incomplete, forward-declared type accessible only through pointers) with accessor functions.
/* string.h */
struct string;
size_t string_length(struct string *);
char *string_data(struct string *);
/* string.c */
struct string { size_t s; char *p; };
size_t string_length(struct string *s) { return s->s; }
char *string_data(struct string *s) { return s->p; }
But this would forbid stack allocation of the string structure and force people to use functions to access member variables when it's just not necessary. In fact, this would be nearly indistinguishable from a generic dynamic array library. Indeed, most C stdlib str* functions are pretty much equivalent to their mem* counterparts, save for the NUL terminator handling.
▶ No.896896>>896898 >>896988
>>896859
>A classic case of off-by-one when implementing some kind of concatenation function would do that.
Yes, but that happens in virtually every other language as well. If you calculated an index or length incorrectly, it's a logical/mathematical error. If you failed to NUL-terminate the C string, it's a simple human forgetfulness error.
Using explicit lengths gets rid of the latter class of errors, while only Haskell might be truly immune to the former.
▶ No.896898>>896907 >>896911
>>896896
>If you failed to NUL-terminate the C string, it's a simple human forgetfulness error
you know what?
it's also possible to accidentally include an extra NUL where it's not allowed to be, and that is harder to detect (no out of bounds access, no segfault) and can have even worse implications; actually the article in the OP is about exactly this.
▶ No.896907>>896908 >>896910 >>896911
>>896898
wow anon what about the implications of accidentally including NULL in the middle of a linked list. oh shit someone call blackhat.
▶ No.896908>>896909
>>896907
zero byte is a valid character in many encodings.
now go learn something about computing and programming, ffs.
▶ No.896909>>896925
>>896908
Yeah which encodings.
▶ No.896910>>896916
>>896907
Retard. A zero in a cadr doesn't terminate the list.
▶ No.896911>>896916 >>896922
>>896898
True. That's why I don't like encoding data in-band. Arrays are simple, they are just a memory segment, a pointer to the start and the length, and they're a general data structure that can hold anything. C strings and other similar stuff constrain this simple concept. Suddenly, it can't hold arbitrary data anymore; it can hold anything except a zero byte, because the zero byte is now a special value that encodes the length of the array within the array itself, and if you accidentally put a zero value anywhere you end up cutting the string into pieces.
>>896907
NULL is a pointer. The data contained by the linked list is completely oblivious to the NULL pointer handling.
▶ No.896916>>897028
>>896910
lol
>>896911
hmm if only strings were more like arrays
▶ No.896921>>896922
>>895811
>but it's also clearer
So clear that you made an error while writing it.
▶ No.896922>>896924 >>897028 >>897038
>>896911
>C strings and other similar stuff constrain this simple concept
Like?
>Suddenly, it can't hold arbitrary data anymore
Well no shit, retard, it's a fucking string. Strings are supposed to encode human-readable text, not arbitrary data, unless that data is itself encoded in a human-readable (or at least printable like base64) format. Want arrays of arbitrary data? Use arrays instead of being a retard. I bet you're the kind of person who complains a stack interface doesn't have a function to access an arbitrary element of it.
>>896921
Autist
▶ No.896924
▶ No.896925>>896930 >>897185
>>896909
ASCII and UTF-8, for example.
C is unusual in having trouble with null bytes, and some parts of C and Unix do support them. Try this to see Python, C/Unix and UTF-8 work with a null byte:
python3 -c 'print("foo\0bar")' | cat -v
▶ No.896930
>>896925
thank you for freeing me from the burden of explaining this to him. really appreciate that.
▶ No.896952>>897009 >>897038
>>896892
>Everyone is playing nice
>Nobody will try to put invalid data in something you will have to rely on
I can see CIA rubbing its hands
▶ No.896988>>896996 >>896998 >>897043
>>896892
>how hard is it not to screw up a size variable?
A single one? Not hard. However, consider the following game of statistics: say there is a 95% chance that you get a line of code right without needing to touch it. If you implement a complete, modern string library, the chances that you will not commit ANY error in 1,000 lines are 0.95^1000, which is to say, 0. On average, it means 50 bugs per 1,000 lines.
That's how easy it is to screw up. Even when writing trivial shit.
>>896896
it happens in every other language, yes, but those languages have bounds checking and hide the internals of the string implementation behind a structure.
Also, not NULL-terminating a string can easily be a logical or mathematical error as well (typical example of appending a string to another, but copying 1 byte less than its length, thus removing the null character).
If you want safe strings, hide them behind an interface that does bounds checking, but that's NOT going to be fast.
▶ No.896996>>897010
>>896988
>That's how easy it is to screw up.
those statistics would have to include pajeet and the soyfags.
▶ No.896998>>897010 >>897048
>>896988
If you've got one string library that'll be used in thousands of programs it's reasonable to invest ten times the effort to make sure that it works the way it's supposed to. Making mistakes is easy, but when you use a library instead of C-style strings there's a lot less code using unsafe primitives when you add it all up.
Do you have some sort of benchmark or other source to support the claim that safe strings aren't fast? I'd expect slight performance loss in most (but not all) cases, but usually nothing significant. Safe strings aren't something modern that they started using when computers became faster. BCPL had them.
▶ No.897009
>>896952
we are talking about in-memory data.
when you get the string from somewhere, you of course allocate the right amount of memory and the right length is stored. do you worry about someone overwriting the memory of your process? then you have bigger threats to worry about, and you probably want to look for techniques for radiation-safe software development, that is, software that is resilient to radiation-induced bit flips.
otherwise, it's obvious bullshit.
▶ No.897010>>897013 >>897015
>>896996
Admittedly, it's a number I pulled out of my ass, but if you've ever done any development beyond a fizzbuzz, you know that getting a program right from the start is impossible. Even if the actual statistics of errors was 1 error per 200 lines (99.5% correct lines), it still accumulates pretty fast.
>>896998
>Do you have some sort of benchmark or other source to support the claim that safe strings aren't fast?
Array bounds checking is slower than not checking, although admittedly usually not by much, as modern processors include branch prediction. And the larger the string, the lower the impact of branch prediction misses.
▶ No.897013
>>897010
>Array bound checkings is slower than not checking
for the most critical parts it can be optimized away, but it's not a reason to eschew checking everywhere; the golden rule of optimization is to optimize at the bottleneck.
also, modern compilers can prove correctness of access in some places and optimize it away. I remember that Java's JIT does that in many cases.
▶ No.897015>>897018 >>897019
>>897010
lines of code don't mean anything.
you can split and join lines arbitrarily without changing the meaning of code at all.
▶ No.897018
>>897015
And you could write a book with one fucking sentence. I bet that's meaningless too.
▶ No.897019>>897022
>>897015
Either you didn't read or don't know what you're talking about.
▶ No.897022>>897034
>>897019
>1 error per 200 lines
One really long line of code = no errors
▶ No.897028>>897048
>>896922
>Like?
Other sentinel-using data structures.
>>896916
They are. Just keep their length around in a variable.
▶ No.897034
>>897022
I think you just hacked programming.
▶ No.897038>>897040
>>896922
>Well no shit, retard, it's a fucking string. Strings are supposed to encode human-readable text
Kill yourself brainlet. C strings have absolutely no notion of an encoding, and even if they had, the encodings themselves usually attribute some meaning to code point zero. In ASCII it is a control code which functions as a no-op. It is absolutely valid for an ASCII- or UTF-8-encoded string to have a null byte in it. The fact that including a 0 byte in a string makes software written in C truncate it is a bug.
>>896952
Most exploits happen in the input handling layer, actually. Unless the CIA can subvert the kernel's most fundamental I/O syscalls and make them return bogus values, I'd say you will always know the exact size of your data.
▶ No.897040>>897046 >>897056
>>897038
>Most exploits happen in the input handling layer
this
>C strings have absolutely no notion of an encoding
not op but I like having a generic unicode type that is encoding-independent. Haskell does this for example: you have a type called Text with its own operations, and you can encode/decode it to whatever you want, while the abstraction is independent of it.
▶ No.897043
>>896988
>those languages have bounds checking and hide the internals of the string implementation behind a structure.
These internals are essentially just a length and a pointer. Out of bounds access is still a bug, even with bounds-checking; the only difference is the runtime won't actually allow the access to take place, raising an exception instead.
>Also, not NULL-terminating a string can easily be a logical or mathematical error as well (typical example of appending a string to another, but copying 1 byte less than its length, thus removing the null character).
Use explicit lengths and you simply don't have to think about this at all.
>If you want safe strings, hide them behind an interface that does bounds checking, but that's NOT going to be fast.
Obviously, checking your bounds on every single access is going to be safer than not checking. Nobody even said anything about access here. We're talking about the different approaches to encoding the length of the string.
▶ No.897046>>897048 >>897142
>>897040
>Haskell does this for example where you have a type called Text and you have operations for it and can then encode / decode it to whatever you want, while the abstraction is independent of it.
This is how I would model it as well.
The fact is the folks who made C and Unix made A LOT of assumptions about these things. C strings are supposed to be "text" but are in fact just 0-terminated byte arrays containing arbitrary non-null data. There's no actual string type; C string literals are just array literals with an implicit zero at the end, and to complement that constrained array type C has a whole roster of str* functions that take zero-termination into account and are otherwise equivalent to their mem* counterparts. It's clearly a specialized array. This is why a dynamic array is sufficient for 100% of C string operations.
If I were to write a string library, I'd do something like:
struct memory {
    size_t size;
    char *pointer;
};
enum encoding {
    UTF8 = 0,
    UTF16,
    UTF32,
    ASCII,
    /* ... */
};
struct text {
    struct memory memory;
    enum encoding encoding;
};
The fact is text is just encoded memory. C's entire notion of 0-terminators makes it so only a (large) subset of encodings is supported, and requires memory handling functions to be duplicated as 0-terminator handling versions. The only advantage of this design is the sheer minimalism of the data structure itself.
▶ No.897048>>897049 >>897062
>>896658
>How to recover from SIGSEGV properly?
The computer does it properly, so it's the OS's fault. C and UNIX can't do it because they suck. Look up what segmentation fault means in Multics.
>>896779
>Check size and be ready to add data at the end
>The actual string is much shorter
You're talking about data corruption. If you mixed those strings with C strings, you might get a buffer overflow that changes the length, but C buffer overflows could corrupt anything.
>>896998
>I'd expect slight performance loss in most (but not all) cases, but usually nothing significant.
I'd expect a huge performance increase in all cases except for one, which is when you are parsing a string one character at a time, parallelism won't help you in any way, and you don't care about how many characters are remaining. Everything else is much faster when you know the length.
>Safe strings aren't something modern they started using when computers became faster. BCPL had them.
FORTRAN had Hollerith strings in source code because knowing the length ahead of time is much faster. Later on, readability and not having to manually count the length of every string became more important than raw speed. Strings in most languages were safer and faster than C strings. I'd say that the acceptance of null-terminated strings is because people don't care as much about efficiency as they used to. Lisp machines were about making dynamic languages faster and simpler by checking type tags in hardware, but people today don't care as much if they're fast.
The source of UNIX stupidity, B, used EOT to terminate a string, so the use of null is totally arbitrary. If you couldn't put ASCII character 04 in a string, UNIX weenies would say it's "stupid" to want to use that character in a string.
>>897028
>Just keep their length around in a variable.
That sounds like a good idea, but you should drop the null or you'll end up with multiple "length" variables and they won't all be equal because someone will put a null character somewhere.
>>897046
>C has a whole roster of str* functions that take zero-termination into account and are otherwise equivalent to their mem* counterparts.
Except there's no way to give many of them a string length at all.
BTW, I had to replace the NUL characters (etc.) in the above
line with caret-atsigns because when I tried to send the
message the first time the line did not appear since some
Berserkely C blabberer with less than two fingers of
forehead decided to write a mailer that reads messages with
gets or some equally braindead substitute for an input
reader and just drop the non-printable characters (rather
than bounce and complain or something semi-reasonable).
▶ No.897049
>>897048
ffs again if you are going to block quote things include text describing who / what its from. quotes with no context are no authority
▶ No.897056>>897076
Here's a fun little test:
$ python3 -m timeit -s 'a = "c" * 1_000_000; b = "c" * 1_000_001' 'a == b'
10000000 loops, best of 3: 0.0265 usec per loop
Those are one-megabyte strings. Try that in C.
>>897040
That's also the route Python went with version 3. Strings in Python 2 were just sequences of bytes, but it now has a bytes data type and a string (unicode) data type.
The underlying representation of the text is abstracted away. You can use ord and chr to go to and from unicode code points, if you want, and you can use .encode() and .decode() to go to and from a bytes representation, but unless you explicitly ask for it, you're never confronted with the gory details. It keeps you sane.
▶ No.897062>>897071
>>897048
>If you mixed those strings with C strings,
>>you should drop the null or you'll end up with multiple "length" variables and they won't all be equal because someone will put a null character somewhere.
One shouldn't mix these different types. It's clear that general arrays don't have the same constraints as 0-terminated char arrays. Custom string types have even more elaborate semantics; they could have separate length and capacity fields to track the length of the actual text and of the allocated memory.
In practice, most string libraries out there use that length+capacity design, and allocate an extra byte for a "hidden" null byte at the end of the memory they maintain and also take care to correctly set the null byte after every operation. They do this so you can pass their pointers to C stdlib functions. Personally, I don't care very much about this because I advocate talking to the kernel directly instead of using the extremely limited C stdlib, but I can understand why that'd make their string library better.
>The computer does it properly, so it's the OS's fault. C and UNIX can't do it because they suck.
Yeah, signals in Unix were pretty much a huge mistake. Still, I'd love to see a way to recover from SIGSEGV reliably, at least in Linux. Imagine I'm writing a JIT compiler by mmaping executable pages and the generated code causes a segmentation fault or even illegal instruction errors; I'd like to handle those errors.
I ask this question every single time SIGSEGV is mentioned and to this day nobody answered...
>Except there's no way to give many of them a string length at all.
Yeah. I think it's hilarious how they had to make things like strn* versions of functions so that it would be safer to use those functions. It's just backwards. The mem* functions are the right thing.
▶ No.897071>>897072 >>897076
>>897062
I am not familiar with the mem* functions. Which functions do you mean?
▶ No.897072>>897076
>>897071
oh do you mean like memcpy?
▶ No.897076>>897091
>>897056
God I love Python3, and I used to think Python was PHP tier in the Python2 days. The new version really cleaned up the language.
The new string type is amazing. It uses unicode so it does the right thing by default, and there's even unicode metadata integration. Best of all is how they don't treat strings as just arrays of bytes/code points anymore.
'ほげほげ'[::-1]
# => 'げほげほ'
'čšž'[::-1]
# => 'žšč'
The old string type simply became the bytes type, which is actually appropriate. Lots of people just did I/O and used whatever came in or out as opaque data, and the bytes data type is absolutely appropriate for this use.
>>897071
>>897072
Yes.
▶ No.897104>>897110
▶ No.897108>>897163
>>897091
Yeah memcpy has that stupid limitation because muh efficiency. Always use memmove whenever possible.
▶ No.897110>>897131
>>897104
I thought you asked what memcpy did lmao
▶ No.897131
>>897110
the literacy of this one wew
▶ No.897142>>897144
>>897046
>The only advantage of this design is the sheer minimalism of the data structure itself.
You should now be aware that the entire edifice of Unix was built on this.
▶ No.897144
>>897142
He is, and this is a bad thing. Lisp machines are also shit tho.
▶ No.897162>>897167
>>896773
>It's sugar for a const char array.
If ptr[n] is sugar for *(ptr+n) and ptr->m is sugar for (*ptr).m then "foo" is sugar for... what exactly?
▶ No.897163
>>897108
>muh efficiency
Always use asm (specifically MOV) whenever possible.
▶ No.897164>>897175 >>897548
Speaking of C, why do bit fields "look good on paper" but are so bad in practice? The implementation-dependent issues related to endianess/order, alignment, padding etc. make them basically a non-contender compared to just using standard datatypes and adressing specific bits with bitmasks etc.
▶ No.897167>>897168 >>897175
>>897162
{'f', 'o', 'o', '\0'}, more or less.
▶ No.897168>>897174 >>897175
>>897167
(it's not equivalent, so it's not quite syntactic sugar, but it's close enough to serve the broader point that C strings are a convention supported by the syntax)
▶ No.897174>>897176
>>897168
>it's not equivalent
Which was already pointed out here >>896707
Using a string literal such as "foo" puts a const array of char in the heap but itself stands for a pointer to its first element (and thus can be assigned to a pointer variable), while {'f', 'o', 'o', '\0'} represents an actual array and can basically only be used to initialize an array variable on the stack (or a struct).
▶ No.897175>>897184 >>897196
>>897164
You basically answered your own question.
>>897167
>>897168
It's more like
(const char[]) {'f', 'o', 'o', '\0'};
https://ideone.com/CbWfE2
▶ No.897176>>897180
>>897174
>Using a string literal such as "foo" puts a const array of char in the heap
>heap
It's allocated in a read-only ELF section for constants.
▶ No.897180>>897183
>>897176
>read-only ELF section for constants.
And where do you think the segments of the ELF file are loaded?
▶ No.897183
>>897180
In the process's address space.
▶ No.897184>>897186
>>897175
>casting shit to array type
I'm afraid I cannot do that, Dave
▶ No.897185>>897343
>>896925
>C has trouble
>cat -v doesn't
hmm
▶ No.897186
▶ No.897196>>897198 >>897203 >>897209 >>897217
>>897175
#include <stdio.h>
int main(void)
{
    const char *s1 = "foo";
    const char *s2 = (const char[]){'f', 'o', 'o', '\0'};
    const char s3[] = {'f', 'o', 'o', '\0'};
    printf("%p\n", s1); /* heap */
    printf("%p\n", s2); /* stack */
    printf("%p\n", s3); /* stack */
    return 0;
}
The above shows how "foo" and (const char[]){'f', 'o', 'o', '\0'} are still not the same.
▶ No.897198>>897200
>>897196
That's an implementation detail not the standard.
▶ No.897200>>897201
>>897198
But if you want to call one piece of syntax "sugar" for another piece of syntax then you must be able to rely on them both to work exactly the same in all contexts.
▶ No.897201
>>897200
I don't call it sugar, not OP.
▶ No.897203>>897204
>>897196
Brainlet detected, are you the same anon who still thinks const char* literals are in the heap?
▶ No.897204>>897205
>>897203
They literally are tho.
▶ No.897205>>897206
>>897204
They're in .rodata, maybe learn about C before you try to write code in it
▶ No.897206>>897207
>>897205
And where do you think that is stored lol?
▶ No.897207>>897211
>>897206
In the address space, usually contiguous with the other data segments. Do you even know what a heap is lol?
▶ No.897209>>897218
>>897196
>heap
>stack
Kill yourself, brainlet.
▶ No.897211
>>897207
Not only that, this kind of process image data is usually allocated at one end of the address space, while the kernel is at the other end, and the processor stack and the process break grow in opposite directions towards the "middle" and each other.
Memory map lets you assign any part of the address space, though. It's no longer neatly sequential like process break and stack. Virtually all memory allocation libraries use mmap.
Nowhere in this picture does a "heap" exist.
▶ No.897214>>897216
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
const int x = 0xf00;
int main(void) {
    const char *s1 = "foo",
               *s2 = (const char[]){'f', 'o', 'o', '\0'},
               s3[] = {'f', 'o', 'o', '\0'};
    char *s4 = malloc(sizeof(s3));
    memcpy(s4, s3, sizeof(s3));
    printf("%p\n", &x); /* rodata */
    printf("%p\n", s1); /* rodata */
    printf("%p\n", s2); /* stack */
    printf("%p\n", s3); /* stack */
    printf("%p\n", s4); /* heap */
}
Roast me.
▶ No.897216>>897217 >>897238
>>897214
Yes, and? They are all 0-terminated arrays of char. Where they are allocated does not matter.
▶ No.897217
>>897216
I was just illustrating the problem with >>897196, not making an argument although I find nul-termination distasteful
▶ No.897218>>897221 >>897227 >>897228 >>897292
>>897209
Books should off themselves too?
▶ No.897221
>>897218
Maybe you should read that book to find out what those words mean :^)
▶ No.897227>>897233 >>897265 >>897269 >>897271
>>897218
Yes.
The "stack" is just a pointer maintained by the processor in a register. Managing stack memory consists of incrementing or decrementing this pointer. That's it. The memory lies on the address space. It's fast, but ephemeral due to the nature of functions. You can execute multiple threads/functions/coroutines all with separate stacks on one processor; simply adjust the pointer. You can have split stacks by allocating more memory (from the "heap") and adjusting the stack pointer to point there. It's not just "the stack", it can be much more depending on the language.
A "heap" is essentially a buzzword for dynamic memory allocation. As you demonstrated yourself with your confusion, the definition is so vague it engulfs all non-stack memory, even the kernel and process images mapped onto the ends of the address space. People actually believe that all non-local variables are allocated on some "heap". The world doesn't actually work like that. There is no "heap". The "heap" is some kind of elaborate lie told in the name of abstraction. I don't know who invented this garbage but he should be shot.
The real, simple truth is you can allocate memory by extending the process image break (just a pointer, similar to stack) or by mapping new sections of the address space. Most libraries use mmap, so in practice the "heap" is just a big block of memory malloc asked Linux to map onto the process's address space. All sorts of complicated setups can be created, though. For example, you can create an unreadable, unwritable, execute-only memory region for JIT compiled code.
You're not alone. This shitty buzzword is firmly entrenched in the minds of most programmers, especially those who use virtualized languages. They use it to refer to this mythical heap thing, which is where memory magically comes out of. It's why the people who wrote your book used the word. Nobody really asks why these things are the way they are.
▶ No.897228>>897233 >>897263 >>897265
>>897218
Another thing about your book that's ridiculous and a major pain in the ass is the whole dichotomy between values and references. People simply don't understand that pointers are themselves values. They think some magic happens when you "pass by reference" and that it somehow lets you access things outside your scope.
The simple truth is you're always passing around values. There is no way to call functions with anything but values as arguments. Pointers are values that represent the address of your data. The value of the pointer itself is copied.
▶ No.897233>>897264
>>897227
There's no need for this confusing shit, it's pretty much a given that when someone says "heap" they're referring to the memory pool used by malloc.
>>897228
There's no magic, but it's important to distinguish values from addresses. On an architecture with separate data and address lines the references would be passed using entirely different registers.
▶ No.897238>>897242
>>897216
>const char s[] = {};
Does not look like it's stored as an array of char at all.
▶ No.897242
>>897238
nvm looks like they are just stored as stack offset instructions.
▶ No.897263>>897264
>>897228
>pointers are themselves values
Yes, but values of a different kind: memory addresses (along with an element size specific to the pointed-to type, which enables pointer arithmetic, unless it's a (void *), in which case it's just a bare contextless address) rather than things like ints, floats etc. I like to think of pointers as meta-variables, i.e. variables that can store/represent different variables: a pointer to int can be assigned the addresses of different int variables at different times, a pointer to a function with certain parameters and return type can represent different functions at different times, and so on.
▶ No.897264>>897266
>>897233
It's a "given" yet the guy above called .rodata the "heap", and even when used to mean malloc memory pools it's still wrong given other data structures are used to keep track of memory.
It's hard enough to name things in computer science, last thing we need is awfully vague terminology consuming all meaning.
>>897263
Not really. Specifically, on x86_64, a pointer is pretty much an unsigned long, i.e. a 64-bit register-sized integer. The number represents an address in the 64-bit address space. Look at the assembly: the instructions are the same, the pointed-to type just scales the pointer arithmetic by its size. It's just a number.
▶ No.897265>>897286
>>897227
>>897228
So what books do you recommend that teach proper memory management terminology and practices? Preferably something concise and to the point (no pun intended) rather than one of the usual 800 or 1300 page mammoth doorstop behemoths where anything interesting only starts shining through well after page 150 (after all of the contents, detailed contents, forewords, prefaces to the first as well as various following editions, introductions, acknowledgments, etc. etc.).
▶ No.897266>>897286
>>897264
>Specifically, on x86_64
Whoa there. That's implementation-specific. The standard doesn't even specify what the internal representation of a pointer is (hence the %p format specifier and the NULL macro, among other things), so let's not just assume things, m'kay?
▶ No.897269>>897310
>>897227
>there is no spoon, the cake is a lie, ur heap a shit
▶ No.897271>>897286
>>897227
So all of these search results are basically mostly confused and misguided people talking out of their asses?
▶ No.897286
>>897265
Sys V ABI documents and the ELF specification. The Linux Programming Interface. Linux system calls (partly POSIX compatible).
>>897266
Standards don't run code, processors do. Odds are you're running either x86_64 or ARM. I'm very much interested in the semantics of these platforms.
>>897271
>stack variables can't be accessed by other functions
>heap variables are global in scope
>stack is static memory allocation
>I memorized that objects allocated with new go on the heap
Yeah these people are pretty confused. At least they stay on the topic of dynamic memory allocation.
▶ No.897292>>897299
>>897218
>introduction to a proprietary botnet in the last chapter
>in a fucking book
yes.
▶ No.897299>>897301
>>897292
>proprietary botnet
The book's title is "Programming for Engineers - A Foundational Approach to Learning C and Matlab", so there's no surprises I guess. It focuses mostly on C though.
Also, it's not even its final chapter.
▶ No.897301>>897303
>>897299
you didn't reveal the title until this point.
▶ No.897303>>897306
>>897301
The point being? The fact it has some chapters on Matlab towards the end was irrelevant to the discussion on memory from a C point of view.
▶ No.897306>>897309
>>897303
Without knowing the title of the book, it's harder to evaluate if it's worth its salt or not. I know we can search by chapter names, etc., but there can be collisions and whatnot
▶ No.897309
>>897306
It's certainly unusual for a "foundational approach to learning C" type book in that it immediately introduces pointers and under-the-hood memory concerns. The overwhelming majority of texts talk about pointers no earlier than halfway through the book (though that delay does bring about that revealing moment when the reader finally understands how arrays really work and why he didn't need the & operator with the %s specifier in scanf()), and many books on the language don't talk about the stack or memory organization at all, just about how to use pointers etc.
▶ No.897310
>>897269
>1999: "there is no spoon"
>2007: "the cake is a lie"
>2018: "you're \"\"\"heap\"\"\" a shit"
kek
how will we ever recover
▶ No.897343
>>897185
>>and some parts of C and Unix do support them
putchar('\0') works fine, to name one. It's a problem if you work with C-style strings but not even everything in the standard library works with C-style strings - it's only a convention.
▶ No.897347>>897352 >>897465
$ cat test.c
#include <stdio.h>
const char s[] = {'h', 'e', 'l', 'l', 'o', '\0'};
int main (void)
{
puts(s);
}
$ gcc test.c; and md5sum a.out
5f6bdc1973f14a557f104df5e44cb259 a.out
$ cat test.c
#include <stdio.h>
const char s[] = "hello";
int main (void)
{
puts(s);
}
$ gcc test.c; and md5sum a.out
5f6bdc1973f14a557f104df5e44cb259 a.out
▶ No.897352>>897357
>>897347
Nice post, I now realize that the other anon was trying to demonstrate that C char* literals are syntactic sugar.
▶ No.897357
▶ No.897445>>897452
>>895874
>data from 2012
lol
▶ No.897452
▶ No.897465
>>897347
>md5'ing the output binary
Just because it happens to compile to the exact same binary doesn't mean the two expressions in the source code are the same. It was already proven above that you cannot directly "assign" something like {'h', 'e', 'l', 'l', 'o', '\0'} to a char pointer (and if you coerce it by casting to an array type, it's going to be allocated on the stack like an array rather than elsewhere, as "hello" would be). So they're NOT exactly the same.
▶ No.897515
>>895728
malloc technically records the length of your object somewhere in the heap for free.
everything being aligned to 4/8 byte offsets means there can be up to 3/7 bytes of padding.
▶ No.897547>>897551
>393 replies
And so this larper playground of a thread is slowly drawing to a close.
▶ No.897548>>897555 >>897805
So you people opposing ^@-terminated strings would like some sort of convoluted clusterfuck of a format in its place (which would likely be clunky, messy and mostly nonportable, like the bit fields mentioned by >>897164)?
▶ No.897555>>897558
>>897548
It would be enough to have a struct containing a char array and a length and some functions and syntactic sugar so that you don't have to touch the struct directly in 90% of programs. What's so nonportable about that?
Bitfields are poorly portable because they have too little abstraction. This is a proposal to add more abstraction.
▶ No.897558>>897559
>>897555
structs are very implementation dependent vs a cstring which you can just push over the network with no extra parsing on the other end
▶ No.897559>>897567 >>897589
>>897558
Nobody is going to push the actual struct over the network, moron.
▶ No.897567>>897581
>>897559
Which is why it's shit. You can do that with C strings. BTW people actually do that all the time with structs.
▶ No.897581
>>897567
Chances are you can do that just fine. Length and pointer together are 128 bits and naturally aligned, so there shouldn't be any padding. You can also just send exactly length bytes of the data referred to by the pointer.
▶ No.897589>>897592 >>897593
>>897559
How do you think network protocol headers are implemented? And you need a guarantee that every implementation of the protocol stores every data unit (be it a 64-bit int or a 1-bit bitfield) at the exact same place, regardless of implementation-specific shit like endianness, alignment, padding etc.
▶ No.897592
>>897589
What stops you from simply
send(socket, string.pointer, string.length, 0);
like a normal person?
▶ No.897593>>897595
>>897589
You know these structures aren't dumped as-is onto the network, right? For starters, there's endianness concerns, as two machines might not share the same, and then there's the smaller problem of C compilers padding structures depending on a lot of architecture-dependent factors, making two structures containing the same data on two different machines potentially different.
▶ No.897595>>897604 >>897706
>>897593
Yeah we have compiler directives for creating packed structs that can be sent out on the network, but I don't know of any way to deal with endianness that doesn't involve conditional compilation.
▶ No.897604
>>897595
>I don't know of any way to deal with endianness that doesn't involve conditional compilation
Because I don't think there is any, unless you limit yourself to using strictly 1-byte data units, such as in a text-based protocol.
▶ No.897671>>897851
>>895727 (OP)
The difference between sentinel-terminated and length-tagged structures in C is negligible for most cases. The real issue with C is memory safety. PHP had some retarded issues because it had functions that treat input as a zero-terminated string and others that treat input as a length-tagged string. But that's because PHP is, and always was, fucking retarded. No other language which claims to be high level is full of basic issues like this.
>be on tor for 10 years
>no ad blocker
>never seen an ad
>just see stuff like this instead:
>If you're using an adblocker, please consider supporting this site via Patreon or PayPal
▶ No.897706
>>897595
>deal with endianness that doesn't involve conditional compilation
htonl htons / ntohl ntohs
▶ No.897805>>897851
>>897548
They probably also hate jagged arrays and linked lists terminated by NULL pointers, and would demand those structures store their length/number of elements at all times instead. Go figure.
▶ No.897851>>897957
>>897805
>linked lists terminated by NULL pointers
pointers in linked lists are not in-band with data, that's a stupidly bad analogy
(someone already explained it ITT by the way, it's sad that it needs to be repeated)
>jagged arrays
who said they need to use sentinel values to encode the length?
>>897671
>The difference between sentinel-terminated and length-tagged structures in C is negligable for most cases
It is not negligible with regard to correctness and code complexity. You seem to miss the bigger picture.
▶ No.897957
>>897851
The bigger picture is that non-trivial C programs are absolutely full of memory errors that even experts who have been doing it for 30 years have trouble with.