>>1017469
>Why not?
Backups are a last defense against data loss.
Incremental backups are backups focused on preventing data loss in the short term: they should only include a very small portion of your files (the ones you work with in that short term) and they should be fast to create and fast to roll back to.
Compressing incremental backups makes them slower to create/verify/roll back, adds a small risk of data loss, and only saves you a bit of dirt-cheap storage space: at that point, you may as well skip incremental backups completely.
>Why not?
Because storage is dirt cheap, while data loss is not.
>if we're talking about incremental backups with almost identical copies of the same files, the raw data could easily be 10 times the size of the compressed archive or more.
Then use some version control software to deduplicate those files and back them up there.
>How would an archiver be able to detect bogus data being received by the disk or coming from the SATA controller?
I didn't mean that. I meant that the SATA controller is going to notice most read errors and either auto-correct them or retry the read, and then your archiver is going to notice most errors in the archive itself via CRCs and other verifications.
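For reference, this is roughly the kind of check an .sfv verifier or an archiver's integrity test performs: a minimal Python sketch of streaming CRC32 verification, with the file name and expected checksum made up for illustration.

import zlib

def crc32_of_file(path, chunk_size=1 << 20):
    # Stream the file in 1 MiB chunks so large files never need to fit in RAM.
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF

# Hypothetical file and expected CRC, as they would appear in an .sfv line.
expected = 0x1C291CA3
actual = crc32_of_file("some_large_file.bin")
print("OK" if actual == expected else "CORRUPT: %08X != %08X" % (actual, expected))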
>but in real life there is silent data corruption
I have to doubt the stats in the CERN study, as they imply a permanent write corruption rate of about 1 in 10^9 bytes over 6 months.
That means I should see bogus hashes on many 1 GB+ files (quick calc says 60%+, sketched below) within 6 months of them being written to disk, and my use case includes a lot of large files that are validated via QuickSFV and stay around for several months.
Yet I have seen only a single one of those files fail hash verification in several years, on a system without ECC or RAID and in general very far from the state of the art.
More importantly, no hash mismatches on 10+ GB files, which should have a 99.99+% chance of corruption.
Let's not even consider system images and backups themselves: even a simple Windows install .iso should go bad more than half of the time.
I guess there was some issue at CERN, such as a disk sector going bad, because the 1 in 10^9 over 6 months figure is completely and utterly unbelievable.
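For what it's worth, the 60%+ and 99.99+% figures above are just 1 - (1 - p)^N with p = 10^-9 per byte over 6 months, treating each byte as independently at risk (a simplification, but good enough for a sanity check). A rough Python sketch of that back-of-the-envelope calc:

import math

def corruption_probability(size_bytes, per_byte_rate=1e-9):
    # P(at least one bad byte) = 1 - (1 - p)^N, approximated here as 1 - exp(-p * N).
    return 1.0 - math.exp(-per_byte_rate * size_bytes)

for label, size in [("1 GB", 1e9), ("5 GB Windows .iso", 5e9), ("10 GB", 1e10)]:
    pct = 100 * corruption_probability(size)
    print("%-18s -> %.2f%% chance of at least one bad byte in 6 months" % (label, pct))

That gives roughly 63% for 1 GB, 99.3% for 5 GB and 99.995% for 10 GB, which is where the numbers above come from.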
>Not being friendly is not the same as being impossible.
In the way I used it, it's close enough.
It means those systems cannot guarantee that appending data will produce a valid result, let alone the data you want.