Good luck with your 256 characters.
255, generally, because null termination. ZFS does 1023, the argument not being “people should have long filenames” but “unicode exists”, ReiserFS 4032, Reiser4 3976. Not that anyone uses Reiser, any more. Also Linux’ PATH_MAX of 4096 still applies. Though that’s in the end just a POSIX define, I’m not sure whether that limit is actually enforced by open(2)… man page speaks of ENAMETOOLONG but doesn’t give a maximum.
It’s not like filesystems couldn’t support it; it’s that FS people consider it pointless. ZFS does, in principle, support gigantic file metadata, but using it would break use cases like having a separate vdev for your volume’s metadata. What’s the point of having (effectively) separate index drives when your data drives are empty?
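For anyone who wants to check the name limits on their own box rather than trust the man page, here is a minimal Python sketch. It assumes a POSIX system where os.pathconf knows PC_NAME_MAX and PC_PATH_MAX; the helper names name_limits and rejects_overlong_name are made up for illustration. It only probes the per-name limit, not whether open(2) enforces PATH_MAX on a whole path.

```python
import errno
import os
import tempfile


def name_limits(path="."):
    """Return (NAME_MAX, PATH_MAX) as reported for the filesystem holding `path`."""
    return os.pathconf(path, "PC_NAME_MAX"), os.pathconf(path, "PC_PATH_MAX")


def rejects_overlong_name(directory):
    """Try to create a file whose name is one byte over NAME_MAX.

    Returns True if the kernel refuses with ENAMETOOLONG.
    """
    name_max, _ = name_limits(directory)
    too_long = "a" * (name_max + 1)
    try:
        with open(os.path.join(directory, too_long), "w"):
            pass
    except OSError as exc:
        return exc.errno == errno.ENAMETOOLONG
    return False


with tempfile.TemporaryDirectory() as tmp:
    print("NAME_MAX, PATH_MAX:", name_limits(tmp))
    print("overlong name rejected:", rejects_overlong_name(tmp))
```

The reported numbers depend on the filesystem under the temp directory, so they may differ from the ones quoted above.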
When you run out of characters, you simply create another 0 byte file to encode the rest.
Check mate, storage manufacturers.
File name file system! Looks like we broke the universe! Wait, why is my MFT so large?!
You want real infinite storage space? Here you go: https://github.com/philipl/pifs
Finally someone uses the fact that compute time is so much cheaper than storage!
that’s awesome! I’m just migrating all my data to πfs. finally mathematics is put to a proper use!
I had a manager once tell me during a casual conversation with complete sincerity that one day with advancements in compression algorithms we could get any file down to a single bit. I really didn’t know what to say to that level of absurdity. I just nodded.
That’s the kind of manager that also tells you that you just lack creativity and vision if you tell them that it’s not possible. They also post regularly on LinkedIn
Send him your work: 1 (or 0 ofc)
Well he’s not wrong. The decompression would be a problem though.
Yeah with lossy compression the future is today!
u can have everything in a single bit, if the decompressor includes the whole universe
You can give me any file, and I can create a compression algorithm that reduces it to 1 bit. (*)
(*) No guarantees about the size of the decompression algorithm or its efficacy on other files
Broke: file names have a max character length.
Woke: split b64-encoded data into numbered parts and add .part-1…n suffix to each file name.
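A rough Python sketch of the "woke" scheme, in case anyone wants to try it. Everything here is an assumption for illustration: the function names, the 200-character chunk size (picked to stay under the usual 255 limit) and the switch to URL-safe base64, since plain base64 can contain "/", which is not allowed in a file name.

```python
import base64
import os
import tempfile

CHUNK = 200  # payload characters per file name; assumed to fit under NAME_MAX


def store_in_names(data: bytes, directory: str) -> int:
    """Encode `data` into the names of empty files; return how many files were made."""
    # URL-safe base64 avoids '/', which cannot appear inside a file name.
    text = base64.urlsafe_b64encode(data).decode("ascii")
    chunks = [text[i:i + CHUNK] for i in range(0, len(text), CHUNK)]
    for index, chunk in enumerate(chunks, start=1):
        name = f"{chunk}.part-{index}"
        # Create a zero-byte file; the name is the only thing that carries data.
        open(os.path.join(directory, name), "w").close()
    return len(chunks)


def load_from_names(directory: str) -> bytes:
    """Rebuild the original bytes from the file names in `directory`."""
    parts = {}
    for name in os.listdir(directory):
        chunk, _, index = name.rpartition(".part-")
        parts[int(index)] = chunk
    text = "".join(parts[i] for i in sorted(parts))
    return base64.urlsafe_b64decode(text)


with tempfile.TemporaryDirectory() as tmp:
    count = store_in_names(b"Check mate, storage manufacturers." * 20, tmp)
    print(count, "empty files,", len(load_from_names(tmp)), "bytes recovered")
```

The directory entries still take space somewhere, so this moves the data rather than making it vanish.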
each file is minimum 4kb
(base64.length/max_character) * min_filesize < actual_file_size
For this to pay off
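The inequality above can be turned into a quick check. This is only a sketch: the function name and defaults are assumptions, with 255 characters per name and 4096 bytes as the claimed minimum cost per file.

```python
import math


def filename_trick_pays_off(file_size, max_name_chars=255, min_file_cost=4096):
    """Evaluate (base64 length / max name length) * min file cost < file size.

    Returns (pays_off, files_needed, overhead_bytes).
    """
    b64_len = 4 * math.ceil(file_size / 3)        # base64 turns 3 bytes into 4 chars
    files_needed = math.ceil(b64_len / max_name_chars)
    overhead = files_needed * min_file_cost
    return overhead < file_size, files_needed, overhead


print(filename_trick_pays_off(1024 * 1024))                   # assuming 4 KiB per file
print(filename_trick_pays_off(1024 * 1024, min_file_cost=0))  # assuming empty files are free
```

With a 4 KiB cost per file it can never pay off, since each name holds under 200 bytes of payload. If empty files really cost nothing, the inequality holds trivially, though this ignores the space the directory entries themselves occupy.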
each file is minimum 4kb
$ touch empty_file
$ ls -l
total 8
-rw-rw-r-- 1 user group 0 may 14 20:13 empty_file
$ wc -c empty_file
0 empty_file
Huh?
Oh, I’m thinking folders aren’t I. Doy…
It seems those are 4 KiB on Linux, interesting to know.
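Same check from Python, as a small sketch assuming a Unix-like system where os.stat exposes st_blocks (counted in 512-byte units). The helper name sizes is made up. The directory figure depends on the filesystem, so it may not be 4 KiB everywhere.

```python
import os
import tempfile


def sizes(path):
    """Return (apparent size in bytes, allocated bytes) for `path`."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512  # st_blocks is in 512-byte units


with tempfile.TemporaryDirectory() as tmp:
    empty = os.path.join(tmp, "empty_file")
    open(empty, "w").close()  # same effect as `touch empty_file`
    print("file:", sizes(empty))
    print("dir: ", sizes(tmp))
```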
It’s all fun and games until your computer turns into a black hole because there is too much information in too little of a volume.
Even better! According to no hiding theorem, you can’t destroy information. With black holes you maybe possibly could be able to recover the data as it leaks through the Hawking radiation.
Perfect for long term storage
Can’t wait to hear news about a major site leaking user passwords through hawking radiation.
i love this comment
Really-long term storage :)
If you have a tub full of water and take a sip, you still have a tub full of water. Therefore only drink in small sips and you will have infinite water.
Water shortage is a scam.
If you have a water bottle and only drink half of it each time, you will also have infinite 💦
Reality is stranger than fiction:
Nice stuff.
I got sold on the point that EOF does not consume less space than “5” because, even though the space taken by the filesystem is the fault of the filesystem, one needs to consider the minimum information requirements of stating starts and ends of files, especially when stuff is split into multiple files.
I would have actually considered the file size information as part of the file size instead (for both the input and the output), because a binary file can include a string of bits that happens to match an EOF, which would falsely end the file. And as such, the contestant didn’t go checking for character == EOF, but used the function that truly tells whether the end of file is reached, which would then be using the file system’s file size information.
Since the input file was 3145728 bytes and the output files would have been smaller than that, I would go with 22 bits to store the file size information. This would be in favour of the contestant as:
- That would be the minimum (hyh) number of bits required to store the file size, making it as easy as possible for the contestant to make more files
- You could actually go with 2 bits, if you predefine MiB to be the unit, but that would make it harder for the contestant, because they would be unable to represent file sizes less than 1 MiB, and would have to increase the file size information bits
On the other hand, had the contestant decided to break the file between bits instead of at byte ends (which, from the code, I think they didn’t), the file size information would require an additional 3 bits.
Now, using this logic, if I check the result:
From the result claimed by the contestant, there were 44 extra bytes (352 bits) remaining.
+ 22 bits for the input file size information
- 22*219 bits for the output file size information because 219 files
so the contestant succeeds by 352 + 22 − (22 × 219) = −4444 bits. In other words, fails by 4444 bits.
Now of course, the output file size information might be representable in a smaller number of bits, but to calculate that, I would require downloading the file (which I am not in the mood for).
And in that case, you would require additional information to tell the file size bits. So:
- 5 bits for the number 22 in the input
- 5 bits for the size of the file size information (I am feeling this won’t give significant gains) and the rest of the bits, as stated in the first 5 bits, as the file size bits
- you waste bits for every file size requiring more than 16 bits to store the file size information
- it is possible to get a net gain with this, as qalc says, log(3145728 / 219, 2) = (ln(1048576) − ln(73)) / ln(2) ≈ 13.81017544
But even then, you have 352 + 5 + 22 − (5 + (14 × 219)) = −2692 for the best case scenario in which all output file sizes manage to be under 14 bits of file size information. More realistically, it would be something around 352 + 5 + 22 − ((5 + 14) × 219) = −3782, because you will need the 5 bits for every file separately, with the 14 in this case being a changing value for every file, giving a possibly smaller number.
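For anyone who wants to re-run the bookkeeping above without qalc, here is a short Python sketch. The variable names are mine; the numbers (3145728 input bytes, 219 output files, 44 bytes saved, 5-bit length prefix) are the ones used in the comment, and the 14 bits per file is the same averaged estimate, not a measured value.

```python
import math

INPUT_BYTES = 3145728   # size of the original file
FILES = 219             # number of output files
SAVED_BITS = 44 * 8     # the claimed 44-byte saving, in bits
PREFIX_BITS = 5         # bits used to state how wide a size field is

# Bits needed to write down any size up to the input size.
size_bits = INPUT_BYTES.bit_length()

# Scenario 1: every file size is stored in a flat 22-bit field.
flat = SAVED_BITS + size_bits - size_bits * FILES

# Average output file size, rounded up to whole bits.
avg_bits = math.ceil(math.log2(INPUT_BYTES / FILES))

# Scenario 2 (best case): one shared 5-bit prefix, 14 bits per output file.
best = SAVED_BITS + PREFIX_BITS + size_bits - (PREFIX_BITS + avg_bits * FILES)

# Scenario 3 (more realistic): a 5-bit prefix on every output file.
realistic = SAVED_BITS + PREFIX_BITS + size_bits - (PREFIX_BITS + avg_bits) * FILES

print(size_bits, avg_bits)
print(flat, best, realistic)
```

All three totals come out negative, which matches the conclusion that the contestant loses once file sizes are counted.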
If instead going with the naive 8 bit EOF that the offerer desired, well, going with 2 consecutive characters instead of a single one seems doable, as long as you are able to find enough of said 2 characters.
After going on a little google search, I seem to think that in a 3MiB file, there would be either 47 or 383 (depending upon which of my formulae was correct) possible occurrences of the same 2 character combination. Well, you’d need to find the correct combination.
But of course, that’s not exactly compression for a binary file, as I said before, as an EOF is not good enough.
This was too damn funny for what I expected it to be
I was sort of on Mike Goldman (the challenge giver)'s side until I saw the great point made at the end that the entire challenge was akin to a bar room bet; Goldman had always set it up as a kind of scam from the start and was clearly more than happy to take $100 from anyone who fell for it, and so should have taken responsibility when someone managed to meet the wording of his challenge.
Yeah, he was bamboozled as soon as he agreed to allow multiple separate files. The challenge was bs from the start, but he could have at least nailed it down with more explicit language and by forbidding any exceptions. I think it’s kind of ironic that the instructions for a challenge related to different representations of information failed themselves to actually convey the intended information.
That story is immediately what came to mind.