Wikipedia:Reference desk/Archives/Computing/2024 January 6

= January 6 =

Smart way to insert byte into the beginning of a file?
Let's assume that you use NTFS, APFS or ext4 on a disk, and let's say that you open a 1 GB log file in vim, then do one of the following:

· If you add a single character at the END of the file and save, only the last disk block has to be updated.

· If you add a single character at the BEGINNING of the file and save, the first and ALL subsequent disk blocks have to be updated.

Is my assumption correct, or is there some clever algorithm that can deal with this type of file change without rewriting the whole file? 81.170.205.178 (talk) 20:03, 6 January 2024 (UTC)
 * Vim writes the new file before removing the old one, so even if you just add a single character to the end, all the blocks are rewritten. For example:

$ stat speed.csv
  File: speed.csv
  Size: 2109      	Blocks: 8          IO Block: 4096   regular file
Device: fd02h/64770d	Inode: 10252      Links: 1
...
Access: 2023-12-25 17:10:13.528569022 +0000
Modify: 2023-12-25 17:10:01.098391471 +0000
Change: 2023-12-25 17:10:01.098391471 +0000
 Birth: 2023-12-25 17:09:45.657170913 +0000
$ vi speed.csv    # add a single space to the end of the last line
$ stat speed.csv
  File: speed.csv
  Size: 2110      	Blocks: 8          IO Block: 4096   regular file
Device: fd02h/64770d	Inode: 291906     Links: 1
...
Access: 2024-01-06 20:25:24.921739607 +0000
Modify: 2024-01-06 20:25:24.921739607 +0000
Change: 2024-01-06 20:25:24.982740441 +0000
 Birth: 2024-01-06 20:25:24.921739607 +0000
 * I realised that when I removed the access information I'd also removed the inode numbers. Notice that they are different, as are all the date/time fields. Martin of Sheffield (talk) 20:35, 6 January 2024 (UTC)
 * There is no POSIX way to do it besides leaving it up to the filesystem code. Modern filesystems implement lots of fancy algorithms like B-trees to make things more efficient, so it's not necessarily the case that every file block will be rewritten; files on the fs aren't just a linear linked list of blocks unless you're using a rudimentary FS like FAT.
 * On up-to-date Linux and BSD, they added a syscall to make this broad class of stuff even more efficient: copy_file_range. You could create a new file, prepend your stuff, then append the source file via that and finally move the new file over the old. The point of the syscall is it lets the kernel internally do the "copy" via copy-on-write, so the data doesn't have to actually be duplicated on-disk unless you modify it. Then when you move the new file over, the kernel just updates the file info in the inode to point to the updated file data starting at what you prepended. --Slowking Man (talk) 05:51, 11 January 2024 (UTC)
 * It's not driven by the FS though, but by vim. Traditionally vi wrote from its buffers to the FS, and only when the write was successful did it move the new file over the old. The data is already duplicated and written to the disk before the move starts. Now I know that vim does a lot of (often annoying) things, but I'd be surprised if it didn't retain this safeguard; consider what would happen if the system crashed during the write of a large file, let's say a million lines. Which lines are new and which are old? Martin of Sheffield (talk) 10:18, 11 January 2024 (UTC)
 * vim's behavior depends on its settings (the 'backup', 'writebackup' and 'backupcopy' options; see :help backup-table):


 * So vim with stock settings copies the original to a backup, then overwrites the destination file, then deletes the backup after a successful write, and this is indicated as the vi default. vim also truncates the destination file before writing out its buffer, which means generally the FS will throw out all the existing file data, and then allocate new blocks to hold incoming writes. But you very well may not be running stock settings so you'd need to look at your config.
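 * The inode behaviour Martin's stat output shows can be reproduced in a few lines of Python (file names illustrative): truncate-and-overwrite in place keeps the same inode, while write-new-then-rename yields a fresh one.

```python
import os

with open("f.txt", "w") as f:
    f.write("old")
ino1 = os.stat("f.txt").st_ino

# In-place overwrite with truncation (vim's stock write): same inode.
fd = os.open("f.txt", os.O_WRONLY | os.O_TRUNC)
os.write(fd, b"new")
os.close(fd)
ino2 = os.stat("f.txt").st_ino
assert ino1 == ino2                 # same file object, new contents

# Write-new-then-rename (the "move over" strategy): a fresh inode,
# because the temp file was created while the original still existed.
with open("f.txt.tmp", "w") as f:
    f.write("newer")
os.rename("f.txt.tmp", "f.txt")
ino3 = os.stat("f.txt").st_ino
assert ino3 != ino2                 # the name now points at a new inode
```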


 * This default behavior is the safest, at the expense of time and writes, because at every moment either the original file contents or the new contents are committed to the filesystem. The problems with moving the new file over the original (it can break hard links and change the file's owner and permissions) are also noted in the vim documentation.


 * Also another unmentioned thing that can surprise people not familiar with Unix file handling: if a file is unlinked (deleted, or clobbered by having something moved over), if any processes have it opened via an open file descriptor the old file continues to exist as long as an open fd to it remains. The kernel doesn't consider a file inode "free" until there are no links to it, and links from open fds count. This also is how files can have multiple names via hard links. In Unix filesystems a filename is just an entry in a directory structure that points to an inode. You can have any arbitrary number of these pointing to the same inode. (But that only makes any sense within the same fs, which is why hard links can't exist between different filesystems.)
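 * Both points are easy to demonstrate from Python (file names illustrative):

```python
import os

# 1) Unlink while open: the name goes away, the data does not.
with open("scratch.dat", "w") as f:
    f.write("still here")
fd = os.open("scratch.dat", os.O_RDONLY)
os.unlink("scratch.dat")                 # removes the directory entry only
assert not os.path.exists("scratch.dat")
data = os.read(fd, 100)                  # inode lives while the fd is open
os.close(fd)                             # last reference dropped: now freed

# 2) Hard links: two names, one inode.
with open("a.txt", "w") as f:
    f.write("shared")
os.link("a.txt", "b.txt")                # second directory entry
assert os.stat("a.txt").st_ino == os.stat("b.txt").st_ino
assert os.stat("a.txt").st_nlink == 2    # link count is now 2
os.unlink("a.txt")                       # data survives under the other name
print(data, open("b.txt").read())
```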


 * This means a process with an open file that gets unlinked never sees the open file change for it, unless it somehow monitors the file pathname for updates, whether by periodically polling it, or with facilities like inotify. The file data is all still there. See for instance the behavior of the often-handy tail -f, and how it by default follows its open fd and not a filename.


 * This behavior is frequently taken advantage of: install(1) by default does exactly this, moving the new file over the existing one. This means any in-use versions of libraries and executables continue to exist for the processes using them if a new version is installed. And this is why updating Unix systems is often more convenient than systems like Windows, which lock open files, thus requiring the system be shut down to touch OS files. If libc.so gets overwritten, every existing process with it in use continues to see the same file, unchanged. Some programs have also used the trick of opening a file and then immediately deleting it, to create a temporary file that only that process can see or access, which is then freed automatically when the process terminates.
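 * The open-then-unlink temp-file trick mentioned above looks like this (file name illustrative; on POSIX systems Python's tempfile.TemporaryFile uses the same technique for you):

```python
import os

# The classic anonymous temp file: create, open, unlink immediately.
# No other process can open it by name, and the kernel reclaims the
# space automatically when the last descriptor is closed.
fd = os.open("tmp.scratch", os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o600)
os.unlink("tmp.scratch")            # invisible from here on

os.write(fd, b"private scratch data")
os.lseek(fd, 0, os.SEEK_SET)        # rewind and read it back via the fd
data = os.read(fd, 64)
os.close(fd)                        # storage released automatically
print(data)
```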


 * Now, under your scenario of a crash while writing: vim, as mentioned, first truncates the destination file. There's no risk of a file mixing old and new lines then; only the data that got written to disk will be there, and the old backup file will be present too. However, modern filesystems tend to be fairly resistant to file corruption, with innovations such as journaling, whose purpose is to ensure the fs is always in a consistent state. This means it's not as "dangerous" to seek through a file and write to it as in the bad old days.


 * In your scenario of a system crash during a write call, on remount the fs would replay its journal and ensure every journaled transaction got committed, or else unwind it to get the fs back to a consistent state. Note however that modern OSes also cache writes in memory to improve performance: write returns to the process as soon as the data is in the kernel's cache, and the kernel writes it out to disk in the background. This means writes aren't actually hitting the disk when the process "thinks" they are; to force that behavior you have to open the file with a flag like O_SYNC, or call functions like fsync to flush pending writes to disk. Anyway, the smart thing is to always make lots of backups. Never count on luck. --Slowking Man (talk) 23:22, 11 January 2024 (UTC)
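 * The caching point in code form (file name illustrative):

```python
import os

# write() completes once the data is in the kernel's page cache;
# only fsync() (or opening with O_SYNC) guarantees it has reached
# stable storage before the call returns.
fd = os.open("journal.log", os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
os.write(fd, b"record 1\n")   # returns immediately; may be cache-only
os.fsync(fd)                  # blocks until the kernel flushes to disk
os.close(fd)
print(open("journal.log", "rb").read())
```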


 * Some very good points, and thanks for the details of vim's operation; I'm afraid the last time I studied the editor in that much detail it was called vi! You mention unlinking: this used to be a common FORTRAN technique when handling very large datasets. Programs could restrict their use of memory by moving data from memory to a scratch file and back as required, effectively paging it under program control. Careful programming could optimise the process and reduce disk I/O and "page faults". FYI (and for readers generally), the issue of writes occurring asynchronously is exacerbated when you have physical RAID systems, and more so if they are configured as a SAN. I/O "completes", even for blocking I/O and fsync, when the data reaches the RAID controller, not the spindles, hence the need for battery backup in such large systems. Martin of Sheffield (talk) 09:24, 12 January 2024 (UTC)
 * I am not up to date with my MS Windows, so the latest management settings may have changed. In olden times there was a Device Manager option, 'Enable write caching on the disk', which changes the mode of operation of the disk itself. I have used RAID, both on the motherboard (BIOS) and on extension cards (own configuration software), and both have equivalent controls; removable USB likewise. Switching off the cache is rarely done as the performance hit is considerable. NTFS has a concept of a 'clean' or 'dirty' disk depending on incomplete writes. 'Lazy writes' in MS Windows also complicate the situation.
 * The normal sequence is (a) issue the final write to the (new) file, (b) get confirmation that the write was received by the HDD/RAID (importantly, when a cache is in use this does NOT yet mean the data is written to the surface), (c) issue the delete of the old file. The final delete is queued by the HDD/RAID and so should take place after the earlier queued write operation completes, along with the update to the directory record and the space map. The disk logic will attempt to minimize head movement and so will not break into the write to go elsewhere to do the delete of the old version of the file.
 * Having previously written (interrupt-driven) device drivers, I know there is latency inherent in the flow of commands issued and confirmations received, which effectively slows the sequence down on a file-by-file basis.
 * On my own PCs I routinely have 6 HDDs, and by arranging files to minimize head movement I often get close to the bandwidth of the HDD interface. I keep the Performance monitor active for 'Disk Bytes/sec' and can see disk problems as they arise. BlueWren0123 (talk) 09:17, 13 January 2024 (UTC)
 * The discussion so far has been on *NIX systems, so Windows/NTFS is off on a slight tangent. I think also there is a matter of scale.  You talk about MoBos and hardware-assisted software RAID (aka "Fake RAID").  With a full external RAID controller there is no easy way to get confirmation of the write reaching the surface.  The node writes over the medium (fibre, ether) to the RAID controller(s) which accept the data into their cache and signal completion.  The node has no further interest in the operation of the RAID controller and any reads to check that the data is there will be satisfied from the cache.  The RAID controller will move from cache to disk in its own time.  Remember further that the file will not lie in a linear strip on a single disk, but may be distributed over something like a 10-spindle RAID6.  Add in the fact that a NAS system is servicing dozens or hundreds of nodes. Martin of Sheffield (talk) 10:16, 13 January 2024 (UTC)