Wikipedia:Reference desk/Archives/Computing/2021 March 23

= March 23 =

Minimizing SSD writes
Dear Wikipedians:

Recently I've learned that SSDs can be read from an effectively unlimited number of times but only have a finite number of write-erase cycles. So that got me thinking:

The modern everyday computer-use scenario for most of us involves turning on the computer, firing up Chrome, and browsing the web. None of these activities really needs to write to the SSD, especially if one has a large amount of RAM, since the web pages we browse are all ephemeral and discarded at the end of a surfing session. For the changes we actually do make (such as composing and sending emails), the storage is done remotely, in the cloud.

And that got me thinking further about cloud-based computers such as the Google Chromebook. Since all storage in that case is in the cloud, it is conceivable to treat the SSD inside a Chromebook as an almost read-only, "glorified BIOS ROM" kind of device used only to boot the machine. Once it's running, all communications and storage changes go to the cloud; heck, even the operating system settings and tweaks could all be stored in the cloud.

So are there OSes, such as particular distributions of Linux, that are designed with this "minimization of SSD write-erase cycles" in mind?

172.97.224.131 (talk) 13:49, 23 March 2021 (UTC)


 * This is actually a pretty interesting question for me. I'm not sure about any OSes that have this as a function, though. RAM is a picky thing, and the vast majority of current OSes don't like dealing with the potential side effects of unsaved information stored in RAM. It would be interesting if anybody else can bring up an OS that functions this way, though. EpicPupper 20:27, 23 March 2021 (UTC)


 * Not sure if it is what you meant, but Linux can use a so-called "RAM disk", which means that it takes an image of its OS and creates a file system purely in RAM. You would then not need to have a disk attached. Some cluster computers use a configuration like this, so that all compute nodes run the same OS, usually combined with a network share on the head node. A disadvantage is that any changes in the RAM file system are lost at reboot. I believe that during install/boot (either one of them or both, not sure) Linux loads a kernel image (vmlinuz) and an initramfs to boot the entire OS. Replacing this initramfs with an image of the OS would boot the image instead. Rmvandijk (talk) 10:47, 24 March 2021 (UTC)


 * Back in the 1980s, it used to be possible to use a home computer without any kind of storage media. You could, for example, start a Commodore 64 up, write a program in BASIC, run it and enjoy your own creation, without the computer having any storage media whatsoever attached. The downside was, of course, that once you powered off your computer, your work was lost forever. JIP | Talk 16:03, 24 March 2021 (UTC)
 * 1980s computers typically had the BASIC/operating system fixed in ROM, so that played the part of modern read-only disks. All of them that I handled had a tape interface which emitted audio signals to a cassette player, and of course subsequently loaded them back. Coming more up to date, diskless machines in clusters typically have a disk image downloaded to them during bootstrap. The usual procedure is a mix of PXE, BOOTP, and TFTP to get the ramdisk and initramfs onto the node. This is usually only for the core operating system; the nodes will then load user disks remotely, either by some sort of NAS or, better, a SAN. Getting back to the original question, in domestic use SSDs are unlikely to ever run out of writes. See "how long do solid state drives really last", particularly the final paragraph or two. SSDs have improved dramatically in the last 10 years. Martin of Sheffield (talk) 16:32, 24 March 2021 (UTC)
 * Back up, back up, back up, all the time, to multiple devices, cloud, DVD, USB dongle, external HDD, wotevs. Nothing lasts for ever. Every generation has its dick pulled by charlatans and mountebanks (aka IT hardware vendor marketing depts.) seeking a quick profit. Everything fails in the end, even the most expensive project. Carry on in the sure knowledge that something will certainly go horribly wrong, and it's gonna be you at the thin end of the wedge. Back up, back up, back up. PS Test your backups. >MinorProphet (talk) 18:31, 24 March 2021 (UTC)
 * Yeah, I'd never trust an SSD on its own. With an HDD you can at least usually get advance notice of failure. 93.136.7.84 (talk) 12:20, 25 March 2021 (UTC)


 * Chromebook programs (and Google Chrome too) write a ton to the SSD. The fact that the settings can all be stored online doesn't mean it is efficient not to store them locally as well. However, none of the usual daily browsing and light MS Word usage will ever come close to exhausting the write endurance of a modern SSD. 93.136.7.84 (talk) 12:20, 25 March 2021 (UTC)


 * Frame challenge: browsers do use disk I/O to work. Compared to how much you download, browsing history and cookies are negligible, but the browser cache probably is not. Thread hijack alert: it just occurred to me that, because residential internet speeds grow much faster than residential computer processing power and disk access speeds, there may come a point where client-side caching has a negative tradeoff for the client and hence should be scrapped completely. I suspect at least one person has already had that insight, but I was not lucky in my DuckDuckGo search. If someone has a good ref I would be interested in further reading.
 * Even if you perform only operations that could live in RAM, they might involve disk I/O: see virtual memory. I have no idea whether browser use triggers this often or not, under what OS, etc., though. Tigraan Click here to contact me 17:04, 26 March 2021 (UTC)

Formal complexity of various types of English strings
I'm writing a paper on knowledge graphs. I developed a knowledge graph for data about the Covid-19 pandemic. The hardest part of the project was developing transformations to turn strings into objects and property values (in the Web Ontology Language). What I observed is that the best way to do this was to perform the more complex or unusual transformations first. For example, we had a property for each patient called suspected_Reason, which was the reason the patient was thought to have caught the virus. It could contain all sorts of strings, but they tended mostly to fall into certain kinds of patterns. This was good, because we had many thousands of patients, so we obviously couldn't handle the strings on a one-off basis but had to write pattern-matching functions (using SPARQL and Lisp... I'm old and Lisp is still my language of choice).

E.g., a string such as String1: "Exposure to P123-P127" was complex because we had to match a regular-expression pattern to find the substring "P123-P127", transform those strings to integers (which were patient IDs), and then iterate through 123, 124, ... 127. A simpler example was String2: "Exposure to P123, P345, P456". In this case all we had to do was use a SPARQL regex transformation that matched digits, find each contiguous run of digits (e.g., 123, 345, ...), and match those to the patients with the appropriate IDs.

What I want to do is write up what we did and describe it in a formal manner. By formal I mean either where the expression falls in the Chomsky language hierarchy (all the strings are matched by either regular expressions or context-free grammars) or, in the case where two expressions fall at the same level (e.g., are both regular expressions), the number of states in the recognizer.
My hypothesis is that what we did (and I think this will generalize to other transformations for knowledge graphs in general) was to do the more complex transformations first, delete those strings from the set that needs to be processed, and then do the less complex, more general transformations later. I.e., parse examples that match String1 before those that match String2, because the recognizer for String2 would also falsely recognize String1 and handle it incorrectly. What I'm wondering is: is there a way to take an English string and easily determine how complex it is using this kind of model? (Or, if there are other models of complexity I should consider, I'm open to that.) I'm pretty sure String1 requires a context-free grammar (and a pushdown automaton to recognize it) because it requires memory, whereas String2 is a regular expression that could be processed by an FSA. But while I can eyeball it, I would like some more rigorous metric so I can state with authority that examples that follow the pattern of String1 are more complex than those of String2. I've looked at the article on the Chomsky hierarchy and, while it is great, I didn't get the answer I needed from it, quite possibly because I just missed something; it's been a while since I worked with these concepts. --MadScientistX11 (talk) 19:37, 23 March 2021 (UTC)
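The ordering described above can be sketched in a few lines of Python (the project itself used SPARQL and Lisp; the function name and the exact regexes here are illustrative assumptions, not the project's code):

```python
import re

def extract_patient_ids(reason):
    """Return the patient IDs referenced in a suspected_Reason string,
    trying the more specific (range) pattern before the general one."""
    # More specific pattern first: a range such as "Exposure to P123-P127".
    m = re.search(r"P(\d+)\s*-\s*P(\d+)", reason)
    if m:
        low, high = int(m.group(1)), int(m.group(2))
        return list(range(low, high + 1))
    # General fallback: individual IDs such as "Exposure to P123, P345, P456".
    return [int(digits) for digits in re.findall(r"P(\d+)", reason)]

print(extract_patient_ids("Exposure to P123-P127"))         # [123, 124, 125, 126, 127]
print(extract_patient_ids("Exposure to P123, P345, P456"))  # [123, 345, 456]
```

Trying the range pattern first is what matters: applying the general digit pattern to String1 would yield only [123, 127], silently dropping the intermediate patients.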
 * I may be mistaken, but it seems to me that this is a rather common phenomenon that is not particular to knowledge graphs or language hierarchies. The more specific rules often cover the exceptions, the cases where the more general, simple rules fail. Take for example the following three competing transformations from a list of transformations for forming plurals of English nouns from the singular form:
 * → s (e.g. rose → roses)
 * s → ses (e.g. kiss → kisses)
 * lysis → lyses (e.g. analysis → analyses)
 * Transformation 1 does the job for rose but not for kiss or analysis. Transformation 2 works for kiss but not for analysis. Clearly, in applying such transformations, the more specific rules should take precedence over the more general ones. The more specific a rule is, the more complex it tends to be. --Lambiam 10:44, 24 March 2021 (UTC)
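 * The precedence idea above can be sketched in Python (the rule list and names are illustrative; this is a minimal sketch, not a full English pluralizer):

```python
# Rules ordered from most specific to least specific suffix;
# the first matching rule wins.
RULES = [
    ("lysis", "lyses"),  # analysis -> analyses
    ("s", "ses"),        # kiss -> kisses
    ("", "s"),           # catch-all: rose -> roses
]

def pluralize(word):
    for suffix, replacement in RULES:
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement

print(pluralize("rose"))      # roses
print(pluralize("kiss"))      # kisses
print(pluralize("analysis"))  # analyses
```

Reversing the list would let the catch-all rule fire first, producing "kisss" and "analysiss"; ordering by specificity is what makes the rule set correct.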
 * That has been the response I've gotten from others, on different forums and in personal communication, whom I respect and whose brains I typically pick on questions like this. So I'm not sure it is worth writing a paper. But I'm still not completely sure; I agree it is common sense, and when I think of other rule-based systems and transformations I've done, it seems like this was always the case. What I'm wondering is whether there is a way to formalize this rather than just saying it is common sense/best practice. Thanks for the feedback. --MadScientistX11 (talk) 18:50, 24 March 2021 (UTC)