Wikipedia:Reference desk/Archives/Computing/2023 January 11

= January 11 =

A password criteria conundrum
Most of us are pretty familiar with the seemingly silly requirements foisted upon users with respect to password selection. "Must start with a capital letter and contain one or more of the following symbols..." kind of thing. The problem with that sort of approach, however, is that it is somewhat arbitrary. Why not just calculate the bits of entropy in the password directly? OK, that's pretty straightforward: ENTROPY = log2(SIZE_OF_ALPHABET^LENGTH_OF_PASSWORD). But now the question becomes, which alphabet are we talking about? ASCII contains only 94 printable characters, while UTF-8 includes over one million code-points. If we were to select the latter, our calculation of password entropy could easily result in an over-estimation, say, if the user nonetheless chooses their password from a pool of ASCII characters. The only "fair" way to resolve the issue would be to replace SIZE_OF_ALPHABET with LARGEST_CODEPOINT_IN_PASSWORD. It does lead to a somewhat more conservative estimate of entropy perhaps, but then again, I would rather err on that side of things than be too lax. Any suggestions, or even better, resources which address that specific issue? Earl of Arundel (talk) 17:19, 11 January 2023 (UTC)
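For concreteness, the two estimates in the question can be sketched as follows. This is only an illustration of the formulas as stated in the post; the function names are made up here, and the guard for empty or degenerate input is an added assumption.

```python
import math


def naive_entropy_bits(password: str, alphabet_size: int) -> float:
    """log2(alphabet_size ** length), computed as length * log2(alphabet_size)."""
    return len(password) * math.log2(alphabet_size)


def largest_codepoint_entropy_bits(password: str) -> float:
    """Variant from the question: the largest code point that actually
    occurs in the password stands in for SIZE_OF_ALPHABET."""
    if not password:
        return 0.0
    top = max(ord(ch) for ch in password)
    if top < 2:
        # log2 of 0 or 1 is undefined or zero; treat as no entropy (assumption).
        return 0.0
    return len(password) * math.log2(top)
```

With a fixed alphabet the estimate depends only on the length, whereas the second function changes with the characters actually typed.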
 * Most people aren't familiar enough with the concepts of information theory to even grok what Entropy (information theory) means. Instructions for the general public need to be written so they can select an appropriately secure password.  Tell someone "You should choose a password that maximizes the entropy..." and you've lost 99.999% of your audience.  Instead, you want people to choose a password that draws from the largest possible character set.  OK, if you tell people "You should choose a password that draws from the largest possible character set" you've now got maybe 99.9% of your audience not understanding you.  An improvement by two orders of magnitude, but still not nearly good enough.  Most people are still going to ask "What do you mean by character set?"  So now you need to dumb your directions down so that the vast majority of people don't screw it up.  You know, you want to say "Make sure your password includes every part of the keyboard, not just lowercase letters.  You need to have lowercase and uppercase letters, numbers, other symbols, etc. That way, you make it harder to guess your password by brute force."  Which is pretty much the instruction we give people when they choose a password: "Be sure your password contains uppercase letters, lowercase letters, numbers, and symbols".  -- Jayron 32 18:24, 11 January 2023 (UTC)
 * Right, well of course. To be clear, I am NOT proposing here some sort of ad hoc system wherein the user must jump through hoops of technical jargon in order to select a password. Rather, what I am suggesting is a somewhat more rigorous (and more importantly, automated) approach toward that end. The resulting interface could literally be as simple as displaying a "progress bar" informing the user how well their current password measures up (and once it passes from "red" to "green", it is deemed acceptable). Being able to gauge things in real time makes for an easy interface for pretty much anyone to use. Earl of Arundel (talk) 20:12, 11 January 2023 (UTC)
 * Except you still have to give users instructions on how to make the progress bar go from red to green. You just frustrate users if they enter their chosen password and the bar in question stays "red", and then you give them no practical guidance on how to select a good password.  -- Jayron 32 11:55, 12 January 2023 (UTC)
 * It sounds more confusing than it actually is. The key thing to remember here is that the progress bar provides a real-time feedback loop. Here is a fairly simple example using what I call a "strict entropy" measure. Instead of using the length of the password itself, we only count the number of unique characters, which yields a conservative but fair measure.
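The example originally posted here did not survive in this archive. A minimal sketch of what such a "strict entropy" meter might look like follows; it is not the poster's code. The default alphabet size of 94, the 60-bit target and both function names are assumptions chosen purely for illustration.

```python
import math


def strict_entropy_bits(password: str, alphabet_size: int = 94) -> float:
    """Count each distinct code point once, instead of using the full length.

    alphabet_size is an assumed parameter; the thread leaves its choice open.
    """
    unique_chars = len(set(password))
    return unique_chars * math.log2(alphabet_size)


def meter_fraction(password: str, target_bits: float = 60.0) -> float:
    """Fraction of a hypothetical progress bar to fill, capped at 1.0."""
    return min(1.0, strict_entropy_bits(password) / target_bits)
```

Note that `set(password)` counts distinct code points, so a character typed as a base letter plus a combining mark counts as two.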


 * As far as the site administrator is concerned, it just boils down to choosing a "reasonable" bits-of-entropy setting. Earl of Arundel (talk) 16:24, 12 January 2023 (UTC)
 * I think we're talking past each other here. You're talking about a behind-the-scenes way to assess password strength, and I'm talking about a forward-facing user interface issue to instruct users on how to choose a password.  I recognize the feedback benefit so users can see how "strong" their password is as they type it in real time.  This is a feature which I have seen implemented many times; so I am quite familiar with it.  It still needs to have plain-language instructions for the user as to what they need to do to meet the algorithm's requirement for password strength.  -- Jayron 32 16:28, 12 January 2023 (UTC)
 * Well that is a whole topic unto itself, isn't it? Some people have a natural grasp of what a "good" password should be, while others are completely clueless and choose from a very predictable set. Perhaps the approach of presenting suggestions along with the prompt is the best that can be done. Because rigid rules of selection only defeat the whole purpose, insomuch as a password should be easy for the user to remember. Earl of Arundel (talk) 17:13, 12 January 2023 (UTC)
 * Possibly it would be more accurate to define SIZE_OF_ALPHABET as the sum of the sizes of all the Unicode blocks from which characters are taken. Under this rule, the alphabet of the password  would (if this site is trustworthy) be the union of Basic Latin, Latin-1 Supplement, Latin Extended-A, Latin Extended-B, Miscellaneous Symbols, and Emoticons. On the other hand, it may not be particularly worthwhile to expand the range of characters used for password generation beyond ASCII. Shells-shells (talk) 19:11, 11 January 2023 (UTC)
 * Reading over those Stack Exchange posts, the arguments against UTF don't seem very compelling. For one thing, many of the comments seem to be referring to storing the password itself, which happens to be one of the worst ideas ever, to begin with. Others suggested that "some sites may not be able to handle UTF-8". That is simply not a very well-founded objection. Pretty much all browsers and operating systems can process UTF on that level, at least (whereas more specialized applications may of course require a finer level of manipulation of the format). Regarding your suggestion, I did consider that. But as I said before, if we assume that the user is drawing from those other character sets, then the entropy calculation renders an arguably inaccurate estimation. (There are other things that I have conveniently omitted here. For example, some characters appear more frequently than others, on average, and as such could be considered to contribute less useful entropy. But I am getting ahead of myself here.) Earl of Arundel (talk) 20:12, 11 January 2023 (UTC)
 * Assuming Unicode, LARGEST_CODEPOINT_IN_PASSWORD for the password "一二三" is 20108, and $$\log_2(20108^3) = 3 \times \log_2 20108 = 42.8864$$. I think this password is much weaker than " ", even though the latter only scores $$7 \times \log_2 65 = 42.1566$$. Using UTF-8, the largest numerical value of any byte is 239 (binary 11101111), but these values are not at all uniformly distributed; most are under 192. In practice, using non-ASCII character sets, you gain no more than one bit of entropy per byte.  --Lambiam 18:59, 12 January 2023 (UTC)
 * OK, so what would you suggest for the ALPHABET_SIZE in such cases where UTF-8 is allowed? Earl of Arundel (talk) 15:39, 13 January 2023 (UTC)
 * I agree that the value of the largest codepoint in the password is not relevant to password strength. That metric would imply that using a "Z" in a password is better than using an "A", which is nonsense. Also the number of unique codepoints in the password would seem to be only marginally related to password strength. A common word that happens to have many different letters, like "american" wouldn't necessarily be a strong password, since it would fall to a simple dictionary lookup attack, while a string like "gwrgbg" would be stronger even though it is shorter and has repeated letters. CodeTalker (talk) 21:21, 12 January 2023 (UTC)
 * Fair enough. But then again, short of using a dictionary lookup of common words or what have you, I just can't see a practical solution to account for common versus uncommon/nonexistent "words". The approach of only counting unique codepoints is obviously not a perfect metric, but it does strike a decent balance which favors neither ASCII nor UTF. It also doesn't seem to "punish" the user too much for duplicate codepoints. (The entropy still grows at an appreciable rate with respect to the overall password length.) Earl of Arundel (talk) 15:39, 13 January 2023 (UTC)
 * The requirement of at least one upper-case letter, one lower-case letter and one non-alphabetic character rules out virtually all dictionary words. One Unicode-based measure might be $$N \log_2(H{-}L{+}1),$$ in which $$N$$ stands for the number of characters, $$L$$ for the lowest codepoint and $$H$$ for the highest codepoint. Note that this refers to Unicode itself, not to the UTF-8 encoding. --Lambiam 02:31, 14 January 2023 (UTC)
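The measure $$N \log_2(H{-}L{+}1)$$ is straightforward to compute. The following is a small sketch of it, working on Unicode code points rather than UTF-8 bytes as the post specifies; the function name and the treatment of an empty password are assumptions.

```python
import math


def range_entropy_bits(password: str) -> float:
    """N * log2(H - L + 1), with N the number of characters and
    L and H the lowest and highest code points in the password."""
    if not password:
        return 0.0
    codepoints = [ord(ch) for ch in password]
    span = max(codepoints) - min(codepoints) + 1
    return len(password) * math.log2(span)
```

A password made of one repeated character has a span of 1 and therefore scores zero, however long it is.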
 * The double edge there being that the more requirements there are, the harder it can be for the user to actually remember their password. It's extremely frustrating, and it just defeats the purpose altogether. At the same time, I do advocate educating others about the importance of, and possible techniques for, constructing more secure passwords. And yes, I agree, the Unicode-based measure which you've suggested is a useful one. It strikes a good balance, yielding a reasonably accurate approximation of raw entropy. Earl of Arundel (talk) 15:27, 14 January 2023 (UTC)
 * This is a social effect, where assessing entropy depends on correct assumptions about people's habits. The guessability is not the same as the entropy. "dog" is a high entropy password if you consider that to be three characters taken from the whole Unicode character set. The entropy of XKCD-style passwords is reduced if users, by habit, attempt to make grammatical sentence fragments out of the four words and if the attackers take this into account. Tools for password cracking use dictionary attacks, and the dictionary is an observation of cultural habits (common words and spellings). If attackers were for some reason using a dictionary in a dead language, that would make everybody's passwords stronger. A password becomes more guessable if the attack method considers common substitutions of numbers for letters, and the tendency to use small numbers, or numbers representing recent years (year of birth), or historically significant years, and so on. Yet that approach makes passwords less guessable if the users are aliens and don't have any of those ideas. So password strength is enhanced by finding a social way to get users to choose somewhat randomly from a genuinely large character set or word set, bearing in mind that they will be lazy and will find ways to reduce the randomness and the size of the set, and by getting them to do something original and different from currently popular styles of passwords. Card Zero  (talk) 02:26, 14 January 2023 (UTC)
 * Well said. It's a shame that there isn't more education about it, too. People often fare poorly when it comes to choosing a good password. It's almost a skill, I think. Regarding dictionary attacks, at the very extreme one approach might be to actually maintain a database of passwords which have been leaked over the years. Then, whenever a password is being set/reset, the overall score would effectively fall to zero (or what have you) whenever the selection is found to be "on the list". Earl of Arundel (talk) 15:52, 14 January 2023 (UTC)
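The "on the list" rule can be sketched as a thin wrapper around whatever score is already being computed. The three-entry list below is a made-up placeholder, not real leak data, and the names are hypothetical; a real deployment would load a much larger list from wherever it is maintained.

```python
# Hypothetical stand-in for a database of leaked passwords.
LEAKED_EXAMPLES = frozenset({"password", "123456", "qwerty"})


def score_with_blocklist(password: str, base_score_bits: float,
                         leaked: frozenset = LEAKED_EXAMPLES) -> float:
    """Return the base score, or zero if the password is on the leaked list."""
    if password in leaked:
        return 0.0
    return base_score_bits
```

The lookup here is an exact match; whether to also fold case or strip trailing digits before checking is a separate design decision.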
 * Might I suggest xkcd password strength? NadVolum (talk) 23:52, 13 January 2023 (UTC)
 * The question is about restrictions enforced by sites. It is not easy to construct an automatically enforceable criterion for deciding whether a four-word passphrase, say "easy remember tough guess", or "ok here we go", was constructed by picking "four random common words". --Lambiam 02:44, 14 January 2023 (UTC)
 * Here's an idea. What if we create a database of all known "words" (using quotes here because I mean even things like surnames, place names, etc.) and simply assign each one a unique number? An extended Unicode codepoint, if you will. That would reduce the calculated entropy of a passphrase by orders of magnitude. It does not address the issue of dictionary attacks, but maybe that could be handled in conjunction with a lookup against a list of leaked passwords, as I mentioned in another post. Earl of Arundel (talk) 16:04, 14 January 2023 (UTC)
 * There is no need to assign codes to the list entries; just knowing its length is enough. At this moment (02:07, 15 January 2023 (UTC)) Wiktionary reports having 702,377 English lemmas. Using, xkcd-style, four random choices of the whole list results in an entropy of $$4\log_2 702377=77.7$$ bits. --Lambiam 02:10, 15 January 2023 (UTC)
 * That makes sense. And if it isn't one of those, it should be considered "gibberish", thus its contribution towards overall entropy would be drawn from its individual codepoints. Earl of Arundel (talk) 19:44, 15 January 2023 (UTC)
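Putting the last two posts together, a rough sketch: each token found on the word list contributes log2 of the list's length, and anything else is scored per character. The whitespace tokenisation, the lower-casing, the default per-character alphabet of 94 and the function names are all assumptions made for illustration.

```python
import math


def xkcd_style_bits(list_size: int, words: int = 4) -> float:
    """Entropy of `words` independent uniform picks from a list of `list_size`."""
    return words * math.log2(list_size)


def passphrase_entropy_bits(passphrase: str, wordlist: frozenset,
                            per_char_alphabet: int = 94) -> float:
    """Known words count as one pick from the list; 'gibberish' tokens
    are scored character by character against an assumed alphabet size."""
    word_bits = math.log2(len(wordlist)) if len(wordlist) > 1 else 0.0
    total = 0.0
    for token in passphrase.split():
        if token.lower() in wordlist:
            total += word_bits
        else:
            total += len(token) * math.log2(per_char_alphabet)
    return total
```

Calling `xkcd_style_bits(702377)` reproduces the four-word calculation above. As noted earlier in the thread, the figure only holds if the words really were picked at random.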
 * If the password has to contain letters, numbers, doodles, sign language and squirrel noises, like in the Dilbert cartoon, then people will have to write them down. They'll probably write them down anyway. Or even worse, they'll use exactly the same combo of squirrel noises for every site. NadVolum (talk) 20:58, 14 January 2023 (UTC)
 * Hopefully people who use strong passwords will also use a password manager rather than a piece of paper to keep track of them. CodeTalker (talk) 03:03, 15 January 2023 (UTC)
 * What's wrong with writing them on the bottom of the keyboard? Don't laugh, I've actually seen that done! Martin of Sheffield (talk) 10:19, 15 January 2023 (UTC)
 * And what happens if the computer has a problem? Either you've lost all your passwords or you've stored them in the cloud. Maybe they're safe there. Maybe the security services can read everything you do, maybe they'll be dumped in some leak. At least you can ask it to generate meaningless rows of characters to use as passwords if that's your desire. I think there's quite a bit to be said for writing them down, but the bottom of the keyboard is perhaps not the best place. NadVolum (talk) 22:15, 15 January 2023 (UTC)
 * Most commercial password managers store an encrypted copy of passwords on a server, so they can't be lost if your computer crashes, and they can't be read by anyone except you, who holds the master key. Personally, I use a tool I wrote myself that generates each password from a cryptographic hash of the website address plus a master key, so the passwords are not stored anywhere. CodeTalker (talk) 02:00, 16 January 2023 (UTC)
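The general idea of deriving a password from a hash of the site address and a master key can be sketched as below. This is not the tool described in the post, whose details are not given; the use of HMAC-SHA-256, the URL-safe Base64 output, the 20-character default and the function name are all assumptions.

```python
import base64
import hashlib
import hmac


def derive_site_password(master_key: str, site: str, length: int = 20) -> str:
    """Derive a per-site password deterministically; nothing is stored.

    HMAC-SHA-256 keyed with the master key is applied to the site address,
    and the digest is rendered as URL-safe Base64 text.
    """
    digest = hmac.new(master_key.encode("utf-8"),
                      site.encode("utf-8"),
                      hashlib.sha256).digest()
    text = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    # A 32-byte digest yields 43 usable characters; longer requests are capped.
    return text[:length]
```

The same inputs always yield the same output, so the password can be regenerated on demand. A single fast hash offers little protection if the master key is weak; a deliberately slow derivation such as `hashlib.pbkdf2_hmac` or `hashlib.scrypt` would be the more cautious choice.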
 * — Or you retrieve the files from your local backup, eg offline external hard disk, that you personally control. Or possibly the off-site backup, offline external hard disk, that you personally control. (The same way you recover all the other important files when the computer has a problem.) Mitch Ames (talk) 02:34, 18 January 2023 (UTC)

How to enhance the font I type?
Well, I am at the age when people suffer from macular degeneration. I cannot see the default font; I think it is size 11. My preferred font size is 18 or even 20. I want to have a mechanism for making the change permanent. Is there a program like this? AboutFace 22 (talk) 19:21, 11 January 2023 (UTC)
 * Depends on the device. On my Android phone, under "Settings:Display:Advanced" there is an option to change font sizes, and it seems to do so for all programs on the phone.  -- Jayron 32 19:57, 11 January 2023 (UTC)
 * In Firefox, you can go to Settings → General, scroll to Fonts and select a default size. When you go to advanced settings, you can also set a minimum size and whether websites are allowed to override the default. If you don't allow overrides, some websites not conforming to basic design principles may get rather ugly. I think most web browsers have similar settings.
 * The idea behind HTML is that the web page describes the logical structure, like what is paragraph text (which gets the default font size) and what is a heading (which gets a larger font size). The browser then decides on the details. There are ways for website builders to override this and, as always, there are some (not so) clever website builders who decide to do so and build really beautiful sites that turn rather ugly or even unusable on someone else's device. PiusImpavidus (talk) 09:00, 12 January 2023 (UTC)
 * @PiusImpavidus, thank you. I will try it. 107.191.1.90 (talk) 20:38, 14 January 2023 (UTC)