Wikipedia:Reference desk/Archives/Computing/2021 November 11

= November 11 =

== Work hours lost due to software redesign ==
Whenever a new version of a software product comes out, I often find that I and my colleagues end up wasting a lot of time trying to work out how to do things we could easily and quickly do in the old version. Things that now require a different procedure, or are accessed from a different menu, or now have a different name, or features/functions that turn out to have been removed or now behave differently. This tends to be worst with things like new versions of operating systems (especially upgrading from Windows 7) or major interface redesigns (e.g. the introduction of the ribbon menus in Office 2007), but it occurs with most software from most companies. (And that's when the new version works correctly, not counting new bugs or compatibility problems.) It occurred to me that this problem, multiplied across all businesses and across the world, must result in an absolutely staggering amount of lost time (and hence money). Has any analysis been done to actually quantify this? (Google searches have turned up various articles about time lost due to more specific problems like slow infrastructure, excessive e-mails, or software bugs generally, but I can't find anything specifically about time lost due to "working as intended" changes.) Iapetus (talk) 11:56, 11 November 2021 (UTC)

== Inverse of function fails due to loss of precision ==
I have the following two functions, both of which are inverses of each other:
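(The code block itself was not preserved in the archive. Based on the names used throughout the thread and the formula $$1/(1 + exp(-value))$$ quoted further down, the pair was presumably equivalent to this Python sketch:)

```python
import math

def sigmoid(x):
    # Logistic function: maps the whole real line onto (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    # Inverse of sigmoid: maps (0, 1) back onto the real line.
    return math.log(p / (1.0 - p))
```

In IEEE doubles, sigmoid(x) rounds to exactly 1.0 once x exceeds roughly 36.7, at which point logit divides by zero (printed as inf in C; Python raises ZeroDivisionError instead).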

Unfortunately, for all values larger than say 36 or so, logit(sigmoid(x)) simply returns "INF". Is there any way to prevent such an extreme loss of precision? Earl of Arundel (talk) 13:25, 11 November 2021 (UTC)
 * Not naively, I would say. 1+exp(-37) seems to be equal to 1 in IEEE 64 bit floating point arithmetic. That is not surprising - IEEE doubles have 52 bits for the mantissa, so the smallest value it can represent with exponent 1 is about 1/2**52 (if you find an off-by-one error, you can keep it ;-). You are working with base e, so 1/e**37 is close to 1/2**52. If you add that to 1, it has to be rounded off. The question is: Why would you want to use it? The sigmoid function is often used in machine learning, but everything interesting happens between -10 and 10 - outside this range, the function is essentially 0 or 1, and the gradient is essentially zero. --Stephan Schulz (talk) 19:37, 11 November 2021 (UTC)
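A quick sketch confirming the rounding cutoff described above:

```python
import math
import sys

# Half an ulp of 1.0 is 2**-53; anything smaller added to 1.0 is rounded away.
half_ulp = sys.float_info.epsilon / 2.0

print(math.exp(-36) > half_ulp, 1.0 + math.exp(-36) == 1.0)  # True False
print(math.exp(-37) < half_ulp, 1.0 + math.exp(-37) == 1.0)  # True True
```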
 * I was actually referring to functions that map real numbers onto a finite range. Suppose for example you had an 8-bit pixel value and wanted to map it onto the interval [0, 1]. I was just hoping that process could be invertible such that the original (in this case 8-bit) value could be reliably restored within some "reasonable" margin of error, regardless of the magnitude of x. Earl of Arundel (talk) 21:07, 11 November 2021 (UTC)
 * Well, with 64-bit datatypes, you only have 2**64 different values. But you can use a sigmoid function that is less steep. See the Sigmoid function article for a selection. --Stephan Schulz (talk) 21:35, 11 November 2021 (UTC)
 * Thanks! It looks like this set of functions should work reasonably well:
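(The code block was not preserved in the archive. Judging from the discussion below, where the pair is called squash/unsquash and is said to fail for negative inputs, it may have been something like this hypothetical reconstruction:)

```python
def squash(x):
    # Hypothetical reconstruction: maps [0, inf) onto [0, 1).
    # As noted further down the thread, this form misbehaves for negative inputs.
    return x / (1.0 + x)

def unsquash(y):
    # Inverse of squash on [0, 1).
    return y / (1.0 - y)

print(unsquash(squash(255.0)))  # 255.0
```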

Earl of Arundel (talk) 14:14, 12 November 2021 (UTC)
 * If two different values map to the same result, there is no hope the input can be restored from the output. Since expressions such as $σ(40)$ and $σ(50)$ evaluate, in double precision, to the same value, namely $1$, no function can undo the sigmoid for arguments that high. Let $M$ be the highest argument in which you are interested. You want some monotonic function $σ$ for which $σ(x) → 1$ as $x → ∞$ to have an inverse $λ$ with implementations for which $λ(σ(x)) ≈ x$ provided that $|x| ≤ M$. This can only be possible if $σ(M) < 1$ by a non-negligible amount.  --Lambiam 22:43, 11 November 2021 (UTC)
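The bound can be made concrete for the logistic sigmoid in IEEE doubles (a sketch, not from the thread): $$σ(x)$$ stays distinguishable from $$1$$ only while $$e^{-x}$$ is at least half an ulp of $$1,$$ which gives $$M = 53 \ln 2 ≈ 36.7.$$

```python
import math
import sys

# Largest x for which sigmoid(x) = 1/(1 + exp(-x)) is still below 1.0 in doubles:
# exp(-x) must be at least half an ulp of 1.0, i.e. 2**-53.
M = -math.log(sys.float_info.epsilon / 2.0)
print(M)  # roughly 36.74
```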
 * The new set of functions I am using (see above) do seem to keep the error down to an acceptable level. But of course the input values in this case are neither going to be very small nor very large. If it ever tends toward either of those extremes then as you point out there will likely be too much loss of precision. Earl of Arundel (talk) 14:14, 12 November 2021 (UTC)
 * As $$logit(sigmoid(x))=x$$, you should use this expression instead. :) Ruslik_ Zero 20:22, 11 November 2021 (UTC)
 * Ah, right... Earl of Arundel (talk) 21:07, 11 November 2021 (UTC)
 * I guess I'm not understanding something. Why would you want to map an 8-bit value to [0, 1] and then retrieve it?  Why not keep the 8-bit value?  Bubba73 You talkin' to me? 04:11, 12 November 2021 (UTC)
 * Well that's a fair question. And to be quite honest "at the end of the day" I may not even need the process to be reversible. It just seemed like a feature that might be useful I suppose... Earl of Arundel (talk) 14:14, 12 November 2021 (UTC)
 * I'm still not understanding. If you just want to map an 8-bit number to [0,1), why not just divide it by 256?  Bubba73 You talkin' to me? 06:33, 13 November 2021 (UTC)
 * This kind of approach just allows you to reuse certain algorithms with different ranges of values. For example, say you wanted to do the same thing with signed 16-bit samples. No problem: just map the values using one of these uniform transforms and most of the hard work is done. Now, is it really worth all of the complexity? Well, that just depends on personal preference I suppose... Earl of Arundel (talk) 20:32, 13 November 2021 (UTC)
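The width-independent normalization described here might look like this (a sketch; the names are hypothetical):

```python
def normalize(sample, bits, signed=False):
    # Map an integer sample of the given bit width onto [0, 1),
    # or onto [-1, 1) for signed samples, so downstream code never
    # needs to know the original width.
    span = 2 ** bits
    if signed:
        return sample / (span // 2)  # e.g. 16-bit signed: divide by 32768
    return sample / span             # e.g. 8-bit unsigned: divide by 256

print(normalize(128, 8))                   # 0.5
print(normalize(-16384, 16, signed=True))  # -0.5
```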

So the squash/unsquash pair of functions doesn't really work very well with negative numbers. Luckily the problem is easily fixed: in cases where the input is negative, we basically just need to reflect the input and then negate the output of the function.
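Assuming a squash of the form $$x/(1+x)$$ (the archived code is lost, so this form is a guess), the odd extension described above comes out as:

```python
import math

def squash(x):
    # Reflect negative inputs and negate the output, giving an odd
    # function that maps the real line onto (-1, 1).
    return math.copysign(abs(x) / (1.0 + abs(x)), x)

def unsquash(y):
    # Matching odd extension of the inverse.
    return math.copysign(abs(y) / (1.0 - abs(y)), y)

print(squash(-3.0))            # -0.75
print(unsquash(squash(-3.0)))  # -3.0
```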

The result can be seen in the following graph.



In blue is the standard sigmoid. The so-called squash function is defined by the orange curve for all negative $$x$$ and by the red one for all positive values. The result is a very nice smooth curve across the interval $$(-1, 1)$$! Earl of Arundel (talk) 17:23, 12 November 2021 (UTC)
 * If you look carefully at the formula describing the graph in the link I gave, they use the absolute value of the input ($$|x|$$) to overcome that problem. --Stephan Schulz (talk) 21:54, 12 November 2021 (UTC)
 * The squish function of the following pair shares the following properties with the logistic function that was your original sigmoid. (1) it has range $$(0,1)$$; (2) denoting it by $$S$$, it satisfies the functional equation $$S(x)=1-S(-x);$$ (3) it is analytic.
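(The pair itself was not preserved in the archive; from the formula quoted further down, $$1/(1 - x + sqrt(1 + x * x)),$$ the forward function was evidently equivalent to the following, with the inverse reconstructed algebraically:)

```python
import math

def squish(x):
    # 1/(1 + f(-x)) with f(x) = x + sqrt(1 + x**2); maps the reals onto (0, 1).
    return 1.0 / (1.0 - x + math.sqrt(1.0 + x * x))

def unsquish(s):
    # Inverse: with u = (1 - s)/s, solving for x gives (1 - u**2)/(2u).
    u = (1.0 - s) / s
    return (1.0 - u * u) / (2.0 * u)
```

Note that unsquish loses precision as $$s$$ approaches $$0$$ or $$1,$$ for the reasons already discussed above.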
 * --Lambiam 21:20, 12 November 2021 (UTC)

What an elegant solution! Just absolutely beautiful to be quite honest. But if I may ask, why is it so important to satisfy the $$S(x)=1-S(-x)$$ criterion? Earl of Arundel (talk) 22:00, 12 November 2021 (UTC)


 * It implies that the local metric contraction going from input to output, given by $$S'\!(x),$$ is solely dependent on the absolute value of $$x.$$ Whether that is a desirable property depends on the application. The specific form with a $$1$$ is most meaningful when the range of $$S$$ is $$(0,1).$$ This is helpful to scale to a fixed number of bits; the integer $$\lfloor S(x)2^n\rfloor$$ will fit in $$n$$ bits. Again, whether that is important depends on the application. --Lambiam 18:39, 13 November 2021 (UTC)
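The scaling mentioned above, as a short sketch:

```python
import math

def to_n_bits(s, n):
    # Quantize s in [0, 1) to the integer floor(s * 2**n), which fits in n bits.
    return math.floor(s * 2 ** n)

print(to_n_bits(0.5, 8))       # 128
print(to_n_bits(0.999999, 8))  # 255
```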


 * I think I see what you mean now. The sigmoid functions are rather smooth and flat, whereas the so-called `squash` function isn't so much. (It is also a piecewise construction, as you've pointed out, and so not analytic either.) The `squish` function, on the other hand, is likely the best mapping possible, no? It's interesting just how similarly $$1/(1 - x + sqrt(1 + x * x))$$ and $$1/(1 + exp(-value))$$ behave too. It's almost as if they were woven together or something. Very mysterious! Earl of Arundel (talk) 20:06, 13 November 2021 (UTC)
 * Both take the form $$1/(1+f(-x)),$$ where function $$f$$ is strictly monotonic with $$f(x)\to\infty$$ as $$x\to\infty,$$ and, moreover, enjoys the special property that $$f(x)f(-x)=1.$$ If you plot these two functions, respectively producing sigmoid and squish, they are not so similar; in one case you get the familiar exponential curve; in the other case one branch of a hyperbola whose asymptotes are given by the lines $$y=0$$ and $$y=2x.$$ There are many other functions that will do the job. Let $$g$$ be any monotonic odd function with $$g(x)\to\infty$$ as $$x\to\infty$$ – "odd" in the technical sense that $$g(-x)=-g(x).$$ Then take $$f(x)=g(x)+\sqrt{1+g(x)^2},$$ et voilà!, this function $$f$$ has the properties we need. The choice $$g(x)=x$$ is just the simplest possible. If you take $$g(x)=\sinh x,$$ the hyperbolic sine, you get ... (I'll let you work that one out). --Lambiam 00:43, 14 November 2021 (UTC)
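This recipe can be sketched directly. The key identity is $$f(x)f(-x) = (r + g(x))(r - g(x)) = r^2 - g(x)^2 = 1$$ with $$r = \sqrt{1 + g(x)^2}$$ and $$g$$ odd:

```python
import math

def make_sigmoid(g):
    # Build f(x) = g(x) + sqrt(1 + g(x)**2) from an odd increasing g,
    # then return the sigmoid x -> 1/(1 + f(-x)).
    def f(x):
        return g(x) + math.sqrt(1.0 + g(x) ** 2)
    return lambda x: 1.0 / (1.0 + f(-x))

squish = make_sigmoid(lambda x: x)       # the simplest choice, g(x) = x
cubed = make_sigmoid(lambda x: x ** 3)   # any other odd g works too

print(squish(0.0))                                  # 0.5
print(abs(cubed(2.0) + cubed(-2.0) - 1.0) < 1e-12)  # True
```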
 * Nice! So far it's been very interesting just trying different functions with your amazing equation.
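For the case left as an exercise above: with $$g(x)=\sinh x,$$ we get $$f(x)=\sinh x+\sqrt{1+\sinh^2 x}=\sinh x+\cosh x=e^x,$$ so the construction reproduces the original logistic sigmoid exactly. A quick numerical check (a sketch, not the code lost from the archive):

```python
import math

def from_sinh(x):
    # g = sinh: f(-x) = sinh(-x) + sqrt(1 + sinh(x)**2) = cosh(x) - sinh(x) = exp(-x),
    # so 1/(1 + f(-x)) collapses to the logistic function.
    f_neg = math.sinh(-x) + math.sqrt(1.0 + math.sinh(x) ** 2)
    return 1.0 / (1.0 + f_neg)

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

print(abs(from_sinh(2.5) - logistic(2.5)) < 1e-10)  # True
```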


 * I haven't gotten the inverse worked out quite yet, but it's likely just a simple mistake in my current formulation. Anyway, if anyone has any recommendations for further reading on this subject, I would love to hear about it! Earl of Arundel (talk) 15:38, 18 November 2021 (UTC)