Wikipedia:Reference desk/Archives/Mathematics/2021 January 3

= January 3 =

Generating numbers with a modulus which is not a power of two?
Suppose you had a stream of bits from which to construct some numbers using a modulus of say 72. Does the fact that the modulus is not a power of two introduce any bias in the result? For some reason it just seems that it would. Earl of Arundel (talk) 14:53, 3 January 2021 (UTC)
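The bias is easy to exhibit exhaustively with a short Python sketch (names here are illustrative): reduce every possible 7-bit value modulo 72 and count how often each residue occurs.

```python
from collections import Counter

# Exhaustively reduce every 7-bit value (0..127) modulo 72.  An unbiased
# generator would hit each residue equally often, but here the residues
# 0..55 each have two preimages while 56..71 have only one.
counts = Counter(v % 72 for v in range(2 ** 7))

twice = sorted(r for r, c in counts.items() if c == 2)  # residues 0..55
once = sorted(r for r, c in counts.items() if c == 1)   # residues 56..71
```

So 56 of the 72 residues are exactly twice as likely as the other 16; the same imbalance appears for any chunk size whenever the modulus does not divide a power of two.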

Here's my attempt to remove any possible bias. (Although I'm not sure it's even necessary at this point!)

Earl of Arundel (talk) 15:05, 3 January 2021 (UTC)


 * This does not solve the issue for long streams. If you have bits galore, as in a cheaply obtained infinite stream, an easy solution is to chop up the stream into parts of length $$n$$ such that $$m\le 2^n,$$ where $$m$$ stands for the modulus, interpreting these as binary numbers and discarding all outside the range from $$0$$ to $$m-1$$. For a modulus of $$72$$ you should choose $$n=7,$$ which means that about 44% of the bits are thrown away. If you have to pay dearly for each bit, this is not a good approach. Instead, think of the problem as having to convert an infinitely expanded fraction of a value in the unit interval given in some base $$a$$ (here with $$a=2$$) to one in base $$b$$ (here with $$b=m$$). For example, suppose you are given the stream $$00100100001111110110101010...,$$ (which happens to be the fractional part of $$\pi$$ expressed as a binary fraction, instead of the usual decimal $$14159265...$$). If the desired modulus is $$b=3$$, this should now become $$0102110122220102...,$$ again the same value, but this time ternary. Using this example, without looking ahead we know that the binary fraction is contained in the closed interval from $$0.00100100001111110110101010\overline{0}$$ to $$0.00100100001111110110101010\overline{1}$$, in which an overlined digit denotes an infinite repetition. The corresponding respective ternary fractions are $$0.\underline{0102110122220102}1002...$$ and $$0.\underline{0102110122220102}2001...,$$ in which the common initial part is underlined. So after having seen an initial segment $$00100100001111110110101010$$ of the input stream, you can already produce the initial segment $$0102110122220102$$ of the output stream. This method uses all digits from the input; no information is discarded. There is a theoretical risk that you may have to wait indefinitely before a next output digit becomes certain.
If the bitstream begins with $$010101...,$$ with an interminably repeated alternation of $$0$$ and $$1$$, you are kept in suspense about whether to produce a ternary $$0$$ or $$1$$ until the pattern is broken. For a random input stream this is not a real problem. You need to be prepared, though, to compute with very large integers. --Lambiam 16:03, 3 January 2021 (UTC)
 * There are intermediate schemes where you throw out some information, but not as much as you would by generating one number at a time. For example, if you want to generate numbers modulo 3, then interpret each 8-bit chunk as a number between 0 and 255, and if this is less than 243 then express it in base 3 to get 5 numbers modulo three. You end up throwing out 1 − 243/256 ≈ 5% of the bits instead of 25% as when you generate the numbers one by one. --RDBury (talk) 01:55, 4 January 2021 (UTC)
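This batch scheme is a one-liner to sketch (the helper name is made up): reject bytes at or above $$3^5 = 243$$ and expand the survivors in base 3.

```python
def byte_to_trits(byte):
    """Turn one 8-bit chunk into five base-3 digits, or reject it.

    Bytes 243..255 are discarded (None) so that the surviving values
    0..242 are uniform over the 3**5 = 243 five-trit strings."""
    if byte >= 3 ** 5:
        return None  # reject; the caller should draw a fresh byte
    return [(byte // 3 ** i) % 3 for i in reversed(range(5))]
```

For example `byte_to_trits(242)` yields the all-twos string, while any byte from 243 up is rejected and costs the full 8 bits.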
 * On average you lose a small amount more. For each 8 bits in, you obtain 5 trits output with probability 243/256 and 0 trits with probability 13/256; on average 1215/256 = 4.74610 trits. The information content of a trit is $$\log_2 3 = 1.58496$$ bits. So the average information content extracted from 8 bits equals 4.74610 × 1.58496 = 7.52238 bits. The efficiency is then 7.52238/8 = 0.94030, so the loss is 0.05970 bits per input bit, or almost 6%. In a general formula, for $$m$$ base-$$a$$ digits in and $$n$$ base-$$b$$ digits out, where $$b^n \le a^m,$$ the efficiency is
 * $$\frac{b^n\log(b^n)}{a^m\log(a^m)}.$$
 * Splitting the input up in chunks of 290 bits to get out batches of 47 base-72 digits, the efficiency is slightly above 0.99. --Lambiam 03:05, 4 January 2021 (UTC)
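The efficiency formula is easy to evaluate numerically; this sketch (function name invented) reproduces both figures quoted above.

```python
import math

def efficiency(a, m, b, n):
    """Fraction of input entropy retained when m base-a digits are
    converted to n base-b digits by rejection; requires b**n <= a**m."""
    assert b ** n <= a ** m
    return (b ** n * math.log(b ** n)) / (a ** m * math.log(a ** m))

eight_bit = efficiency(2, 8, 3, 5)       # about 0.94030, as computed above
big_batch = efficiency(2, 290, 72, 47)   # slightly above 0.99
```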
 * Yes, the 5% was a lower bound on the information lost, basically the raw percentage of the bits thrown out. Any computation done will reduce efficiency more. In this case it was the conversion from binary (8 bits) to ternary ($$5\log_2 3 \approx 7.92$$ bits). You can get arbitrarily close to 100% efficiency by using larger and larger chunks, but it's a case of diminishing returns. You could also increase efficiency without increasing chunk size by being a bit smarter about throwing out bits. For example, if your chunk starts '11111' then you don't need to look at the remaining 3 bits to know that the number is greater than 242, so just throw out those 5 and start over, saving yourself 3 bits. Again, it's a trade-off between simplicity and efficiency. --RDBury (talk) 05:49, 4 January 2021 (UTC)
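The early-abort trick can be sketched as a chunk reader that takes a bit source as a callback (names are illustrative):

```python
def draw_chunk(next_bit):
    """Read one 8-bit chunk from a bit source, rejecting values >= 243.

    If the first five bits are all 1 the value is at least 248 > 242,
    so the last three bits need not be read at all.  Returns a value
    in 0..242, or None on rejection (early or full)."""
    bits = [next_bit() for _ in range(5)]
    if all(b == 1 for b in bits):
        return None                      # early abort: three bits saved
    bits += [next_bit() for _ in range(3)]
    value = 0
    for b in bits:
        value = 2 * value + b
    return value if value < 243 else None
```

Note the trick is only partial: chunks 243..247 start with `11110`, so they still consume all 8 bits before being rejected.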

Derivative of matrix
$$ \frac{d}{dt} \ln{A(t)}=A^{-1}(t)A'(t) $$

Or

$$ =A'(t)A^{-1}(t) $$

Which is true?


 * How do you define the function $$\ln$$ on matrices? You can define it as the inverse – if it exists – of the function $$\exp$$ defined by a power series:
 * $$\exp A = 1 + A + \frac{A^2}{2} + \frac{A^3}{6} + \cdots~.$$
 * But many of the conventional rules for determining derivatives do not work for matrices. For example, while $$\frac{d}{dt}f(t)^3=3\left(\frac{d}{dt}f(t)\right)f(t)^2,$$ the identity $$\frac{d}{dt}A^3=3\left(\frac{d}{dt}A\right)A^2$$ is invalid. The correct rule is
 * $$\frac{d}{dt}A^3= \left(\frac{d}{dt}A\right)A^2 +A\left(\frac{d}{dt}A\right)A +A^2\left(\frac{d}{dt}A\right).$$
 * The conventional rule fails because matrix multiplication is not commutative. I think that both of these rules for the derivative of the logarithm of a matrix likewise fail to hold. --Lambiam 19:00, 3 January 2021 (UTC)
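The corrected rule can be checked with plain 2×2 integer matrices: along the line $$A(t)=A_0+tA_1$$ the derivative of $$A(t)^3$$ at $$t=0$$ matches the three-term sum but not the naive scalar-style rule. (The particular matrices below are arbitrary; $$A_1$$ is chosen nilpotent so the central difference is exact.)

```python
def mmul(X, Y):
    """2x2 matrix product."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def madd(*Ms):
    """Entrywise sum of 2x2 matrices."""
    return [[sum(M[i][j] for M in Ms) for j in range(2)] for i in range(2)]

A0 = [[1, 2], [3, 4]]   # arbitrary base point (does not commute with A1)
A1 = [[0, 1], [0, 0]]   # arbitrary direction; A1**2 = 0

# Correct rule: A'A^2 + A A'A + A^2 A'  (with A = A0, A' = A1, at t = 0)
correct = madd(mmul(A1, mmul(A0, A0)),
               mmul(A0, mmul(A1, A0)),
               mmul(A0, mmul(A0, A1)))
# Naive scalar-style rule: 3 A' A^2
naive = [[3 * x for x in row] for row in mmul(A1, mmul(A0, A0))]

# Derivative of (A0 + t*A1)^3 at t = 0 by a central difference.  For a
# cubic, (F(1) - F(-1))/2 equals the t-coefficient plus A1^3, and here
# A1^3 = 0, so the difference quotient is exact in integer arithmetic.
def cube(M):
    return mmul(M, mmul(M, M))

neg_A1 = [[-x for x in row] for row in A1]
Fp, Fm = cube(madd(A0, A1)), cube(madd(A0, neg_A1))
central = [[(Fp[i][j] - Fm[i][j]) // 2 for j in range(2)] for i in range(2)]
```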


 * If $$A$$ is sufficiently close to $$I$$ then you can define $$\ln(A)$$ as $$(A-I)-\tfrac{(A-I)^2}{2}+\tfrac{(A-I)^3}{3}-\tfrac{(A-I)^4}{4}+\cdots.$$ There are probably other $$A$$'s for which $$\ln(A)$$ can be reasonably defined, but that doesn't solve the basic problem here, namely that the usual rules for calculus have to be drastically recast or thrown out altogether when you don't have commutativity. Note that
 * $$\exp(\begin{bmatrix}0 & 0 \\0 & 0 \end{bmatrix}) = \exp(\begin{bmatrix}0 & 2\pi \\-2\pi & 0 \end{bmatrix}) = I.$$
 * So defining the inverse of exp on matrices is trickier than it may look at first blush. --RDBury (talk) 02:21, 4 January 2021 (UTC)
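The non-injectivity of exp is easy to confirm numerically by summing the power series for the rotation generator (helper names are invented; a truncated series suffices for 2×2 matrices):

```python
import math

def mmul(X, Y):
    """2x2 matrix product."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mexp(A, terms=60):
    """exp(A) for a 2x2 matrix by its power series, truncated."""
    result = [[1.0, 0.0], [0.0, 1.0]]   # running sum, starts at I
    term = [[1.0, 0.0], [0.0, 1.0]]     # holds A^k / k!
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mmul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(2)]
                  for i in range(2)]
    return result

t = 2 * math.pi
E = mexp([[0.0, t], [-t, 0.0]])
# E agrees with the identity matrix to high precision, even though the
# argument is nonzero, so no global inverse of exp can exist.
```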


 * See here and also here. Count Iblis (talk) 08:52, 4 January 2021 (UTC)


 * The issue of defining $$\ln$$ on matrices can be bypassed by reformulating the original question as: assuming that $$A(t) = \exp B(t),$$ can we express $$\tfrac{d}{dt}B(t)$$ as a product of the inverse $$A^{-1}(t)$$ and the derivative $$\tfrac{d}{dt}A(t)$$ of $$A(t)?$$ And the answer is: no, not in general. It is true for the special case of diagonal matrices, for which multiplication is commutative, and then either order of the multiplicands works. While not immediately related, I can report that the following identity appears to hold:
 * $$\exp(\lambda A) \exp(\mu A)=\exp((\lambda+\mu)A).$$
 * It follows that $$(\exp A)^n=\exp(n A).$$ --Lambiam 11:22, 4 January 2021 (UTC)
 * Generally $$\exp(A) \exp(B)=\exp(A+B)$$ if the operators commute. Ruslik_ Zero 08:07, 5 January 2021 (UTC)
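Both sides of this discussion can be checked with the truncated series (matrix choices below are arbitrary): a matrix commutes with itself, so $$\exp(\lambda A)\exp(\mu A)=\exp((\lambda+\mu)A)$$ holds, while for a non-commuting pair $$\exp(A)\exp(B)\ne\exp(A+B)$$.

```python
def mmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def madd(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(2)] for i in range(2)]

def smul(c, X):
    return [[c * x for x in row] for row in X]

def mexp(A, terms=40):
    """exp(A) for a 2x2 matrix by its truncated power series."""
    result, term = [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        term = smul(1.0 / k, mmul(term, A))
        result = madd(result, term)
    return result

# A matrix commutes with itself: exp(0.3*C) exp(0.7*C) == exp(C).
C = [[0.2, 0.5], [-0.3, 0.1]]
same = mmul(mexp(smul(0.3, C)), mexp(smul(0.7, C)))
full = mexp(C)

# Non-commuting A, B: exp(A) exp(B) differs from exp(A + B).
A = [[0.0, 1.0], [0.0, 0.0]]
B = [[0.0, 0.0], [1.0, 0.0]]
lhs = mmul(mexp(A), mexp(B))   # [[2, 1], [1, 1]], since A and B are nilpotent
rhs = mexp(madd(A, B))         # [[cosh 1, sinh 1], [sinh 1, cosh 1]]
```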


 * Matrix logarithm covers the topic of defining the logarithm of a matrix. Like in the 1D case, there are branches, but I am not too familiar with several complex variables or the behavior of this particular multivalued function.--Jasper Deng (talk) 10:26, 7 January 2021 (UTC)
 * Now there's an idea: Look it up on Wikipedia! Okay, I feel silly now. At least we managed to cover a good chunk of the article. In any case, the article doesn't mention derivatives. It just came to me that log does define a map from some open set in $$\mathbf{C}^{n^2}$$ to $$\mathbf{C}^{n^2},$$ so it has a derivative in that sense. From this point of view it would be an $$n^2 \times n^2$$ matrix, or perhaps more accurately a linear map from the space of $$n\times n$$ matrices to itself, which may be complicated but it is defined. There's probably a way to get an expression for this, but it's not coming to me at the moment. --RDBury (talk) 00:15, 9 January 2021 (UTC)
 * It seems to me that the parameter $$t$$ of $$A(t)$$ with respect to which the derivative is taken is meant to range over $$\mathbf{R}.$$ --Lambiam 00:26, 9 January 2021 (UTC)
 * Yes, the chain rule implies that $$\tfrac{d}{dt}\ln(A(t)) = D\ln(A)(A'(t)),$$ where $$D\ln(A)$$ is the derivative considered as a map from $$M_{n\times n}$$ to $$M_{n\times n}$$. So knowing one is equivalent to knowing the other. Note that, by the above series for $$\exp(A),$$ you can write $$D\exp(A) = I + \tfrac{1}{2}\left(\mathrm{ad}_L(A)+\mathrm{ad}_R(A)\right) + \tfrac{1}{6}\left(\mathrm{ad}_L(A)^2+\mathrm{ad}_L(A)\,\mathrm{ad}_R(A)+\mathrm{ad}_R(A)^2\right) + \cdots,$$ where $$\mathrm{ad}_L(A)$$ is the map $$U\mapsto AU$$ and $$\mathrm{ad}_R(A)$$ is the map $$U\mapsto UA$$. Similarly, I think you can get a series expression for $$D\ln(A)$$ when $$A$$ is close to $$I,$$ but it would be more complicated. I don't know if Derivative of the exponential map (linked above) is relevant here because the domain of exp in that context is a Lie algebra; it might be applicable though. --RDBury (talk) 04:23, 9 January 2021 (UTC)
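The series for $$D\exp$$ can be checked against a central finite difference of exp itself; this sketch (invented names) sums $$\textstyle\sum_{k\ge 1}\tfrac{1}{k!}\sum_{i+j=k-1}A^i U A^j$$ using the recurrence $$S_k = A\,S_{k-1} + U A^{k-1}$$ for the inner sum:

```python
def mmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def madd(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(2)] for i in range(2)]

def smul(c, X):
    return [[c * x for x in row] for row in X]

def mexp(A, terms=40):
    """exp(A) for a 2x2 matrix by its truncated power series."""
    result, term = [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        term = smul(1.0 / k, mmul(term, A))
        result = madd(result, term)
    return result

def dexp(A, U, terms=40):
    """Directional derivative of exp at A applied to U:
    sum over k >= 1 of (1/k!) * sum_{i+j=k-1} A^i U A^j."""
    Apow = [[1.0, 0.0], [0.0, 1.0]]    # A^(k-1)
    S = U                              # inner sum for k = 1
    total = [[0.0, 0.0], [0.0, 0.0]]
    fact = 1.0
    for k in range(1, terms):
        fact *= k
        total = madd(total, smul(1.0 / fact, S))
        Apow = mmul(Apow, A)           # now A^k
        S = madd(mmul(A, S), mmul(U, Apow))  # inner sum for k + 1
    return total

# Compare with a central finite difference of exp:
A = [[0.2, 0.5], [-0.3, 0.1]]
U = [[1.0, 0.0], [0.0, -1.0]]
h = 1e-5
fd = smul(1.0 / (2 * h),
          madd(mexp(madd(A, smul(h, U))),
               smul(-1.0, mexp(madd(A, smul(-h, U))))))
series = dexp(A, U)
```

The two agree to roughly the accuracy of the difference quotient, which supports the series formula.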