Wikipedia:Reference desk/Archives/Mathematics/2018 November 11

= November 11 =

Inverse Functions
Let f be a function from R to R defined by f(x) = x^2. Find f^–1({x | 0 < x < 1}).

So I found the inverse of f(x): f^-1(x) = √x

It seems that the solution I found ( f^-1(x) = √x ) is valid for x = 0 and for all positive real numbers (that is, for x ≥ 0), because the domain and co-domain are the real numbers. But not all positive real numbers are greater than 0 and less than 1. So what exactly does "find f^–1({x | 0 < x < 1})" mean?

ThunderBuggy (talk) 17:28, 11 November 2018 (UTC)


 * $$\{x\mid 0 < x < 1\}$$ is just a roundabout way of using set-builder notation to denote the open interval $$(0,1).$$ In this case, $$f$$ doesn't have an inverse because it's not injective (for example, $$f(-1) = f(1) = 1$$).  But even if it did, the notation used is most likely asking for the preimage of the given interval under $$f$$, that is, the set of all $$x$$ for which $$f(x)$$ lies in $$(0,1)$$.  –Deacon Vorbis (carbon • videos) 17:38, 11 November 2018 (UTC)
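
A worked version of the preimage reading suggested above, assuming the standard definition $$f^{-1}(S) = \{x \mid f(x) \in S\}$$ (the original problem does not state which meaning it intends):

$$f^{-1}\big((0,1)\big) = \{x \in \mathbb{R} \mid 0 < x^2 < 1\} = \{x \in \mathbb{R} \mid 0 < |x| < 1\} = (-1,0) \cup (0,1).$$

The point $$x = 0$$ is excluded because $$f(0) = 0$$ is not in $$(0,1)$$, and the negative half $$(-1,0)$$ is included because squaring sends it into $$(0,1)$$ as well, which is exactly what $$\sqrt{x}$$ alone would miss.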

Who needs big data?
When you already have decent sampling for your inferences, why would more data alter the conclusion significantly?--Doroletho (talk) 20:12, 11 November 2018 (UTC)


 * If you haven’t already seen it, the article Sample size determination may be helpful. A larger sample size generally gives a narrower confidence interval, although beyond a certain point this effect gets small. A larger sample size allows the attainment of higher statistical power to reject the null hypothesis in favor of a specific alternative hypothesis. If obtaining a larger sample size can be done costlessly, a larger sample size is always better. But with costs of data collection, beyond some sample size the gains will no longer exceed the costs on the margin. Loraof (talk) 20:55, 11 November 2018 (UTC)
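
As a rough illustration of how the confidence interval narrows with sample size, here is a minimal sketch. It assumes simple random sampling, the normal approximation for a proportion, and the worst case p = 0.5; the function name `margin_of_error` is made up for this example and is not taken from the article linked above.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate half-width of a 95% confidence interval for a proportion.

    Assumes simple random sampling and the normal approximation;
    p = 0.5 is the worst case (widest interval).
    """
    return z * math.sqrt(p * (1 - p) / n)

# The half-width shrinks like 1/sqrt(n): 100 times more data
# only narrows the interval by a factor of 10.
for n in (100, 1_000, 10_000, 1_000_000):
    print(f"n = {n:>9,}: +/- {100 * margin_of_error(n):.2f} percentage points")
```

Under these assumptions, going from 100 to 10,000 samples cuts the margin from about 9.8 to about 0.98 percentage points, while the next factor of 100 (to 1,000,000) only brings it to about 0.10, which is the diminishing return described above.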


 * That's exactly the point: why analyze 1,000,000 samples, when the additional value after, say, 1,000 random samples would be minimal and falling? Someone analyzing megas or gigas might obtain a result equivalent to that of someone analyzing teras or even petas. I'm not just asking why more data makes sense, but why analyzing a gigantic amount of data makes sense. --Doroletho (talk) 22:26, 11 November 2018 (UTC)
 * I don't know whether you have a specific example in mind. Big data is used for many purposes, e.g. to find samples which satisfy certain requirements. If, for example, you want to compare the number of births on December 24 and December 25, then your data probably starts with all birthdays of the year. If you want to analyze the health effects of drinking wine, then you may want to compare samples which are similar in many other ways, because wine drinkers and others may have different backgrounds and lifestyles. If you want to analyze what people search for on Google, then there are millions of different searches. PrimeHunter (talk) 23:33, 11 November 2018 (UTC)


 * Let us take the example of political polling. If you want to know which candidate will win the next election, it is indeed sufficient to ask a couple hundred people among millions of voters to get a good idea of how the vote is distributed, assuming you managed to avoid sampling bias. That is because you are only interested in one variable (vote choice).
 * However, if you want to predict how a single person will vote, you will involve more variables. Single-variable correlations are still easy: again, a couple hundred samples will give you sufficient support to say that older people prefer Mrs. Smith and younger people prefer Mr. Doe by a certain margin.
 * It becomes big data (in the current meaning of the term) when you want to go further and correlate across many variables: for instance, maybe old people and women and dog-lovers prefer Mrs. Smith, but old dog-loving women actually prefer Mr. Doe. If you have a whole lot of variables to take into account, small sample sizes might cause overfitting: if you have only one old dog-loving woman in your sample, it might be preposterous to declare that old dog-loving women prefer Mr. Doe by a wide margin. In some applications, you even have more variables than observations, so you are bound to overfit if you do not set a limit of some sort.
 * You might also be interested in our article on cluster analysis. Tigraan Click here to contact me 10:45, 12 November 2018 (UTC)
 * Another factor that should be mentioned is that there's not much reason not to use all the data you've got whether it's likely to change the result or not; with modern computing power it's just as easy to compute an average over a million samples as over a thousand. The real bottleneck at this point is usually data collection, and normally big data applications occur when this too can be automated. --RDBury (talk) 16:09, 12 November 2018 (UTC)
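
To put rough numbers on the many-variables point from the polling example, here is a small sketch. It assumes every variable is a yes/no attribute (old or not, woman or not, dog-lover or not, and so on) and that respondents are spread evenly over all combinations, which real data will not do; `expected_per_cell` is a hypothetical helper written for this illustration.

```python
def expected_per_cell(n_samples, n_binary_vars):
    """Average number of respondents in each combination of attributes.

    Assumes n_binary_vars yes/no attributes and an even spread of
    respondents over all 2**n_binary_vars combinations (an idealization).
    """
    n_cells = 2 ** n_binary_vars
    return n_samples / n_cells

for n_samples in (1_000, 1_000_000):
    for k in (1, 3, 10, 20):
        print(f"n = {n_samples:>9,}, {k:>2} variables: "
              f"{2 ** k:>9,} cells, about "
              f"{expected_per_cell(n_samples, k):,.3f} respondents per cell")
```

With 1,000 respondents and the three attributes from the example, each combination still holds 125 people on average, but with 10 attributes the average drops below one person per combination, and even a million respondents average fewer than one per combination at 20 attributes. That is one way to see why questions involving many variables at once call for far more data than a single-variable poll.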