Wikipedia:Reference desk/Archives/Mathematics/2018 March 18

= March 18 =

Combinatorics of decreasing intervals
For an integer $$N$$, how many pairs of sequences $$(i_1, i_2, \ldots, i_N)$$, $$(j_1, j_2, \ldots, j_N)$$ are there such that $$\forall k, i_k \leq i_{k+1} \leq j_{k+1} \leq j_k$$, with all i, j integers and $$i_0=1, j_0=N$$? Or equivalently, how many sequences of subsets intervals $$S_N \subseteq \ldots \subseteq S_2 \subseteq S_1 \subseteq [1,N]$$ are there? This looks like something that could have a recursive formula, but I had no luck finding it on my own.

Background: what I am actually looking for is the number of ways to place a "stable pyramid" within a cube of edge N, where a "stable pyramid" is a set of cubes of unit side such that each cube that does not rest on the lowest layer rests on top of another cube. The answer to that question is basically the 2D analogue of the question actually asked. Background of the background: see the "splitting a cube into three cubes, how many cuboids" question above.

Tigraan Click here to contact me 01:01, 18 March 2018 (UTC)


 * So $$1 = i_0 \leq i_1 \leq i_2 \leq \ldots \leq i_N \leq j_N \leq j_{N-1} \leq \ldots \leq j_1 \leq j_0 = N$$? Unfortunately I don't have the time right now to get into this problem but my hunch says it's something to do with Stirling numbers. 93.136.36.57 (talk) 05:24, 18 March 2018 (UTC)


 * It is not clear to me if the two questions really are equivalent, but if we define $$f(m, n)$$ to be the number of sequences of sets such that $$S_n \subseteq \ldots \subseteq S_2 \subseteq S_1 \subseteq [1,m]$$ then $$f(m, 0) = 1$$ and (by considering the choice for $$S_1$$) you have $$ f(m, n) = \sum_{k = 0}^m \binom{m}{k} f(k, n - 1)$$. (Or, if you really meant "intervals" instead of "subsets", replace the binomial coefficient with the number of k-element intervals in [1, m], which is $$m - k + 1$$ if k > 0.)  I have not made any attempt to compute these numbers or think about whether they give a known sequence.  --JBL (talk) 15:22, 18 March 2018 (UTC)
 * Of course it was worth a minute's thought. If you really mean sequences of sets then $$f(m, n) = (n + 1)^m$$: each of the m elements chooses which is the first subset it will appear in, or to appear in none of them.  --JBL (talk) 15:28, 18 March 2018 (UTC)
 * Huh, sorry. Yes, I meant intervals, not subsets. The recurrence will do, I only need n up to 6 or so, so it is easy enough to compute. Tigraan Click here to contact me 21:04, 18 March 2018 (UTC)
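
For readers who want actual numbers, the recurrence given above can be evaluated directly. The following is only a minimal sketch: the function names `subset_chains` and `interval_chains` are made up for illustration, and the interval version assumes that the empty set counts as the single interval with k = 0 elements, which is one reading of the "if k > 0" remark. It also checks the closed form $$(n + 1)^m$$ for the subset version on small inputs.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def subset_chains(m, n):
    """Number of chains S_n <= ... <= S_1 <= [1, m] of arbitrary subsets."""
    if n == 0:
        return 1
    # Choose S_1 as a k-element subset, then count chains inside it.
    return sum(comb(m, k) * subset_chains(k, n - 1) for k in range(m + 1))


@lru_cache(maxsize=None)
def interval_chains(m, n):
    """Same count when every S_i must be an interval (empty set allowed)."""
    if n == 0:
        return 1
    # Assumption: k = 0 contributes exactly one (empty) interval.
    empty_term = interval_chains(0, n - 1)
    # For k > 0 there are m - k + 1 intervals with k elements in [1, m].
    return empty_term + sum(
        (m - k + 1) * interval_chains(k, n - 1) for k in range(1, m + 1)
    )


if __name__ == "__main__":
    # Sanity check of the closed form for the subset version.
    assert all(
        subset_chains(m, n) == (n + 1) ** m
        for m in range(6)
        for n in range(6)
    )
    # Interval counts for m = n, up to the n = 6 mentioned in the thread.
    for n in range(1, 7):
        print(n, interval_chains(n, n))
```

Note that the condition $$i_N \leq j_N$$ in the original question forces every interval to be non-empty; under that reading the `empty_term` would be dropped from the sum.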

Machine Learning: Combining Binary Features for Better Prediction
I'm building a machine learning model (probably a random forest) to predict recipe ratings. I have the rating and several hundred columns of tags that people have applied to the recipe (e.g. "barbecue", "vegan", etc.). There is one row for each recipe, and if that recipe has that tag, there is a 1 in the column for that tag (zero otherwise).

Most of these tags aren't great predictors of the rating (as measured by a biserial correlation), and I'm wondering if I can combine the tags into groups to improve the predictions. Is it even possible for that to work (i.e. is it just mathematically impossible to improve a prediction in this way)? If it is possible, is there an algorithm to determine the optimum groups? Put another way, if you have binary variables, can you combine some of them into groups to improve the biserial correlation with a target variable over the individual correlations? OldTimeNESter (talk) 18:14, 18 March 2018 (UTC)
 * I think that Cluster analysis is what you're looking for. There are many ways to do cluster analysis, so finding which one is the best for your specific problem is a question that is difficult to answer here. Iffy★Chat -- 18:37, 18 March 2018 (UTC)
 * There always is a difference between theory and practice, but in theory, if you have a sufficiently strong (non-linear) model, like a 3-or-more-layer perceptron, clustering of features should not really help. I have no good intuition about random forests, but plain decision trees have limits. --Stephan Schulz (talk) 00:28, 20 March 2018 (UTC)


 * If I understand the question correctly, the answer is "obviously yes". For instance, take two coin throws and the event "the coin lands on the same face each time". Neither of the two coin throw results is correlated with the outcome on its own, but you can get an excellent "predictor" by grouping the two results.
 * That's obvious stuff, and it makes me wonder whether I really understood the question. As soon as you move beyond the naive assumption, machine learning is about grouping different observations into a single relevant prediction that holds more info than the sum of the observations. If it was not possible, machine learning would not work. Tigraan Click here to contact me 15:19, 20 March 2018 (UTC)
 * I think what I'm doing is a bit different. By grouping the fields, I mean just merging some of them, so that instead of having (say) separate fields for tag1 and tag2, I have one combined field that is set to 1 if the recipe contains either tag1 or tag2. In your example (if I understand it correctly), you have two coins, and you would set the grouping field to either 1 (if both coins came up the same) or 0 (if they didn't). In my example, you'd set each coin's outcome to 1 if it came up heads, and 0 otherwise. Then, when you combined the two, you'd have a 1 if either coin came up heads, which I don't think would predict anything. That was an interesting thought experiment, though. If I didn't understand it correctly, please let me know. OldTimeNESter (talk) 18:05, 20 March 2018 (UTC)
 * What that would achieve is a reduction of features, or, more fancy, dimensionality reduction, a frequent first step in machine learning. A standard technique for that is principal component analysis. --Stephan Schulz (talk) 21:37, 20 March 2018 (UTC)
 * Well, that depends on what machine learning algorithm you're using.
 * If you're using a highly restricted method like logistic regression, this form of feature engineering can definitely improve prediction, since it allows expressing relations that are not possible with the raw features.
 * However, I should note that since you talk about new features that are an OR of original features and discarding the original, it will detect correlations with the simultaneous lack of features, not their simultaneous existence. That is, you will detect deductions like "the recipe is good if it has neither tag X nor tag Y", but not "the recipe is good if it has both tag X and tag Y". If the latter is more likely, you will want to group tags with AND rather than OR. Or you can keep both the original features and the grouped ones, and then you can find both.
 * If you use a more general method like a neural network, there's little to gain by using these combined features - the network can just construct them on its own in hidden nodes if it so wishes. However, reducing the number of features can help reduce noise and overfitting (especially when there's not a lot of data), and also reduce the computation time for learning the model. -- Meni Rosenfeld (talk) 10:01, 21 March 2018 (UTC)
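
The two-coin example and the OR versus AND remark from this thread can be checked on a toy table. What follows is a rough sketch, not anything taken from the posters: it enumerates the four equally likely outcomes of two binary features and computes plain Pearson correlations (which, for a 0/1 feature, is what a point-biserial correlation reduces to). The helper `pearson` and all variable names are illustrative assumptions.

```python
from itertools import product
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# All four equally likely outcomes of two binary features (coins or tags).
outcomes = list(product([0, 1], repeat=2))
coin_a = [a for a, _ in outcomes]
coin_b = [b for _, b in outcomes]

# Combined columns discussed in the thread.
same_face = [int(a == b) for a, b in outcomes]  # the "same face" event
either = [a | b for a, b in outcomes]           # OR merge of the two tags
both = [a & b for a, b in outcomes]             # AND merge of the two tags
neither = [1 - e for e in either]               # "has neither tag"

if __name__ == "__main__":
    print("coin A vs same face:", pearson(coin_a, same_face))
    print("coin B vs same face:", pearson(coin_b, same_face))
    print("OR vs 'neither tag':", pearson(either, neither))
    print("OR vs 'both tags':  ", pearson(either, both))
    print("AND vs 'both tags': ", pearson(both, both))
```

On this four-row table each single coin has zero correlation with the "same face" event, the OR column is perfectly (negatively) correlated with "neither tag", and it is only weakly correlated (about 0.33) with "both tags", which the AND column captures exactly. Real tag data will of course not be this clean.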