Wikipedia:Reference desk/Archives/Mathematics/2009 July 23

= July 23 =

A variation of the knapsack problem
I will describe the problem that I need to solve:

Given a budget, I need to choose parts that will be used to build a car. To build a car I need:

1. Engine

2. Body

3. Set of wheels

I need to choose one engine, one body, and one set of wheels from a set of 1000 engines, 1000 bodies, and 1000 sets of wheels. (I also know the price of each engine, each body, ... ).

In addition every part has a power ranking that is between 1-100.

I need to choose 3 parts so that the car that I get will cost less than (or equal to) my budget, and that its "power ranking" will be maximized. ("power ranking" := the sum of the power rankings of the engine, body, wheels.)

Can you please tell me if this can be reduced to the knapsack problem, and how I would solve this problem.

Thanks!

Quilby (talk) 03:11, 23 July 2009 (UTC)


 * We do have an article about the knapsack problem that describes some approaches. The problem is formally NP-hard, but instances that occur in practice appear to usually be tractable. 70.90.174.101 (talk) 05:04, 23 July 2009 (UTC)
 * Right, but my problem is different. Notice how in the regular knapsack problem, all weights are 'equal'. Here I have 3 different categories of weights and I need to choose 1 from each. If someone can give me some tips on how to solve this problem it would be helpful. 94.159.156.225 (talk) 05:23, 23 July 2009 (UTC)


 * The naive approach of trying all combinations is only $$n^3$$ so it's clearly polynomial. An $$n^2$$ algorithm is not too hard.  Sort each set of parts by cost and throw out the strictly worse choices, then for each engine, traverse the bodies list from most expensive to cheapest while traversing the wheels list in the opposite direction looking for the most expensive wheels that still put everything under the budget.  Hopefully someone can think up something more clever though. Rckrone (talk) 06:40, 23 July 2009 (UTC)
 * By the way you are looking for an exact solution, right? Rckrone (talk) 06:44, 23 July 2009 (UTC)
 * I don't know what you mean by exact, but as I said - the price of the final car needs to be smaller than or equal to my budget. So I think this is not an exact solution. Quilby (talk) 21:38, 23 July 2009 (UTC)
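The $$n^2$$ idea sketched above (prune each list, then sweep bodies and wheels in opposite directions for every engine) might look roughly like the following. This is only a sketch under assumptions: parts are represented as `(cost, power)` tuples, and the names `pareto` and `best_car` are made up for illustration.

```python
from typing import List, Optional, Tuple

Part = Tuple[int, int]  # (cost, power); this representation is an assumption


def pareto(parts: List[Part]) -> List[Part]:
    """Drop every part that is matched or beaten in power by a part costing no more."""
    frontier: List[Part] = []
    for cost, power in sorted(parts, key=lambda p: (p[0], -p[1])):
        if not frontier or power > frontier[-1][1]:
            frontier.append((cost, power))
    return frontier  # cost and power are both strictly increasing


def best_car(engines: List[Part], bodies: List[Part], wheels: List[Part],
             budget: int) -> Optional[Tuple[int, Part, Part, Part]]:
    """Return (total power, engine, body, wheels), or None if nothing is affordable."""
    engines, bodies, wheels = pareto(engines), pareto(bodies), pareto(wheels)
    best = None
    for e in engines:
        left = budget - e[0]
        j = -1  # index of the most expensive wheels affordable so far
        for b in reversed(bodies):  # most expensive body first
            room = left - b[0]
            # Cheaper bodies leave more room, so j only ever moves forward.
            while j + 1 < len(wheels) and wheels[j + 1][0] <= room:
                j += 1
            if j >= 0:
                total = e[1] + b[1] + wheels[j][1]
                if best is None or total > best[0]:
                    best = (total, e, b, wheels[j])
    return best


# Made-up sample data, purely for illustration.
engines = [(100, 50), (200, 80), (150, 40)]
bodies = [(50, 20), (120, 60)]
wheels = [(30, 10), (80, 70)]
print(best_car(engines, bodies, wheels, budget=300))
```

After pruning, the most expensive affordable wheels are also the most powerful affordable ones, which is why the inner sweep only needs one pass over each list per engine.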

For each possible power ranking (1..100) select the cheapest engine, the cheapest body, and the cheapest set of wheels, and discard the rest. This reduces the size of the problem from 10^9 to 10^6 combinations to consider. Also discard any engine (body, wheels) that costs more than some higher ranked engine (body, wheels). Next, subtract the price of the cheapest engine (body, wheels) from the prices of all the engines (bodies, wheels), and from your budget. Now your budget reflects how much extra money you can spend to buy something more expensive than the very cheapest car. Then discard all parts that individually exceed your budget. The rest is brute force. First consider the set of pairs (the Cartesian product) of engines and bodies. Discard all pairs that cost more than your budget. Sort on ascending ranking. Retain only the cheapest pair of each rank. Also discard more expensive pairs of lower ranks. Finally combine the pairs with the wheels. Bo Jacoby (talk) 10:52, 23 July 2009 (UTC).

PS. ("power ranking" := the sum of the power rankings of the engine, body, wheels.) Is that really what you want? Is it a good idea to use fancy wheels together with a lousy engine? I suppose you would be happier maximizing the minimum of the power rankings of the engine, body, wheels. Bo Jacoby (talk) 13:18, 23 July 2009 (UTC).


 * That algorithm is still worst case $$n^3$$. For a hard data set it doesn't reduce the number of combinations tried. Rckrone (talk) 17:05, 23 July 2009 (UTC)
 * Oh I see what you're saying. Since there are only 100 power ratings but 1000 parts, there are guaranteed to be some stupid ones (assuming only integer power ratings which is not specified).  Still once you've weeded out the bad ones, you can do better than brute force. Rckrone (talk) 17:43, 23 July 2009 (UTC)
 * Let's try to solve a more general problem while combining all of the above. We have k types of parts, for each we have n models, each having an integer power rating between 1 and m (in the OP $$k=3,n=1000,m=100$$).
 * We construct an $$m\times k$$ array of the identity and cost of the cheapest model of each type and power rating, by going over all $$n\cdot k$$ models. We then proceed in $$k-1$$ steps; in the first step we construct a $$2m \times 1$$ array of the identity and cost of the cheapest pair of type 1 and type 2 models for each combined power rating between 1 and $$2m$$, by going over all $$m^2$$ pairs. In the ith step we combine the list of combinations of types 1 through i with the list of type $$i+1$$ objects into a single $$(i+1)m \times 1$$ array by going over all $$im^2$$ pairs. In the end we have a list of the cheapest combination of parts for every possible power rating, in which we can binary-search whatever our budget is. All this takes $$O(nk+m^2k^2)$$ operations.
 * Of course, in practice, if we know the budget in advance we can skip the last step and do only upwards/downwards traversal.
 * If the power rating need not be an integer, we might have to use a more sophisticated data structure to achieve better than $$O(n^{k-1})$$. -- Meni Rosenfeld (talk) 18:57, 23 July 2009 (UTC)
 * If power isn't an integer, $$n=2$$ and k is large, then this is the knapsack problem. If, as in the OP, n is large and k is small, then the problems seem unrelated. -- Meni Rosenfeld (talk) 19:23, 23 July 2009 (UTC)
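The table-building procedure described above can be sketched as follows. The function names, the `(cost, power)` tuple layout and the sample data are illustrative assumptions. The final lookup here is a plain scan from the highest power downward instead of the binary search mentioned above, since the table only has $$km+1$$ entries.

```python
from typing import List, Optional, Sequence, Tuple

Part = Tuple[int, int]                         # (cost, power), integer power in 1..m
Entry = Optional[Tuple[int, Tuple[int, ...]]]  # (total cost, chosen part indices)


def cheapest_per_power(types: Sequence[Sequence[Part]], m: int) -> List[Entry]:
    """table[p] = cheapest way to pick one part per type with combined power exactly p."""
    table: List[Entry] = [(0, ())]  # no types chosen yet: power 0 at cost 0
    for parts in types:
        # Cheapest part of this type for each power rating 1..m.
        cheapest: List[Optional[Tuple[int, int]]] = [None] * (m + 1)
        for idx, (cost, power) in enumerate(parts):
            if cheapest[power] is None or cost < cheapest[power][0]:
                cheapest[power] = (cost, idx)
        # Combine this type with every combination built so far.
        merged: List[Entry] = [None] * (len(table) + m)
        for p, entry in enumerate(table):
            if entry is None:
                continue
            for q in range(1, m + 1):
                if cheapest[q] is None:
                    continue
                cost = entry[0] + cheapest[q][0]
                if merged[p + q] is None or cost < merged[p + q][0]:
                    merged[p + q] = (cost, entry[1] + (cheapest[q][1],))
        table = merged
    return table


def best_within(table: List[Entry], budget: int):
    """Highest combined power whose cheapest combination fits the budget, or None."""
    for p in range(len(table) - 1, -1, -1):
        if table[p] is not None and table[p][0] <= budget:
            return p, table[p]
    return None


# Made-up sample data: one list of (cost, power) parts per type.
types = [
    [(100, 50), (200, 80), (150, 40)],  # engines
    [(50, 20), (120, 60)],              # bodies
    [(30, 10), (80, 70)],               # wheels
]
table = cheapest_per_power(types, m=100)
print(best_within(table, budget=300))
```

Once the table is built, any number of different budgets can be answered without revisiting the parts lists.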


 * Yes you are correct. Quilby (talk) 21:38, 23 July 2009 (UTC)

I would like to thank everyone who answered my question. Quilby (talk) 21:38, 23 July 2009 (UTC)

Units
Consider the differential equation

$$\frac{dr}{dt}=(1-r)^3+3p(1-r)r^2-3(1-p)(1-r)r^2.$$

In this differential equation, r is a density (of anything, I don't think it matters), and p is a probability. My question is how one can explain that the units in this differential equation are correct. I came across this while reading a paper and the author derives this equation. On the LHS, it looks like the units should be 1/time but the right hand side appears unitless so the equal sign doesn't make sense. Am I missing something? Even if non-dimensionalization was used, what was done? I mean if all the terms on RHS had p or something, one can say something about p being proportional to a constant (which will correct the units after division and redefining the variable t) but that isn't true anyway. Is it r? There are powers of r on the RHS. What could be the units of r? Can anyone shed some light on this? Thanks!-69.106.207.49 (talk) 07:19, 23 July 2009 (UTC)


 * I don't think there's anything here you're missing; maybe with some more context from the paper (or a link) we might be able to help better.  Because the expressions "1 - r" and "1 - p" appear, r and p have to be unitless, making the right-hand side as a whole unitless;  for the units to match, we have to make t unitless as well, which quite possibly is what is intended by the author (or possibly not).  There also could be an implicit conversion factor in the right side, say, dividing all of it by one second or some such.
 * It looks like the right hand can be simplified? The last two terms can be combined.  Eric.  76.21.115.76 (talk) 07:44, 23 July 2009 (UTC)


 *  Yes, it can be simplified to $$\frac{dr}{dt}=(1-r)^2(6p-r-2)$$, unless I've made a mistake.  --COVIZAPIBETEFOKY (talk) 12:25, 23 July 2009 (UTC)
 * Haha, I shouldn't have jinxed it like that! $$\frac{dr}{dt}=(1-r)[(1-r)^2+r^2(6p-3)]$$. This next one is not really a simplification, but expanding the part in square brackets gives $$\frac{dr}{dt}=(1-r)[(6p-2)r^2-2r+1]$$, which can't be factored nicely. --COVIZAPIBETEFOKY (talk) 12:31, 23 July 2009 (UTC)
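The proposed simplifications can be checked with exact rational arithmetic. The sketch below (function names are made up for illustration) compares each candidate against the original right-hand side on a grid of sample points. Since every expression is a polynomial of degree at most 3 in r and at most 1 in p, agreement on an 8 by 6 grid is enough to establish the identity.

```python
from fractions import Fraction


def rhs(r, p):
    """Right-hand side of the differential equation as originally posted."""
    return (1 - r) ** 3 + 3 * p * (1 - r) * r ** 2 - 3 * (1 - p) * (1 - r) * r ** 2


def first_attempt(r, p):
    """The first proposed simplification, later withdrawn by its author."""
    return (1 - r) ** 2 * (6 * p - r - 2)


def factored(r, p):
    """The corrected form with (1 - r) factored out."""
    return (1 - r) * ((1 - r) ** 2 + r ** 2 * (6 * p - 3))


def expanded_bracket(r, p):
    """The same form with the square bracket expanded."""
    return (1 - r) * ((6 * p - 2) * r ** 2 - 2 * r + 1)


# Exact sample points: 8 values of r and 6 values of p.
samples = [(Fraction(a, 7), Fraction(b, 5)) for a in range(8) for b in range(6)]

agree_factored = all(
    rhs(r, p) == factored(r, p) == expanded_bracket(r, p) for r, p in samples
)
agree_first = all(rhs(r, p) == first_attempt(r, p) for r, p in samples)

print("factored forms agree with the original:", agree_factored)
print("first attempt agrees with the original:", agree_first)
```

At r = 0, p = 0 the original right-hand side equals 1 while the first attempt gives −2, so the two cannot be the same expression.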

Max mod principle problem
Hi, I'm working on a problem and there are 2 cases. One I have and the other I have no clue. This is a qual problem so I have old solutions from two different people. But, one person didn't do this case and the other one did something that makes no sense. So, I guess I'm wondering if the problem is even correct. If it is, can you help me figure out how to do it?

Assume that f, g are holomorphic in the disk D = D(0, 1), continuous on the closure of D and have no zeros in D. If $$|f(z)| \equiv |g(z)|$$ for |z| = 1, prove that f(z) = k g(z) in D, for some constant k of modulus 1.

Alright, so case 1 is f(z) is never 0 on the boundary. The condition given implies g is never 0 either. So, we can get $$|f(z) / g(z)| \leq 1$$ and $$|g(z) / f(z)| \leq 1$$ on D by the max modulus theorem since |f| = |g| on the boundary. Putting those together gives |f(z) / g(z)| = 1 on D. But, then the max occurs inside the disk so f / g is constant on D.  And, clearly the constant is unit modulus.

Case 2: f(z) = 0 for at least one z in the boundary. I have no idea what to do here other than f(z) = 0 if and only if g(z) = 0. Any ideas? Thanks StatisticsMan (talk) 15:31, 23 July 2009 (UTC)


 * If f(z0)=0, consider f(z)/(z-z0) and g(z)/(z-z0). Bo Jacoby (talk) 18:22, 23 July 2009 (UTC).


 * Off the top of my head, I think you're saying to get rid of the zeros by dividing by (z - z_0), one for each zero of f (or g), and then use the previous case. Is this right?  But, what if f is 0 an infinite number of times around the boundary?  Or, if f and g are both zero but when getting rid of the zero by this method, maybe the new f and g satisfy f(1) = 100 and g(1) = 50... so they no longer have equal magnitude around the boundary and case 1 doesn't apply?  This is part of my problem.  Since f and g are not holomorphic on the boundary, do I know anything about the zeros? StatisticsMan (talk) 19:28, 23 July 2009 (UTC)
 * $$\frac{f(z)}{z-z_0}$$ is not even necessarily continuous - for example, $$f(z)=\sqrt{z+1}$$ for an appropriate branch. -- Meni Rosenfeld (talk) 20:21, 23 July 2009 (UTC)
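The counterexample above can be illustrated numerically. The sketch below assumes the principal branch of the square root (which is continuous on the closed disk, because z + 1 stays in the closed right half-plane there) and approaches $$z_0=-1$$ along the real axis; the variable names are made up for illustration.

```python
import cmath


def f(z: complex) -> complex:
    # Principal branch of sqrt(z + 1): holomorphic for |z| < 1, continuous for |z| <= 1.
    return cmath.sqrt(z + 1)


z0 = -1
ratios = []
for eps in (1e-2, 1e-4, 1e-6):
    z = z0 + eps  # a point inside the disk, close to the boundary zero z0
    ratio = abs(f(z) / (z - z0))
    ratios.append(ratio)
    print(f"eps={eps:g}  |f(z)|={abs(f(z)):.6g}  |f(z)/(z - z0)|={ratio:.6g}")
```

The values of |f(z)| shrink toward 0 while the quotient grows roughly like $$1/\sqrt{\varepsilon}$$, so the quotient has no continuous extension to $$z_0$$.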


 * (edit conflict). Meni, you have got a point. $$f(z)=\sum_{k=0}^\infty \binom{1/2}{k}z^k$$ is holomorphic for |z| < 1 and zero for z = −1, and $$\frac{f(z)}{z+1}=\sum_{k=0}^\infty \binom{-1/2}{k}z^k$$ is not continuous on the closure of D. Statman, yes, that was what I was trying to say. Can a nonzero holomorphic function have an infinite number of zeroes around the boundary? The new f and g satisfy  |f(z)| = |g(z)| for |z| = 1, z ≠ z0; by continuity |f(z0)| = |g(z0)|.  Bo Jacoby (talk) 08:14, 24 July 2009 (UTC).
 * Yes: by the Riemann mapping theorem. There exists a holomorphic function on the disk, continuous up to the closure, whose zero set at the boundary is any given closed subset you fixed. --pma (talk) 14:16, 24 July 2009 (UTC)


 * You may adapt your case 1 to circles of radius r<1, where you know that f and g do not vanish. By the uniform continuity of f and g on the unit closed disk, you have, uniformly for |z|=r, $$|f(z)| \le \|g\|_{\infty,D} + o(1)$$ as r→1. Apply the max mod to f to extend the inequality on the disk of radius r as you were doing, and conclude. PS: Does not work --pma (talk) 07:44, 24 July 2009 (UTC)
 * Really? The article on the Riemann mapping theorem merely says that the disk D is mapped one-to-one to any simply connected open subset. Bo Jacoby (talk) 20:51, 24 July 2009 (UTC).
 * Yes, but it says more if the assumptions are strengthened. As a matter of fact, any closed curve with finite length has a re-parametrization that extends to a holomorphic map on the disk, continuous up to the boundary. So it seems to me that this allows one to construct a holomorphic function on the disk, continuous on the closure, with infinitely many zeros on the boundary (in fact now I'm not sure to get any given closed subset as a zero set, even if I think I remember that you can do that; I will think about this...) --pma (talk) 06:57, 25 July 2009 (UTC)
 * How can a one-to-one mapping have more than one zero? Bo Jacoby (talk) 15:49, 25 July 2009 (UTC).
 * The holomorphic mapping of the open disk is one-to-one, its continuous extension to the boundary is not. — Emil J. 12:18, 27 July 2009 (UTC)

Multiple regression v. neural nets
Were neural nets just a fad? I would like to try predicting or forecasting the real-estate market using several time series of economic data. The dependent variable would be a time series of real estate average prices. The independent variables would be time series of things like unemployment, inflation, stock market index, and so on, and a lagged real-estate average price time series. Which would be the better methodology to use? And does anyone have an opinion on which of the various specific types of the two methods would be best for what I want to do? Would using independent variables which were correlated make any difference to the choice - I would also like to try using lagged real-estate average price time series from other geographic areas. Thanks. 89.240.217.9 (talk) 21:35, 23 July 2009 (UTC)


 * Software packages will happily perform multiple regression, but they don't correct for data anomalies. If your data exhibits (as most time series data does) heteroskedasticity, multicollinearity, non-stationarity, etc. (the list of anomalies is rather large), then the results that the software reports could be meaningless. Wikiant (talk) 22:52, 23 July 2009 (UTC)
 * But if you know what you're doing, you can usually use the software to diagnose those problems, and in some cases then transform the data in ways that make it amenable to multiple regression. Michael Hardy (talk) 23:49, 23 July 2009 (UTC)
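As one small illustration of diagnosing and transforming a problematic series, the sketch below builds a synthetic random walk with drift (a hypothetical stand-in, not real price data) and compares the lag-1 autocorrelation of the levels with that of the first differences. First differencing is only one common transformation, chosen here as an assumption; it is not necessarily what the replies above had in mind.

```python
import random
from typing import List


def lag1_autocorr(x: List[float]) -> float:
    """Sample lag-1 autocorrelation of a series."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


random.seed(0)  # fixed seed so the synthetic series is reproducible
prices = [100.0]
for _ in range(499):
    # Hypothetical price index: random walk with a small upward drift.
    prices.append(prices[-1] + 0.2 + random.gauss(0, 1))

# First differences: period-to-period changes of the series.
changes = [b - a for a, b in zip(prices, prices[1:])]

level_ac = lag1_autocorr(prices)
change_ac = lag1_autocorr(changes)
print(f"lag-1 autocorrelation, levels: {level_ac:.3f}")
print(f"lag-1 autocorrelation, first differences: {change_ac:.3f}")
```

An autocorrelation close to 1 in the levels is a typical warning sign of non-stationarity, and regressing one such series on another can produce impressive-looking but meaningless fits. The differenced series is usually much better behaved.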


 * Absolutely, but you'll probably need one or two stats courses plus at least two econometrics courses to get there. My implied warning is that a little knowledge of regression is more dangerous than no knowledge of regression -- it enables one to use the software to produce results without knowing that the results can be meaningless. Wikiant (talk) 11:15, 24 July 2009 (UTC)

I dimly recall studying about heteroskedasticity, multicollinearity, and non-stationarity years ago - I need a refresher. But my question was, would neural nets be better than multiple regression in these circumstances? I'm not aware of any no-cost software that lets you use neural nets, (R may do it, but seems difficult to use), so I'd have to code it myself. 78.151.148.77 (talk) 14:38, 24 July 2009 (UTC)


 * I don't do neural nets, but my guess is that they would be as susceptible to these problems as is regression because the anomalies reside in the data, not in the analytic techniques. Wikiant (talk) 15:56, 24 July 2009 (UTC)


 * Both have their points, but talking about forecasting sales of a drink for instance, they would try doing multiple regression with some heuristic functions to try and get straight lines after correcting for seasonal variations - but there's all sorts of other things too. For instance they will also try predicting the amount sold for various types of holidays, they have to get a forecast for the next couple of weeks' weather, if there is a big match there is also the variability depending on who wins - and that would affect the area with the match more. I'm sure the problem is the same with real estate plus of course there will be a much bigger dependency on the stock market. Sometimes it is fairly stable and other times it goes haywire. See Super Crunchers for a good read and an introduction if you want to get into this sort of thing. Regression might work better for drink but neural networks would I guess be more useful with real estate. It is very worthwhile trying to figure out why a neural network gets good results if it works well and that can sometimes be difficult. Dmcq (talk) 19:27, 24 July 2009 (UTC)