Wikipedia:Reference desk/Archives/Computing/2015 March 17

= March 17 =

Extracting peaks from histogram data.


I have a histogram taken from some noisy data - but it has a bunch of clear peaks. (Example at right).

I need a fast algorithm to extract the 'obvious' peaks...of which there are maybe six in that sample image. To the human eye, it's clear what I want - but I'm having the hardest time figuring out the right math to get the "obvious" peaks.

For example, just picking the largest values gets me multiple hits for each of the two tallest spikes. Picking the largest values that have lower values to either side of them produces lots of hits for the left-most peak that has a bunch of fuzzy values over it.

Ideally, I'd like a list of possible peaks that I can sort from largest to smallest and cut it off at some point.

There must be some formal approaches to doing this...but I'm having a hard time locating them.

SteveBaker (talk) 05:45, 17 March 2015 (UTC)


 * Pseudocode
 * (1) pick the highest value.
 * (2) multiply every value by |x - xp| where xp is the x coordinate of the peak
 * (3) If you don't have enough values yet, GOTO 1.
 * If you use this approach, you can get very small or very large numbers, depending on how the x coordinates are distributed. If they are integers, numbers get very big. If the x coordinates go from zero to one, they get very small.
 * You can use different multipliers in step 2, for example (arctan r(x - xp))^2 for some value r which controls how "local" the cut-off is. You might also change (1) to "pick the highest value which doesn't have a higher neighbor in the original data" and an "emergency exit" if there is none.
 * I hope that helps. Experiment. - ¡Ouch! (hurt me / more pain) 08:28, 17 March 2015 (UTC)
 * Oops. The |x - xp| introduces a shitload of bias near the ends. Use arctan.
 * Also, doesn't work on negative data. Can't think of a case where that would be a problem, unless you used logarithms, or subtracted the average from the data before step #1. - ¡Ouch! (hurt me / more pain) 14:07, 17 March 2015 (UTC)
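A minimal Python sketch of the pick-and-damp loop above, using the arctan multiplier. The function name, the default `r`, the stopping rule, and the division by (pi/2)^2 (added so that values far from a peak are left roughly unchanged instead of growing) are assumptions for illustration, not part of the suggestion itself.

```python
import math

def suppress_and_pick(xs, ys, count, r=1.0):
    """Repeatedly pick the highest value, then damp everything near it."""
    work = list(ys)                      # damped copy; ys keeps the true heights
    scale = (math.pi / 2) ** 2           # assumption: normalise multiplier to [0, 1)
    peaks = []
    for _ in range(count):
        i = max(range(len(work)), key=lambda k: work[k])   # step 1
        if work[i] <= 0:                 # "emergency exit": nothing left to pick
            break
        peaks.append((xs[i], ys[i]))
        xp = xs[i]
        # Step 2: the multiplier is 0 at the peak and tends to 1 far from it.
        work = [w * math.atan(r * (x - xp)) ** 2 / scale
                for w, x in zip(work, xs)]
    return peaks

xs = list(range(11))
ys = [0, 1, 5, 1, 0, 0, 0, 2, 9, 2, 0]
print(suppress_and_pick(xs, ys, 2))
```

A larger `r` makes the damping more local, so closely spaced peaks survive; a smaller `r` suppresses a wider neighbourhood around each pick.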


 * A general solution would be to differentiate the data, and the peaks would be where the slope changed from positive to negative. You could then take the six largest Y values.  However, your data looks a bit noisy, so this probably wouldn't work without some smoothing.  --Phil Holmes (talk) 09:27, 17 March 2015 (UTC)
 * That's useful if there is a function you can differentiate. Numerical differentiation is unstable as hell with noisy data. However, once one has a peak, one can take that and the closest neighbors, interpolate, and then get a "subpixel"-accurate x value. - ¡Ouch! (hurt me / more pain) 14:07, 17 March 2015 (UTC)
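As a rough illustration of both points, the sketch below finds places where the first difference changes sign and then refines one of them by fitting a parabola through the peak and its two neighbours. The function names are made up for this example, and on noisy data the first function will report many spurious maxima unless the data is smoothed first.

```python
def local_maxima(ys):
    """Indices where the first difference goes from positive to non-positive."""
    return [i for i in range(1, len(ys) - 1)
            if ys[i] - ys[i - 1] > 0 and ys[i + 1] - ys[i] <= 0]

def refine_peak(ys, i):
    """Vertex of the parabola through bins i-1, i, i+1 (in bin units)."""
    a, b, c = ys[i - 1], ys[i], ys[i + 1]
    denom = a - 2 * b + c
    if denom == 0:                # the three points are collinear; keep bin centre
        return float(i)
    return i + 0.5 * (a - c) / denom

ys = [0, 1, 3, 2, 0, 1, 0]
for i in local_maxima(ys):
    print(i, refine_peak(ys, i))
```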


 * (after ec) Try a web search using "peak detection" and "multimodal data". If you don't need something advanced, you can try the below:
 * apply a smoothing filter of some sort to the data
 * search for local maxima in the smoothed data
 * sort the local maxima by their y-values
 * for each of the N highest peaks identified, perform a local search in the original data set to identify the location and value of the true peaks
 * --173.49.16.112 (talk) 09:37, 17 March 2015 (UTC)
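The four steps above could be sketched as follows. A plain moving average stands in for "a smoothing filter of some sort", and the names `moving_average`, `top_peaks` and `half_width` are assumptions chosen for the example.

```python
def moving_average(ys, half_width):
    """Centred moving average; the window shrinks at the ends of the data."""
    out = []
    for i in range(len(ys)):
        lo, hi = max(0, i - half_width), min(len(ys), i + half_width + 1)
        out.append(sum(ys[lo:hi]) / (hi - lo))
    return out

def top_peaks(ys, n_peaks, half_width=2):
    smooth = moving_average(ys, half_width)                 # step 1
    candidates = [i for i in range(1, len(smooth) - 1)      # step 2
                  if smooth[i - 1] < smooth[i] >= smooth[i + 1]]
    candidates.sort(key=lambda i: smooth[i], reverse=True)  # step 3
    peaks = []
    for i in candidates[:n_peaks]:                          # step 4
        lo, hi = max(0, i - half_width), min(len(ys), i + half_width + 1)
        j = max(range(lo, hi), key=lambda k: ys[k])         # true peak nearby
        peaks.append((j, ys[j]))
    return peaks

ys = [0, 0, 1, 8, 2, 9, 1, 0, 0, 0, 0, 4, 5, 4, 0, 0, 0]
print(top_peaks(ys, 2, half_width=1))
```

In the sample data the jagged pair at bins 3 and 5 is reported as a single peak, which is the behaviour the original question asks for.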


 * 1) Find the highest peak.
 * 2) Exclude the area on either side of that peak (for some distance which you will need to experiment with to get to the adjacent valleys) from the search set.
 * 3) Repeat until new peaks are below a target height, or all areas have been excluded. StuRat (talk) 10:57, 17 March 2015 (UTC)


 * This is a special case of my method above, where all values close to xp (i.e. those satisfying |x - xp| < r) are multiplied by zero and all other values by one.
 * The one problem is that you have a hard cut-off in step 2; there could be both wide peaks and close pairs of narrow peaks in the same histogram. - ¡Ouch! (hurt me / more pain) 14:07, 17 March 2015 (UTC)


 * Yes, my recommendation is based on the sample data shown. Other data distributions would require other methods.  Also note that no one method works for all data sets and desired results.  For  example, you might want to get one or two maxima in the "close twin peaks" scenario.  StuRat (talk) 19:16, 17 March 2015 (UTC)
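A short sketch of the find-and-exclude loop discussed in this thread, assuming a fixed exclusion distance measured in bins. The parameter names `exclusion` and `min_height` are placeholders that would need tuning against real data.

```python
def peaks_by_exclusion(ys, exclusion, min_height):
    available = [True] * len(ys)          # bins still in the search set
    peaks = []
    while True:
        candidates = [i for i in range(len(ys)) if available[i]]
        if not candidates:                # all areas have been excluded
            break
        i = max(candidates, key=lambda k: ys[k])
        if ys[i] < min_height:            # remaining peaks are too small
            break
        peaks.append((i, ys[i]))
        # Remove the peak and its neighbourhood from the search set.
        for k in range(max(0, i - exclusion), min(len(ys), i + exclusion + 1)):
            available[k] = False
    return peaks

ys = [0, 1, 5, 1, 0, 0, 0, 2, 9, 8, 0]
print(peaks_by_exclusion(ys, exclusion=2, min_height=3))
```

Because the exclusion distance is fixed, a wide peak can leave a shoulder just outside the excluded range that is then picked up as a separate peak, which is the hard cut-off problem noted above.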


 * There are obviously many algorithms for things like this. In general, you can smooth / filter the histogram data to eliminate noise (e.g. N-point moving average, Gaussian moving average) or you can window the data (e.g. pick the highest point, and then exclude all points in a local neighborhood of that point as possible local maxima).  If you want a really formal answer you probably need to understand / characterize the noise in the data to get an understanding of how large a fluctuation is consistent with random noise.  For example, if the underlying data is some form of power spectrum then there are well-defined rules about the expected peak separation.  Without thinking too much about it, based on your example picture, I'd probably apply a moving Gaussian weighted average:


 * $$y'_n = {\sum_{k=1}^N y_k w_{k,n} \over \sum_{k=1}^N w_{k,n} }$$
 * $$w_{k,n} = e^{-(x_k-x_n)^2 \over 2 \sigma^2}$$


 * Where a suggested choice of $$\sigma$$ is half of the width in x of one of those larger peaks at half of its maximum value. The resulting smoothed curve should have a number of true peaks that more closely capture the eyes' expectations.  Then one can find a set of local maxima from the smoothed data that will probably agree pretty well with what you want.  Dragons flight (talk) 11:15, 17 March 2015 (UTC)
 * This method is probably superior to mine, if the x values are equally spaced. Mine could work if they are not, for example if the x values are logarithms of frequencies in spectra. - ¡Ouch! (hurt me / more pain) 14:07, 17 March 2015 (UTC)
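The two formulas for the Gaussian weighted average translate almost directly into code. This is only a sketch: the function name is an assumption, and the double loop costs O(N^2), which is fine for a histogram of a few hundred bins but would need a truncated window for large inputs.

```python
import math

def gaussian_smooth(xs, ys, sigma):
    """y'_n = sum_k(y_k * w_kn) / sum_k(w_kn), w_kn = exp(-(x_k - x_n)^2 / (2 sigma^2))."""
    smoothed = []
    for xn in xs:
        weights = [math.exp(-((xk - xn) ** 2) / (2 * sigma ** 2)) for xk in xs]
        smoothed.append(sum(w * y for w, y in zip(weights, ys)) / sum(weights))
    return smoothed

xs = list(range(5))
ys = [0, 0, 10, 0, 0]
print(gaussian_smooth(xs, ys, sigma=1.0))
```

Since the weights depend on the x values rather than on bin indices, the same code also accepts unevenly spaced x coordinates.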


 * Steve, as you already know, it's very easy to do this task poorly; but to do it well can become very difficult.
 * Often, the best solutions use derived statistics - e.g., a histogram of your histogram data - from which you can extract your threshold values. You can use signal conditioning to stretch and contour and normalize to your heart's content.  You may wish to use an outlier rejection technique - or not - depending on whether outliers are plausible in your application.  You can use a classification algorithm - as one would in machine vision or machine learning applications - and a clustering algorithm to determine which points come from a single peak.  This "should" be "easy" in one dimension!  Unfortunately, it isn't.
 * Ultimately, this is really a question of heuristics. How you define "obvious peak" is actually incredibly subjective.  How prominent over the next peak must it be to count as a distinct signal?  How wide must it be before you discard it as noise?  Constrain all of these kinds of questions, and you will have your algorithm.  Nimur (talk) 14:44, 17 March 2015 (UTC)


 * Given the shape of the peaks, I might recommend you look into fitting Gaussian mixture models to the data. There's various software packages out there which can handle things for you in a more complex manner, but the simple version is to do something like a non-linear least squares fitting of the data to a function which is a sum of Gaussians (with fit parameters for center, width, and height). You can then use whatever goodness of fit metric you like best to pick out the model which does the best job of explaining most of the variance. Looking at your figure, I'm guessing that there would likely be a steep drop-off in the metric going from 6 Gaussians to 7 Gaussians. You can then pull out the center and width of each peak from the fitted formula, and order the peaks by the height or volume of each peak. (I also might suggest adding a linear component to your formula, to automatically account for baseline offset.) Of course, if you have reason to suspect that your peaks won't always be Gaussian-shaped, then that might not be the best approach to use. -- 160.129.138.186 (talk) 18:27, 17 March 2015 (UTC)
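For concreteness, the function being fitted in the suggestion above could be written as below. The notation is an assumption made for this illustration: M is the number of Gaussians tried, A_i, \mu_i and \sigma_i are the height, center and width of peak i, and b_0 + b_1 x is the optional linear baseline.

$$f(x) = b_0 + b_1 x + \sum_{i=1}^{M} A_i \, e^{-(x-\mu_i)^2 \over 2 \sigma_i^2}$$

The fit would be repeated for M = 1, 2, 3, ... and the goodness-of-fit metric compared across values of M, with the area of peak i (proportional to A_i \sigma_i) available as one way to rank the peaks.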


 * Here's a nice, understandable algorithm that seems a little different from those currently suggested, and uses a rolling deviation measure. 18:30, 17 March 2015 (UTC)


 * I've had to find peaks approximately a billion times, but I always started by smoothing the data. If you want to find peaks in the unsmoothed data, I would use the following definition:  a peak is a point that is higher than any point within N bins in either direction.  If you can decide on a value of N, implementing the algorithm should be pretty easy. Looie496 (talk) 19:21, 17 March 2015 (UTC)
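That definition is short enough to sketch directly. The function name is made up, and the strict comparison is an interpretation of "higher than any point": two equal bins within N of each other are both rejected, which may or may not be what is wanted.

```python
def dominant_peaks(ys, n):
    """Bins strictly higher than every other bin within n bins on either side."""
    peaks = []
    for i, y in enumerate(ys):
        lo, hi = max(0, i - n), min(len(ys), i + n + 1)
        if all(y > ys[k] for k in range(lo, hi) if k != i):
            peaks.append((i, y))
    # Largest first, so the list can be cut off at any point.
    return sorted(peaks, key=lambda p: p[1], reverse=True)

ys = [0, 3, 1, 4, 1, 0, 2, 0]
print(dominant_peaks(ys, 1))
print(dominant_peaks(ys, 2))
```

Raising N from 1 to 2 drops the bin of height 3 at index 1, because the taller bin at index 3 now falls inside its window.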


 * Have wavelets gone out of fashion? —Tamfang (talk) 05:20, 18 March 2015 (UTC)

Non-redirecting link
Hi, I want to cite a description of this painting on a museum's website in a Wikipedia article. The link is: http://cyfrowe.mnw.art.pl/dmuseion/docmetadata?id=4233. However, the first time I click on this link, it redirects me to the website's main page: http://cyfrowe.mnw.art.pl/dmuseion?action=ChangeLanguageAction&language=pl. Only when I click on the same link again does it take me to the correct page. The same thing happens on Firefox and MSIE. Is it possible to produce a link that will always take you right to the intended page, avoiding the redirect? — Kpalion(talk) 10:37, 17 March 2015 (UTC)
 * From my quick tests in Firefox (new private mode sessions every time), yes. The redirect seems to be passing two parameters: one is an action to change language, the other is the language code pl (Polish). If you pass both parameters in your URL you don't get redirected. Note that from my tests you need to pass both; passing only one still results in a redirect. Also, there should be only one question mark as part of the query string in your URL; the other parameters should be separated by an ampersand (&). So http://cyfrowe.mnw.art.pl/dmuseion/docmetadata?id=4233&action=ChangeLanguageAction&language=pl works. You should be able to put them in any order, e.g. http://cyfrowe.mnw.art.pl/dmuseion/docmetadata?action=ChangeLanguageAction&id=4233&language=pl and http://cyfrowe.mnw.art.pl/dmuseion/docmetadata?language=pl&id=4233&action=ChangeLanguageAction worked when I tested them. Nil Einne (talk) 11:50, 17 March 2015 (UTC)
 * That works. Thanks a lot, Nil Einne! — Kpalion(talk) 12:44, 17 March 2015 (UTC)
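If such links ever need to be generated for more than one item, the rule described above (one question mark, further parameters joined with ampersands) is what Python's standard `urllib.parse` module produces. The helper name `add_query_params` is an assumption for this sketch; the parameter names and values are the ones from the thread.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def add_query_params(url, extra):
    """Append extra key/value pairs to whatever query string the URL already has."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True) + list(extra.items())
    return urlunsplit(parts._replace(query=urlencode(query)))

link = add_query_params(
    "http://cyfrowe.mnw.art.pl/dmuseion/docmetadata?id=4233",
    {"action": "ChangeLanguageAction", "language": "pl"},
)
print(link)
```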

Gmail and POP3
I have a gmail account. I use Outlook 2003 to download e-mail using pop.gmail.com as the inbound server. I am (suddenly) unable to receive e-mail in Outlook, although I am able to send. I have not changed the settings in Outlook. I looked at Google's recommendations as to the settings, and they still match the settings I have. I vaguely recollect that this happened once before a long time ago, and it cleared up on its own, although I never discovered the cause of the problem.

In case it's material, I use Windows 7 Home 64-bit version. I can access gmail on the web.

I'm hoping someone else has experienced this problem. A search of the web using Google (heh) didn't uncover anything helpful.--Bbb23 (talk) 17:39, 17 March 2015 (UTC)


 * I'm not sure what your problem is, but I do wish to throw in the words "Try reinstalling/repairing"; that might do the trick...
 * Oh, the latest versions of MS Office are available, if you are interested. -- (SuperGirlsVibrator (talk) 19:04, 17 March 2015 (UTC))
 * After doing some more searching on the Internet, I found the "solution". I disabled IPv6 on my LAN connection, and Outlook works again. Even those people who posted the solution didn't know why it worked or, more important, whether disabling IPv6 would cause other problems, but for the time being, just like them, it's disabled.--Bbb23 (talk) 22:16, 17 March 2015 (UTC)
 * Is it possible that pop.gmail.com resolves to an IPv6 address but something in your networking setup does not support IPv6? --173.49.16.112 (talk) 03:29, 18 March 2015 (UTC)
 * Nope, pop.gmail.com resolves to IPv4 address based on a check I did earlier today.--Bbb23 (talk) 04:51, 18 March 2015 (UTC)
 * You mean before or after you disabled IPv6? Once you've disabled IPv6, it's unlikely many tools will return the AAAA record unless you specifically ask them to. Even before you disabled IPv6, some tools may detect there is a problem or otherwise return the A record for some reason despite both A and AAAA records existing and them supporting both IPv4 and IPv6. Because of problems due to misconfigured dual stack networks where the IPv6 isn't working properly, both modern OSes and programs have been designed to try and detect problems with the IPv6 network and use IPv4 instead. But something as old as Outlook 2003 may not have that. Windows 7 should have decent support of dual stack configs, but the interaction between Outlook and the OS can be complicated and I don't think it can necessarily make up for all deficiencies in the program. If I do an nslookup for pop.google.com, it definitely has an AAAA record, although it's possible for various reasons your local name server may not have any. Nil Einne (talk) 07:23, 18 March 2015 (UTC)
 * Looking a bit more there are a lot of reports of problems with Outlook 2003 and IPv6. It's not even clear to me if it only occurs when you do have problems with your IPv6 or it can occur in other cases for various reasons. Nil Einne (talk) 07:29, 18 March 2015 (UTC)
 * To answer your first question, I did the check before I disabled IPv6. As for the thrust of your remaining comments, I've pretty much resigned myself to having to upgrade Outlook 2003. Thanks for your comments, even though I don't necessarily understand all of them. :-) That's another item on my agenda, though. I need to educate myself more about IPv6. The question is where to start; so many sources of information assume the reader knows more than they do.--Bbb23 (talk) 14:48, 18 March 2015 (UTC)

I can confirm that Outlook 2003 has problems with IPv6. I had the problems in 2005 when I introduced IPv6 on my mail server. In the meantime, newer versions of Outlook work with IPv6. Outlook 2003 is out of support, and MS never fixed the IPv6 problem for the old version. The Google server has an AAAA record: "host pop.gmail.com" reports "gmail-pop.l.google.com has IPv6 address 2a00:1450:4013:c01::6d". This answer is independent of your own network connection. Of course you can use the IPv6 server only if you are connected via IPv6. --Tschäfer (talk) 21:03, 21 March 2015 (UTC)
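For anyone wanting to repeat the A versus AAAA check from a script rather than with nslookup or host, the standard `socket.getaddrinfo` call returns the addresses the local resolver is willing to hand out, tagged by address family. This is only a sketch: the helper names are invented, port 995 is assumed because it is the usual POP3-over-SSL port, and, as discussed in this thread, the answer can differ depending on how IPv6 is configured on the machine running it.

```python
import socket

def group_by_family(addrinfo):
    """Split getaddrinfo() results into sorted IPv4 and IPv6 address lists."""
    v4 = sorted({sa[0] for fam, _, _, _, sa in addrinfo if fam == socket.AF_INET})
    v6 = sorted({sa[0] for fam, _, _, _, sa in addrinfo if fam == socket.AF_INET6})
    return v4, v6

def lookup(host, port=995):
    """Ask the local resolver for TCP addresses of host (needs network access)."""
    return group_by_family(
        socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP))

# Example call (not run here because it depends on the network):
# ipv4, ipv6 = lookup("pop.gmail.com")
```

An empty IPv6 list from `lookup` does not prove that no AAAA record exists; it only shows what this particular machine was given.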

.Jpeg software:
Hello, I uninstalled ‘MS 2003’ and ‘2010’, then installed ‘Windows XP Professional with Frontpage’ and reinstalled ‘MS 2010’. After that, I clicked on an image file (because it looked like the first icon) to view the image, and it opens in ‘Microsoft Photo Editor’. It is a ‘.jpeg’ file. How can I get it back to how it was, in its ‘.jpeg’ software interface? -- (SuperGirlsVibrator (talk) 18:54, 17 March 2015 (UTC))
 * This sort of question reminds me of why I now depend on simple, uncomplicated Linux as an OS so that MS  problems don't keep cropping up - causing me to tear my hair out. Time is money and MS sure wastes loads of your time.--Aspro (talk) 19:39, 17 March 2015 (UTC)
 * What software had you used to view jpeg files before Windows installation? Ruslik_ Zero 20:37, 17 March 2015 (UTC)
 * This page should be able to help you. -- 143.85.169.18 (talk) 21:46, 17 March 2015 (UTC)
 * Or in the long run, this might help even more: Install Linux Mint. Then you can say goodbye to any more Windows problems.--Aspro (talk) 21:55, 17 March 2015 (UTC)
 * The best answers address the question directly. I don't necessarily disagree, Aspro, but this isn't a podium for you to push Linux. There's no such thing as an OS that doesn't have "ANY" issues. Also, you didn't close your formatting correctly; I fixed it for you :) Vespine (talk) 23:29, 17 March 2015 (UTC)
 * The advice is actually fairly dumb here, as it often is when Aspro randomly spouts it. It seems fairly likely from this and earlier comments that the OP is specifically interested in Frontpage, and this is despite having had multiple other suggestions; it's their choice even if it is an odd one. You may be able to get Frontpage working on WINE, but it definitely doesn't sound like it will be easier than simply fixing slightly messed up file associations. I would also note the OP appears to be using Windows XP, a nearly 14 year old operating system whose support was abandoned last year. Since IIRC Windows Vista, it's far harder for various programs to automatically take control of associations. At worst, you're likely to end up with multiple defaults, and when you try and open such a file Windows will ask you which one you want to be the default. You could run Windows XP in a VM on *nix; on the other hand, you can also do that with a more modern version of Windows which isn't getting an increasing number of security holes which will remain unpatched. In the absence of any real information on why the OP is choosing to run such a dangerous operating system, randomly telling them to 'run Linux' doesn't help. For example, if it's an old computer which won't ever be connected to the internet, then running Windows on a VM in *nix sounds like a dumb idea, an unnecessary waste of resources which won't help anything. (Bearing in mind also such computers often have more difficulty working with VMs since many x86 virtualisation programs seem to be abandoning support for processors lacking hardware assisted virtualisation instruction sets.) Unfortunately understanding complexities like this seems to be beyond Aspro's capabilities. Nil Einne (talk) 04:33, 18 March 2015 (UTC)
 * Friends, done! It was 'Windows Photo Viewer'. Thank you all. -- (SuperGirlsVibrator (talk) 07:49, 18 March 2015 (UTC))
 * Lol.
 * I downloaded Kompozer (inserts  'code' in the 'HTML tag' page as you enter data and press enter in the 'Normal' page), I also have Frontpage now (allows me to learn the codes quicker), but I think Kompozer will dominate since it is the latest version out of them both. I was going to download Nvu but Dfbris did not mention anything...
 * (No, stick with Kompozer, which superseded Nvu. Dbfirs 07:56, 18 March 2015 (UTC))
 * Thanks buddy! -- (SuperGirlsVibrator (talk) 08:21, 18 March 2015 (UTC))
 * I'm using Windows 7 Ultimate, same as the normal version to be honest. It's on a 32-bit OS, which hopefully will change in the near future, or I'll buy a new computer...

OS:
I've been planning for some time to buy the latest version of Unix, whatever that is! I know a guy who sells CDs; he has Linux 'Mint' and 'Redhead', also 'Ubantu'. I've read through Research Unix, but I don't really understand the last paragraph. Which one is the latest version? Could someone suggest what to look for, or what I should text my guy to get me: Unix v10 or Plan 9 4th edition (I don't know what it's called...) on a 32|64-bit OS? I will also be grateful if someone could provide me with download links for the 32|64-bit OS...

Regards.

Note: I was reading an article where it stated Plan 9 runs on Windows and Unix, I'm confused...

(SuperGirlsVibrator (talk) 07:49, 18 March 2015 (UTC))


 * You probably want Linux, which is not technically Unix, but they are close enough that many people drop the distinction. You probably don't want Research Unix. You probably do want something like Linux Mint, Fedora, or Ubuntu; each of these is a Linux distribution designed to be installed easily and to be easy to learn and manage. Keep in mind these are all free and open source software. It is legal for someone to sell you a Linux CD/DVD, but the cost should be minimal, because you are just paying for the disk and the copying service. The cost of the operating system itself is 0. You can also download installation disks online, but if you have a slow or unreliable internet connection it might make sense to pay a small fee for someone to hand you an install disk. You can also try to get or make a Linux Live CD, which will let you try out the OS booting from a CD drive, without making any changes to your current hard drive and Windows OS. You might also want to ask a new question; this thread is a bit stale and unrelated to picking a new Linux distribution to try out. SemanticMantis (talk) 13:55, 18 March 2015 (UTC)
 * Yeah, I read some things about it... The cost is 'peanuts' if compared to your currency ($0.50), not for me though, at the moment. And yes, I'll pay for the disk and copying service, not just for this CD (which I'll buy soon) but for many...
 * I require the links for myself, for the near future, just in case I go to my guy and he says it's unavailable. I saw it last week, I don't know what version he has though. I can't afford to download... The best thing you said is that it has a 'Live CD' functionality. This is perfect for me. Thank you. -- (SuperGirlsVibrator (talk) 21:36, 18 March 2015 (UTC))


 * Agreed. New post is better. In the meantime you could try reading: Install Linux in your PC. Remember that it probably took you 40-45 hours to get the hang of Windows and you will need to invest similar time in finding your way around Linux, but from there on, there are fewer hassles ahead than sticking with Windows.--Aspro (talk) 15:22, 18 March 2015 (UTC)
 * Thank you for the links I really needed it. I wouldn't have understood even if I read it myself... One last help I require if you don't mind, I read through the link you provided, it is the latest version but clarify if it is the 'Live CD' one SemanticMantis stated. I would be happy with the 'Live CD'/USB pen drive one.  -- (SuperGirlsVibrator (talk) 21:36, 18 March 2015 (UTC))
 * Hello, I'm assuming it's the correct one, I was just confused with the following sentence: "Follow this guide if you want a Linux only PC or install Linux along side Windows."

Once again, thank you all very much! Feels good to have you all in my life... Kind regards. -- (SuperGirlsVibrator (talk) 10:32, 19 March 2015 (UTC))