Talk:Histogram/Archives/2011

Missing: calculating bin sizes
Should this article contain something about calculating bin sizes?

If by bin size you mean width, then absolutely something needs to be here. There's no explanation of why the widths differ, and in research elsewhere the widths are always equal, making the Wikipedia definition inconsistent with what's available elsewhere. - Sam N.


 * Bin widths do not have to be equal. In many instances they are equal simply because it is easier to handle, and any benefits from variable bin sizes are out-weighed by the simplicity of having equal bin sizes. An example of variable bin sizes is the histogram method of white balancing digital images, where the bin widths of the histograms of brightness are adjusted so that each bin has a roughly equal number of elements. Mattopia 19:46, 29 March 2006 (UTC)
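To make the equal-count idea above concrete, here is a minimal sketch, assuming NumPy is available. It is not taken from the article or from any white-balancing implementation; the helper name `equal_count_edges` and the `brightness` data are made up for illustration. Bin edges are placed at evenly spaced quantiles, so the widths vary while the counts stay roughly equal.

```python
import numpy as np

def equal_count_edges(values, n_bins):
    """Return bin edges placed at evenly spaced quantiles of the data.

    Hypothetical helper: each resulting bin holds roughly the same
    number of values, so the bin widths generally differ.
    """
    probs = np.linspace(0.0, 1.0, n_bins + 1)
    return np.quantile(values, probs)

# Made-up, skewed "brightness" values purely for illustration.
rng = np.random.default_rng(0)
brightness = rng.exponential(scale=40.0, size=1000)

edges = equal_count_edges(brightness, 4)
counts, _ = np.histogram(brightness, bins=edges)
widths = np.diff(edges)

print("counts per bin:", counts)   # roughly equal
print("bin widths:    ", widths)   # clearly unequal for skewed data
```

With equal-width bins the same data would give very unequal counts; which trade-off is preferable depends on the application.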

I support this, although I disagree about calling them bar charts. They are just Histograms where the class widths are all identical. This misses one of the points of plotting a Histogram. PsybertronJr (talk) 13:27, 11 December 2010 (UTC)

Excessive whitespace
This article needs to be reformatted to eliminate the excessive whitespace. If the tables could be placed beside the graphs then that would tighten things up considerably.--Hooperbloob 05:16, 12 August 2005 (UTC)

I reworked the layout quite a bit, mostly merging the tables and reducing the graphs to thumbnails. It's not quite pretty yet, but methinks it should be a little easier to work with. --Pathoschild 01:29, September 21, 2005 (EST)

Added logical headings and reorganised appropriately. --Pathoschild 01:16, September 22, 2005 (EST)

Missing: interpretation of histogram shapes
Should the article contain something about the interpretation of various histogram shapes (normally distributed around a mean, etc.)? Maybe! --Kjetil Halvorsen 22:46, 27 January 2010 (UTC)

Missing: etymology of the word histogram
The article should contain something about the etymology of the word "histogram".
 * I've added a little something about the etymology. Aastrup 09:46, 12 February 2007 (UTC)

Missing: Basic example of a histogram
This article should contain a simple histogram to illustrate what it is. Currently it is a tad ambiguous with the 2000 census versions.

Confusion
"Actually, this document shows bar graphs, but they are not histograms since the bars are not adjacent."

More concrete and correct examples of histograms are required.

Histogram
Hi, what is the name given to the highest point in a histogram, and what is the name of the bars that are not the highest?

Density plots
This page needs links to density estimation, which is often superior to histograms; see, e.g., Simonoff, Smoothing Methods in Statistics.

Density plots are not superior to histograms - they are different. Histograms are generally more appropriate for exploratory data analysis as they make fewer assumptions about the underlying distribution, and they let you see the raw data more directly.

--Hi, how do you know when to use a histogram and when to use a bar graph? What is the difference between the bar graph and the histogram in simple terms?
 * Most basically, a proper histogram has no spaces between the bars because they are meant to represent numbers on a continuous scale (theoretically infinitely precise decimals, any and all of which could be observed in your data), while bar graphs have spaces between the bars and are meant for comparing amounts for different categories that don't even have to be numerical (such as a survey about favourite colours). So, bar graphs for categorical variables and histograms for continuous numerical variables. For discrete numerical variables (e.g. a survey of the number of TV sets owned), the bar graph works fine, though nothing but your conscience can stop you from cheating and pretending the numbers are part of a continuous scale, as if someone could own, say, 2.75 TV sets. Nicknicknickandnick 05:23, 3 May 2007 (UTC)

Excel
As I have understood from this page, there is a difference between a bar chart and a histogram. Microsoft Excel 2003 calls a bar chart a histogram and a horizontal bar chart a bar chart... Maybe a warning that Excel gets the names mixed up might be in order? Thanks --Squidonius 22:58, 28 January 2007 (UTC)
 * Yes unfortunately Excel's histogram generator (in the Data Analysis ToolPak) doesn't really generate a proper histogram, because it uses the existing available bar chart generator to attempt the task. Nicknicknickandnick 05:36, 3 May 2007 (UTC)

Bin and Count independence?
A colleague had a chart displaying the number of events occurring during each hour of the day, with 24 bins, one for each hour. He called it a histogram and referenced the definition in this entry. What bothers me is that the definition of the count does not seem sufficiently independent of the definition of the bins. Is there some aspect of a histogram that would require the definition of the count to be independent of the definition of the bins?
 * That is somewhat borderline between histogram and bar chart of counts (an example is this). Generally, the placement and count of histogram bins is arbitrarily set by the distribution of the value, which is usually unknown (for example, measuring the heights of trees in a park). This is why there are multiple methods of determining the number of bins, and the placement of breaks between bins. But for your example, the frequency (hour) and range (0-23) of the value is known—so it is convenient to use the breaks of a day. In the end, the chart you described is interpreted the same as a histogram, so it would seem like an appropriate definition. +mwtoews 14:21, 3 May 2007 (UTC)

Diagrams are inconsistent with the data tables
There are data tables on two subjects. There is only one "histogram" and that is on drive times. It appears twice. There is no "histogram" for the student data. 66.74.146.101 20:38, 17 October 2007 (UTC) LSquared Orange CA USA


 * It looks like the original source data for the "by proportion" example was replaced with data for absolute numbers in version 105847254. The edit was anonymous and no reason seems to have been given. Additionally, the image and the text were never changed to match the new data. I've reinstated the original data, so the article should make more sense now. I'm sort of surprised no one else has caught it before. Undisputedloser (talk) 22:57, 26 December 2007 (UTC)

Bin count & convolution?
Does anyone ever do away with the bin-count question and just convolve the data set with, e.g., a Gaussian kernel? That leaves the question of what the standard deviation of the kernel should be, but it does away with the arbitrariness of bin size. In the context of displaying a sample (e.g., scores on a test), this seems more natural since one's score on a test includes noise, so a score of 92 really means the person's understanding is between, say, an 89 level and a 95 level. —Ben FrantzDale 17:33, 3 November 2007 (UTC)
 * It looks like yes. See kernel density estimation. 155.212.242.34 20:24, 6 November 2007 (UTC)
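For readers following this thread, a minimal sketch of the convolution idea, assuming NumPy and a fixed, hand-picked kernel standard deviation (the `bandwidth` argument). The function name `gaussian_kde_1d` and the test scores are hypothetical, and this is a plain illustration of a Gaussian kernel density estimate rather than any particular library's implementation.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average one Gaussian bump per sample, evaluated on the grid.

    Hypothetical helper: `bandwidth` is the kernel's standard deviation
    and must be chosen by the caller.
    """
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return kernels.mean(axis=1)

# Made-up test scores; each score is smeared by a kernel with sd = 3.
scores = np.array([71.0, 78.0, 85.0, 88.0, 92.0, 92.0, 95.0])
grid = np.linspace(40.0, 130.0, 901)
density = gaussian_kde_1d(scores, grid, bandwidth=3.0)

# Riemann-sum check that the estimate integrates to about 1.
area = float(np.sum(density) * (grid[1] - grid[0]))
print(round(area, 3))
```

As noted in the question, the arbitrariness does not disappear: it moves from the bin width to the choice of `bandwidth`.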

Number of bins and width
This statement, "The number of bins k can be calculated directly, or from a suggested bin width h:", seems misleading given the formula that follows on the page. In the formula displayed for the number of bins k, the denominator as shown is n; shouldn't it be h (which represents the calculated bin width)? Doesn't n represent the number of samples or observations? Hzlnt7 (talk) 23:31, 18 December 2007 (UTC)
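A small sketch of the relationship being asked about, under the assumption that the intended formula is the usual k = ceil((max − min) / h), with h the bin width in the denominator. The function name `bins_from_width` and the data are hypothetical.

```python
import math

def bins_from_width(data, h):
    """Number of bins k = ceil((max - min) / h) for a suggested bin width h."""
    if h <= 0:
        raise ValueError("bin width h must be positive")
    return math.ceil((max(data) - min(data)) / h)

# Made-up observations: n = 6 values spanning a range of 10.
data = [2.0, 3.5, 4.0, 7.5, 9.0, 12.0]
k = bins_from_width(data, h=2.5)
print(k)
```

Here n (the number of observations) never enters the calculation; it would only matter if h itself were chosen by a rule that depends on n.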

Cherry Tree Graph Problem
The y-axis should be labeled as Frequency Density, shouldn't it? —Preceding unsigned comment added by 131.231.242.122 (talk) 13:07, 18 March 2009 (UTC)
 * There are two types of y-axis for histograms:
 * representation of frequencies, the counts component of the result;
 * probability densities, the counts divided by the total.
 * This figure was created using the former, by simply counting. "Density" implies the latter type, and would have a range of 0–1 (or 0–100%). (More info here, look at the freq parameter). However, you do raise a point that this isn't really clear on the front page. + m t  18:41, 18 March 2009 (UTC)

What is said above is NOT correct: a density scale does NOT imply values between 0 and 1! This is a common misunderstanding. Consider a uniform distribution between the limits 0.0 and 0.5: the density is 2! --Kjetil Halvorsen 22:51, 27 January 2010 (UTC) —Preceding unsigned comment added by Kjetil1001 (talk • contribs)
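The uniform example can be checked numerically. The sketch below assumes NumPy; the sample size and bin edges are arbitrary choices for illustration. It contrasts proportions (counts divided by the total, which do lie between 0 and 1) with densities (counts divided by the total and by the bin width, which need not).

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 0.5, size=10_000)   # uniform on [0, 0.5)

edges = np.linspace(0.0, 0.5, 6)          # five bins of width 0.1
counts, _ = np.histogram(x, bins=edges)
widths = np.diff(edges)

proportion = counts / counts.sum()              # sums to 1, each value <= 1
density = counts / (counts.sum() * widths)      # area sums to 1, heights near 2

# NumPy's own density option should agree with the manual calculation.
np_density, _ = np.histogram(x, bins=edges, density=True)

print(proportion)
print(density)
```

The bar heights on the density scale sit near 2, while the total area (height times width, summed over bins) is 1.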

Introduction overcomplicated
Can I suggest that the list of quality control tools and the alternative of kernel density estimation are too specialised to be in the introduction? Perhaps they could be pushed further down into a suitably titled subsection? --Alastair Rae (talk) 16:26, 13 August 2009 (UTC)
 * Don't think so. The introduction is already kinda short and the concepts and links therein are useful for anyone trying to understand the purpose and context of histogram. --Cyclopia (talk) 18:28, 23 August 2009 (UTC)

Area of a histogram is 1?
The histogram gives frequencies of data X within some range (as defined by bins). Therefore, the area of a histogram over the total range of X should be equal to the number of data points. If you normalize to 1, you get relative frequency, an estimate of the probability density. Ben T/C 13:57, 23 August 2009 (UTC)
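To spell out the normalisation, here is a short worked sketch. The symbols are assumptions introduced for illustration and do not come from the article: k bins, n_i observations in bin i, bin width w_i, and n observations in total. The three sums correspond to bar heights drawn as raw counts, as counts per unit width, and as densities:

$$\sum_{i=1}^{k} n_i = n, \qquad \sum_{i=1}^{k} \frac{n_i}{w_i}\, w_i = n, \qquad \sum_{i=1}^{k} \frac{n_i}{n\, w_i}\, w_i = 1.$$

So with raw counts the heights sum to n (the area equals n only when every w_i is 1), with heights n_i / w_i the area is n, and with heights n_i / (n w_i) the area is 1.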

Serious shortcoming of article?
The article currently covers the case for bins which have the same spacing, i.e. the case where bins are of the same size. What about flat histograms, where each bin contains the same number of points? I think this is important enough to write about it. Ben T/C 14:08, 23 August 2009 (UTC)
 * You're welcome to add that. --Cyclopia (talk) 18:25, 23 August 2009 (UTC)

As a non-statistician I would appreciate some discussion of the strengths and weaknesses of histograms compared to other representational tools, if you feel this is appropriate.

This section is a mess
What does the section below mean? It says "H(ƒ)(y) =" and then that sentence ends, unfinished. Do the capital H and the lower-case h both mean the same thing? If so, why doesn't the section say that? And if not, why doesn't the section say that? Michael Hardy (talk) 20:34, 8 November 2009 (UTC)

Continuous data
The idea of a histogram can be generalized to continuous data. Let ƒ ∈ L1(R) (see Lebesgue space), then the cumulative histogram operator H can be defined by:


 * $$H(f)(y) = $$ with only finitely many intervals of monotonicity this can be rewritten as


 * $$h(f)(y) = \sum_{\xi\in\{x : f(x)=y\}} \frac{1}{|f'(\xi)|}.$$

h(ƒ)(y) is undefined if y is the value of a stationary point. ===end of excerpt


 * It is for sure badly written and I don't know if it is serious stuff or original research. But I wouldn't remove it just now. Is it possible to find the editor who added this section? -- Cyclopia talk 16:54, 2 January 2010 (UTC)
 * Oh, I've seen now that you did just that (try to be a bit less hard in your comments to users). That user seems inactive since the end of October, so I wouldn't expect an answer soon. I'd say that the best thing is to ask at an appropriate wikiproject for comment. If you want to re-comment out the section, feel free (I reverted it, but I'm more and more convinced it could be OR). I think the guy wanted to express something similar to kernel density estimation, perhaps? -- Cyclopia talk 17:02, 2 January 2010 (UTC)

Etymology
It is also said that Pearson derived the name from "historical diagram". Does anyone know more? Nijdam (talk) 22:30, 1 January 2010 (UTC)

Bar?
Is "bar" the usual term for the area above an interval? Or would it be better to speak of a column, rectangle or something else? To me it seems quite confusing to speak of a bar diagram with bars as they are, without a meaningful thickness, and also to speak of bars in a histogram, with a meaningful width. Nijdam (talk) 21:01, 15 March 2010 (UTC)

Bin Size Redirect
"Bin Size" redirects here from a link the page on entropy. In this context I believe it is inappropriate (data binning is probably more appropriate), but should the redirect remain for other correct purposes or should it be removed entirely? Mickeyg13 (talk)

mathematical definition
If one likes to give such a definition, it should read rather differently from the given one. The histogram maps classes onto the relative frequency density of each class. Nijdam (talk) 21:20, 12 July 2010 (UTC)
 * Right. Feel free to edit then. Is there any source about that? -- Cyclopia talk 21:49, 12 July 2010 (UTC)


 * I rather like to delete the sections "mathematical definition" and "cumulative histogram". Nijdam (talk) 08:56, 6 September 2010 (UTC)


 * Uh, no! Why? -- Cyclopia talk 12:42, 6 September 2010 (UTC)


 * Such a definition doesn't serve any purpose. A histogram is best considered a diagram. Nijdam (talk) 13:15, 6 September 2010 (UTC)


 * No, it isn't. It is considered a diagram by many people but it is definitely not a diagram. The diagram you talk about is the bar chart, which is the classical way to display a histogram. Please don't revert sourced information without good reason: in any case, I don't like an edit war, so I'm going to notify Wikiproject Statistics to see if they can help in the matter. -- Cyclopia talk 13:33, 6 September 2010 (UTC)


 * Well I hope others will interfere. A histogram is not a bar chart. A histogram is historically defined and considered a diagram. It may be mathematically considered a method if you like, but most interpretations are a graphical diagram. I consider it pedantry to try to introduce a kind of exact mathematical definition. Nijdam (talk) 13:59, 6 September 2010 (UTC)

A histogram IS a diagram: see the OED. If you're interested, the original reference by Karl Pearson is here (page 399), in which he clearly states it is a diagram. Yes it can be viewed in the context of mathematical statistics as a density estimator, but the most common usage refers to the diagram itself. —3mta3 (talk) 16:35, 12 September 2010 (UTC)
 * Right, if the dictionaries say it's a diagram, I stand corrected. However, I am skimming the Pearson reference and I don't find it saying so explicitly; quite the contrary, it seems to me that in pp. 345-346 he devises it as a mathematical-geometrical method. Can it be considered both, as you yourself admit can be the case, e.g. a diagram and also a density estimation method? -- Cyclopia talk 16:58, 12 September 2010 (UTC) Stupid me, I didn't read your ref. to p. 399. Ok, now it is clear. -- Cyclopia talk 17:00, 12 September 2010 (UTC)

pictogram
This article does not mention histograms with pictograms (pictographs), i.e. the fill of the bars changed to stacked clip art in Excel, which are actually extremely popular in magazines and non-academic journals. This makes the disambiguation links quite annoying when actually looking for pictograms. --Squidonius (talk) 08:41, 6 September 2010 (UTC)
 * You are confounding a histogram with a bar chart, a common misunderstanding. Histogram is a mathematical technique. Bar chart is a graphical representation. You should take your comment there. -- Cyclopia talk 12:32, 6 September 2010 (UTC)
 * I edited the lead section to clarify (I have to get my hands on statistics books to make it more proper and well cited, but I am at home with a cold now, sigh) -- Cyclopia talk 12:44, 6 September 2010 (UTC)


 * @ Cyclopia: it turns out you are the one confounding histogram and bar chart. Nijdam (talk) 14:01, 6 September 2010 (UTC)

Algorithm
I'm not convinced the recently added algorithm is appropriate for the article. In the first place, I think an algorithm is not informative; in the second place, it does not help much; and in the third place, this algorithm does not account for different class widths. Nijdam (talk) 22:55, 1 October 2011 (UTC)
 * I agree with your points. The article describes the components of the histogram well, but these cannot be stuffed in a simple Python example. But an exhaustive routine (e.g. in R: hist.default) would be off-scope for an encyclopedia article. + m t  08:22, 2 October 2011 (UTC)


 * I agree too. I'm a big proponent of Python examples where they are essentially pseudocode and where they add something. This doesn't add anything -- you'll use a library to make a histogram if you are programming -- and it gets into details of Python syntax, so isn't just pseudocode. —Ben FrantzDale (talk) 12:54, 3 October 2011 (UTC)