Talk:Unstructured data/Archives/2014

85%
In the first paragraph you say that Merrill Lynch estimates that more than 85% of all potentially usable business information originates in unstructured form. You reference an article in DMReview, 2003 that makes this statement.

However, this DMReview article does not support this claim. It contains no reference to any report or statement by Merrill Lynch. In fact, you can find thousands of web pages making this claim, but none of them can support it. The Merrill Lynch report simply cannot be found. I think that this number has been repeated so many times that everyone now thinks it's true. But as far as I can tell, there is no such study.

I'm sure it must be Wikipedia's standard that one must quote sources directly, and not use hearsay. A rumor that there is a study is not the same as an actual study.

I think you should remove this statement from the article. —Preceding unsigned comment added by GregHolmberg (talk • contribs) 00:56, 5 August 2008 (UTC)

I emailed the author of the DM Review article, and here's what he said. It looks like this claim of 85% can't be substantiated unless someone can come up with the actual report from Merrill Lynch.

From: "Robert Blumberg"
To: "'Greg Holmberg'"
Subject: RE: The Problem with Unstructured Data
Date: 2008 08 05 10:00:42 AM

Hi Greg,

Well, I just went back and looked through about 50 reports and articles that I collected when writing the DM Review article and the Merrill Lynch study wasn't among them.

I do remember spending some time trying to validate the M-L reference but I don't seem to have documented my efforts. So this may well be an urban legend that I simply repeated, although I like to think that I did find a valid reference.

Sorry that I can't help you with this one.

Robert —Preceding unsigned comment added by GregHolmberg (talk • contribs) 18:16, 5 August 2008 (UTC)

After some discussion on the TextAnalytics Yahoo group, Seth Grimes believes that the following 1998 Merrill Lynch report on Enterprise Information Portals is the source of the claim: http://emarkets.grm.hia.no/gem/Topic7/eip_ind.pdf

Specifically, on page 15 it says:

Not surprisingly, unstructured data comprises the vast majority of data found in an organization, some estimates run as high as 80%.

Not exactly a scientific study. So it appears that this figure is just something someone made up, and was eventually quoted as fact. —Preceding unsigned comment added by GregHolmberg (talk • contribs) 01:03, 7 August 2008 (UTC)

Congratulations, all

 * I came to TALK to compliment the authors/community for the kind of reader-helping, guiding overview statement that says (paraphrase) this article and the technology it introduces matter, because most of the world's data is like this ... but that assertion, while widely held, is not backed up by specific research (/paraphrase). What contributors have achieved here is so much better than edit wars, and so much more elegant than giving up and dropping the info for want of a ref. I see now that it took work I never imagined to get there, so there's even more reason to celebrate. FWIW, my congrats. Jerry-VA (talk) 18:27, 1 January 2014 (UTC)