Talk:Apache HBase

HDFS
>> It is developed as part of Apache Software Foundation's Apache Hadoop project and runs on top of HDFS (Hadoop Distributed Filesystem).

Strictly speaking, this is not true. In HBase, you can configure the storage system. I believe 99% will choose HDFS, but in theory you could also use the local file system. — Preceding unsigned comment added by StefanPapp (talk • contribs) 07:12, 21 August 2014 (UTC)
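For reference, the backing store is selected by the `hbase.rootdir` property in hbase-site.xml: the URI scheme chooses the filesystem. A hedged sketch (hostname and paths are placeholders, not defaults):

```xml
<!-- hbase-site.xml: hbase.rootdir selects the backing store. -->
<property>
  <name>hbase.rootdir</name>
  <!-- HDFS, the common case: -->
  <value>hdfs://namenode.example.com:8020/hbase</value>
  <!-- or the local filesystem (standalone mode), e.g.:
       <value>file:///var/hbase</value> -->
</property>
```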

Notability
I see the article's tagged for notability & haven't been able to find much in the way of articles in reliable sources. There were a few blog posts but that's about it. Not bad enough to warrant an AfD perhaps but may prod to see if anyone cares. -- samj in out 10:30, 6 January 2010 (UTC)


 * People like you are a plague on Wikipedia. "Derp. I've never heard of this in the mainstream media -- better to delete it I think." If you don't know anything about the subject, then just move along. — Preceding unsigned comment added by 82.9.176.129 (talk) 01:37, 6 September 2014 (UTC)


 * It's part of the Apache Hadoop Stack, which, as you will agree, is notable as the primary non-Google implementation of a datacentre-scale filesystem (HDFS) and the layers on top, of which MapReduce is one feature and HBase another. Probably the best coverage is ApacheCon slideware. One interesting feature is that since Microsoft bought Powerset, MS are effectively working on this. I shall improve the article a bit. No direct CoI problems, but I do know the people and am a committer on Hadoop proper. SteveLoughran (talk) 14:29, 6 January 2010 (UTC)


 * I've added some more on why I think it is notable. Left the tags marking other issues up. SteveLoughran (talk) 08:55, 7 January 2010 (UTC)

...and bloom filters
"HBase features compression, in-memory operation, and Bloom filters"

Bloom filters for what? Bloom-filtered indexes? Just saying "and Bloom filters" is like saying "and B-trees". Those are data structures, not features of a database.
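For context: in HBase, a Bloom filter is per-store-file metadata keyed on the row key (or row plus column), letting a read skip store files that cannot contain the requested key. The data structure itself can be sketched in a few lines; this is an illustrative toy, not HBase's implementation, and the class and parameter names are made up here:

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: set membership with possible false
    positives but no false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))

bf = BloomFilter()
bf.add("row-0001")
assert bf.might_contain("row-0001")  # added keys are never reported absent
```

A negative answer lets the reader skip a file entirely, which is the point of attaching one filter per store file.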

External links modified
Hello fellow Wikipedians,

I have just modified one external link on Apache HBase. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FAQ for additional information. I made the following changes:
 * Added archive https://web.archive.org/web/20140528110238/http://www.networkworld.com/news/tech/2012/102212-nosql-263595.html to http://www.networkworld.com/news/tech/2012/102212-nosql-263595.html

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know (documentation at ).

Cheers.— InternetArchiveBot  (Report bug) 08:00, 16 October 2016 (UTC)

What is this even
"HBase is a column-oriented key-value data store and has been idolized widely because of its lineage with Hadoop and HDFS." Idolized? Wat? — Preceding unsigned comment added by 65.112.8.3 (talk) 18:59, 2 March 2018 (UTC)

Indeed. "This is getting needlessly messianic." — Preceding unsigned comment added by 2601:647:4680:EE80:DDC1:3B49:4077:FDB0 (talk) 17:24, 18 March 2018 (UTC)

Sparse data definition nonsensical
In the intro, we have "That is, it provides a fault-tolerant way of storing large quantities of sparse data (small amounts of information caught within a large collection of empty or unimportant data, such as finding the 50 largest items in a group of 2 billion records, or finding the non-zero items representing less than 0.1% of a huge collection)."

This makes no sense. The parenthesized clause purports to define "sparse data", but only talks about queries to perform on the data. I don't know what finding the 50 largest items means in terms of sparsity. Clearly the other items are not nothing and must be stored; it is only the particular query that determines what is important, and I could just as easily have asked for the smallest 50, or for all items of a particular size. The same goes for finding non-zero items: that is about querying and indexing, but presumably the rest of the data is important and could be queried as well. So there is nothing "sparse" here. A prototypical case of real sparse data is a multi-dimensional array with mostly 0's, where a compact encoding can store the nonzero entries and their locations, dispensing with storing the 0's at all. If HBase is doing something like this with other types of data (e.g. JSON text), it would be more informative to describe that. 213.239.66.194 (talk) 09:19, 29 August 2023 (UTC)
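The prototypical case described in the comment above, a mostly-zero array stored as a coordinate dictionary, can be sketched as follows. This is a generic illustration, not anything HBase-specific; HBase's analogue is simply that absent cells in a row occupy no storage:

```python
def to_sparse(dense):
    """Encode a 2-D list as {(row, col): value}, keeping only
    nonzero entries; the zeros are not stored at all."""
    return {
        (r, c): v
        for r, row in enumerate(dense)
        for c, v in enumerate(row)
        if v != 0
    }

def from_sparse(sparse, num_rows, num_cols):
    """Rebuild the dense 2-D list; missing coordinates default to 0."""
    return [[sparse.get((r, c), 0) for c in range(num_cols)]
            for r in range(num_rows)]

dense = [
    [0, 0, 5],
    [0, 0, 0],
    [7, 0, 0],
]
sparse = to_sparse(dense)
assert sparse == {(0, 2): 5, (2, 0): 7}  # two entries instead of nine
assert from_sparse(sparse, 3, 3) == dense
```

Storage cost scales with the number of nonzero entries rather than with the full array dimensions, which is the sense of "sparse" the comment argues the article's intro fails to convey.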