User talk:Daturablack

Big Data
Big data refers to data collections so large that they are beyond the ability of traditional database software tools to acquire, store, manage, and analyze. Big data has four typical characteristics: vast data volume, fast data transfer, a variety of data types, and low value density.

Characteristics

Volume: The size of the data determines its value and potential.

Variety: The data comes in many different types and formats.

Velocity: The speed at which the data is generated and processed.

Variability: Inconsistency in the data, which complicates efforts to manage and process it effectively.

Veracity: The quality and trustworthiness of the data.

Complexity: The data is huge in quantity and comes from multiple sources.

Big data everywhere

Nowadays, all of us can access enormous amounts of information on every subject thanks to the popularization of the computer and the Internet. While we are reaping the benefits of sharing and exchanging information, new problems emerge accordingly. The information explosion leaves us with piles of data that can hardly be sorted out. We are often caught in an overload of information, unable to distinguish what we really want from it, which is time-consuming. Facing this difficult situation, a new approach has been introduced, called "big data". It focuses not only on volume and variety, but also on velocity and veracity. The emergence of big data is important and necessary: only when data are processed and analyzed properly can we take real advantage of them, and only then does the information have true value. Thus, I am looking forward to the coming age of big data. Sources of big data include:
 * sensors gathering information
 * site searches
 * purchase and bank card transaction records
 * mobile devices
 * social media and networks
 * digital satellite images

The importance and applications of big data

Big data differs from traditional information analysis in mind-bending ways: it often tells us what is happening without telling us why. The challenge for leadership is that decision-making is, in most cases, driven by gut instinct. Air travelers can now figure out which flights are likeliest to be on time because data scientists have tracked decades of flight history and correlated it with weather patterns. Publishers use data from text analysis and social networks to give readers personalized news. Health care is one of the biggest opportunities: if we had electronic records going back generations, we would know more about genetic propensities, correlations among symptoms, and how to individualize treatment.
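The flight example above boils down to correlating two variables, such as weather severity and departure delay. A minimal sketch in Python, using made-up numbers rather than real flight data:

```python
# Toy sketch of correlation analysis, in the spirit of the flight-delay
# example above. The data below is hypothetical, not real flight history.
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical records: storm-severity index vs. average departure delay (min).
storm_severity = [0, 1, 2, 3, 4, 5]
avg_delay_min = [5, 9, 14, 22, 31, 45]

r = pearson(storm_severity, avg_delay_min)
print(f"correlation: {r:.2f}")  # strongly positive on this toy data
```

At big-data scale the same computation would be distributed across a cluster, but the statistical idea is unchanged.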

Technology

Big Data is a recent technology hot spot, but judging from the name alone, it is not a new idea; after all, "big" is a relative concept. Historically, databases, data warehouses, data marts, and other information-management technologies were, to a large extent, also built to solve the problem of large-scale data. Bill Inmon, known as the father of the data warehouse, was already talking about big data in the early 1990s. However, "Big Data" became a hot term in recent years largely because of the rapid development of the Internet, cloud computing, mobile computing, and the Internet of Things. Ubiquitous mobile devices, RFID tags, and wireless sensors generate data every minute, and hundreds of millions of users of interactive Internet services continually produce enormous amounts of data. The data to be processed is too large and grows too fast, while business needs and competitive pressure put ever higher demands on real-time processing and effectiveness; conventional techniques simply cannot cope. In response, engineers have developed and adopted a number of new technologies, including distributed caches, MPP-based distributed databases, distributed file systems, and various NoSQL distributed storage solutions.

More than a decade ago, Eric Brewer stated his famous CAP theorem: a distributed system cannot simultaneously satisfy consistency, availability, and partition tolerance; it can meet at most two of the three. Different systems have different priorities and therefore employ different tactics; only by truly understanding a system's needs can the CAP theorem be put to good use. Architects generally take one of two directions in applying it. The first is key-value storage, such as Amazon Dynamo, choosing a database product whose CAP trade-offs match the system's tendencies. The second combines a domain model with a distributed in-memory cache, customizing the project's own application of CAP theory, which is more flexible but also more difficult.
For large websites, availability and partition tolerance generally take precedence over strong consistency, so designs tend toward the A and P directions, with consistency then ensured by other means as the business requires. Architects should not waste energy trying to design a distributed system that perfectly satisfies all three properties; they should know how to choose. Consistency requirements differ by application: a social networking site can tolerate inconsistency for a relatively long time without affecting transactions or user experience, while transaction and accounting data, as at Alipay, is highly sensitive and generally cannot tolerate inconsistency for more than about a second.
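In Dynamo-style key-value stores, the trade-off described above is often tuned with a quorum rule: with N replicas, a write acknowledged by W nodes and a read that consults R nodes are guaranteed to overlap on at least one up-to-date replica whenever R + W > N. A minimal sketch of that rule (an illustration of the general idea, not any particular product's API):

```python
# Quorum-intersection rule used by many Dynamo-style replicated stores:
# every read quorum must intersect every write quorum for a read to be
# guaranteed to see the latest acknowledged write.

def is_strongly_consistent(n_replicas: int, w: int, r: int) -> bool:
    """True if every R-node read quorum intersects every W-node write quorum."""
    return r + w > n_replicas

# Typical tunings for a 3-replica system:
print(is_strongly_consistent(3, w=2, r=2))  # True  - quorum reads and writes
print(is_strongly_consistent(3, w=1, r=1))  # False - fast, but only eventually consistent
```

Lowering W and R favors availability and latency (the A/P direction); raising them buys back read-your-writes consistency at the cost of tolerating fewer node failures.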

Benefits

 * Column-oriented organizations are more efficient when an aggregate needs to be computed over many rows but only for a notably smaller subset of all columns of data, because reading that smaller subset of data can be faster than reading all data.
 * Column-oriented organizations are more efficient when new values of a column are supplied for all rows at once, because that column data can be written efficiently and replace old column data without touching any other columns for the rows.
 * Row-oriented organizations are more efficient when many columns of a single row are required at the same time, and when row-size is relatively small, as the entire row can be retrieved with a single disk seek.
 * Row-oriented organizations are more efficient when writing a new row if all of the column data is supplied at the same time, as the entire row can be written with a single disk seek.
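The trade-offs in the list above can be illustrated with two toy layouts of the same table, using plain Python structures rather than a real storage engine:

```python
# Illustrative sketch of row- vs. column-oriented layout (in-memory toy,
# not how a real engine such as HBase stores data on disk).

# Row-oriented: each record is kept together - cheap to read one whole row.
rows = [
    {"id": 1, "name": "alice", "age": 30, "score": 88},
    {"id": 2, "name": "bob",   "age": 25, "score": 92},
    {"id": 3, "name": "carol", "age": 35, "score": 79},
]

# Column-oriented: each column is kept together - cheap to scan one column.
columns = {
    "id":    [1, 2, 3],
    "name":  ["alice", "bob", "carol"],
    "age":   [30, 25, 35],
    "score": [88, 92, 79],
}

# Aggregating one column over all rows touches only that column's data.
avg_score = sum(columns["score"]) / len(columns["score"])

# Fetching every field of a single row is one contiguous record in row layout.
bob = rows[1]

print(avg_score)
print(bob["name"])
```

On disk, "kept together" translates into fewer seeks and less data read for the matching access pattern, which is the source of each layout's advantage.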

NoSQL

As the amount of data grows, more and more people have turned their attention to NoSQL, especially since the second half of 2010, when Facebook chose HBase for its real-time messaging storage system, replacing a system built on Cassandra. This drew a great deal of attention to HBase. Facebook's choice of HBase rested on two long-term needs: short-term access to small batches of temporary data, and sustained long-term growth. HBase is a highly reliable, high-performance, column-oriented, scalable distributed storage system; with HBase, large-scale structured storage clusters can be built on low-cost PC servers. HBase is an open-source implementation of Google's BigTable and uses HDFS as its file storage system. Just as BigTable runs MapReduce to process its huge volumes of data, HBase uses MapReduce to process massive amounts of data; and where BigTable uses Chubby as its coordination service, HBase uses ZooKeeper as its counterpart.
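The MapReduce model mentioned above splits work into a map step that emits key-value pairs and a reduce step that aggregates them by key. A single-machine word-count sketch of the pattern (illustrative only, not the Hadoop or BigTable API):

```python
# Minimal in-process sketch of the MapReduce pattern. Real deployments
# distribute the map and reduce phases across a cluster; the logic is the same.
from collections import defaultdict

def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in the input document."""
    for word in document.lower().split():
        yield (word, 1)

def reduce_phase(pairs):
    """Reduce: sum the counts for each distinct word (key)."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

docs = ["big data big ideas", "big clusters"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
counts = reduce_phase(pairs)
print(counts["big"])  # 3
```

Because each document can be mapped independently and each key reduced independently, both phases parallelize naturally across machines, which is what makes the model suitable for the data volumes HBase and BigTable handle.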

December 2015
Welcome to Wikipedia and thank you for your contributions. I am glad to see that you are discussing a topic. However, as a general rule, talk pages such as Talk:Main Page are for discussion related to improving the article, not general discussion about the topic or unrelated topics. If you have specific questions about certain topics, consider visiting our reference desk and asking them there instead of on article talk pages. Thank you. Art LaPella (talk) 06:06, 7 December 2015 (UTC)