User:Kvdheeraj123/sandbox

DATA MINING: Large amounts of data are now collected and warehoused, for example Web data, e-commerce records, and purchases at grocery stores. Computers have also become cheaper and more powerful, so storing and processing this data is practical. From a scientific viewpoint, data is collected and stored at high speed from transactions worldwide, and data mining helps scientists make use of it. Much of the useful information is hidden in the data and is not readily evident. Two examples of what data mining can find: certain names are more prevalent in certain US locations (O’Brien, O’Rurke, O’Reilly… in the Boston area), and similar documents returned by a search engine can be grouped according to their context (e.g. Amazon rainforest, Amazon.com).

Origins of data mining: Data mining draws ideas from machine learning, pattern recognition, statistics, and database systems. Traditional techniques are often unsuitable because of the sheer volume of data and because the data is spread across many storage locations.

Data mining tasks: Tasks fall into prediction methods and description methods. They include classification, clustering, regression, and deviation detection.

In classification, we are given a collection of records (the training set). Each record contains a set of attributes, one of which is the class. The task is to find a model for the class attribute as a function of the values of the other attributes. Goal: previously unseen records should be assigned a class as accurately as possible. A test set is used to determine the accuracy of the model. Usually the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
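The training/test procedure described above can be illustrated with a minimal sketch. The text does not name a particular model, so a 1-nearest-neighbour rule is used here purely as a stand-in. The records, the class labels "low" and "high", and the function names `split_train_test`, `predict`, and `accuracy` are all hypothetical.

```python
import random

def split_train_test(records, test_fraction=0.3, seed=0):
    """Shuffle the records and divide them into a training set and a test set."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def predict(training_set, attributes):
    """Assign the class of the closest training record (1-nearest-neighbour)."""
    def squared_distance(record):
        return sum((a - b) ** 2 for a, b in zip(record[0], attributes))
    nearest = min(training_set, key=squared_distance)
    return nearest[1]

def accuracy(training_set, test_set):
    """Fraction of test records whose predicted class matches the true class."""
    correct = sum(1 for attrs, label in test_set
                  if predict(training_set, attrs) == label)
    return correct / len(test_set)

# Hypothetical records: ((attribute 1, attribute 2), class)
records = [((x, y), "low" if x + y < 10 else "high")
           for x in range(10) for y in range(10)]

training_set, test_set = split_train_test(records)
print(len(training_set), len(test_set))  # 70 30
print(accuracy(training_set, test_set))  # accuracy measured on unseen records
```

The model is built only from `training_set`; `test_set` is held back so that the reported accuracy reflects records the model has not seen.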

Applications: DIRECT MARKETING: Objective: Reduce the cost of mailing by targeting a set of consumers likely to buy a new wireless product. Approach: Use the data for a similar product introduced earlier. We know which customers decided to buy and which decided otherwise. This {buy, don't buy} decision forms the class attribute. Collect demographic, lifestyle, and company-interaction information about all such customers, such as type of business, where they live, and how much they earn. Use this information as input attributes to learn a classifier model. FRAUD DETECTION: Objective: Predict fraudulent cases in credit card transactions.

Approach: Use credit card transactions and the information on the account holder as attributes: when the customer buys, what the customer buys, how often the customer pays on time, and so on. Label past transactions as fraudulent or fair. This label forms the class attribute. Learn a model for the class of the transactions. Use this model to detect fraud by observing credit card transactions on an account.

DATA ANALYSIS: Data analysis applies informational and factual data to the research questions. It is like solving a problem while keeping all the values together. Categories of data analysis include narrative, descriptive, mathematical, telecommunication, and others. Sample information is used to explain, or make an abstraction of, population "phenomena", using tests such as chi-square (non-parametric), the t-test, and two-way ANOVA (the latter two are parametric).

Principles of data analysis: Goal of an analysis:
 * To explain cause-and-effect phenomena
 * To relate research to real-world events
 * To predict/forecast real-world phenomena based on research
 * To find answers to a particular problem
 * To draw conclusions about real-world events based on the problem
 * To learn a lesson from the problem

An analysis contains some aspects of scientific reasoning/argument: define, interpret, evaluate, illustrate, discuss, explain, clarify, compare, and contrast.

An analysis must have four elements:
 * Data/information (what)
 * Scientific reasoning/argument (what? who? where? how? what happens?)
 * Finding (what results?)
 * Lesson/conclusion

Basic guide to data analysis (phenomena to look for, followed by the steps to take):
 * Association (e.g. σ1,2.3 = 0.75)
 * Tendency (left-skew, right-skew)
 * Causal relationship (e.g. if X, then Y)
 * Trend, pattern, dispersion, range


 * "Investigate", NOT "describe"
 * Go back to the research flowchart
 * Break the analysis down into research objectives and research questions
 * Identify the phenomena to be investigated
 * Visualize the "expected" answers
 * Validate the answers with data
 * Don't state anything that is not backed by data

MATH CALCULATIONS: Measures of dispersion and variability: (1) Range: the difference between the largest and smallest observations in a group of data. (2) Variance: the average of the squared deviations of each observation from the mean of the observations in a group of data.
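
The range and variance definitions under MATH CALCULATIONS can be worked through in a short sketch. It follows the wording of the definition literally, so the variance divides by the number of observations (the population form). The data values and the function names `value_range` and `variance` are made up for illustration.

```python
def value_range(observations):
    """Difference between the largest and smallest observations."""
    return max(observations) - min(observations)

def variance(observations):
    """Average of the squared deviations from the mean (divides by n)."""
    n = len(observations)
    mean = sum(observations) / n
    return sum((x - mean) ** 2 for x in observations) / n

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations, mean = 5

print(value_range(data))  # 7   (9 - 2)
print(variance(data))     # 4.0 (squared deviations sum to 32, divided by 8)
```

When the data is a sample rather than a whole population, the sum of squared deviations is commonly divided by n - 1 instead of n. Python's standard `statistics` module provides both forms as `pvariance` and `variance`.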
