User:Kranthi206

HADOOP DATA DICTIONARY

1) Is it internal or external? What system does it come from?

The data is external. It comes from Kaggle, a third-party data platform, and was originally extracted from the Twitter network.

Kaggle is a platform for predictive modelling and analytics competitions in which statisticians and data miners compete to produce the best models for predicting and describing the datasets uploaded by companies and users. This crowdsourcing approach relies on the fact that there are countless strategies that can be applied to any predictive modelling task and it is impossible to know beforehand which technique or analyst will be most effective.

2) Is it going to change? What data are you going to use?

The data is static: it is a fixed snapshot, not a live feed. It was originally real-time data extracted from Twitter, but it will not change for the duration of the project.

The data set we are using is "How ISIS Uses Twitter". It lists users and their followers, along with the content they tweeted on the Twitter platform. By analysing the data, we aim to identify likely ISIS supporters and look for signals that could help predict attacks.
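
As a rough illustration of the kind of per-user analysis described above, the following Python sketch counts how many tweets each user contributes. The column names (username, followers, tweets) and the sample rows are placeholders assumed for illustration only; the actual Kaggle file may use different names and should be checked before reuse.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for the Kaggle CSV export. The column
# names and rows are assumptions for illustration, not real data.
SAMPLE_CSV = """username,followers,tweets
user_a,120,first example tweet
user_b,45,second example tweet
user_a,120,third example tweet
"""


def tweets_per_user(csv_file):
    """Count how many tweet rows each username contributes."""
    reader = csv.DictReader(csv_file)
    return Counter(row["username"] for row in reader)


counts = tweets_per_user(io.StringIO(SAMPLE_CSV))
for username, n in counts.most_common():
    print(username, n)
```

Because the data is static, a count like this only needs to be computed once. On the real file, the in-memory sample would be replaced by an opened file handle.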

3) Data Description (What are the different data types?)