User:LI AR/Books/Cracking the DataScience Interview

Basic Stuff To Know

 * Generic pages
 * Glossaire_de_l%27exploration_de_donn%C3%A9es
 * Big_data


 * Inspired from books like:
 * "A collection of Data Science Interview Questions Solved in Python and Spark vol I & II"
 * "120 real data science interview questions"


 * Tips / Known Limits of DS
 * DataScience is (very) experimental (Andrew Ng): https://pbs.twimg.com/media/CBXshmjWgAAgLKa.jpg
 * Overfitting
 * Bias%E2%80%93variance_tradeoff / http://www.ritchieng.com/machinelearning-learning-curve/
 * Sampling_bias
 * Survivorship_bias
 * Selection_bias
 * Concept_drift
 * Correlation_does_not_imply_causation
 * Curse_of_dimensionality


 * https://www.kaggle.com/wiki/Leakage
 * http://machinelearningmastery.com/data-leakage-machine-learning/
 * http://feedproxy.google.com/~r/blogspot/wCeDd/~3/EJB0G6BrbsU/solving-ill-posed-inverse-problems.html
 * Neural Networks
 * Vanishing_gradient_problem


 * Machine Learning definition and types
 * Artificial_intelligence
 * List_of_machine_learning_concepts
 * Machine_learning
 * Data_mining
 * Knowledge_extraction
 * Pattern_recognition
 * Signal_processing
 * Supervised_learning
 * Semi-supervised_learning
 * Unsupervised_learning
 * Reinforcement_learning
 * Online_machine_learning
 * Incremental_learning
 * Q-learning
 * One-shot_learning / https://www.quora.com/What-is-zero-shot-learning
 * Feature_learning
 * Learning_to_rank
 * Similarity_learning
 * Biclustering
 * Natural_language_processing
 * Biomimetics
 * Collective_intelligence
 * Data_stream_mining
 * Sequential_pattern_mining
 * Clickstream
 * Semantics
 * Semantic_Web
 * Speech_recognition
 * Speech_synthesis
 * Collaborative_filtering


 * Competitions
 * https://www.kaggle.com/
 * https://www.datascience.net/fr/home/
 * http://dreamchallenges.org/
 * https://www.drivendata.org/competitions/
 * https://www.testdome.com/tests/data-analysis-test/65
 * http://www.crowdanalytix.com/
 * https://www.topcoder.com/community/data-science/
 * https://www.datasciencechallenge.org/
 * http://tunedit.org/challenges
 * https://datasciencebowl.com/competitions/
 * https://www.innocentive.com/ar/challenge/browse
 * http://tamids.tamu.edu/2018-tamids-data-science-competition/
 * https://hackerearth.com


 * Datasets
 * List_of_datasets_for_machine_learning_research


 * https://www.analyticsvidhya.com/blog/2018/03/comprehensive-collection-deep-learning-datasets/
 * http://www.kdnuggets.com/datasets/index.html
 * https://aws.amazon.com/public-datasets/
 * https://www.kaggle.com/datasets
 * https://data.fivethirtyeight.com
 * https://www.quandl.com/
 * https://opendata.socrata.com/
 * https://cloud.google.com/bigquery/public-data/
 * https://github.com/BuzzFeedNews
 * https://en.wikipedia.org/wiki/Wikipedia:Database_download
 * http://mlr.cs.umass.edu/ml/datasets.html
 * https://data.world/
 * https://www.data.gov/
 * https://www.data.gouv.fr/fr/
 * https://data.worldbank.org/
 * https://www.reddit.com/r/datasets/top/?sort=top&t=all
 * http://academictorrents.com/browse.php?cat=6
 * http://www.kdnuggets.com/2015/04/awesome-public-datasets-github.html
 * http://www.kdnuggets.com/?s=datasets
 * https://www.springboard.com/blog/free-public-data-sets-data-science-project/
 * https://www.dataquest.io/blog/free-datasets-for-projects/
 * https://github.com/awesomedata/awesome-public-datasets
 * https://elitedatascience.com/datasets
 * https://blog.journeyofanalytics.com/50-free-datasets-for-data-science-projects/
 * https://www.datascienceweekly.org/data-science-resources/data-science-datasets


 * Software
 * http://www.databaseetl.com/data-mining-tools/
 * IDEs / DS-GUI
 * R
 * (DS-GUI) :Rattle_GUI http://rattle.togaware.com/
 * (IDE) :RStudio https://www.rstudio.com
 * Python
 * (DS-GUI) :Orange_(software) https://orange.biolab.si/
 * (IDE) :Project_Jupyter https://jupyterlab.readthedocs.io
 * Java
 * (DS-GUI) :Weka_(machine_learning) http://www.cs.waikato.ac.nz/ml/weka/
 * (IDE) :IntelliJ_IDEA https://www.jetbrains.com/idea/ https://github.com/JetBrains/intellij-community
 * (IDE) :Eclipse_(software) https://www.eclipse.org/ https://git.eclipse.org/c/
 * Online
 * DEAD http://www.gamifiedonlineweka.ga/
 * Paid Software
 * (DS-GUI) :Minitab https://minitab.com/
 * (DS-GUI) :Tableau_Software https://www.tableau.com/
 * R/Packages
 * https://cran.r-project.org/
 * https://cran.r-project.org/web/views/
 * https://cran.r-project.org/web/views/MachineLearning.html
 * https://cran.r-project.org/web/views/Bayesian.html
 * https://cran.r-project.org/web/views/Cluster.html
 * https://cran.r-project.org/web/views/NaturalLanguageProcessing.html
 * https://cran.r-project.org/web/views/Survival.html
 * https://cran.r-project.org/web/views/TimeSeries.html
 * Python
 * https://www.python.org/
 * :Scikit-learn http://scikit-learn.org/stable/
 * C++
 * https://orange.biolab.si/
 * Alteryx
 * https://www.alteryx.com/ [Commercial]
 * Comparison
 * http://onlinelibrary.wiley.com/wol1/doi/10.1002/widm.1204/full
 * DeepLearning
 * https://www.tensorflow.org/
 * http://www.deeplearning.net/software/theano/
 * http://mxnet.io/
 * http://caffe.berkeleyvision.org/
 * https://github.com/NervanaSystems/neon
 * GANs (Generative Adversarial Networks)
 * https://github.com/hindupuravinash/the-gan-zoo
 * https://github.com/GKalliatakis/Delving-deep-into-GANs
 * https://github.com/nashory/gans-awesome-applications
 * DataViz
 * https://matplotlib.org/
 * https://plot.ly/
 * :GGobi http://www.ggobi.org/
 * http://ggplot2.org/
 * http://ggvis.rstudio.com/
 * https://d3js.org/
 * https://datascienceplus.com/creating-graphs-with-python-and-goopycharts/
 * https://www.tableau.com/ [Commercial]
 * http://bokeh.pydata.org/en/latest/ [Python]
 * http://pyqtgraph.org/ [Python]
 * https://uber.github.io/deck.gl [Uber's internal DataViz tool]
 * http://rawgraphs.io/
 * http://scidavis.sourceforge.net/
 * http://home.gna.org/veusz/
 * http://jwork.org/dmelt/
 * Graphs
 * https://gephi.org/
 * http://www.graphviz.org/
 * http://www.cytoscape.org/
 * GUI
 * https://www.rstudio.com/products/shiny/


 * Data Manipulation
 * Annotate examples: https://prodi.gy/
 * Data_pre-processing
 * Data_cleansing
 * Data_reduction
 * Data_wrangling
 * Data_scrubbing
 * Data_editing
 * Data_scraping
 * Data_curation
 * Data_fusion
 * Data_integration
 * Data_binning
 * Sanitization_(classified_information)
 * Extract,_transform,_load
 * Imputation_(statistics)
 * Interpolation
 * Outlier


 * https://github.com/Quartz/bad-data-guide
 * https://en.wikipedia.org/wiki/Oversampling_and_undersampling_in_data_analysis
 * Local_case-control_sampling
 * Sampling_(statistics)
 * Stratified_sampling
 * Jackknife_resampling
 * Oversampling_and_undersampling_in_data_analysis
 * AdaBoost


 * "Why Most Published Research Findings Are False" (essay)
 * http://robotics.cs.tamu.edu/RSS2015NegativeResults/pmed.0020124.pdf
 * "A Few Useful Things to Know about Machine Learning"
 * https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf
 * Working with text
 * Unicode_equivalence
 * URL_normalization
 * Text_segmentation
 * N-gram
 * Tokenization_(lexical_analysis)
 * Stemming
 * Word2vec https://www.tensorflow.org/tutorials/word2vec

 * https://github.com/explosion/thinc
 * https://google.github.io/seq2seq/
 * NLP in Python
 * Working with spatial data
 * Spatial_data
 * Trend_surface_analysis
 * Variogram
 * Geary%27s_C
 * Moran%27s_I
 * Spatial_descriptive_statistics


 * Signal processing
 * Dynamic_time_warping


 * Signal processing - Images
 * Normalization_(image_processing)
 * Normalized_frequency_(unit)
 * Image_segmentation


 * Techniques for Feature/Attribute Selection/Dimensionality Reduction
 * High-dimensional_statistics
 * Dimensionality_reduction
 * Factor_analysis
 * Principal_component_analysis
 * Independent_component_analysis
 * Singular_value_decomposition
 * Multidimensional_scaling
 * T-distributed_stochastic_neighbor_embedding
 * Autoencoder
 * Deep_learning
 * Elastic_map
 * Linear_discriminant_analysis


 * Signal processing
 * Compressed_sensing


 * Working with spatial data
 * Spatial_analysis


 * Maths (Stats / Algebra)
 * Inspiration for this section: https://github.com/soulmachine/machine-learning-cheat-sheet
 * Pseudo-random_number_sampling
 * Glossary_of_probability_and_statistics
 * Bijection,_injection_and_surjection
 * Mean
 * Harmonic_mean
 * Median
 * Mode_(statistics)
 * Range_(mathematics)
 * Quartile
 * Interquartile_range
 * Variance
 * Covariance
 * Standard_deviation
 * Collinearity
 * ANOVA
 * ANCOVA
 * MANOVA
 * ANORVA
 * Moving_average
 * EWMA_chart
 * Exponential_smoothing


 * https://stats.stackexchange.com/questions/100019/window-models-in-stream-data-processing
 * Autoregressive_model
 * Autoregressive%E2%80%93moving-average_model
 * Autoregressive_integrated_moving_average
 * Autocorrelation
 * Cross-correlation
 * Entropy_in_thermodynamics_and_information_theory
 * Moment_(mathematics)
 * Residual
 * Expected_value
 * Likelihood_function
 * Cumulative_distribution_function
 * Probability
 * Probability_mass_function
 * Probability_density_function
 * Prior_probability
 * Prior_knowledge_for_pattern_recognition
 * Permutation https://fr.wikipedia.org/wiki/Arrangement
 * Combination https://fr.wikipedia.org/wiki/Combinaison_(math%C3%A9matiques)
 * Dependent_and_independent_variables
 * Independence_(probability_theory)
 * Hoeffding%27s_inequality
 * Pareto_efficiency
 * Nash_equilibrium
 * Pareto_principle
 * Tensor
 * Tensor_product
 * Cross_product
 * Taxicab_geometry
 * Norm_(mathematics)
 * Lp_space
 * Determinant
 * Trace_(linear_algebra)
 * Eigenvalues_and_eigenvectors
 * Projection_(mathematics)
 * Curvature
 * Convolution
 * Hadamard_product_(matrices)
 * Kernel_(statistics)
 * Radial_basis_function
 * Logit
 * Latent_variable
 * Inference
 * Statistical_inference
 * Inductive_reasoning
 * Deduction_and_induction
 * Transduction_(machine_learning)
 * Stochastic
 * Stochastic_process
 * Probability_theory
 * Posterior_probability
 * Statistic
 * Statistics
 * Gaussian_noise
 * Bayesian_inference
 * Bayes_rule
 * Bayes%27_theorem


 * https://www.analyticsvidhya.com/blog/2017/03/conditional-probability-bayes-theorem/
 * Bayesian_network
 * Naive_Bayes_spam_filtering
 * Naive_Bayes_classifier
 * Belief_propagation
 * Loss_function
 * Regularization_(mathematics)
 * Normalization_(statistics)
 * Quantile_normalization
 * Nystr%C3%B6m_method (+PCA)
 * Preference_(economics)
 * Delaunay_triangulation
 * Neighbourhood_(mathematics)


 * Genetic Algorithms
 * Mutation_(genetic_algorithm)
 * Crossover_(genetic_algorithm)
 * Selection_(genetic_algorithm)
 * Fitness_function
 * Utility


 * SVM
 * Kernel_method
 * Kernel_(image_processing)
 * Kernel_(statistics)


 * Neural Networks
 * Rectifier_(neural_networks)
 * Backpropagation
 * Gradient
 * Gradient_descent
 * Stochastic_gradient_descent
 * Gradient_boosting


 * http://www.wildml.com/deep-learning-glossary/#gradient-clipping
 * http://www.wildml.com/deep-learning-glossary/#batch-normalization
 * http://www.wildml.com/deep-learning-glossary/#backpropagation
 * http://www.wildml.com/deep-learning-glossary/#momentym
 * http://www.wildml.com/deep-learning-glossary/#sgd
 * https://visualstudiomagazine.com/articles/2015/07/01/variation-on-back-propagation.aspx
 * Softmax_function


 * Softmax is a "discriminant learning metric": examples of every class other than i also help to learn class i, because the outputs are forced to sum to 1, which couples the scores of all classes.
 * Sigmoid_function
 * Hyperbolic_function
 * Dropout_(neural_networks)
 * Radial_basis_function
 * Hebbian_theory
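
The softmax remark in this list can be illustrated with a minimal sketch in plain Python. The scores below are hypothetical; the point is only that, since the outputs are normalized to sum to 1, raising the score of one class necessarily lowers the probabilities of all the others.

```python
import math

def softmax(logits):
    """Return softmax probabilities for a list of real-valued scores."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three classes.
before = softmax([1.0, 2.0, 0.5])
# Raise only the score of class 1; the scores of classes 0 and 2 are untouched...
after = softmax([1.0, 3.0, 0.5])
# ...yet their probabilities drop, because the outputs must still sum to 1.
print(before)
print(after)
```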


 * Signal processing
 * Signal_processing
 * Low-pass_filter
 * High-pass_filter
 * Energy_(signal_processing)
 * Fast_Fourier_transform
 * Wavelet
 * Discrete_wavelet_transform
 * Coherence_(signal_processing)
 * Kalman_filter


 * Time Series
 * Time_series
 * Decomposition_of_time_series
 * Seasonal_adjustment
 * Seasonality
 * Frequency_domain
 * Time_domain
 * Spectral_density


 * https://www.analyticsvidhya.com/blog/2016/02/time-series-forecasting-codes-python/
 * https://www.analyticsvidhya.com/blog/2015/12/complete-tutorial-time-series-modeling/
 * Games
 * Game_theory
 * A*_search_algorithm
 * Minimax
 * Multi-armed_bandit
 * Zero-sum_game


 * Distances
 * Distance
 * Euclidean_distance [Minkowski distance with p = 2]
 * Edit_distance
 * Hamming_distance
 * Manhattan_distance [Minkowski distance with p = 1]
 * Levenshtein_distance
 * Needleman–Wunsch_algorithm
 * Minkowski_distance [order-p generalization of the Manhattan and Euclidean distances]
 * Mahalanobis_distance
 * Canberra_distance
 * Distance_correlation
 * Angular_distance
 * String_metric
 * Jaro%E2%80%93Winkler_distance
 * Jaccard_index
 * Kendall_tau_distance
 * Chebyshev_distance
 * Tf%E2%80%93idf
 * Neural_coding
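
As a rough illustration of how several of the distances listed above relate, here is a minimal sketch in plain Python (the function name and sample points are made up for this example). The Minkowski distance of order p gives the Manhattan distance for p = 1, the Euclidean distance for p = 2, and tends to the Chebyshev distance as p grows to infinity.

```python
def minkowski(x, y, p):
    """Minkowski distance of order p between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if p == float("inf"):
        # Limit case: the largest coordinate-wise difference (Chebyshev).
        return max(abs(a - b) for a, b in zip(x, y))
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

a, b = (0.0, 0.0), (3.0, 4.0)
manhattan = minkowski(a, b, 1)             # |3| + |4|
euclidean = minkowski(a, b, 2)             # sqrt(3^2 + 4^2)
chebyshev = minkowski(a, b, float("inf"))  # max(|3|, |4|)
print(manhattan, euclidean, chebyshev)
```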


 * For graphs: http://blog.smola.org/post/33412570425
 * https://fr.wikipedia.org/wiki/Algorithme_de_Needleman-Wunsch
 * Clouds
 * Hausdorff_distance [between clouds of points, a point and a cloud]
 * Distance


 * Distributions
 * https://blog.cloudera.com/blog/2015/12/common-probability-distributions-the-data-scientists-crib-sheet/
 * Discrete_uniform_distribution
 * Normal_distribution
 * Bernoulli_distribution
 * Binomial_distribution
 * Poisson_distribution
 * Chi-squared_distribution
 * Log-normal_distribution
 * Pareto_distribution
 * Gibbs_distribution
 * Weibull_distribution
 * Gamma_distribution
 * Beta_distribution
 * Hypergeometric_distribution
 * Dirac_delta_function


 * https://ercim-news.ercim.eu/en107/special/robust-and-adaptive-methods-for-sequential-decision-making [Characterization of the simplicity of a distribution: BernsteinExponent+TsybakovMarginCondition]


 * Evaluation
 * Performance_indicator
 * Mean_absolute_percentage_error
 * Mean_absolute_scaled_error
 * Symmetric_mean_absolute_percentage_error
 * Regression-kriging


 * https://www.kaggle.com/wiki/RootMeanSquaredLogarithmicError
 * http://weka.sourceforge.net/packageMetaData/percentageErrorMetrics/index.html
 * http://weka.sourceforge.net/packageMetaData/logarithmicErrorMetrics/index.html
 * Information_gain_ratio
 * Kullback%E2%80%93Leibler_divergence
 * Gini_coefficient
 * Pearson_correlation_coefficient
 * Entropy

 * http://www.cbcb.umd.edu/~salzberg/docs/murthy_thesis/node15.html
 * Akaike_information_criterion https://twitter.com/DataSciFact/status/963129411250933760
 * Bayesian_information_criterion
 * Brier_score (mean squared error of probability forecasts; an MSE, not an RMSE)
 * Structural_similarity
 * Type_I_and_type_II_errors
 * False_positive_rate
 * False_coverage_rate
 * False_discovery_rate
 * Confusion_matrix
 * Accuracy_and_precision
 * Precision_and_recall
 * F1_score
 * Sensitivity_and_specificity
 * Receiver_operating_characteristic
 * Discounted_cumulative_gain
 * Cross-validation_(statistics)
 * Errors_and_residuals
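
For the Brier score entry in this list, a minimal sketch in plain Python may help. The forecasts and outcomes are invented for illustration. The score is the mean of the squared differences between forecast probabilities and the 0/1 outcomes; no square root is taken, so it corresponds to a mean squared error.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# Hypothetical forecasts for three binary events and what actually happened.
probs = [0.9, 0.2, 0.6]
outcomes = [1, 0, 1]
score = brier_score(probs, outcomes)  # (0.01 + 0.04 + 0.16) / 3
print(score)
```

Lower is better: a perfect forecaster scores 0, and always predicting 0.5 scores 0.25.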


 * If the residual is consistently > 0 or < 0 over a range of the training set, the model has failed to capture some structure in the data, or the wrong type of model is being used (e.g. linear regression on parabolic data; DataSkeptic/Heteroskedasticity)
 * Heteroscedasticity
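
The residual remark above can be sketched as follows, assuming a hand-rolled least-squares line fit in plain Python and synthetic, noise-free parabolic data (both are choices made for this example). A straight line fitted to y = x^2 leaves residuals that are negative throughout the middle of the range and positive at both ends, which is the kind of systematic pattern the note warns about.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic parabolic data on [-5, 5].
xs = [k / 2 for k in range(-10, 11)]
ys = [x * x for x in xs]

slope, intercept = fit_line(xs, ys)
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Print the sign of each residual: one long run of '-' framed by '+' runs.
print("".join("+" if r > 0 else "-" for r in residuals))
```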


 * Clustering
 * Dunn_index
 * Rand_index
 * Jaccard_index


 * See also the Calinski-Harabasz Index: http://stats.stackexchange.com/questions/97429/intuition-behind-the-calinski-harabasz-index
 * Silhouette_(clustering)


 * Others
 * Item_response_theory
 * BLEU


 * http://www.sthda.com/english/articles/29-cluster-validation-essentials/96-determining-the-optimal-number-of-clusters-3-must-know-methods/#elbow-method


 * Working with Text
 * Part_of_speech
 * Semantic_similarity
 * Tf%E2%80%93idf
 * Cosine_similarity
 * Okapi_BM25


 * See also Mr Gomez page on Weka: http://www.esp.uem.es/jmgomez/tmweka/
 * Named-entity_recognition
 * Conditional_random_field
 * Latent_Dirichlet_allocation
 * Sentiment_analysis
 * Web_mining
 * Web_crawler
 * Text_mining
 * Document_classification
 * Automatic_summarization


 * Working with Images
 * http://mirror.imagej.net/plugins/mexican-hat/index.html
 * If your model seeks to penalize near misses, the Mexican hat function is a good choice.
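
A minimal sketch of the Mexican hat (Ricker) function in plain Python, using the usual unit-energy normalization; the parameter name `sigma` is chosen for this example. One reading of the remark above is that the negative lobes on either side of the central peak assign negative weight to inputs that are close to, but not on, the target.

```python
import math

def mexican_hat(t, sigma=1.0):
    """Ricker ("Mexican hat") wavelet evaluated at t with width sigma."""
    norm = 2.0 / (math.sqrt(3.0 * sigma) * math.pi ** 0.25)
    u = (t / sigma) ** 2
    return norm * (1.0 - u) * math.exp(-u / 2.0)

# Positive peak at the center, zero crossing at |t| = sigma,
# negative lobe just beyond it, and decay to zero far away.
for t in (0.0, 0.5, 1.0, 2.0, 5.0):
    print(t, mexican_hat(t))
```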

 * Working with concepts (Ontologies)
 * https://en.wikipedia.org/wiki/YAGO_%28database%29
 * http://wiki.dbpedia.org/
 * http://conceptnet.io/
 * http://cogcomp.org/Data/QA/QC/definition.html


 * Visualization
 * Data_visualization
 * Exploratory_data_analysis
 * List_of_graphical_methods
 * Category:Statistical_charts_and_diagrams
 * Statistical_graphics
 * Visual_perception
 * Heat_map
 * Misleading_graph
 * Pareto_chart


 * Need to develop "critical thinking":
 * https://www.nytimes.com/column/whats-going-on-in-this-graph
 * https://www.nytimes.com/column/learning-whats-going-on-in-this-picture


 * (Statistical) tests
 * A/B_testing


 * Evaluating a hypothesis
 * Statistical_power
 * Statistical_hypothesis_testing
 * P-value
 * Student%27s_t-test
 * Chi-squared_test
 * Type_I_and_type_II_errors


 * Detecting abrupt changes in time series
 * Stationary_process
 * Structural_break
 * Chow_test
 * Kruskal%E2%80%93Wallis_one-way_analysis_of_variance
 * F-test
 * F-statistics
 * Pairwise_summation
 * CUSUM


 * MOSUM: https://cran.r-project.org/web/packages/strucchange/vignettes/strucchange-intro.pdf
 * Time series / Chaos
 * Lyapunov_exponent
 * Kolmogorov_complexity


 * Machine Learning Techniques
 * Statistical_classification
 * One-class_classification
 * Binary_classification
 * Multiclass_classification
 * Multi-label_classification
 * Structured_prediction
 * Cluster_analysis
 * Elbow_method_(clustering)
 * Nearest_neighbor_search
 * Regression_analysis
 * Linear_regression
 * Logistic_regression
 * Ridge_regression
 * Kriging
 * Multivariate_adaptive_regression_splines
 * Association_rule_learning
 * Apriori_algorithm
 * Survival_analysis
 * Monte_Carlo_method
 * Monte_Carlo_algorithm
 * Multinomial_logistic_regression
 * Lasso_(statistics)
 * Expectation%E2%80%93maximization_algorithm
 * Markov_chain_Monte_Carlo
 * Hidden_Markov_Models
 * Viterbi_algorithm
 * Convolutional_code
 * Forward–backward_algorithm
 * Markov_random_field
 * Mean_field_theory
 * Mean_field_particle_methods
 * CART
 * Decision_tree_learning
 * Decision_tree
 * Pruning_(decision_trees)
 * ID3_algorithm
 * C4.5_algorithm
 * Random_forest
 * Support_vector_machine
 * Conditional_random_field
 * Latent_semantic_analysis
 * Genetic_algorithm
 * Evolutionary_algorithm
 * Evolutionary_computation
 * Voronoi_diagram
 * Local_outlier_factor
 * Ordered_weighted_averaging_aggregation_operator


 * Neural Networks
 * History: http://www.chronicle.com/article/The-Believers/190147/
 * The various types of NN as a picture: http://www.asimovinstitute.org/wp-content/uploads/2016/09/neuralnetworks.png
 * Types_of_artificial_neural_networks
 * Comparison_of_deep_learning_software/Resources
 * Artificial_neural_network
 * Perceptron
 * Feedforward_neural_network
 * Multilayer_perceptron
 * Radial_basis_function_network
 * Long_short-term_memory
 * SNNS
 * Time_delay_neural_network
 * Recursive_neural_network
 * Recurrent_neural_network
 * Hopfield_network
 * Content-addressable_memory
 * Boltzmann_machine
 * Self-organizing_map
 * Learning_vector_quantization
 * Liquid_state_machine
 * Autoassociative_memory
 * Convolutional_neural_network
 * Autoencoder
 * Neuroevolution
 * Neuroevolution_of_augmenting_topologies
 * Deep_learning
 * Deep_belief_network
 * Generative_adversarial_networks


 * https://stackoverflow.com/questions/4752626/epoch-vs-iteration-when-training-neural-networks
 * Neural_Turing_machine


 * http://spinningbytes.com/demos/
 * Early_stopping
 * ADALINE
 * Memristor
 * Instantaneously_trained_neural_networks
 * Spiking_neural_network


 * Signal Processing
 * Optical_character_recognition


 * Fuzzy Logic
 * Fuzzy_logic
 * Inference_engine
 * Type-2_fuzzy_sets_and_systems
 * T-norm_fuzzy_logics
 * Adaptive_neuro_fuzzy_inference_system
 * Fuzzy_control_system


 * Working with spatial data
 * Spatial_association


 * Ensemble Techniques
 * Weak learner: https://stats.stackexchange.com/questions/82049/what-is-meant-by-weak-learner#82063
 * Ensemble_learning
 * Ensembles_of_classifiers


 * Ensemble Learning = Boosting, Bagging or Stacking: http://stats.stackexchange.com/questions/18891/bagging-boosting-and-stacking-in-machine-learning#19053
 * Applying Bagging should help reduce variance and overfitting.
 * Bootstrap_aggregating
 * Boosting_(machine_learning)
 * Gradient_boosting
 * Committee_machine
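
The bagging idea mentioned above can be sketched in a few lines of plain Python. Everything here is an assumption made for illustration: the base learner is a 1-nearest-neighbour regressor on one-dimensional inputs, the data are synthetic, and the function names are hypothetical. Each base learner is fitted on a bootstrap sample (drawn with replacement) and the predictions are averaged, which smooths out the noise that a single high-variance learner would reproduce.

```python
import random

def nearest_neighbor_predict(train, x):
    """1-NN regression: return the target of the closest training point."""
    return min(train, key=lambda pt: abs(pt[0] - x))[1]

def bagged_predict(train, x, n_estimators=25, seed=0):
    """Average the 1-NN predictions obtained from bootstrap samples of train."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_estimators):
        # Bootstrap sample: same size as train, drawn with replacement.
        sample = [rng.choice(train) for _ in train]
        preds.append(nearest_neighbor_predict(sample, x))
    return sum(preds) / len(preds)

# Synthetic noisy samples of y = x^2 on [0, 2].
data_rng = random.Random(42)
train = [(k / 10, (k / 10) ** 2 + data_rng.gauss(0, 0.1)) for k in range(21)]

single = nearest_neighbor_predict(train, 1.05)
bagged = bagged_predict(train, 1.05)
print(single, bagged)
```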


 * Applications
 * Bayesian_spam_filtering
 * Root_cause_analysis
 * Inpainting


 * https://github.com/phillipi/pix2pix
 * https://www.youtube.com/user/keeroyz
 * Chatbots
 * Personality
 * https://en.wikipedia.org/wiki/Big_Five_personality_traits


 * Experimentation framework
 * Goal: test various parameters on various algorithms to determine the best model(s)
 * Weka's "Experimenter" mode: http://weka.sourceforge.net/manuals/ExplorerGuide.pdf
 * AutoWeka: http://www.cs.ubc.ca/labs/beta/Projects/autoweka/
 * R::mlrMBO: https://github.com/mlr-org/mlrMBO
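
The goal stated above (trying several parameter values and keeping the best model) can be sketched as a small grid search with k-fold cross-validation. This is plain Python under stated assumptions: the model is a k-nearest-neighbour regressor on one-dimensional inputs, the data are synthetic, and all function names are hypothetical. The tools listed above automate the same loop over many algorithms.

```python
import random

def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda pt: abs(pt[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def cross_val_mse(data, k, n_folds=5):
    """Mean squared error of k-NN estimated by n_folds-fold cross-validation."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    errors = []
    for i, held_out in enumerate(folds):
        train = [pt for j, fold in enumerate(folds) if j != i for pt in fold]
        errors.extend((knn_predict(train, x, k) - y) ** 2 for x, y in held_out)
    return sum(errors) / len(errors)

def grid_search(data, candidates):
    """Score every candidate value of k and return the best one with all scores."""
    scores = {k: cross_val_mse(data, k) for k in candidates}
    best = min(scores, key=scores.get)
    return best, scores

# Synthetic noisy samples of y = x^2, shuffled so the folds are mixed.
rng = random.Random(0)
data = [(i / 10, (i / 10) ** 2 + rng.gauss(0, 0.2)) for i in range(40)]
rng.shuffle(data)

best_k, scores = grid_search(data, [1, 3, 5, 9])
print(best_k, scores)
```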


 * Coding / Exposing API to the rest of the application
 * Microservices


 * BigData
 * Data_lake
 * Streaming_algorithm
 * Star_schema
 * OLAP_cube
 * Solid-state_drive
 * MongoDB


 * Map-Reduce framework
 * Apache_Hadoop https://hadoop.apache.org/


 * Scraping
 * Apache_Flume http://flume.apache.org/


 * Storage
 * Apache_Hadoop https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
 * Apache_HBase http://hbase.apache.org/
 * Apache_Hive https://hive.apache.org/


 * Transfers - to/from RelationalDB
 * Sqoop http://sqoop.apache.org/


 * Transfers - serialization/streaming
 * Apache_Avro http://avro.apache.org/
 * Apache_Kafka https://kafka.apache.org/


 * Storage - In memory
 * Apache_Spark https://spark.apache.org/
 * Apache_Flink http://flink.apache.org/


 * Admin
 * Apache_ZooKeeper http://zookeeper.apache.org/
 * Apache_Cassandra https://cassandra.apache.org
 * Ambari http://ambari.apache.org/
 * Apache_Oozie http://oozie.apache.org/


 * Programming
 * Pig_(programming_tool) https://pig.apache.org/


 * ML
 * Apache_Mahout http://mahout.apache.org/
 * Apache_SystemML http://systemml.apache.org/


 * Working with text
 * Apache_Lucene
 * Elasticsearch https://www.elastic.co/


 * Working with text - Data Viz
 * Kibana https://www.elastic.co/products/kibana


 * Small/Micro Data
 * https://arxiv.org/abs/1610.00946
 * Small_data


 * Multi-Agent Systems
 * Agent-based_model
 * Multi-agent_system
 * Agent-oriented_software_engineering

 * http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.47.7968&rep=rep1&type=pdf [YDemazeau: Vowels Methodology]
 * https://www.researchgate.net/publication/266182243_Agent_Groupe_Role_et_Service_Un_modele_organisationnel_pour_les_systemes_multi-agents_ouverts [JFerber: AGR Methodology]
 * Ant_colony_optimization_algorithms


 * Quantum Machine Learning
 * Quantum_machine_learning
 * Quantum_tunnelling
 * Quantum_annealing
 * Adiabatic_quantum_computation


 * Resources
 * http://www.wildml.com/deep-learning-glossary/
 * http://deeplearning.net
 * https://www.datacamp.com
 * http://www.learnpython.org
 * https://www.codecademy.com/learn/python
 * http://www.dataschool.io/how-to-get-better-at-data-science/
 * http://simplystatistics.org/2015/03/17/data-science-done-well-looks-easy-and-that-is-a-big-problem-for-data-scientists/
 * Social network for DataScientists
 * https://data.world/

 * Books
 * https://github.com/janishar/mit-deep-learning-book-pdf
 * https://web.stanford.edu/~hastie/ElemStatLearn/printings/ESLII_print10.pdf
 * http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Seventh%20Printing.pdf
 * http://infolab.stanford.edu/~ullman/mmds/booka.pdf
 * http://www.guidetodatamining.com/assets/guideChapters/Guide2DataMining.pdf
 * https://github.com/ajaymache/machine-learning-yearning
 * Free Books
 * http://probmods.org/
 * http://www.thebiganalytics.com/
 * https://www.deeplearningbook.org/
 * http://neuralnetworksanddeeplearning.com/
 * http://deeplearning.net/tutorial/deeplearning.pdf
 * https://cours.etsmtl.ca/sys843/REFS/Books/ebook_Haykin09.pdf
 * http://hagan.ecen.ceat.okstate.edu/nnd.html
 * http://www.dkriesel.com/en/science/neural_networks
 * https://torres.ai/research-teaching/tensorflow/first-contact-with-tensorflow-book/
 * https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/DeepLearning-NowPublishing-Vol7-SIG-039.pdf
 * http://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/
 * http://www.greenteapress.com/thinkstats/thinkstats.pdf
 * http://www.greenteapress.com/thinkbayes/thinkbayes.pdf
 * http://www.greenteapress.com/thinkpython/thinkpython.pdf
 * http://r4ds.had.co.nz/
 * https://web.stanford.edu/~hastie/Papers/ESLII.pdf
 * http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Sixth%20Printing.pdf
 * https://web.stanford.edu/~hastie/CASI_files/PDF/casi.pdf
 * http://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning/
 * https://www.cs.cornell.edu/jeh/book.pdf
 * http://infolab.stanford.edu/~ullman/mmds/book.pdf
 * http://www.guidetodatamining.com/
 * http://www.mlyearning.org/
 * Paid Books
 * "Artificial Intelligence for Humans, Volume 1: Fundamental Algorithms", Jeff Heaton, 2013, ISBN:9781493682225
 * "Artificial Intelligence for Humans, Volume 2: Nature-Inspired Algorithms", Jeff Heaton, 2014, ISBN: 978-1499720570
 * "Artificial Intelligence for Humans, Volume 3: Deep Learning and Neural Networks", Jeff Heaton, 2015, ISBN: 978-1505714340
 * "Introduction to Machine Learning (Adaptive Computation and Machine Learning)", E. Alpaydin, MIT Press, 2004, ISBN: 978-0262012430
 * "Machine Learning: An Artificial Intelligence Approach", R.S. Michalski, J.G. Carbonell, T.M. Mitchell, Symbolic Computation, 1983, ISBN:978-3540132981
 * "A collection of Data Science Interview Questions Solved in Python and Spark vol I & II", Antonio Gulli, CreateSpace, 2015, ISBN:978-1517216719
 * "Artificial Intelligence a Modern Approach", Stuart Russell and Peter Norvig, Prentice Hall, 1995, ISBN:978-0131038059
 * "An Introduction to MultiAgent Systems", Michael Wooldridge, John Wiley & Sons, 2009 (2nd ed), ISBN:978-0470519462
 * "Data Mining: Practical Machine Learning Tools and Techniques", Ian H. Witten, Eibe Frank, Mark A. Hall, Christopher J. Pal, Morgan Kaufmann, ISBN:978-0128042915
 * "Agent Intelligence Through Data Mining", Andreas L. Symeonidis, Pericles A. Mitkas, Springer/Apress, ISBN:978-0387257570
 * "Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence", Gerhard Weiss, 2000, ISBN:978-0262232036
 * "Data science at the command line", Janssens, O'Reilly.
 * Also look for MachineLearning, DeepLearning, Spark, Mahout, R, Python, SciKit-Learn, Data/Text Mining, ElasticSearch, Natural Language, Statistics @ O'Reilly, Packt, Manning/In Action, HeadFirst
 * Lists of good books
 * https://www.kdnuggets.com/2015/09/free-data-science-books.html
 * https://www.kdnuggets.com/2017/04/10-free-must-read-books-machine-learning-data-science.html
 * https://www.learndatasci.com/free-data-science-books/
 * http://www.wzchen.com/data-science-books


 * News/Blogs/RSS
 * https://blog.acolyer.org/
 * https://www.reddit.com/r/machinelearning
 * https://www.reddit.com/r/statistics
 * https://www.reddit.com/r/datascience
 * https://www.reddit.com/r/bigdata
 * http://www.kdnuggets.com/
 * http://www.becomingadatascientist.com/
 * https://rdatamining.wordpress.com/
 * http://www.r-bloggers.com/
 * https://dataaspirant.com/
 * http://www.joyofdata.de/blog/
 * https://www.dataiku.com/blog/
 * https://www.datacamp.com/community/
 * http://beautifuldata.net/
 * http://www.datatau.com/news
 * http://dataelixir.com/
 * http://www.oreilly.com/data/newsletter.html
 * http://blog.kaggle.com/
 * http://blog.yhathq.com/
 * http://simplystatistics.org/
 * http://fastml.com/
 * http://www.win-vector.com/blog/
 * http://fivethirtyeight.com/
 * http://www.dataschool.io/
 * https://research.facebook.com/blog/datascience/
 * http://deeplearning.net/feed/
 * http://learningwithdata.com/
 * http://blog.plot.ly/
 * https://datasciencelab.wordpress.com/
 * https://shapeofdata.wordpress.com/
 * http://datalab.lu/
 * http://www.pythonweekly.com/
 * http://pbpython.com/
 * https://plus.google.com/communities/105141578068503684401 ( https://plus.google.com/+JaanaNystr%C3%B6m/posts/MKCV3vNsn1g )
 * http://blog.revolutionanalytics.com/2012/12/the-most-influential-data-scientists-on-twitter.html
 * http://www.kdnuggets.com/2012/12/most-influential-data-scientists-on-twitter.html
 * https://journal.r-project.org/


 * Podcasts
 * http://www.learningmachines101.com/
 * http://www.thetalkingmachines.com/
 * http://dataskeptic.com/
 * http://www.partiallyderivative.com/
 * http://www.ocdqblog.com/podcast/
 * http://blog.pivotal.io/podcasts-pivotal
 * https://www.udacity.com/podcasts/linear-digressions
 * http://datastori.es/
 * http://radar.oreilly.com/tag/oreilly-data-show-podcast
 * http://freakonomics.com/radio/freakonomics-radio-podcast-archive/
 * http://simplystatistics.org/category/podcast/
 * http://data-informed.com/multimedia/podcasts/
 * http://www.bbc.co.uk/programmes/p02nrss1


 * YT Channels
 * https://www.youtube.com/user/keeroyz
 * https://www.youtube.com/channel/UCWN3xxRkmTPmbKwht9FuE5A
 * https://www.youtube.com/channel/UCioEIe1o73G-oGR4b34E7Dg
 * https://www.youtube.com/channel/UCNIkB2IeJ-6AmZv7bQ1oBYg
 * https://www.youtube.com/channel/UC9LfrPNcIyHspci0t2W4T_w
 * https://www.youtube.com/channel/UCHBWJGoZMkhJyElgvuN1U1w
 * https://www.youtube.com/user/dataschool
 * https://www.youtube.com/channel/UCtY8JjMQpzYb5FFvUr2JnUw
 * https://www.youtube.com/channel/UCRhUp6SYaJ7zme4Bjwt28DQ
 * https://www.youtube.com/user/sentdex
 * https://www.youtube.com/user/DataScienceDojo


 * MOOCs
 * Generic
 * http://datasciencemasters.org/
 * https://arxiv.org/abs/1601.06862v1
 * Weka
 * http://www.cs.waikato.ac.nz/ml/weka/mooc/dataminingwithweka/
 * http://www.cs.waikato.ac.nz/ml/weka/mooc/moredataminingwithweka/
 * http://www.cs.waikato.ac.nz/ml/weka/mooc/advanceddataminingwithweka/
 * Andrew Ng
 * https://www.youtube.com/watch?v=UzxYlbK2c7E&list=PLJ_CMbwA6bT-n1W0mgOlYwccZ-j6gBXqE
 * Yann Lecun
 * https://www.college-de-france.fr/site/yann-lecun/course-2015-2016.htm
 * Hans Rosling (visualization)
 * https://www.youtube.com/results?search_query=ans+rosling
 * From renowned universities
 * https://www.coursera.org/specializations/jhu-data-science
 * https://www.coursera.org/specializations/machine-learning
 * https://www.coursera.org/specializations/data-science-python
 * https://www.coursera.org/specializations/big-data
 * https://www.coursera.org/learn/machine-learning
 * https://www.coursera.org/learn/r-programming
 * https://www.coursera.org/learn/data-scientists-tools
 * https://www.coursera.org/learn/python-data-analysis
 * http://www.holehouse.org/mlclass/
 * http://online.stanford.edu/course/statistical-learning
 * http://work.caltech.edu/telecourse.html
 * https://www.udacity.com/course/data-analyst-nanodegree--nd002
 * https://www.thinkful.com/courses/learn-data-science-online/
 * https://www.edx.org/course/introduction-computer-science-mitx-6-00-1x7
 * https://www.coursetalk.com/
 * https://github.com/justmarkham/DAT7#bonus-resources
 * http://www.wolfram.com/broadcast/c?c=99
 * http://www.wolfram.com/broadcast/c?c=97
 * http://www.wolfram.com/broadcast/c?c=397
 * DataSchool
 * http://www.dataschool.io/learn/


 * Jobs
 * https://datajobs.com/
 * http://www.analytictalent.com/
 * http://www.kdnuggets.com/jobs/index.html
 * https://fr.hired.com/

 * Teaching
 * http://edison-project.eu/edison/edison-data-science-framework-edsf

 * Curated list of similar pages
 * https://github.com/search?utf8=%E2%9C%93&q=curated+list+awesome+frameworks&type=
 * https://github.com/josephmisiti/awesome-machine-learning
 * https://github.com/onurakpolat/awesome-bigdata
 * https://github.com/onurakpolat/awesome-analytics
 * https://github.com/analyticalmonk/awesome-neuroscience
 * https://github.com/igorbarinov/awesome-data-engineering
 * https://github.com/quantmind/awesome-data-science-viz
 * https://github.com/fasouto/awesome-dataviz
 * https://github.com/qinwf/awesome-R
 * https://github.com/datascience-python/awesome-datascience-python
 * https://github.com/caesar0301/awesome-public-datasets