Talk:Data preprocessing

Context
Is this just about data pre-processing in a Machine Learning context (apparently the subject of the first reference) or should it also cover data pre-processing in a data mining context (apparently the subject of the second reference)? If it is just about the former, then I have some concerns about notability and self-promotion - given that the article was apparently created by one of the authors of the first reference. --RichardVeryard 03:02, 3 September 2007 (UTC)


 * Machine learning and data mining coincide to a very great degree, so the article is about data set preprocessing in both contexts. —Preceding unsigned comment added by 194.219.216.110 (talk) 22:54, 7 February 2009 (UTC)


 * A very questionable article. The reference to 'Data Preprocessing for Supervised Leaning' is even more questionable. The referenced article has a spelling mistake in the title and several spelling and grammar mistakes in the abstract, and can by no means be considered an overview or a landmark paper in the area. I think that the whole article should be deleted.

128.189.119.222 (talk) 01:26, 5 June 2010 (UTC)


 * I've only just spotted the anonymous reply to my original question. Machine learning and data mining are usually regarded as separate disciplines, and I note that an introduction about data mining has been added since I last looked at the article. However, the article still needs a lot of work. RichardVeryard (talk) 15:11, 8 June 2010 (UTC)

What is data preprocessing?
The article fails to explain one thing: what data preprocessing actually is. The term should be defined in the very first sentence of the article. —Kri (talk) 17:38, 15 February 2017 (UTC)

Poor writing
The 'Data mining' and 'Semantic data preprocessing' sections are plagued with poor prose and grammatical errors: "The reason why a user transforms existing files into a new one is because of many reasons", "Here is the idea [...]", "[...] it make sense [...]", "[...] quantifiers like true positives ,true negatives,False positives and false negatives [...]" (badly placed commas), "Later it was recognized, that for machine learning [...]"...

SpanishDuke (talk) 23:44, 10 December 2020 (UTC)

Multiple Changes
Hi all, I've just made multiple changes to the Data pre-processing article (now Data Preprocessing; see the Move talk page) in an attempt to improve it. The changes have been made as separate edits to make it easier to see the reasoning behind each one. I'm happy to discuss any issues, but please avoid reverting all changes if possible. The changes include grammar and capitalisation fixes, the removal of duplicate or unnecessary information, and some reformatting. I've also added some tags for issues that should be addressed in the future. EditorOnOccasion (talk) 11:43, 7 August 2023 (UTC)