Talk:Universal approximation theorem

Deep variants
The two deep variants appear to be somewhat dubious. The citation is just a conference proceeding, and it omits the proofs. If an expert knows of any more reliable sources, that would be ideal. — Preceding unsigned comment added by Pabnau (talk • contribs) 01:47, 21 April 2019 (UTC)

Note on above comment: There is no issue with proceedings. They are peer-reviewed and carry heavy weight in the field, at times more than journals.

In the last paragraph of the introduction, the result of n+1 width on continuous convex functions is stated as an "improvement" over the result of n+4 width on Lebesgue-integrable functions. Without additional knowledge of those papers, it's not clear to me why this should be considered an "improvement", since continuous convex functions are a much more restrictive class, and the network width difference is not asymptotically significant. - anonymous commenter

Hanin and Sellke's work is an improvement over Lu et al.'s for three reasons: their result applies to general continuous functions, not just convex ones; they work in the compact-convergence topology rather than the L1 topology; and their networks are narrower. All of these results still apply only to the ReLU activation function. There has also been another recent paper for general activation functions, but I'm an author on that, so I don't know if it's acceptable for me to go around editing Wikipedia pages to mention it... (see also my discussion in 'Out of date', below.) 82.14.199.121 (talk) 13:40, 7 October 2019 (UTC)

''The deep section of the page is a mess. I have cleaned up the shallow section to make it more legible, but maybe someone can pick up the slack on the second part... it really is badly written.''

Vague wording
The line "All Lebesgue integrable functions except for a zero measure set cannot be approximated by width-n ReLU networks" is confusing. What did the Wikipedia editor mean to say? It's clearly not true that "All Lebesgue integrable functions cannot be approximated by width-n ReLU networks", because if you take a given width-n ReLU network, it defines a Lebesgue integrable function.

Maybe the original editor meant to say "Not all Lebesgue integrable functions can be approximated by width-n ReLU networks (approximation up to a set of zero measure)", or "There exists a Lebesgue integrable function that cannot be approximated by any width-n ReLU network (even up to a set of zero measure)." Lavaka (talk) 22:14, 16 August 2019 (UTC)

Out of date
This page is about twenty years out of date. A simpler, more general form of the universal approximation theorem has been known since 1999 (http://www2.math.technion.ac.il/~pinkus/papers/acta.pdf). As mentioned in the discussion above, there's also some confusion about the deep variants of this theorem. I am happy to bring this page up to date, as long as that doesn't break any Wikipedia rules about discussing one's own work. (I have a paper on this topic myself; see also my comment in 'Deep variants', above.) 82.14.199.121 (talk) 13:40, 7 October 2019 (UTC)


 * I am not a seasoned Wikipedian, so take my upvote cautiously. But I am 100% in favor of this, and I hope it is not a stale offer by now. Wikipedia allows paid editing of pages as long as the user discloses that they are paid, and usually there is a clear conflict of interest there. People just check the edits to ensure they are not unfair. The Plain_and_simple_conflict_of_interest_guide recommends that people in your situation make suggestions on the talk page, if I am reading it correctly. But I think it would also allow you to make changes as long as they are reviewed and your specific COI is declared. I suggest copying the page to your sandbox and working on it there, then asking someone to review it and give either suggestions or an OK for merging it in. This could be viewed as making a lot of really specific suggestions, which are the best kind. I would love to help with that if you are still interested. Themumblingprophet (talk) 12:54, 1 May 2020 (UTC)


 * The offer stands! (I'm the previously anonymous user from above.) I'll try and put something together, and then hopefully run it past you. My first time editing a Wikipedia article; very exciting. PatrickKidger (talk) 18:01, 10 June 2020 (UTC)
 * First of all, welcome to Wikipedia and thank you for your contribution! I certainly agree that the article needed an update as it lacked novel developments. On the other hand, I am not entirely happy with the changes, for several reasons:


 * In my opinion, you should avoid adding your own works to WP, as you may very likely be biased with respect to their importance. This is especially true if your work is not yet published (even if it is accepted to some conference). Even if you think that it is elegant and simplifies the results, you should let others decide. If it is really that good, then it will eventually be added to WP. WP is not for advertising your work.


 * For similar reasons, we prefer secondary sources. For example, if a scientific book or university textbook surveys your theorem, then it is an indicator of its importance. It looks quite strange that you gave the same weight to your new result and to the classical well-cited formulation of the theorem (contained in dozens of scientific books). Moreover, you have simply removed other results, which is also not elegant.


 * I am a bit disappointed that you have deleted the proof of the original version of the theorem. In my opinion, it was very instructive and would be nice to have. The aim of WP is not to present the newest results (because there are tons of such results and it is hard to decide their importance), but to show standard, widespread formulations for a general audience. For example, the article on Hoeffding's inequality starts with the Bernoulli case, even though there are much more general formulations of the result.


 * Therefore, I am planning to put back the original formulation of the theorem with its proof, and move your contribution to another section. Cheers,  K œrte F a  {ταλκ}  10:59, 30 June 2020 (UTC)
 * PS: You have also removed the comment that the proofs are usually not constructive, which is a very important point and should be mentioned.  K œrte F a  {ταλκ}  11:05, 30 June 2020 (UTC)


 * @ K œrte F a  -- I appreciate your concern.


 * I completely appreciate that one should avoid adding one's own work. This is why I waited until my paper had gone from an arXiv preprint (last year) to a peer-reviewed publication (this year) before making any changes involving it. The paper is not just accepted to some conference; it is both peer reviewed and published.


 * I certainly agree that in general it is instructive to present simple, widespread formulations for a general audience! This was the primary purpose of my edit: the previous versions all make additional assumptions, whereas current versions are much more straightforward. In particular, I do not think that having so many versions of the theorem proliferating in the article helps this matter. On the topic of sketch proofs, I agree that these can be instructive, and perhaps a sketch proof of Pinkus' version in particular is worth adding. I do not see the value in including the sketch proof of Cybenko's result, which is far harder to follow than Pinkus' elegant proof.


 * I removed the comment on non-constructivity because it is false.


 * For the above reasons I would like to revert your edits, but I'd prefer to have a discussion about it here first. PatrickKidger (talk) 08:51, 1 July 2020 (UTC)


 * Thank you for your reply. (i) The fact that your conference paper is now published does not change the fact that it is your own work and so you are biased w.r.t. its importance. Also, being peer-reviewed is only a necessary, but not a sufficient condition to be included. There are several versions of this theorem, so which ones should we present? The general approach of WP is that we present the versions covered by secondary sources (scientific books, university textbooks, survey papers, etc.), even if it means that there is a delay in presenting the latest results. Remember that WP is for a general audience and not only for dedicated researchers of a specific field. (ii) I find Cybenko's proof instructive, but you are welcome to add Pinkus' version, if you think that it is simpler. If it is indeed better for WP readers, then we could remove Cybenko's proof. (iii) The statements of the presented theorems only contain the word "exist" without providing a specific formula for the weights, hence, at least the current formulations of the results are not constructive. Regarding the proofs, some of them are based on variants of the Stone–Weierstrass theorem, which means that they are not constructive. I do not know the proof technique that you used for your theorem, but if it provides an explicit construction for the network (given a target function  $$f$$), then it would be good to highlight it. Cheers,  K œrte F a   {ταλκ}  12:45, 1 July 2020 (UTC)


 * @ K œrte F a  Thanks for your response. Ordered from least thorny to most thorny:


 * Several versions of the theorem indeed do not rely on Stone–Weierstrass. (And off the top of my head, certain versions of Stone–Weierstrass actually are constructive, but I'm not sure in how general a setting constructivity is known. A Google search suggests that it is known in quite general settings, but perhaps you know more than I do on this topic?)


 * I think the most important point is the removal of Cybenko and the highlighting of Pinkus. (Which I feel unnecessarily complicates the article.) I'll try and write up a brief sketchproof for Pinkus.


 * Yeah, I realise one's own work is a thorny topic. You can see from the discussion above that I wasn't sure if I was okay discussing it, but was told that I should give it a go. I did talk to Themumblingprophet to review and avoid potential COI but they didn't get back to me.


 * If we use only secondary sources then unfortunately I think the whole "dual" formulation has to be removed. So be it; I'll do that when I remove Cybenko + add a sketchproof for Pinkus. Best, PatrickKidger (talk) 17:47, 1 July 2020 (UTC)


 * Thanks again for your reply. (i) Regarding constructivity: you are right that whether the proofs are constructive is debatable (there are many versions of the theorem based on various proof techniques), and it might not even be important for an average reader. On the other hand, what I think is important is that it is an existence theorem; that is, the theorem itself does not provide a construction for the object it claims to exist (irrespective of whether the proofs are constructive in the sense of formal logic). So the statements of the theorems do not provide methods or algorithms to build networks with the claimed approximation properties (though the proofs might contain constructions). In my opinion, this is a crucial point. (ii) Regarding Pinkus' theorem: I agree with you that if it has a concise and instructive proof and we can present it, then we do not really need Cybenko's proof. We might even remove the classical version of the theorem (though its advantage is that it is the one presented by several books). (iii) About "arbitrary depth" type theorems: though we prefer secondary sources (for several reasons, e.g., they help to identify what is more important), the use of primary sources is not forbidden (furthermore, see the last pillar of WP:5p: "Wikipedia has policies and guidelines, but they are not carved in stone."). As there are not many secondary sources about the arbitrary depth case, we could use primary sources, as in my view these versions show important new developments. I think that both the L1 version and your theorem are nice, and I would keep both of them. What do you think? Cheers,  K œrte F a  {ταλκ}  14:30, 3 July 2020 (UTC)


 * @ K œrte F a  Okay, I think we're starting to agree! (i) Constructivity - fair enough; constructivity of proof vs construction in theorem are indeed different points. (ii) Arbitrary depth: okay, let's keep both of them. Although the L1 version actually has an improved Lp version available which I'll state instead. PatrickKidger (talk) 12:43, 6 July 2020 (UTC)


 * OK, sounds good.  K œrte F a  {ταλκ}  19:22, 6 July 2020 (UTC)

Too scientific? Not understandable?
No, I think the article is just fine. At least do not shorten it. Perhaps this text could be moved more toward the end of the article, and a more elementary introduction could be written at the beginning. 139.14.20.177 (talk) 11:54, 25 April 2024 (UTC)