Talk:Activation function

What does "$$f(x)\approx x$$ when $$x \approx 0$$" mean?
What is the purpose of the column "$$f(x)\approx x$$ when $$x \approx 0$$"? It's unclear what the expression means, as I wrote in the clarify template. What I don't understand is the following:


 * By substitution, is this equivalent to the expression $$f(x)\approx 0$$ when $$x \approx 0$$?
 * If not, does $$f'(x)\approx 1$$ when $$x \approx 0$$?
 * Does $$f(x)$$ have to pass through the origin?

These ambiguities apparently lead to some disagreement about what the truth value of that expression should actually be (yes or no?). I think we need to answer these questions; otherwise I don't know if it makes any sense to keep the column. —Kri (talk) 11:56, 5 May 2016 (UTC)


 * It's a vague approximation to $$f(x)\approx 0$$, but $$f(x)\approx x$$ is closer. For one thing, $$f(x)\approx 0$$ would imply $$f'(x)\approx 0$$ rather than $$f'(x)\approx 1$$.
 * We know little of the derivative because we're now stacking approximations on top of each other. In particular, it can become risky to assume much about the range of x over which this assumption holds usefully true. Although, yes, $$f'(x)\approx 1$$.
 * Yes
 * Andy Dingley (talk) 12:27, 5 May 2016 (UTC)


 * Okay, so if I understand you correctly, saying that $$f(x)\approx x$$ when $$x \approx 0$$ for an activation function $$f$$ is equivalent to saying that $$f'(x)\approx 1$$ when $$x \approx 0$$ and that $$f(x)$$ passes through the origin, or more formally that $$f(0) = 0$$ and $$f'(0) = 1$$; is that correct? —Kri (talk) 21:58, 7 May 2016 (UTC)
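As a concrete illustration of the two conditions $$f(0) = 0$$ and $$f'(0) = 1$$ discussed above (a quick numeric sketch, not part of the original discussion): tanh satisfies both, so $$\tanh(x)\approx x$$ near the origin, while the logistic sigmoid satisfies neither, so it is only vaguely "approximately 0-ish" there.

```python
import math

# tanh satisfies f(0) = 0 and f'(0) = 1, so tanh(x) ~ x near the origin
x = 1e-3
assert math.tanh(0.0) == 0.0
assert abs(math.tanh(x) - x) < 1e-9      # f(x) ~ x, not merely f(x) ~ 0

# the logistic sigmoid fails both conditions: f(0) = 1/2 and f'(0) = 1/4,
# so sigmoid(x) stays near 1/2 and is nowhere near x for small x
def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

assert abs(sigmoid(0.0) - 0.5) < 1e-12
assert abs(sigmoid(x) - x) > 0.4
```

This is why the column distinguishes $$f(x)\approx x$$ from the weaker $$f(x)\approx 0$$: the former pins down both the value and the slope at the origin.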

Newer Activation Functions
Take a look at these links:


 * https://www.semanticscholar.org/paper/Comparison-of-new-activation-functions-in-neural-Gomes-Ludermir/9b37079041bdaca4248ab4f62f1a63013a50f067/figure/1
 * https://www.semanticscholar.org/paper/Activation-Functions-for-Generalized-Learning-A-Villmann-Ravichandran/04d54996bcbe44b3547da889d7eab8aab3660990/figure/0
 * https://www.semanticscholar.org/paper/A-comparative-performance-analysis-of-different-in-Farzad-Mashayekhi/bcfdfe54796c501a90c3b353661a19e9c161d2c8/figure/0
 * https://www.semanticscholar.org/paper/Searching-for-Activation-Functions-Ramachandran-Zoph/c8c4ab59ac29973a00df4e5c8df3773a3c59995a/figure/2

Is it possible to add these to the table? — Preceding unsigned comment added by 183.179.55.44 (talk) 16:54, 21 January 2020 (UTC)

Folding activation functions
The terminology in this subsection has little support in the literature, and the embedded link to fold functions is not particularly relevant either. This section should be removed or updated. — Preceding unsigned comment added by 88.94.64.26 (talk) 16:39, 17 October 2022 (UTC)

Removed activation functions
On July 7th 2020, many of the existing activation functions were removed from this Wikipedia article. The full list is available at: https://en.wikipedia.org/w/index.php?title=Activation_function&oldid=966536154 — Preceding unsigned comment added by 185.107.13.4 (talk) 20:49, 26 July 2020 (UTC)

The user who removed these activation functions gave the number of citations as their criterion for removal. Is it clear how citations were counted? Is 20 a reasonable threshold? Kaylimekay (talk) 12:18, 20 September 2020 (UTC)

There are hundreds of different activation functions, most with minimal traction. Usage is a better criterion than citation count: https://paperswithcode.com/methods/category/activation-functions Even now, there are probably too many irrelevant activations listed; are there any state-of-the-art models that use sinc, atan, or sin as an activation function? Another good heuristic: which activations are included in PyTorch, JAX, TensorFlow, or MXNet? User:Ringdongdang 9 November, 2020

I removed sinc, sin, and atan, since those aren't activation functions in any SOTA architecture, nor are they used in common neural networks: https://paperswithcode.com/methods/category/activation-functions Someone added "squashing functions", referencing papers from October 2020 with 0 citations; once the paper has more traction (e.g. an implementation in the core library of TensorFlow or PyTorch), the person can feel free to re-add the activation. It may be useful to add activations that people actually use, like GLU. User:Ringdongdang 20 November, 2020 — Preceding unsigned comment added by Ringdongdang (talk • contribs) 23:49, 20 November 2020 (UTC)

Removed activation functions (AFs) may still see some usage, even if minimal. Just because an AF is not widely used now does not mean it will not be used in the future. If we set some criterion for an AF to be on the list, let's add a note like "top 20 most used AFs". Track their scores like in sport, and pick the most used AFs every year.

Someone keeps adding the "Growing Cosine Unit", which does not even have five citations at the time of writing. They keep re-adding it, along with text that serves to advertise the activation function and a figure devoted to it. When I removed it, my removal was labeled "possible vandalism", but spending inordinate time on an activation function that someone just proposed (and that has not caught on) is closer to vandalism. — Preceding unsigned comment added by Ringdongdang (talk • contribs) 22:41, 23 November 2021 (UTC)

They added their three-citation paper back for a third time. — Preceding unsigned comment added by Ringdongdang (talk • contribs) 19:07, 13 December 2021 (UTC)

They've tried adding it back I think five times now. Can we require that people have an account to edit this page? Researchers keep adding their activation functions that do not have traction. — Preceding unsigned comment added by Ringdongdang (talk • contribs) 00:45, 26 December 2021 (UTC)


 * Thanks for pushing back on the spam. We can request semi-protection if IP disruption continues, but it looks like he made an account now.  Hopefully he'll join the discussion. Dicklyon (talk) 00:37, 11 January 2022 (UTC)

Mish and other Activation Functions
The list at the link provided is more comprehensive than this wiki page; updates are worth considering. https://github.com/digantamisra98/Mish#significance-level — Preceding unsigned comment added by 183.179.53.41 (talk) 04:15, 21 October 2021 (UTC)

Derivative of ELU activation function - missing case
What is the result of the derivative of the ELU function, when alpha != 1.0 and x == 0.0? Someone who knows the answer might want to update the page. — Preceding unsigned comment added by 2003:E5:2724:4EA3:784E:FE9D:6EC8:85C3 (talk) 15:20, 29 November 2021 (UTC)
 * It's not missing. The entry in the "Order of continuity" column means the derivative doesn't exist at x = 0 except in the special case of alpha = 1. Dicklyon (talk) 00:44, 11 January 2022 (UTC)
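To make the point above numerically (a minimal sketch; `elu` and `one_sided_slope` are illustrative helpers, not from the article): with ELU defined as $$f(x) = x$$ for $$x > 0$$ and $$\alpha(e^x - 1)$$ for $$x \le 0$$, the one-sided slope at 0 is 1 from the right and $$\alpha$$ from the left, so the derivative at 0 exists only when $$\alpha = 1$$.

```python
import math

def elu(x, alpha):
    # ELU: x for x > 0, alpha * (exp(x) - 1) for x <= 0
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def one_sided_slope(f, x0, h):
    # finite-difference estimate of the derivative approaching from one side
    return (f(x0 + h) - f(x0)) / h

alpha = 2.0   # any alpha != 1 exhibits the kink at 0
h = 1e-6
right = one_sided_slope(lambda x: elu(x, alpha), 0.0, h)    # close to 1
left = one_sided_slope(lambda x: elu(x, alpha), 0.0, -h)    # close to alpha

# slopes disagree, so f'(0) does not exist for alpha != 1
assert abs(right - 1.0) < 1e-3
assert abs(left - alpha) < 1e-3
```

With `alpha = 1.0` both one-sided slopes are 1 and the function is differentiable at 0, which matches the C^1 entry in the "Order of continuity" column.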

Self promotion and arxiv preprints
It would be great if the IPs edit warring at this page would review WP:COI and WP:RS - Wikipedia isn't a place to promote yourself by posting links to your arxiv preprints. Also have a look at WP:POINT - you are not going to get your way by disrupting Wikipedia. Reverting my edits at random is just going to ensure you keep getting blocked and will get this article locked down so IP editors won't be able to change it. MrOllie (talk) 19:07, 20 October 2022 (UTC)


 * What was the reason you reverted my edit ("Rv arxiv preprint as source")? Was it only because of the reference being an arxiv preprint or do you think such information should not make it to the wikipedia article?
 * While I agree in general that arxiv preprints often make unbacked claims, I think there are cases where it makes sense to use them as a reference (the article itself contains several arxiv preprints as references, so there cannot be a blanket prohibition on arxiv references - I know, most of those are well-known papers).
 * I referenced an article I found useful in my own search for activation functions, and I believe such information should make it into the Wikipedia article. While arxiv preprints often make unsubstantiated claims, the claim I referenced - that there are over 400 activation functions - is easily verifiable, as the preprint lists them all, and I did verify the claim by counting them :). I doubt that this preprint has the ambition to be published as a paper, as it provides only a list with references, yet I strongly feel that such a list might be useful to many readers, as it was to me.
 * For example, I found revision 966536154 very useful as a list, but I understand why it was stripped - it made for too complicated an article, it is unclear which AFs should be included, and it is very susceptible to self-promotion and COI. But I still think that information about the number of AFs available is useful to the reader without cluttering the article too much, and the reader can always check the reference for the full list.
 * Could you please reconsider your revert? If you feel strongly that such information is useless, or that the arxiv preprint is still too unreliable in this case, I will abide by your decision.
 * I just wanted to help other readers who visit the article with the same goals as me - when I was starting my research, I went to the article and found several useful things, but this small piece of information with the reference would have saved me countless hours of going through the scientific literature, and I think there are other readers like me :) PteroBacter (talk) 20:10, 11 March 2024 (UTC)
 * On Wikipedia the rule is to avoid self-published sources (WP:RS), and arxiv is a self-publishing venue. Wikipedia isn't an indiscriminate collection of information; there are some minimal standards in place. If there are other bad sources in the article, the right thing to do is to replace or remove them, not to make the problem worse by adding more. I do understand that you meant well and thought you were improving the article - it is perfectly understandable that you weren't aware of Wikipedia's guidelines on sources like this. MrOllie (talk) 21:02, 11 March 2024 (UTC)