Talk:Tensor (machine learning)

Repeated refs, multiple authors
I have simplified the referencing in the case of duplicated references. For information on what to do about multiple authors, see the template documentation here. --JBL (talk) 18:41, 17 February 2023 (UTC)


 * in this edit you undid a large number of edits by and myself; was this intentional?  --JBL (talk) 19:38, 17 February 2023 (UTC)
 * Unintentional. See note below. Ramakarl (talk) 19:59, 17 February 2023 (UTC)

Mess
@User:Ramakarl, did you intentionally revert many of my edits? Or did you revert them accidentally, perhaps by editing an old version? Mgnbar (talk) 19:38, 17 February 2023 (UTC)


 * @Mgnbar
 * Any reverts to edits made by yourself or @JBL were unintentional. I appreciate/accept both of your changes. I was working on a new section for Tensor factorization and it might have happened then (e.g. writing a while w/o update). Will be sure not to do that again... Just now I believe I successfully merged our changes. Ramakarl (talk) 20:45, 17 February 2023 (UTC)

Move to main space?
This article still needs improvement, but that will always be true. Should we move it to main space now? If not, then what are the crucial issues preventing that move? Mgnbar (talk) 04:30, 19 February 2023 (UTC)


 * @Mgnbar Just moved, thanks for your help. My addition of sections is done. Others can continue to improve. Look fwd to talk further. Ramakarl (talk) 01:01, 21 February 2023 (UTC)
 * @Mgnbar @Ramakarl I've tried to delete as much of the misconceptions as possible. Once I got to the math, I found too many mistakes to correct.
 * Feel free to undo all my edits. Alexmov (talk) 02:08, 19 March 2023 (UTC)

Article reinforces misconceptions in ML
First, this article reinforces major misconceptions in ML, which we are fighting against. One of the major misconceptions is that an image is a matrix or a tensor. Deepface uses an image as a vector despite its depiction in research articles as a matrix. TensorFaces treats an image as a vector despite its depiction as a matrix.

I would like to edit it, but I do not want to waste my time either. Alexmov (talk) 15:40, 18 March 2023 (UTC)


 * I've tried to delete as much of the misconceptions as possible. Once I got to the math, I found too many mistakes to correct. Feel free to undo all my edits. Alexmov (talk) 02:14, 19 March 2023 (UTC)

Corrections and Deletions
This is a mashup of neural networks and tensor algebra. The mathematics version of the tensor article was fine. It has a couple of logical inconsistencies, like using the word dimensionality to mean two different things in the same sentence... but it was fine.

Alexmov (talk) 16:44, 18 March 2023 (UTC)


 * @Alexmov We had a discussion of the topic here: Reorg of Tensor article. The term Tensor has a different usage in ML. It was suggested that an article be written for tensor (ML), and I did that. Tensors in ML are generally not considered as multi-linear maps and this is confusing to novices in ML. Many of the topics in this article were considered as unsuitable for the main tensor (math) page by other mathematicians/editors. Ramakarl (talk) 19:26, 18 March 2023 (UTC)
 * @Ramakarl:I understand that you want to keep the article which is fine.

However, when tensors were first introduced in ML, they were considered multilinear maps. If people start performing causal inference again, then one will have to differentiate one more time between tensors (multilinear maps) and "data tensors" (n-way arrays).

Text that suggests an image is naturally/inherently/intrinsically a matrix or tensor is a major misconception. The misconception sounds like it ought to be true and unfortunately it has caught on. I've tried to deemphasize the idea of a 2D image.

Right now the article is reinforcing some very bad misconceptions. Alexmov (talk) 19:33, 18 March 2023 (UTC)


 * I've deleted all the misconceptions and equations that were poor or inelegant math. Alexmov (talk) 03:56, 19 March 2023 (UTC)
 * This document adds very little to Wikipedia, but it is no longer spreading outrageous misinformation. We can safely delete it and no one would miss it. Alexmov (talk) 03:59, 19 March 2023 (UTC)

@Alexmov Please do not rewrite an entire article without discussion. You are one person. Wikipedia is a collaborative effort. Make arguments regarding things you'd like to change and we can discuss it. Your edits have been reverted. You describe several 'misconceptions' but don't say what you think the corrected view might be.

For example, the original text does not say an image is a matrix/tensor, but that it can be interpreted/embedded as one. Images are a common problem in convolutional ML, and therefore it makes sense to discuss them. Ramakarl (talk) 05:12, 19 March 2023 (UTC)
 * When employing NN, an image "embedded" in a "matrix/tensor" is a convenient illusion. For example, Deepface treats an image as a vector, but displays images in papers as matrices because in NN the mathematical operations between neurons and weights are the same in both cases. Hence, the emphasis on an image embedded as matrix/tensor in NN is misleading, but convenient for practitioners whose math skills are limited.
 * On the other hand, image as a matrix in linear/tensor algebra would not compute the same quantities as image as a vector.
 * Some of the issues are subtle and it would take too long to explain in writing.
 * The current mathematical notation is inelegant and inconsistent with linear/tensor algebra notation.
 * I am sorry that the way I cleaned up the document was not to your liking.
 * Alexmov (talk) 06:28, 19 March 2023 (UTC)
 * You may wish to read the following articles from Yoshua Bengio and Geoff Hinton that explain briefly the relationship between representation learning that employs different types of neural networks and tensor (multilinear) methods. Your current Etymology and History sections are completely at odds with what they have written.
 * https://arxiv.org/pdf/1210.5474.pdf
 * https://www.google.com/books/edition/Handbook_on_Neural_Information_Processin/uWozB4dNa-EC?hl=en&gbpv=1&bsq=multilinear
 * https://www.cs.toronto.edu/~hinton/absps/rolandNC.pdf
 * https://proceedings.mlr.press/v28/tang13.html
 * Panagakis et al. survey their use of tensors in computer vision. The paper makes unsubstantiated image as a matrix assertions and states that causality can be inferred from unsupervised data, which is false.
 * @Alexmov Regarding specifically "unsubstantiated image as a matrix assertions". I'm not sure what you're talking about. Certainly digital images are most commonly stored discretely as 2D matrices ("a rectangular array of numbers"), although there are other representations also (e.g. vectors). This is how they are most frequently used, in actual practice, in CNNs and many other fields. So I don't know what you mean. What specific assertions? Ramakarl (talk) 03:34, 20 March 2023 (UTC)
 * Kolda and Bader should be referenced.
 * Vasilescu or Terzopoulos should also be referenced considering they were the first to use tensors in computer vision, computer graphics and machine learning. See Bengio and Hinton references.
 * I hope this helps. Best of luck!
 * PS. You ought to add a reference to every image you are using.
 * Alexmov (talk) 07:08, 19 March 2023 (UTC)
 * @Alexmov Thank you for the references. I will take a closer look and consider how they could be incorporated. Ramakarl (talk) 01:33, 20 March 2023 (UTC)
 * You are coming at this from MPCA, which is suitable here as a form of machine learning. I've tried to remove all statements which claim the multilinear map aspect is not needed in ML. For the sake of novices I prefer early parts of article are clearer and in simpler language, while reserving more complex topics later in the article. Instead focusing history on the specific use of tensors as they appear in CNN, MPCA or other. Highly technical text made clearer and explicit wherever possible (e.g. mpca, as used in facial recognition..). Merged the etymology & history sections like you did, as I think that's a good idea. Ramakarl (talk) 02:52, 20 March 2023 (UTC)
 * I appreciate your efforts, but what you and I consider correct or appropriate for a beginner differs greatly. I read the first sentence and it looks very wrong.
 * In machine learning, a tensor is a way of representing high-dimensional data in a multi dimensional array (data type) suitable for artificial neural networks or multilinear component analysis.
 * What does that mean?
 * You are coming at this from MPCA, which is suitable here as a form of machine learning.
 * A shallow Hebb autoencoder is a PCA approximation. Deep neural networks are a type of hierarchical block-based tensor factorization, aka hierarchical block-based MPCA. I recommend that you read the TensorFaces paper. It is a very gentle introduction to tensor algebra.
 * Alexmov (talk) 04:33, 21 March 2023 (UTC)
 * You seem to use the word "represented" or "embedded", when you mean to say organized or stored. Alexmov (talk) 06:40, 21 March 2023 (UTC)
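As a small illustration of the point made above about images in neural networks (a sketch only, not taken from Deepface or TensorFaces): for a single fully connected neuron, the weighted sum is the same whether the pixels are stored as a 2D grid or as a flattened vector, whereas using the same grid as a linear map (a matrix acting on a vector) yields a different kind of quantity. The pixel values, the weights, and all function names below are hypothetical, and this is only one possible reading of the claim.

```python
# Hypothetical 2x3 "image" stored as a grid of pixel values.
image = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0]]

# Hypothetical weights of ONE fully connected neuron, stored in the same layout.
weights = [[0.5, -1.0, 0.0],
           [2.0, 0.25, -0.5]]


def flatten(grid):
    """Row-major vectorization: list the rows one after another."""
    return [value for row in grid for value in row]


def neuron_on_vector(x, w):
    """Weighted sum over a single index (image treated as a vector)."""
    return sum(xi * wi for xi, wi in zip(x, w))


def neuron_on_grid(x, w):
    """The same weighted sum, written over (row, column) index pairs."""
    return sum(x[i][j] * w[i][j]
               for i in range(len(x))
               for j in range(len(x[0])))


def apply_as_matrix(grid, v):
    """Use the grid as a linear map: one dot product per row."""
    return [sum(a * b for a, b in zip(row, v)) for row in grid]


as_vector = neuron_on_vector(flatten(image), flatten(weights))
as_grid = neuron_on_grid(image, weights)
as_operator = apply_as_matrix(image, [1.0, 0.0, -1.0])

print(as_vector, as_grid)  # the two storage layouts give the same number
print(as_operator)         # a 2-component vector, not a single weighted sum
```

In this sketch the 2D layout is only bookkeeping for the neuron, while `apply_as_matrix` actually uses the row/column structure.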

Merge into Tensor Article
Tensors in machine learning are not different from tensors in other fields. The only difference is that the Tensor article gives a mathematically rigorous but confusing introduction to the topic from a very abstract perspective. You can look at tensors from either perspective (multi-indexed arrays, or multilinear transformations). But they’re still fundamentally the same thing. Closed Limelike Curves (talk) 19:14, 4 April 2023 (UTC)


 * Right. But we have had numerous comments/complaints convincing us that the Tensor article does not serve the machine learning audience well. So the goal here is to present pretty much the same material, in a way that works better for them.


 * This article (Tensor (machine learning)) is new, volatile, and messy. For example, the first paragraph is pretty bad. Any improvements, that you could contribute, would be appreciated. Mgnbar (talk) 19:30, 4 April 2023 (UTC)
 * My own impression is there is one quite fundamental difference – in mathematics and physics, the distinction between covariant and contravariant indices is important, whereas in machine learning that distinction is usually ignored. From a maths/physics viewpoint, a (2,0)-tensor, a (0,2)-tensor, and a (1,1)-tensor are three rather different things; but from an ML perspective, a 2-tensor is a 2-tensor is a 2-tensor and rarely is any attention paid to the (2,0)-vs-(0,2)-vs-(1,1) distinction. SomethingForDeletion (talk) 09:27, 26 June 2023 (UTC)
 * I agree with you. In particular, the total "order" p + q of a (p, q)-tensor is not generally meaningful.
 * However, if you're working in Euclidean space and you restrict to orthogonal changes of basis, then the transformation laws don't care whether indices are covariant or contravariant. (More conceptually, there is a canonical way of converting between them using the dot product. It's a special case of Raising and lowering indices.) It appears that this viewpoint is common in engineering applications. See for example parts of Cauchy stress tensor ... but not other parts.
 * So it's not just machine learning that ignores covariance vs. contravariance. Mgnbar (talk) 12:47, 26 June 2023 (UTC)
 * My impression of ML, is a lot of ML people don't know about covariance vs contravariance at all. It simply isn't part of their model of a "tensor". Whereas, I assume (maybe wrongly) that engineers know about it, but also know they can get away with ignoring it in certain cases, and so in those cases they do. Do you think that's accurate?
 * Also, when engineers ignore covariance/contravariance, it is because (as you say) "if you're working in Euclidean space and you restrict to orthogonal changes of basis". However, I don't think that's necessarily true for ML – tensors in engineering are ultimately representations of something physical which exists in (approximately) Euclidean space; an ML tensor is just a bunch of numbers, and it may not have any meaningful relationship to any Euclidean space. Suppose I take the string "Hello", and turn it into a vector based on ASCII values: (72 101 108 108 111). In ML, that's a perfectly valid rank-1 tensor, but I doubt it makes any sense to view it as having any relationship to any Euclidean space. Sure, you can pretend that's coordinates in R^5, and even calculate the Euclidean distance between two ASCII-encoded 5-character strings as if they were points in R^5 – but doing that probably isn't meaningful or useful. And in ML, if you are going to measure the distance between two vectors, you are more likely to use cosine similarity than Euclidean distance anyway, in which case is "you're working in Euclidean space" still true?
 * I think this is a big source of confusion – if you've learnt about "tensors" from ML, then you encounter a maths/physics discussion of "tensors", you are scratching your head "what's all this covariance/contravariance stuff?". Conversely, if you've learnt about "tensors" from maths/physics, then you encounter an ML discussion of "tensors", you are scratching your head "where did covariance and contravariance go?" (And its disappearance is arguably not for the same reason as in engineering.) I really hope Wikipedia could clarify that confusion for people, but at the moment it just seems to have two separate articles – the main one with a maths/physics focus, this one with an ML focus – without any explanation of what the differences actually are. SomethingForDeletion (talk) 22:47, 26 June 2023 (UTC)
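The "Hello" example above can be reproduced in a few lines. This is only a sketch: the second string "World" and the helper names are chosen here for illustration, and the code says nothing about whether either measure is meaningful for such vectors.

```python
import math


def ascii_vector(text):
    """Encode a string as a list of character codes (a rank-1 'ML tensor')."""
    return [ord(ch) for ch in text]


def euclidean_distance(u, v):
    """Distance between u and v when both are read as points in R^n."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; ignores their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


hello = ascii_vector("Hello")
world = ascii_vector("World")  # hypothetical second string

print(hello)  # [72, 101, 108, 108, 111]
print(euclidean_distance(hello, world))
print(cosine_similarity(hello, world))
```

Note that both functions are built from the standard dot product on R^5, so cosine similarity still relies on a Euclidean inner product; it only discards the lengths of the vectors.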


 * The questions/issues that you raise are reasonable. It would be good to have this all clarified. (I can't do it, because I come from the math/physics side and don't really know the machine learning side.) Mgnbar (talk) 22:52, 26 June 2023 (UTC)
 * Thanks. My background is more ML than maths/physics – although I wouldn't claim to be an expert in either. My biggest problem here, is – I think I've worked out the "true story" here, but I don't know of any reliable source I can cite for it, so I don't know if I'm allowed to put that story in the article. SomethingForDeletion (talk) 23:02, 26 June 2023 (UTC)
 * I just went ahead and added something to the article anyway–a new section explaining differences between ML tensors and maths/physics tensors, based on our discussion here. I'll see if it survives. Would you mind reviewing it? SomethingForDeletion (talk) 23:28, 26 June 2023 (UTC)
 * I appreciate your sincere attempt, but I don't think that it helps. I'm not convinced, based on our brief discussion here, that any of that text is true. Reliable sources, that say (A) this is a difference and (B) this is the key difference, are desperately needed. Mgnbar (talk) 00:52, 27 June 2023 (UTC)
 * I found a book which tackles this issue head on – https://www.worldscientific.com/doi/pdf/10.1142/9789811241024_0001 – although unfortunately only Chapter 1 appears to be open access, whereas the main chapter on machine learning is chapter 2. But even in chapter 1, the author (a CS professor) does talk about how ML and math/physics communities use different definitions of "tensor"; see especially on page 11:
 * This kind of definition of tensor is often referred to as the old-fashioned definition. It is this component approach that caused the conundrum, with the concept of tensor portrayed as an equivocal duality of matrix and non-matrix, just like the mixture of the living and the dead states of Schrödinger’s cat. The tensor is defined as a matrix, but amended by the transformation laws. It is defined as the components of an object, without a clear definition of what this object is.
 * In recent years, with the booming research in machine learning, the machine learning community uses the tensor simply in the sense of a multidimensional array (or higher dimensional matrix), ignoring the transformation laws and breaking up this fuzzy duality.
 * I think "ignoring the transformation laws" is basically saying the same thing as I was saying, just putting it a bit differently–the "contravariant"/"covariant" distinction is due to those transformation laws, if you ignore them it no longer serves any purpose.
 * The author also gives an interesting analogy on page 4:
 * What do love and tensor have in common? Is the love between sisters the same as that between mom and dad, dating teenagers, and dogs and humans? Compare with the question: is the tensor in machine learning the same as those in mathematics and physics?
 * SomethingForDeletion (talk) 07:43, 27 June 2023 (UTC)


 * It's very valuable to find a source. Thanks. When writing "without a clear definition" the author must have specific treatments of tensors in mind. In the math literature, the concept is precisely defined (although it requires substantial background to understand).
 * In the past, we have had editors comment that machine learning tensors are fundamentally the same as math/physics tensors, right down to their meanings as multi-linear maps. It would be good to have these editors chime in, before we refocus this article based on one source. I've posted a notification at Talk:Tensor. Mgnbar (talk) 12:26, 27 June 2023 (UTC)
 * "When writing 'without a clear definition' the author must have specific treatments of tensors in mind. In the math literature, the concept is precisely defined (although it requires substantial background to understand)." When he says "without a clear definition", he is talking about the old definition of tensors found in the older literature of mathematics and physics from decades ago, not the contemporary definition. He is talking about the definition "mostly seen in older textbooks of tensor analysis, physics, and especially general relativity". In 2023 mathematics/physics, "tensor" is clearly defined; in 1953 mathematics/physics, "tensor" wasn't – but, he argues, a lot of sources even today still repeat those old imprecise definitions, because they appear simpler, but despite that simplicity, people end up being confused by their ambiguity/equivocation about what a "tensor" actually is. SomethingForDeletion (talk) 23:11, 27 June 2023 (UTC)


 * I think that we understand each other. And maybe you didn't intend 1953 to be taken as a specific important date, but in case you did, let me push back a bit on it. Without being an expert on the history, I would guess that the rigorous mathematical foundation for tensors was established sometime between 1858 (Kronecker product?) and 1922/1935 (Kunneth formula/Tor functor?). Mgnbar (talk) 13:29, 28 June 2023 (UTC)
 * I kind of plucked the year 1953 out of the air, I didn't mean it as a precise figure, just a rough estimate. And it isn't just about when the rigorous mathematical foundation was first published, it is also how long it takes for it to filter through the academic community and become established in the textbooks – which isn't always immediate, sometimes that process can take many years, decades even. Not being an expert on the history either, but it could well be that the contemporary definition had already been published by the 1950s, but might not have conquered all the textbooks until the 1960s or 1970s (or maybe even later than that?) SomethingForDeletion (talk) 14:38, 28 June 2023 (UTC)
 * Definitely. In fact, the rigorous foundation is still not in many textbooks even in 2023. I guess it's because the abstract rigor is not needed to keep a bridge from falling down, for example.
 * Anyway, it would be nice to get more opinions and sources about all of this. Mgnbar (talk) 16:14, 28 June 2023 (UTC)
 * "I guess it's because the abstract rigor is not needed to keep a bridge from falling down, for example" True. Although (something which Guo alludes to in his book), give some students a non-rigorous definition, they'll just accept it and move on; but other students will start thinking about it deeply, and asking probing questions, and with a non-rigorous definition those questions are impossible to answer in a non-contradictory way, and the student becomes confused and disheartened by the contradictions. This can especially be a problem when the teacher doesn't actually know the rigorous definition themselves, which is something I've personally experienced before (in high school, not about tensors though–our computing teacher was trying to explain CRC algorithms without reference to finite fields and polynomial rings, because the teacher didn't actually know anything about them, but since they are the mathematical foundation of CRC algorithms, trying to explain those algorithms when you don't know that foundation produces an incoherent mess.)
 * "Anyway, it would be nice to get more opinions" Yes, there were a bunch of other people editing this page a couple of months back, where did they all go? SomethingForDeletion (talk) 23:51, 28 June 2023 (UTC)

Differences between ML tensors and mathematics/physics tensors
I added the below two paragraphs to the article as a section with heading "Differences from tensors in mathematics and physics":
 * In mathematics and physics, tensors are defined in terms of a vector space – hence, the distinction between covariant and contravariant indices is significant. In some cases – particularly in engineering – that distinction can be disregarded, if you are working in Euclidean space and only allow orthogonal changes of basis. Tensors are commonly described using (p,q) notation, where p is the number of contravariant indices and q the number of covariant indices. A (2,0)-tensor, a (0,2)-tensor and a (1,1)-tensor are fundamentally different types of tensors. Furthermore, the total order p + q of a (p, q)-tensor is not necessarily meaningful.
 * However, in machine learning, the distinction between covariant and contravariant indices is usually ignored – there is no distinction between (2,0)-tensors, (0,2)-tensors and (1,1)-tensors, there are only 2-tensors. This cannot simply be justified on the basis of working in Euclidean space (as is sometimes done in engineering), since a machine learning tensor is just a multidimensional array of numbers, and there is no guarantee that the numbers in the array have any meaningful relationship to any Euclidean space.

Since (a) I don't have a reliable source for the above, and (b) User:Mgnbar opposes the addition, I have removed it for now. I still think we do need something like this section in the article though – even if not that exact text. I am hopeful maybe in this section of Talk we can work on coming up with something we can put in the article. SomethingForDeletion (talk) 00:58, 27 June 2023 (UTC)
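To make the (2,0) / (0,2) / (1,1) distinction discussed in this section concrete, here is a small numerical sketch. It assumes one common convention (the columns of the change-of-basis matrix are the new basis vectors written in the old basis, so contravariant indices pick up the inverse matrix and covariant indices pick up the matrix itself); the matrices and function names are hypothetical, and other sources may order the indices differently. The same array of components transforms in three different ways under a general change of basis, and the three results coincide when the change of basis is orthogonal.

```python
def matmul(x, y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(x):
    """Transpose of a 2x2 matrix."""
    return [[x[j][i] for j in range(2)] for i in range(2)]


def inverse(x):
    """Inverse of an invertible 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = x
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def transform(t, a, kind):
    """New components of a 2-tensor t under the change of basis a."""
    a_inv = inverse(a)
    if kind == (2, 0):   # two contravariant indices
        return matmul(matmul(a_inv, t), transpose(a_inv))
    if kind == (0, 2):   # two covariant indices
        return matmul(matmul(transpose(a), t), a)
    if kind == (1, 1):   # one of each (a linear map)
        return matmul(matmul(a_inv, t), a)
    raise ValueError("kind must be (2, 0), (0, 2) or (1, 1)")


T = [[1, 2], [3, 4]]          # the same array of numbers in every case
stretch = [[2, 0], [0, 1]]    # non-orthogonal change of basis
rotation = [[0, -1], [1, 0]]  # orthogonal change of basis (90 degrees)

kinds = [(2, 0), (0, 2), (1, 1)]
under_stretch = [transform(T, stretch, k) for k in kinds]
under_rotation = [transform(T, rotation, k) for k in kinds]

print(under_stretch)   # three different component arrays
print(under_rotation)  # three identical component arrays
```

An ML library that stores `T` as a plain 2-dimensional array carries no `kind` information at all, which is one way of stating the "ignoring the transformation laws" point.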