Talk:Gene expression programming

GEP vs Linear Genetic Programming (LGP)
Can anyone give a good comparison between Gene Expression Programming (GEP) and Linear Genetic Programming (LGP)? According to this article, it seems that the only difference between GEP and Genetic Programming (of which LGP is a subset) is how the programs in the evolving population are represented. Quoting the relevant fragment: "'In Genetic Programming the individuals comprising a population are typically symbolic expression trees; however, the individuals comprising a population of GEP are encoded as linear chromosomes, which are then translated into expression trees. That is, in GEP, the genotype (the linear chromosomes) and the phenotype (the expression trees) have a different encoding structure.'" Emphasis mine. Now, "typically" does not exclude other representations; indeed, LGP is a counter-example. In LGP each program in the population is represented as a sequence of instructions, sometimes directly as machine instructions as in the approach of Peter Nordin, which doesn't need a translation phase "from genotype to phenotype", to use GEP's questionable terminology (IMO the author doesn't understand the difference between a process - a running program - and its specification - a program properly encoded; using GEP's analogy, a program is always genotype). I concur with the section "Dubious claims throughout" below, where the merits of this translation in GEP are questioned. --MaD70 (talk) 15:27, 24 January 2010 (UTC)


 * The difference is the way that the programs are represented during the reproduction stage. In GEP this allows the use of genetic algorithm methodologies while in the genotype form and defines an upper limit to program size.  The genotype representation is used to simplify the crossover and mutation process by using short strings to represent the allowed operators and operands.  Both GEP and LGP are subsets of Genetic Programming that use different approaches to achieve the GP goal.  I do think the article needs to be rewritten with the GEP algorithm clearly explained. Bob0the0mighty (talk) 14:51, 19 February 2010 (UTC)
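For readers trying to follow the representation argument above, GEP's genotype-to-phenotype translation can be sketched in a few lines. This is an illustrative sketch only, not Ferreira's reference implementation; the function set, the terminals a and b, and the example gene are invented for the example. A gene in Karva notation is read left to right and filled into a tree breadth-first, so symbols past the open reading frame are ignored:

```python
# Illustrative sketch of GEP's breadth-first genotype-to-phenotype
# translation (Karva notation). Toy function set; not reference code.
ARITY = {'+': 2, '-': 2, '*': 2, '/': 2, 'a': 0, 'b': 0}

def decode(gene):
    """Translate a linear gene (string) into an expression tree.

    Nodes are filled level by level: each symbol becomes a node, and
    function nodes claim the next unclaimed symbols as their children.
    Symbols past the open reading frame are silently ignored.
    """
    nodes = [{'sym': s, 'kids': []} for s in gene]
    queue = [nodes[0]]          # breadth-first frontier
    i = 1                       # index of next unclaimed symbol
    qi = 0
    while qi < len(queue):
        node = queue[qi]
        qi += 1
        for _ in range(ARITY[node['sym']]):
            node['kids'].append(nodes[i])
            queue.append(nodes[i])
            i += 1
    return nodes[0]

def evaluate(node, a, b):
    """Recursively evaluate a decoded tree at the point (a, b)."""
    ops = {'+': lambda x, y: x + y, '-': lambda x, y: x - y,
           '*': lambda x, y: x * y, '/': lambda x, y: x / y}
    if node['sym'] == 'a':
        return a
    if node['sym'] == 'b':
        return b
    left, right = (evaluate(k, a, b) for k in node['kids'])
    return ops[node['sym']](left, right)

# The gene "+a*aab" decodes breadth-first to a + (a * a); the final 'b'
# lies outside the open reading frame and is ignored.
tree = decode("+a*aab")
print(evaluate(tree, 3, 0))   # a=3 gives 3 + 3*3 = 12
```

The key point for the discussion above: because the tree is filled level by level, changing one symbol near the head can reassign every symbol after it to a different tree position, which is exactly the non-linearity the critics complain about.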


 * LGP is much more well-thought-out, as recombination operating on its linear encoding *actually makes sense*, as opposed to GEP, which might as well be replacing portions of the resulting subtree with random subtrees. LGP's phenotype is comparable to an L-System, describing a *depth-first* constructive process, whereas GEP's is breadth-first, which is patently absurd. I was very disturbed when I found this article had been published in Complex Systems, as it's clearly a crock. 74.64.96.71 (talk) 07:52, 14 October 2010 (UTC)


 * Also, the usual complaint is that GEP is a subset of "grammatical evolution" (GE). 74.64.96.71 (talk) 22:11, 14 October 2010 (UTC)

The author refers to a random numeric constant (RNC). This breaks no new ground (the commonly accepted term is ERC - ephemeral random constant). Presenting this concept as something new is dubious. It is self-aggrandizing at best, fundamentally incorrect and intentionally misleading at worst. All genetic programming systems utilize this concept. 170.74.56.84 (talk) 18:55, 7 July 2015 (UTC)
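For context, an ephemeral random constant is a terminal that, when drawn during tree creation, is instantiated as a concrete random constant that stays fixed thereafter. A minimal illustrative sketch (the terminal set and value range are invented for the example, not taken from any particular GP system):

```python
import random

# Minimal illustration of an ephemeral random constant (ERC), the
# standard GP mechanism the comment above refers to: when the 'ERC'
# terminal is drawn during tree creation, it is replaced by a concrete
# random constant, which is then fixed for the lifetime of that leaf.
TERMINALS = ('x', 'ERC')

def make_leaf():
    t = random.choice(TERMINALS)
    if t == 'ERC':
        return round(random.uniform(-1.0, 1.0), 3)  # frozen at creation
    return t  # the input variable x

random.seed(0)
leaves = [make_leaf() for _ in range(50)]
print(leaves[:6])  # a mix of 'x' variables and fixed numeric constants
```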

POV
NPOV? "... this new algorithm surpasses the old GP system in 100-10,000 times." Please specify "old GP system". I think that machine-code [Linear genetic programming] is really fast. --Michal Jurosz 11:40, 2 November 2005 (UTC)

Answer: The old GP is the algorithm popularized by Koza (it's in the paper referenced in the article)


 * Please sign your posts. Just append four tildes (~~~~) and the wiki will automatically add your IP address or username (if you have one) and the date. Thanks. 65.183.135.231 (talk) 17:42, 31 October 2008 (UTC)


 * I simply removed the numeric portion of this claim, replacing it with "significantly". —Preceding unsigned comment added by 74.64.96.71 (talk) 07:53, 14 October 2010 (UTC)

What does this mean???
"This new algorithm surpasses the old GP system in 100-10,000 times." What does this mean? Is it 100-10,000 times faster? As it stands, this seems a completely nonsensical statement... —Preceding unsigned comment added by 65.183.135.166 (talk) 08:03, 28 November 2007 (UTC)

Dubious claims throughout
Here is some "original research" (i.e. glaringly obvious points):

In GEP, the mapping from genotype to phenotype is deterministic and static; as such, GEP is nothing more than GP with a convoluted encoding. In fact, I would argue it is significantly less: as a consequence of the high non-linearity of GEP's genotype-to-phenotype mapping, offspring generated via recombination bear very little resemblance to their parents. GEP is thus, effectively, GP with a broken recombination operator, such that runs degenerate into near-random search and simple hill-climbing.

I am nothing less than shocked that these ideas have been accepted for print. Consider:

In section 6.1 of the paper, GEP "evolves" solutions to the regression problem a^4 + a^3 + a^2 + a. Note how this is a very easy problem: there aren't even any coefficients to determine! In any case, I replicated the experiment with the same settings, in Matlab. Know what I found (and confirmed)? With a population size of 30 individuals and 50 generations, simply randomly generating individuals yields comparable performance to this "revolutionary" algorithm! That is, (30x50=)1500 random guesses are usually enough to guess a solution.
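The random-guessing baseline described above is easy to sketch. The following is illustrative code only, not the commenter's Matlab replication; the tree generator, depth limit, and sample points are assumptions made for the example:

```python
import random

# Sketch of the random-guessing baseline described above: generate
# random expression trees over {+, -, *} and the terminal a, and test
# them against the target a^4 + a^3 + a^2 + a. The depth limit and
# terminal probability are arbitrary choices, not the paper's settings.
FUNCS = ('+', '-', '*')

def target(a):
    return a**4 + a**3 + a**2 + a

def random_tree(depth):
    """'Grow'-style generator: terminals get likelier as depth runs out."""
    if depth == 0 or random.random() < 0.3:
        return 'a'
    return (random.choice(FUNCS),
            random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, a):
    if tree == 'a':
        return a
    op, l, r = tree
    x, y = evaluate(l, a), evaluate(r, a)
    return x + y if op == '+' else (x - y if op == '-' else x * y)

def solved(tree, points=range(-3, 4)):
    """Exact match against the target on a handful of sample points."""
    return all(evaluate(tree, a) == target(a) for a in points)

def random_search(budget=1500, depth=4):
    """Return the 1-based index of the first exact hit, or None."""
    for i in range(1, budget + 1):
        if solved(random_tree(depth)):
            return i
    return None
```

Whether 1500 draws suffice depends on the generator's bias; the point is only that the total evaluation budget (population size times generations) is the right yardstick against which to compare an evolutionary run.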

Where does the claim of a 10,000x improvement come from? It may be enlightening to examine the text accompanying Figure 11: "Plot 1: ... a perfect solution was found in generation 1. Plot 2: ... a perfect solution was found in generation 3. Plot 3: ... a perfect solution was found in generation 2. Plot 4: ... a perfect solution was found in generation 3. Plot 5: a perfect solution was found in generation 1." I call bullshit. If your stochastic approximating algorithm is finding perfect solutions after only a handful of generations, you should be suspicious! Clearly, either the problem is too easy, the driver routine is misreporting results, or the researcher is fabricating them. I would prefer not to believe this last, but could the severe problems with this algorithm be any more obvious? How could someone continue on this topic for so many years, even publishing books(!) on it, without realizing its readily apparent flaws? 65.183.135.231 (talk) 03:04, 29 October 2008 (UTC)

I am finding it hard to find journals and other researchers citing GEP except in a few passing mentions (2 brief mentions in the GP and evolvable machines journal), which is a little concerning. As an absolute beginner to the subject of GP I do find the solutions being found after one generation a little suspicious; however, there's nothing I can find condemning it either, so for the purposes of Wikipedia your 'original research' points are moot. If you can find peer reviews of GEP I'd like to see them (I need them for a literature review!) 138.253.136.37 (talk) 16:25, 30 October 2008 (UTC)


 * I of course agree with you that my OR is not suitable for the article; that's why I didn't include it. However, it is important to note the very sketchy nature of the current article's claims--published or not--so that other editors can see where the article needs further work, and what sorts of additional sources should be sought out. 65.183.135.231 (talk) 16:45, 30 October 2008 (UTC)


 * Yes, sorry, I missed your point the first time around: most of the claims in the article have no supporting evidence beyond Ferreira's own work and should be removed. I've just reread the preface of the GEP book to quote directly from it:


 * "The invention of a new paradigm can often create strong resistance, especially if it seems to endanger long established technologies and enterprises. The publication of my work on scientific journals and conferences, which should be forums for discussing and sharing new ideas, became a nightmare and both my work and myself were outright dismissed and treated with scorn."


 * I think that pretty much sums up GEP as a fringe theory in Ferreira's own words. Regardless of whether he's right or not, the fact that his ideas have been rejected by the GP community is significant. 138.253.136.37 (talk) 18:16, 30 October 2008 (UTC)


 * Wow. Nice find! My own cursory search was unable to find any critical reviews, and here is Ferreira (female, not male, by the way) providing a perfect quote in her own book! Should we make a "Controversy" section? 65.183.135.231 (talk) 22:46, 30 October 2008 (UTC)


 * Going back to figure 11, I think that is a study of the population over time after solutions have been found; the paper probably would have benefited from some harder examples and stronger comparisons, though. I haven't found any comparisons to existing algorithms within the tested examples yet in my reading, and yet the conclusion says it has been shown that GEP outperforms existing algorithms. I will read it all tonight. 138.253.136.37 (talk) 13:41, 31 October 2008 (UTC)

Hi, given the novelty of the method, the fact that the developer is a biologist and a woman, and seeing how GEP citations are spreading all over the web (a second edition of the book by Springer should give us some clue), I am able to predict that this kind of evolutionary algorithm will become standard practice in the field. Congrats to Ferreira! (with whom, by the way, I have no relationship and have never met): I am just an evolutionary biologist and computer engineer interested in evolutionary computation. I am pasting here some links to (I think) independent references that seem to sustain Ferreira's claims on GEP.

the first compares GEP with GP and linear regression:



the second uses GEP on high energy physics event selection




This paper also sustains the claimed better performance of GEP:



AC-R (15th Nov 2009)


 * Most of those links were broken (but I believe you when you say they once existed) or behind subscription barriers. But for the high-energy physics paper (this link might be more stable: ), do actually take a look inside. The author says absolutely *nothing* about the population size, the number of generations, or the number of trials! They simply conclude "it works", seeing as it successfully evolved an exceedingly trivial expression (the logical conjunction of three floating-point comparisons). Absolutely incredible. What do you suppose the probability is of this or a very similar expression appearing in the initial generation, purely by chance? Moreover, if you'll grant that Ferreira's genotype-changing operators are essentially gross mutation in the phenotype space, then how many generations before, again, we reach this result purely by random chance (as opposed to, say, building blocks being discovered over time and recombined in useful ways)? With very short expressions, a few hundred individuals in the population, and even a handful of generations, that's several hundred nearly-random expressions: it is highly likely you'll find the result. But we don't call this an evolutionary algorithm; we call it *random search*. I've seen a few papers like this one, and it's always the same pattern: a practitioner from an unrelated field tries the tool without understanding what's really going on (in particular, with a large enough population and a simple problem, pure chance has an excellent chance of returning the result even in the first generation), says, "Yep, it works", and publishes it in a conference or other thinly-refereed context. (Again, I was absolutely shocked to see the seminal paper in Complex Systems). —Preceding unsigned comment added by 74.64.96.71 (talk) 08:15, 14 October 2010 (UTC)


 * And in case you were stupefied by my rant (sorry, it's late, I'm not at all being concise), note again that the author said *nothing* about either population size or number of generations. Read that again. How can you *possibly* evaluate an evolutionary algorithm without saying how large the population was and how many generations were run? I can run an amazingly foolish algorithm for a few hundred thousand iterations and find a solution. Or, often, I can run a *legitimate* algorithm from the evolutionary computation toolbox and find a solution to the same problem in a few *hundred* generations. (Likewise, I can run a foolish algorithm for a handful of generations and get a solution, if I just increase the size of the population by a few orders of magnitude: random search.) 74.64.96.71 (talk) 08:36, 14 October 2010 (UTC)
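The arithmetic behind this point can be made explicit. If one random individual solves the problem with probability p, then N = population size x generations independent draws succeed at least once with probability 1 - (1 - p)^N; the per-draw probability used below is a hypothetical stand-in, not a figure from any paper:

```python
# Why population size and generation counts matter: the probability
# that pure random search hits a solution at least once in N
# independent draws. The per-draw probability p is hypothetical.
def p_hit(p, evaluations):
    return 1.0 - (1.0 - p) ** evaluations

# Even a small per-draw success rate compounds quickly with the total
# evaluation budget (population size times generations):
for pop, gens in [(30, 1), (30, 50), (1000, 5)]:
    print(pop, gens, round(p_hit(0.001, pop * gens), 3))
```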


 * And notice the excellent observation above, where someone looked up where the "10,000 times improvement" claim comes from: Figure 11: "Plot 1: ... a perfect solution was found in generation 1. Plot 2: ... a perfect solution was found in generation 3. Plot 3: ... a perfect solution was found in generation 2. Plot 4: ... a perfect solution was found in generation 3. Plot 5: a perfect solution was found in generation 1." I checked; this is an actual quote from the seminal paper. *Please* check yourself; it's amazing. What does "a perfect solution was found in generation 1" tell you? Generation 1 means the algorithm wasn't even *run* yet. Generation 1 means that's just the *randomly-generated* initial population. That is the *definition* of random search. If the problem is so mind-numbingly trivial that a random search "solves" it any more often than *almost never*, that tells you something. And, indeed, the other "perfect solution[s]" were found in generations 3, 2, 3, and, *again*, 1! So, is GEP magically evolving the solution in a *single step*, in *two steps*, or, *twice*, in *zero steps*?!! No. Quite clearly, the problem is being solved by the random search that is the initial population, or else the random search that is the nonlinear (to the point of being meaningless) genotype-altering operations. Any researcher who sees their problem being solved undeniably purely by chance in at least *2/5* of their experiments can't possibly attribute any of that success to their algorithm. It was either highly unethical or demonstrative of truly prodigious incompetence for Ferreira to submit this for publication. 74.64.96.71 (talk) 08:31, 14 October 2010 (UTC)


 * Oooh... here's a good one: Ferreira herself describes her Complex Systems paper as "infamous". I wonder why that was, no?

The handling of the generation of neural networks from the symbology seems to imply that the full function of a neural net would be employed in processing. Based on my cursory observations, however, what is actually occurring is the decomposition of a neural network into a decision tree, and from there you lose some of the fully connected processing power of neural networks.

Wikiproject Robotics
While roboticists often use evolutionary algorithms in their work, I'm not sure it's really appropriate for this article to be part of Wikiproject Robotics (for example, electrical engineers use EAs as well, but this isn't part of some EE Wikiproject); I'm sure there is a more appropriate category. How do we change it? Or do folks disagree? 65.183.135.231 (talk) 17:47, 31 October 2008 (UTC)

Critical responses from researchers
1. Apparently, in addition to a great number of other researchers, John Koza himself, the inventor of genetic programming, *supposedly* (a less indirect source is wanted for such a claim; EDIT: see #8, below) took the time to speak out against this nonsense. 74.64.96.71 (talk) 09:19, 14 October 2010 (UTC)

2. Here we have folks saying the results in the paper suggest "that the mutation and crossover algorithms are performing near-randomization" (i.e. random search). They also point out that, despite the claims, the paper doesn't actually compare GEP to any other algorithms. 74.64.96.71 (talk) 09:27, 14 October 2010 (UTC)

3. Technically not a critical response, but check out Ferreira's response to the above. She resorts to Random Capitalization and forceful Blustering without Addressing Any Points. 74.64.96.71 (talk) 09:32, 14 October 2010 (UTC)

4. Brilliant layout of the problems here. I believe this is Sean Luke at GMU (someone who actually retained a professorship past the publication date of their thesis, and at a not-too-shabby institution, for that matter). 74.64.96.71 (talk) 09:36, 14 October 2010 (UTC)

5. Another such thread argues grossly unsubstantiated inference and unwarranted self-aggrandizement, and also that the work is subsumed by grammatical evolution (GE) and others, in addition to misleadingly citing Koza's famous paper. Bill Mydlowec. Stanford. Worked with Koza. 74.64.96.71 (talk) 09:44, 14 October 2010 (UTC)

6. Here we have Dr. Conor Ryan of the University of Limerick, the inventor of grammatical evolution (GE), informally demonstrating that GEP is a mere subset of GE. 74.64.96.71 (talk) 09:49, 14 October 2010 (UTC)

7. Again, not a critical response, but Candida's own words (which consistently serve the same effect): "GAs are good at parameter optimization, a task GP cannot handle, although GEP, more related to GP than GAs, can handle this kind of problem very well." That's right: GP "cannot handle" "parameter optimization" (despite the fact that GA, which *of course* can optimize a vector of parameters, is a subset of GP). Martin Sewell, from the University of Cambridge, points this out (except inadvertently saying GP is a subset of GA, rather than vice versa), and not so subtly suggests that, in comparison, GEP is "overtly more commercial". Probably because half of Ferreira's posts are soliciting people to buy her books or software. 74.64.96.71 (talk) 10:12, 14 October 2010 (UTC)

7.5 David Anisman sketches out the trivial proof that GA is a special case of GP. 74.64.96.71 (talk) 10:18, 14 October 2010 (UTC)

8. Here we have John Koza (the Man himself) dismiss Ferreira very concisely, after she suggests he is ignorant of some fairly basic ideas. 74.64.96.71 (talk) 10:14, 14 October 2010 (UTC)

9. Here, Maarten Keijzer suggests GEP is so "trivial" that "GEP as such shouldn't [even] be mentioned in the GP-tree". He suggests "overall, [it] made the list by virtue of the [author's] high annoyance factor, high likeliness of being provoked, and a level of stubbornness and absence of reason that [he] hadn't encountered before in scientific circles." 74.64.96.71 (talk) 10:26, 14 October 2010 (UTC)

More controversy. 74.64.96.71 (talk) 10:12, 14 October 2010 (UTC)

Tutorial
See figure 6, on page 12, of Dr. Ferreira's tutorial: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.128.4476&rep=rep1&type=pdf

This is for a simple quadratic regression problem. Notice how:

1. There is no discernible trend in the average fitness.

2. The fitness of the best individual is almost perfect, right from the start.

3. The accompanying analysis somehow sees this as an excellent result.

Similarly, in figure 13, on page 19, for a more complex regression problem, there is again no discernible trend in the average fitness, even after 5,000 generations!

74.101.91.229 (talk) 06:04, 1 December 2010 (UTC)

Review paper
Here is a paper, also published in Complex Systems, comparing GEP and 3 other algorithms on a set of 5 benchmark problems: http://www.complex-systems.com/pdf/14-4-1.pdf

Note in particular the graphs and tables in Section 8, where GEP consistently and quite markedly underperforms in comparison to LGP and MEP, often exhibiting nearly flat fitness curves. 74.101.91.229 (talk) 06:19, 1 December 2010 (UTC)

punctuation and mathematical notation
I just did a bunch of corrections per WP:MOS and WP:MOSMATH. In particular, notice that:
 * Ranges of pages, years, etc. should have an en-dash, not a hyphen, thus:
 * wrong: pages 31-60
 * right: pages 31–60
 * (There were probably a couple of dozen instances of hyphens for this purpose in the article.)


 * A minus sign looks different from a hyphen.
 * wrong: 5 - 3
 * right: 5 &minus; 3
 * (Of course in TeX one uses a hyphen, and except in \text{} mode, it gets rendered as a minus sign.)


 * \max is a standard operator name in TeX.
 * wrong: $$ n_{max} - 1 \, $$
 * right: $$ n_\max - 1 \, $$
 * Among the effects of \max rather than {max} or the like are that it is not italicized, it has proper spacing before and after it, thus
 * $$ 5\max A \, $$
 * and subscripts are suitably positioned in expressions like this:
 * $$ \displaystyle \max_{x\in S} f(x) \, $$

Michael Hardy (talk) 16:08, 1 October 2012 (UTC)

Copyright issue
The copyright issue has been addressed in the email sent to permissions-en@wikimedia.org. The referred material has been licensed under the terms of the Creative Commons Attribution-Sharealike 3.0 Unported License (CC-BY-SA) and the GNU Free Documentation License (GFDL) as shown in the license page. Oritnk (talk) 20:28, 14 October 2012 (UTC)
 * We have received that email, and I have removed the copyvio notice. Thanks, Legoktm (talk) 07:16, 26 November 2012 (UTC)
 * Addendum: per the OTRS ticket,, , and have been released under a CC-BY-SA-3.0/GFDL license, and may be included in the article. Happy editing, Legoktm (talk) 07:19, 26 November 2012 (UTC)

Copyright policy
It’s a shame that Wikipedia is letting anyone come in anonymously with a blanket accusation of copyright infringement without even bothering to say exactly where in the text the copyright was being infringed, so that people could take immediate action and fix the problem.

The gene expression programming article is a big one, and it was written specially for Wikipedia. So saying that it infringes copyright without saying exactly where can only be perceived as intending to paralyze and disrupt. And since Wikipedia is letting anyone come in anonymously and close a page that took great effort and dedication to write, I don’t think we are encouraging the people who are contributing constructively to Wikipedia; on the contrary.

Oritnk (talk) 13:51, 15 October 2012 (UTC)
 * I think in these cases we tend to stay more on the cautious side and hide everything. There definitely was copyright infringement though. It was resolved by the copyright owners donating their text under a compatible license. Legoktm (talk) 07:18, 26 November 2012 (UTC)