Wikipedia talk:WikiProject Molecular Biology/Molecular and Cell Biology/Archive 11

Reclassification of Cyclopentenone prostaglandins
Would someone review the Cyclopentenone prostaglandins article for the purpose of reclassifying it? The article has recently been greatly expanded. Although it is categorized as a Pharmacy and Pharmacology article, it is better categorized in the Molecular and Cellular biology section. All of the prostaglandins are so categorized (e.g. see prostaglandin). Also, the article as currently formatted is correctly redirected from 15-deoxy-Δ12,14-prostaglandin J2 (15-d-Δ12,14-PGJ2), a principal cyclopentenone prostaglandin. Is it possible to similarly redirect it from other cyclopentenone prostaglandins, viz. Δ12-PGJ2, PGJ2, PGA2, and PGA1, discussed in the article (but not given separate Wikipedia pages elsewhere) and, if so, how do I do that? Thanks, (User talk:joflaher). 6 December 2016 (UTC)

Revisit figure of Illumina dye sequencing
Illumina sequencing is a very important method to know nowadays, at least for anyone working in molecular biotechnology. The figure for the creation of clusters is correct, but it could use some care from someone. It was obviously drawn quickly and I think the article deserves more.

https://en.wikipedia.org/wiki/Illumina_dye_sequencing

There are many good resources for inspiration, such as Illumina's YouTube channel.

Proposal to delete the MCB tag option from Template:WikiProject Microbiology and Template:WikiProject Fungi
Following the discussion above, interested parties should also chime in at WikiProject Microbiology and WikiProject Fungi. --Nessie (talk) 15:10, 12 April 2019 (UTC)

Question about naming conventions
It's at Wikipedia_talk:WikiProject_Equine. Iamnotabunny (talk) 04:54, 2 May 2019 (UTC)

Gap
I was looking up phase separation and found biomolecular condensate, which has a rather confusing link to Spinodal decomposition. Having had only an overview after reading a couple of reviews, it looks to me like these articles need more work, considering that there is currently a lot of talk (hype?) on the topic. I hope someone with specialist knowledge on this can improve coverage on this cluster. Shyamal (talk) 15:57, 6 May 2019 (UTC)
 * One problem is that Spinodal decomposition is a special case of the more general phenomenon of phase separation. Hence the latter redirect should be converted into its own article. The hot topic is biomolecular condensate, which is also a special case of phase separation. Boghog (talk) 16:14, 6 May 2019 (UTC)
 * I went ahead and created a phase separation stub which is probably more in the realm of chemistry than MCB. Hopefully this is less confusing. Boghog (talk) 17:28, 6 May 2019 (UTC)
 * Thank you for doing that! I am surprised that this topic is not covered on Wikipedia, because it is so important to scientific articles across many different fields. This is the type of topic I would normally expect C-level coverage on. I'm happy to collaborate with you on it when I have some time after my exams! Prometheus720 (talk) 18:40, 6 May 2019 (UTC)
 * Thanks for taking an interest - I am afraid I still see a lack of clarity on the reason for the heightened interest - may perhaps be useful for someone to provide a historical view at biomolecular condensate - my near-layperson understanding is that these condensates and their state changes are involved in control and regulation of (enzyme) expression which extends earlier ideas on expression control being largely at the transcriptional and translational stages (and presumably some of the hype perhaps comes from the idea that epigenetics has been a "suppressed" field with much popular writings and some fringe writings of mind-over-matter). Shyamal (talk) 05:56, 7 May 2019 (UTC)
 * I have just found that User:Turiya1952 is working on a draft at Draft:Membraneless organelles - I imagine it would be more appropriate to improve biomolecular condensate... Shyamal (talk) 06:04, 7 May 2019 (UTC)
 * Thanks for including me in this thread User:Shyamal! My understanding of the field (which has garnered a lot of interest in the last five years. Why? Because.) is that there is still a lack of consensus on what these structures ought to be called, and the term "condensate" was proposed by Michael Rosen's group. While "condensate" nicely describes both the material property of these structures as they are understood today from phase separation theory (spinodal decomposition being one framework to understand the formation of these structures) and their physical appearance, I find it most useful to think of it as a biophysical descriptor of the general category "membraneless organelle". I personally would prefer to include biomolecular condensate as a section in Draft:Membraneless Organelle but I am clearly coming at this as a cell biologist.
 * Seconding User:Prometheus720, I'd love to help with both articles now that it is the summer. I'm a PhD student in cell biology, and brand new to Wikipedia, but eager to contribute! Turiya1952 (talk) 12:15, 8 May 2019 (UTC)
 * Great. As far as the entry at which you are working - If biomolecular condensate and membraneless organelles are essentially identical in scope, I would think that they should be just one article, the title of which can be decided by a WP:RM if needed. Shyamal (talk) 12:27, 8 May 2019 (UTC)

Proposed recoding of Template:Gene
The current template hyperlinks to pages with this url: This is a HGNC search page for entries associated with the parameter input to. Due to how the template is coded, it can't be used to search for gene names using a phrase/string, as only the first word in a string is encoded in the url.
 * - example https://www.genenames.org/tools/search/#!/genes?query=18175

Example: contains results for "coenzyme" only; the "Q3, methyltransferase" is omitted because the spaces aren't url-encoded [via replacement with "%20"]).  The correct url for that search is https://www.genenames.org/tools/search/#!/genes?query=coenzyme%20Q3,%20methyltransferase.
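The encoding problem described above can be sketched in Python rather than in template markup. This is only an illustration: the function name `hgnc_search_url` is hypothetical, and only the base URL and the "%20" replacement come from this discussion.

```python
from urllib.parse import quote

# Base URL of the HGNC search page, as quoted in this thread.
HGNC_SEARCH = "https://www.genenames.org/tools/search/#!/genes?query="


def hgnc_search_url(query: str) -> str:
    """Build an HGNC search URL, percent-encoding spaces in the query."""
    # Commas are kept literal so the result matches the URL quoted above;
    # spaces become %20 instead of silently truncating the query.
    return HGNC_SEARCH + quote(query, safe=",")


print(hgnc_search_url("coenzyme Q3, methyltransferase"))
```

With the whole phrase encoded, the search is no longer cut off after the first word.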

The following url links to the HGNC entry for a specific HGNC ID number when the parameter value is an HGNC ID: Since this is a better link target for inputted HGNC ID numbers, would anyone object to me recoding this template using this markup? I can't imagine that this would break any existing links that aren't already broken due to the use of a phrase/string as an input value.  Seppi  333  (Insert 2¢) 19:11, 23 November 2019 (UTC)
 * - example: https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/HGNC:18175

Never mind. A number of template calls use the gene symbol in the first parameter, so recoding it as described would break those links.

I’ll just create Template:HGNC with the url above for HGNC IDs when I’m back on my laptop.  Seppi  333  (Insert 2¢) 00:39, 24 November 2019 (UTC)
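As a rough sketch of why the ID-based link only suits numeric inputs, the following Python function builds the report URL quoted above and rejects anything that is not an HGNC ID. The function name `hgnc_report_url` and the validation rule are assumptions made for illustration, not the template's actual logic.

```python
import re

# Base URL of the HGNC gene-symbol report, as quoted in this thread.
HGNC_REPORT = "https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/HGNC:"


def hgnc_report_url(hgnc_id: str) -> str:
    """Return the report URL for a numeric HGNC ID such as "18175"."""
    # Accept either "18175" or "HGNC:18175"; anything else (for example a
    # gene symbol) cannot be turned into a valid report link.
    match = re.fullmatch(r"(?:HGNC:)?(\d+)", hgnc_id.strip())
    if match is None:
        raise ValueError(f"not an HGNC ID: {hgnc_id!r}")
    return HGNC_REPORT + match.group(1)


print(hgnc_report_url("18175"))
```

A gene symbol passed as the first parameter would raise an error here, which mirrors the breakage described above for existing template calls.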

Merge C19orf44 ← C19orf44 (gene)
Can someone take care of this page merge on both WP and wikidata? C19orf44 (gene) has a lot of content; C19orf44 has almost nothing. Found them searching for pages/redirects containing "C#orf#" and "(gene)" in the title. I've taken care of another page that simply needed to be moved from "C#orf# (gene)" to "C#orf#".

There are undoubtedly more duplicate gene articles out there, but I'll look for them later. Going to work on programming a solution to fix the dablinks and mistargeted links to non-gene-related pages in my tables first.  Seppi  333  (Insert 2¢) 00:34, 20 November 2019 (UTC)


 * I've added a Requested_moves/Technical_requests entry for the wp page. I've not done the wd item yet, since one refers to the gene and the other to the protein and wikidata currently often keeps both separate. T.Shafee(Evo & Evo)talk 02:53, 20 November 2019 (UTC)
 * .  Seppi  333  (Insert 2¢) 19:13, 23 November 2019 (UTC)

Does anyone know of any resources that provide IPA or "sounds like" pronunciations for human protein-coding gene symbols (specifically, the acronyms like SERPINA1/TAAR1)?
Context: for training a speech to text AI.  Seppi  333  (Insert 2¢) 20:43, 2 October 2019 (UTC)
 * Really interesting question. I don't know that anything systematic exists sadly. T.Shafee(Evo & Evo)talk 10:28, 3 October 2019 (UTC)
 * I know of no such database. Even establishing ground truth for a classifier would be a challenge. They say S-H-H and I say Sonic Hedgehog. One would need an ensemble of the most common pronunciations in the research and clinical genetics communities. It sounds like something the HGNC project could be interested in. -- 18:59, 3 October 2019 (UTC)
 * Oh well. I don't actually need it for my particular intended use, but it would've made it more reusable since I expect to need to reuse at least once next year.  Seppi  333  (Insert 2¢) 16:39, 4 October 2019 (UTC)
 * I wonder whether there is at least a way to guesstimate the pronunciation. E.g. if 4 letters or shorter, just say each letter separately. If 5 or longer check whether it looks pronounceable (based on vowel placement) and otherwise play it safe and say each letter separately. That'd get things like NAD, SERPIN and EGFB correct. Sadly it'd get SERPINA1 wrong (pronounced "serpin-A-1"). T.Shafee(Evo & Evo)talk 01:28, 5 October 2019 (UTC)
 * I'm using IBM's version of speech-to-text since they've apparently got the best-performing base model among the big 4 (MSFT Azure, AMZN AWS, Google Cloud, IBM Watson); their base model uses a phonetic pronunciation for words which it doesn't recognize, so it'll treat any string of text without a vowel as an isolated group of letters to be read individually; I think it'd probably try to pronounce NAD as "nad" though. Based upon the pronunciation rules for a word's sounds-like field, for SERPINA1 I'd use "sir pin A. one".  I might sort all 20000 gene symbols and filter out all the ones which can't be pronounced by IBM's phonetic rules and then manually code those pronunciations later on.  The correct pronunciation of a gene symbol is the only thing I can't get from training the AI on a text corpus. Right now, I'm focusing on training the AI on relevant text corpora for all the relevant domains of a biomedical hackathon as well as sets of medical, molecular biology, and AI-related audio files along with their transcriptions so that it can learn to transcribe words like "subluxation", "transcriptome", "immunoassay", "deep neural network", etc.  It learns that stuff directly from being trained on text corpora alone, but training on an audio file along with its transcription tells the AI exactly how it's pronounced.  Without training the AI on domain specific text corpora, the audio transcription it produces is laughably bad because the base AI model is trained only on common/everyday words for technical (model performance) reasons.
 * In any event, my brother studied film at Berkeley, hence he made a video of his hackathon with a small film crew: I could manually transcribe all of the audio in this video faster than I expect it will take me to adequately train an AI to recognize multiple technical domains.  But, manually transcribing the audio is extremely tedious, whereas I like the challenge this presents and it'll take me much less time to transcribe with a trained AI later on.  All I'd have to do is open my python script, change the input audio file name, and run it.  You can see how poorly an untrained speech-to-text AI performs if you follow the YouTube link above and click the ellipsis tab ("...") below the full screen icon, then click "Open transcript"; only about half the transcript makes any sense.  Seppi  333  (Insert 2¢) 03:16, 5 October 2019 (UTC)
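The length-and-vowel heuristic suggested earlier in this thread could be sketched roughly as follows. Everything beyond the "4 letters or shorter" rule is an assumption made for illustration: treating Y as a vowel, treating a run of three or more consonants as unpronounceable, and spelling out any symbol that contains a digit. The function names are hypothetical.

```python
import re

VOWELS = set("AEIOUY")  # assumption: Y counts as a vowel


def looks_pronounceable(letters: str) -> bool:
    """Crude vowel-placement check on an upper-case, letters-only string."""
    if not any(ch in VOWELS for ch in letters):
        return False
    # Assumption: three or more consonants in a row is hard to read as a word.
    return re.search(r"[^AEIOUY]{3,}", letters) is None


def guess_reading(symbol: str) -> str:
    """Guess whether a gene symbol is spelled out or read as a word."""
    symbol = symbol.upper()
    spell_out = (
        len(symbol) <= 4
        or not symbol.isalpha()  # digits present: play it safe
        or not looks_pronounceable(symbol)
    )
    if spell_out:
        return " ".join(symbol)  # e.g. "N A D"
    return symbol.lower()  # e.g. "serpin"


for s in ("NAD", "SERPIN", "EGFB", "SERPINA1"):
    print(s, "->", guess_reading(s))
```

As noted in the thread, a rule like this still mishandles SERPINA1, which it spells out letter by letter instead of reading it as "serpin-A-1"; such cases would need manual sounds-like entries.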

... Or just a resource for the gene symbols?
I'm ready to tackle this now. Anyone know of any source (i.e., a single webpage or data file [any format, structured or unstructured]) with a list (preferably expert-reviewed) of just the 20,000-ish HGNC-approved gene symbols? I imagine I could probably find this if I looked hard enough in genomics info/data-bases. With the exception of KEGG, I only have familiarity/experience with proteomic and metabolomic DBs; I don't really know what expert-reviewed/annotated genomics DBs are available, so I'm not sure where the best place to start looking for this data would be.

FWIW, it probably wouldn't be a bad idea for this project to have a subpage containing wikilinks to all those symbols for maintenance and bot-related purposes.  Seppi  333  (Insert 2¢) 05:42, 29 October 2019 (UTC)


 * Never mind. I found a data file - 'protein-coding_gene.txt' - here on the HGNC website.
 * The subpage I suggested is now located in two places since it's apparently too large to fit on one page:
 * WikiProject Molecular Biology/Molecular and Cell Biology/Human protein-coding genes1
 * WikiProject Molecular Biology/Molecular and Cell Biology/Human protein-coding genes2


 * In the event anyone cares to update those tables in the future, see WikiProject Molecular Biology/Molecular and Cell Biology/Human protein-coding genes. The Python code I used to generate it follows. All you'd need to do is download an updated   file from HGNC, put the .txt file and the Python file containing the code below in the same working directory, and then just run the Python code; it then creates the   files, which contain wikitext markup for those wikitables, in the same directory.  Seppi  333  (Insert 2¢) 17:11, 31 October 2019 (UTC)
 * Would you happen to know why it takes so long to commit a large edit (~1 MB)? The parser profiling data for when I was editing those pages was >60 seconds, whereas small edits like this one take a fraction of a second (0.067 sec). It seems excessive that it requires more than a minute to write just 1 MB of text to WP.  Seppi  333  (Insert 2¢) 21:51, 31 October 2019 (UTC)
 * Which editor (there are lots) are you using? WhatamIdoing (talk) 12:36, 1 November 2019 (UTC)
 * this one: https://www.mediawiki.org/wiki/Extension:WikiEditor  Seppi  333  (Insert 2¢) 14:48, 1 November 2019 (UTC)
 * Is this a consistent/persistent problem? If it's just one day, then there might have been some kind of glitch somewhere.   WhatamIdoing (talk) 20:39, 3 November 2019 (UTC)
 * Since the HGNC database changes every day, I've updated those 2 pages twice since publishing them last week, most recently today. Each time, it took approximately 60 seconds to publish the edit.  Even loading these diffs takes a while for me: . You can check how long it takes to write an update to those pages on your machine if you're on Windows: just download  this text file from HGNC and  this zipped Windows executable file (it runs this python code on your machine) and run it; it'll provide direct links to edit those pages and indicate which auto-generated text file should be used to replace the source code of each page. Your antivirus software might prevent you from running the executable unless you create an exception for that program though. Addendum: I've automated everything except writing the text to Wikipedia at this point...  Seppi  333  (Insert 2¢) 17:26, 4 November 2019 (UTC)
 * Come to think of it, I should probably just use the WP:Pywikibot library to write a simple script to automatically update those pages every day. I've never gone through the bot approval process though, so IDK if I'd be able to gain consensus to use a bot to write daily updates to 2 new/obscure project pages. I suppose I could get approval to do that if I moved those tables to the article space at List of human protein-coding genes a la List of human genes, but since the list has to span 2 pages due to page size limits, I'm not sure what I'd title the 2nd page. List of human protein-coding genes 2 maybe?  Seppi  333  (Insert 2¢) 17:26, 4 November 2019 (UTC)
 * It's slow for me, too. I don't know if the templates (e.g., gene) make any difference, but huge pages should be expected to be slow, I guess.  The 2017 wikitext mode paints the toolbar sooner but takes longer to get the end of the page loaded.  User:Jdforrester (WMF) might be able to tell you where things get slow, but I suspect that the practical recommendations look like "don't do that".  Multiple shorter pages or moving it all to Wikidata would probably be less painful to maintain.  WhatamIdoing (talk) 23:30, 6 November 2019 (UTC)
 * It's slow because you're writing far too much into a single page, sorry. MediaWiki used to cap pages at 64KiB and discourage edits above 32KiB, and though that hard limit was raised to 2MiB (mostly for very complex templates), it's still built principally for around that size. If we were going to optimise for larger pages, we'd have to drop a bunch of the features that make wikitext slow (templates, lua, etc.). I'd recommend splitting up the pages more aggressively. Jdforrester (WMF) (talk) 15:00, 7 November 2019 (UTC)
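The workflow discussed in this thread (read the HGNC file, emit wikitable markup, and split the result across more, smaller pages) might look roughly like the sketch below. It is not the script referred to above. The column names `symbol` and `name`, the table layout, the plain wikilinks, and the function names are all assumptions made for illustration.

```python
import csv
import io


def read_symbols(tsv_text: str) -> list:
    """Return (symbol, name) pairs from tab-separated text with a header row."""
    # Assumption: the file has "symbol" and "name" columns.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [(row["symbol"], row["name"]) for row in reader]


def wikitable_pages(rows, max_bytes: int) -> list:
    """Render rows as wikitables, starting a new page once max_bytes is exceeded."""
    header = '{| class="wikitable sortable"\n! Symbol !! Name\n'
    footer = "|}\n"
    pages, body = [], ""
    for symbol, name in rows:
        line = f"|-\n| [[{symbol}]] || {name}\n"
        size = len((header + body + line + footer).encode("utf-8"))
        # Close the current page before it grows past the byte budget,
        # but never emit a page with no rows.
        if body and size > max_bytes:
            pages.append(header + body + footer)
            body = ""
        body += line
    if body:
        pages.append(header + body + footer)
    return pages
```

Calling `wikitable_pages(read_symbols(text), max_bytes=...)` with a small byte budget yields many short pages rather than two very large ones, which is the direction recommended in the replies above. The budget itself is a judgment call, not a figure from this discussion.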

Human metagenome and microbiomics
I was surprised to find that there isn't an article on these topics. The more general article on metagenomics has far too broad a scope to be suitable for piped linking since it pertains to aggregation of genomes in an arbitrary environment (e.g., a pond, a person's body, soil, etc.). That said, there are two level 3 sections in that article that pertain to human metagenomics: Metagenomics and Metagenomics. I'm also kind of puzzled as to how there's no article on microbiomics given how much content is contained in the article on the more niche topic of pharmacomicrobiomics. In any event, I'm just posting these here in the event anyone knows of a suitable article for redirecting or wants to create a stub. There's obviously a rather large amount of topical overlap between [human] metagenomics and [human] microbiomics (i.e., metagenomic sequencing is used to characterize all of the genomes within an arbitrary microbiome), so I suppose human metagenome could redirect to human microbiome provided that it's covered there. Similarly, microbiomics could probably be covered in a section of metagenomics.

I wrote a new section in Amphetamine which covers material pertaining to these topics (it's censored in the source) but your citation tool is down and I'm too lazy to manually look up the parameter inputs and write the markup for 4 citations. Would you be able to fix it, please?  Seppi  333  (Insert 2¢) 18:17, 10 July 2019 (UTC)
 * The citation filling tool started working again an hour ago. For the gory details, see T226088. Boghog (talk) 19:30, 10 July 2019 (UTC)
 * Per your request, I have added the reference details. What is up with the flashing stop sign when this article is edited? This is incredibly annoying. Boghog (talk) 05:17, 11 July 2019 (UTC)
 * Ah, thanks. I still have to add/refine it a bit, but will do that later.
 * I was tinkering around with amphetamine's edit notice 30 minutes prior to abruptly leaving Wikipedia for a month a few months ago. Forgot about that, but whatever the case, something in that edit notice needs to draw attention to the 2nd paragraph.  I've forgotten how many times someone has stopped by and unwittingly borked that group of articles by breaking the selective transclusion markup, but I know it's well over 10.  I thought those transclusion-breaking issues were resolved when I switched the transclusion method to WP:LST, but that apparently wasn't the case since Doc James managed to bork it.  The effects of broken syntax are less significant w/ the newer method though. If the markup for the former method was broken, the pages transcluded more-or-less the full amphetamine article multiple times.  If the current method's markup is broken, nothing is transcluded, so I suppose toning it down shouldn't be a problem.  Seppi  333  (Insert 2¢) 15:28, 11 July 2019 (UTC)
 * On a tangential note, I figured I'd ask since chemistry and I apparently do not get along well: that paper mentioned the enzyme was tyramine oxidase and linked to ExPASy (corresponding enzyme entry), but they didn't specify a metabolite. Based upon the reaction listed in the link, wouldn't the product when amphetamine is used as a substrate be this (alpha-methylphenylacetaldehyde)?  Seppi  333  (Insert 2¢) 16:39, 11 July 2019 (UTC)

Discussion at Draft talk:Horizontal transfer of mitochondria
You are invited to join the discussion at Draft talk:Horizontal transfer of mitochondria. Worldbruce (talk) 14:47, 24 June 2019 (UTC)

Christopher Kaelin up for deletion
IMO, well sourced article about a geneticist. But you can help improve it. 7&6=thirteen (☎) 15:52, 4 July 2019 (UTC)

Your input appreciated
Hi all,

I would appreciate your input in this strange case: Wikipedia_talk:Copyright_problems.

--Steven Fruitsmaak (Reply) 20:00, 20 July 2019 (UTC)

Edit Request on ATAC-seq Page
Hi WikiProject Molecular Biology members,

I have made numerous small edits to the ATAC-seq page, a page that is within the purview of your project, to improve its quality. I've also rewritten the whole page to remove promotional and overly-technical content and to add citations and crosslinks. But I have a conflict of interest, so I cannot make the changes myself. I copied my rewrite into the talk page. Could a project member please review my rewrite and use it to replace the content that is there?

Thank you,

cglife.bmarcus (talk) 1:17, 5 September 2019 (UTC)
 * Since it's also relevant to the genetics and compbiol parts of the WP:MolBio, I've also copied it to the general talkpage here. T.Shafee(Evo & Evo)talk 12:29, 6 September 2019 (UTC)

CLB2
Just found a link on Angelika Amon to the redirect CLB2 which is an airport code. Is CLB2 also an abbreviation for Cyclin B2? CambridgeBayWeather, Uqaqtuq (talk), Sunasuttuq 05:50, 30 September 2019 (UTC)
 * Correct. CLB2 typically refers to yeast cyclin B2 in that context (see entry in Uniprot). T.Shafee(Evo & Evo)talk 09:45, 30 September 2019 (UTC)

Functional proteins despite compound heterozygous frameshift mutations
I need some help interpreting the implications of the TNXB entry in this spreadsheet: (Suspect variants.xlsx).

The spreadsheet indicates my brother's TNXB gene is compound heterozygous due to distinct frameshift mutations on each allele, reflecting the autosomal recessive form of "TNXB deficiency". From my understanding, frameshift mutations tend to result in the translation of proteins that are completely functionally borked. So... his genotype predicts a clinical phenotype of "classic-like EDS" (clEDS), which is rather severe.

I've read repeatedly in 20 or so papers on TNXB this past week that "TNXB haploinsufficiency" (associated with having 1 wild type and 1 mutant allele, autosomal dominant inheritance) causes "hypermobility-type EDS" (hEDS), a milder clinical phenotype than clEDS in which hypermobility can manifest with muscle issues, skin issues, and/or vascular issues.

His actual clinical phenotype is User:Seppi333, which notably lacks any skin involvement or easy bruising, both of which always occur in TNXB deficient individuals and are required for a clinical diagnosis of clEDS. He meets the straightforward clinical diagnosis for hEDS listed here. In comparison, I am wild-type; I fail to meet both criterion 1 (Beighton Score: 2/9) and criterion 2 (2A: 1/12 total, 2B: obviously applies, 2C: 0/3) in that list and so criterion 3 is N/A. Edit: I have no clue about my genotype since I've never gotten or needed any form of genome sequencing.

I'm a bit confused as to how to explain this; the only explanation I can think of is that one or both of his compound heterozygous alleles is creating a functional frameshift-mutated tenascin-X protein. Since he doesn't have skin or vascular symptoms and since the loss of a single TNXB allele is sufficient for – and a common cause of – hEDS, it seems like the combined protein function across both alleles can't be significantly worse than having 1 wild-type allele and 1 allele from which a protein isn't translated. Is that rather unusual or are frameshift mutations simply not as pathogenic as I've been led to believe?  Seppi  333  (Insert 2¢) 23:51, 5 October 2019 (UTC)

The penetrance of TNXB deficiency (i.e., the complete loss of tenascin X expression and/or function) in manifesting a phenotype with the 3 major criteria of clEDS is 100% according to the sources I've read; this is primarily because clEDS is the variant of EDS which is defined by that genotype. As for the penetrance of haploinsufficiency causing one or more of the 3 base hEDS phenotypes (muscular, vascular, integumentary), based upon my understanding, it's high but not 100%, and it varies depending upon the study. Also, as I understand it, TNXB haploinsufficiency is only typically associated with the muscular features of hEDS.  Seppi  333  (Insert 2¢) 08:31, 6 October 2019 (UTC)
 * I cannot comment on TNXB, because I have no expertise with the gene or associated disease. Nor am I a medical geneticist. In general, a frameshift can have major effects on the protein. But pathogenicity of a variant depends in part on its location in the protein. A frameshift toward the end of the protein, for instance, may only mess up the very end of the polypeptide and not change protein conformation and/or enzymatic activity too much. Also is this the only possible protein/pathway involved in the disease? Sometimes alternate pathways can partially make up for a broken main pathway. Sometimes putative Mendelian diseases are not so Mendelian. -- 03:07, 6 October 2019 (UTC)
 * Oh, no he definitely has other pathogenic mutations; the ones I listed were just his musculoskeletal, vascular, and gastrointestinal symptoms. His other major set of symptoms is metabolic: he's always struggled to gain weight - both fat and muscle - despite eating sufficient amounts for a normal person to do so.  He's understandably fatigued fairly often.  Also, wound healing for him takes longer than usual for other people and he has hereditary fructose intolerance due to a pathogenic mutation in his aldolase B gene that he's known about for a while. I don't recall any other metabolic symptoms right now, but these have all been lifelong for him.
 * The last set of symptoms is immune and they're a bit perplexing. He used to have chronic issues with plantar warts, which kept recurring despite the use of salicylic acid and numerous cryotherapy treatments.  He claims that his problems resolved around the time he got an HPV vaccine though.  He also used to get frequent fungal skin infections when he was a teenager; they were distressing more for cosmetic reasons (i.e., discoloration of patches on his back/neck) than for anything else.  The infections are at least partially explainable by some of his blood tests revealing low IgG levels.  He's experienced chronic low-grade inflammation in his gastrointestinal tract (based upon various endoscopies and a full-thickness biopsy of his small intestine, it's evident everywhere from his mouth - particularly the uvula - all the way through to the ileum) for a number of years as well; it manifests as redness and mild swelling of oral/gastric/intestinal mucosa.
 * Ever since his very early childhood up until a couple of years ago when he convinced a doctor to perform an experimental surgical treatment in which his uvula was folded into itself and sutured together, he experienced chronic severe nausea and vomiting which often occurred after eating. He's essentially had a hypersensitive vomiting reflex arc his entire life, but it normalized quite a bit after that experimental treatment. He told me he still occasionally experiences nausea, but it is never anywhere near as severe as it used to be. Anyway, given the polygenic nature of his clinical phenotype, it's not surprising that he's been undiagnosed until now.  Other than ADHD – which my brother has as well – I have no medical history suggestive of a genetic disorder. It seems a little unfair that he ended up getting stuck with all that extra genetic baggage.  Seppi  333  (Insert 2¢) 06:35, 6 October 2019 (UTC)
 * Also is this the only possible protein/pathway involved in the disease? It occurred to me you were asking about EDS, not the other stuff I mentioned above. I think it's very likely that the pathogenic structural variant(s) that're causing his metabolic symptoms also exacerbate his EDS phenotype. If any of his TTN gene mutations are pathogenic, that would also exacerbate the phenotype because pathogenic TTN mutations result in joint hypermobility.
 * This is very interesting. One question I had is whether there's any chance of a natural DNA recombination process (such as Homologous recombination) "accidentally" fixing the gene, by swapping the pieces around (you'd end up with one copy broken twice and the other not broken at all).  I don't think that process normally happens in the middle of a gene, but that led me to wonder:  how certain are they that the two frameshift mutations are actually on separate chromosomes?
 * On a related point, is he able to tolerate Quinolone antibiotics, or do they give him tendonitis and joint pain? And what happens if he tries to eat a barely-cooked egg white?  (Please ping me.) WhatamIdoing (talk) 16:01, 7 October 2019 (UTC)
 * Hmm. I wasn't at the hackathon, so my knowledge of each team's findings is pretty much limited to what was covered in the video of the final presentations and what's hosted on the event's github. That said, the team that pointed out that my brother's TNXB gene is compound heterozygous was fairly large compared to most of the other teams (i.e., 14 people) and their expertise was concentrated primarily in bioinformatics, AI, and molecular biology; they lacked someone with a strong medical/clinical background though. Due to the collective expertise in bioinformatics alone, I kind of doubt that they misidentified a monoallelic mutation as a biallelic mutation in TNXB. When going through their findings, the lack of medical/clinical expertise on that team is mildly apparent though.
 * I'm virtually certain my brother has taken ciprofloxacin and/or levofloxacin before since our father was a practicing ER physician for 30+ years before he retired and he wasn't even remotely hesitant to prescribe drugs for us (the last thing he Rxed me before he retired was actually ciprofloxacin and amoxicillin) . Because of that, my brother has tried dozens of different drug therapies for treating his condition (a handful are mentioned here) and I know that, on one occasion, he basically nuked his gastrointestinal microbiome and then consumed between 1-10 trillion CFU of probiotics (that's a small fraction of the number of microbes that reside in a healthy adult's large intestine) over several days.  I don't remember him ever experiencing an adverse drug reaction to an antibiotic or antifungal medication; I know he's never ruptured a tendon, but he has had issues with spinal disc herniation and tried this experimental treatment twice: spinal disc herniation. It significantly improved his back problems each time, but the treatment effects only seem to last ~1-2 years and it's not particularly cheap. I'm not aware of eggs causing him problems, undercooked or otherwise. Why do you think that would be problematic for him?  Seppi  333  (Insert 2¢) 06:50, 8 October 2019 (UTC)
 * Hi Seppi333, I gather that both of these are associated with EDS. Cipro and related drugs seem to (insert lots of handwaving here) interfere with collagen production, resulting in a higher risk of ruptured tendons, so if you've got a problem with collagen's next-door neighbor, then maybe you would have problems with it, too.  And Mast cell activation syndrome seems to be a not-entirely-unusual comorbidity among EDS folks, and it could explain a lot of things.  I made a (perhaps unwarranted) leap from MCAS to Egg white intolerance, because it also involves mast cells.  Both of these things could tend to be confirmatory evidence of the EDS idea.  WhatamIdoing (talk) 15:27, 8 October 2019 (UTC)
 * Ah. I was familiar with the risk of cipro causing tendon ruptures, but not eggs and mast cell activation.  I'm not aware of him having any issues with allergic diseases; of the two of us, I'm actually the one who has allergies. I'm pretty familiar with MCAS since I created the article a while back and did some work on it though.  I'm still learning about EDS and I have a feeling that as more research into the pathophysiology of hEDS/clEDS and TNXB structural variants is published, it will shed light on more of the symptoms that compose my brother's complex clinical phenotype.  Seppi  333  (Insert 2¢) 03:18, 9 October 2019 (UTC)

Problematic links in the wikitables
There's two issues I should address before I even consider moving those tables to the article space. The first is DABlinks. There's a total of 389 gene symbol DAB links in both tables; I can easily fix that by creating a python dictionary with entries like  for all 389 of those genes, then rewriting the gene symbols in the   file if the gene symbol is a key in that dictionary. If I were to do that prior to writing the wikitable markup in the python script, all of the entries for those DAB link gene symbols in the wikitable would be piped links like.
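A minimal sketch of the dictionary approach described above, assuming the gene symbols are already available as plain strings. The names `DAB_TARGETS` and `wikilink` and the two mapping entries are illustrative placeholders, not taken from the actual script, and the real dictionary would need one entry per DAB-linked symbol.

```python
# Hypothetical mapping from ambiguous gene symbols to disambiguated titles.
# The entries are placeholders for illustration only.
DAB_TARGETS = {
    "RAGE": "RAGE (gene)",
    "XK": "XK (gene)",
}


def wikilink(symbol, targets=DAB_TARGETS):
    """Return wikitext for a gene symbol, piping it if it is a known DAB link."""
    target = targets.get(symbol)
    if target is None:
        return f"[[{symbol}]]"        # plain link, no rewrite needed
    return f"[[{target}|{symbol}]]"   # piped link that still displays the symbol


if __name__ == "__main__":
    for s in ("RAGE", "TP53"):
        print(wikilink(s))
```

Doing the rewrite before the table markup is generated means the displayed text in the table stays the bare gene symbol while the link target changes.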

The bigger problem is pages where an article for the gene symbol does not exist and the target article of the wikilink is about something completely unrelated. Suppose, for example, rage was not a disambiguation page, but an article about the emotion and that RAGE redirected to that article. Then RAGE would appear as a blue link in the wikitable and not be a link to a disambiguation page. In that case, the Dablinks tool wouldn't tell me that I should be linking RAGE to RAGE (gene) via a piped link in the table. The only way this could happen is if there's no article about the gene and there's a notable topic about something else with the same name to merit creating a disambiguation page; for any given gene symbol, I'd say there's a very low probability of that happening, but there's literally 20000 gene symbol links in that table, so there's a good chance that at least one such link like that exists in the tables. I'm not sure how I can identify a blue-linked gene symbol where the link target is an article unrelated to the corresponding gene though.

Would anyone happen to know of a way I might be able to detect problematic wikilinks like that?  Seppi  333  (Insert 2¢) 19:53, 4 November 2019 (UTC)


 * Hmm... after reading the Pywikibot documentation, it looks like I can probably build my own tool to detect links to non-gene-related pages using the linkedPages function and some natural language processing with Python's NLTK library; it would not be difficult for me to program that, but it would take quite a while for that script to run since I'd be loading and parsing the content of thousands of pages.
 * The linkedPages function creates a generator that I can use in a loop to iteratively load the contents of every page linked in those tables; if the target page of a wikilink is indeed about a gene/protein, it should contain one or both of those words. I can use the NLTK library to word-tokenize the page contents (i.e., generate a list of all words contained on the page, formatted as strings) and then check to see if "protein" or "gene" is contained in that list of strings. If neither word is included in the page contents, I can return the page title for me to manually review later. That said, if anyone knows of a simpler way to check for problematic links, please let me know.   Seppi  333  (Insert 2¢) 21:15, 4 November 2019 (UTC)
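A rough sketch of the keyword check described above. It assumes the page titles and their text have already been fetched (for example from a Pywikibot generator, which is not shown), and it uses a simple regex tokenizer as a stand-in for NLTK so the example has no dependencies. The function names and the inclusion of the plural keywords are assumptions for illustration.

```python
import re

# Words whose presence suggests the page is about a gene or protein.
KEYWORDS = {"protein", "proteins", "gene", "genes"}


def tokenize(text):
    """Lower-cased word tokens; a regex stand-in for NLTK's word tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def is_gene_related(text):
    """True if at least one keyword appears as a whole word in the text."""
    return not KEYWORDS.isdisjoint(tokenize(text))


def suspect_titles(pages):
    """Yield titles of pages that mention none of the keywords.

    `pages` is any iterable of (title, text) pairs.
    """
    for title, text in pages:
        if not is_gene_related(text):
            yield title


if __name__ == "__main__":
    sample = [
        ("Example airport", "A small regional airport."),
        ("Example gene", "This gene encodes a membrane protein."),
    ]
    print(list(suspect_titles(sample)))
```

Because the match is on whole tokens, words such as "genetic" do not count as a hit, so some relevant pages may still be flagged for manual review.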

Sorry, late to the discussion. Another option is Wikidata. These data items provide links to the corresponding Wikipedia article if one exists. Boghog (talk) 07:49, 5 November 2019 (UTC)
 * That's not a bad idea. It'd definitely be less time-intensive to check the links through Wikidata items as opposed to parsing Wikipedia page content. I'll need to look at the Pywikibot documentation again tomorrow to see how I might go about coding that.  Seppi  333  (Insert 2¢) 09:49, 5 November 2019 (UTC)
 * Was working on automating the script yesterday. Will look into coding a program to check for bad links today or tomorrow.  I noticed M195 is one of the links in the 2nd table (see the withdrawn entries), so I definitely need to check for other links before moving it to the article space.  Seppi  333  (Insert 2¢) 19:35, 6 November 2019 (UTC)
 * ... and AK1 is an approved gene symbol that links to an article about an airport with that airport code.  Seppi  333  (Insert 2¢) 22:05, 6 November 2019 (UTC)

Sigh. The Dablinks tool did not give me all the dablinks in these tables. I'm guessing it only parses a few thousand links before stopping. Working backwards and checking the last 1000 entries in the 2nd table, I found WIZ, WLS, WRN, XDH, XG, XK, XPC, ZYX. I'll need to program a solution to take care of that as well.  Seppi  333  (Insert 2¢) 19:52, 6 November 2019 (UTC)
 * Started working on this again today. Figured out that the dablinks tool parses 5000 links at a time, so I ended up fixing both tables this time around using it. There were >700 dablinks. Now onto fixing the mistargeted links... >.>  Seppi  333  (Insert 2¢) 02:46, 22 November 2019 (UTC)

I'm running my mistargeted link detection script right now. I expect it will take around two hours to finish based upon sample tests. Assuming nothing goes wrong, I'll post the complete list of links here, since I will need help locating the correct link targets if they exist.  Seppi  333  (Insert 2¢) 02:49, 26 November 2019 (UTC)

There's tons of script output spamming my output window right now, but I was curious when I saw a RFNG redirect to Radical Fringe gene. It doesn't have a gene infobox; could you fix the pair of wikidata items for that? I'm not sure what to do with the one that radical fringe gene is currently linked to. Edit: attempted to just go ahead and move the link to the right data item, but it says the other needs to be deleted. >.>  Seppi  333  (Insert 2¢) 04:38, 26 November 2019 (UTC)

Not sure what to do with this gene disambiguation page
Pretty sure this is the very last dablink in the wikitables at the moment.


 * COQ3 (gene)
 * HGNC COQ3 link: (gene name: coenzyme Q3, methyltransferase)
 * Official UniProt name for protein: Ubiquinone biosynthesis O-methyltransferase, mitochondrial

COQ3 (gene) redirects to COQ3, which includes a list of 2 enzymes. The wikidata item for COQ3 in Homo sapiens isn't linked to a WP article:.

From a cursory glance, it looks like 3-demethylubiquinone-9_3-O-methyltransferase is the correct article of those 2. Should the WD item be linked to that and a gene infobox be included in the article?  Seppi  333  (Insert 2¢) 19:10, 23 November 2019 (UTC)


 * It makes sense to merge enzyme EC and gene Wiki articles if and only if there is a one-to-one correspondence between EC number and HUGO gene symbol. In this particular case, a single human gene COQ3 is assigned to more than one EC number (2.1.1.64, 2.1.1.114, 2.1.1.222).  Hence I think it would be better to convert COQ3 to a gene specific page and leave the three enzyme pages (Hexaprenyldihydroxybenzoate methyltransferase, 3-demethylubiquinone-9 3-O-methyltransferase, and 2-Polyprenyl-6-hydroxyphenyl methylase) as is. Boghog (talk) 06:38, 24 November 2019 (UTC)
 * I went ahead and converted COQ3 from a disambiguation to gene specific page. Boghog (talk) 06:49, 24 November 2019 (UTC)
 * "It makes sense to merge enzyme EC and gene Wiki articles if and only if there is a one-to-one correspondence between EC number and HUGO gene symbol." That's useful to know. Thanks for resolving the disambiguation issue.  Seppi  333  (Insert 2¢) 10:10, 24 November 2019 (UTC)

Possible bot task in the future
After I fix the few dozen mistargeted links I mentioned, these tables will contain links to almost all pages in the GeneWiki along with a list of 1 or more UniProt IDs for each gene.

Given that these tables contain all the article links and id numbers I'd need, I could program an algorithm to move all the pages to which the gene symbols are linked to their official UniProt names a la MOS:MCB provided there is only 1 UniProt link associated with the gene symbol; it obviously makes no sense to move a page that corresponds to multiple UniProt IDs. I could also set an upper limit on the length of the string-formatted UniProt name to avoid moving pages to long page titles per this: If the protein name is verbose, either a widely used protein acronym or the official HUGO gene symbol, followed by "(gene)" if necessary to disambiguate. Should I work on a script to move gene pages that are located at gene symbols to the pagename of their official UniProt name, or are there circumstances where doing that would be problematic?  Seppi  333  (Insert 2¢) 21:30, 25 November 2019 (UTC)
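The move criteria proposed above could be expressed roughly as follows. This is only a sketch: the function name, its arguments, and the length cap of 60 characters are hypothetical, since no specific limit is stated in this thread.

```python
MAX_TITLE_LENGTH = 60  # hypothetical cap; no number is given in the discussion


def move_target(current_title, symbol, uniprot_names, max_len=MAX_TITLE_LENGTH):
    """Return the proposed new title for a page, or None if it should stay put.

    `uniprot_names` is the list of official UniProt protein names associated
    with the gene symbol (one name per UniProt ID).
    """
    if current_title != symbol:
        return None   # page is not located at the gene symbol
    if len(uniprot_names) != 1:
        return None   # zero or several UniProt entries: ambiguous target
    name = uniprot_names[0]
    if len(name) > max_len:
        return None   # verbose name: leave the page at the gene symbol
    return name


if __name__ == "__main__":
    print(move_target("GENE1", "GENE1", ["Example protein 1"]))
```

Returning `None` for every doubtful case keeps the script conservative, so anything it skips can still be reviewed by hand.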

Lines 6–11289 in User:Seppi333/GeneListNLP show that the vast majority of pages about human protein-coding genes use the gene symbol as the page name. Using ctrl-F to count them, 8600 pages are located at the gene symbol and 2683 are located on pages with a different title (NB: these numbers include the 226 unrelated pages, so they're not exact counts for each). Also, the fact that only 11283 links were processed suggests that Wikipedia is missing around 8000 articles on human protein-coding genes. On a tangential note, I thought I moved almost all of the pages in the solute carrier family from the gene symbol to the official UniProt name, but that output indicates that I missed a few.  Seppi  333  (Insert 2¢) 09:20, 26 November 2019 (UTC)

Feedback on the wikitables
Specific things I'd like feedback on:  Seppi  333  (Insert 2¢) 05:16, 24 November 2019 (UTC)
 * Genes with "Entry withdrawn" status at the end of the 2nd table. The HGNC website states that genes with that status were previously approved, but are no longer thought to exist. Should I remove those?  Seppi  333  (Insert 2¢) 05:16, 24 November 2019 (UTC)
 * The invariant "Locus group" column. Should I remove the entire column from the table and replace it with data from one of the other fields listed in https://www.genenames.org/help/statistics-and-downloads/ (e.g., location, NCBI gene ID, etc.)? The only field I can't include is the gene name because the resulting table will exceed MediaWiki's page size limit.

Possible project consolidation
There's a discussion going on over at Wikipedia_talk:WikiProject_Biology about whether it is worth consolidating some of the disparate biology wikiprojects. One possibility could be a merger or semi-merger of WP:GEN + WP:MCB + WP:COMBIO + WP:BIOP, since their scopes are well-aligned. Ideas and opinions welcome! T.Shafee(Evo & Evo)talk 01:14, 25 May 2019 (UTC)
 * WP:Gene Wiki may also be included in this. We are still in the early stages of this discussion. Comments are welcome (read:needed)!!
 * I should also add that we are considering bringing some aspects of WP:Wikiproject X to WP:Biology and turning it into a proper meta-project. Whether MCB remains independent or is merged into a larger project, there will be an opportunity to participate in that change as well. Prometheus720 (talk) 03:39, 25 May 2019 (UTC)

Confirmation pre-merger
Hello, based on the consensus at the WP:Biol discussion, this is confirmation of my suggestion to merge: WP:GEN + WP:MCB + WP:BIOP + WP:CELLSIG (possibly + WP:COMBIO) -> into Wikipedia:WikiProject Molecular Biology (name to be confirmed). The new main page should be able to combine all of the information of each project (much of which overlaps) and the talkpage should also centralise discussion to make it more lively and easier for newcomers! Separate tracking tables of article qualities can still be kept by making them 'taskforces' if people think that'll be useful. If people don't object I'll go about redirecting the WP and WT pages to that centralised location next week per this process. T.Shafee(Evo & Evo)talk 13:09, 10 June 2019 (UTC)


 * I have pinged all of you as you are marked as active participants under Wikiproject Directory tool.


 * I generally think the proposed merger would be beneficial as most if not all of the involved projects are well below critical mass. (It is a bit depressing to learn that there are only 4 active WT:MCB editors). Boghog (talk) 16:05, 10 June 2019 (UTC)

Merger complete! See the new unified talkpage. All talkpage archives should be clearly visible and searchable and the new unified WikiProject page is almost complete. T.Shafee(Evo & Evo)talk 12:25, 16 June 2019 (UTC)

A possible Science/STEM User Group
There's a discussion about a possible User Group for STEM over at Meta:Talk:STEM_Wiki_User_Group. The idea would be to help coordinate, collaborate and network cross-subject, cross-wiki and cross-language to share experience and resources that may be valuable to the relevant wikiprojects. Current discussion includes preferred scope and structure. T.Shafee(Evo & Evo)talk 03:04, 26 May 2019 (UTC)

would you kindly help me with this image
Hello everyone, I am the author of the DNA replication image, which I made around 2007, give or take. I have been asked to modify it to incorporate all the comments and corrections made on the discussion page; in addition, someone left the following request at the Graphics Lab on Commons:
 * Pol epsilon should be on the leading strand and attached to the helicase (which should have a ring structure). Pol delta should be on the lagging strand. Pol delta might have some role in leading strand replication (still debated to what extent), but is mostly responsible for lagging strand replication. See also File talk:DNA replication en.svg --大诺史

Now, I am an illustrator and by no means an expert in this subject, so I would really appreciate it if someone with the appropriate knowledge could have a look at this sketch. It isn't yet in its final form, but the elements are in the relative positions I am hoping them to have (everything marked with ? needs confirmation). There are also a ton of questions I would really like to have answered, if that is OK:
 * I have been told that the polymerase that works on the leading strand is attached to the helicase. If so, where do the SSBs (single-stranded binding proteins) connect on that side of the fork? Also, I have seen other diagrams with this polymerase hugging the lagging strand instead, so which one would be correct?
 * I made the sliding clamp on the polymerase seem part of the same body. I am aware it is a completely different enzyme; I just thought it would show that they both work together, and it also gives the drawing some directionality, which seems to be important in this subject. Is this OK?
 * I am very confused about which polymerase is which Greek letter of pol. On the sketch I have made my best guess, but I am still not sure which is which. Also, would it be a problem if I place several polymerases working on different sections of the lagging strand? At the moment the same primase and polymerase must go round and round backwards to do the job.
 * On the last image I was asked at least once a month to change the 3'5' notations on the diagram. I have made a little color hook on the primers to show where the 3' end would be, and made an image section to explain that this is where the polymerase attaches to start working. Would this be helpful, or do you think it will make things more confusing?
 * Is it practical to add the telomerase to the diagram? I am not even sure if it would attach there. Would you find it more useful, or should I leave it out?
 * Because of space constraints I placed the RNase at the same spot as the DNA polymerase that repairs the gap. Is this understandable, or would they need their own gap?
 * In many images I have fished out online (excluding those based on my own image) I have often found that enzymes are much bigger than what I have shown. I don't mind that, because more than scale I am interested in showing function, but it does worry me when I see images like this one where the single strand makes strange loops. Is there anything about this I should be aware of?

If you would be so kind as to answer my questions, I could at least have some degree of confidence in remaking the image in a better, more accurate state. 大诺史 was kind enough to offer this article as a reference. I look forward to your feedback, and thank you for your time. -LadyofHats (talk) 18:58, 5 June 2019 (UTC)

PDB EM rendering for pre-RC images
The articles pre-replication complex, minichromosome maintenance, and origin recognition complex can use some better images for the protein structures in light of the not-so-recent publications of EM structures. My computer and my brain are currently too messed up to sort out the PyMol or whatever rendering thing, so would anyone please render some views (two will do: front and top) for PDB ID (pre-RC OCCM),  (MCM; 5UDB will do too), and  (ORC; 5UDB will do too)? --Artoria2e5 🌉 23:48, 23 June 2019 (UTC)

Gene symbol link targets that don't include the words "protein", "proteins", "gene" or "genes"
So, below is what my algorithm found.

If the mistargeted gene symbol isn't a redirect, it's listed as PAGENAME on that line.

If the term is a redirect, the format below is REDIRECT → PAGENAME.  Seppi  333  (Insert 2¢) 05:15, 26 November 2019 (UTC)

I'm not sure how to address this. For each gene symbol below for which no relevant article exists, should we just create a DAB page with a redlink to SYMBOL (gene) listed on it? A lot of the symbols below should clearly link to a disambiguation page.  Seppi  333  (Insert 2¢)


 * Actually, I think it would make sense for me to just pipe the links for the gene symbols to SYMBOL (gene) for now and maybe get some assistance from WP:WikiProject Disambiguation on this to address the DAB issues. This approach doesn't work for the enzyme pages though since, as mentioned above: "It makes sense to merge enzyme EC and gene Wiki articles if and only if there is a one-to-one correspondence between EC number and HUGO gene symbol."  I could really use some help going through the enzyme page links below to see whether or not the gene symbol should redirect there or to SYMBOL (gene). I intend to deal with all the others myself using the algorithm I've described below.  Seppi  333  (Insert 2¢) 09:39, 26 November 2019 (UTC)
 * Assistance needed with enzyme entries 7, 129, and 179. The issue is described on the line for that entry. Addendum: I'm just going to ignore the issues and pipe the links to the parenthetically disambiguated gene symbol for the 3 entries. If anyone has a better solution based upon what I wrote below, feel free to chime in. I'm going to leave the gene symbols for the other enzymes in this list unpiped since they're the correct targets.  Seppi  333  (Insert 2¢) 05:53, 27 November 2019 (UTC)
 * Nvm. It's readily apparent to me what to do now. FYI: I've been merging and redirecting a number of duplicate WP articles and moving sitelinks from protein pages to gene pages on WD when a 1:1 correspondence exists between them.  Seppi  333  (Insert 2¢) 00:18, 30 November 2019 (UTC)
 * Note to self: Still need to address the HADH link in the list below and in the list of human protein-coding genes table.  Seppi  333  (Insert 2¢) 00:20, 30 November 2019 (UTC)


 * Script output

If you'd like to help me out with addressing these links, feel free to edit this list with relevant comments (e.g., page moves, disambiguation progress, etc.) to reflect your progress. I really don't care about formatting; just try to maintain the list numbering (e.g., by using ) if you add a comment on a new line. All of the highlighted entries are about enzymes.  Seppi  333  (Insert 2¢) 06:57, 27 November 2019 (UTC)

→ not mistargeted, no disambiguation necessary → requires disambiguation.

Use to mark set index articles for enzymes if you find any in this list.


 * 1) AARD → AARD code  helped disambiguate this.  Seppi  333  (Insert 2¢) 06:57, 27 November 2019 (UTC)
 * 2) ADSL → Asymmetric digital subscriber line
 * 3) AK4 → Ak 4 rifle
 * 4) AK5 → Ak 5
 * 5) AK9 → AK-9
 * 6)   lists a 1:1 correspondence between the enzyme EC number and this gene symbol  Seppi  333  (Insert 2¢) 05:53, 27 November 2019 (UTC)
 * 7)  –  lists the gene symbols ALG10 and ALG10B for this EC number.  Not sure what to do in this context.  Seppi  333  (Insert 2¢) 05:53, 27 November 2019 (UTC)
 * 8)   lists a 1:1 correspondence between the enzyme EC number and this gene symbol  Seppi  333  (Insert 2¢) 05:53, 27 November 2019 (UTC)
 * 9) ALLC → European Association for Digital Humanities
 * 10) APEH → AP European History
 * 11) APOD → Astronomy Picture of the Day
 * 12) ARSF → Natural Environment Research Council
 * 13) ASL → American Sign Language
 * 14) ATRIP → International Association for the Advancement of Teaching and Research in Intellectual Property
 * 15) AVEN → Asexuality
 * 16) AVIL → Pheniramine
 * 17) BATF → Bureau of Alcohol, Tobacco, Firearms and Explosives
 * 18) BBC3 → BBC Three
 * 19) BIVM → Vestmannaeyjar Airport
 * 20) BMX
 * 21) BRF1 → Belgischer Rundfunk
 * 22) BRF2 → Belgischer Rundfunk
 * 23) CAD → Computer-aided design
 * 24) CBS
 * 25) CBSL → Central Bank of Sri Lanka
 * 26) CCN2 → Grand Manan Airport
 * 27) CCN3 → Caroline Aerodrome
 * 28) CCN4 → Conn Aerodrome
 * 29) CCNY → City College of New York
 * 30) CCSAP → Srikrishna Committee on Telangana
 * 31) CCT2 → Cookstown Airport
 * 32) CCT5 → South Brook Water Aerodrome
 * 33) CEPT1 → CEPT Recommendation T/CD 06-01
 * 34) CES3 → List of heliports in Canada
 * 35) CGAS → Children's Global Assessment Scale
 * 36) CGB2 → Carstairs/Bishell's Airport
 * 37) CGB3 → Picton (Greenbush) Aerodrome
 * 38) CHGA → Commission on HIV/AIDS and Governance in Africa
 * 39) CHIA → Children's Hope In Action
 * 40) CHP2 → Belwood (Heurisko Pond) Water Aerodrome
 * 41) CINP → Christmas Island National Park
 * 42) CIPC → CIPC-FM
 * 43) CKLF → CKLF-FM
 * 44) CLK3 → Lucknow Airpark
 * 45) CLK4 → Saint-Michel-des-Saints/Lac Kaiagamac Water Aerodrome
 * 46) CLMP → Council of Literary Magazines and Presses
 * 47) CLTA → Chinese Language Teachers Association
 * 48) CMC2 → Misericordia Community Hospital
 * 49) CNN3 → Shelburne/Fisher Field Aerodrome
 * 50) CPA4 → Simcoe (Dennison Field) Airport
 * 51) CRB2 → Cottam Airport
 * 52) CRCP → Consumer Rights Commission of Pakistan
 * 53) CROT → Controlled NOT gate
 * 54) CSF3 → Poste Montagnais Airport
 * 55) CSH2 → Isle-aux-Grues Airport
 * 56) CSN3 → Saint-Jérôme Aerodrome
 * 57) CST9 → Mont-Tremblant/Lac Ouimet Water Aerodrome
 * 58) CTSH → Cognizant
 * 59) CTU2 → Fontages Airport
 * 60) CUTA → Canadian Urban Transit Association
 * 61) CYREN
 * 62) DMTN
 * 63) DMWD → Department of Miscellaneous Weapons Development
 * 64) DSEL → Domain-specific language
 * 65) DST → Daylight saving time
 * 66) DSTN → Dual Scan
 * 67) DTL → Diode–transistor logic
 * 68) DXO → Dextrorphan
 * 69) EBF3 → Electron-beam freeform fabrication
 * 70) ELL2 → Encyclopedia of Language and Linguistics
 * 71) EME2 → European Movement for Efficient Energy
 * 72) ENSA → Entertainments National Service Association
 * 73) EPOP → Employment-to-population ratio
 * 74) EPYC → Epyc
 * 75) ESPN
 * 76) FMOD
 * 77) GALP → Glyceraldehyde 3-phosphate
 * 78) GFY draftified the previous occupant of this pagename (Draft:GFY) and created the gene/protein article
 * 79) GGCT → Gyan Ganga College Of Technology
 * 80) GMDS → Generalized multidimensional scaling
 * 81) GMIP → Gerakan Mujahidin Islam Patani
 * 82) GPS2 → Global Positioning System
 * 83) GPX2 → GP2X
 * 84) HECA → Cairo International Airport
 * 85)  –  lists a 1:1 correspondence between the EC number 3.1.2.4 and HIBCH.  Seppi  333  (Insert 2¢) 05:53, 27 November 2019 (UTC)
 * 86) HPCA → High Performance Computing Act of 1991
 * 87) ID3
 * 88)  –  lists a 1:1 correspondence between the EC number 2.1.1.49 and INMT.  Seppi  333  (Insert 2¢) 05:53, 27 November 2019 (UTC)
 * 89)  –   lists a 1:1 correspondence between the EC number 2.7.1.151 and IPMK.  Seppi  333  (Insert 2¢) 05:53, 27 November 2019 (UTC)
 * 90) IRGC → Islamic Revolutionary Guard Corps
 * 91) ISX → Iraq Stock Exchange
 * 92) KAZN
 * 93) KDSR
 * 94) KEL → Kiel Airport
 * 95) KIZ
 * 96) KLB
 * 97) KLLN
 * 98) KMO → Knowledge Master Open
 * 99) KNCN
 * 100) KYNU
 * 101) LEP → Large Electron–Positron Collider
 * 102) LIPI → Indonesian Institute of Sciences
 * 103) LIPK → Forlì Airport
 * 104) LOX → Liquid oxygen
 * 105) LTV1 → Latvijas Televīzija
 * 106) LVRN → LoveRenaissance
 * 107) LXN → Jim Kelly Field
 * 108) MAFA
 * 109) MAGIX → Magix
 * 110) MAL2 → Western Mallee
 * 111) MBIP → Iskandar Puteri City Council
 * 112) MCAT → Medical College Admission Test
 * 113) MGMT
 * 114) MIB2 → Men in Black II
 * 115) MIDN → Midshipman
 * 116) MIOS
 * 117) MLEC → Movement for the Liberation of the Enclave of Cabinda
 * 118) MLIP → Mackenzie Large Igneous Province
 * 119) MOK
 * 120) MPEG1 → MPEG-1
 * 121) MRAP
 * 122) MRM1 → Rassvet (ISS module)
 * 123) MSLN → Magical Girl Lyrical Nanoha
 * 124) MSN
 * 125) MSX2 → MSX
 * 126) MT4 → MetaTrader 4
 * 127)  –  is sort of confusing to me.  I'm pretty sure that's 1:1 since both entries list the gene symbol, but I'm not sure what the ST20-MTHFS readthrough entry is.  Seppi  333  (Insert 2¢) 05:28, 27 November 2019 (UTC)
 * 128) MTR
 * 129) MVD → Ministry of Internal Affairs (Russia)
 * 130) MVK → Methyl vinyl ketone
 * 131) MVP → Most valuable player
 * 132) MYNN → Lynden Pindling International Airport
 * 133) NACA → National Advisory Committee for Aeronautics
 * 134) NAGA → North American Grappling Association
 * 135) NANP → North American Numbering Plan
 * 136) NBL1
 * 137) NEBL → North European Basketball League
 * 138) NEMF → New England Motor Freight
 * 139) NES → Nintendo Entertainment System
 * 140) NFIC → National Fire Information Council
 * 141) NGEF
 * 142) NHS → National Health Service
 * 143) NKRF → Kidney Research UK
 * 144) NNAT → Naglieri Nonverbal Ability Test
 * 145) NPB → Nippon Professional Baseball
 * 146) NRK
 * 147) NRL → National Rugby League
 * 148) NXN → Nike Cross Nationals
 * 149) ODAM → International Strategic Research Organization
 * 150) OSCAR → Amateur radio satellite
 * 151) OSTN → Open Student Television Network
 * 152) PATJ → Tok Airport
 * 153) PCP2 → PCP site 2
 * 154) PCTP → Portuguese Workers' Communist Party
 * 155) PDF
 * 156)  –  lists a 1:1 correspondence between the enzyme EC number and this gene symbol  Seppi  333  (Insert 2¢) 05:53, 27 November 2019 (UTC)
 * 157) PIFO → Versailles Saint-Quentin-en-Yvelines University
 * 158) PMCH → Patna Medical College and Hospital
 * 159) PNOC → Philippine National Oil Company
 * 160) POP1 → Post Office Protocol
 * 161) POP4 → Post Office Protocol
 * 162) PPCS → Post-concussion syndrome
 * 163) PPIE → Panama–Pacific International Exposition
 * 164) PPIG → Psychology of programming
 * 165) PPL → PPL Corporation
 * 166) PREP → PowerPC Reference Platform
 * 167) PRTG → PRTG Network Monitor
 * 168) PSD2 → Payment Services Directive
 * 169) PSG1 → Heckler & Koch PSG1
 * 170)  –  lists a 1:1 correspondence between the enzyme EC number and this gene symbol  Seppi  333  (Insert 2¢) 05:53, 27 November 2019 (UTC)
 * 171) QPRT → Qualified personal residence trust
 * 172) RAX → Radio Aurora Explorer
 * 173) RBFA → Royal Belgian Football Association
 * 174) RDX
 * 175) REST → Representational state transfer
 * 176) RFK → Robert F. Kennedy
 * 177)  - This entry is not about the human RGL4 gene or encoded protein; per, the target of RGL4 (Rhamnogalacturonan endolyase) is not a human enzyme. RGL4 (gene) does not exist.  This link requires disambiguation.  Seppi  333  (Insert 2¢) 05:28, 27 November 2019 (UTC)
 * 178) RHOF → National Radio Hall of Fame
 * 179) RHOV → The Real Housewives of Vancouver
 * 180) RTL4 → RTL 4
 * 181) RTL5 → RTL 5
 * 182) RTL9
 * 183) RTP1
 * 184) RTP2
 * 185) RTP3
 * 186) RTP4 → Rádio e Televisão de Portugal
 * 187) RTP5 → Rádio e Televisão de Portugal
 * 188) SBSN → Santarém-Maestro Wilson Fonseca Airport
 * 189) SCEL → Arturo Merino Benítez International Airport
 * 190) SCIMP → Silent Circle Instant Messaging Protocol
 * 191) SCLY → Dom Sicily
 * 192) SELL
 * 193) SHD → Shenandoah Valley Regional Airport
 * 194) SHPK → Kosovo Police
 * 195) SI → International System of Units
 * 196) SIAE
 * 197) SMS
 * 198) SNCB → National Railway Company of Belgium
 * 199) SP100 → SP-100
 * 200) SPARC
 * 201) SPRN → Main Centre for Missile Attack Warning
 * 202) SRMS → Canadarm
 * 203) SSX3 → SSX 3
 * 204) STYX → Styx
 * 205) SUCO → State University of New York at Oneonta
 * 206) SVOP → Save (baseball)
 * 207) SYP → Syrian pound
 * 208) TACR2 → TACR2 (Range Rover)
 * 209) TEPP → Tetraethyl pyrophosphate
 * 210) THEMIS
 * 211) TIFA → Trade and Investment Framework Agreement
 * 212) TSR1 → RTS 1 (Swiss TV channel)
 * 213) TXK
 * 214) UBC → University of British Columbia
 * 215) UFC1 → UFC 1
 * 216) USF3 → United States Formula Three Championship
 * 217) VASP
 * 218) VHLL → Very high-level programming language
 * 219) VIP → Very important person
 * 220) VMAC
 * 221) WAPL
 * 222) WDCP → WDCQ-TV
 * 223) WTAP → WTAP-TV
 * 224) WTIP

An algorithm to locate obscure gene pages
After thinking about how to algorithmically deal with the problem of locating existing articles for the symbols above that do not redirect to enzyme pages – and all the redlinked gene symbols in the wikitables for that matter – I realized that I could do the following in sequence for every gene symbol:

Iterate this over the set of all protein-coding gene symbols:
 * 1) create an empty list for a gene symbol's corresponding gene/protein names and aliases
 * 2) use pyHGNC to input the gene symbols and obtain the gene name/aliases, adding them to the name list, storing the UniProt ID(s) and IUPHAR ID (if applicable) for each symbol in local variables
 * 3) use pyUNIPROT to input the UniProt ID(s) obtained from pyHGNC to get the official UniProt name and all of the alternative names for the protein, adding them to the name list
 * 4) use pyGtoP to input the IUPHAR ID obtained from pyHGNC to get the official IUPHAR names and previous/unofficial names, adding them to the name list
 * 5) use some basic programming to remove the duplicate names/aliases from the name list for a given gene symbol
 * 6) and finally, use the wikipedia python API to obtain the search results for all of the aforementioned gene and protein names and aliases in the names list for a given gene symbol.
Lastly:
 * If there are no search matches for any of the names, I'll do nothing.
 * If one exact match exists, I'll move it to the UniProt name and create the pair of gene symbol redirects as necessary.
 * If two or more exact matches exist, I'll merge the pages and move it/create the redirects as necessary.

Despite the fact that I will very likely miss relevant pages, I'm just going to automate the last step to test for an exact match with the top search result for every name/alias of a gene symbol. If I find multiple search results and the solution isn't immediately obvious to me, I'll just ignore it and address ones that I can easily deal with.  Seppi  333  (Insert 2¢) 11:43, 27 November 2019 (UTC)
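The name-collection and decision steps above can be sketched as follows. The database queries and the Wikipedia search are passed in as plain callables, so nothing here relies on the actual pyHGNC, pyUNIPROT, pyGtoP, or wikipedia APIs; all function names are hypothetical, and the case-insensitive comparison is a simplifying assumption.

```python
def collect_names(symbol, lookups):
    """Merge names/aliases from several sources, dropping duplicates.

    `lookups` is a list of callables, each mapping a gene symbol to an
    iterable of names; they stand in for the HGNC, UniProt and IUPHAR queries.
    """
    seen, names = set(), []
    for lookup in lookups:
        for name in lookup(symbol):
            key = name.casefold()
            if key not in seen:   # case-insensitive de-duplication
                seen.add(key)
                names.append(name)
    return names


def exact_matches(names, top_result):
    """Titles of top search results that exactly match one of the names.

    `top_result` maps a query string to the title of the top search
    result, or None if the search returned nothing.
    """
    hits = []
    for name in names:
        title = top_result(name)
        if title is not None and title.casefold() == name.casefold():
            if title not in hits:
                hits.append(title)
    return hits


def decide(hits):
    """Map the number of exact matches to the action listed above."""
    if not hits:
        return "skip"
    if len(hits) == 1:
        return "move"
    return "merge"
```

Only the top search result per name is checked, which matches the shortcut described above and will miss pages that rank lower in the results.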

Are there any other notable organizations that assign names to genes/proteins (either all of them or specific subsets of them, similar to how IUPHAR assigns names to receptors) that I should know about? Literally every database I know about has a python API, so I could likely obtain relevant IDs from pyHGNC or pyUniProt to query any other relevant databases in the manner I've described above. Decided I'm not going to bother with this. Sorry for the ping.  Seppi  333  (Insert 2¢) 20:05, 26 November 2019 (UTC)

I apologize to everyone in this WikiProject for practically spamming the page with issues related to the wikitables I'm working on; I imagine it might be annoying to some people. I'm very close to finishing my work on them, so I won't be frequently posting new threads to the project talk page for much longer.  Seppi  333  (Insert 2¢) 20:05, 26 November 2019 (UTC)

I'm done with the tables now. I'm going to write the algorithm I described above, but I'm going to ignore anything I can't quickly address by myself.  Seppi  333  (Insert 2¢) 11:43, 27 November 2019 (UTC)

New RFBA since I now have the data that I was going to program this algorithm to obtain – Bots/Requests for approval/Seppi333Bot 2. Haven't processed the query results yet, but I expect it'll permit creation of roughly 2000 gene symbol redirects.  Seppi  333  (Insert 2¢) 01:29, 20 December 2019 (UTC)

Prime editing
Prime editing has hit the mainstream press, with lots of hyperbole about its potential, in spite of it still being at the laboratory proof-of-concept stage. I've created a stub article for it; can knowledgeable people please help improve the article? -- The Anome (talk) 09:33, 22 October 2019 (UTC)

Request for information on WP1.0 web tool
Hello and greetings from the maintainers of the WP 1.0 Bot! As you may or may not know, we are currently involved in an overhaul of the bot, in order to make it more modern and maintainable. As part of this process, we will be rewriting the web tool that is part of the project. You might have noticed this tool if you click through the links on the project assessment summary tables.

We'd like to collect information on how the current tool is used by....you! How do you yourself and the other maintainers of your project use the web tool? Which of its features do you need? How frequently do you use these features? And what features is the tool missing that would be useful to you? We have collected all of these questions at this Google form where you can leave your response. Walkerma (talk) 04:24, 27 October 2019 (UTC)

MEDRS sourcing on molecular and cell biology articles
The following discussion may be of interest to this project: Wikipedia_talk:WikiProject_Medicine. --Signimu (talk) 17:50, 1 November 2019 (UTC)

Bots/Requests for approval/Seppi333Bot
I wrote the bot script yesterday morning and it's been operating since then. Pending approval of my bot and validating the links as mentioned in the section below, I'll move those tables to the mainspace.  Seppi  333  (Insert 2¢) 18:16, 6 November 2019 (UTC)


 * I haven't programmed an algorithm to detect the mistargeted links in the wikitables yet (there's only a few dozen mistargeted links in both tables based upon their rate of occurrence in the few thousand links I've followed); however, in order to follow-up on the discussion in the request for approval page, I moved the tables to the mainspace at List of human protein-coding genes 1 and List of human protein-coding genes 2. If anyone has any feedback/suggestions pertaining to the page layout or the tables themselves, please let me know.  Seppi  333  (Insert 2¢) 02:33, 24 November 2019 (UTC)


 * My assumption is that you are creating this table so that you can quickly find GeneWiki articles for specific genes. Some time ago, I had a bot add a redirect " (gene)" for every GeneWiki article (see BFA:BogBot_3).  This provided an unambiguous way of locating a GeneWiki article.  These redirects should probably be updated.  As you point out, there are limitations to the size of pages.  Why not rely on Wikipedia's search engine instead? Boghog (talk) 11:05, 24 November 2019 (UTC)


 * My primary personal motivation for adding that table was mostly just curiosity and a desire to be able to visualize the completeness of different gene groups in the GeneWiki and the completeness of our coverage of protein-coding genes as a whole (i.e., compare # red vs # blue links). Due to how fragmented the GeneWiki is (mostly due to PBB's bugs), I don't think I'd ever use those tables to navigate to a specific gene. It could serve as a centralized internal navigation page for protein-coding GeneWiki pages, but all the articles on protein-coding genes that are located at an obscure pagename (i.e., gene or protein alias) w/o redirects from the gene symbol would need to be located and moved to the right place first.  I've found and corrected a few of those in the past, but I can't remember the pagenames off the top of my head.
 * I added it to the article space because several non-editors I asked thought it would be interesting and/or useful. Also, if you hadn't disambiguated the gene pages that way, it'd have taken me a lot longer to disambiguate those links, so ... .  Seppi  333  (Insert 2¢) 20:57, 25 November 2019 (UTC)
 * I forgot to add, I broke this list up into 4 pages due to how large they were as 2 relative to those listed in Special:LongPages. Now, 3 of them are in the top 10 largest mainspace pages.  Seppi  333  (Insert 2¢) 21:19, 25 November 2019 (UTC)

Deletion of EYCL1, EYCL1 (gene) and Eye color 1 (green/blue)
Identified in User:Certes/Gene links; I've PRODed these per. If you agree, you may want to add a template.  Seppi  333  (Insert 2¢) 01:44, 30 November 2019 (UTC)
 * Is there a specific reason given in WP:DEL-REASON that the PROD is based on? If not, then it should be dePRODed, since it is an inappropriate use of PROD. In fact, is there any discussion where being withdrawn in HGNC is a valid reason for deletion? As far as I can see, it has not disappeared since it is still in other databases, merely that HGNC chose not to list it. If it gets dePRODed, you can start an AfD so that such deletion can be discussed.  Hzh (talk) 15:46, 30 November 2019 (UTC)
 * Don’t care enough to bother. FWIW, look up what “withdrawn entry” actually means as a status label.  Seppi  333  (Insert 2¢) 00:48, 1 December 2019 (UTC)
 * It is irrelevant for PROD, because notability of an article may still exist even if the object of the article no longer exists. For example, you can still have an article about phlogiston even if it has been shown not to exist. Such articles need a proper discussion to determine if they still have a reason to exist. The discussion can be wide-ranging to cover all entries that HGNC had withdrawn so that all such articles can be deleted unless there are specific reasons not to. Hzh (talk) 02:15, 1 December 2019 (UTC)
 * There’s exactly 1 30-year-old publication about it listed in databases, 0 papers on PubMed indexed to the symbol and 0 papers indexed to “eye color 1”. I assumed this was evident from the link in the reason I left in the prod template, but perhaps that’s not the case. If you feel a single primary source establishes notability for the topic despite the fact that its existence is in doubt, fine. Seems odd to me. Don’t really feel like arguing this though, so we’ll keep it.  Seppi  333  (Insert 2¢) 03:06, 1 December 2019 (UTC)
 * The original paper was still cited in 2014, so it hasn't disappeared. It seems that HGNC chose not to maintain symbols for phenotypes, which may be the reason for its withdrawal rather than it not existing (its locus is phenotype only rather than a gene product), so this discussion may be going in completely the wrong direction. Hzh (talk) 04:09, 1 December 2019 (UTC)

Restoring
On the one hand, this bot had some problematic bugs prior to being blocked; I've personally had to merge/retarget dozens of erroneous pages it created. As far as I am aware, it correctly wrote page content, but sometimes got the page name wrong, created a duplicate entry, and/or created a redirect to the wrong pagename.

On the other hand, subsequent to identifying the 215 mistargeted links in the list of human protein-coding genes pages, I've only created 4 new gene/protein stubs while attempting to unbork the articles, redirects, and set index articles identified by my algorithm, as well as creating entries for redlinks on DAB pages (e.g., ALLC (disambiguation)) that WT:WPDAB is helping to create. A few dozen gene articles need to be created to fully address those issues. I currently need to create the 2 redlinked entries listed in User:Certes/Gene links, but I would rather repeatedly bang my head against a wall than create another due to how tedious it is to create new gene articles with all the minutiae involved.

This led me to an idea that I hope is a viable solution to the problems PBB and I have, assuming that everyone is on board. Consider the following:
 * 1) The biogps.org website still appears to support the user-activated feature for creating gene articles via, although the bot obviously can't edit right now so it doesn't work.
 * 2) The bot is somewhat error-prone, so a human needs to verify that there are no problems with the pages it creates.
 * 3) Abandoned pages in the draft namespace are eventually deleted and there is obviously no harm at all in the creation of duplicate or mistitled entries in that namespace.

A relatively simple solution to keeping my brain intact, my wall stain-free, and PBB's bugs in check is to modify the bot code so that it writes gene articles to the corresponding pagename in the draft namespace upon user activation at biogps.org or via other means. Whoever activates the bot would then have the option of moving the created page to the mainspace. Since it would necessarily be the responsibility of the person who moves that draft to the mainspace to validate the page, this should effectively address all the past concerns about this bot.

would you be open to unblocking the bot if it wrote gene articles to the draft namespace so that they can be moved pending validation?

if either of you are still around and biogps can still trigger PBB to create a gene page, would you be willing to recode the bot so that it writes the corresponding gene entry to Draft:Pagename instead of Pagename?  Seppi  333  (Insert 2¢) 04:11, 2 December 2019 (UTC)

For a much less self-centric motivation: WP has ~11500 bluelinked protein-coding gene symbols and ~8500 redlinked ones. PBB created ~9400 articles (non-redirect, still live pages) since it was created. Given its original purpose, I suspect that the majority of the 9400 pages PBB created are the articles or targets of the redirects located at those 11500 symbols. So, in spite of its bugs, Wikipedia needs algorithms like PBB if we’re ever going to finish creating the entries for the other half of the exome.  Seppi  333  (Insert 2¢) 04:12, 2 December 2019 (UTC)


 * There are many genes whose function has not yet been determined and which have not been mentioned in reliable secondary sources, hence these genes fail WP:NOTABILITY requirements. Articles for these genes should probably not be created until something more about their function is known.
 * While the ProteinBoxBot has been blocked, the BioGPS tool still is able to produce the text for a Gene Wiki page (see for example Genereport ALLC and press "Toggle Stub Code").  This can be copied and pasted to create a new article. Boghog (talk) 04:39, 2 December 2019 (UTC)


 * "There are many genes whose function has not yet been determined and which have not been mentioned in reliable secondary sources, hence these genes fail WP:NOTABILITY requirements" – I’m well aware.
 * There’s also protein-coding genes that haven’t even been identified yet. I’m not advocating the overnight creation of 8500 articles.  I’m saying we need PBB to even attempt it over time. Addendum: will look into what you linked when back on a computer; using mobile right now.  Seppi  333  (Insert 2¢) 04:50, 2 December 2019 (UTC)


 * Hmm. Wasn't aware of that. Well, I'm not sure why we haven't been listing that tool on WP:MCB, WP:WikiProject Genetics, or WP:MOLBIO, but we're going to now. That tool deals with almost all the BS I mentioned by fully automating the creation of the article's source code, provided there's a non-empty summary field in the NCBI gene entry for the corresponding gene in Homo sapiens; the only manual work required is adding project templates and creating the sitelink.  Seppi  333  (Insert 2¢) 05:31, 2 December 2019 (UTC)

Ty for creating that; makes all the work ahead of me seem much more bearable now.  Seppi  333  (Insert 2¢) 06:14, 2 December 2019 (UTC)
 * Great! Glad that work-around is working out for you...  Best, Andrew Su (talk) 16:39, 2 December 2019 (UTC)

Folding@home FAR
I have nominated Folding@home for a featured article review here. Please join the discussion on whether this article meets featured article criteria. Articles are typically reviewed for two weeks. If substantial concerns are not addressed during the review period, the article will be moved to the Featured Article Removal Candidates list for a further period, where editors may declare "Keep" or "Delist" the article's featured status. The instructions for the review process are here. GamerPro64 17:18, 9 December 2019 (UTC)

Non-helical models of DNA structure
I saw a Science Reference Desk question about this article. I've edited some of the references but have concerns about the content in this article. I've started a discussion at Talk:Non-helical models of DNA structure about my concerns. Any and all comments, edits, etc welcome. Thanks, EdChem (talk) 04:14, 13 December 2019 (UTC)

Articles for deletion/List of human protein-coding genes 1
I figured someone would do this sooner or later.  Seppi  333  (Insert 2¢) 13:08, 2 January 2020 (UTC)

Question about a type of article
So, this is something I wondered about years ago, but it's time for me to raise it again. What are people's thoughts about writing articles like List of human Nrf2 targets (i.e., lists of genes that are regulated by a given transcription factor)? They might be of interest to readers, but I don't know what kind of issues they might raise. Jo-Jo Eumerus (talk) 19:01, 17 April 2020 (UTC)


 * I assume you mean target genes of a given transcription factor (e.g., NFE2L2). My worry is that the underlying experimental data, for example from ChIP sequencing, may be somewhat unreliable.  Are there databases that list such information and include a reliability score?  Also, this could be quite messy.  Regulated genes may be context-specific (regulated in some tissues and not others). Boghog (talk) 19:53, 17 April 2020 (UTC)
 * To be honest, I was thinking of using more focused sources (publications discussing one or a few particular target genes) rather than databases. Great lists with little supplemental information for each item are something I don't think makes for great sourcing. Jo-Jo Eumerus (talk) 08:44, 29 April 2020 (UTC)
 * Yes, I agree it would be much better to base these lists on review articles where there are multiple lines of evidence supporting a given target gene. This would be better than a reliability score reported in a database.  Boghog (talk) 13:27, 29 April 2020 (UTC)
 * Somehow I don't think there'll be many such reviews. At best, you'd find a paper saying "previous studies[1][2] indicate that HMOX1 is a Nrf2 target gene" which technically satisfies WP:SECONDARY requirements. Jo-Jo Eumerus (talk) 13:35, 29 April 2020 (UTC)

Genetic inheritance of virtually identical rare mutations
First off, I feel like a bit of a dumbass for asking if one of my brother's frameshift mutations could be functional a while back because I didn't know how to read this earlier in the annotated file I had linked:


 * p.G362fs*158
 * p.G362fs*39
 * c.9050A>G
 * c.9044A>G

According to UniProt, the canonical protein isoform of TNXB has a length of 4244 AA. These frameshifts both occur at glycine residue 362, are >90% truncated relative to the wild-type tenascin-X protein, and they lack nearly all functional domains on the wild-type protein (https://www.uniprot.org/uniprot/P22105#family_and_domains). Complete deficiency in this gene is supposedly highly penetrant for clEDS, but it's somehow only partially penetrant in my brother's case; it's definitely not because one of these proteins is expressed, partially folded, and somehow functional on the sequence preceding the mutation though.
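As a quick sanity check on the ">90%" figure, the arithmetic can be sketched as below. This is only an illustration under a stated assumption: it counts the wild-type residues upstream of the first altered residue (Gly362) as "retained" and ignores the out-of-frame tail that follows, since that tail is not wild-type sequence. The function name is made up for this example.

```python
WILD_TYPE_LENGTH = 4244    # canonical TNXB isoform length, as quoted from UniProt above
FIRST_ALTERED_RESIDUE = 362  # both frameshifts start at glycine 362 (p.G362fs)


def wild_type_fraction_retained(first_altered: int, wild_type_length: int) -> float:
    """Fraction of the wild-type sequence that lies upstream of the frameshift."""
    retained_residues = first_altered - 1  # residues 1..361 are unchanged
    return retained_residues / wild_type_length


fraction = wild_type_fraction_retained(FIRST_ALTERED_RESIDUE, WILD_TYPE_LENGTH)
print(f"{fraction:.1%} of the wild-type sequence retained, {1 - fraction:.1%} lost")
# prints "8.5% of the wild-type sequence retained, 91.5% lost"
```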

In any event, I'm finding it pretty hard to grasp how this is possible in terms of genetic inheritance unless this involved some abnormal form of inheritance; it just seems unfathomably unlikely that both my parents carry nearly identical frameshift mutations which haven't been previously reported and that both involve an A>G mutation and affect the same protein residue. Sequencing them both would obviously determine if that's the case, but TNXB sequencing is prone to errors due to significant pseudogene interference from TNXA and I don't think it's likely to happen anyway.

Could this arise from uniparental disomy? Or is there another abnormal mode of inheritance that might give rise to seeing mutations like this? I'm just a bit baffled due to my ignorance in this area.  Seppi  333  (Insert 2¢) 12:40, 20 May 2020 (UTC)
 * Previous thread
 * Hi Seppi, I'm not a clinician, but I can offer a few thoughts to guide your research:
 * It's not clear to me whether your brother's alleles are identical. If so, uniparental isodisomy could certainly be the explanation.
 * Chimerism in parent or offspring can also explain anomalous inheritance, or inheritance that seems inconsistent with phenotypes.
 * These are both rare. Incomplete penetrance and variable expressivity are far more common and mundane, so probably more likely, though I haven't looked into the specifics of TNXB.
 * I think you're misunderstanding the word "penetrance". If we say that a genotype is, for example, "70% penetrant", it means 70% of people with that genotype will have at least one detectable aspect of the associated phenotype, however mild; the other 30% have no detectable phenotype.  It's a population thing.  In a given individual, a genotype is either penetrant or it's not.  Mild vs. severe cases are an example of variable expressivity.  (Sorry if this sounds pedantic, but I can't tell whether you're misunderstanding something you've read.)
 * You mentioned in the previous thread that frameshifts produce non-functional proteins. In fact, frameshifts in eukaryotes often/usually lead to no protein at all, as the mRNAs are destroyed by nonsense-mediated decay.
 * In principle, an early frameshift could be compatible with production of at least some functional protein if (1) the mutation is excluded from at least some transcripts by alternative splicing, or (2) an alternative transcription start site (anyone want to blue that red?) leads to mRNAs with a start codon that comes after the mutation. I haven't checked whether enough is known about TNXB to support or rule out those possibilities.
 * A frameshift can also be functionally "reversed" by another nearby frameshift. For example, if there's a single base-pair insertion early in the gene, and a single base-pair deletion several codons away, you'd end up with several gobbledygook codons in between.  But as long as the gobbledygook did not include a stop codon, the overall protein could still be functional.
 * In other news, is this seriously only the third thread on this page all year?? I'm glad we merged. Adrian J. Hunter(talk•contribs) 06:43, 21 May 2020 (UTC)
 * I haven’t visualized my brother’s genome on chromosome 6 because TNXB is located in the major histocompatibility complex and I’m pretty sure my brother’s short-read assembly was generated from alignment-based methods following GATK.
 * The MHC can’t be resolved by that approach. Short of Sanger sequencing the entire region, the only other way to resolve it is generating contigs from de novo assembly that span most or all of that region.  Since de novo hybrid genome assembly will produce longer contigs, that’s what I’ve been working on since I posted this thread.  After about 15-20 hours of research and a lot of data wrangling, I started running the Wengan-D assembler today.  It takes 1000 CPU hours and requires >600 GB of RAM (not hard disk space) to run, but it’ll give me the most contiguous high-quality assembly of my brother’s datasets that current technology can provide.  Also, the paper published about it used the Wengan-D assembler on the same DNA sequencing technology (Illumina NovaSeq and ONT PromethION) with approximately the same level of coverage as my brother’s datasets to almost completely resolve MHC regions I and II in ~4-5 contigs (see fig 2).  MHC region III, where TNXB is located, spans 700 kb, but I expect this assembly to have an NG50 in the 8-20 Mb range, so I will likely have that whole region fully resolved on a single contig.  In about 5 days, the assembly will be done; I’ll need to compute the validation stats for the assembly, computationally resolve the haplotigs into a diploid assembly, and then align it to a reference genome before I can visualize it.  At that point, I’ll be able to see whether or not there’s a lot of unusual sequence similarity on chromosome 6 which is characteristic of isodisomy.
 * Regarding “penetrance”, in this particular context it’s a bit hard for me, since the biallelic genotype is diagnostic for clEDS and the monoallelic genotype is often associated with hEDS. Given that the biallelic genotype has never been reported to cause the milder hEDS and that these are clinically distinct phenotypes, it’s hard for me to describe correctly.  I get that penetrance is something-or-nothing, but the clinical phenotypes of clEDS and hEDS - based upon the diagnostic criteria alone - do not overlap.  That’s why I described it in that manner.
 * Re: nonsense-mediated decay - that’s useful to know, thanks.  Seppi  333  (Insert 2¢) 22:05, 24 May 2020 (UTC)
 * Edit: just to clarify, the aspect of clEDS which is just entirely unlike my brother's phenotype is skin involvement. He has no overt skin abnormalities, although something might be noticeable in a histological examination of his skin tissue (this hasn't been conducted).  Seppi  333  (Insert 2¢) 22:43, 24 May 2020 (UTC)
 * Edit: When I say "unusual sequence similarity", I'm mainly referring to variant similarity, not whether or not the wild-type sequences are the same; my brother is obviously human.  Seppi  333  (Insert 2¢) 22:49, 24 May 2020 (UTC)
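To make the point raised earlier in this thread about a frameshift being functionally "reversed" by a second nearby frameshift more concrete, here is a small sketch. Everything in it is illustrative: the DNA sequence is invented for the example (it is not TNXB), and the helper names are hypothetical. It uses the standard genetic code and translates until the first stop codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate from position 0 until a stop codon or the last complete codon."""
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def insert_base(dna: str, position: int, base: str) -> str:
    """Insert a single base before the given 0-based position."""
    return dna[:position] + base + dna[position:]


def delete_base(dna: str, position: int) -> str:
    """Delete the single base at the given 0-based position."""
    return dna[:position] + dna[position + 1:]


# Invented toy gene: ATG GCT GAA CTG AAA CGT GGT TTC TAA
wild_type = "ATGGCTGAACTGAAACGTGGTTTCTAA"
shifted = insert_base(wild_type, 6, "C")   # +1 frameshift after the second codon
rescued = delete_base(shifted, 13)         # -1 deletion a few codons downstream

print(translate(wild_type))  # MAELKRGF
print(translate(shifted))    # MARTETWFL  (everything after the insertion is scrambled)
print(translate(rescued))    # MARTERGF   (only residues 3-5 differ from wild type)
```

In this toy case the three codons between the insertion and the deletion are read out of frame, but because none of them happens to be a stop codon, the original reading frame and the downstream residues are restored.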

Archaea Archive
Hello guys and gals,

Can someone get the archiving fixed on Talk:Archaea, please? It is getting long and has posts going back to 2014. Thanks, Galendalia Talk to me CVU Graduate 13:58, 30 May 2020 (UTC)


 * I hope. Boghog (talk) 15:08, 30 May 2020 (UTC)
 * LOL I know that feeling! 70 iterations later. Thanks! Galendalia Talk to me CVU Graduate 15:19, 30 May 2020 (UTC)

DNA repair Featured article review
I have nominated DNA repair for a featured article review here. Please join the discussion on whether this article meets featured article criteria. Articles are typically reviewed for two weeks. If substantial concerns are not addressed during the review period, the article will be moved to the Featured Article Removal Candidates list for a further period, where editors may declare "Keep" or "Delist" the article's featured status. The instructions for the review process are here. Sandy Georgia (Talk)  22:51, 14 June 2020 (UTC) I've also copied this over to WT:MOLBIO. T.Shafee(Evo & Evo)talk 03:15, 15 June 2020 (UTC)

Edits to Translation (biology)
There have been a lot of small edits from brand new users to this article. Just about every red name in the edit history has made exactly that one edit. I thought they were all one person, but it looks like I was wrong. A lot of them are unsourced with no edit summary and change around the meaning of the article. I don't have the subject knowledge to say if these are vandalism or plausible though, perhaps someone on this WikiProject can take a look through edits such as these? 1 2 3 4 5 For example, I can vaguely say that the first edit I linked is probably incorrect...? But I'm far from certain. (Those 5 are just an example, there are many more in the history, some reverted, some accepted) Leijurv (talk) 20:48, 8 July 2020 (UTC)
 * Ugh, what a mess. I've never seen anything like this!  All five diffs you linked are erroneous.  This goes back to at least 2018... I'm going through the article history, looking to revert to stable version. Adrian J. Hunter(talk•contribs) 01:12, 9 July 2020 (UTC)
 * The first of the bad-faith edits seems to have been this one from February 2018. Too bad for anyone who's relied on our article since then.  I've reverted to the version immediately before that.  There are some substantive good-faith additions in the intervening versions; if no-one beats me to it, I'll restore those manually later on. Thanks for posting about this. Adrian J. Hunter(talk•contribs) 01:32, 9 July 2020 (UTC)
 * Sure thing! I just saw your reinstating of the constructive edits, wow! Thank you for sorting out what's correct and what wasn't, going back over two years. :) Leijurv (talk) 18:52, 13 July 2020 (UTC)

G4 EA H1N1 redirect discussion
Talk:G4 EA H1N1 Sandy Georgia (Talk)  04:46, 12 July 2020 (UTC)

Should Cis-regulatory module and Cis-regulatory element be merged?
Cis-regulatory module and Cis-regulatory element seem to overlap largely, and to a non-geneticist they seem extremely similar if not a WP:FORK. It doesn't look as if a simple redirect would do the job, so perhaps someone could look at the matter, and either make the articles clearly distinct or merge them. Editors have noted the issue since, ah, 2011. All the best, Chiswick Chap (talk) 21:14, 17 July 2020 (UTC)
 * Well spotted. I've replied over at Talk:Cis-regulatory_element to help keep the record together. T.Shafee(Evo & Evo)talk 11:07, 25 July 2020 (UTC)


 * It seems there's consensus. Could someone close the merger discussion over there so we can get it done? Chiswick Chap (talk) 09:57, 3 August 2020 (UTC)

Merge required for the articles Cell communication (biology) and Cellular communication (biology)
The articles https://en.wikipedia.org/wiki/Cell_communication_(biology) and https://en.wikipedia.org/wiki/Cellular_communication_(biology) should be merged. RIT RAJARSHI (talk) 09:34, 30 July 2020 (UTC)

Is there a lack of user activity in Wikiproject Molecular Cell Biology articles?
It looks like many of the WikiProject Molecular and Cell Biology articles are lacking details and have remained unimproved for years to decades. Recently I noticed an article whose title does not match its content (https://en.wikipedia.org/wiki/Membrane_topology). It looks like there is scope for a lot of improvement in many of the articles, which is not happening. Is there a deficit or reduction of contributors? Also, is there a general reduction in Wikipedia activity? How can we increase activity on Wikipedia again? With best wishes for Wikipedia. Thanks in advance. RIT RAJARSHI (talk) 09:38, 30 July 2020 (UTC)
 * MCB is a very broad subject with a large number of stubs and many potential articles still unwritten. At the same time, participation in MCB and in Wikipedia as a whole has dropped.  Given the breadth of the subject and the declining number of editors, the best we can probably hope for is to maintain high priority articles and to create a few new articles on hot emerging topics.  Student editors can and do help, but they also create a significant amount of work for the regulars. There has been a lot of discussion on increasing participation, but few results. Boghog (talk) 10:43, 30 July 2020 (UTC)
 * Let us be realistic. Apart from the occasional generous contributions by students and industry, and lots of imported articles (from InterPro, TCDB, EC), this was always mostly an endeavour for amateurs, and molbio has far fewer amateurs because any halfway knowledgeable molbio amateur gets a good job that completely grabs their time. Personally I have come to the conclusion that, once the basic concepts are done in WP, most molbio facts can be expressed by Wikidata statements. Together with automatic translation this will be the only way to spread the knowledge in all languages. Because, let's face it, en-WP is still much more complete than any other language WP. --SCIdude (talk) 09:45, 3 August 2020 (UTC)
 * While I think WikiData is useful, I think you are putting the cart before the horse. Without corresponding Wikipedia content, WikiData has very limited value. I also think you are grossly underestimating the contribution of both professionals and amateurs.  For a recent high quality contribution, see for example Holocentric chromosome. The intention of Gene Wiki was to provide seed articles that human editors would later expand. Granted, most of these articles have not been expanded, but at least a few, like Reelin, have been significantly expanded by knowledgeable editors.  You are correct that students get professional jobs (and start families, etc.) and therefore have less time to devote to Wikipedia.  However, there are also cases of retired professionals who now have more time to devote to Wikipedia. Boghog (talk) 20:13, 3 August 2020 (UTC)
 * Judging by incoming articles at NOP and AfC, there seem to be many people, a good number of them probably students, who like to do such articles. What we really need, as indicated, are people to revise the older ones. That may not bring as much excitement and prestige, but it can make for good group projects. DGG ( talk ) 09:44, 12 October 2020 (UTC)

FAR for cell nucleus
I have nominated Cell nucleus for a featured article review here. Please join the discussion on whether this article meets featured article criteria. Articles are typically reviewed for two weeks. If substantial concerns are not addressed during the review period, the article will be moved to the Featured Article Removal Candidates list for a further period, where editors may declare "Keep" or "Delist" the article's featured status. The instructions for the review process are here. (t · c)  buidhe  22:16, 11 September 2020 (UTC)


 * User:Ajpolino has raised a number of important issues at the featured article review that would require a fairly extensive effort to fix. User:Hanif Al Husaini and I have worked to add citations and I can further update the citations with the most recent editions of cell biology textbooks. Are there any volunteers to help out with the other issues that Ajpolino raised? I will try to address some of these other problems, but I have fairly limited time to devote to this. Boghog (talk) 10:24, 24 October 2020 (UTC)

Bio/Med-tech and precision medicine startup accelerators
In retrospect, it does seem rather inappropriate for me to have posted what DGG deleted in this section, so I wanted to apologize to those here (re this discussion: User talk:Seppi333). My bad. I hope everyone can forgive my lack of forethought and stupidity for committing a faux pas.  Seppi  333  (Insert 2¢) 07:45, 15 October 2020 (UTC)

Too many GPCR articles fail to even discuss the second messenger pathway(s) they are coupled to
Yes I could be BOLD and do everything but it is pretty disconcerting that so many of our GPCR articles (case in point, several of our mGluR articles) fail even to discuss the G-proteins / second messenger pathways coupled to the GPCR, which is pretty frustrating as it promotes an inherent tendency among many in the "real world" to fail to consider interactions of second messenger pathways or functional selectivity. Some time ago, our article on the mu opioid receptor failed to even mention that it is canonically Gi-coupled (functional selectivity and downstream effects on dFOSB aside) -- until I fixed it. Such core details are pretty important when evaluating whether a receptor's known effects make physiological sense. Could there be a prioritization of ensuring that all second messenger receptors actually discuss the actual known specific pathways they are coupled to? It is pretty frustrating to have so many articles on GPCRs only to have to hunt for some other article to see which actual G-protein(s) it is coupled to. Yanping Nora Soong (talk) 23:44, 12 October 2020 (UTC)


 * Hi, thanks for your message. The intention of Gene Wiki was to create stubs that would later be expanded by editors.  Unfortunately the number of active editors in the MCB Project has dropped precipitously, so most of these stubs have not yet been expanded. I agree that the GPCR articles should contain basic information about which G protein each receptor signals through. At this point, we have something like 500 GPCR articles (see G protein-coupled receptors), so it will be time-consuming to go through each to add information about the signaling pathway. It would be helpful to have a comprehensive listing of GPCR/G protein interactions.  One such list is here.  Is this source reliable enough?  For example, ADRB3, according to the GPCRdb, signals primarily through Gs. This is already mentioned in the Mechanism of action section. Based on the data in GPCRdb, we could add a short mechanism of action section for each of the GPCRs. Boghog (talk) 10:09, 13 October 2020 (UTC)