User:Certes/Gene links

This page lists gene articles which are not linked from the base name. For example, there is no obvious route from ACR to ACR (gene). FOO is used as a placeholder to denote the base name such as ACR.

Dab missing entry
FOO is (or redirects to) a dab which does not list the gene.

✅ Section completed: add an entry for FOO (gene) to existing dab FOO.
 * ACR
 * AIC: also add Akaike information criterion
 * BTC
 * CAMP: CAMP (gene) is a list of enzymes
 * Rewrote cathelicidin and the enzyme article it linked to, as neither of its two corresponding genes is now called "camP", then redirected CAMP (gene).  Seppi  333  (Insert 2¢) 02:50, 30 November 2019 (UTC)
 * CDNF
 * CFI
 * CGB: existing entry displays the gene name but is piped elsewhere
 * CNO
 * CROP
 * CS
 * CTRC
 * FAT
 * GART
 * GDA
 * GLA
 * HAL
 * MFF
 * MIA
 * NPPA
 * NTM
 * PGC
 * PIGS
 * PLI
 * Pol
 * POR
 * PTLD
 * REN
 * SAC
 * SLN
 * TAT
 * UMPS
 * Y14

Unrelated article with dab
FOO is (or redirects to) an article about an unrelated primary topic. FOO (disambiguation) is (or redirects to) a dab which does not list the gene.

✅ Section completed: add an entry for FOO (gene) to existing dab FOO (disambiguation).
 * FEMA
 * PPL

Unrelated article without dab
FOO is (or redirects to) an article about an unrelated topic. FOO (disambiguation) does not exist.

Fix: If the incumbent article is not primary, move it to FOO (topic) and list it along with the gene on a new dab FOO. Check for incoming links to FOO and update these. If the topic is primary but the initials also denote other topics, create FOO (disambiguation). Otherwise, the primary topic article needs a hatnote to the gene.

✅ Section complete except for CTU2, which is the actual name of the C16orf84 gene: requesting a second opinion from or.
 * AK1 ✅
 * APOD ✅
 * ASUN ✅
 * BATF ✅
 * BRF1 ✅
 * BRF2 ✅
 * BX3 ✅
 * CCT2 ✅
 * CCT5 ✅
 * CES3 ✅
 * CGB2 ✅
 * CHGA ✅
 * CKLF ✅
 * CLK3 ✅
 * CNN3 ✅
 * CPA4 ✅
 * CRCP ✅
 * CROT ✅, though the gene has a claim to be PT
 * CSF3 ✅: retargeted to gene
 * CSH2 ✅
 * CSN3 ✅
 * CTSH ✅
 * CTU2
 * DMWD ✅
 * DNA2 ✅
 * Doubletime ✅ dab page tweaked, gene now linked directly by hatnote
 * EN1 ✅ two genes linked in one complex hatnote
 * ESAM ✅ new dab page
 * ESPN ✅ (mega hatnote now even bigger)
 * FMOD ✅ expanded hatnote
 * GATM ✅ new dab page
 * GBAS ✅ new dab page
 * GMDS ✅ new dab page
 * GPS2 ✅ new dab page (but querying whether existing redirect was justified)
 * GPX2 ✅ new dab page
 * HPCA ✅ new dab
 * HULC ✅
 * IRGC ✅
 * Isomorph ✅: added Isomorph (gene) (a classification of mutations) to Isomorphism (disambiguation)
 * Kaiso ✅ linked PT to new dab
 * KMO ✅ new dab
 * KYNU ✅
 * MAL2 ✅
 * MLIP ✅
 * MPNS ✅
 * MSLN ✅
 * NAAB ✅ new dab page
 * NAGA ✅: added NAGA (gene) to dab Naga
 * NEBL ✅
 * NEMF ✅
 * NFIC ✅
 * NKRF ✅
 * ODAM ✅
 * Paralytic ✅ P to S
 * PCTP
 * PIGN: medical, but probably unrelated to PIGN (gene)
 * POLA1
 * POP1
 * POP4
 * PPCS
 * PPIE
 * PPIG
 * PREP
 * PSG1
 * RARS
 * RAX
 * SCEL
 * SNCB
 * Spätzle ✅ P to S
 * VISA ✅
 * WARS ✅
 * WTAP ✅

Enzyme or protein article
FOO describes an enzyme or protein related to FOO (gene) but does not link to the gene.

Fix: Expert advice is needed.
 * AlkB; AlkB (gene) redirected to a section of AlkB ✅  Seppi  333  (Insert 2¢) 23:19, 29 November 2019 (UTC)
 * ANK2; ANK2 (gene) – the latter is a duplicate article of the former created by User:ProteinBoxBot. It should be merged into the former. The sitelink for ANK2 (gene) needs to be moved to ANK2 on wikidata in order to move the template when this happens (i.e., the duplicate article has a gene infobox but the primary article does not). ✅  Seppi  333  (Insert 2¢) 23:23, 29 November 2019 (UTC)
 * CACNA1B; CACNA1B (gene) ✅ merged.  Seppi  333  (Insert 2¢) 23:44, 29 November 2019 (UTC)
 * CASP12; CASP12 (gene) ✅ merged.  Seppi  333  (Insert 2¢) 23:44, 29 November 2019 (UTC)
 * NRXN1; NRXN1 (gene) ✅ merged.  Seppi  333  (Insert 2¢) 23:44, 29 November 2019 (UTC)
 * SCN1A; SCN1A (gene) ✅ merged.  Seppi  333  (Insert 2¢) 23:54, 29 November 2019 (UTC)
 * SKP1; SKP1 (gene) ✅ moved Skp1 to the official UniProt name since that was an incorrectly capitalized gene name, SKP1 and SKP1 (gene) (will) redirect there after 2x redirects are corrected by a bot. Sitelink on wikidata moved to the correct item.  Seppi  333  (Insert 2¢) 00:04, 30 November 2019 (UTC)
 * SPI1; SPI1 (gene) ✅ - fixed this one earlier.  Seppi  333  (Insert 2¢) 23:54, 29 November 2019 (UTC)
 * TMEM243; TMEM243 (gene) ✅ moved sitelink and redirected page.  Seppi  333  (Insert 2¢) 00:04, 30 November 2019 (UTC)

Miscellaneous
See individual entries for a description of each anomaly.

Fix: Expert advice is needed.
 * ALG2 is a gene; ALG2 (gene) is a list of enzymes.
 * See WT:MCB. Deleted that section. The ALG2 gene encodes a protein that belongs to 2 classes of enzymes, so it makes sense to redirect both pages to the gene and list the corresponding enzymes there.  Seppi  333  (Insert 2¢) 01:23, 30 November 2019 (UTC)
 * CFTR: redirects are anomalously titled CFTR(gene) (no space) and Cftr (gene) (lower case). deletion.
 * This is a fairly widely studied gene due to its central pathophysiological role in cystic fibrosis; CFTR gets a lot of search traffic. I'd suggest deleting CFTR(gene) since it's an erroneous page title that I tried to move to CFTR (gene) with redirect suppression before realizing it already existed. Keeping Cftr (gene) seems fine since, while its capitalization is technically incorrect, it's at least the correct spelling. Addendum: I've PRODed CFTR(gene).  Seppi  333  (Insert 2¢) 01:23, 30 November 2019 (UTC)
 * EYCL1 is a gene; EYCL1 (gene) redirects to a related protein. deletion.
 * See WT:MCB.  Seppi  333  (Insert 2¢) 01:46, 30 November 2019 (UTC)
 * KCTD9 and KCTD9 (gene) may be duplicate articles. ✅
 * LRIF1 and LRIF1 (gene) redirect to articles about different proteins.
 * They now go to the same place; one of them went to the wrong page.  Seppi  333  (Insert 2¢) 02:00, 30 November 2019 (UTC)
 * NAGK redirects to one enzyme; NAGK (gene) is a list of enzymes.
 * The bacterial enzyme listed on NAGK (gene) had a 2:1 correspondence between gene and enzyme and the corresponding gene also had a different capitalization (NagK), so I redirected it to where NAGK went.  Seppi  333  (Insert 2¢) 02:00, 30 November 2019 (UTC)
 * NFAM1 and NFAM1 (gene) may be duplicate articles. ✅
 * NFATC2IP and NFATC2IP (gene) may be duplicate articles. ✅
 * NMT1 dab and NMT1 (gene) list share the same entries.
 * This one was confusing; don't think I've ever seen 2 enzymes associated with a single gene, but since both enzymes are associated with multiple genes, I redirected both pages to the pagename of the protein that the gene encodes and listed both enzymes there.  Seppi  333  (Insert 2¢) 03:21, 30 November 2019 (UTC)
 * TITF1: TITF1 (gene) is a different topic, but is it just a typo for TTF1?
 * Hmm.  - this is a query in the HGNC database for TITF1.  It's an old gene symbol for NKX2-1 (current gene symbol), which is currently known as the "NK2 homeobox 1" gene. Both should redirect to the current gene symbol unless disambiguation at TITF1 is necessary. Changed the TITF1 target.  Seppi  333  (Insert 2¢) 00:57, 30 November 2019 (UTC)
 * WAS: WAS (gene) redirects to Wiskott–Aldrich syndrome protein but that article calls the gene WASp. Clarified by stating the encoding gene's gene symbol.  Seppi  333  (Insert 2¢) 00:57, 30 November 2019 (UTC)

Merged the wikidata sitelinks for NFATC2IP, KCTD9, and NFAM1 and the corresponding (gene) pages. Will deal with the rest a bit later.  Seppi  333  (Insert 2¢) 00:10, 30 November 2019 (UTC)

Re-ALG2 (gene): I think it may be worth recoding and rerunning my User:Seppi333/GeneListNLP script to detect/write a list of target pages that are wikilinked from the gene lists and that contain all 5 of the words "Set", "index", "page", "lists", and "articles" on them in order to identify links to set index articles, unless you can locate those with an SQL query. The last time I ran that script, it took 1:33:45 (about 1.5 hrs) to download and process all the pages, so if it's possible to locate them using another method, it'd probably be best to do that instead.  Seppi  333  (Insert 2¢) 01:23, 30 November 2019 (UTC)
 * This PetScan query identifies SIAs linked from gene lists. Certes (talk) 10:25, 30 November 2019 (UTC)
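As a rough illustration of the word check proposed above for spotting set index articles, here is a minimal Python sketch. It is not the GeneListNLP script itself: the function name is hypothetical, the page text is assumed to have been fetched already, and matching whole words case-insensitively is an assumption.

```python
import re

# The five words named in the comment above; lower-cased here because
# case-insensitive matching is an assumption of this sketch.
SIA_WORDS = ("set", "index", "page", "lists", "articles")


def looks_like_set_index(text: str) -> bool:
    """Return True if every word in SIA_WORDS occurs in text as a whole word."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return all(word in words for word in SIA_WORDS)


# Hypothetical sample strings, not real page contents.
sia_sample = "This set index page lists articles about enzymes with the same name."
gene_sample = "FOO is a protein that in humans is encoded by the FOO gene."
print(looks_like_set_index(sia_sample), looks_like_set_index(gene_sample))
```

A check like this only narrows the candidates; pages it flags would still need a manual look, and a category-based query such as the PetScan one may be quicker.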

False positives
FOO links to FOO (gene) (or the target of that redirect) in a complex way not spotted by the Quarry queries.

Fix: probably no action but we may consider a more direct link.
 * BBC3 ✅ (improved hatnote to offer direct link to gene)
 * CAD
 * Cfr
 * Dlx
 * ELO ✅ (clarified link to gene on dab page)
 * FARSA ✅ (retargeted to dab page, no clear PT: shortens route to gene)
 * Hairy ✅ (improved hatnote to offer direct link to gene)
 * KIZ
 * LAT
 * LOX ✅ (clarified link to gene on dab page)
 * MAFA (possibly a related protein)
 * MFSD2A and MFSD2A (gene) redirect to the same article.
 * MIB2
 * MINA✅ (clarified link to gene on dab page)
 * NES
 * OSCAR
 * Pokemon
 * REST
 * RHO
 * Sphinx
 * Tinman
 * THEMIS
 * TOR

Other links
Here are some other link issues raised by the gene lists. They need an expert to fix them because the suggested fix may be wrong, they may indicate wider problems, or the initialism redirect might merit conversion into a dab.

Direct links
The gene lists link directly to a page which is not in gene categories. These fall into two sections.

1. The target page appears not to be a gene. The link needs to be corrected. In each case, incoming links suggest that the non-gene article is the primary topic, but we could consider moving that article and creating a dab.
 * CHML: List of human protein-coding genes 1 should link to CHML (gene)
 * DR1: List of human protein-coding genes 1 should link to DR1 (gene)
 * HPX: List of human protein-coding genes 2 should link to HPX (gene)
 * PIM2: List of human protein-coding genes 2 and Protein kinase domain should link to PIM2 (gene)

2. The target page appears to be a gene or closely related topic. Links may be correct but the gene page could be added to appropriate gene categories.
 * AKNA
 * CD96
 * WRAP53

Redirects
The gene lists link to a redirect to a page which is not in gene categories.
 * List of human protein-coding genes 1 links to, which redirects to unrelated article African American Museum in Philadelphia. They should probably link to AAMP (gene).
 * List of human protein-coding genes 1 links to, which redirects to unrelated article Chinese Canadian National Council. They should probably link to CCNC (gene).
 * List of human protein-coding genes 1 and Cathepsin Z link to, which redirects to unrelated article Flight Design CT. They should probably link to CTSW (gene).
 * List of human protein-coding genes 1, Helicase and ZGRF1 link to, which redirects to unrelated article DNA². They should probably link to DNA2 (gene).
 * List of human protein-coding genes 1 links to, which redirects to unrelated article EN postcode area. They should probably link to EN1 (gene).
 * List of human protein-coding genes 1 links to, which redirects to unrelated article EN postcode area. They should probably link to EN2 (gene).
 * List of human protein-coding genes 2 links to, which redirects to unrelated article Ethylenediaminetetraacetic acid. They should probably link to a new redirect ETDA (gene).  Which article should it redirect to?
 * Going to leave that as a redlink until it's better characterized.  Seppi  333  (Insert 2¢) 10:07, 2 December 2019 (UTC)
 * List of human protein-coding genes 2, Epstein–Barr virus-associated lymphoproliferative diseases, List of OMIM disorder codes and PD-1 and PD-L1 inhibitors link to, which redirects to article Icos about a genetics company. They should probably link to ICOS (gene).
 * List of human protein-coding genes 2 and Brpf1 link to, which redirects to unrelated article KAT-7. They should probably link to KAT7 (gene).
 * List of human protein-coding genes 2, Alpha/beta hydrolase superfamily and Ichthyosis link to, which redirects to an article Lamellar ichthyosis about a related disease. We may want to link via a new redirect LIPN (gene).
 * Created Lipase member N and LIPN (gene)
 * List of human protein-coding genes 2 and CARD domain link to, which redirects to unrelated article Dallas Mavericks. They should probably link to Mitochondrial antiviral-signaling protein, perhaps via a new redirect MAVS (gene).
 * List of human protein-coding genes 3 and AAA proteins link to, which redirects to unrelated article Null (SQL). They should probably link to NVL (gene).
 * List of human protein-coding genes 3 links to, which redirects to unrelated article Windows 95. It should probably link to OSR2 (gene).
 * List of human protein-coding genes 3 and several articles link to, which redirects to unrelated article Acute proliferative glomerulonephritis. They should probably link to PIGN (gene).
 * List of human protein-coding genes 3 and WD40 repeat link to, which redirects to unrelated article Poor Law Amendment Act 1834. They should probably link to PLAA (gene).
 * List of human protein-coding genes 3 and List of OMIM disorder codes link to, which redirects to unrelated article Rho. They should probably link to RHO (gene).
 * List of human protein-coding genes 3, Cancer syndrome and Housekeeping gene link to, which redirects to unrelated article SD card. They should probably link to SDHC (gene).
 * List of human protein-coding genes 4 and several articles link to, which redirects to unrelated article Helsingin Suomalainen Yhteiskoulu. They should probably link either to Syk or to its redirect SYK (gene).

Ahh. I was wondering why my NLP script didn’t locate those... it’s the hatnotes. I should probably reprogram it to fix that bug. Will fix these pages later tonight and (nothing to fix, except maybe conversion to DABs; I think you guys are better judges of when/how to disambiguate than I am, though, so I'll leave it to you) revise the wikitables once we locate all these pages.  Seppi  333  (Insert 2¢) 02:02, 1 December 2019 (UTC)
 * Looks like you're right; all of them should link to the SYMBOL (gene) page since those are all the correct articles. I moved the Syk page to the official UniProt name for the protein (Tyrosine-protein kinase SYK) since the only synonym/alias with a lowercase spelling was "p72-Syk". I'll retarget the links in the gene lists/tables once we find the rest of these since it's much less work for me to add them all at once than piecewise. I can rewrite my script to detect the multi-word expressions used on the hatnote pages and just parse the leads to identify ones like Rho tomorrow since it's fairly easy to code that; but, I get the impression that you're able to identify all of the remaining mistargeted links by simpler means than downloading and parsing 11500 pages.
 * Makes me want to learn SQL. What other methods do you use to locate pages like this? I'm really curious now.  Seppi  333  (Insert 2¢) 04:51, 1 December 2019 (UTC)
 * In theory I could have located these with SQL. In practice, it might have been too complex to complete within Quarry's 30 minute limit, so I used PetScan instead with a Wikipedia search for incoming links.  You mention checking 11,500 pages manually.  In a way I've done that check myself, but only on the 30 or so suspicious pages that remained after filtering out cases that the queries suggest to be correct. Certes (talk) 12:57, 1 December 2019 (UTC)
 * Oh. Wow, that's a surprisingly useful tool then. The algorithm is actually fully-automated; it basically just iteratively goes through all ~11500 of the blue wikilinks on the four list pages one at a time, loads the page (it takes 1.5 hours to run almost entirely because it has to load 11500 pages; I can't run it on a database dump), and determines whether or not the words "gene", "genes", "protein" or "proteins" are present on the page. It missed most of the links above because those words are in the DAB hatnotes. I hadn't considered that being a possibility when I wrote it. I should have some time to revise both the wikitable script to fix the lists and mistargeted link detection script to do a second check within the next 12-24 hours; shouldn't take that long to do.  Seppi  333  (Insert 2¢) 22:00, 1 December 2019 (UTC)
 * Finding the bad direct links is as simple as this, which takes 4 seconds. There are a few false positives such as Locus (genetics) from wikilinks not in the table, but they're obvious.  The links via redirects took a little more fiddling. Certes (talk) 22:52, 1 December 2019 (UTC)
 * I'll have to make use of that tool; seems very handy. Going to work on the gene lists now and update it once I'm done.  Seppi  333  (Insert 2¢) 10:07, 2 December 2019 (UTC)
 * Following up, I retargeted the links in the gene lists yesterday. Haven't quite finished reprogramming the other one yet, but it will probably be done tomorrow. I'll retarget the non-list gene articles with mistargeted links sometime within the next couple of hours.
 * Assuming neither of us finds any additional pages, I suppose we're done. Thanks again for your help.  Edit: I didn't notice the sections above; will get to them after I retarget the links.  Seppi  333  (Insert 2¢) 10:04, 3 December 2019 (UTC)
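The original word check described in this thread, and the reason hatnotes defeated it, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual script: the function name is hypothetical, and page text is passed in as a dict instead of being downloaded one page at a time.

```python
import re
from typing import Dict, List

# The four gene-related words the original script looked for.
GENE_TERMS = {"gene", "genes", "protein", "proteins"}


def suspicious_links(pages: Dict[str, str]) -> List[str]:
    """Return titles whose text contains none of GENE_TERMS as a whole word.

    `pages` maps a linked title to that page's text (hypothetical input).
    """
    flagged = []
    for title, text in pages.items():
        words = set(re.findall(r"[a-z]+", text.lower()))
        if not (GENE_TERMS & words):
            flagged.append(title)
    return flagged


# Hypothetical sample text. The second page is about an unrelated topic,
# but its hatnote mentions "gene", so the check does not flag it.
sample = {
    "Unrelated topic": "An article about a museum.",
    "Topic with hatnote": "For the gene, see FOO (gene). An article about a postcode area.",
}
print(suspicious_links(sample))
```

The second sample page shows the false negative discussed above: a single hatnote pointing at the gene is enough to make an unrelated article look gene-related.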

Further progress
I've fixed incoming links apart from the gene lists which should link to CHML (gene) rather than CHML, AAMP (gene) rather than AAMP, etc. I see that some of these have been done manually in the lists (though a piped link might be better) but not in the Python. Also, do you have any thoughts about AKNA, CD96 and WRAP53? Certes (talk) 00:25, 16 December 2019 (UTC)
 * Hey there! I'm really sorry for falling off the grid after my last reply here; it seems rather rude of me. I've been really busy off-wiki lately and forgot to work on this. My bad about that. I'll go ahead and finish addressing the links above within the next day or so since I now have some time to work on WP. I'll fix AKNA, CD96, and WRAP53 right now though. I only need to adjust their wikidata sitelinks and add infobox gene to the article source. ✅


 * BTW, I finished recoding an updated version of my mistargeted link detection algorithm last week. The updated algorithm is designed to detect the type of mistargeted links you uncovered since I used all of the links that you listed in this section as a sample of testcases; I continually revised the algorithm until it had a 100% detection rate on that sample. This time around, it took 3.5 hours (originally, 1.5 hours) for the algorithm to finish processing all ~12,500 blue wikilinks in the gene lists (LOL). The likely mistargeted links it found are included in the collapse tab below. It found a few more articles with similar issues to the ones that you listed above; these articles would be included in the 2nd list in the tab below. Sometime within the next 24-48 hours, I'll manually go through all the links in the tab below and highlight the mistargeted ones I find. This is probably the last set of links in the gene lists that need to be fixed/retargeted since I think I've accounted for all possible ways that a false negative might occur.  Seppi  333  (Insert 2¢) 00:39, 18 December 2019 (UTC)

Note: immediately after each bulleted entry below, there are two index values listed: i and j. Index i is the number of distinct gene-related terms that are present in the lead's source code and index j is the number of distinct gene-related terms that are present in the input parameters of the lead's hatnote templates, provided that any were found (NB: there are no entries in either list where one index is equal to 0 and the other is non-zero).

My original script detected links to articles where none of 4 gene-related terms (i.e., "gene", "genes", "protein", "proteins") were found anywhere in the article's source code (NB: these links would be marked with in the 1st list below); the updated version of my algorithm checked the source code of only the lead for 5 word tokens (i.e., the original 4 and "infobox_gene") instead of searching the full article's source code, so there are additional entries in the 1st list below that weren't detected by the original algorithm.

The updated algorithm also listed all articles that included specific gene-related multi-word expressions (i.e., the following phrases: "the gene", "the genes", "the protein", "the proteins", "the enzyme", "the enzymes", "(gene)", "(enzyme)", and "(protein)") in the parameters of certain lead hatnotes if any were present – specifically, the hatnote,  hatnote, and the family of redirect hatnotes like /,, etc.. These new entries are included in the 2nd list below and have corresponding index values of. If an entry in that list is marked with index values of 0<i<j, it's extremely likely that the link is mistargeted.
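To make the two index values concrete, here is a minimal Python sketch of how i and j could be counted. It is an illustration under stated assumptions, not the updated script: the function names are hypothetical, the lead source and hatnote parameters are assumed to have been extracted already, and treating "Infobox gene" in wikitext as the token "infobox_gene" is a guess at the tokenization.

```python
import re
from typing import List, Optional

# Single-word tokens checked in the lead's source code (index i).
LEAD_TOKENS = ("gene", "genes", "protein", "proteins", "infobox_gene")

# Multi-word expressions checked in hatnote parameters (index j).
HATNOTE_PHRASES = (
    "the gene", "the genes", "the protein", "the proteins",
    "the enzyme", "the enzymes", "(gene)", "(enzyme)", "(protein)",
)


def index_i(lead_source: str) -> int:
    """Count how many distinct LEAD_TOKENS occur in the lead's source."""
    # Assumption: "Infobox gene" in wikitext counts as "infobox_gene".
    text = lead_source.lower().replace("infobox gene", "infobox_gene")
    words = set(re.findall(r"[a-z_]+", text))
    return sum(1 for token in LEAD_TOKENS if token in words)


def _has_phrase(text: str, phrase: str) -> bool:
    """Match parenthesised phrases literally and the others on word
    boundaries, so that "the genes" is not also counted as "the gene"."""
    if phrase.startswith("("):
        return phrase in text
    return re.search(r"\b" + re.escape(phrase) + r"\b", text) is not None


def index_j(hatnote_params: List[str]) -> int:
    """Count how many distinct HATNOTE_PHRASES occur in the hatnote parameters."""
    text = " ".join(hatnote_params).lower()
    return sum(1 for phrase in HATNOTE_PHRASES if _has_phrase(text, phrase))


def which_list(i: int, j: int) -> Optional[int]:
    """Return 1 for the first list (i = 0, j = 0), 2 for the second
    (i > 0 and j > 0), and None for links that appear in neither."""
    if i == 0 and j == 0:
        return 1
    if i > 0 and j > 0:
        return 2
    return None
```

For example, a lead containing an Infobox gene template plus the words "protein" and "gene" would score i=3 however often those words repeat, because only distinct tokens are counted.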

Entries in this list are articles where none of these 5 single-word tokens – "gene", "genes", "protein", "proteins", "infobox_gene" – are present in the source code of the article's lead.
 * ABO → ABO blood group system; i=0, j=0
 * ALKBH8 → TRNA (carboxymethyluridine34-5-O)-methyltransferase; i=0, j=0
 * AMACR → Alpha-methylacyl-CoA racemase; i=0, j=0
 * AKR1D1 → 5β-Reductase; i=0, j=0
 * AGMAT → Agmatinase; i=0, j=0
 * ASPA (gene) → Aspartoacylase; i=0, j=0
 * BHMT → Betaine—homocysteine S-methyltransferase; i=0, j=0
 * BHMT2 → Betaine—homocysteine S-methyltransferase; i=0, j=0
 * CERK → Ceramide kinase; i=0, j=0
 * COLEC12 → Collectin; i=0, j=0
 * CYP19A1 → Aromatase; i=0, j=0
 * DDT; i=0, j=0
 * GYG1 → Glycogenin; i=0, j=0
 * GYS2 → Glycogen synthase; i=0, j=0
 * HIBCH → 3-hydroxyisobutyryl-CoA hydrolase; i=0, j=0
 * INMT → Amine N-methyltransferase; i=0, j=0
 * IPMK → Inositol-polyphosphate multikinase; i=0, j=0
 * IVD (gene) → Isovaleryl-CoA dehydrogenase; i=0, j=0
 * LZTR1; i=0, j=0
 * MGME1; i=0, j=0
 * MTHFS → 5-formyltetrahydrofolate cyclo-ligase; i=0, j=0
 * MYO1G; i=0, j=0
 * NFIB (gene); i=0, j=0
 * OXT (gene) → Oxytocin; i=0, j=0
 * PCCB → Propionyl-CoA carboxylase; i=0, j=0
 * POR (gene) → Cytochrome P450 reductase; i=0, j=0
 * PPCS (gene) → Phosphopantothenate—cysteine ligase; i=0, j=0
 * PRSS56; i=0, j=0
 * PSTK → O-phosphoseryl-tRNASec kinase; i=0, j=0
 * PSTPIP2; i=0, j=0
 * SHMT1 → Serine hydroxymethyltransferase; i=0, j=0
 * SHMT2 → Serine hydroxymethyltransferase; i=0, j=0

Entries in this list are articles where one or more of these 5 single-word tokens – "gene", "genes", "protein", "proteins", "infobox_gene" – are present in the source code of the article's lead (index i is the count of how many distinct tokens were found, so if the word gene is repeated 2+ times in the lead and none of the other word tokens were found, the linked entry would have an index value of i=1) AND one or more of the following tokenized multi-word expressions – "the gene", "the genes", "the protein", "the proteins", "the enzyme", "the enzymes", "(gene)", "(enzyme)", "(protein)" – are present in the parameter inputs of one of the hatnote templates described above (including redirect-type hatnotes) that the algorithm found in the lead (index j is the count of the number of distinct aforementioned expressions that were detected in the hatnote's parameter inputs):
 * ADAM22; i=5, j=1
 * ADAM7; i=4, j=1
 * AHR → Aryl hydrocarbon receptor; i=3, j=2
 * AIP (gene) → AH receptor-interacting protein; i=3, j=2
 * ALB (gene) → Serum albumin; i=4, j=1
 * ATM (gene) → ATM serine/threonine kinase; i=4, j=1
 * BAK1 → Bcl-2 homologous antagonist killer; i=3, j=1
 * BAMBI; i=3, j=1
 * BCAM → Basal cell adhesion molecule; i=2, j=1
 * BMI1; i=3, j=1
 * BTD → Biotinidase; i=2, j=1
 * CA12; i=2, j=1
 * CDKN1A → P21; i=3, j=2
 * CHL1; i=3, j=1
 * CHML;
 * CISH; i=3, j=2
 * CLN3; i=3, j=1
 * CTBS; i=4, j=1
 * CWC15; i=4, j=1
 * CYBB → NOX2; i=3, j=1
 * DBR1; i=3, j=1
 * DBX2; i=3, j=1
 * DCP2; i=4, j=2
 * DHPS; i=4, j=2
 * DOCK10 → Dock10; i=3, j=1
 * DR1;
 * DTNB; i=3, j=2
 * F5 (gene) → Factor V; i=2, j=1
 * F8 (gene) → Factor VIII; i=3, j=1
 * FANCI; i=5, j=1
 * GNMT; i=1, j=1
 * GRID2; i=3, j=1
 * HBA2 → Hemoglobin, alpha 2; i=2, j=1
 * HIRA; i=3, j=1
 * HK2; i=2, j=1
 * HSPA14; i=3, j=1
 * ID1; i=5, j=1
 * IL31 → Interleukin 31; i=3, j=1
 * IL32 → Interleukin 32; i=3, j=1
 * IL33 → Interleukin 33; i=3, j=1
 * IL34 → Interleukin 34; i=2, j=1
 * INS (gene) → Insulin; i=3, j=1
 * KAT6B → MYST4; i=5, j=1
 * LAD1 → Ladinin 1; i=3, j=1
 * LCTL; i=2, j=1
 * MAGEC2; i=4, j=1
 * MAP6; i=4, j=1
 * MIS12; i=5, j=1
 * MKX; i=4, j=1
 * MTA1; i=4, j=1
 * MTA2; i=5, j=1
 * MTA3; i=5, j=1
 * NONO → NONO (protein); i=4, j=2
 * NPW; i=3, j=1
 * OSR1; i=5, j=1
 * PEPD; i=2, j=1
 * PIM2;
 * PKIA; i=3, j=2
 * PNN (gene) → Pinin; i=3, j=2
 * POLA1 → DNA polymerase alpha catalytic subunit; i=2, j=2
 * PRCP; i=3, j=1
 * PRND; i=3, j=1
 * PTK2; i=3, j=1
 * PTX3; i=4, j=2
 * RAB3A; i=3, j=1
 * RAI1; i=1, j=1
 * RHOB; i=3, j=1
 * RHO (gene) → Rhodopsin; i=2, j=1
 * RP1; i=3, j=1
 * RRAS; i=4, j=1
 * RTL1; i=2, j=1
 * SAT2; i=3, j=1
 * SCN1A → Nav1.1; i=3, j=2
 * SERPINA7 → Thyroxine-binding globulin; i=2, j=1
 * SMCP; i=3, j=1
 * ST13; i=4, j=2
 * ST14; i=3, j=2
 * ST7; i=3, j=2
 * TCF4; i=3, j=1
 * TFPT; i=3, j=1
 * TOX; i=5, j=1
 * TRA (gene); i=4, j=3
 * ZFX; i=3, j=1
 * ZP2; i=4, j=1


 * Also, thank you so much for helping me find and address the problematic links in the gene lists! I can't adequately express just how much I appreciate your assistance thus far.
 * If it weren't for you, several dozen links in the gene lists probably would've continued to point to the wrong articles since I don't think I would've realized the issues with the original algorithm that were producing false negatives.  Seppi  333  (Insert 2¢) 00:46, 18 December 2019 (UTC)
 * No problem: there is no deadline and we all have things to do offline, especially in December. I'm happy to have helped but have probably done all I can for now.  I think the only outstanding issue not mentioned above is cases like CTU2, where the base name leads to a rather flimsy non-gene primary topic and we need either a redirect hatnote or a two-entry dab.  (I'm not sure which is better.)  However, I think all the wikilinks now lead to the right destination even in those cases.  We've made a lot of improvements and it looks as if the job's almost complete.   Certes (talk) 01:26, 18 December 2019 (UTC)

I went through all the links and fixed problems that I found. In addition to the 4 you identified (CHML, DR1, HPX, and PIM2), it looks like only DDT is new. I'll fix these links in the lists shortly.  Seppi  333  (Insert 2¢) 15:58, 24 December 2019 (UTC)
 * I missed DDT because it's in Category:Nonsteroidal antiandrogens, a subcategory of Hormones, which I viewed as legitimate link targets. When I stopped excluding Hormones from my PetScan query, DDT appeared and nothing else did, so I don't see any similar cases.  Most links to the pesticide seem correct but please can you fix the Python for List of human protein-coding genes 1 and check Protein design, which should perhaps link to DDT (gene) instead? Certes (talk) 16:23, 24 December 2019 (UTC)
 * Looks like the DDT link in protein design is correctly targeted; had to read the paper to verify which page to link to (quote: Then they synthesized the 24-mer (MIF1RPNVGAMSNFYHYPNIIIII:) designed to form a four-stranded β-sheet and to bind the insecticide DDT. It did indeed...). Working on recoding the Python script for the list pages right now.  Seppi  333  (Insert 2¢) 17:23, 24 December 2019 (UTC)
 * ✅ The lists have been updated with piped links for these genes.  Seppi  333  (Insert 2¢) 18:48, 24 December 2019 (UTC)