User:Was a bee/Gene

1. Test
A test of automatically placing a marker icon on a chromosome ideogram image to show the location of a gene (based on base-pair position data stored in Wikidata).

In other words, the goal is to generate images like the one below automatically.



Wikimedia Commons currently has about 100 ideogram images that are used to show gene positions. See commons:category:Human chromosome ideograms which indicates gene location.





2. Result
Good (see the test case box at the right)

3. Calculation detail
The position where the marker should be placed is calculated as follows. Although the mathematical expression looks somewhat complex, the actual calculation is simple. It uses only addition and subtraction to compute lengths, and multiplication and division to compute scaling.
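A minimal sketch of this kind of calculation in Python, not the actual module code. It assumes that pter and qter sit at known pixel x-coordinates in the image and that base-pair positions map linearly onto the pixels between them. All function and parameter names are hypothetical.

```python
def marker_left_px(gene_start_bp, gene_end_bp, chromosome_length_bp,
                   pter_x_px, qter_x_px, marker_width_px):
    """Return the x-coordinate of the marker's left edge, in pixels.

    Assumption: base-pair coordinates run from 0 at pter to
    chromosome_length_bp at qter, and the ideogram is drawn
    proportionally to base pairs between pter_x_px and qter_x_px.
    """
    # Subtraction: length of the drawn chromosome in pixels.
    drawn_length_px = qter_x_px - pter_x_px
    # Midpoint of the gene in base pairs.
    gene_mid_bp = (gene_start_bp + gene_end_bp) / 2
    # Division and multiplication: scale base pairs to pixels.
    gene_mid_px = pter_x_px + gene_mid_bp / chromosome_length_bp * drawn_length_px
    # Shift left by half the marker width so the marker is centered on the gene.
    return gene_mid_px - marker_width_px / 2
```

For example, with made-up values, a gene spanning 40,000,000 to 60,000,000 bp on a 100,000,000 bp chromosome drawn between x = 10 and x = 210 gives a midpoint at x = 110, so a 4px marker starts at x = 108.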

4. See also
Effort for reader-friendliness for general readers
 * Wikipedia_talk:WikiProject_Medicine/Archive_97
Introductory gene textbook website by the National Library of Medicine. It includes gene location data on each gene page.
 * Genetics Home Reference - gene list

5. Test with Module
at Module:Infobox gene/sandbox2

https://en.wikipedia.org/w/index.php?title=Sonic_hedgehog&diff=prev&oldid=795122559

Category:Pages with script errors - Article namespace

6. On the width of the marker
Gene length varies from gene to gene, so the marker width also has to change from page to page. Below is some data needed to decide on the marker width.

Based on some experiments, the marker width must be at least 2px, because a 1px marker is difficult to see.

The largest marker width would be 15px (red area in the table at the right).

The lengths of the longest human genes are as follows (source: http://www.cshlp.org/ghg5_all/section/gene.shtml):
 * Dystrophin - 2.3 Mb at Chr.X
 * CNTNAP2 - 2.3 Mb at Chr.7
 * PTPRD - 2.3 Mb at Chr.9
 * RUNX1 - 1.2 Mb at Chr.21
 * LARGE - 760 kb at Chr.22
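The width rule can be sketched in Python as follows. This is an illustration only, not the module's code. It assumes the width is the gene length scaled to pixels and then limited to the 2px minimum mentioned above. Treating 15px as a hard upper bound is an assumption of this sketch, and all names are hypothetical.

```python
MIN_MARKER_WIDTH_PX = 2   # a 1px marker is difficult to see
MAX_MARKER_WIDTH_PX = 15  # assumption: largest expected width used as a cap

def marker_width_px(gene_length_bp, chromosome_length_bp, drawn_length_px):
    """Scale a gene length to pixels and clamp it to the allowed range.

    drawn_length_px is the pixel distance between pter and qter
    in the ideogram image (a hypothetical parameter).
    """
    raw_width_px = gene_length_bp / chromosome_length_bp * drawn_length_px
    return max(MIN_MARKER_WIDTH_PX, min(MAX_MARKER_WIDTH_PX, raw_width_px))
```

With made-up values, a 10 kb gene on a 100 Mb chromosome drawn 500px wide would scale to 0.05px, so it is widened to the 2px minimum.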

7. Ideograms
The ideogram set currently in use is shown above. If you want to use a different ideogram set, the following 5 conditions must be met.
 * 1) 24 images are needed (chromosomes 1-22, X, and Y).
 * 2) All 24 images must have the same image size (same height and same width).
 * 3) In all 24 images, pter (terminus of the p-arm, the leftmost point) and qter (terminus of the q-arm, the rightmost point) must be at the same positions.
 * 4) The banding pattern must be drawn in base-pair-proportional style. Standard ideograms defined by ISCN are drawn based on the visual appearance of stained chromosomes under a microscope and are not base-pair-proportional (see the table below).
 * 5) All file names must have the same format, differing only in the chromosome number. For example, if you created a chromosome 1 image named , then the rest of the file names should be as follows.

Once these 5 conditions are met, you can switch from the current images to the new ones by changing the part of the code where the ideogram file name is defined.
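As an illustration of condition 5, the following Python sketch builds all 24 file names from a single pattern. The pattern shown is hypothetical and is not the name of the files actually in use. The real module defines its own file name in its code.

```python
# Chromosomes 1-22 plus X and Y: 24 images in total.
CHROMOSOMES = [str(n) for n in range(1, 23)] + ["X", "Y"]

def ideogram_file_names(pattern):
    """Map each chromosome to a file name built from one shared pattern.

    pattern must contain the placeholder {chromosome}; only that part
    changes between the 24 file names.
    """
    return {c: pattern.format(chromosome=c) for c in CHROMOSOMES}

# Hypothetical pattern, for illustration only.
names = ideogram_file_names("Example ideogram chromosome {chromosome}.svg")
```

Switching to a new ideogram set would then only require changing the pattern string.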

8. Forward and Reverse strands
https://www.biostars.org/p/210929/

https://www.biostars.org/p/3908/

http://seqanswers.com/forums/showthread.php?t=39388

In GRCh, by convention, the direction from the p-arm (short arm) to the q-arm (long arm) is forward. The opposite direction is reverse.

From Nelson, Sarah C., et al. Trends in Genetics 28.8 (2012): 361-363: "In all human reference chromosomes, as for other eukaryotes, the plus (+) strand is defined as the strand with its 5' end at the tip of the short arm (Genome Reference Consortium, personal communication, March 27, 2012)."
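The convention can be sketched in Python as follows. This is an illustration under two assumptions: reference coordinates increase from the tip of the p-arm toward the tip of the q-arm, and a gene is stored with start_bp less than or equal to end_bp regardless of strand. The function names and the "+" / "-" encoding are hypothetical.

```python
def transcription_direction(strand):
    """Return the direction of transcription relative to the chromosome arms."""
    if strand == "+":
        return "p-arm to q-arm"  # forward strand
    if strand == "-":
        return "q-arm to p-arm"  # reverse strand
    raise ValueError("strand must be '+' or '-'")

def transcription_start_bp(start_bp, end_bp, strand):
    """Return the coordinate at which transcription begins.

    On the forward strand the gene is read toward increasing coordinates,
    so it begins at start_bp. On the reverse strand it is read toward
    decreasing coordinates, so it begins at end_bp.
    """
    if strand == "+":
        return start_bp
    if strand == "-":
        return end_bp
    raise ValueError("strand must be '+' or '-'")
```

Note that the strand does not change where the marker is drawn on the ideogram; it only changes which end of the gene is its beginning.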

Forward strand

Reverse strand


 * Sense (molecular biology)
 * Directionality (molecular biology)
 * Upstream and downstream (DNA)
 * Sense strand
 * Coding strand
 * Reference genome

9. Others
It would be nice if the following kind of technology were available, but it seems that no such technology currently exists.
 * meta:2016_Community_Wishlist_Survey/Categories/Multimedia

Perhaps the Graph extension can do something...
 * wikidata:Template:Extension:Graph
 * https://github.com/vega/vega/wiki/Marks#image

10. Pages for test

 * Dystrophin - long gene (2.3 Mb at Chr.X)
 * RUNX1 - long gene (1.2 Mb at Chr.21)
 * Oct-4 - has many Ensembl IDs
 * Aprataxin - has many RefSeq IDs
 * MT-ND1 - mitochondrial gene
 * IFNA8 - has human data, but no mouse data