Wikipedia:WikiProject Chemistry/IRC discussions/20 May 2008

--- Log opened Tue May 20 11:22:21 EDT 2008

11:22 * dmacks can't stay today:(

11:22 -!- dmacks [n=dmacks@pdpc/supporter/active/dmacks] has left #wikichem []

11:33 -!- ChemSpiderman [n=tony@c-68-33-211-217.hsd1.md.comcast.net] has joined #wikichem

11:34 -!- Rifleman_82 [n=blahblah@wikipedia/Rifleman-82] has joined #wikichem

11:34 -!- mode/#wikichem [+v Rifleman_82] by ChanServ

11:34 &lt;+Rifleman_82&gt; hi all

11:43 &lt;@Physchim62&gt; hi RM!

11:43 &lt;@Physchim62&gt; hi CSM, dmacks

11:46 &lt;+Rifleman_82&gt; hey pc

11:46 &lt;+Rifleman_82&gt; :)

11:46 &lt;+Rifleman_82&gt; today seems quiet

11:47 &lt;+Rifleman_82&gt; won't be able to stay long either

11:49 &lt;@Physchim62&gt; does anyone know if Martin is coming?

11:50 &lt;@Physchim62&gt; Rifleman_82, have you been able to take a look at the "First 500" organics at all?

11:51 &lt;+Rifleman_82&gt; think martin's travelling

11:51 &lt;+Rifleman_82&gt; yes i have

11:51 &lt;@Physchim62&gt; what do you think?

11:51 &lt;+Rifleman_82&gt; i've looked at some of the comments

11:51 &lt;+Rifleman_82&gt; and tried to fix them

11:52 &lt;+Rifleman_82&gt; as in, fix the WP entry

11:52 &lt;+Rifleman_82&gt; i'm using chemfilebrowser

11:52 &lt;+Rifleman_82&gt; i can only read SDF, can't write

11:52 &lt;@Physchim62&gt; I think Martin is still in Potsdam; don't know if he can make it to IRC though, he is very busy with exams

11:52 &lt;@Physchim62&gt; I use chemfilebrowser as well

11:52 &lt;+Rifleman_82&gt; ic

11:52 &lt;+Rifleman_82&gt; chemfilebrowser crashes quite easily

11:53 &lt;@Physchim62&gt; ESPECIALLY with the CAS file :P

11:53 &lt;+Rifleman_82&gt; when you display in tabular form

11:53 &lt;+Rifleman_82&gt; yeah

11:53 &lt;+Rifleman_82&gt; also not very searchable?

11:53 &lt;@Physchim62&gt; no. but if you have any comments, just quote the structure ID and we can find it ;)

11:54 &lt;+Rifleman_82&gt; ok :)

11:54 &lt;+Rifleman_82&gt; still not quite pleased with the drugbox

11:54 &lt;+Rifleman_82&gt; :P

11:54 &lt;+Rifleman_82&gt; i wish we could replace it with a chembox

11:55 &lt;@Physchim62&gt; they won't let us ;)

11:55 &lt;+Rifleman_82&gt; ah...

11:55 &lt;+Rifleman_82&gt; chembox new already has a pharma module

11:55 &lt;+Rifleman_82&gt; actually i've replaced a few drugboxes with chemboxes surreptitiously

11:55 &lt;@Physchim62&gt; there was a huge row a couple of years ago when Wim tried

11:55 &lt;+Rifleman_82&gt; nobody noticed

11:55 &lt;+Rifleman_82&gt; but i suppose if we did it en masse...

11:56 &lt;@Physchim62&gt; of course, Wim being Wim, he tried to force it through on "Paracetamol", a featured article! :P

11:56 &lt;+Rifleman_82&gt; haha

11:56 &lt;+Rifleman_82&gt; i was talking to edgar about pancuronium

11:56 &lt;+Rifleman_82&gt; i recall we agreed not to give ions chemboxes, but that's what we have

11:56 &lt;@Physchim62&gt; since then, we have called (and fairly effectively enforced) a truce on the matter

11:57 &lt;ChemSpiderman&gt; Hi gents

11:57 &lt;@Physchim62&gt; the Chembox should be able to give all the pharmacological data which is in the Drugbox: at least it could when I wrote it!

11:57 &lt;@Physchim62&gt; hi Antony, glad you could make it

11:57 &lt;ChemSpiderman&gt; Yup...change of plans.

11:57 &lt;+Rifleman_82&gt; hi antony

11:58 &lt;+Rifleman_82&gt; yeah, it still can

11:58 &lt;+Rifleman_82&gt; i'll try to do the explosiveboxes first

11:58 &lt;+Rifleman_82&gt; haha

11:58 &lt;+Rifleman_82&gt; they seem an easier target

11:58 &lt;@Physchim62&gt; Explosiveboxes are a much bigger concern than drugboxes

11:58 &lt;+Rifleman_82&gt; if i do everything in a day they would probably accept it as a fait accompli

11:58 &lt;+Rifleman_82&gt; bigger concern?

11:58 &lt;@Physchim62&gt; ie, they are worse as infoboxes ;)

11:59 &lt;+Rifleman_82&gt; ah yeah

11:59 &lt;+Rifleman_82&gt; but to be fair

11:59 &lt;+Rifleman_82&gt; drugbox implemented the transcluded template ahead of us

12:00 &lt;+Rifleman_82&gt; or at least

12:00 &lt;+Rifleman_82&gt; implemented on a large scale

12:00 &lt;+Rifleman_82&gt; we took a long time to clean up the chembox old

12:00 -!- walkerma [n=chatzill@admin-151-108.potsdam.edu] has joined #wikichem

12:00 &lt;+Rifleman_82&gt; hi martin

12:01 &lt;ChemSpiderman&gt; hi

12:01 &lt;walkerma&gt; Hi! Looks like we have a good number here already- Better than last week!

12:01 &lt;+Rifleman_82&gt; hi martin!

12:01 &lt;walkerma&gt; Have you been discussing anything already?

12:01 &lt;+Rifleman_82&gt; heh

12:01 &lt;@Physchim62&gt; Drugbox is simple compared to chembox

12:01 &lt;+Rifleman_82&gt; we were discussing subverting the pharma group

12:02 &lt;+Rifleman_82&gt; maybe replacing all their drugboxes with chemboxes in an hour? :)

12:02 &lt;@Physchim62&gt; Andrew has been listening to Wim too much, he wants to restart the infobox war

12:02 &lt;+Rifleman_82&gt; actually i haven't spoken to him about this

12:02 &lt;+Rifleman_82&gt; it just bothers me on grounds of aesthetics and consistency

12:02 &lt;walkerma&gt; Yes, and I said I would talk to them, and never got around to it :(

12:02 &lt;+Rifleman_82&gt; yeah well if i went rogue fvas would kill me

12:03 &lt;+Rifleman_82&gt; do explosivebox first

12:03 &lt;+Rifleman_82&gt; as in, i'll do that first

12:03 &lt;walkerma&gt; Cacycle can probably help too

12:04 &lt;walkerma&gt; So are we ready to upload the first 500 from the SDF?

12:05 &lt;walkerma&gt; Or do we need to talk to the drugbox folks first?

12:05 &lt;@Physchim62&gt; we are fairly ready on the issue of substance identification and CAS number verification

12:06 &lt;@Physchim62&gt; (as we are for the inorganics, for that matter)

12:07 &lt;walkerma&gt; Physchim62: Can you email me an SDF version of the inorganics, and I'll do some double-checking while I'm in England?

12:07 &lt;@Physchim62&gt; ChemSpiderman said by email that he had some ideas which he was working on for uploading, and that he would rather we wait

12:08 &lt;ChemSpiderman&gt; We can discuss them here

12:08 &lt;@Physchim62&gt; walkerma, the inorganics are not in SDF format, nor do I see how that would be useful... I can mail you an Excel file...

12:08 &lt;ChemSpiderman&gt; I am thinking that if we get the organics looking okay

12:08 &lt;ChemSpiderman&gt; then I can try and get a series of images generated automatically

12:08 &lt;walkerma&gt; Thanks

12:09 &lt;ChemSpiderman&gt; each with embedded InChI which is how I first got connected with Martin while at ACD/Labs

12:09 &lt;walkerma&gt; ChemSpiderman: Can you remind me what the issue is before we upload? Is it the structures?

12:09 &lt;ChemSpiderman&gt; Send me the Excel file too...I can probably generate an SDF file and include into the masterfile

12:09 &lt;ChemSpiderman&gt; yes...Structure format consistency

12:09 &lt;walkerma&gt; (Other than the obvious one of how to do the upload)

12:10 &lt;@Physchim62&gt; I am a little dubious here... a little voice is shouting "hold on!"

12:10 &lt;ChemSpiderman&gt; that's a little devil :-)

12:11 &lt;walkerma&gt; Yes, we need to get this right. I'd say that we need to find a way to match up the structure as displayed on WP with the data from the SDF

12:11 &lt;@Physchim62&gt; *which* structures are going to be concerned by this move? everything in the master SDF file, or only those which have been positively validated?

12:11 &lt;walkerma&gt; Because in general the structure on WP is a common sense correct structure, but the one in the SDF is machine-generated

12:12 &lt;@Physchim62&gt; *which* MOL files are we going to use? Antony's or CAS's (both have some errors, as is only to be expected)

12:12 &lt;ChemSpiderman&gt; if there are errors in my Molfiles then they need correcting.

12:12 &lt;ChemSpiderman&gt; If they are not correct then the file is broken by default.

12:13 &lt;ChemSpiderman&gt; Focus only on the first 500

12:13 &lt;@Physchim62&gt; there are still compounds in the first 500 with Antony's "CHECK" comments

12:13 &lt;ChemSpiderman&gt; Ah-ha! Agreed

12:13 &lt;@Physchim62&gt; and some entries for which I have raised problems which don't seem to have been dealt with yet

12:14 &lt;walkerma&gt; So should we resolve those problem entries, or just set them aside for now?

12:15 &lt;@Physchim62&gt; there was one structure which I spent - literally - hours over before deciding that CAS and Antony didn't agree: luckily, I know where to get a definitive answer on that one

12:16 &lt;+Rifleman_82&gt; which one is that?

12:16 &lt;@Physchim62&gt; I'm trying to find it again now, I don't have my notes with me!

12:17 &lt;ChemSpiderman&gt; Depending on what it is I would trust CAS. Especially if it's stereochemistry in natural products

12:17 &lt;ChemSpiderman&gt; There are many structures on WP with one stereo incorrect

12:17 &lt;ChemSpiderman&gt; stereocenter

12:20 &lt;walkerma&gt; Yes but then we get back to the old issue of What is the "correct" stereochemistry - for some pharmaceuticals and some natural products it's clear, but in some cases it's not

12:20 &lt;ChemSpiderman&gt; that's why the example in question matters

12:21 &lt;walkerma&gt; The name of the WP article may not indicate a particular stereoisomer

12:21 &lt;walkerma&gt; The classic one: http://en.wikipedia.org/wiki/Thalidomide

12:21 &lt;@Physchim62&gt; the structure was Ajmaline, #156

12:21 &lt;ChemSpiderman&gt; We are working on matching CAS numbers with structures in the ChemBox etc. and heading for consistency. I think we have MOST of this all resolved in the first 500

12:22 &lt;@Physchim62&gt; we certainly have most of it resolved for structures where we have a match with the CAS file

12:22 &lt;+Rifleman_82&gt; undesignated?

12:22 &lt;+Rifleman_82&gt; for thalidomide?

12:22 &lt;+Rifleman_82&gt; because it racemizes?

12:23 &lt;@Physchim62&gt; yep, have a look at Acarbose as well in the first 500 list: WP gives the alpha isomer, but the CAS number refers to both alpha and beta

12:24 &lt;walkerma&gt; (Rifleman, yes it does)

12:24 &lt;@Physchim62&gt; (as does Acarbose)

12:26 &lt;@Physchim62&gt; Either myself or Antony could produce three lists: compounds we're sure about, compounds we're not sure about and compounds where we simply don't know

12:27 &lt;ChemSpiderman&gt; the only ones we know are the ones CAS validated so that's easy to separate

12:27 &lt;ChemSpiderman&gt; 10 secs work

12:27 &lt;ChemSpiderman&gt; The other separation is more work.

12:27 &lt;@Physchim62&gt; ChemSpiderman, for Ajmaline, there is an IUPAC _defined_ structure of ajmalan (the parent cpd), so I will go with two-against-one!

12:27 &lt;ChemSpiderman&gt; but I want to get the process agreed on now before we go any further

12:28 &lt;walkerma&gt; I wonder if what we need is (in effect) a field associated with the CAS # to indicate WHAT isomer/form the CAS# represents

12:28 &lt;ChemSpiderman&gt; opening the file to look at Ajmaline

12:28 &lt;@Physchim62&gt; couldn't we do that just as easily in text?

12:28 &lt;walkerma&gt; That would resolve a lot of these stereochemical issues. Maybe we could even add a link to the CAS-supplied data for that compound, from the CAS#

12:29 &lt;walkerma&gt; That way you could KNOW that the CAS# represents the racemic form - so even if the image file changes the CAS is clear

12:30 &lt;walkerma&gt; And yes, we'd indicate the basic info with text

12:30 &lt;walkerma&gt; Like we do already [123-4-56] (anhydrous) [456-7-67] (hexahydrate) etc

12:32 &lt;@Physchim62&gt; or "12-34-5 represents the 2-isomer, while 67-89-0 represents a mixture of isomers"
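
Note: the idea of tagging each CAS number with the form it represents can be sketched as a small record plus a renderer. This is only an illustration; the class name `CasEntry`, the function `render`, and the output layout are assumptions, and the numbers are the placeholder values quoted in the chat, not real registry numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CasEntry:
    """One CAS number plus a note on what it represents (hypothetical record)."""
    cas_number: str  # placeholder values from the chat, not real CAS numbers
    form: str        # e.g. "anhydrous", "racemate", "mixture of isomers"


def render(entries):
    """Format entries as '[number] (form)', separated by spaces."""
    return " ".join(f"[{e.cas_number}] ({e.form})" for e in entries)


entries = [
    CasEntry("123-4-56", "anhydrous"),
    CasEntry("456-7-67", "hexahydrate"),
]
print(render(entries))  # [123-4-56] (anhydrous) [456-7-67] (hexahydrate)
```

Keeping the form as a separate field means the note stays attached to the number even if the structure image in the article changes.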

12:32 &lt;ChemSpiderman&gt; Ok...compared the structure of Ajmaline I have with the one from CAS

12:32 &lt;ChemSpiderman&gt; The one from CAS is UGLY

12:32 &lt;@Physchim62&gt; they're different!

12:32 &lt;ChemSpiderman&gt; Wow....really bad

12:32 &lt;ChemSpiderman&gt; Yes, they are different

12:32 &lt;@Physchim62&gt; yes, that's one reason it took me so long

12:33 &lt;@Physchim62&gt; I had to download a MOL file, then drag atoms all over the place!

12:33 &lt;ChemSpiderman&gt; But are there various forms of the Ajmaline skeleton and CAS is one and we have the othere?

12:33 &lt;ChemSpiderman&gt; other?

12:33 &lt;@Physchim62&gt; no, ajmaline is a retained name for a specific stereochemistry

12:33 &lt;ChemSpiderman&gt; And the CAS number on the Ajmaline structure on WP is wrongly associated?

12:34 &lt;ChemSpiderman&gt; Ok. Then I will replace the one in the first 500

12:34 &lt;@Physchim62&gt; no, I THINK CAS is right on this one, but I need to check

12:34 &lt;ChemSpiderman&gt; :-) and here we go again...

12:34 &lt;ChemSpiderman&gt; It's fun huh?

12:34 &lt;ChemSpiderman&gt; NOT

12:35 &lt;ChemSpiderman&gt; One stereocenter is inverted

12:38 &lt;ChemSpiderman&gt; btw, according to the InChI there is one undefined stereocenter in both the WP structure and the CAS structure

12:38 &lt;@Physchim62&gt; I think two stereocenters are inverted

12:38 &lt;ChemSpiderman&gt; we should take this offline

12:38 &lt;ChemSpiderman&gt; These structures are painful

12:38 &lt;@Physchim62&gt; C-17 has its stereochemistry defined by the laws of physics, as far as I can gather

12:39 &lt;walkerma&gt; This is why I usually stick to articles on things like carvone - I can handle that!

12:40 &lt;walkerma&gt; I think, then, we need to have a quarantine area where we will hold entries with a problem flagged

12:40 &lt;@Physchim62&gt; but I preferred working hard on ajmaline than just ignoring the bloody psychedelics!

12:40 &lt;walkerma&gt; But hopefully we can find >400 out of the 500 where everything matches up without a problem?

12:40 &lt;ChemSpiderman&gt; :-)

12:41 &lt;@Physchim62&gt; Martin, it's nearer 250

12:41 &lt;walkerma&gt; Or are 200 of them PIKHAL entries?

12:41 &lt;walkerma&gt; PIHKAL

12:41 &lt;ChemSpiderman&gt; It's 162

12:42 &lt;ChemSpiderman&gt; 162 of the first 500 have numbers supplied by CAS

12:42 &lt;ChemSpiderman&gt; IF this is our criterion

12:43 &lt;@Physchim62&gt; NO, there are 32 more whose CAS numbers match but whose InChIs don't match because of CAS problems

12:43 &lt;@Physchim62&gt; the famous "barium nitrate" cases

12:44 &lt;@Physchim62&gt; so 194
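
Note: the tally above is simple addition, spelled out here as a sketch. The variable and function names are assumptions; the figures (162, 32, 500) are the ones quoted in the chat.

```python
FIRST_BATCH = 500   # size of the "First 500" organics list
cas_supplied = 162  # entries with numbers supplied by CAS
pc_validated = 32   # CAS numbers match, InChIs differ because of CAS-side problems


def matched_total(cas_count, extra_count):
    """Entries whose CAS number is considered matched."""
    return cas_count + extra_count


total = matched_total(cas_supplied, pc_validated)
print(total)                         # 194
print(f"{total / FIRST_BATCH:.1%}")  # 38.8%
```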

12:44 &lt;ChemSpiderman&gt; that's not in the first 500 is it?

12:44 &lt;@Physchim62&gt; yes

12:44 &lt;walkerma&gt; Yes it is, I think

12:45 &lt;ChemSpiderman&gt; hmmm

12:45 &lt;ChemSpiderman&gt; looking

12:45 &lt;@Physchim62&gt; the ones marked "PC Validated" in the file I sent out

12:45 &lt;walkerma&gt; I was assuming that we'd go ahead and upload ALL that have been checked - and just indicate whether the CAS# is a validated one from CAS or not

12:46 &lt;walkerma&gt; Is that what others were thinking too?

12:46 &lt;@Physchim62&gt; there are 103 compounds for which we have NO CAS number from any source (at present)

12:47 &lt;ChemSpiderman&gt; yes Walkerma

12:47 &lt;walkerma&gt; But if we have the other data checked, we should upload that, right?

12:47 &lt;ChemSpiderman&gt; PC...and that is where we started the project..validation

12:47 &lt;@Physchim62&gt; ?

12:48 &lt;ChemSpiderman&gt; Getting CAS numbers for the WP:Chem set of structures from CAS

12:48 &lt;ChemSpiderman&gt; Anyhow, yes, even without CAS numbers we SHOULD upload all checked by us

12:48 &lt;walkerma&gt; If a compound has been checked and is good, but it lacks a CAS No., that's still worth uploading, surely?

12:48 &lt;ChemSpiderman&gt; they will have new Names, SMILES, InChIs etc

12:48 &lt;@Physchim62&gt; but what have we checked, in that case?

12:48 &lt;ChemSpiderman&gt; for sure

12:48 &lt;walkerma&gt; And InChIKeys

12:48 &lt;ChemSpiderman&gt; We have checked many things...

12:48 &lt;ChemSpiderman&gt; structure drawing

12:48 &lt;ChemSpiderman&gt; names

12:49 &lt;ChemSpiderman&gt; SMILES

12:49 &lt;ChemSpiderman&gt; MF consistency

12:49 &lt;ChemSpiderman&gt; Mw consistency

12:49 &lt;ChemSpiderman&gt; etc

12:49 -!- Beetstra [n=djbeetst@Wikimedia/Beetstra] has quit ["Bye Bye"]

12:49 &lt;@Physchim62&gt; the names, InChIs and InChIKeys will be WRONG in several cases, where there are multiple structures on a single SDF record

12:49 &lt;ChemSpiderman&gt; remember...the validation process was happening without CAS number validation initially

12:50 &lt;ChemSpiderman&gt; why?

12:50 &lt;ChemSpiderman&gt; An InChI can and does represent several structures

12:50 &lt;ChemSpiderman&gt; whether we want it to is a different issue of course

12:50 &lt;ChemSpiderman&gt; up for discussion

12:50 &lt;ChemSpiderman&gt; they can easily be deleted for multiple structures

12:51 &lt;ChemSpiderman&gt; but a hydrochloride salt is a multi structure too

12:51 &lt;@Physchim62&gt; Agent Orange, #154: this is not a chemical compound, it is a mixture, and yet the SDF describes it as if it were an addition compound

12:51 &lt;ChemSpiderman&gt; yes, agreed

12:51 &lt;ChemSpiderman&gt; The InChI doesn't capture mixture ratios etc

12:51 &lt;ChemSpiderman&gt; so we could generate separate InChIs

12:52 &lt;ChemSpiderman&gt; This would be manual and not automatic though

12:52 &lt;ChemSpiderman&gt; this is an enormous project

12:52 &lt;@Physchim62&gt; we should do so (in this case, there are articles for each of the components as well)

12:52 &lt;+Rifleman_82&gt; guys, gotta go

12:52 &lt;+Rifleman_82&gt; talk again

12:52 -!- Rifleman_82 [n=blahblah@wikipedia/Rifleman-82] has quit []

12:52 &lt;ChemSpiderman&gt; so, what we need is this team's feedback about what to do on each article and it can get done

12:53 &lt;@Physchim62&gt; not that big, such "problem articles" are less than 10% of the organic dataset

12:53 &lt;ChemSpiderman&gt; it's only as we work through it that we are seeing these issues

12:53 &lt;ChemSpiderman&gt; I mean the entire validation project is enormous.

12:54 &lt;ChemSpiderman&gt; I judge we are at 100s of hours now and still nothing published...and I think that's what Martin wants to see

12:54 &lt;@Physchim62&gt; they are a larger proportion of the inorganic set, which explains why I have a different methodology for verification than Antony (both are, of course, useful!)

12:54 &lt;ChemSpiderman&gt; something published as an outcome of effort to date

12:54 &lt;walkerma&gt; We need to have various piles - A: Everything is OK; B: Everything is OK except for resolving a structural issue; C: Other problems.

12:54 &lt;walkerma&gt; Then we publish pile A

12:55 &lt;walkerma&gt; then we set to work resolving pile B

12:55 &lt;walkerma&gt; then deal with the difficult ones, pile C
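
Note: the three-pile scheme can be sketched as a small triage function. This is an illustration under assumed inputs: the two boolean flags and the names `Pile` and `triage` are hypothetical, and a real curation file would carry more detail than two flags per entry.

```python
from enum import Enum


class Pile(Enum):
    A = "everything is OK"
    B = "OK except for a structural issue"
    C = "other problems"


def triage(structure_ok: bool, other_data_ok: bool) -> Pile:
    """Assign an entry to a pile from two (hypothetical) validation flags."""
    if not other_data_ok:
        return Pile.C  # anything beyond a structure question goes to the hard pile
    return Pile.A if structure_ok else Pile.B


# Placeholder entries: (label, structure_ok, other_data_ok)
entries = [
    ("entry-1", True, True),
    ("entry-2", False, True),
    ("entry-3", True, False),
]

# Work order follows the chat: publish A, then resolve B, then C.
for label, s_ok, o_ok in sorted(entries, key=lambda e: triage(e[1], e[2]).name):
    print(label, triage(s_ok, o_ok).name)
```

An entry with both a structural issue and other problems lands in pile C here; that choice is an assumption, since the chat does not say how overlapping problems should be filed.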

12:55 &lt;@Physchim62&gt; OK, so how do we "publish"?

12:55 &lt;walkerma&gt; That should be doable, shouldn't it?

12:55 &lt;@Physchim62&gt; that's what we're working on, Martin!

12:55 &lt;walkerma&gt; Physchim62: Yes, that's the next big problem!

12:56 &lt;@Physchim62&gt; OK, so at least everyone has the same strategic goals! :)

12:56 &lt;walkerma&gt; Do we go with Physchim62's method, or do we use Daniel's data page method?

12:57 &lt;walkerma&gt; dmacks_logging, are you actually around?

12:57 &lt;ChemSpiderman&gt; no he's not

12:57 &lt;@Physchim62&gt; no, he is studying for finals

12:57 &lt;walkerma&gt; Grading finals?

12:58 &lt;@Physchim62&gt; as opposed to Martin, who merely has to mark them ;)

12:58 &lt;@Physchim62&gt; (and write them, but one would hope that that has been done ;) )

12:58 &lt;walkerma&gt; dmacks teaches at a college as well

12:59 &lt;walkerma&gt; It's hard for me to tell which approach will work better - PC, what are your thoughts?

12:59 &lt;@Physchim62&gt; fair enough, but he is exammed-out

13:00 &lt;@Physchim62&gt; what are our objectives here?

13:01 &lt;walkerma&gt; We need a way to present our validated data within chemboxes and drugboxes, in such a way that we can protect validated information

13:01 &lt;walkerma&gt; But in a way that can be updated when necessary

13:02 &lt;@Physchim62&gt; Daniel's solution works better for checking the stability of data, but it is less Wiki-friendly (and might cause us political problems); My solution is only really useful for indexing, although it does allow some automated verification as well

13:02 &lt;walkerma&gt; We need to make sure that associated data remain associated - e.g., structure/InChI/InChIKey/SMILES/formula/MW
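
Note: one way to keep associated data associated is to treat the identifiers as a single record and store a fingerprint of the whole set, so that a change to any one field is detectable. This is a sketch, not the method anyone in the chat proposed; the names `ValidatedRecord`, `fingerprint` and `has_drifted` are assumptions, and water is used only as a simple example.

```python
import hashlib
from dataclasses import astuple, dataclass, replace


@dataclass(frozen=True)
class ValidatedRecord:
    """Identifiers that must stay together for one compound (hypothetical layout)."""
    structure_id: str
    inchi: str
    inchikey: str
    smiles: str
    formula: str
    mol_weight: str


def fingerprint(rec: ValidatedRecord) -> str:
    """SHA-256 over all fields, joined with a separator that cannot occur in them."""
    joined = "\x1f".join(astuple(rec))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def has_drifted(rec: ValidatedRecord, stored_fp: str) -> bool:
    """True if any field differs from the state that was fingerprinted."""
    return fingerprint(rec) != stored_fp


water = ValidatedRecord(
    structure_id="example-1",
    inchi="InChI=1S/H2O/h1H2",
    inchikey="XLYOFNOQVPJJNP-UHFFFAOYSA-N",
    smiles="O",
    formula="H2O",
    mol_weight="18.015",
)
stored = fingerprint(water)
edited = replace(water, formula="H2O2")  # simulate an edit to one field only

print(has_drifted(water, stored))   # False
print(has_drifted(edited, stored))  # True
```

A monitoring bot could compare the stored fingerprint with the current article data and flag any mismatch for a human to review.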

13:03 &lt;@Physchim62&gt; true, but possibly a different issue

13:03 &lt;walkerma&gt; Physchim62: Can you elaborate further on the differences between the two approaches?

13:03 &lt;@Physchim62&gt; we need to present reliable data, and we need to help people to find it

13:06 &lt;@Physchim62&gt; Daniel's idea is to place the code for the Chembox on a subpage, which will then be transcluded (like a template) into the article. ADV: the chembox pages can be easily monitored for changes, and are less visible to passing vandals. DIS: it is more difficult for well-meaning users to add data to the chemboxes; WP in general does not like subpages for exactly this reason

13:08 &lt;@Physchim62&gt; My idea is to insert machine readable code at the end of each article which identifies the compound(s) discussed. ADV: is based on an accepted principle, is invisible to most users so less of a target for vandalism, links search engines etc directly to the article page. DIS: hardly looks very glorious for all the effort we're expending, difficult to expand to include more data than simple...

13:08 &lt;@Physchim62&gt; ...identifiers
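
Note: the two approaches can be contrasted by the wikitext each would leave in an article. The sketch below is illustrative only. The subpage name `/Chembox` and the template name `ChemIdentifiers` are hypothetical, and the identifier values are placeholders; the only real syntax relied on is MediaWiki's `{{:Page title}}` form for transcluding a page that is not in the Template namespace.

```python
def subpage_transclusion(article_title: str) -> str:
    """Daniel's approach: the article transcludes a chembox kept on a subpage.

    The '/Chembox' subpage name is an assumption made for this sketch.
    """
    return "{{:" + article_title + "/Chembox}}"


def identifier_metadata(identifiers: dict) -> str:
    """PC's approach: a block of machine-readable identifiers at the end of the article.

    'ChemIdentifiers' is a hypothetical template name, not an existing template.
    """
    lines = ["{{ChemIdentifiers"]
    for name, value in identifiers.items():
        lines.append(f"| {name} = {value}")
    lines.append("}}")
    return "\n".join(lines)


print(subpage_transclusion("Tributylphosphine"))
print(identifier_metadata({
    "InChI": "<InChI string>",        # placeholder
    "InChIKey": "<InChIKey>",          # placeholder
    "CAS": "<CAS registry number>",    # placeholder
}))
```

With the second approach, an article that discusses several compounds would simply carry one such block per compound.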

13:09 &lt;@Physchim62&gt; any questions? ;)

13:10 &lt;walkerma&gt; can you put in all of the SDF info into this code? Structures?

13:10 &lt;@Physchim62&gt; InChIs, yes, it was *designed* for InChIs!

13:11 &lt;@Physchim62&gt; MOL files... in principle, yes, but we would quickly run into problems. Why do it? Why not have a separate service for MOL files if that's what people want?

13:13 &lt;@Physchim62&gt; otherwise, so far, I have envisaged, InChIs, InChI keys, preferred IUPAC names and CAS registry numbers

13:14 &lt;walkerma&gt; It sounds like the main problem with Daniel's method is political - and maybe that could be handled? I'm wondering if there was a separate "Edit this Chembox" link in the Chembox, if that would satisfy the wiki police?

13:14 &lt;ChemSpiderman&gt; I would also add SMILES, MF, Mw etc but be careful again with multicomponent structures

13:15 &lt;@Physchim62&gt; my solution also works better when there are multiple compounds discussed in a single article: each one has its own template, and the average reader won't notice the difference. the bot reader, on the other hand, will know that each of the compounds is treated on that page

13:15 &lt;walkerma&gt; Regarding the problem (in dmacks method) of directing to the actual article page, I have a feeling he had a solution for that

13:15 &lt;@Physchim62&gt; ChemSpiderman, that would not be a problem

13:16 &lt;@Physchim62&gt; I think he did as well, but I would need to check to be sure

13:17 &lt;walkerma&gt; One point - if we solve this issue in a nice way, MANY projects will want to copy what we've done here. Remember what happened with article assessment (now 1.25 million articles and growing..)!

13:17 &lt;ChemSpiderman&gt; Also, if acceptable to WP:Chem I would like to run the SDF across ChemSpider and extract CSIDs to provide links to ChemSpider since we will have analytical data on there etc.

13:17 &lt;ChemSpiderman&gt; goes back to my question about links to CS from WP

13:18 &lt;walkerma&gt; I think links to ChemSpider are perfectly appropriate, since it provides open access structure-specific data unavailable on WP

13:18 &lt;@Physchim62&gt; I don't see any problem with putting CSIDs in the publicly visible Chembox: I would be more reticent about including them as metadata

13:18 &lt;ChemSpiderman&gt; works for me.

13:19 &lt;@Physchim62&gt; but then, IMHO, it would be better for CS to have them publicly visible!

13:19 &lt;ChemSpiderman&gt; ?

13:19 &lt;ChemSpiderman&gt; puzzled...

13:20 &lt;@Physchim62&gt; if they are visible to all users, people will click on them and come to ChemSpider; if they are in semi-invisible metadata, they are only really useful to bots

13:20 &lt;@Physchim62&gt; (or spiders...)

13:21 &lt;ChemSpiderman&gt; yes...what I would like is similar status to PubChem, DrugBank and eMolecules

13:21 &lt;@Physchim62&gt; personally, I think we should ditch eMolecules

13:21 &lt;ChemSpiderman&gt; I'm holding my tongue on that.

13:22 &lt;walkerma&gt; I agree, though we need to see if we can provide the same information as they do, but in a different way

13:22 &lt;@Physchim62&gt; it doesn't give us the range of information that we have on ChemSpider

13:22 &lt;walkerma&gt; That's a debate for another day

13:23 &lt;@Physchim62&gt; incidentally, other language WPs are increasingly using ChemSpider for info gathering

13:23 &lt;ChemSpiderman&gt; I noticed...Germany especially

13:24 &lt;walkerma&gt; So - would it be a good idea to do a "pilot trial" of the two upload methods - say 50 compounds? I realise we probably can't have the bot written to monitor them, but we could check things like "Can Google find the InChIKey" and such issues

13:26 &lt;walkerma&gt; Is this realistic, PC?

13:26 &lt;ChemSpiderman&gt; sounds like a good approach

13:26 &lt;@Physchim62&gt; my method is already tested for that, the answer is yes

13:27 &lt;@Physchim62&gt; I think the same goes for Daniel's idea, although it is still at an earlier stage

13:27 &lt;@Physchim62&gt; "tributylphosphine" was the original test page for my method

13:28 &lt;@Physchim62&gt; Note that it takes Google about 10 days to pick up the changes

13:28 &lt;walkerma&gt; I think I'd like to bring the idea to the WP:Chem people, and say - "This page here lists 50 compounds set up using PC's method, and 50 (other) compounds set up using dmacks's method - now go and play with these"

13:29 &lt;@Physchim62&gt; note as well that my method and Daniel's are not mutually exclusive!

13:29 &lt;@Physchim62&gt; we COULD do both...

13:29 &lt;walkerma&gt; Physchim62: Certainly - and there may be a case to be made for doing both

13:30 &lt;@Physchim62&gt; either is simple to reverse, as well

13:30 &lt;walkerma&gt; But I'd like to hear what people like Beetstra, BDuke, Itub, Cacycle, etc think of how these things work in practice

13:31 &lt;@Physchim62&gt; OK, let's go with that.

13:31 &lt;walkerma&gt; I think that way we'll find if one (or both!) method is hard to scale, or clunky etc, and we can fix the bugs

13:31 &lt;@Physchim62&gt; it will have to be done "by hand", and probably won't be done this week

13:33 &lt;walkerma&gt; ChemSpiderman, can you assign 50 structures to each person? We should probably include both organics and a few inorganics in each, just to make sure! And please, not 49 psychedelics...!

13:33 &lt;walkerma&gt; Bearing in mind that we may choose to use both, I don't see it as a competition

13:33 &lt;@Physchim62&gt; why not just split the list of 162 clearly verified compounds down the middle

13:34 &lt;ChemSpiderman&gt; :-( Martin...I am off for an MRI today and then surgery whenever.

13:34 &lt;@Physchim62&gt; all organics

13:34 &lt;@Physchim62&gt; I'll do it!

13:34 &lt;ChemSpiderman&gt; I can do PC's suggestion RIGHT NOW

13:34 &lt;walkerma&gt; Sure, we could do that for starters!

13:34 &lt;@Physchim62&gt; so can I :P

13:34 &lt;ChemSpiderman&gt; If PC wants to do it fine with me..it'll take me 60 secs to do it and send them out

13:34 &lt;walkerma&gt; Yes, I have an ACS local section meeting at 1400h US EDT (in 25 minutes) - in the office next door

13:34 &lt;ChemSpiderman&gt; PC..your call...either way is fine with me

13:34 &lt;@Physchim62&gt; if my Maths hasn't failed me, that gives us 81 compounds for each method
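
Note: splitting the verified list down the middle is a one-line slice. The sketch below uses placeholder labels rather than real compound names, and the function name is an assumption.

```python
def split_down_the_middle(items):
    """Return the first and second halves of a list (second gets the extra item if odd)."""
    mid = len(items) // 2
    return items[:mid], items[mid:]


# 162 placeholder labels standing in for the clearly verified compounds
verified = [f"compound-{i}" for i in range(1, 163)]

first_half, second_half = split_down_the_middle(verified)
print(len(first_half), len(second_half))  # 81 81
```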

13:35 &lt;ChemSpiderman&gt; hmm...good math!

13:35 &lt;walkerma&gt; Dmacks said that he could work on this very thing once the marking was all done, and he could maybe even write a bot as well

13:35 &lt;ChemSpiderman&gt; PC..want to split it?

13:36 &lt;ChemSpiderman&gt; I am out over here shortly...

13:36 &lt;@Physchim62&gt; I will email him with a list of compounds to do, and then do mine myself

13:36 &lt;ChemSpiderman&gt; if you can split that would be great

13:36 &lt;ChemSpiderman&gt; excellent

13:36 &lt;walkerma&gt; FYI: Some of my limited wiki-time is being spent moderating this very lively discussion:

13:36 &lt;walkerma&gt; http://en.wikipedia.org/wiki/Wikipedia_talk:Version_1.0_Editorial_Team/Assessment#Votes_on_changing_the_assessment_scale

13:37 &lt;walkerma&gt; That is a very important debate, and it means I can't devote much time to chem before I travel

13:37 &lt;@Physchim62&gt; can't you just get a friendly admin to salt it?

13:37 &lt;walkerma&gt; I think most of the people commenting are admins, you'd have a wheel war if you weren't careful! No, we really need to do this.

13:38 &lt;@Physchim62&gt; OK, so I have agreed to get an Excel file of the inorganics to the list members ASAP, to provide Daniel with a list of compounds for testing our methods and to get templates onto the other half of the list...

13:38 &lt;ChemSpiderman&gt; looks important but having done meetings in Fortune 500 companies I wouldn't enjoy it

13:39 &lt;walkerma&gt; OK, that sounds like progress! I will promise to work on offline curation while I'm away. Antony, can you email me and let me know exactly WHAT you'd like me to work on? Or should I just continue on after entry #3300 or so?

13:40 &lt;ChemSpiderman&gt; I've done some work so I suggest sending you from 3300 onwards

13:40 &lt;ChemSpiderman&gt; I need to finish my next chunk..500 onwards and then knit your stuff in from before

13:40 &lt;walkerma&gt; CSM - thanks! Physchim62 - could you perhaps get 19 inorganics for each set - so we have a round 100 in each test dataset?

13:41 &lt;ChemSpiderman&gt; I will send you from 3300 onwards after this chat

13:41 &lt;walkerma&gt; Thanks

13:41 &lt;walkerma&gt; I think we can close now, alright?

13:41 &lt;@Physchim62&gt; walkerma, OK, no difficulty there

13:42 &lt;@Physchim62&gt; and blimey, don't you people at WP1.0 have better things to do? :P

13:44 &lt;walkerma&gt; Thanks a LOT, both of you, and I'll be in touch while I'm away

13:44 &lt;walkerma&gt; Bye!

13:44 &lt;@Physchim62&gt; bye! have fun in Newcastle!

13:44 -!- walkerma [n=chatzill@admin-151-108.potsdam.edu] has quit ["ChatZilla 0.9.82.1 [Firefox 2.0.0.14/2008040413]"]

13:46 -!- ChemSpiderman [n=tony@c-68-33-211-217.hsd1.md.comcast.net] has quit []

--- Log closed Tue May 20 13:49:12 EDT 2008