Wikipedia:WikiProject Chemistry/IRC discussions/5 Feb 2008

--- Log opened Tue Feb 05 10:32:35 EST 2008

10:38 &lt;+Physchim62&gt; morning dmacks :)

10:38 &lt;dmacks&gt; hello

10:38 &lt;dmacks&gt; (not quite awake yet, waiting for coffee to brew:)

10:39 * Physchim62 passes dmacks a cup of coffee

10:39 &lt;dmacks&gt; *phew*, that's better. Thanks!

10:39 &lt;+Physchim62&gt; be careful, it was made with a Soxlet ;)

10:41 * Physchim62 can never remember how to spell that

10:42 &lt;dmacks&gt; hehe yeah. I think there's an "h" in it or something?

10:45 -!- ChemSpiderMan [n=ChemSpid@c-68-33-151-242.hsd1.md.comcast.net] has joined #wikichem

10:46 &lt;ChemSpiderMan&gt; I will be here but am going on to a conference call at 11am EST (the time this meeting starts) so my contributions will only start when I get off the call I'm afraid

10:46 &lt;+Physchim62&gt; ok

10:46 &lt;ChemSpiderMan&gt; Did you see all of my PDFs?

10:47 * dmacks didn't get a chance to look this week.

10:47 &lt;+Physchim62&gt; I've not seen the latest ones, but

10:47 &lt;+Physchim62&gt; I've seen up until about ten days ago, I think

10:47 &lt;ChemSpiderMan&gt; http://www.chemspider.com/filex/Wikipedia_reports/Wikpiedia_Review_project_020108.pdf

10:48 &lt;+Physchim62&gt; that one I've got

10:48 &lt;+Physchim62&gt; no, I haven't

10:48 * dmacks d/l & reads....

10:48 &lt;ChemSpiderMan&gt; I'm putting further work on hold until we make some decisions around the following:

10:49 &lt;ChemSpiderMan&gt; Representation of sugars

10:49 &lt;+Physchim62&gt; fire away

10:49 &lt;dmacks&gt; "HTTP401-permission denied" on that file.

10:49 &lt;ChemSpiderMan&gt; Primary keys - article versus structure versus CAS and CAS validation (topic for today)

10:49 &lt;ChemSpiderMan&gt; hmmm....

10:49 &lt;ChemSpiderMan&gt; Andrew...can you get to it?

10:50 &lt;+Physchim62&gt; I'm worried about the questions of protonation, esp. on biochemicals, which I think we will be discussing today as well

10:50 &lt;ChemSpiderMan&gt; dmacks...email address? I'll send it to you

10:50 &lt;dmacks&gt; dmacks@towson.edu

10:50 &lt;dmacks&gt; thx

10:51 &lt;ChemSpiderMan&gt; sent

10:51 &lt;ChemSpiderMan&gt; ok...yes: protonation is important

10:51 &lt;+Physchim62&gt; and difficult

10:51 &lt;ChemSpiderMan&gt; Hmmm...I think the majority is pretty easy actually.

10:52 &lt;ChemSpiderMan&gt; There are some problematic cases but it's 80/20

10:52 &lt;+Physchim62&gt; I sometimes get the impression that we're looking for a solution that -chemically- isn't there

10:52 &lt;+Physchim62&gt; especially for biochemicals which attached phosphate groups!

10:52 &lt;+Physchim62&gt; *with

10:53 &lt;ChemSpiderMan&gt; I hear ya....

10:53 &lt;+Physchim62&gt; ie, we will *never* be able to get a unique InChI or CAS, because the compound exists as a eqm mixture ;)

10:54 &lt;ChemSpiderMan&gt; But this again is where we come down to whether or not we are creating the tie to the structure or not

10:54 &lt;+Physchim62&gt; actually, CAS is simpler than InChI in that respect: CAS doesn't care about the chemical nature of the substance it describes ;)

10:54 &lt;ChemSpiderMan&gt; I mean...what's the structure of water?

10:54 &lt;ChemSpiderMan&gt; It's not just H2O

10:55 &lt;ChemSpiderMan&gt; same issue for me

10:55 &lt;ChemSpiderMan&gt; That's why a decision has to be made.

10:55 &lt;ChemSpiderMan&gt; For sugars...we can draw them many ways.

10:56 &lt;ChemSpiderMan&gt; What way should we "decide on" so we can get it done

10:56 &lt;ChemSpiderMan&gt; Staying stuck is going to be easy :-)

10:56 &lt;dmacks&gt; yeah:/

10:56 &lt;ChemSpiderMan&gt; I have 150 molecules now in PDFs for CAS validation...

10:57 &lt;+Physchim62&gt; but "get it done" has a sound of set in stone... there are four cyclic structures which can accurately be described as glucose, how should we present them?

10:58 &lt;+Physchim62&gt; D-glucose is "more important" than L-glucose, but an encyclopedia needs to say what the difference is!

10:58 &lt;dmacks&gt; Comes back to "what are we actually talking about" by "a chemical"...structure, group of structures (mixtures), group of structures (different forms of same structure), etc.

10:58 &lt;ChemSpiderMan&gt; we should present them as we "decide to"

10:58 &lt;ChemSpiderMan&gt; Because if we don't decide it won't get done and we stay where we are

10:58 &lt;ChemSpiderMan&gt; which is better?

10:59 &lt;+Physchim62&gt; agreed, it is an "editorial decision", there is no shame in admitting it!

10:59 &lt;ChemSpiderMan&gt; I have to make decisions like this about 20 million molecules.

10:59 -!- walkerma [n=chatzill@admin-151-108.potsdam.edu] has joined #wikichem

10:59 &lt;ChemSpiderMan&gt; I ask for opinions but then have to decide and move forward

10:59 &lt;ChemSpiderMan&gt; Martin...hi...

10:59 &lt;dmacks&gt; hello walkerma

10:59 &lt;walkerma&gt; Hello!

11:00 &lt;ChemSpiderMan&gt; Just to let you know I will be here but am on a conf call at 11am and will only be able to watch...will participate when I get off..

11:00 &lt;ChemSpiderMan&gt; the call

11:00 * dmacks will put the past few minutes of "hallway discussion" in the log.

11:00 &lt;walkerma&gt; OK, I understand

11:01 &lt;+Physchim62&gt; dmacks, you're logging?

11:01 &lt;dmacks&gt; yup

11:01 &lt;+Physchim62&gt; good :)

11:02 &lt;walkerma&gt; Does anyone know if Rifleman82 is coming? Itub?

11:02 &lt;ChemSpiderMan&gt; oops..I thought Andrew was on here...no wonder he didn't answer :-)

11:05 &lt;+Physchim62&gt; no idea

11:06 &lt;walkerma&gt; OK, I've sent Andrew a quick email. Should we get started?

11:06 * Physchim62 proposes that Martin takes the Chair ;)

11:06 &lt;dmacks&gt; Itub was active on WP a minute ago, so might be along...

11:07 &lt;walkerma&gt; OK, I just left Itub a message.

11:07 &lt;walkerma&gt; I was wondering if we could discuss CAS # validation

11:07 &lt;walkerma&gt; which seems very challenging for us

11:07 &lt;walkerma&gt; How should we proceed?

11:07 &lt;+Physchim62&gt; what is the challenge?

11:08 &lt;ChemSpiderMan&gt; Phone call is delayed until 11:15am..I am here. The challenge is validation. We have CAS numbers associated with structures. Are they correct?

11:08 &lt;walkerma&gt; The challenge: We have at least 6000 compounds to check against original ACS information (not the Aldrich catalogue!)

11:09 &lt;walkerma&gt; It is a very slow process, and most of us (me included) don't have good access to SciFinder

11:09 &lt;+Physchim62&gt; no, we have CAS numbers *and* structures associated with *compounds*

11:09 &lt;ChemSpiderMan&gt; Searching on the CAS number to check the structure is the wrong way around also

11:09 &lt;ChemSpiderMan&gt; ASSUMING the structure/compound/substance shown in the Chembox is correct then what is a "representative" CAS number

11:10 &lt;walkerma&gt; Sorry, I'm still in InChI mode - thanks for the correction Antony. Are there other (legal) ways we can get CAS #s for SUBSTANCES?

11:10 &lt;walkerma&gt; besides Scifinder? That would be faster?

11:10 &lt;+Physchim62&gt; yes, there are 1001 ways to get them for /most/ of the compounds on WP

11:10 &lt;ChemSpiderMan&gt; there's only one "trusted way" I believe

11:11 &lt;ChemSpiderMan&gt; search the CAS registry using one of their tools

11:11 &lt;+Physchim62&gt; CAS #s are used as identifiers by many, many organizations which place info on the web: sometimes they get them wrong, of course, but we needn't be restricted to the CAS database

11:11 &lt;ChemSpiderMan&gt; I agree that there are good sources online. But how to sort the wheat from the chaff?

11:12 &lt;ChemSpiderMan&gt; CAS is the "authority"

11:12 &lt;+Physchim62&gt; if the "wrong" CAS no. is widely used, that is a piece of info which should be included in WP

11:12 &lt;walkerma&gt; I disagree, PC: If we want a VALIDATED set of data we should go to the authority

11:12 &lt;ChemSpiderMan&gt; it's expensive and time consuming to do though

11:12 &lt;dmacks&gt; "also often (incorrectly) listed as 12345-67-8"?

11:13 &lt;dmacks&gt; Needn't be part of our validated dataset, but useful note for WP.

11:13 -!- itub [n=tubert@lalo.chemie.unibas.ch] has joined #wikichem

11:13 -!- mode/#wikichem [+v itub] by ChanServ

11:13 &lt;walkerma&gt; dmacks: That would be nice, but the most important thing is getting the main CAS# right

11:13 &lt;+Physchim62&gt; replace "incorrectly" by "which is the number for LSD"

11:13 &lt;dmacks&gt; Right.

11:14 &lt;walkerma&gt; Itub: Welcome! We're talking about validation of CAS#s on WP

11:14 &lt;+itub&gt; hi

11:14 &lt;+Physchim62&gt; hi itub

11:14 * dmacks agrees that the real data is more important than other stuff.

11:15 &lt;walkerma&gt; I chatted with PC earlier, and wondered how they do this at ACS Pubs and at CAS

11:15 &lt;+Physchim62&gt; I disagree. our users are going to use these numbers to search for data elsewhere: if they use SciFinder, they will realise the mistake, but if they use other databases how would they know that the number was "wrong" if we're using the number which is normally used?

11:15 &lt;walkerma&gt; Does anyone know what tools THEY use? They have a lot more than 6000 to do

11:16 &lt;walkerma&gt; Physchim62:Fair point

11:18 &lt;+itub&gt; they probably have people working full time on that

11:18 &lt;+itub&gt; doesn't sound like a fun job

11:18 &lt;ChemSpiderMan&gt; don't understand the question? How does ACS pubs get CAS numbers?

11:18 &lt;+Physchim62&gt; we risk opening up a new field of dispute similar to the ones we had over the "correct" IUPAC name.

11:18 &lt;+Physchim62&gt; they ask CAS to assign them a number

11:18 &lt;ChemSpiderMan&gt; ah yes...for sure

11:19 &lt;+Physchim62&gt; there are CAS numbers for compounds which have never been published

11:19 &lt;ChemSpiderMan&gt; propehtic compounds

11:19 &lt;ChemSpiderMan&gt; prophetic.

11:19 &lt;ChemSpiderMan&gt; specifically for patents

11:19 &lt;+Physchim62&gt; soon to be patented compounds....

11:20 &lt;+Physchim62&gt; and also, for compounds whose description is still in the bowels of a journal's editorial office ;)

11:20 &lt;ChemSpiderMan&gt; yup

11:20 &lt;walkerma&gt; The question is: I'm at CAS. If I'm abstracting the 200 compounds described in a J. Org Chem paper, how do I check the CAS #s for those 200?

11:20 &lt;walkerma&gt; How do I know if any of them are new?

11:20 &lt;ChemSpiderMan&gt; draw them

11:21 &lt;ChemSpiderMan&gt; and search them

11:21 &lt;+Physchim62&gt; I believe they also assign numbers to hypothetical compounds, but I have no source for that

11:21 &lt;walkerma&gt; They have to be able to do that with lightning speed.

11:21 &lt;ChemSpiderMan&gt; Of course...i don't work there so I am making this up

11:21 &lt;+itub&gt; that's why people at CAS invented canonicalization

11:21 &lt;+Physchim62&gt; CAS has a *lot* of people working for it ;)

11:21 &lt;walkerma&gt; Yes, but surely they've streamlined things for efficiency?

11:22 &lt;ChemSpiderMan&gt; not really...when an article is submitted many come as images in documents and the OLE objects can easily be opened in an editor in general

11:22 &lt;ChemSpiderMan&gt; So, a word document with embedded Chemdraw or ISIS files can be opened in a structure editor.

11:22 &lt;ChemSpiderMan&gt; Word documents ARE structure searchable...

11:22 &lt;+Physchim62&gt; I could check 200 compounds in a morning's work

11:22 &lt;walkerma&gt; They always ask for your original ChemDraw files - do they use those over at CAS?

11:23 &lt;ChemSpiderMan&gt; Provided the structures are embedded from an OLE compatible drawing package

11:23 &lt;ChemSpiderMan&gt; Yes...ACS blesses ISIS and ChemDraw

11:23 &lt;ChemSpiderMan&gt; But...if we want to know we should ask them :-)

11:23 &lt;walkerma&gt; I sent a couple of emails to ACS people I know, but got no reply as yet

11:23 &lt;+Physchim62&gt; we already have, we're waiting for /their/ reply ;)

11:24 &lt;walkerma&gt; The two people I contacted tend to be very busy people, so it doesn't mean they don't want to help

11:24 &lt;walkerma&gt; I know someone less well from CAS, who I'll contact too

11:25 &lt;walkerma&gt; Does anyone else here have useful contacts who might be able to answer our questions, and/or help?

11:25 &lt;ChemSpiderMan&gt; Just fyi structure IMAGES can be converted too...but I don't know whether CAS do that: http://www.chemspider.com/blog/clide-and-more-complexity-chemspider-has-agents-and-eyeballs.html

11:25 &lt;+Physchim62&gt; but which numbers do we want to use?

11:25 &lt;ChemSpiderMan&gt; yes...back on topic...

11:26 &lt;+Physchim62&gt; CAS numbers are *not* unique

11:26 &lt;+Physchim62&gt; (nor could they be)

11:26 &lt;walkerma&gt; Before we leave validation:

11:26 &lt;walkerma&gt; Can we agree that we

11:26 &lt;+Physchim62&gt; some numbers are taken "out of service" on a regular basis

11:27 &lt;walkerma&gt; (a) Try to come up with a quicker way to get CAS #s from an ACS source then

11:27 &lt;ChemSpiderMan&gt; The biggest part of the problem can be solved by getting the CAS number for the "structure/compound" as drawn. MOST of the "organic structures" I am curating likely have one CAS number

11:27 &lt;walkerma&gt; (b) If that fails, try and plod through with Scifinder where possible

11:27 &lt;+Physchim62&gt; (a) any "reliable source"

11:28 &lt;+Physchim62&gt; (b) is the point I was trying to get on to

11:28 &lt;walkerma&gt; (c) If all else fails use "any reliable source" such as ChemSpider?

11:28 &lt;+Physchim62&gt; if possible, multiple reliable sources

11:28 &lt;ChemSpiderMan&gt; No...please don't! While I am flattered we are definitely NOT an authority for CAS

11:28 &lt;ChemSpiderMan&gt; on phone now

11:29 &lt;walkerma&gt; OK, (c) could perhaps be multiple sources

11:30 &lt;walkerma&gt; Can we agree on that as a strategy?

11:30 &lt;ChemSpiderMan&gt; okay..back...call rescheduled.

11:30 &lt;ChemSpiderMan&gt; The problem you will have with multiple sources is MOST are contaminated

11:30 &lt;ChemSpiderMan&gt; I've tried this approach

11:30 &lt;+itub&gt; I think the only sure way is to check each with scifinder

11:31 &lt;dmacks&gt; Annotate the CAS# with "from whatever" if can't find authoritative source?

11:31 &lt;ChemSpiderMan&gt; You can find numbers but going back to your comment Martin...if you want them validated you need to go to the authority

11:31 &lt;walkerma&gt; dmacks: Yes that sounds like a good compromise

11:31 &lt;ChemSpiderMan&gt; yes...agree with DMACKS

11:31 &lt;+itub&gt; since cas numbers are opaque they are very often not checked for accuracy
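
Note on the "opaque" point above: a CAS number does carry one cheap internal check, its final check digit. A minimal Python sketch follows, assuming the standard check-digit rule (weight the digits before the check digit 1, 2, 3, ... from right to left, sum, take modulo 10). The function name is hypothetical. This only catches typing errors; it cannot tell whether a number belongs to the compound in the article, which is the validation problem under discussion.

```python
import re

# Layout of a CAS number: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def cas_check_digit_ok(cas_number: str) -> bool:
    """Return True if the string is well-formed and its check digit matches.

    Hypothetical helper for illustration; it says nothing about whether
    the number is assigned to the intended substance.
    """
    match = CAS_PATTERN.match(cas_number.strip())
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    expected = int(match.group(3))
    # Weight digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(digits), start=1)
    )
    return total % 10 == expected


if __name__ == "__main__":
    for candidate in ("7732-18-5", "12345-67-8"):
        print(candidate, cas_check_digit_ok(candidate))
```

The dummy number quoted earlier in the log, 12345-67-8, fails this check, while 7732-18-5 (water) passes.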

11:31 &lt;ChemSpiderMan&gt; if you are willing to annotate that way that would work

11:32 &lt;walkerma&gt; ChemSpiderMan: What would you recommend as some of the more reliable non-ACS sources?

11:32 &lt;dmacks&gt; Alternative, have separate cas= and cas_unvalidated= fields in the infobox, then can know which aren't authoritative (and even search for them if one has some time and SciFinder access)

11:32 &lt;walkerma&gt; dmacks:Yes!

11:33 &lt;+itub&gt; good idea

11:33 &lt;ChemSpiderMan&gt; works for me

11:33 &lt;+itub&gt; better call them cas and cas_validated

11:33 &lt;walkerma&gt; yes

11:33 &lt;+itub&gt; so the default for existing cases is not validated

11:33 &lt;dmacks&gt; yeah

11:34 &lt;ChemSpiderMan&gt; for sure
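
The two-field idea agreed above (cas plus cas_validated, with existing entries defaulting to not validated) can be sketched as a small Python data model. This is only an illustration: the class, the source field and the helper are hypothetical, and the real change would be made in the chembox template rather than in Python.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CasEntry:
    """One CAS number as it might be recorded for an article (hypothetical model)."""

    cas: str
    # Default is False, so every existing entry starts out as unvalidated.
    cas_validated: bool = False
    # Where the number came from, e.g. "SciFinder" or "supplier catalogue".
    source: Optional[str] = None


def needs_checking(entries: List[CasEntry]) -> List[CasEntry]:
    """Return the entries that still have to be checked against an authority."""
    return [entry for entry in entries if not entry.cas_validated]


if __name__ == "__main__":
    # Illustrative records only; the validation status shown here is made up.
    entries = [
        CasEntry("7732-18-5", cas_validated=True, source="SciFinder"),
        CasEntry("12345-67-8"),
    ]
    for entry in needs_checking(entries):
        print("still to validate:", entry.cas)
```

Keeping the flag false by default means nobody has to touch existing articles to mark them as unchecked; only validated numbers need an edit.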

11:34 &lt;+itub&gt; has there been any discussion of the legal issues? CAS says that one can only use up to 10,000 numbers for free...

11:35 &lt;ChemSpiderMan&gt; http://www.chemspider.com/blog/how-many-electronic-databases-have-more-than-10000-cas-numbers.html

11:35 &lt;ChemSpiderMan&gt; They really can't hold to this at this point

11:36 &lt;ChemSpiderMan&gt; They have to "enforce" such rules historically in order to apply them now

11:36 &lt;+Physchim62&gt; itub, this has been discussed at length in the past, it doesn't hold up

11:36 &lt;ChemSpiderMan&gt; PubChem has way more than 10000

11:36 &lt;+itub&gt; ok

11:36 &lt;+Physchim62&gt; Feist v. Rural is the U.S. copyright case

11:36 &lt;+Physchim62&gt; among others

11:37 &lt;+itub&gt; IMO it is a case of "you can copyright phone numbers"

11:37 &lt;+itub&gt; can't

11:37 &lt;ChemSpiderMan&gt; wow...great reference! Thanks...

11:37 &lt;dmacks&gt; Yeah!

11:38 &lt;+Physchim62&gt; you can't copyright *a collection* of phone numbers (unless your collection is particularly original, which CAS's isn't)

11:38 &lt;walkerma&gt; If we want ACS's help, we may want to ask them nicely for permission..!

11:39 &lt;+Physchim62&gt; WP's originality is in our selection of compounds, which is our own and which is licensed under the GFDL

11:39 &lt;walkerma&gt; We are a non-profit, after all

11:40 &lt;ChemSpiderMan&gt; Good luck...again, read this one later but I doubt you'll get support. i HOPE I am wrong.... http://www.chemspider.com/blog/intention-to-scrape-crystaleye-content-and-staying-in-relationship-with-publishers.html

11:40 &lt;ChemSpiderMan&gt; I cannot get an answer from the copyright group so a simple question

11:41 &lt;+Physchim62&gt; in short, there are no probs for the WMF: there might be problems for individual contributors, especially in Europe, but even here the risk is very limited

11:41 &lt;walkerma&gt; OK: Are there other CAS# issues we need to resolve today?

11:41 &lt;ChemSpiderMan&gt; so, CAS validated and CAS in the box then?

11:41 &lt;walkerma&gt; I think we can all agree on that

11:41 &lt;ChemSpiderMan&gt; Do we agree that we are out to get a CAS number consistent with the structure/compound shown?

11:42 * dmacks nods

11:42 &lt;walkerma&gt; Yes, I was hoping we could agree on that too

11:42 &lt;ChemSpiderMan&gt; so, a particular tautomer has the CAS for that tautomer

11:42 &lt;+Physchim62&gt; can I make a plea for us to use CAS numbers which relate to the title of the article, rather than necessarily the structure shown in the chembox?

11:42 &lt;walkerma&gt; That way it matches with the InChI, the SMILES, the IUPAC name

11:42 &lt;walkerma&gt; AND THE TITLE OF THE ARTICLE!

11:42 &lt;walkerma&gt; All should match!

11:42 &lt;+Physchim62&gt; on WP, the *title* is the primary key

11:42 &lt;ChemSpiderMan&gt; I don't think that's going to work

11:43 &lt;ChemSpiderMan&gt; I agree the title is the primary key

11:43 &lt;+Physchim62&gt; matching everything to the title is an impossibility in some cases

11:43 &lt;ChemSpiderMan&gt; agree

11:43 &lt;ChemSpiderMan&gt; glucose

11:43 &lt;walkerma&gt; Yes, there will be a problem in some cases

11:43 &lt;ChemSpiderMan&gt; The InChI is a form of the structure

11:43 &lt;ChemSpiderMan&gt; The SMILES is a form of the structure

11:44 &lt;ChemSpiderMan&gt; They convert to the structure; they do not convert to the title

11:44 &lt;walkerma&gt; So we need some clear guidelines in cases like Glucose, Tartaric acid

11:44 &lt;ChemSpiderMan&gt; The IUPAC name is for the structure

11:44 &lt;+Physchim62&gt; if there is a /generic/ CAS no for CoA, we should use that /regardless/ of the exact structure we show in the chembox. in an ideal world, we would also explain the problem in the article text...

11:44 &lt;dmacks&gt; I think a given cheminfobox should be completely self-consistent (including and other chembox-like stuff I guess) as a factual matter, then it's a WP editorial matter which infobox(es) go on each page.

11:44 &lt;ChemSpiderMan&gt; what about "CAS for structure" and "Generic CAS"

11:44 &lt;ChemSpiderMan&gt; yes dmacks. yes

11:45 &lt;ChemSpiderMan&gt; the chembox should be self-consistent

11:45 &lt;walkerma&gt; How about: We aim to match all of them wherever possible

11:45 &lt;+Physchim62&gt; some chemboxes cannot be self-consistent, because they include data for multiple compounds

11:45 &lt;walkerma&gt; But when they don't, we do what ChemSpiderMan suggested

11:46 &lt;walkerma&gt; One CAS to match the article name, another to match the structure drawing/InChI/SMILES/IUPACName

11:46 &lt;walkerma&gt; Made clear which is which

11:46 &lt;+Physchim62&gt; don't forget that the CAS number for the structure could (in principle) be included as metadata in the image

11:47 &lt;walkerma&gt; But the way we display it should be standardised

11:47 &lt;+Physchim62&gt; hard to do in practice, I'm well aware!

11:47 &lt;walkerma&gt; We read the metadata and display it as CAS for structure

11:48 &lt;+Physchim62&gt; why do we need to display it?

11:48 &lt;dmacks&gt; Yeah, /me was just about to say "more metadata that isn't derived from structure raises barrier to editors"

11:48 &lt;dmacks&gt; (metadata _in image file_)

11:48 &lt;walkerma&gt; Beetstra: Are you there?

11:48 &lt;+Physchim62&gt; we should be displaying the data which isn't immediately obvious, which in this case seems to be the /generic/ CAS number

11:49 &lt;walkerma&gt; But we don't want to show (say) a particular form of D-glucose in the diagram, then only display the CAS# for generic glucose

11:49 &lt;+Physchim62&gt; WP exists to impart useful information, not as an exercise in self-consistency: at least, it is more successful in imparting useful information than as an exercise in self-consistency ;)

11:50 &lt;+Physchim62&gt; why not? the article is about glucose, after all, not just about alpha-D-glucose

11:50 &lt;ChemSpiderMan&gt; and this is where we are back to a "decision" :-)

11:51 &lt;walkerma&gt; Because then the reader will think that alpha-D-glucose has the CAS # shown just below

11:51 &lt;ChemSpiderMan&gt; for glucose the structure shown should have no stereo...

11:51 &lt;ChemSpiderMan&gt; the particular forms of glucose can be shown as alpha and beta and they have their own CAS

11:51 &lt;ChemSpiderMan&gt; but that would be in the article

11:52 &lt;ChemSpiderMan&gt; not the Cbox

11:52 &lt;walkerma&gt; Or else in some cases we show more than one form, as at Carvone

11:52 &lt;walkerma&gt; http://en.wikipedia.org/wiki/Carvone

11:52 &lt;+Physchim62&gt; ChemSpiderMan, showing glucose with no stereo only makes the problem six times bigger!

11:53 &lt;dmacks&gt; 2^6?

11:53 &lt;ChemSpiderMan&gt; Did you see the discussions that happened about glucose already?

11:53 &lt;+itub&gt; I think several cas numbers in the infobox are ok

11:53 &lt;+itub&gt; if there are too many, add a table somewhere else

11:53 &lt;ChemSpiderMan&gt; Martin...do you have that link?

11:53 &lt;walkerma&gt; http://en.wikipedia.org/wiki/User:Walkerma/Sandbox5#Comments_1

11:53 &lt;ChemSpiderMan&gt; thx

11:54 &lt;walkerma&gt; Actually, I was wondering how to fix that

11:54 &lt;+Physchim62&gt; dmacks, there are eight aldohexoses, each with two enantiomers and two ring forms

11:54 &lt;dmacks&gt; What if we have a chemsubbox, that contains just the isomer-specific or component-specific info for each thing in a less-than-specific chembox?

11:54 &lt;walkerma&gt; Glucose gets about a million hits a year, and we're giving them the name for galactose right now!

11:55 &lt;ChemSpiderMan&gt; gents...we can go into minute detail about the outliers now but I think we have about 95% of the articles without this problem

11:55 &lt;walkerma&gt; dmacks: As ever, you offer a very workable compromise

11:55 &lt;+Physchim62&gt; that was doubtlessly an OrgChem prof who wanted to check if his students were copying from WP!

11:55 &lt;dmacks&gt; *blush*

11:55 &lt;walkerma&gt; Actually, it was someone from [affiliation removed for privacy]

11:55 &lt;+itub&gt; but most aldohexoses are not glucose

11:56 &lt;ChemSpiderMan&gt; [name removed for privacy]

11:56 &lt;walkerma&gt; (Please rm that name from the log, though - I wanted to keep IDs of respondents confidential)

11:56 &lt;+Physchim62&gt; so, how are we going to work the validation process, assuming that we are going to have one for many, if not all, of our articles?

11:56 &lt;dmacks&gt; okay

11:57 &lt;ChemSpiderMan&gt; oops..sorry

11:57 -!- Beetstra [n=djbeetst@Wikimedia/Beetstra] has quit ["Bye Bye"]

11:58 * Physchim62 is going to have to disappear in a few minutes or so

11:58 &lt;walkerma&gt; I think the process should be (a) Mostly wait and see if ACS can help

11:59 &lt;walkerma&gt; (b) If they definitely can't we need to work through the lists

11:59 &lt;ChemSpiderMan&gt; I'll put the curation process on hold until they do ;-)

11:59 &lt;+Physchim62&gt; I'm fine with that if other people are

11:59 &lt;walkerma&gt; http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Chemistry/CAS_validation

11:59 &lt;walkerma&gt; We also need to see if we can get CAS_validated into ChemBox new

12:00 &lt;ChemSpiderMan&gt; You need a timeline...if ACS don't answer by XXX then we do YYYY

12:00 &lt;walkerma&gt; How about the end of the month?

12:00 &lt;ChemSpiderMan&gt; My question to them in November resulted in them asking me to meet them in New Orleans in April to discuss

12:00 &lt;+Physchim62&gt; I don't think that CAS validation is the most important thing for the project at the moment, although it is linked with other decisions that we need to take

12:00 &lt;ChemSpiderMan&gt; end of month is GOOD

12:01 &lt;walkerma&gt; Physchim62: Currently it is one of the weakest areas of our chemical substance pages

12:01 &lt;walkerma&gt; We have a lot of errors

12:01 &lt;+Physchim62&gt; bovine excrement!

12:01 &lt;+Physchim62&gt; IMHO

12:02 &lt;walkerma&gt; You mean that the severe lack of non-childish content is our most pressing problem?

12:02 &lt;+Physchim62&gt; it is an area where certain people have chosen to attack us, that is all!

12:02 &lt;ChemSpiderMan&gt; I can deal with the triples of structure representations, SMILES, InChIs and names in general. But I cannot deal with CAS in the validation

12:02 &lt;ChemSpiderMan&gt; Have people attacked about CAS?

12:03 &lt;walkerma&gt; Only occasionally

12:03 &lt;+Physchim62&gt; on many occasions!

12:03 &lt;ChemSpiderMan&gt; hmmm....

12:03 &lt;+Physchim62&gt; it's easy to pick up on, because CAS numbers are so widely used

12:04 &lt;+Physchim62&gt; but if all you do is reply to attacks, you're never going to *construct* anything

12:04 &lt;dmacks&gt; Skool-kidz vandalism isn't any "badder" than typos and data taken from an incorrect MSDS though. It's the same validation problem.

12:04 &lt;walkerma&gt; Anyway, Physchim62: Can you, Andrew and Beetstra agree how to handle the generic CAS vs specific CAS in the ChemBox?

12:04 &lt;walkerma&gt; That way, if/when we hear back from CAS/ACS we are ready

12:05 &lt;walkerma&gt; Perhaps I should've said: generic CAS plus specific CAS

12:05 &lt;+Physchim62&gt; OK, as I see it, "specific CAS" must go *directly* below the structure: but that's eminently doable

12:05 &lt;walkerma&gt; Yes

12:06 &lt;walkerma&gt; Or we could use dmacks' suggestion

12:06 &lt;+Physchim62&gt; I'll speak to Dirk and Andrew when I get a chance: they will no doubt read the logs or the summary anyway

12:07 &lt;walkerma&gt; One point on that: If I get something wrong/miss something in the summaries, please edit

12:07 &lt;+Physchim62&gt; dmacks' suggestion is technically more difficult: it could be implemented for individual "problem cases" but I'm hesitant about making it a general standard

12:08 &lt;walkerma&gt; Fair point

12:08 &lt;dmacks&gt; Yeah, was only intended for cases where it was needed, not as a general-use thing.

12:09 &lt;walkerma&gt; Well, I'd like to draw things to a close. Is there anything else "burning" to discuss today?

12:09 &lt;+itub&gt; I was looking at http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Chemistry/CAS_validation and the links at the bottom ask me for a password

12:09 &lt;ChemSpiderMan&gt; Sorry...that's my fault!

12:10 &lt;ChemSpiderMan&gt; I check from within the domain...

12:10 &lt;ChemSpiderMan&gt; you shouldn't need one

12:10 &lt;ChemSpiderMan&gt; I can send the new path to Martin when resolved or just annotate the page....let me do that

12:10 &lt;ChemSpiderMan&gt; it's a redirect issue

12:11 &lt;ChemSpiderMan&gt; will get on it now

12:11 &lt;walkerma&gt; It's a ChemSpider password. I've had problems - ChemSpider wouldn't let me log on via that system for some reason

12:11 &lt;walkerma&gt; I was able to log on in other ways

12:11 &lt;ChemSpiderMan&gt; bottom line there are now 150 structures waiting for "validation"

12:11 &lt;walkerma&gt; That's why I haven't updated the main validation list

12:11 &lt;ChemSpiderMan&gt; ok...will fix

12:11 &lt;walkerma&gt; (the other list for structures/InChIs etc)

12:11 &lt;ChemSpiderMan&gt; we just shuffled domains and it's the problem

12:13 &lt;walkerma&gt; For next week: Physchim62 has proposed a very elegant way of putting all of our data into a nice PersonData-type database

12:13 &lt;walkerma&gt; Could we put that on the agenda?

12:13 &lt;walkerma&gt; So we can all tear his idea to shredsxxxxxxxxxxx come up with some helpful ideas?

12:14 &lt;+Physchim62&gt; :P

12:14 &lt;ChemSpiderMan&gt; not here next week Walkerma...will be travelling.

12:14 &lt;walkerma&gt; I happen to think it's a brilliant proposal that will help us move into the next level of trustworthiness & reliability

12:14 &lt;+Physchim62&gt; I will try to see what the IUBMB recommend for biochemical nomenclature

12:15 &lt;walkerma&gt; Other remaining issues include: Resolving how we present carbohydrates, ATP-type phosphates, amino acids, etc

12:15 &lt;+Physchim62&gt; "protonation issues"

12:15 &lt;walkerma&gt; Yes

12:16 &lt;dmacks&gt; yup

12:16 &lt;walkerma&gt; I'd like ChemSpiderMan to be around for that discussion - so should we hold off on the protonation issue till he returns?

12:16 &lt;+Physchim62&gt; we obviously need to discuss these, even if we cannot come to a quick and simple conclusion

12:16 &lt;+Physchim62&gt; OK

12:17 &lt;walkerma&gt; So should we discuss the PersonData proposal next week?

12:17 &lt;+Physchim62&gt; I propose "choice and indexing of identifiers"

12:17 &lt;+Physchim62&gt; as the agenda item

12:17 &lt;walkerma&gt; Good one - can you elaborate though

12:18 &lt;+Physchim62&gt; 1. which identifiers are the most important for us (already discussed, I realise)

12:19 &lt;+Physchim62&gt; 2. should we create indexes on these identifiers

12:19 &lt;+Physchim62&gt; 3. under what circumstances should we link out to external sites

12:20 &lt;ChemSpiderMan&gt; Wish I could be there for number 3.

12:20 &lt;+Physchim62&gt; ChemSpiderMan, don't worry, it's a perennial topic!

12:21 &lt;ChemSpiderMan&gt; no surprise!

12:21 &lt;walkerma&gt; Are there any other things we should consider covering next week?

12:21 &lt;walkerma&gt; If not, I'd like to see PC's suggestion as the main topic

12:22 &lt;walkerma&gt; It's perennial, but hopefully we can move things forward a bit

12:24 &lt;+Physchim62&gt; must dash, TTFN

12:24 &lt;walkerma&gt; OK, let's call it a day/night! PC: Can you hang on, I wanted to ask you a quick question about the GoldBook work

12:24 -!- Physchim62 is now known as PC62|away

12:24 &lt;+PC62|away&gt; OK, you just caught me

12:24 &lt;ChemSpiderMan&gt; I am talking to Fabienne Meyers about this...

12:25 &lt;walkerma&gt; I wanted to update the worklist, for things I've worked on, but it's very unclear. Can you put some instructions on wiki about how to do this

12:25 &lt;walkerma&gt; So it's clear where (as with anil) there is NOW a redirect from anil to Schiff base, there wasn't before - I think the work is done, now that the Schiff base article cites GoldBook

12:27 &lt;+PC62|away&gt; we should have a cite to the entry for "anil" as well, surely

12:27 &lt;+PC62|away&gt; anil is an IUPAC accepted term for a certain type of Schiff base

12:27 &lt;walkerma&gt; It's not really notable enough IMHO. Same with Crown (conformation) which is one conformation (not the major one) for eight membered rings

12:27 &lt;+PC62|away&gt; so long as the ref is in there, simply change the template to say Status=Y

12:28 &lt;walkerma&gt; Do I change anything else?

12:28 &lt;+PC62|away&gt; OK, I'll have a think about it. some of these terms really belong on Wiktionary rather than WP

12:28 &lt;+PC62|away&gt; no

12:28 &lt;walkerma&gt; OK, that's it! Thanks

12:29 &lt;ChemSpiderMan&gt; bye

12:29 &lt;+PC62|away&gt; when you think the links are OK, you change the status to Y

12:29 &lt;walkerma&gt; I'll "see" everyone next week? Except Antony..

12:29 &lt;+PC62|away&gt; that way, others know not to worry about it

12:29 &lt;walkerma&gt; Thanks

12:29 -!- ChemSpiderMan [n=ChemSpid@c-68-33-151-242.hsd1.md.comcast.net] has quit []

12:29 &lt;dmacks&gt; I think I have a seminar, but will try to pop by depending on how long it goes.

12:29 &lt;+PC62|away&gt; connection willing, yes!

12:30 &lt;walkerma&gt; Bye!

12:30 -!- walkerma [n=chatzill@admin-151-108.potsdam.edu] has quit ["ChatZilla 0.9.80 [Firefox 2.0.0.11/2007112718]"]

12:30 &lt;+itub&gt; bye

12:30 -!- itub [n=tubert@lalo.chemie.unibas.ch] has left #wikichem []

--- Log closed Tue Feb 05 12:30:43 EST 2008