Wikipedia:WikiProject Chemistry/IRC discussions/15 Jan 2008

--- Log opened Tue Jan 15 11:04:03 EST 2008

11:04 &lt;+walkerma&gt; H2Cr=CrH2 !

11:04 &lt;+dmacks&gt; haha!

11:05 &lt;+dmacks&gt; I'd say it's maybe more likely they mean the benzopyran idea?

11:05 &lt;+ChemSpiderMan&gt; I'm looking at IUPAC rules ...B.3, R9.1

11:05 &lt;+ChemSpiderMan&gt; Let me find you the link online

11:05 &lt;+Rifleman_82&gt; hahahaha

11:05 &lt;+Rifleman_82&gt; hi dmacks

11:05 &lt;+dmacks&gt; hello

11:05 &lt;+walkerma&gt; We're still waiting for Itub and Arcadian. Henry Rzepa can't come till later, if at all. I say we get started, OK?

11:06 &lt;+Rifleman_82&gt; itub is here i think...

11:06 &lt;+dmacks&gt; okidoke.

11:06 -!- mode/#wikichem [+o Rifleman_82] by ChanServ

11:06 &lt;+ChemSpiderMan&gt; fine with me...the link is here: http://www.acdlabs.com/iupac/nomenclature/93/r93_691.htm

11:06 &lt;+walkerma&gt; Shall I attempt to moderate?

11:06 &lt;@Rifleman_82&gt; please

11:06 &lt;+dmacks&gt; good idea.

11:06 -!- mode/#wikichem [-o Rifleman_82] by Rifleman_82

11:06 &lt;+ChemSpiderMan&gt; yes please

11:06 &lt;+kelson&gt; dmacks: hello

11:06 &lt;@Beetstra&gt; Good plan, Walkerma

11:06 &lt;+pm286&gt; yes please

11:06 &lt;+Rifleman_82&gt; invited itub... if he's not away i guess he'll be here

11:07 &lt;+walkerma&gt; OK, I'd like to ask ChemSpiderMan to explain what he's been doing, and where we stand with the chemicals list

11:07 &lt;+ChemSpiderMan&gt; How many people on this chat have had the chance to look at the PDF reports I've been generating..that will help

11:07 * dmacks has briefly

11:07 * Beetstra had a look .. briefly

11:08 * Rifleman_82 had a look at the old version

11:08 &lt;+ChemSpiderMan&gt; ok...what I've done is take the dumps provided by walkerma..

11:08 &lt;+ChemSpiderMan&gt; used them as the basis to review the data on WP

11:08 &lt;+CheMoBot&gt; user:Petermr has edited monitored page Wikipedia talk:WikiProject Chemicals - diff - (+131)- summary: /* IRC discussion on using Wikipedia chemistry pages to provide chemical data */

11:09 &lt;+ChemSpiderMan&gt; I imported the files into a software app and used a script to create a URL link to each record in WP from the title

11:09 &lt;+ChemSpiderMan&gt; I did some basic text searches to remove "stuff" (polite word used)

11:10 &lt;+ChemSpiderMan&gt; Then set to work to review the records "one-by-one"

11:10 &lt;+CheMoBot&gt; user:Petermr has edited monitored page Wikipedia talk:WikiProject Chemicals - diff - (+83)- summary: /* IRC discussion on using Wikipedia chemistry pages to provide chemical data */

11:10 &lt;@Beetstra&gt; CheMoBot quit

11:10 -!- CheMoBot [n=beetstra@69.37.50.156] has quit ["Mayday! Mayday! .. going down!"]

11:10 &lt;+dmacks&gt; Quick question: "dumps provided by walkerma" are...?

11:10 &lt;+ChemSpiderMan&gt; people still there?

11:10 &lt;+walkerma&gt; Yes

11:10 &lt;@Beetstra&gt; Yes, all here

11:11 &lt;+walkerma&gt; Except for the bot

11:11 &lt;+Rifleman_82&gt; haha okay

11:11 &lt;+Rifleman_82&gt; was gonna kick the bot

11:11 &lt;+ChemSpiderMan&gt; It's been eyes applied to records. I have checked structure drawings, systematic names, consistency of links to PubChem

11:12 &lt;+walkerma&gt; [The dumps are a list of all the chemicals and drugbox articles]

11:12 &lt;+ChemSpiderMan&gt; I am at letter "M" in the file (but already knocked out X,Y,Z) and "TRYING" to knock out three letters per day

11:12 &lt;+dmacks&gt; As determined by...transclusion of an infobox, manual addition to a cat?

11:13 &lt;+ChemSpiderMan&gt; I have access to Name to structure conversion software and name generation software...I managed Nomenclature software in my previous role

11:13 &lt;+ChemSpiderMan&gt; I have three software packages in my hands and I use all of them as necessary

11:13 &lt;+ChemSpiderMan&gt; The report I am generating for Wikipedia is one of many

11:14 &lt;+ChemSpiderMan&gt; I generate a report regarding errors in the software I am using and forward it to the development team(s)

11:14 &lt;+ChemSpiderMan&gt; I check consistency between emolecules, PubChem and ChemSpider

11:15 &lt;+ChemSpiderMan&gt; I am at the letter M and am down to 6700 records in the DB at present.

11:15 &lt;+Rifleman_82&gt; quick question

11:15 &lt;+Rifleman_82&gt; how do you do it ... so quickly?

11:15 &lt;+ChemSpiderMan&gt; I foresee that by the time I finish I will be down to about 5500-6000 total

11:15 -!- itub [n=tubert@lalo.chemie.unibas.ch] has joined #wikichem

11:15 -!- mode/#wikichem [+v itub] by ChanServ

11:15 &lt;+Rifleman_82&gt; do you have something to automate the process or is it a slow manual comparison?

11:15 &lt;+Rifleman_82&gt; hi itub

11:15 &lt;+itub&gt; hi

11:15 &lt;+itub&gt; lots of people today!

11:15 &lt;+ChemSpiderMan&gt; Quickly...interesting question. It's called "Xmas vacation" really...working until 2am most nights when my family went to bed

11:16 &lt;+Rifleman_82&gt; your dedication is impressive!

11:16 &lt;+ChemSpiderMan&gt; Some things are fast...simple structures are easy. The process....

11:16 &lt;+walkerma&gt; He's also REALLY fast at using the software and eyeballing structures

11:16 &lt;+ChemSpiderMan&gt; My lunacy irritates my wife...

11:17 * ChemSpiderMan ChemSpiderman reflects on the "rants" when cold feet crawl into bed...

11:17 &lt;+ChemSpiderMan&gt; So, the process.

11:17 &lt;+ChemSpiderMan&gt; Open the record in the desktop software and click the link to open up the WP page

11:18 -!- pm286 is now known as petermr

11:18 &lt;+ChemSpiderMan&gt; Two monitors or split screen on laptop while watching cheesy movies

11:18 &lt;+ChemSpiderMan&gt; since the title of the article in general is the name of the compound click one button in the desktop software to generate the molecule for comparison

11:19 &lt;+ChemSpiderMan&gt; (I could do this in batch mode for the whole file but don't have that software)

11:19 &lt;+ChemSpiderMan&gt; Now...eyeball for differences. Then, see if the systematic name on WP will convert to the structure on WP

11:20 &lt;+ChemSpiderMan&gt; Validates the systematic name very well...caught many errors...

11:20 &lt;+Rifleman_82&gt; your software can generate structures, given a systematic name?

11:20 &lt;+ChemSpiderMan&gt; Validates "is the name consistent with the structure" NOT that the structure is the right structure

11:21 &lt;@Beetstra&gt; You check the structure by eye, I suppose .. do you catch wrong ones in that way?

11:21 &lt;+ChemSpiderMan&gt; Validate the CAS number as much as I can...I don't have access to any CAS databases and this is why I ask for people to help me validate these

11:21 &lt;@Beetstra&gt; (I know that you can't possibly know all molecules)

11:21 &lt;+ChemSpiderMan&gt; Oh yes...very easy to catch differences by eye

11:22 &lt;+Rifleman_82&gt; you don't trust emolecules?

11:22 &lt;+ChemSpiderMan&gt; Rifleman : yes, the software can generate a structure given a systematic name

11:22 &lt;+Rifleman_82&gt; (re: cas database)

11:22 &lt;+ChemSpiderMan&gt; as well as trivial names

11:22 &lt;+Rifleman_82&gt; is that the paid version of chemsketch 10?

11:22 &lt;@Beetstra&gt; CAS is difficult, there are cases where the different databases have different molecules

11:23 &lt;+ChemSpiderMan&gt; Also, since I have access to systematic names and trivial names on ChemSpider I validate there too

11:23 &lt;@Beetstra&gt; I recall Physchim62 ranting about that

11:23 &lt;+Rifleman_82&gt; Beetstra: yeah, i get that sometimes

11:23 &lt;+ChemSpiderMan&gt; The paid version of ChemSketch 10 (actually 11 now) only has limited capabilities...it does NOT convert names to structures

11:23 &lt;+Rifleman_82&gt; emolecules may give fragments for the same cas...

11:23 &lt;@Beetstra&gt; I have had problems with pubchem/emolecules .. not sure sometimes what is what

11:24 &lt;+ChemSpiderMan&gt; One comment here...I suggest NOT validating CAS against eMolecules, PubChem or Chemspider

11:24 &lt;+Rifleman_82&gt; they're no better than us?

11:24 &lt;+ChemSpiderMan&gt; All of us have the same problems.

11:24 &lt;+ChemSpiderMan&gt; The CAS numbers come from depositors

11:24 &lt;@Beetstra&gt; Well, CAS should be checked against CAS.org .. the problem there is .. they want money if you want more than .. say .. 20

11:24 &lt;+ChemSpiderMan&gt; They are not careful with the CAS numbers being correct

11:25 &lt;+ChemSpiderMan&gt; Yes...I am afraid you are right

11:25 &lt;+walkerma&gt; I could talk to ACS about this if you like

11:25 &lt;@Beetstra&gt; OK, what about InChI and SMILES?

11:25 &lt;+ChemSpiderMan&gt; But I would rather that the CAS numbers were not hyperlinked to any external dbs

11:25 &lt;+dmacks&gt; Are the CAS-number mistakes simple typos, or "wrong compound" entirely?

11:25 &lt;+ChemSpiderMan&gt; wrong compound

11:25 &lt;+ChemSpiderMan&gt; or multiple compounds

11:25 &lt;+dmacks&gt; 'k

11:26 &lt;+Rifleman_82&gt; dmacks: the way we find cas numbers for compounds... such mistakes shouldn't be surprising

11:26 &lt;+ChemSpiderMan&gt; A CAS number associated with a neutral is different than a CAS number for a salt

11:26 &lt;+dmacks&gt; yeah

11:26 &lt;+Rifleman_82&gt; some sort of checksum?

11:26 &lt;+Rifleman_82&gt; i mean not checksum but numbering convention?
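
Note on the checksum question: CAS Registry Numbers do carry a check digit as their final digit. The sketch below shows how that digit can be verified. The function name `cas_check_digit_ok` is made up for illustration. As the discussion above points out, the usual error is a valid number attached to the wrong compound (or to the salt instead of the neutral), and a check digit cannot catch that; it only catches mistyped digits.

```python
import re


def cas_check_digit_ok(cas: str) -> bool:
    """Return True if the string looks like a CAS number with a valid check digit.

    Hypothetical helper for illustration; it validates format and check digit
    only, not whether the number belongs to the intended compound.
    """
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", cas.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost body digit.
    total = sum(weight * int(digit)
                for weight, digit in enumerate(reversed(body), start=1))
    return total % 10 == check


print(cas_check_digit_ok("7732-18-5"))  # water
print(cas_check_digit_ok("7732-18-4"))  # same number with a mistyped last digit
```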

11:26 &lt;+ChemSpiderMan&gt; it is common to have confusion within the article

11:26 &lt;+ChemSpiderMan&gt; The article name can be for a salt

11:26 &lt;+ChemSpiderMan&gt; The structure is the neutral

11:26 &lt;+ChemSpiderMan&gt; The name is the salt

11:27 &lt;+Rifleman_82&gt; happens for drugs very often... where you have the valerate etc

11:27 &lt;+ChemSpiderMan&gt; The CAS number is the neutral

11:27 &lt;+ChemSpiderMan&gt; The SMILES is for "that girl in the corner smiling at me"

11:27 &lt;+Rifleman_82&gt; haha

11:27 &lt;+Rifleman_82&gt; ok

11:27 &lt;+ChemSpiderMan&gt; Okay..InChIs

11:27 &lt;+ChemSpiderMan&gt; My opinion is take them off for now

11:27 &lt;+ChemSpiderMan&gt; Here's why...

11:28 &lt;@Beetstra&gt; they are ugly?

11:28 &lt;+ChemSpiderMan&gt; Every one I have seen (NOT many) is "broken" with a line break

11:28 &lt;+Rifleman_82&gt; we can leave them in but disable the field for the moment

11:28 &lt;+dmacks&gt; They make screen layout a mess?

11:28 &lt;+ChemSpiderMan&gt; Same issue on PubChem..

11:28 * dmacks likes that, Rifleman_82

11:28 &lt;+ChemSpiderMan&gt; When an InChI on Pubchem needs to be converted you have to copy, remove spaces etc

11:29 &lt;+ChemSpiderMan&gt; I say wait for InChIKeys..

11:29 &lt;+ChemSpiderMan&gt; And here's my commitment to this team...

11:29 &lt;+Rifleman_82&gt; what about doing away with gifs and pngs and using some sort of mol file, with the molecular formula, smiles, etc. generated on the fly?

11:29 &lt;+ChemSpiderMan&gt; When the file is "finished" then I will return the following associated with each structure

11:29 &lt;+Rifleman_82&gt; not aware of what is technically available here and now but... i think it'll make sense

11:29 &lt;+itub&gt; on the fly generation can be ugly

11:29 &lt;+ChemSpiderMan&gt; If they can be generated....

11:30 &lt;+Rifleman_82&gt; generation of smiles and inchi?

11:30 &lt;+ChemSpiderMan&gt; Hold on...let me get to your suggestion...

11:30 &lt;+itub&gt; no, the figures

11:30 &lt;@Beetstra&gt; JMOL is busy with that .. but still ..

11:30 &lt;+itub&gt; generating the smiles and inchi is certainly possible

11:30 &lt;+Rifleman_82&gt; figures can be rotatable... ball stick wireframe etc

11:30 &lt;+Rifleman_82&gt; i'll let ChemSpiderMan get to his point

11:30 &lt;+itub&gt; I think the inchi could be included but hidden in a way that people who want it can acess it

11:31 &lt;+itub&gt; maybe with a [show] button

11:31 &lt;+ChemSpiderMan&gt; So, each structure will have: 1) structure 2) InChIKey 3) IUPAC Name 4)SMILES AND...

11:31 &lt;+ChemSpiderMan&gt; I already worked with Walkerma to embed InChIs into structure images (png)...

11:31 &lt;+ChemSpiderMan&gt; I know that InChIkeys, InChIStrings, SMILES, molfile can ALL be membedded into images

11:32 &lt;+ChemSpiderMan&gt; embedded

11:32 &lt;+ChemSpiderMan&gt; the question is about searching them....

11:32 &lt;+ChemSpiderMan&gt; Walkerma can point you to historical discussions about this..

11:32 &lt;+Rifleman_82&gt; too tedious to put them in the image description page on wiki?

11:32 &lt;+walkerma&gt; We need to make sure that the embedded info gets into the article page rather than the image page

11:33 &lt;+walkerma&gt; So a Google search finds the article

11:33 &lt;+ChemSpiderMan&gt; If I return an SDF file to you, or some other agreed upon format then you could "bot" them into fields

11:33 &lt;+walkerma&gt; ChemSpiderMan, I think this is what we need to do for now

11:33 &lt;+ChemSpiderMan&gt; I AGREE

11:33 &lt;+walkerma&gt; Can we write such a bot?

11:33 &lt;+dmacks&gt; Wiki question: Can arbitrary pages be transcluded, or only pages in the Template: namespace?

11:34 &lt;@Beetstra&gt; Everything can be transcluded

11:34 &lt;+Rifleman_82&gt; arbitrary pages can be transcluded

11:34 &lt;+Rifleman_82&gt; mainspace pages can be transcluded with a :sodium chloride

11:34 &lt;+Rifleman_82&gt; for example

11:34 &lt;+Rifleman_82&gt; you use the ":"

11:34 * ChemSpiderMan going quiet now until there are questions

11:35 &lt;+dmacks&gt; So the embedded image data could be extracted onto the image talk page, and then that talk page be transcluded into the infobox.

11:35 &lt;+walkerma&gt; Sounds interesting!

11:35 &lt;+Rifleman_82&gt; but...

11:35 &lt;+Rifleman_82&gt; you'll want all this to be commented out?

11:35 &lt;+dmacks&gt; &lt;noinclude&gt;

11:35 &lt;@Beetstra&gt; That would contain too much information

11:35 &lt;+ChemSpiderMan&gt; The data can be in the image (but there is development work to be done there) but the SDF file is an SDF file with the data available now..

11:36 &lt;+ChemSpiderMan&gt; You can 'bot in what you need

11:36 &lt;+Rifleman_82&gt; so google can find them but you don't see them? &lt;noinclude&gt; will not be transcluded

11:36 &lt;+dmacks&gt; Er, &lt;includeonly&gt;

11:36 &lt;+Rifleman_82&gt; i think it'll display ... or you can hide them in &lt;!-- --&gt;

11:36 &lt;+walkerma&gt; Can we write a bot to handle the SDF file?

11:37 &lt;+Rifleman_82&gt; but then each page will balloon... and there might be some issues about transclusion and server load?

11:37 &lt;+ChemSpiderMan&gt; There needs to be a way to generate the images in batch mode from the SDF file to generate images for the whole database for you to deposit with the 'bot

11:37 &lt;+ChemSpiderMan&gt; NOT necessarily a WP problem...

11:37 &lt;@Beetstra&gt; What about putting the full identifiers on &lt;pagename&gt;/Data_page for ALL compounds .. ??

11:38 &lt;+Rifleman_82&gt; ' /datapage is seen as a new page, not a sub page

11:38 &lt;@Beetstra&gt; There the width of a box is not a problem

11:38 * dmacks was just about to ask that as a more general philosophical question: what's the feel in WP for using complex inclusions? I dislike the idea of having to bot and re-hard-code the same info in multiple places vs having all data somewhere and extracting as needed.

11:38 &lt;+Rifleman_82&gt; and the use of this "trick" of transclusion is frowned upon

11:39 &lt;+dmacks&gt; Rifleman_82: the whole template: namespace is shifting towards doing this though (the /doc game) I think.

11:39 &lt;+walkerma&gt; [ChemSpiderMan: I think we'll have to let this discussion happen - it is of importance within WP, how we get data into the chembox]

11:39 &lt;@Beetstra&gt; We don't have to transclude the datapage .. google finds the datapage, which is one link away from the real page

11:39 &lt;+Rifleman_82&gt; dmacks: yes, i'm aware... but still frowned upon in the mainspace

11:40 &lt;+Rifleman_82&gt; that's the impression i got from #wikipedia-en-admins after i tried something funny

11:40 &lt;+dmacks&gt; Gotcha.

11:40 &lt;@Beetstra&gt; The ugly InChI is a problem .. we could add a field 'RealInChI', which is not displayed, but is the one that is correct (using the original InChI as the display one)

11:40 &lt;@Beetstra&gt; That is easy to code into chembox new

11:40 &lt;+Rifleman_82&gt; dmacks: check the history of Rules of basketball

11:41 &lt;+walkerma&gt; Look at http://en.wikipedia.org/wiki/Tributylphosphine#External_links the best InChI system we have now

11:41 &lt;+dmacks&gt; So the data page would be the "actual" data, and then bot would copy certain fields into the main-page?

11:41 &lt;+dmacks&gt; Rifleman_82: Ah yeah, I vaguely remember that.

11:42 &lt;+walkerma&gt; Beetstra: That RealInChI is a good one

11:42 &lt;+Rifleman_82&gt; having a bot copy it over seems... a bit repetitive... it'd be nice if everything from mw to inchi and systematic name can be generated from the molecule itself

11:43 &lt;+ChemSpiderMan&gt; It can

11:43 &lt;+ChemSpiderMan&gt; It will include formula and Mw

11:43 &lt;+dmacks&gt; So "the structure" is the primary piece of data?

11:43 &lt;+ChemSpiderMan&gt; All will be generated automatically and part of the file...can include monoisotopic masses for MS people if you want them

11:44 &lt;+ChemSpiderMan&gt; Yes...the structure is the primary piece of data from me...but the name is the primary piece of data for the article

11:44 &lt;+walkerma&gt; MS - On the data page

11:44 &lt;@Beetstra&gt; oops

11:45 &lt;+dmacks&gt; So the data page would have lots of data auto- (or bot-) generated from the image?

11:45 &lt;+ChemSpiderMan&gt; No, from the structure...the connection table

11:45 * Beetstra should use preview

11:45 &lt;+dmacks&gt; Yeah, "generated" including "extracted from image embedded data"

11:45 &lt;+Rifleman_82&gt; does this mean we're importing someone's database wholesale

11:45 &lt;+Rifleman_82&gt; ?

11:45 &lt;+ChemSpiderMan&gt; Yes...the one I am building

11:46 &lt;+Rifleman_82&gt; ah okay

11:46 &lt;+walkerma&gt; We're reimporting our own database!

11:46 &lt;+Rifleman_82&gt; any copyright issues there?

11:46 * kelson think about http://de.wikipedia.org/wiki/Vorlage:Personendaten

11:46 &lt;+ChemSpiderMan&gt; What I received from WP was a list of article names

11:46 &lt;+Rifleman_82&gt; we're exporting our database, tidying it up, reimporting them?

11:46 &lt;+ChemSpiderMan&gt; "kind of"

11:47 &lt;+ChemSpiderMan&gt; What you've handed me is a list of article names

11:47 &lt;+Rifleman_82&gt; how about maintainability? the next time we want to validate them, do we have to go through them all again? or can we flag those without changes? or check only changed parameters?

11:47 &lt;+Rifleman_82&gt; and that kinda means data is only as valid as the last check?

11:48 &lt;+ChemSpiderMan&gt; What I'm doing is handing back structures, intrinsic properties, "algorithmically-generated" names, InChIs, SMILES etc

11:48 &lt;+dmacks&gt; Having the data be on a separate page would help with this kind of maintainability (eliminates article text changes, so monitoring "changes to the page" clearly means the data has changed).

11:48 &lt;@Beetstra&gt; Regarding the RealInChI, see the code of http://en.wikipedia.org/wiki/Tributylphosphine

11:49 &lt;+Rifleman_82&gt; dmacks: will there be a greater barrier to the lay user, to editing and filling out the chembox?

11:49 &lt;+ChemSpiderMan&gt; If the structure is correct and the algorithm to add atomic weights isn't broken, and InChI generation doesn't change and, and, and then it won't need changing. Never say never, but, in my opinion, a big step ahead

11:50 &lt;+ChemSpiderMan&gt; What you shouldn't allow, in my opinion, is the editing of intrinsic properties extracted from the molecule that is displayed...

11:50 &lt;+dmacks&gt; Rifleman_82: could have an "edit" button on the infobox that goes to the data-page (like many navboxes have)

11:50 &lt;+ChemSpiderMan&gt; I saw examples where the molecule drawn didn't match the mass or the formula

11:50 &lt;+Rifleman_82&gt; Beetstra: that's nice... but looks timeconsuming unless you're going to get a bot to arbitrarily add linebreaks after a few chars

11:50 &lt;+Rifleman_82&gt; dmacks: fair enough

11:51 &lt;+ChemSpiderMan&gt; the MF and Mw are derived FROM the structure, not independent of it

11:51 &lt;+walkerma&gt; I think we should have another IRC meeting once we have the complete SDF to look at, would that be OK?

11:52 &lt;+walkerma&gt; I think I'd like to move on to ask Peter about his work.

11:52 &lt;+walkerma&gt; And his plans

11:52 &lt;@Beetstra&gt; Mw can be generated automatically in chembox new, use 'C = 1 | H = 4' i.s.o. 'Formula = CH&lt;sub&gt;4&lt;/sub&gt;'
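
Note on deriving Mw from element counts: the point made here and just above is that the molecular weight follows from the formula rather than being an independent value. A minimal Python sketch of that idea follows. It is not the chembox template code (which is wikitext); the table holds rounded standard atomic weights for a handful of elements only, and `molecular_weight` is a hypothetical name.

```python
# Rounded standard atomic weights; only a few elements, for illustration.
ATOMIC_WEIGHTS = {
    "H": 1.008,
    "C": 12.011,
    "N": 14.007,
    "O": 15.999,
    "P": 30.974,
}


def molecular_weight(counts: dict) -> float:
    """Sum atomic weights over per-element counts, e.g. {'C': 1, 'H': 4}."""
    return sum(ATOMIC_WEIGHTS[element] * n for element, n in counts.items())


# Mirrors the 'C = 1 | H = 4' style of entry mentioned in the log.
methane = molecular_weight({"C": 1, "H": 4})
tributylphosphine = molecular_weight({"C": 12, "H": 27, "P": 1})
print(round(methane, 3), round(tributylphosphine, 2))
```

Storing only the counts means a displayed weight cannot drift away from the formula, which is the kind of mismatch reported earlier in the discussion.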

11:52 &lt;+petermr&gt; hi - ready when you are...

11:52 &lt;+ChemSpiderMan&gt; BTW...look at triphenylphosphine and the IUPAC name...

11:52 &lt;+ChemSpiderMan&gt; sorry tributylphosphine..

11:52 &lt;@Beetstra&gt; err .. and its molecular weight, which is off ..

11:53 &lt;+petermr&gt; ok - do you want me to strat?

11:53 &lt;+petermr&gt; == start

11:53 &lt;+walkerma&gt; Would that be OK?

11:53 &lt;+Rifleman_82&gt; please go ahead peter

11:53 &lt;+ChemSpiderMan&gt; yes

11:53 &lt;@Beetstra&gt; Yes Peter, please

11:53 &lt;+Rifleman_82&gt; we'll sort out the finer points another time and place

11:54 &lt;+Rifleman_82&gt; i mean of the last thread

11:54 &lt;+petermr&gt; thanks to all of you for the work you have put in

11:54 &lt;+petermr&gt; there are 2-3 issues. One, which I mention initially is that we plan to apply for short-term funding to develop a RDF-based version of the WPChem to support repositories

11:55 &lt;+petermr&gt; I mentioned this to walkerma in mail.

11:55 &lt;+petermr&gt; will have to get the grant in soonish - all I'm doing here is flag it up

11:56 &lt;+petermr&gt; I see WPChem as a semantic resource which will be used in conjunction with chemical literature and that is the main theme of the grant

11:56 &lt;+petermr&gt; the second issue is related to what you have been talking about - how should the data be maintained.

11:56 &lt;+petermr&gt; Sorry for typos

11:57 &lt;+petermr&gt; I think there has to be a mechanism whereby the data can be checked and edited by bots and passed back into WP

11:57 &lt;+petermr&gt; That's why I was asking walkerma about BNF (a formal spec for the info boxes)

11:57 &lt;+petermr&gt; please interrupt if this doesn't make sense

11:57 &lt;+ChemSpiderMan&gt; Agreed...for example, with the new InChI generation will come new InChIkeys

11:57 &lt;+dmacks&gt; Wouldn't be hard to write the grammar for it.

11:58 &lt;+petermr&gt; good. I have found ca 4 different info boxes for chemistry and I suspect there are some more. Is there a clear policy on what is currently supported and whether the old ones will be refactored?

11:58 &lt;+ChemSpiderMan&gt; Also, new nomenclature rules change names etc. So, the structure will need to be available to WP to pass to bots.

11:58 &lt;+petermr&gt; the vision is that a bot can extract all the data from the box and do its own thing with it
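
Note on extracting data from a box: a rough Python sketch of what "extract all the data from the box" could look like for the simplest case, one `| key = value` pair per line. The sample wikitext is made up for illustration and `parse_infobox` is a hypothetical name. Real boxes nest templates and spread values over several lines, which this naive version does not handle.

```python
SAMPLE_WIKITEXT = """{{chembox new
| Name = Methane
| C = 1
| H = 4
| BoilingPtC = -161.5
}}"""


def parse_infobox(wikitext: str) -> dict:
    """Collect '| key = value' lines into a dict of strings (naive, flat only)."""
    fields = {}
    for line in wikitext.splitlines():
        line = line.strip()
        if line.startswith("|") and "=" in line:
            # Split on the first '=' only, so values may contain '='.
            key, _, value = line[1:].partition("=")
            fields[key.strip()] = value.strip()
    return fields


fields = parse_infobox(SAMPLE_WIKITEXT)
print(fields["Name"], fields["BoilingPtC"])
```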

11:59 &lt;+ChemSpiderMan&gt; and pass back and repopulate?

11:59 &lt;+Rifleman_82&gt; petermr: there will be only one infobox very soon

11:59 &lt;+Rifleman_82&gt; we're clearing the remnants... about 80 or so left

11:59 &lt;+Rifleman_82&gt; one infobox for chemicals

11:59 &lt;+Rifleman_82&gt; one infobox for elements

12:00 &lt;+petermr&gt; Then there is the question of the actual way information is entered. I downloaded all the extant pages (through the list of chemicals )

12:00 &lt;+Rifleman_82&gt; one infobox for drugs which i hope will be subsumed into the chemicals box at a later date

12:00 &lt;@Beetstra&gt; Rifleman_82, wait, 2 for chemicals, chembox new and drugbox

12:00 &lt;+Rifleman_82&gt; that's about it

12:00 &lt;@Beetstra&gt; OK

12:00 &lt;@Beetstra&gt; :-)

12:00 &lt;+petermr&gt; in XML... And although I can parse the XML the data in the infobox were rather variable

12:00 &lt;+petermr&gt; OK I will go with just chembox new

12:00 &lt;+walkerma&gt; [Rifleman has run a bot through the remaining articles recently to update all the chemboxes]

12:01 &lt;+petermr&gt; the main problems with the data were (from memory) character encodings (can be awful), lack of consistency in units, difficulty of parsing annotations in values (e.g. 200 (decomposes))
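
Note on annotated values: a small Python sketch of how a value such as "200 (decomposes)" might be split into a number and a note. The pattern and the name `split_value` are assumptions for illustration; it covers only a single number with an optional parenthesised note and returns the raw text when it cannot parse.

```python
import re

# One signed decimal number, optionally followed by a parenthesised note.
_VALUE_RE = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?)\s*(?:\((.*?)\))?\s*$")


def split_value(raw: str):
    """Return (number, note); number is None when the text is not parseable."""
    match = _VALUE_RE.match(raw)
    if match is None:
        return None, raw.strip()
    number, note = match.groups()
    return float(number), (note.strip() if note else None)


print(split_value("200 (decomposes)"))
print(split_value("sublimes"))
```

Separate fields for the number and the note, as with BoilingPt and Boiling_Notes mentioned below, would make this kind of guessing unnecessary.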

12:01 &lt;+dmacks&gt; Would be good to sanitize the data somehow, both for display/MOS/etc purposes and to make it easier to parse.

12:01 &lt;+Rifleman_82&gt; my search doesn't catch those weird ... manual html boxes, nor manual tables using wikisyntax though

12:02 &lt;+petermr&gt; it would be nice to have slightly more structure

12:02 &lt;+petermr&gt; (not chemical structure)

12:02 &lt;+dmacks&gt; :)

12:02 &lt;+walkerma&gt; Structure? In Wikipedia?

12:02 &lt;+Rifleman_82&gt; petermr: look at drugbox. they've had a bit more success at it

12:02 &lt;+Rifleman_82&gt; boiling_high and boiling_low to give a boiling point range for example

12:02 &lt;+petermr&gt; structure in the infobox.

12:02 &lt;+Rifleman_82&gt; where's physchim? not coming?

12:02 &lt;+petermr&gt; :agree with rfileman

12:03 &lt;+walkerma&gt; No, his (PCs) internet is down I think

12:03 &lt;+petermr&gt; (how do I enter a comment)

12:03 &lt;+dmacks&gt; /me

12:03 &lt;+dmacks&gt; "/me agrees with rfileman"

12:03 &lt;@Beetstra&gt; technically, for boilingpoints we have BoilingPt and Boiling_Notes .. but not used consistently

12:03 &lt;+petermr&gt; OK - anyway this is a first pass - I shall work with what I have got - ca 1000 pages

12:04 &lt;@Beetstra&gt; And advanced, even BoilingPtC, BoilingPtK and BoilingPtF

12:04 &lt;+Rifleman_82&gt; sheesh

12:04 &lt;+Rifleman_82&gt; no rankine?

12:04 &lt;+petermr&gt; I suggest separating units from quantities. Is there any reason to HOLD quantites in different units (display is a different question)

12:05 &lt;+petermr&gt; it leads to errors and lack of consistency

12:05 &lt;+dmacks&gt; Back to personal pet peeve: duplication of info. Those are not independent values, so why entering them distinctly instead of generating all from one?

12:05 &lt;+dmacks&gt; Yeah, what petermr said.

12:05 &lt;+petermr&gt; exactly. then someone alters one MPt and fails to edit the other

12:05 &lt;+ChemSpiderMan&gt; yes...agreed...it's all conversions

12:05 &lt;@Beetstra&gt; No, there is no reason, except for a historical one (it needed to be done that way in the old chembox)

12:05 &lt;+Rifleman_82&gt; so why don't we just use degrees C? that's by far the most common

12:05 &lt;+Rifleman_82&gt; and generate from there?

12:05 &lt;+Rifleman_82&gt; but then we get problems with significant figures

12:06 &lt;+petermr&gt; being dictatorial suggest SI units for all basic info unless agreed exceptions

12:06 &lt;@Beetstra&gt; That is what BoilingPtC does :-p
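
Note on generating units from one stored value: a minimal Python sketch of holding a boiling point once, in degrees Celsius, and deriving the other scales for display. The function name `boiling_point_views` is hypothetical and this is not the template's own code. The significant-figures concern raised above is not addressed here; rounding for display would need a separate rule.

```python
def boiling_point_views(celsius: float) -> dict:
    """Derive kelvin and Fahrenheit from a single stored Celsius value."""
    return {
        "C": celsius,
        "K": celsius + 273.15,
        "F": celsius * 9 / 5 + 32,
    }


views = boiling_point_views(100.0)
print(views)
```

Because only one number is stored, editing it cannot leave a stale value in another unit.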

12:06 &lt;+Rifleman_82&gt; agree with petermr

12:06 &lt;+Rifleman_82&gt; petermr: another form of inconsistency is in densities

12:06 &lt;+Rifleman_82&gt; some in g/mL (liquids), solids in g/cm^3

12:06 -!- kelson [n=Kelson@fw-vianetworks.minick.ch] has quit [Read error: 104 (Connection reset by peer)]

12:06 &lt;+petermr&gt; I can see that it may be useful to report the original reported values asis but this is rather special

12:06 &lt;+Rifleman_82&gt; some quoted in kg/m^3, which seems to be preferred for gases

12:06 &lt;+dmacks&gt; Regarding data & conversions, drugbox already does this.

12:07 &lt;+Rifleman_82&gt; and in chemicals produced in high volume

12:07 &lt;+petermr&gt; anyway I am sure that you will evolve a consistent mechanism

12:07 &lt;+petermr&gt; an intermediate goal is then to create a RDF version of WPchem which can be queried using SPRARQL

12:07 &lt;+Rifleman_82&gt; we can just standardize with g/cm^3 since it works equally well for solids, liquids, and gases? the recent debate about cm^3 and mL notwithstanding

12:08 &lt;@Beetstra&gt; For boiling/melting points it is not too hard, kill the unit and use the advanced correct field

12:08 &lt;+petermr&gt; ==SPARQL

12:08 &lt;+petermr&gt; like DBPedia

12:08 &lt;+petermr&gt; here you can ask natural queries without needing a database

12:08 &lt;+petermr&gt; WP becomes the database

12:09 &lt;+petermr&gt; it works very well for smallish numbers - e.g. low thousands of entries is not a problem

12:09 &lt;+petermr&gt; but can access any of the indexed fields

12:09 &lt;+Rifleman_82&gt; we have 3000 and change of chemicals in wp, not counting drugbox

12:10 &lt;+petermr&gt; so goal is to create WPChem in RDF. Example of single compound is shown on Wikipage

12:10 &lt;+petermr&gt; OK - I will hack drug box now I know it works

12:10 &lt;+petermr&gt; each compound is a node in RDF graph and has names, properties, formula, etc hanging off it

12:11 &lt;+dmacks&gt; (which wikipage?)

12:11 &lt;+petermr&gt; created by: XML=&gt; box =&gt; CML =&gt; RDF

12:11 * petermr goes to look

12:12 &lt;+petermr&gt; on []

12:12 &lt;+petermr&gt; there is a link to my blog entry which has a RDF graph of a WP entry

12:13 &lt;+petermr&gt; has ca. 20 nodes

12:13 &lt;+petermr&gt; We have slightly modified the structure since

12:13 &lt;+petermr&gt; each arc is a URI in a CML dictionary entry

12:14 &lt;+petermr&gt; so all properties are semantic

12:14 &lt;+ChemSpiderMan&gt; Peter...I don't see any graph on that page

12:14 &lt;+petermr&gt; of course the dictionary entries could be derived from WP and this is the best way where they exist

12:14 &lt;+ChemSpiderMan&gt; This is the Microsft eChistry page?

12:15 &lt;+ChemSpiderMan&gt; excuse typos

12:15 &lt;+dmacks&gt; Ah found it: http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=890

12:15 &lt;+petermr&gt; The graph is a topological graph

12:15 &lt;+ChemSpiderMan&gt; thx

12:15 &lt;+petermr&gt; it consists of nodes linked by arcs

12:16 &lt;+petermr&gt; each arc is subject-&gt;predicate-&gt;object

12:16 &lt;+petermr&gt; water-&gt;hasBoilingPoint-&gt;373

12:16 &lt;+petermr&gt; and hasBoilingPoint-&gt;hasUnits-&gt;K

12:17 &lt;+petermr&gt; or something similar

12:17 &lt;+petermr&gt; in that way all the information can be recorded
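
Note on the triples above: the subject, predicate, object idea can be sketched with plain Python tuples. This is not real RDF or SPARQL; a real graph would use URIs (for example CML dictionary entries, as mentioned above) in place of these bare strings, and `query` is a hypothetical helper that only does exact matching.

```python
# The two example statements from the log, as (subject, predicate, object).
triples = [
    ("water", "hasBoilingPoint", "373"),
    ("hasBoilingPoint", "hasUnits", "K"),
]


def query(graph, subject=None, predicate=None, obj=None):
    """Return triples matching every field that is not None (None = wildcard)."""
    return [
        t for t in graph
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


print(query(triples, subject="water"))
print(query(triples, predicate="hasUnits"))
```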

12:17 &lt;@Beetstra&gt; Pooh .. yes, that is also a way ..

12:17 &lt;+petermr&gt; RDF is now accepted as the appropriate way to manage Web 2.0 things

12:17 &lt;+Rifleman_82&gt; what's CML? is it an acceptable format ?

12:18 &lt;+petermr&gt; It's accepted by the ACS, RoyalSocChemistry, NIH

12:18 &lt;@Beetstra&gt; So we need a BoilingPt and a BoilingPtUnit .. not like what it is now in e.g. Tributylphosphine (just changed it to BoilingPtK)

12:18 &lt;+petermr&gt; It's emitted by ChemDraw, ACD

12:18 &lt;+petermr&gt; see [[Chemical_Markup_Language]]

12:19 &lt;+petermr&gt; something like that

12:19 &lt;+ChemSpiderMan&gt; ACD/Labs CML support has never been validated

12:19 &lt;+ChemSpiderMan&gt; not sure about ChemDraw

12:19 &lt;+dmacks&gt; Beetstra: From a maintainability POV, having the data "actually" entered in C seems preferable?

12:20 &lt;+petermr&gt; let's not get into the validation discussion. No formats are validated

12:20 &lt;+Rifleman_82&gt; so imperial college is handling the development of this format?

12:20 &lt;@Beetstra&gt; If we do that, dmacks, then we don't need a unit field ..

12:20 &lt;+petermr&gt; it's not critical at this stage.

12:21 &lt;@Beetstra&gt; i programmed all three (C, K and F) for ease of conversion, all three were used

12:21 &lt;+dmacks&gt; Ah, gotcha Beetstra

12:21 &lt;+petermr&gt; there are complex properties that require structure such as BPt at given Pressure

12:21 * petermr asks walkerma how we are for time

12:21 * Beetstra starts to think about going home ..

12:21 &lt;+Rifleman_82&gt; we might's well do with one... much more convenient; we can ignore bpt at reduced pressure? just quote atmospheric

12:22 &lt;+petermr&gt; many compounds decompose before atmospheric

12:22 &lt;@Beetstra&gt; Some boiling points are simply not known at atmospheric .. or may not exist

12:22 &lt;+walkerma&gt; I think we're OK, if people can hang on. I'd like to ask Petermr what he needs us to do

12:22 &lt;+dmacks&gt; (/me okay with going longer)

12:22 &lt;+walkerma&gt; I'd also like him to say what he plans to do with the data collection

12:22 * ChemSpiderMan okay with time

12:22 &lt;+petermr&gt; OK. First I am clear that the boxes are being cleaned up.

12:23 &lt;+petermr&gt; that's very valuable.

12:23 &lt;+walkerma&gt; But let's try and finish fairly soon

12:23 &lt;+petermr&gt; it would be useful to have agreement on how information can be extracted to and written from boxes automatically

12:23 * Rifleman_82 notices that it's 1+ am here

12:24 &lt;+petermr&gt; then we can write software that is maintainable

12:24 &lt;@Beetstra&gt; Petermr, you need a bot-account for that ..

12:24 &lt;+Rifleman_82&gt; beetstra's your man for awb scripts... he can hand it to me and i can run it on chem-awb

12:24 &lt;+petermr&gt; this also allows machine validation of consistency

12:24 &lt;+Rifleman_82&gt; or anyone else can get a bot account, it's not a big deal

12:24 &lt;@Beetstra&gt; AWB is limited .. you need 'real' programming languages for this

12:24 &lt;+petermr&gt; what is AWB

12:24 &lt;+walkerma&gt; http://en.wikipedia.org/wiki/Wikipedia:AutoWikiBrowser

12:24 &lt;+Rifleman_82&gt; WP:AWB

12:25 &lt;+petermr&gt; I agree we need real languages. I have a fairly good parser for correct chembox

12:25 &lt;@Beetstra&gt; AWB is autowikibrowser, script editor, but mainly for find-and-replace .. which you can make as complex as you want

12:25 &lt;@Beetstra&gt; using regex ..

12:25 &lt;+Rifleman_82&gt; okay, then the next step will be for petermr to get a bot account. you can see WP:BAG

12:25 &lt;+petermr&gt; Next.. character encodings - can we find a way of using only ASCII. Not easy, but characters really foul up machines

12:26 &lt;@Beetstra&gt; But if you need data from external databases it is better to write it properly in perl/python/c/etc

12:26 &lt;+Rifleman_82&gt; what sort of characters screw things up?

12:26 &lt;+Rifleman_82&gt; &middot; &degree; ?

12:26 &lt;+petermr&gt; At present I am working with what I have already got and will return if I need to work regularly with a bot

12:26 &lt;+dmacks&gt; Weak objection petermr: WP is primarily a display engine, so it needs to display well, then it's up to our parsers to read that.

12:26 &lt;+petermr&gt; yes, those can be problems. It's worse when people paste in higher encodings from word

12:27 &lt;+petermr&gt; this is the classic difference between semantics and display. Maybe both are required. Maybe a database offline which is transformed

12:27 &lt;+dmacks&gt; "It's all just data", so easy enough to regex replace them for internal processing in [whatever you're doing].

12:27 &lt;+petermr&gt; it will also make your display more consistent anyway

12:28 &lt;+petermr&gt; then I wanted to apply for the grant and will simply say something like:

12:29 &lt;+petermr&gt; I have discussed the use of WP-Chem in RDF with the WPedians and they are aware of what we plan to do.

12:29 &lt;+petermr&gt; (You may later add positive words if you feel this merits it)

12:30 &lt;+petermr&gt; the main thrust is to create a chemical ontology which supports publications such as theses and papers

12:30 &lt;+Rifleman_82&gt; to make wp the definitive source for chemical information?

12:30 &lt;+walkerma&gt; So chem articles will link to our pages?

12:30 &lt;+petermr&gt; I have also mentioned the value of RDF and alerted you to the likelihood that I will be using a version of WPChem

12:31 &lt;+petermr&gt; Yes to both of you

12:31 &lt;+Rifleman_82&gt; no need to link... can search?

12:31 &lt;+Rifleman_82&gt; or you're going to start assigning wikichemids? ;)

12:31 &lt;+petermr&gt; I have told the world that I see WP becoming the primary reference source for (elementary) chemistry

12:31 &lt;+walkerma&gt; Rifleman82: You could be reading a Chem Communications article, see a blue link, and just click

12:32 &lt;+petermr&gt; At present I assume that each entry has a unique ID in the XML. But that doesn't allow linking to article

12:32 &lt;+petermr&gt; (I ignore version IDs at this stage)

12:32 &lt;+petermr&gt; is there a wikichemid?

12:33 &lt;+petermr&gt; as opposed to a page ID? I think it would be useful to have wikichemID

12:33 &lt;@Beetstra&gt; IIRC, each page has an ID, which is linked to a pagename, a page-move is merely a change of the name in the name-database ..

12:33 &lt;+Rifleman_82&gt; haha... i think that'd be redundant...

12:33 &lt;+Rifleman_82&gt; i was comparing a wikichemid with pubchemid

12:33 &lt;+Rifleman_82&gt; and cas no and einecs and a whole bunch of other arbitrary identifiers

12:33 &lt;@Beetstra&gt; WikiChemID would be difficult to control, 'there can be only one' ..

12:33 &lt;+Rifleman_82&gt; we can just use cas numbers

12:33 &lt;+petermr&gt; yes - a wikichemid would be similar to the use of cas and einecs

12:34 &lt;+Rifleman_82&gt; and redirect where necessary - D and L can redirect to the correct article

12:34 &lt;+petermr&gt; You can't use CAS where it doesn't exist. I don't know where that might be.

12:34 &lt;+ChemSpiderMan&gt; but at least open to anyone to review

12:34 &lt;+walkerma&gt; We can't use CAS, they're too ambiguous too

12:34 &lt;+ChemSpiderMan&gt; absolutely agreed

12:34 &lt;+Rifleman_82&gt; what's the problem with cas?

12:34 &lt;+dmacks&gt; Do we really need Yet Another ID? vs canonical-SMILES, or InChi or somesuch?

12:34 &lt;+Rifleman_82&gt; i think they'll work for 90 % or so...

12:35 &lt;+Rifleman_82&gt; problems would be for ... raney nickel which isn't well-defined...?

12:35 &lt;+walkerma&gt; InChIKey?

12:35 &lt;+petermr&gt; I see a distinction between WP on the one hand (fairly small, common chemicals) and ChemSpider or Pubchem - very large

12:35 &lt;+petermr&gt; WP Chem covers a different constituency

12:35 &lt;+ChemSpiderMan&gt; InChIKey will have problems for many chemicals - organometallics, polymers, inorganics

12:35 &lt;+petermr&gt; InChI will not work for large numbers of non-organic compounds

12:36 &lt;+walkerma&gt; True

12:36 &lt;+dmacks&gt; good point

12:36 &lt;+petermr&gt; There has to be an authority-based Id for named substances

12:36 &lt;+ChemSpiderMan&gt; agreed

12:36 &lt;+petermr&gt; what is InChI for gasoline?

12:36 &lt;+Rifleman_82&gt; it's not a chemical compound

12:36 &lt;+Rifleman_82&gt; :P

12:36 &lt;+petermr&gt; but I think there is a CAS number

12:36 &lt;+ChemSpiderMan&gt; About $3.20 :-)

12:36 &lt;+Rifleman_82&gt; so it's not a chemical, no chembox

12:37 &lt;+petermr&gt; OK, what is InChI for glucose?

12:37 &lt;+ChemSpiderMan&gt; what form?

12:37 &lt;+Rifleman_82&gt; look at it this way, all chemicals will have a cas, if there's a cas there need not be a chemical

12:37 &lt;+Rifleman_82&gt; but that's good enough for us?

12:37 &lt;+dmacks&gt; So this database will include materials and mixtures (performance/industrial chemicals), not just fine chemicals and pure substances?

12:37 &lt;+Rifleman_82&gt; and wikipedia can redirect all the various cas numbers to the glucose article?

12:37 &lt;+petermr&gt; The only problem with CAS is ownership and independence. It's a good system

12:38 &lt;+Rifleman_82&gt; heh... can persuade ACS to license CAS under creative commons or something? :)

12:38 &lt;+ChemSpiderMan&gt; we come back to CAS...who's going to get the "right one"?

12:38 &lt;+Rifleman_82&gt; "right one"?

12:38 &lt;+ChemSpiderMan&gt; the right CAS number for the article

12:38 &lt;+walkerma&gt; I think we'll need to discuss this at length another time. I think there are issues with all of these formats

12:38 &lt;+ChemSpiderMan&gt; the material, compound, structure, mixture

12:38 &lt;+petermr&gt; there is no "right one" - but this is a long discussion. Different people have clear disagreements about chemicals

12:38 &lt;+ChemSpiderMan&gt; exactly.

12:39 &lt;+ChemSpiderMan&gt; CAS won't work

12:39 &lt;+petermr&gt; exactly. Our RDF work will highlight exactly what agreement and what disagreement there is between WP and other common sources such as MSDS

12:39 &lt;+Rifleman_82&gt; how about using the names?

12:39 &lt;+ChemSpiderMan&gt; I agree that a WikipediaID may be of value but why not the name

12:39 &lt;+Rifleman_82&gt; all common variants should redirect to the correct article

12:39 &lt;+petermr&gt; names are just as bad. What is the formula for "snow"

12:39 &lt;+ChemSpiderMan&gt; there isn't one

12:40 &lt;+ChemSpiderMan&gt; but is there an article?

12:40 * Beetstra is leaving .. I'll be here on other times when people want to discuss things

12:40 &lt;@Beetstra&gt; bye!

12:40 &lt;+Rifleman_82&gt; night dirk!

12:40 &lt;+ChemSpiderMan&gt; g'night

12:40 -!- Beetstra [n=djbeetst@Wikimedia/Beetstra] has quit ["Bye Bye"]

12:40 &lt;+dmacks&gt; Problem here seems to be trying to force 1:1 mapping of "some identifier" to "some chemical"

12:40 &lt;+ChemSpiderMan&gt; http://en.wikipedia.org/wiki/Snow

12:40 &lt;+petermr&gt; names are not unique and are unlikely to be so. there is a formula for "snow" in pubchem. It is correct and different from what you might expect.

12:40 &lt;+Rifleman_82&gt; oh?

12:41 &lt;+ChemSpiderMan&gt; cocaine

12:41 &lt;+petermr&gt; So the only absolute resolution is for WPedians to agree that certain entries have certain WPids

12:41 &lt;+ChemSpiderMan&gt; also, berries, nose candy, pimp powder etc

12:41 &lt;+Rifleman_82&gt; if we need to have WPIDs, we should have them for all

12:41 &lt;+Rifleman_82&gt; we could be the authority?

12:41 &lt;+petermr&gt; RDF can support a many:many mapping and we are using that to find out how serious the problem is for - say - 5000 common compounds
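
The many:many relationship between articles and identifiers can be illustrated without any RDF tooling. The sketch below is a plain in-memory stand-in, not petermr's RDF work; the class name `IdentifierMap`, its methods, and the placeholder article and identifier labels in the usage note are all hypothetical.

```python
from collections import defaultdict


class IdentifierMap:
    """Many-to-many links between article names and external identifiers."""

    def __init__(self) -> None:
        self._ids_by_article: dict[str, set[str]] = defaultdict(set)
        self._articles_by_id: dict[str, set[str]] = defaultdict(set)

    def link(self, article: str, identifier: str) -> None:
        """Record that an article carries an identifier (CAS, InChI, ...)."""
        self._ids_by_article[article].add(identifier)
        self._articles_by_id[identifier].add(article)

    def identifiers_for(self, article: str) -> set[str]:
        """All identifiers attached to one article."""
        return set(self._ids_by_article.get(article, set()))

    def shared_identifiers(self) -> dict[str, set[str]]:
        """Identifiers that point at more than one article."""
        return {i: set(a) for i, a in self._articles_by_id.items() if len(a) > 1}

    def articles_with_several_identifiers(self) -> dict[str, set[str]]:
        """Articles that carry more than one identifier."""
        return {a: set(i) for a, i in self._ids_by_article.items() if len(i) > 1}
```

After calls such as `m.link("Article A", "ID-1")`, `m.link("Article A", "ID-2")` and `m.link("Article B", "ID-2")`, the two report methods list exactly the cases where a 1:1 mapping breaks down, which is one way to gauge how serious the problem is across a few thousand compounds.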

12:42 &lt;+Rifleman_82&gt; but curation will fall on someone's or some peoples' shoulders

12:42 &lt;+petermr&gt; yes - WP will become the authority - whether it wants to or not!

12:42 &lt;+petermr&gt; any successful collection becomes an authority

12:43 &lt;+walkerma&gt; We are already the de facto authority for chemistry students under the age of 18.....

12:43 &lt;+petermr&gt; at present the only thing common to almost all common source is CAS

12:43 &lt;+dmacks&gt; What's wrong with having the ChemComm article link going to a list of compounds (all the different protonation forms) if the article-author wasn't specific enough in his linking?

12:43 &lt;+petermr&gt; exactly

12:43 * Rifleman_82 has failed students for plagiarizing WP... even the [edit] tags

12:43 &lt;+dmacks&gt; me too Rifleman_82

12:43 &lt;+walkerma&gt; haha

12:44 &lt;+petermr&gt; ideally the author should indicate what they are linking to - a precise name, a generic name, etc.

12:44 &lt;+dmacks&gt; Got one that happened to copy a paragraph that had been vandalized, didn't notice a "This is gay!" sentence in the middle of the paragraph.

12:44 &lt;+Rifleman_82&gt; hahaha

12:44 &lt;+walkerma&gt; One problem we have is that we on WP organise by article, not by chemical compound

12:45 * petermr asks whether we have covered most of it or want some more

12:45 &lt;+Rifleman_82&gt; i remember marking lab reports... all of them insisted that methyl salicylate was a "rubificient"... something about blocking light...!?

12:45 &lt;+petermr&gt; but the infobox is per compound?

12:45 &lt;+Rifleman_82&gt; ideally each compound has its own article

12:45 &lt;+Rifleman_82&gt; but look at the article on carbodiimide

12:45 &lt;+walkerma&gt; Consider tartaric acid: We have one page but there are many CAS nos and InChIs

12:45 &lt;+Rifleman_82&gt; and cresol

12:46 &lt;+petermr&gt; These are DIFFICULT problems with no simple answer. maybe there should be multiple infoboxes

12:46 &lt;+walkerma&gt; R,R; S,S; meso, unspecified, racemic mixture, etc

12:46 &lt;+ChemSpiderMan&gt; I think you're back to the article name as the primary WP key

12:46 &lt;+petermr&gt; there is a major problem between macroscopic and microscopic.

12:46 &lt;+dmacks&gt; yeah, again ChemComm author might not care, but also might.

12:46 &lt;+walkerma&gt; Using the article name as the primary WP key would be the most natural way for Wikipedians

12:47 &lt;+petermr&gt; that is why CML uses substance for macro and molecule for micro

12:47 &lt;+Rifleman_82&gt; orgsynth used to list cas numbers beside substances

12:47 &lt;+dmacks&gt; So whatever system or unique id we're gonna promote needs to account for both types of things.

12:47 &lt;+petermr&gt; CAS numbers are designed to describe substances. They sometimes work for molecules

12:48 &lt;+ChemSpiderMan&gt; mol/sdf is not as capable as CML for supporting substance...it is why I have not been able to deal with many of them

12:48 &lt;+petermr&gt; InChIs are designed to work for molecules.

12:48 &lt;+ChemSpiderMan&gt; and SMILES

12:48 &lt;+petermr&gt; InChIs sometimes work for pure substances

12:48 &lt;+ChemSpiderMan&gt; they are all limited in the same way

12:48 &lt;+petermr&gt; SMILES &lt;--&gt; InChI

12:48 &lt;+ChemSpiderMan&gt; similar way

12:49 &lt;+walkerma&gt; OK, I think we have plenty to ponder. Should we call it a day/night? And perhaps we can have this discussion on wiki?

12:49 &lt;+petermr&gt; The Merck handbook deals with substances

12:49 &lt;+petermr&gt; BTW I see WP as amjor competitor to Merck

12:49 &lt;+petermr&gt; == major

12:49 &lt;+dmacks&gt; yup to both.

12:49 &lt;+Rifleman_82&gt; and crc handbook :)

12:50 &lt;+walkerma&gt; Is everyone OK with me making a transcript of this discussion public?

12:50 &lt;+Rifleman_82&gt; sure

12:50 * petermr waves bye to all left and thanks for discussion.

12:50 &lt;+ChemSpiderMan&gt; yes

12:50 &lt;+Rifleman_82&gt; petermr

12:50 &lt;+petermr&gt; yes to public

12:50 &lt;+dmacks&gt; walkerma: Sure.

12:50 &lt;+Rifleman_82&gt; before you go

12:50 &lt;+Rifleman_82&gt; last question

12:50 &lt;+petermr&gt; sure...

12:50 &lt;+ChemSpiderMan&gt; I need a couple of moments from anyone who's interested please

12:50 &lt;+walkerma&gt; OK

12:50 &lt;+Rifleman_82&gt; should this database stay on wiki? or should we split it off to something off-wiki

12:50 &lt;+walkerma&gt; Good point

12:50 &lt;+Rifleman_82&gt; or on wiki but off-of the article space

12:51 &lt;+Rifleman_82&gt; vandalism is a problem, but can be fixed

12:51 &lt;+Rifleman_82&gt; i'm more worried if to make it efficient the syntax will not be easily figured out for the drive-by chembox filler

12:51 &lt;+dmacks&gt; If on-wiki, then separate page.

12:51 &lt;+Rifleman_82&gt; as in, the syntax will be so complicated

12:52 &lt;+walkerma&gt; We need to allow drive-by users to add information, then we with bots & AWB go and format the info

12:52 &lt;+walkerma&gt; If needed

12:52 &lt;+Rifleman_82&gt; we can have a tag asking them to add it to the talk?

12:52 &lt;+Rifleman_82&gt; if needed of course

12:53 &lt;+Rifleman_82&gt; anyway, if there're no responses to my comments we can sleep on it

12:53 &lt;+Rifleman_82&gt; ChemSpiderMan had something to say?

12:53 &lt;+ChemSpiderMan&gt; yes

12:53 * dmacks not sure how much more complicated the syntax would be than cheminfobox now.

12:53 &lt;+ChemSpiderMan&gt; please look at: http://en.wikipedia.org/wiki/N-Cyclohexyl-2-aminoethanesulfonic acid

12:54 &lt;+Rifleman_82&gt; yup

12:54 &lt;+ChemSpiderMan&gt; click on the CAS number

12:54 &lt;+dmacks&gt; okay...

12:54 -!- Kelson [n=Kelson@77-57-4-193.dclient.hispeed.ch] has joined #wikichem

12:54 -!- mode/#wikichem [+v Kelson] by ChanServ

12:54 &lt;+ChemSpiderMan&gt; and http://en.wikipedia.org/wiki/N-Acetylmannosamine

12:54 &lt;+Rifleman_82&gt; sodium salt, free acid

12:55 &lt;+ChemSpiderMan&gt; click on the CAS number

12:55 &lt;+ChemSpiderMan&gt; My opinion....CAS numbers should not link out

12:55 &lt;+ChemSpiderMan&gt; it adds confusion when there are many "forms" of the molecule with the same CAS

12:55 -!- carl-m [n=cbm@wikipedia/cbm] has joined #wikichem

12:56 &lt;+ChemSpiderMan&gt; salts, with/without stereo, incomplete stereo

12:56 &lt;+walkerma&gt; Hi Carl, Kelson, we're just finishing up here

12:56 &lt;+ChemSpiderMan&gt; The CAS in the ChemBox should be for the structure drawn

12:56 &lt;+petermr&gt; this was raised on the talk pages earlier I think

12:56 &lt;+walkerma&gt; ChemSpiderMan: This is bad - they would have different CAS nos anyway

12:56 &lt;+petermr&gt; as to where if anywhere the CAS should link

12:57 * Rifleman_82 wonders if we can start adding IR 1H, 13C, F, P, etc. NMR, other spectra... and complement/compete with SDBS and NIST webbook

12:57 &lt;+Kelson&gt; walkerma: :)

12:57 &lt;+Rifleman_82&gt; if we're going to be the authority, we needn't bother with external links

12:57 &lt;+ChemSpiderMan&gt; Yes...there may be many CAS numbers for the SAME structure

12:57 &lt;+walkerma&gt; NMR etc: Yes we should, but not today...!

12:57 &lt;+Rifleman_82&gt; linkfarms and other search engines will be linking to us instead

12:57 * petermr agrees

12:57 &lt;+dmacks&gt; yup

12:57 * petermr about the links

12:57 &lt;+ChemSpiderMan&gt; But using the CAS numbers to link out to many structures is different

12:58 &lt;+ChemSpiderMan&gt; One structure &gt; Many CAS but I don't think Many structure &gt; ONE CAS

12:58 &lt;+ChemSpiderMan&gt; anyhow...worth a talk page discussion.

12:58 -!- petermr [n=chatzill@arcturus.ch.cam.ac.uk] has quit ["ChatZilla 0.9.80 [Firefox 2.0.0.11/2007112718]"]

12:59 &lt;+Rifleman_82&gt; yes... many structure -&gt; one cas

12:59 &lt;+walkerma&gt; Yes, we have a lot to discuss on wiki

12:59 &lt;+Rifleman_82&gt; tartaric acid?

13:00 &lt;+walkerma&gt; exactly - tartaric acid must have at least 5

13:00 &lt;+ChemSpiderMan&gt; Rifleman...re spectra...they are being populated on ChemSpider. You might want to consider another ID...a ChemSpider ID in your box :-)

13:00 &lt;+ChemSpiderMan&gt; http://www.chemspider.com/docs/Uploading_Spectra_onto_ChemSpider.htm

13:00 &lt;+Rifleman_82&gt; chemspider: thanks... i've got a few spectra... from aldrich bottles

13:01 &lt;+ChemSpiderMan&gt; I'd welcome them...we can chat offline. u have my email?

13:01 &lt;+ChemSpiderMan&gt; walkerma..we're done maybe?

13:01 &lt;+Rifleman_82&gt; what's the copyright for these spectra? okay to upload on wiki? to discuss the finer points of NMR for example

13:01 &lt;+Rifleman_82&gt; ChemSpiderMan: yeah, i still have it from the chemsketch discussion a year ago

13:01 &lt;+Rifleman_82&gt; i'm done, we can discuss again

13:01 &lt;+ChemSpiderMan&gt; most are OPEN data

13:01 &lt;+Rifleman_82&gt; should we set a date for next week?

13:02 &lt;+ChemSpiderMan&gt; with me or the group? With me...sure

13:02 &lt;+Rifleman_82&gt; need not be next week

13:02 &lt;+walkerma&gt; Yes let's end it there I think. Should we meet at exactly one week after this first meeting? Rifleman, can you stay up?

13:02 &lt;+Rifleman_82&gt; i was thinking an open invitation for all

13:02 &lt;+dmacks&gt; works for me.

13:02 &lt;+Rifleman_82&gt; i can stay up... it's 2 am now, but i start work at 10 am

13:02 &lt;+walkerma&gt; I have office hours, but otherwise OK

13:02 &lt;+Rifleman_82&gt; so 12-2 my time is fine

13:03 &lt;+ChemSpiderMan&gt; am I invited back?

13:03 &lt;+Rifleman_82&gt; so.. why don't we meet here next tue, same time

13:03 &lt;+walkerma&gt; You're one of us now aren't you, ChemSpiderMan?

13:03 &lt;+ChemSpiderMan&gt; If so...11am is good.

13:03 &lt;+ChemSpiderMan&gt; Sounds like fun...

13:03 &lt;+Rifleman_82&gt; no obligation, whoever wants to come can come

13:03 &lt;+Rifleman_82&gt; and... we can always hang out here

13:04 &lt;+Rifleman_82&gt; like beetstra and itub do

13:04 &lt;+ChemSpiderMan&gt; if not lunacy... I have added Chromene to ChemSpider... :-)

13:05 &lt;+Rifleman_82&gt; martin you might be interested: http://stats.grok.se/en/200712/diphosphines

13:05 &lt;+Rifleman_82&gt; a tool to check how often an article is viewed

13:05 &lt;+Rifleman_82&gt; may be helpful for your wp v 1.0 or whatever it is

13:05 * ChemSpiderMan ChemSpiderman leaving

13:05 -!- carl-m is now known as carl-m|away

13:05 -!- carl-m|away [n=cbm@wikipedia/cbm] has left #wikichem []

13:05 -!- ChemSpiderMan [n=ChemSpid@c-68-33-151-242.hsd1.md.comcast.net] has quit []

13:06 -!- You're now known as dmacks_away

13:06 &lt;+Rifleman_82&gt; night people

13:06 -!- Rifleman_82 [n=Rifleman@wikipedia/Rifleman-82] has quit []

13:07 &lt;+walkerma&gt; Thanks Rifleman! OK, I suggest to Kelson and carl-m that we relocate to #wikipedia-1.0

13:07 &lt;+walkerma&gt; Bye!

13:08 -!- walkerma [n=chatzill@cpe-74-71-213-87.twcny.res.rr.com] has left #wikichem []

15:47 -!- carl-m [n=cbm@wikipedia/cbm] has joined #wikichem

15:49 -!- Kelson [n=Kelson@77-57-4-193.dclient.hispeed.ch] has quit ["Client exiting"]

15:49 -!- carl-m [n=cbm@wikipedia/cbm] has left #wikichem []

15:52 -!- Physchim62 [n=Physchim@unaffiliated/physchim62] has joined #wikichem

15:52 -!- mode/#wikichem [+v Physchim62] by ChanServ

15:52 -!- Physchim62 is now known as PC62|away

15:55 -!- PC62|away is now known as Physchim62

--- Log closed Tue Jan 15 16:37:27 EST 2008