Wikipedia:WikiProject Chemistry/IRC discussions/28 July 2009


 * 17:34:25] Hello everybody
 * 17:34:35]  Greetings.
 * 17:34:37]  welcome to WikiChem!
 * 17:34:49] Hello! Thanks for joining us!
 * 17:34:50] Now, who are you all IRL? chemspiderman and walkerma I know
 * 17:35:56] OK, shall I start by explaining RXNO, how we invented it and what it was originally for?
 * 17:36:03]  IRL, I am **********, PhD chemist and ex-member of various RSC committees, now a freelance writer, translator and literary agent (when I'm not editing Wikipedia)
 * 17:36:43]  what WAS it originally for?
 * 17:37:14]  I'm Alex, undergrad (in chemistry), UK.
 * 17:37:23] Originally it was intended to be 'something' that mentions of 'dead German chemists' in text could be referred to.
 * 17:38:02]  OK, the IUPAC Gold Book doesn't do that very well at all, which is why I was excited when I found RXNO
 * 17:38:27] Precisely; there are only four or five name reactions in the Gold Book.
 * 17:38:36] (which I shall call AU for short here)
 * 17:38:58] The Gold Book is fine for sending users of an enhanced web page to encyclopaedia entries...
 * 17:39:28] ... but in the longer term we want either ourselves or other people to be able to make inferences about sentences in papers.
 * 17:40:09] ... and encyclopaedia entries are not ideal for this...
 * 17:40:14]  at Wikipedia, we found that AU entries were usually either too short or too specialized for our purposes
 * 17:40:40] ... because an encyclopaedia with one entry per 'natural kind'/'experimental kind' would be unreadable.
 * 17:41:01] (I may be slightly incoherent as have only come back from ontology conference in the states)
 * 17:41:03]  hmm, carry on…
 * 17:41:31] By which I mean that it's perfectly reasonable for the Gold Book to, on their page for "cis", define "cis" and "trans"
 * 17:41:51]  I'm Norm Reitzel, BSEE, BSChem, MSChem, gave up on PhD when my PI was fired.
 * 17:41:51] but if you want a computer to reason over things, you can't go around giving opposite things the same URL.
 * 17:42:27] Likewise you have to distinguish carefully between experiments and the instruments used in the experiments.
 * 17:42:54]  ( same url if it includes site/Topic#othertopic ? )
 * 17:43:03]  OK, I think I get you
 * 17:43:30]  Instruments meaning equipment or a symthetic method?
 * 17:43:39]  Ugh, synthetic.
 * 17:43:59]  That is a distinction which has caused a considerable amount of debate on Wikipedia over the years that I've been involved
 * 17:44:00] @NormWork: one might be able to do that with a specialized reasoner, but reasoners are not very bright.
 * 17:44:31] By instruments here I'm thinking of TEM, which can stand for "transmission electron microscope" and "transmission electron microscopy", which are different things.
 * 17:44:49]  Ahhh.
 * 17:45:07] I think it's unreasonable to constrain wikipedia, which is meant to be human-readable, to be one-page-per-entity.
 * 17:45:30]  NormWork, even that solution has problems for the average user of Wikipedia: we can set up a URL to direct to a specified section of an article, but will the user realise why he/she has just arrived in the middle of an article rather than at the start?
 * 17:46:11] So, we needed a classification scheme as a first pass, and what we (me, Celia Gitterman and David Barden) came up with...
 * 17:46:18] <Physchim62> what we can do, is have URLs for each RXNO entry
 * 17:46:30] <NormWork> @batchelorc: They should be bright.  I spent first 20 years of my career making brighter heuristics for stuff like that.
 * 17:46:33] ... was a classification based on what happened to the 'skeleton' of the molecule in each reaction.
 * 17:47:26] So we looked at lots of name reactions, and worked out a list of about twenty kinds of reaction...
 * 17:47:54] ... and then determined whether we could assign the kind of reaction (joining, aromatic substitution, functional group modification &c &c) just by looking at the reaction scheme.
 * 17:48:13] This led us to working out a decision tree and amending the classification scheme accordingly.
 * 17:48:41] One can in principle work out automatically whether something is necessarily a five-membered ring synthesis or a six-membered ring synthesis...
 * 17:49:19] ... so we keep the human classification effort for the hard part, which involves tracking the identity of a skeleton through the synthesis.
 * 17:49:29] Or subskeletons in the case where you make two halves of a molecule and join them up.
 * 17:49:30] <NormWork> Nice, I think.
 * 17:49:38] We have guidelines for how to assign skeleta.
 * 17:49:49] We also really really need to write all this up properly.
 * 17:49:51] <NormWork> canonical representation molfile?
 * 17:50:00] Not yet.
 * 17:50:31] What we have at the moment is a purely textual definition, hopefully in Aristotelian form.
 * 17:51:03] Aristotelian form is where you give the genus (what kind of thing something is) and differentiae (what distinguishes it from its siblings).
 * 17:51:30] Hence, a plate is a piece of crockery for eating food off.
 * 17:51:37] A bowl is a piece of crockery for eating food out of.
 * 17:51:47] A mug is a piece of crockery for drinking out of, and so forth.
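A minimal sketch of the Aristotelian form described above, assuming nothing about how RXNO itself stores its definitions: each entry records a genus and a differentia, and siblings under the same genus must differ in their differentiae. The class name `Definition` and the helper `siblings_are_distinct` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Definition:
    term: str
    genus: str        # what kind of thing the term is
    differentia: str  # what distinguishes it from its siblings

    def render(self) -> str:
        # Pick the indefinite article from the first letter of the term.
        article = "An" if self.term[0].lower() in "aeiou" else "A"
        return f"{article} {self.term} is a {self.genus} {self.differentia}."


# The crockery examples from the discussion.
crockery = [
    Definition("plate", "piece of crockery", "for eating food off"),
    Definition("bowl", "piece of crockery", "for eating food out of"),
    Definition("mug", "piece of crockery", "for drinking out of"),
]


def siblings_are_distinct(definitions):
    """Siblings share a genus, so no two of them may share a differentia."""
    by_genus = {}
    for d in definitions:
        by_genus.setdefault(d.genus, []).append(d.differentia)
    return all(len(v) == len(set(v)) for v in by_genus.values())


for d in crockery:
    print(d.render())
```

A check like `siblings_are_distinct` is one cheap way to catch two terms that were accidentally given the same definition.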
 * 17:52:42] We want to do some tests with independent annotators, independent in the sense that they weren't involved in writing the guidelines, to make sure they can classify reactions in the same way.
 * 17:52:52] <Physchim62> yes, you really should write it up properly!
 * 17:52:58] That way we'll know whether it's a good classification scheme.
 * 17:53:06] So: next steps.
 * 17:53:14] <Physchim62> I think Wikipedia could help there
 * 17:53:32] The first is to map the class names (aldehyde, Michael acceptor, and so forth), onto the ChEBI ontology.
 * 17:54:03] http://www.ebi.ac.uk/chebi/
 * 17:54:06] Thanks Martin.
 * 17:54:19] I'll remember to do URLs from now on.
 * 17:54:37] ChEBI at the moment has mol files for its fully-specified molecules.
 * 17:54:59] But their plan is to create proper definitions and mol files &c for underspecified molecules like "aldehyde".
 * 17:55:37] Once we have done that we should be able to look at extra axes of classification, like all the reactions that involve a ketone, say.
 * 17:56:03] <ali_as> Where do lit refs come into the description of the reactions?
 * 17:56:06] Another thing will be to formally represent reagents, like nickel ions. These are all in ChEBI too, and if they aren't, then they can be added.
 * 17:56:15] We need literature references, definitely.
 * 17:56:30] So how that works in OBO format, which is one way of supplying the ontology...
 * 17:56:45] ... is that the def: line in the stanza has square brackets at the end into which can be inserted DOIs.
 * 17:56:50] And we need to do this.
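As a rough illustration of the `def:` line mentioned above, the sketch below builds an OBO `[Term]` stanza whose definition ends with a square-bracketed list of DOI cross-references. The helper name `obo_term`, the identifier `RXNO:0000000`, the term name, the definition text and the DOI are all placeholders invented for this example; none of them is taken from the actual RXNO file.

```python
def obo_term(term_id, name, definition, dois=()):
    """Return an OBO [Term] stanza; DOIs go in the brackets after the definition."""
    xrefs = ", ".join(f"DOI:{doi}" for doi in dois)
    # Backslashes and double quotes inside the definition text must be escaped.
    escaped = definition.replace("\\", "\\\\").replace('"', '\\"')
    return "\n".join([
        "[Term]",
        f"id: {term_id}",
        f"name: {name}",
        f'def: "{escaped}" [{xrefs}]',
    ])


# Placeholder values only: not a real RXNO entry or a real DOI.
print(obo_term(
    "RXNO:0000000",
    "example reaction",
    "A hypothetical reaction used only for illustration.",
    ["10.0000/placeholder"],
))
```

A stanza with no literature reference yet simply carries empty brackets, which matches the state of the file described in the discussion.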
 * 17:56:56] <ali_as> I'm just looking at the zip file on rsc.org and I see a huge list of descriptions with no lit refs I can find.
 * 17:57:49] Reaction conditions, in terms of temperature, let's say, or pressure, are harder.
 * 17:58:06] <NormWork> /aside Do we have to be careful with lit refs, zB, March ?
 * 17:58:28] What is zB?
 * 17:58:28] <NormWork> nl res
 * 17:58:47] <Physchim62> What really struck me about RXNO is that it gives a hierarchy of reaction types, which is something that is very interesting to me at least
 * 17:58:51] By "nl res" do you mean "Dutch-language resource"? zB = zum Beispiel?
 * 17:59:06] The hierarchy of reaction types was the time-consuming bit.
 * 17:59:14] <NormWork> ( sorry, I'm swapping back and forth, running a varian spectrometer )
 * 17:59:16] <Physchim62> zB = zum Beispiel, for example
 * 17:59:34] <Physchim62> I'm sure
 * 17:59:43] <ali_as> Are oxidation/reduction part of the descriptions or isn't this unimportant?
 * 18:00:15] <ali_as> Ohh double negative, I'm surpassing myself today.
 * 18:00:20] @ali_as, if the oxidation or reduction is the reason why somebody is doing it, like turning a hydroxy group into an oxo group, then we record that.
 * 18:00:28] <Physchim62> redox reactions are classified as to what they do to the overall carbon skeleton, not as a separate type
 * 18:00:49] <NormWork> Phys: Yes, it's an old habit from days when my PI spoke Deutsch.
 * 18:00:52] but what we don't do, at least not yet, is decide whether all reactions are oxidations or reductions.
 * 18:01:10] We have a parallel ontology called MOP.
 * 18:01:23] http://www.rsc.org/ontologies/MOP
 * 18:01:24] <NormWork> nl res
 * 18:01:30] <NormWork> damnit...
 * 18:01:30] <ali_as> But if it's just part of the reaction conditions for removing an azide then it isn't noted?
 * 18:01:33] <Physchim62> so cyclohexane -> adipic acid is a "ring opening reaction", not an oxidation
 * 18:01:44] @Physchim62: exactly.
 * 18:02:04] In order to get a clean classification hierarchy we needed to identify what the most important thing going on at each stage was.
 * 18:02:35] By referring to MOP, into which we're starting to add things, we can more fully describe cyclohexane -> adipic acid.
 * 18:02:36] <Physchim62> ali_as, on Wikipedia, we can always use multiple classifications schemes, we are not tied down to a single hierarchy
 * 18:03:10] <ali_as> I'm just fitting this into my head, it's not a comment.
 * 18:03:16] And indeed in ontologies one can use multiple inheritance, but we should be very careful about using the is_a relation, example:
 * 18:03:29] in chemistry one classifies molecules according to their parts.
 * 18:03:50] We could do this in zoology, and say that a bird is a wing-containing organism, a head-containing organism, a beak-containing organism
 * 18:04:08] ... but that would lead to an explosively large classification.
 * 18:04:20] Instead we do anatomy and say that the bill is part of the head is part of the bird
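The contrast between the two approaches can be sketched as follows, using only the bird example from the discussion. This is an illustration under stated assumptions rather than how any ontology tool is implemented: `PART_OF`, `wholes` and `is_a_classes` are hypothetical names, and `part_of` is assumed to be transitive.

```python
from itertools import combinations

# Anatomy-style modelling: one part_of link per part.
PART_OF = {"bill": "head", "head": "bird"}


def wholes(part, part_of=PART_OF):
    """Follow part_of links upward, treating the relation as transitive."""
    chain = []
    seen = {part}
    while part in part_of:
        part = part_of[part]
        if part in seen:
            raise ValueError("cycle in part_of links")
        seen.add(part)
        chain.append(part)
    return chain


def is_a_classes(parts):
    """Classification-by-parts: one class per non-empty combination of parts."""
    names = []
    for r in range(1, len(parts) + 1):
        for combo in combinations(parts, r):
            names.append("-and-".join(combo) + "-containing organism")
    return names


print(wholes("bill"))                              # bill is part of head is part of bird
print(len(is_a_classes(["wing", "head", "beak"])))  # classes needed for three parts
```

With n parts the second function yields 2^n - 1 classes, while the part_of table grows by one entry per part, which is the sense in which the is_a approach becomes explosively large.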
 * 18:05:09] That's the story so far. I'm hoping to finish the ChEBI mapping this month.
 * 18:05:16] Are you familiar with my PhD adviser's work in this area? http://pubs.acs.org/doi/abs/10.1021/ci60019a004
 * 18:05:45] http://www3.interscience.wiley.com/journal/112298216/abstract?CRETRY=1&SRETRY=0
 * 18:05:47] @martin, thanks for this.
 * 18:06:05] He started work on rxn classifications on computers in 1967!
 * 18:06:20] He's preparing a paper right now, at the age of 81!
 * 18:06:38] I find the cheminformatics literature generally hard to navigate because there was a lot of interesting stuff done in the 1970s...
 * 18:06:40] <NormWork> /aside some of us old farts are still doing work we hope is useful.
 * 18:06:52] ... which you don't see followed up on these days.
 * 18:07:07] Hendrickson is really an organic chemist who plays with computers, not the other way round
 * 18:07:08] <NormWork> /aside before a very rich person decided none of that information was relevant to PC's
 * 18:07:21] So if you're a chemist, you'll understand him!
 * 18:07:21] <NormWork> <-- the other way around
 * 18:07:35] I like this paper a lot.
 * 18:08:03] I think the approach looks very tractable.
 * 18:08:24] How does the approach handle ring-formation, though?
 * 18:08:50] @walkerma, could you provide the DOI for the wiley paper---the Wiley paper isn't liking the URL
 * 18:09:18] ... expired page
 * 18:09:40] From memory - Rings are treated as skeletons, even if nitrogens or oxidations are part of the ring. He looks at some rules for ring formations as well when forming them
 * 18:09:47] Also see http://www.webreactions.net/
 * 18:10:01] That uses the approach in the Wiley paper
 * 18:10:27] DOI:10.1002/chem.19950010710
 * 18:10:57] Sorry, I meant to say OXYGENS are part of the ring!
 * 18:11:17] Interesting. But I assume there are licences on the webreactions stuff.
 * 18:11:41] We count oxygen atoms as parts of rings, unless it's something mad like C1OOOCCC1.
 * 18:11:58] and thanks for the DOI.
 * 18:12:49] <Physchim62> which comes nicely to an important point for RXNO – would you or the RSC object to it being used as one of the ways of classifying 'dead German' reactions on Wikipedia?
 * 18:13:03] Aha, licensing. No, it's under CC-BY 3.0.
 * 18:13:21] In fact we'd welcome people reusing it and coming up with suggestions.
 * 18:13:49] I need to fix the RXNO page on the RSC web site so it has a link to: http://code.google.com/p/rxno/
 * 18:13:51] I'll email you the background on licences.
 * 18:13:57] <Physchim62> OK, we can make a suitable attribution template that can be modified when you write it up formally then
 * 18:14:13] Great.
 * 18:14:38] CC-BY 3.0 is the standard licence for biomedical ontologies, so we just followed that.
 * 18:14:41] (on webreactions that is)
 * 18:15:22] Thinking of your wikipedia use case, is there anything, other than the original refs, that we need to add...
 * 18:15:32] ... that I haven't mentioned so far.
 * 18:15:38] ?
 * 18:15:59] <NormWork> spin=20 method='wshim' locktc=3 shim
 * 18:16:10] <NormWork> sorry.
 * 18:16:12] <Physchim62> Wikipedia probably has some name reactions which are not (yet) included in RXNO – we have some pretty esoteric content out there – so that could also be a test of your classification scheme
 * 18:16:24] That would be cool.
 * 18:16:43] Oh yes---how did we decide which ones to start off with?
 * 18:16:45] Well...
 * 18:16:57] <ali_as> PC, are we looking at something extra to the reaction box or separate?
 * 18:16:59] ... we searched the RSC journal archive for name reactions back to 1841
 * 18:17:10] and took the 200 most common ones.
 * 18:17:17] Then we added more as and when.
 * 18:17:28] I had one reaction that I and even my PhD adviser (living encyclopedia) had never heard of - some students wanted to use it in their homework. They ended up using it on the exam, too - and they both got As!
 * 18:17:39] They found it on Wikipedia!
 * 18:17:47] <Physchim62> I didn't want to charge ahead and reorganise Wikipedia without asking first…
 * 18:17:49] What was the reaction, out of interest?
 * 18:18:01] http://en.wikipedia.org/wiki/List_of_organic_reactions
 * 18:18:03] We have a tracker at: http://code.google.com/p/rxno/issues/list
 * 18:19:02] Can't remember the specific reaction now, but I think it converted an aromatic nitrile into an ester, or something like that
 * 18:19:21] <Physchim62> ali_as, the reaction box is separate, but related. we're basically looking at our categorization scheme, although we will probably have to write a few articles to explain it as well
 * 18:19:22] RXNO isn't at the stage where it can be queried for that, but it will be soon
 * 18:19:52] I think for the categorization scheme you should pick one that works best for the reader.
 * 18:20:22] How easy is it to have multiple infoboxes on the same page?
 * 18:20:41] And is it possible to externally reference an infobox if you don't know the page?
 * 18:20:42] <Physchim62> as a parallel move, we are also looking to get machine-readable data into our reaction articles: for the moment, this will probably be added by hand as part of our general verification drive
 * 18:21:11] I'm a great fan of starting off doing machine-readable data by hand...
 * 18:21:17] <Physchim62> the other thing which we can do fairly easily is to get an index of Wikipedia pages to RXNO terms
 * 18:21:33] Yep.
 * 18:21:55] <Physchim62> vice-versa is simple, we simply create redirects
 * 18:21:58] I'll pick people's brains in our IT department about the mapping table so we can link back.
 * 18:22:11] That would be great
 * 18:22:21] Oh, redirects, of course, yes. Our internal documentation (on a MediaWiki wiki) is chockfull of them.
 * 18:23:59] <Physchim62> RXNO has chosen a house style which is slightly different from Wikipedia's, but that's not an insurmountable problem. I can create a tab-separated file with RXNO names vs. WP names in less than a day
 * 18:24:13] OK, do you mind having redirects from pages of the form http://en.wikipedia.org/wiki/RXNO_0000001, or is that an abuse of the namespace?
 * 18:24:43] I think that should be fine
 * 18:24:44] I should caution that the ID is primary, and the name may change if we have a good reason.
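A small sketch of the mapping being discussed, keyed on the RXNO ID because the ID is primary and the name may change. The function names, the column headers and the sample row are assumptions made for illustration; the redirect title follows the `RXNO_0000001` form proposed above, and `#REDIRECT [[...]]` is the standard MediaWiki redirect syntax.

```python
import csv
import io


def redirect_title(rxno_id):
    """Turn an ID such as 'RXNO:0000001' into a page title such as 'RXNO_0000001'."""
    prefix, _, number = rxno_id.partition(":")
    if prefix != "RXNO" or not (number.isdigit() and len(number) == 7):
        raise ValueError(f"unexpected RXNO ID: {rxno_id!r}")
    return f"RXNO_{number}"


def redirect_wikitext(article_title):
    """Wikitext body of a redirect page pointing at the given article."""
    return f"#REDIRECT [[{article_title}]]"


def mapping_tsv(rows):
    """Tab-separated table of (rxno_id, rxno_name, wikipedia_title), sorted by ID."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["rxno_id", "rxno_name", "wikipedia_title"])
    for row in sorted(rows):
        writer.writerow(row)
    return buf.getvalue()


# Placeholder row: the names are invented, not an actual RXNO entry.
rows = [("RXNO:0000001", "example reaction", "Example reaction")]
print(mapping_tsv(rows), end="")
print(redirect_title(rows[0][0]), "->", redirect_wikitext(rows[0][2]))
```

Because the table is keyed on the ID, a later change to an RXNO name only touches the second column and leaves the redirects valid.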
 * 18:24:53] <Physchim62> I'm not sure.
 * 18:25:09] It feels a bit spammy to me.
 * 18:25:29] We're only talking a few hundred redirects, aren't we?
 * 18:25:42] Yes.
 * 18:25:53] <ali_as> Would http://en.wikipedia.org/wiki/RXNO/0000001 be any better?
 * 18:26:01] <Physchim62> At WikiChemistry, we'd have no problems over it, but I can't guarantee that some liberal arts undergrad won't come along and object…
 * 18:26:27] If we (WikiProject Chemistry) do it, I don't see a problem. I've never seen deletion debates over redirects
 * 18:26:40] The aphorism is, "Redirects are cheap"
 * 18:26:43] <Physchim62> walkerma, good point
 * 18:27:01] And we're never going to go much over 10^3 terms in RXNO.
 * 18:27:49] <NormWork> Hah. Sorry, I just laugh by reflex at "never going to do" comments.
 * 18:28:18] <Physchim62> let's do it ourselves, rather than getting Colin and his crew involved in WikiPolitics. The downside of that approach is completeness, but we can discuss that again maybe in the autumn…
 * 18:28:19] If there is a massive explosion in the number of really useful reactions, it isn't going to be due to a different human being in each case.
 * 18:28:36] OK, great. Are any of you coming to ACS Fall?
 * 18:28:40] Yes
 * 18:28:54] <NormWork> I'm scheduled to, but one can never tell, around here.
 * 18:28:54] See you there.
 * 18:29:07] <NormWork> Instant projects appear as if by some evil spell.
 * 18:29:11] Before you go - You mentioned queries - do you plan something like Larock's Comprehensive Organic Transformations? Or like the reaction index in Jerry March (you can look up how to prepare aldehydes, for example)?
 * 18:29:13] <Physchim62> not me, I'm still trying to sort things out for IUPAC Glasgow
 * 18:29:47] <NormWork> I like the index in March, myself -- because I learned how to use it.
 * 18:30:05] Since there is a straightforward mapping from OBO onto OWL, one could use XSLT/XQuery/SPARQL to pull things out.
 * 18:30:07] <NormWork> It's not so straightforward, this from somebody who instructs student newbies.
 * 18:30:08] <Physchim62> NormWork, a spirit that is indeed evil, as it generates projects but not the volunteers to staff them!
 * 18:30:30] I'll just fetch a local copy of March, hold on...
 * 18:31:06] Not the main index, but rather the (lengthy) one before that, organised by functional group
 * 18:31:43] I suspect that index is one of the reasons why many organic chemists refer to March as the "bible"
 * 18:32:13] <Physchim62> it is certainly not for March's classification scheme, which has never been used outside his textbook
 * 18:32:17] Aha, so in the one I have to hand, Appendix B, 1. "Acetals and Ketals"---0.12 EReaction between alkoxides and gem-dihalides (Williamson) or alpha-halo ethers
 * 18:32:37] <ali_as> Just occurred to me to ask, are there any plans to include protecting groups into RXNO?
 * 18:32:47] It lists all the reactions you can use to make acetals and ketals
 * 18:33:06] <Physchim62> there are some protecting group reactions in there, they're just well hidden (or "protected")
 * 18:33:11] ali_as: really good question
 * 18:33:21] What I've been doing is getting protecting groups added to ChEBI.
 * 18:33:46] <ali_as> It would be nice to have a list of functional groups that a reaction would NOT alter.
 * 18:34:26] ali_as: really good idea; can you log it on the issue tracker so I don't forget?
 * 18:34:44] PG reactions could easily be lost under things like "esterification". One issue is that for PGs the conditions for the reactions are critical, whereas for most named reactions the molecular transformation is the main feature (though conditions obviously matter)
 * 18:34:53] In the classification scheme, we have "deprotection" and "protection" as top-level entries.
 * 18:35:02] <ali_as> Will do.
 * 18:35:27] So this means that if you do the kind of protection that replaces an oxo oxygen with a -OCCO- group (can't remember name)
 * 18:35:31] (not actually organic chemist)
 * 18:35:33] <NormWork> ali_as, extend that one level farther, include "protected functional groups" and/or protection schemes.
 * 18:35:46] then that doesn't count as a ring-formation.
 * 18:36:01] So, yes, we definitely need to add protection reactions.
 * 18:36:06] And deprotection.
 * 18:36:31] <NormWork> Well, "protection scheme" necessarily includes how to remove/undo protecting groups.
 * 18:36:47] <NormWork> Otherwise it's not "protection" - it's "oops"
 * 18:37:08] * NormWork knows more than he ever, EVER wanted to know about protecting groups on steroids.
 * 18:37:25] Have spent some of the last week trying to sort out the philosophical issues surrounding what a protecting group actually *is*.
 * 18:37:29] <NormWork> My PI was a terpene chemist.
 * 18:37:33] But that needn't concern us here.
 * 18:38:00] <NormWork> " a less labile group except under very specific conditions"
 * 18:38:40] Hi Beetstra!
 * 18:38:41] My collaborator thinks of the process they're involved in as the process equivalent of a hole.
 * 18:39:09] <ali_as> A scenic route to a higher yield.
 * 18:39:16] Beetstra, could you say a bit about yourself IRL?
 * 18:39:26] <Physchim62> well functional group protection is functional group modification on the RXNO scheme
 * 18:39:46] Oh, well spotted. I thought I was going mad there.
 * 18:40:03] <Physchim62> happens to the best of us ;)
 * 18:40:19] So we have Boc protection and Fmoc protection; we can extend this.
 * 18:40:27] <NormWork> It's theoretically a subcategory of modification that has a "how to put it back" subnote.
 * 18:41:38] It's a fun classification task because there's a huge component of human intention there.
 * 18:42:38] A Martian who could see a protecting group reaction wouldn't be able to tell that it was only there because a deprotection was coming later.
 * 18:42:45] <Physchim62> exactly why I wanted to ask before we took it on to Wikipedia: there has obviously been a huge amount of intellectual work which should be duly acknowledged
 * 18:43:32] We (RSC) really need to write the paper; would you be able to acknowledge that when/if it's written, accepted and published?
 * 18:43:33] <Physchim62> bof, it all goes to CO2 and H2O in the end!
 * 18:43:40] Sometimes the protection is combined with a secondary aspect - chelation to allow regioselective lithiation, for example - or sometimes to alter the electronics & chemoselectivity (is use of a dithiane a protecting group or a synthetic use of umpolung?)
 * 18:44:20] walkerma: nice example of intention there
 * 18:44:49] My boss in industry used to say that all our organic processes were just about ever more elaborate ways to make sodium chloride! Amazing how often that is true - if you include workups
 * 18:44:57] <Physchim62> batchelorc, of course. If we put a template up there linking to the RSC ontologies page for the moment, we can easily modify that to include a reference to the formal description once it's published.
 * 18:45:10] Physchim62, that's splendid.
 * 18:45:15] OK, is there anything else you need from me?
 * 18:45:29] It's very exciting that you think this stuff is useful.
 * 18:45:47] <NormWork> I'm .. a little shocked you should say that. Of course it's useful.
 * 18:46:08] As long as we have full permission to "run with this" ontology work, and we won't run into copyright issues, then I think we're happy
 * 18:46:12] <NormWork> It's another step towards automata that can actually help more than just facilitate.
 * 18:46:30] <Physchim62> I'm excited by it because it's useful. It partially solves a very real problem for Wikipedia (and probably many other people besides)
 * 18:46:49] <NormWork> Yes, especially "many other people"
 * 18:47:14] NormWork, there's a tremendous focus, at least in the cheminformatics meetings I go to, on statistical and quantitative methods rather than this kind of lexical work.
 * 18:48:08] OK, you absolutely have full permission to work with all this. I should head off now because the jetlag is really setting in.
 * 18:48:21] I'm really glad you're linking it up with the ChEBI work - I think it will create something bigger than the sum of its parts
 * 18:48:23] Very keen to hear how you get on.
 * 18:48:29] Thanks a lot for coming!
 * 18:48:37] <Physchim62> we had a link up with the people at the Gold Book site a while back: their lexical work is second to none, but we couldn't find a way to use much of it
 * 18:48:47] And some of us will hope to see you in Washington DC
 * 18:49:16] Thanks for having me. Cheerio! (ali_as: aha! will try that. walkerma: see you in DC)