Wikipedia:WikiProject Chemistry/IRC discussions/9 September 2008

--- Log opened Tue Sep 09 12:03:04 EDT 2008

12:03 &lt;walkerma&gt; JaGa, are you around?

12:03 &lt;JaGa_&gt; yep

12:03 &lt;+dmacks&gt; User:Casforty is doing a lot of CAS# work. Not sure who it is.

12:03 &lt;+dmacks&gt; sorry, chemspiderID work.

12:05 &lt;walkerma&gt; Great. I've never met Casforty, but from what he's been doing (WikiGnome work), I'm sure he'd be happy to help with this too.

12:06 &lt;walkerma&gt; Well, we only have one major thing to resolve - how do we coordinate the upload of validated data from the SDF?

12:06 &lt;walkerma&gt; We have this page, where we have to add in the VersionIDs

12:06 &lt;walkerma&gt; http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Chemicals/Index

12:07 &lt;walkerma&gt; And on my birthday (May 12th), ChemSpiderMan sent us an SDF file with 500 validated entries

12:07 &lt;walkerma&gt; including validated CAS nos. from CAS

12:08 &lt;walkerma&gt; Could we each take 100, and do these?

12:08 &lt;+dmacks&gt; I was talking to Dirk a week or so ago, found we didn't have a clean way to handle pages where "some" of the data values are verified.

12:08 &lt;JaGa_&gt; I could help, no prob - I just need to know exactly what is needed

12:09 &lt;JaGa_&gt; for instance, where can I find the 500 entries?

12:10 &lt;+dmacks&gt; (example: if CAS is verified but bp is not, we don't want to over-state our confidence in the bp)

12:10 &lt;walkerma&gt; JaGa, they aren't on-wiki, in fact the CAS agreement doesn't allow us to post complete collections of CAS #s, we are supposed to put them up as separate entries. I can email you the SDF file (if you can read SDF)

12:12 &lt;walkerma&gt; dmacks: If only part of the data are validated, then the entry can't be called validated. We can only handle the fields that are in the SDF for now - structure, IUPAC name, InChI, InChIKey, ChemSpiderID and CAS#

12:12 &lt;JaGa_&gt; well, you could send it to me and I could figure out how to open it if I can't already - I'm not familiar with that mime type

12:13 &lt;+dmacks&gt; Two possible solutions. The first is to rename the fields that the infobox treats like normal but that aren't tracked by chemobot.

12:14 &lt;walkerma&gt; I thought CheMoBot was only going to track the validated fields - wasn't that the point?

12:14 &lt;walkerma&gt; I know we wanted the option of adding BP, MP, etc later, though

12:15 &lt;+dmacks&gt; Right. But what happens when a new field *is* added to the tracked set? Have to make sure it doesn't suddenly appear that all previously-verified pages (though only verified for cas/etc) are also verified for the bp/etc.

12:16 &lt;+dmacks&gt; chembox_new_fields=C|H|N|O|P|Cl|Br|I|B|IUPACName|CASNo|EINECS|PubChem|SMILES|InChI|RTECS|MeSHName|ChEBI|KEGG|Formula|MolarMass|Density|MeltingPt|MeltingPtC|MeltingPtK|MeltingPtF|BoilingPt|BoilingPtC|BoilingPtK|BoilingPtF

12:16 &lt;+dmacks&gt; That's the current /Settings list.
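
A pipe-delimited settings line like the one quoted above could be read into a set of tracked field names roughly as follows. This is only a sketch: the function name `parse_tracked_fields` is hypothetical, the field list is shortened for the example, and how CheMoBot actually reads its /Settings page is not shown in this log.

```python
def parse_tracked_fields(setting_line: str) -> set[str]:
    """Split a 'key=A|B|C' settings line into a set of field names."""
    _key, _, value = setting_line.partition("=")
    return {name.strip() for name in value.split("|") if name.strip()}


# Shortened version of the list quoted in the log (illustration only).
settings = "chembox_new_fields=C|H|N|O|IUPACName|CASNo|MeltingPt|BoilingPt"
tracked = parse_tracked_fields(settings)

print("CASNo" in tracked)
print("ChemSpiderID" in tracked)
```

With a set like this, "is this field tracked?" becomes a simple membership test, which is all the later comparison step needs.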

12:17 &lt;walkerma&gt; Ah, I see, it's an all or nothing thing, right?

12:17 &lt;+dmacks&gt; Yeah, looks like you can't have "some" boxes be checked for various fields.

12:18 &lt;walkerma&gt; Well, I think once we have a validated set together, we'll have to look at how we add MP and BP data

12:19 &lt;+dmacks&gt; Second solution is to put a template or other tag on the data values (either to signify "validated" or else not-validated) and that way the verified revid includes information about exactly which items are validated in that revid.

12:20 &lt;walkerma&gt; dmacks: I like that second solution a lot! Is that possible?

12:21 &lt;+dmacks&gt; Yeah. Or even an inline &lt;!-- --&gt; comment in the value field.

12:22 &lt;walkerma&gt; What I'd really like is something that shows the field in red if unchecked, yellow if tentatively checked, and green if validated. Is that possible, or am I just dreaming here?

12:22 &lt;+dmacks&gt; The change-tracking doesn't do any analysis of the contents of the infobox field, just compares "is the string the same as it was?", so comments, templates and other wikimarkup all count as "changes"

12:23 &lt;walkerma&gt; But we can always update the revid, right?

12:23 &lt;+dmacks&gt; Ooh, *that's* interesting!

12:24 &lt;+dmacks&gt; (right...If a certain piece of data is checked, just add "&lt;!-- checked --&gt;" to the value and update the revid, now the data and "the fact that it's checked" is tracked)
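
The plain string comparison described above, and the effect of a `<!-- checked -->` marker on it, might be sketched like this. The function `changed_fields` and the dictionaries are hypothetical stand-ins, not CheMoBot's actual code; the only behaviour taken from the discussion is that a field counts as changed whenever its raw string differs from the verified revision.

```python
CHECKED_MARK = "<!-- checked -->"  # marker text quoted in the discussion


def changed_fields(verified: dict[str, str], current: dict[str, str],
                   tracked: set[str]) -> list[str]:
    """Return tracked fields whose raw string differs from the verified revision."""
    return sorted(
        name for name in tracked
        if verified.get(name, "") != current.get(name, "")
    )


# Illustrative values for a verified revision (hypothetical data layout).
verified = {"CASNo": "71-43-2 " + CHECKED_MARK, "BoilingPt": "80.1 °C"}
# Later revision: the marker was added to a value nobody checked.
current = {"CASNo": "71-43-2 " + CHECKED_MARK,
           "BoilingPt": "80.1 °C " + CHECKED_MARK}

print(changed_fields(verified, current, {"CASNo", "BoilingPt"}))
```

Because the marker is part of the compared string, both the value and the fact that it was checked are pinned to the verified revid, and adding the marker elsewhere shows up as an ordinary change.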

12:25 &lt;+dmacks&gt; Putting the value in color (or an asterisk or other symbol) is easy; altering the box background color or the color of the field *name* in the page display is harder.

12:25 &lt;+dmacks&gt; (Former is just about the infobox data, latter has to be implemented in the infobox templates themselves)

12:26 &lt;walkerma&gt; Having the value in color would be perfect, IMHO
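
Colouring the value itself, as opposed to the box or the field name, only needs markup inside the infobox data. A minimal sketch, assuming three status labels and an inline-styled `<span>`; the names `STATUS_COLORS` and `colorize` and the status labels are hypothetical, and nothing here was agreed in the discussion.

```python
# Hypothetical mapping for the red / yellow / green idea raised above.
STATUS_COLORS = {
    "unchecked": "red",
    "tentative": "yellow",
    "validated": "green",
}


def colorize(value: str, status: str) -> str:
    """Wrap an infobox value in a coloured span for the given status."""
    color = STATUS_COLORS[status]  # raises KeyError for an unknown status
    return f'<span style="color:{color}">{value}</span>'


print(colorize("71-43-2", "validated"))
```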

12:26 &lt;walkerma&gt; And if some random vandal tries to be clever, by adding &lt;!-- checked --&gt; into the field, the bot will record it as an unauthorized change and won't update the revid!

12:26 &lt;+dmacks&gt; exactly

12:27 &lt;walkerma&gt; JaGa_ : http://en.wikipedia.org/wiki/Chemical_table_file#SDF

12:27 &lt;JaGa_&gt; thanks - I'll check it out

12:28 &lt;JaGa_&gt; oh yeah, I was already looking at that

12:28 &lt;walkerma&gt; At WP, we are allowed use of one SDF reader free of charge, I can't recall what it's called, though. I need to get it myself, as my ACD one stopped working

12:29 * dmacks uses Itub's perl modules, but can't access the machine where I have them installed right now.

12:29 &lt;walkerma&gt; OK, should we just set up a page on WP, and each sign up to do 100 articles from the list? Say, by the end of the month? ChemSpiderMan said he would help with that, too.

12:30 &lt;JaGa_&gt; the best would be if someone could put the first entry or two in the Index page so I could make sure I'm doing it right

12:31 &lt;walkerma&gt; dmacks: can you generate a wiki-formatted list of the 500 from that SDF file?
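
Generating a wiki-formatted list from an SDF could look roughly like the sketch below. It relies on two features of the SDF format: records end with a `$$$$` line, and data items start with a `> <FieldName>` header followed by value lines and a blank line. The field name `Name`, the sample text and the function names are assumptions; the real file's field names are not given in this log. Only article names are emitted, since the CAS numbers are not supposed to be posted as a complete collection.

```python
import re

DATA_HEADER = re.compile(r"^>.*?<([^>]+)>")  # e.g. '> <CAS>'


def parse_sdf_data_items(sdf_text: str) -> list[dict[str, str]]:
    """Collect the data items of each SDF record, ignoring the molblock."""
    records, fields, name = [], {}, None
    for line in sdf_text.splitlines():
        header = DATA_HEADER.match(line)
        if line.strip() == "$$$$":        # record delimiter
            records.append(fields)
            fields, name = {}, None
        elif header:
            name = header.group(1)
            fields[name] = ""
        elif name is not None:
            if line.strip():
                fields[name] = (fields[name] + "\n" + line).strip()
            else:
                name = None               # a blank line ends the data item
    return records


def wiki_list(records: list[dict[str, str]], name_field: str = "Name") -> str:
    """Render one numbered wiki list item per record that has a name."""
    return "\n".join(f"# [[{r[name_field]}]]" for r in records if r.get(name_field))


# Tiny made-up sample; the molblocks are reduced to placeholders.
sample = """\
benzene
M  END
> <Name>
Benzene

> <CAS>
71-43-2

$$$$
toluene
M  END
> <Name>
Toluene

$$$$
"""

print(wiki_list(parse_sdf_data_items(sample)))
```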

12:31 &lt;+dmacks&gt; Is the CAS of Benzene bold for you?

12:32 &lt;walkerma&gt; Yes it is!

12:32 &lt;walkerma&gt; Clever!

12:33 &lt;walkerma&gt; What do you think of the red-yellow-green system?

12:34 &lt;+dmacks&gt; Sounds reasonable to me.

12:35 &lt;+dmacks&gt; Have to be careful with mol-formula, since it's sometimes colored already.

12:35 &lt;walkerma&gt; Oh, right. Well, we can discuss that aspect. I think it would be pretty clear without people having to read instructions, though.

12:35 &lt;+dmacks&gt; Also, there have been complaints other times we've used color or other interface-tricks to "mean something"

12:36 &lt;walkerma&gt; How else can we concisely show this?

12:36 &lt;walkerma&gt; I'm happy for the green one to be in bold, too!

12:36 &lt;+dmacks&gt; Only other way would be with an asterisk or some other symbol, but that gets cluttered in a hurry.

12:37 &lt;walkerma&gt; And machines can't parse out data properly..

12:37 &lt;walkerma&gt; our

12:37 &lt;+dmacks&gt; Right.

12:37 &lt;+dmacks&gt; Unless it's a standard wiki footnote.

12:38 &lt;walkerma&gt; Explain?

12:38 &lt;+dmacks&gt; Some data values are specifically &lt;ref&gt;erenced already, so a parser must be able to handle that.

12:40 &lt;+dmacks&gt; So if our "this data is validated" is indicated by a &lt;ref&gt; tag (or at least something that parses the same way), we're safe.
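
A parser that already copes with footnotes would strip them before reading the value, so a validation marker expressed as a `<ref>` (or a comment) costs it nothing extra. A rough sketch with regular expressions; `bare_value` is a hypothetical helper, and real wikitext can nest templates in ways this does not handle.

```python
import re

COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
REF_SELF = re.compile(r"<ref\b[^>]*/>", re.IGNORECASE)            # <ref name="x" />
REF_PAIR = re.compile(r"<ref\b[^>]*>.*?</ref>", re.DOTALL | re.IGNORECASE)


def bare_value(raw: str) -> str:
    """Return an infobox value with comments and <ref> footnotes removed."""
    text = COMMENT.sub("", raw)
    text = REF_SELF.sub("", text)   # self-closing refs first
    text = REF_PAIR.sub("", text)   # then paired <ref>...</ref>
    return text.strip()


print(bare_value('71-43-2<ref name="cs">ChemSpider</ref> <!-- checked -->'))
print(bare_value('80.1 °C<ref name="crc" />'))
```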

12:40 &lt;walkerma&gt; OK, I'll have to go and teach lab soon. Is it possible for you to upload to the wiki a list of the validated 500 articles, as a list for us to work through?

12:41 &lt;walkerma&gt; We could perhaps put it here (page to be created):

12:41 &lt;+dmacks&gt; Actually in terms of the lay public and wiki standards, all we're really doing is finding a WP:RS that supports our claim of what the CAS, MW, etc are. So we *could* just add a real &lt;ref&gt; (again included in the infobox value, so it would be tracked by chemobot)

12:41 &lt;walkerma&gt; http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Chemicals/Chembox_validation

12:41 &lt;+dmacks&gt; (a ref that links to a page describing this project)

12:41 &lt;walkerma&gt; WikiProject Chemicals/Chembox validation would be the new page

12:41 &lt;walkerma&gt; Yes, that ref scheme sounds like an excellent idea.

12:42 &lt;+dmacks&gt; I thought I had parsed out the chemicals list from the rdf a while ago. Will check tonight.

12:42 &lt;walkerma&gt; The link could even show who signed off on having checked the data

12:43 &lt;walkerma&gt; That is one of my goals in the validation project - a "paper trail" so you can see for yourself that it was checked, by whom, and when

12:43 &lt;+dmacks&gt; yeah

12:43 &lt;walkerma&gt; dmacks - you parsed the big list of 5000. But we only have the CAS numbers fully integrated into 500 - the SDF from May 12th

12:43 &lt;+dmacks&gt; One caveat (chemobot limitation, Dirk knows) is that only the first line of a multiline value is checked, so there is a blind spot for vandalism.

12:44 &lt;walkerma&gt; Ha! So now I can have CASNo = [572-34-1]&lt;br/&gt;Bush is gay LOL?

12:44 &lt;+dmacks&gt; Oooh, right. I don't have an rdf file from later than march so I don't seem to have the 500cas list.

12:45 &lt;walkerma&gt; I'll resend it to you now

12:45 &lt;+dmacks&gt; "multiline" in terms of wiki page source, not html or other layout.

12:45 &lt;walkerma&gt; JaGa_: Can I have your email address so I can send it to you?

12:45 &lt;+dmacks&gt; http://en.wikipedia.org/w/index.php?title=Benzene&diff=234611708&oldid=234584325

12:45 &lt;+dmacks&gt; ^ that change was not detected.
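
The blind spot can be illustrated by comparing a first-line-only check with a full-string check. This is a sketch assuming the limitation works exactly as described above (only the first source line of a value is compared); the function names and the sample value are hypothetical.

```python
def first_line(value: str) -> str:
    """Return the first source line of a possibly multiline value."""
    return value.split("\n", 1)[0]


def changed_first_line_only(old: str, new: str) -> bool:
    """Mimic a check that only looks at the first source line."""
    return first_line(old) != first_line(new)


def changed_full(old: str, new: str) -> bool:
    """Compare the whole value, including any continuation lines."""
    return old != new


old = "[71-43-2]"
new = "[71-43-2]\nextra text added on a second source line"

print(changed_first_line_only(old, new))  # False: the edit slips through
print(changed_full(old, new))             # True: a full comparison catches it
```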

12:45 &lt;JaGa_&gt; oh, sure, I thought you'd send it through wiki

--- Log closed Tue Sep 09 12:45:58 EDT 2008