Wikipedia:Reference desk/Archives/Computing/2024 May 23

= May 23 =

Organizing text and data
I'm working on a project that would go a lot more smoothly if I could get myself organized. What I've got is pieces of text that I need to be able to classify in various ways and apply attribute tags to (e.g. this text has the tags applied for "Religion" and "Finances" while this other one has only "Animals", etc.). I would normally use Excel for something of this scale, but the text pieces aren't really appropriate for stuffing into a cell (and some have particular formatting I'd like to preserve, which again doesn't work great with Excel). At this point, my plan is to indeed do it in Excel, but hyperlink the text pieces, which is clunky at best. Any other options that spring to mind? There will be hundreds of records, which is large enough to need organization, but not zillions and zillions, and it's a personal project, so I'm not looking to spend a lot. Any programs spring to mind as appropriate? Matt Deres (talk) 14:58, 23 May 2024 (UTC)


 * You could run a local copy of MediaWiki (with the operation of which you are already very familiar), using categories for the classification. It's an issue if you want to produce automated reports (e.g. "list all the text that is in category X"), but a small PHP script should be able to do that. -- Finlay McWalter··–·Talk 21:44, 23 May 2024 (UTC)
 * I would personally use MediaWiki. It is easy to install and use. But, you are describing a common use-case for NoSQL databases. 75.136.148.8 (talk) 11:19, 24 May 2024 (UTC)
 * Are these pieces of text each in separate files, or in one large file, or are they divided across several files, some of which contain several classifiable items? Almost all approaches require that you already have, or create, a unique identifier for each item you want to classify. Suppose you are done with the job of classifying. Presumably you want to make some use of the fruits of your labour. What kind of searches/queries/other uses do you envisage? The best approaches may depend on the answers. There is a risk of us trying to solve an XY problem. --Lambiam 11:55, 24 May 2024 (UTC)
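
The automated report mentioned above ("list all the text that is in category X") could be sketched roughly as follows. The thread suggests a small PHP script; this sketch uses Python instead and goes through the MediaWiki Action API's `categorymembers` list. The endpoint `http://localhost/w/api.php` and the function names are assumptions made for illustration, not part of any suggestion in the thread, and the path will depend on how the local wiki is installed.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed location of the local wiki's API endpoint; adjust to your install.
API_URL = "http://localhost/w/api.php"


def category_query_url(category, limit=500, api_url=API_URL):
    """Build an Action API URL that lists the pages in one category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmlimit": limit,
        "format": "json",
    }
    return f"{api_url}?{urlencode(params)}"


def titles_from_response(data):
    """Extract page titles from a decoded categorymembers response."""
    return [member["title"] for member in data["query"]["categorymembers"]]


def list_category(category):
    """Fetch and return the page titles in a category (needs a running wiki)."""
    with urlopen(category_query_url(category)) as response:
        return titles_from_response(json.load(response))
```

With a wiki running at the assumed address, `list_category("Religion")` would return the titles of the pages in that category. The sketch does not follow continuation (`cmcontinue`), which should be acceptable for a collection of a few hundred pages but would need adding if a single category grew past the request limit.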


 * Fair questions. The use case is for organizing folklore snippets in such a way that I can 1) keep them organized, 2) apply different kinds of tags to them (source location, source date, topics, etc.) for ease of grouping them in various ways, and 3) ideally find ways to connect related bits (e.g. this piece and that piece are likely variations on the same theme). Some of the snippets are literally on scraps of paper, others are from printed sources, still others are from online sources (documents, web sites), and some are audio files I'll need to transcribe. My earlier point about formatting being important is because, especially for the transcriptions of the audio stuff, I'd like to be able to show stresses, pauses, emphasized words or phrases, that kind of thing. Nothing crazy (italics and bolding, mostly), but Excel's ability to word process within a cell is extremely rudimentary; it's just not meant for that work. Matt Deres (talk) 17:13, 24 May 2024 (UTC)
 * It appears to me that the lion's share of the effort will be in labeling (with unique identifiers) and archiving the snippets in a way that allows you to retrieve them by their labels. If you scan or transcribe the items, you can store them as files with the labels as file names. The system for associating attribute tags with the item labels can then be purely (vanilla ASCII) text-based, whether an Excel worksheet or a database. TerminusDB, a free document-oriented database, should be eminently suitable for your purpose. While perhaps overkill for the immediate future, investing effort in becoming acquainted with its use may pay off in the end as your collection grows and your investigations become more sophisticated. --Lambiam 07:01, 25 May 2024 (UTC)
 * You might consider Obsidian (software), which supports tags. But see also the various links and lists under "see also" on that page, and the categories. Personal wiki software, note taking software, there's a lot available. Card Zero (talk) 07:20, 25 May 2024 (UTC)
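
The label-and-tag approach described above (each snippet stored as a file named by its unique label, with tags kept separately as plain text) can be illustrated with a minimal Python sketch. The labels such as `snip-001`, the `TAGS` table, and the function names are all hypothetical; in practice the table might be loaded from a worksheet export or a database rather than written inline.

```python
from collections import defaultdict

# Hypothetical labels (file names without extension) mapped to their tags.
TAGS = {
    "snip-001": {"Religion", "Finances"},
    "snip-002": {"Animals"},
    "snip-003": {"Religion", "Animals"},
}


def build_index(tags_by_label):
    """Invert the label -> tags table into a tag -> labels index."""
    index = defaultdict(set)
    for label, tags in tags_by_label.items():
        for tag in tags:
            index[tag].add(label)
    return index


def find(index, *tags):
    """Return the sorted labels that carry every one of the given tags."""
    if not tags:
        return []
    matches = [index.get(tag, set()) for tag in tags]
    return sorted(set.intersection(*matches))


def related(tags_by_label, label):
    """Return other labels sharing at least one tag, most shared tags first."""
    mine = tags_by_label[label]
    scored = [
        (len(mine & tags), other)
        for other, tags in tags_by_label.items()
        if other != label
    ]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [other for shared, other in ranked if shared > 0]
```

Here `find(build_index(TAGS), "Religion", "Animals")` selects the snippets carrying both tags, and `related(TAGS, "snip-001")` gives a crude first pass at the "connect related bits" goal by ranking other snippets on shared tags. Shared tags are only a rough proxy for "variations on the same theme"; explicit links between labels would be a natural addition.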

Org mode has sufficed for me, but maybe you need something fancier for more complicated info. The general approach is Zettelkasten and there is a lot of different software for it, none of which I've used. 2601:644:8501:AAF0:0:0:0:1ECE (talk) 05:08, 1 June 2024 (UTC)