Wikipedia:Reference desk/Archives/Computing/2016 February 21

= February 21 =

Name generators
Are there any reliable secondary sources out there that discuss name generators (such as http://FakeNameGenerator.com) in any kind of detail? Thanks. —67.14.236.50 (talk) 07:59, 21 February 2016 (UTC)


 * Markov text generators might get you started. -- Finlay McWalterᚠTalk 13:08, 21 February 2016 (UTC)


 * For people's names, which tend not to be very random, your best bet is probably just to pick names from the set of most popular first and second names for whatever region you're coming from - and for the age of the imaginary person. The US Social Security people have lists of the most popular first names over the last 100 years and has the most common surnames for every decade...and there are many other sources like it.  Markov chains are better for making up long strings of nonsense words that kinda sound real - or stringing together real words into nonsense sentences that kinda make sense.  But names that are made up of nonsense don't usually sound real - so picking off of a list is easier and more convincing.  I'd be very surprised if FakeNameGenerator.com did anything more than pair first and second names picked at random from a census or something. SteveBaker (talk) 21:50, 23 February 2016 (UTC)
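 * The list-picking approach described above can be sketched in a few lines of Python. This is only an illustration: the two short lists are hypothetical stand-ins for real name-frequency tables, and nothing here is known to reflect how FakeNameGenerator.com actually works.

```python
import random

# Hypothetical sample lists; a real generator would load much longer lists
# (for example, from published name-frequency tables).
FIRST_NAMES = ["James", "Mary", "Robert", "Patricia", "John", "Jennifer"]
SURNAMES = ["Smith", "Johnson", "Williams", "Brown", "Jones", "Garcia"]


def fake_name(rng=random):
    """Pair a randomly chosen first name with a randomly chosen surname."""
    return f"{rng.choice(FIRST_NAMES)} {rng.choice(SURNAMES)}"


if __name__ == "__main__":
    print(fake_name())
```

 Passing a seeded `random.Random` instance as `rng` makes the output repeatable, which is handy for generating test data.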

What naming scheme(s) does Wikipedia use? Is it structured, flat (unstructured), or, attribute based?
I'm learning distributed systems and one of the topics is naming. So I'm wondering what naming scheme(s) does Wikipedia use? Is it structured, flat (unstructured), or attribute based? Do other web services also use the same naming scheme? I can't seem to find information on structured, flat, or attribute-based naming schemes on Wikipedia. Could you please point me in the direction of finding the information? Thank you so much. — Preceding unsigned comment added by 116.87.54.73 (talk) 09:44, 21 February 2016 (UTC)


 * There is no "scheme". The relevant policy is at WP:TITLE. -- Roger (Dodger67) (talk) 11:00, 21 February 2016 (UTC)
 * I think he’s asking about technical behind-the-scenes details, not about how we choose the human-friendly article titles. I’m not sure if the question is even applicable to Wikipedia (I don’t know much about distributed systems), but I know our revisions are identified by a simple incrementing number, currently in the hundreds of millions. So, I guess that would make it flat. But for naming the articles (for public use; I don’t know if they have a different identifier behind the scenes), we have namespaces like “Wikipedia:”, “Help:”, “Template:”, and a “talk” namespace for each of them. We have e.g. an article named Wikipedia and a project page named Wikipedia. I’m not sure if that makes it structured. Hope that helps, though I don’t know if I gave you any new information. —67.14.236.50 (talk) 03:50, 22 February 2016 (UTC)
 * In the early days, Wikipedia articles had names like "Namibia/Economy" and "Namibia/History" and the software understood that those were child articles of "Namibia" (which is structured naming), but now those articles are called "Economy of Namibia" and "History of Namibia" and there is no well defined hierarchy (which is flat naming). If you include pages in other namespaces like Talk: and Wikipedia:, the hierarchical names are still used (on this page, for example), and the namespace can also be seen as the top level of the hierarchy.
 * I've never heard the terms "flat naming" and "structured naming" outside of university courses, but that may be because I don't work in distributed systems. Wikipedia should probably cover them regardless. Someone just needs to write the article. -- BenRG (talk) 03:45, 22 February 2016 (UTC)
 * Now the names to refer to article sections would be "Namibia#Economy" and "Namibia#History". StuRat (talk) 04:01, 22 February 2016 (UTC)
 * Wait a minute. What is being named in distributed systems? If the question pertains to the files on the project’s web servers or the way they’re networked, then none of us are qualified to answer it unless we’re familiar with Wikimedia’s backend, which has nothing to do with how articles are named. —67.14.236.50 (talk) 04:03, 22 February 2016 (UTC)


 * Thank you all for your replies. I'm still trying to understand how naming works in distributed systems, so forgive me if my questions are not appropriate. I guess my next question would be why Wikipedia uses colon in its URL, example would be https://en.wikipedia.org/wiki/Wikipedia:Reference_desk. I understand this is to divide into different categories but then why not just use slash (/) instead like https://en.wikipedia.org/wiki/Wikipedia/Reference_desk or https://en.wikipedia.org/wiki/Wikipedia/Talk_page_guidelines. Thanks.  — Preceding unsigned comment added by 203.217.187.24 (talk) 12:30, 23 February 2016 (UTC)
 * Colon is for Namespace. Many of the namespaces have special software features associated with them. Slash is for Subpages. PrimeHunter (talk) 12:43, 23 February 2016 (UTC)


 * Wikipedia uses MediaWiki, a server software program mostly written in PHP. Most of the content is stored in a relational database using SQL; some data is directly backed by files.  These architectures implement the mapping between user-visible named content - like article pages - and their backing-storage.  More specifically, titled content uses a Title object that can be used to retrieve a valid URL; it is this code that converts a wiki page title into the naming system that can be understood by a web server that hosts MediaWiki.  The web server may respond to a URL retrieval request by forwarding it to MediaWiki (and then traversing backing storage to get the content, or running some special feature implemented in MediaWiki)... or the web server may do "something else."  You can review all of the things on a wiki that have "titles" - not every such entity is a content page.  Consider, for example, special pages.
 * The key to understanding all of this is to recognize that there are multiple complex subsystems - a web server, a PHP program, a database, a file system... and more. Each subsystem has its own convention and/or standardized naming system.  The user-interface hides these details and presents you with one single name-convention; but under the proverbial hood, there are many different types of naming-systems.
 * Nimur (talk) 16:11, 23 February 2016 (UTC)
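 * To make the colon/slash distinction discussed above concrete, here is a minimal Python sketch that splits a page title into a namespace and a list of path components. It is not MediaWiki's Title code. The `NAMESPACES` set is a hypothetical subset chosen for illustration, and treating every slash as a hierarchy separator is a simplification.

```python
# Hypothetical subset of namespace prefixes, for illustration only.
NAMESPACES = {"Talk", "Wikipedia", "Help", "Template", "Special"}


def split_title(full_title):
    """Split a page title into (namespace, path components).

    The namespace is the text before the first colon, if it is a known
    prefix; otherwise the page is treated as being in the main namespace
    (represented here by an empty string).
    """
    prefix, sep, rest = full_title.partition(":")
    if sep and prefix in NAMESPACES:
        namespace, name = prefix, rest
    else:
        namespace, name = "", full_title
    # Every slash is treated as a separator here, which is a simplification.
    return namespace, name.split("/")


if __name__ == "__main__":
    print(split_title("Wikipedia:Reference desk/Archives/Computing"))
    print(split_title("Economy of Namibia"))
```

 In this sketch, a title like "Economy of Namibia" comes back as a single component in the main namespace (a flat name), while the title of this archive page comes back as a namespace plus several nested components (a structured name).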

.MACOSX directory in zip file
I got some zipped lectures, and there was a __MACOSX directory with files like ._.DS_Store, and ._ .mp4 inside. What is this?--Llaanngg (talk) 13:57, 21 February 2016 (UTC)
 * It means the zip file was made on a Macintosh computer. You can safely ignore them. See superuser.com and Resource fork and .DS_Store for more info. The Quixotic Potato (talk) 14:06, 21 February 2016 (UTC)
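 * If you would rather not have those entries on disk at all, one option is to skip them while extracting. A minimal Python sketch using the standard `zipfile` module follows. The function names are made up for this example, and the filter only covers the `__MACOSX` folder and `.DS_Store` files mentioned above.

```python
import zipfile


def is_mac_metadata(name):
    """Return True for entries macOS adds: the __MACOSX folder and .DS_Store files."""
    parts = name.split("/")
    return parts[0] == "__MACOSX" or parts[-1] == ".DS_Store"


def extract_without_mac_metadata(zip_path, dest_dir):
    """Extract every entry except macOS metadata; return the extracted names."""
    with zipfile.ZipFile(zip_path) as archive:
        wanted = [n for n in archive.namelist() if not is_mac_metadata(n)]
        archive.extractall(dest_dir, members=wanted)
    return wanted
```

 Calling `extract_without_mac_metadata` with the path to the archive and a destination directory extracts only the real content and returns the list of names it kept.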

Regexp's literal strings
Following regexp works for me:

rename 's/(Chapt).(.*)/$2.$1er/' *

but in emacs, when I try to extract a pattern, and include the found group into the target string, I have to do:

\(Chapt\).\(.*$\) -> \2.\1er

and not:

(Chapt).(.*$) -> \2.\1er

Why is it different? And why isn't \(\) in emacs reserved for the case where you actually want to reuse the found pattern? --Scicurious (talk) 23:10, 21 February 2016 (UTC)
 * Emacs uses an older syntax for regular expressions (a combination of the POSIX basic and extended syntaxes) than Perl (the modern standard). The biggest change (for the common subset of functionality) is that you must escape {} exactly when you must not for Perl.  No regular expression system uses \(\) to turn on (or off) capturing: you turn off capturing (where it's supported) with (?:…).  In Emacs, you escape those parentheses with backslashes (just like for a capturing group).  --Tardis (talk) 01:09, 22 February 2016 (UTC)


 * Calling Perl regexes "the modern standard" is a bit misleading. For one, they aren't standardized. Neither is PCRE, which means things that claim to support "PCREs" often have subtle incompatibilities with each other. It's a bit of a mess (insert xkcd comic about standards here). Yes, most of the time if you stick to the simpler PCRE features things will be portable, but don't assume it. Look at the documentation for whatever you're using. --71.119.131.184 (talk) 05:17, 22 February 2016 (UTC)
 * Fair enough; I should have said merely that Perl's choices about the basic features are currently preferred. For example, Python used to follow Emacs' convention, then switched to Perl's.  --Tardis (talk) 02:43, 24 February 2016 (UTC)


 * (EC)What you have encountered are some of the differences between the many flavors of regular expressions. As regular expression implementations have added features over the years, there have also been corresponding variations in the supported syntax, some of which are not compatible with one another. One early distinction was between the UNIX Basic Regular Expression (BRE) and Extended Regular Expression (ERE) syntax. (See Regular expression - POSIX basic and extended.) Emacs implements BREs (or some variant thereof), where only a few characters have special meaning when not escaped.
 * For more information and a comparison of syntax, I suggest you look at . Set the drop-down flavor selectors to "POSIX BRE" and "POSIX ERE" as a start, or select any other flavors that may be of interest. Syntax feature categories (Characters, basic features, quantifiers, ...) can be selected at the left. -- Tom N  talk/contrib 01:49, 22 February 2016 (UTC)


 * This page is also a good reference on regex syntax. Also note that "rename" is not a standardized Unix command, although a lot of people think it is. There are multiple programs called "rename" floating around out there, with incompatible syntax. See here for some more portable ways to rename files. --71.119.131.184 (talk) 05:17, 22 February 2016 (UTC)
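 * For comparison, here is the same substitution written with Python's `re` module, which follows the Perl-style convention discussed above: unescaped parentheses capture, and `(?:…)` groups without capturing. This is a sketch only. The function names and the sample file name `Chapt_01` are made up for illustration, and the code only computes the new name rather than renaming anything.

```python
import re

# Perl-style syntax: unescaped parentheses capture, \1 and \2 refer back.
PATTERN = re.compile(r"(Chapt).(.*)")


def new_name(old_name):
    """Apply the substitution s/(Chapt).(.*)/$2.$1er/ to one file name."""
    return PATTERN.sub(r"\2.\1er", old_name, count=1)


def strip_prefix(old_name):
    """Drop the 'Chapt' prefix and the character after it.

    (?:...) groups without capturing, so the only numbered group is (.*).
    """
    return re.sub(r"(?:Chapt).(.*)", r"\1", old_name, count=1)


if __name__ == "__main__":
    print(new_name("Chapt_01"))
    print(strip_prefix("Chapt_01"))
```

 Combining `new_name` with `os.rename` in a loop over a directory listing would give a rename step that does not depend on which "rename" program happens to be installed.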