Wikipedia:Reference desk/Archives/Computing/2010 June 21

= June 21 =

How can I get the Unicode range of a language?
Dear Experts,

I want to know the Unicode ranges of the following languages.

For example, the Japanese scripts Hiragana and Katakana start at 0x3040 and end at 0x30FF. As with Japanese, what are the starting and ending points for the languages below?

Russian, German, Italian, French, Spanish, Danish, Swedish, Norwegian, Finnish.

Thanks in Advance,

Santhosh4g (talk) 05:36, 21 June 2010 (UTC)
 * Try here first. If that doesn't solve it let us know. The range is in the first page of the language's PDF. http://www.unicode.org/charts/index.html --mboverload @ 07:39, 21 June 2010 (UTC)

I already checked this page, but I don't know the exact script names for all languages. For example, the Japanese scripts are Hiragana and Katakana, and Russian characters appear in the Cyrillic PDFs.

How can I map Russian to Cyrillic? Also, what are the script names of the other languages?

Hope my problem is clear. Santhosh4g (talk) 10:53, 21 June 2010 (UTC)


 * Strictly speaking, Unicode does not try to encode specific languages. It encodes character sets - that is, ways of graphically representing text.  These writing schemes are independent of language - that's an idea higher up on the "conceptual ladder" than character encoding.  Because Unicode covers all possible kinds of writing systems, keep in mind that this requires a somewhat complicated architecture.  Every single Unicode code point has a unique numeric value (usually this "code point" maps to one single character, but there are exceptions).  Read about terminology for Unicode.  Groups of code points are arranged into Unicode planes, and the Basic Multilingual Plane can be subcategorized into separate script blocks - see this table and the others below it in our article.  However, there can be significant overlap between languages (e.g. English, French, and Czech all share many characters, but each also uses characters not found in the other two).  Similarly, Unicode understands and categorizes CJK characters - glyphs that are common to Chinese, Japanese, and Korean.  Because of the unique way that these written languages represent their ideas, Unicode attempts to share these character encodings to the maximum extent possible.  So, for basic Japanese writing, you will probably find that your code points span these categories on the Basic Multilingual Plane:


 * Supplemental Punctuation (2E00–2E7F)
 * CJK Radicals Supplement (2E80–2EFF)
 * Kangxi Radicals (2F00–2FDF)
 * Ideographic Description Characters (2FF0–2FFF)
 * CJK Symbols and Punctuation (3000–303F)
 * Hiragana (3040–309F)
 * Katakana (30A0–30FF)
 * Bopomofo (3100–312F)
 * CJK Strokes (31C0–31EF)
 * Katakana Phonetic Extensions (31F0–31FF)
 * Enclosed CJK Letters and Months (3200–32FF)
 * CJK Compatibility (3300–33FF)
 * CJK Unified Ideographs Extension A (3400–4DBF)
 * You may find that Russian can be written entirely in Cyrillic and its supplements:


 * Cyrillic (0400–04FF)
 * Cyrillic Supplement (0500–052F)
 * However, it depends on exactly what you are writing. Unicode allows you to mix and match elements from these different character sets because they are all encoded in a uniform way - so that your text file is represented and rendered universally, independent of the language used to interpret it.  That is the purpose of Unicode - to separate character representation from linguistic interpretation.  In a former era, if you were to open a perfectly formatted Cyrillic text file that you mistakenly thought was English, your computer would decode it in the wrong way and spew gibberish to the screen.   You wouldn't know what the source language was, except that "it wasn't ASCII".  Now, if your computer opens a Unicode file with Russian language text in it, you might not understand the language, but at least it prints properly.  Nimur (talk) 14:20, 21 June 2010 (UTC)
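One way to "map Russian to Cyrillic" in code is to look up which Unicode block a character's code point falls in. A minimal sketch in Python; the block table here is a hand-picked subset of the official block list, not the whole thing:

```python
# Map a character to its Unicode block by code point.
# These (start, end, name) ranges are a small subset of the full
# block list published in the Unicode Character Database.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x0500, 0x052F, "Cyrillic Supplement"),
    (0x3040, 0x309F, "Hiragana"),
    (0x30A0, 0x30FF, "Katakana"),
]

def block_of(ch):
    """Return the block name for a single character, or None if unknown."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return None
```

For example, `block_of('Я')` returns "Cyrillic", since Я is U+042F, inside 0x0400-0x04FF.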

Still my problem is not solved. I'm trying to display text in different languages with OpenGL. In order to support the whole basic Unicode character set, I would have to create 65536 bitmaps. That would consume huge amounts of memory and is not practical at all. Now my idea is to create only the required bitmaps for the current system language.

For example, if the current system language is Russian, I'll create bitmaps from 0x400 to 0x4FF.

I don't know the maximum Unicode ranges needed to support the following languages: German, Italian, French, Spanish, Danish, Swedish, Norwegian, Finnish.

In order to support a single language at a time, I have to know the Unicode character ranges of these languages.

Thanks in advance, —Preceding unsigned comment added by Santhosh4g (talk • contribs) 13:48, 22 June 2010 (UTC)
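For the Latin-script languages listed above there is no single per-language range: all of them draw on Basic Latin (0000-007F) plus accented letters from Latin-1 Supplement (0080-00FF), and French additionally needs œ/Œ/Ÿ from Latin Extended-A (0100-017F). An alternative to pre-rendering whole blocks is to generate glyph bitmaps only for the code points a given text actually uses; a hypothetical sketch in Python:

```python
def codepoints_needed(text):
    """Return the sorted, de-duplicated code points used by some text,
    so glyph bitmaps can be generated on demand rather than per block."""
    return sorted({ord(ch) for ch in text})
```

For French "déjà vu" this yields the Basic Latin letters plus 0xE0 (à) and 0xE9 (é) from Latin-1 Supplement, which is far fewer bitmaps than a whole block.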

Binary - Difference Between Displaying Text And Numbers
How do computers tell the difference between text and numbers in binary? I know very little about binary. From what I understand, "97" is the same as "a". Is an extra piece of binary added, or what? 68.7.192.98 (talk) 06:42, 21 June 2010 (UTC)legoman
 * You mean like ASCII or Unicode? Don't fully understand the question. --mboverload @ 07:43, 21 June 2010 (UTC)
 * To a computer everything is numbers. Say you have a byte in binary "11001011" - it might mean 203 in decimal, or it might mean -53 in two's complement, or it might mean the letter "Ë" in ISO/IEC 8859-1 encoding. You tell the computer what you want to do with it - whether you want it as an unsigned number, a signed number, or a character, but to the computer it's all the same number. --antilivedT 09:24, 21 June 2010 (UTC)
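This point is easy to demonstrate; a short Python sketch reinterpreting the same byte three ways:

```python
import struct

raw = bytes([0b11001011])           # one byte: 1100 1011

unsigned = raw[0]                   # plain unsigned value
signed, = struct.unpack('b', raw)   # same bits read as two's complement
char = raw.decode('latin-1')        # same bits read as ISO/IEC 8859-1

print(unsigned, signed, char)       # 203 -53 Ë
```

The bits never change; only the interpretation the program asks for does.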


 * Antilived is correct; allow me to expand on the explanation a bit for those who may not follow it.


 * Modern computers store everything as collections of 1s and 0s. A specific collection of 1s and 0s may represent just about anything: numbers, text, pictures, videos, etc.  In order to interpret a given collection, the computer must assume how it is to be interpreted.


 * Your "97", for instance, is represented as 01100001 in an eight-bit 'byte', a common organization on modern computers, and does represent the number 'a' in ASCII, a commonly used set of codes for letters. If the 'a' were to be represented in EBCDIC, it would have a different 8-bit code, and in Unicode it actually has 16 bits.  My point is that, as said above, everything stored by the computer is done in 1s and 0s, and representing letters is a matter of deciding what arrangement of 1s and 0s will represent what letters.  Usually this is, in turn, a matter of using a standard set of such arrangements, mostly so that other programs and other computers will also be able to interpret your output.


 * As to how the computer tells, it is normally told how to interpret any given set of data. For instance, on a Windows system, computer programs rely heavily on file extensions for information on how to interpret data: .TXT is thought of as a file of letter codes in ASCII, .MPG as a file of codes for video, .XLS as an Excel file, etc.


 * If a computer were just given the binary equivalent of 97, then it could be a number of things, and there is no way for the computer (or any engine or person) to say for sure what was meant by it. rc (talk) 11:04, 21 June 2010 (UTC)


 * ... or "01100001" could represent part of a floating-point number or the colour of a pixel in an image or the address of some other bit of data or a processor instruction - the possibilities are endless. Basically, the computer does not know or care what "01100001" represents - it leaves that up to the program that it is running. To do this, the program has to have some way of remembering what type of data is stored in each memory address that it is using - so it uses more memory addresses to store that information too. Different programming languages handle this in different ways. Some langauges insist that the programmer identifies the data type of each program variable before they use it in the program. Other languages work out the data type of a program variable from the way that the programmer uses it. Some languages only allow the programmer to use data types from a fixed set of built-in types; other languages allow the programmer to define their own data types. Some simple languages even side-step the whole issue by not bothering to remember the data type of each variable at all - the danger then is that the programmer can easily produce nonsense by, say, multiplying the letter "a" by the colour blue. Gandalf61 (talk) 11:20, 21 June 2010 (UTC)


 * Most modern "normal" computers have specialized circuitry (digital function units inside the CPU) to deal with the following types of data: whole numbers ("int), including negative numbers; numbers that represent fractions ("float"); numbers that represent locations inside the computer (pointers); and (sometimes) special circuitry for "characters" - numbers that directly map to the most common representation of letters and printable characters. (There are variations, especially with regard to the accuracy and range of these values, depending on the kind of computer you have).  These special interpretations of binary are called "primitive data types" and have special meaning in most programming languages.  They are the most efficient kinds of data to work with, because the concept represented in the programming language directly maps to a single, specific operation with one particular electronic circuit inside the CPU.  Every other type of data, like a picture, video, or music sample, must be represented as some combination of instructions and streams of primitive data.  For example, a character string is often constructed using several characters and a special numeric code, NULL (but this depends on your programming language and how you choose to represent a string).  Similarly, you can construct a PCM audio buffer out of a stream of integers, with a few special numeric tags to describe certain formatting details.  Other kinds of data, like variable-sized compressed data, can be carefully treated "as if" it were a regular numeric value, as long as the programmer is careful with the operations applied to that data.  Some computers have specialized hardware to deal with such "compound" data types - vector computers, digital signal processors, and to some extent, even a standard modern PC will have extensions to its circuitry that know about extremely specific combinations of numbers that represent intermediate steps in common algorithms like video processing.  
(You can say the same about hardware representations of triangles, lines, and image textures on a GPU).  In each of these specialized cases, the circuitry needs to be carefully controlled - programmed - so that the correct data is loaded into the correct location.  Otherwise, unusual and unspecified behavior can occur.  In modern computers, the program is also represented as a set of numbers - you can think of these as a bunch of "settings" for switches that direct "the next number" to the correct circuit inside the CPU.  Because modern computers are rather complicated, there is an enormous number of possible combinations of those switch settings, though only around a thousand are actually used.  For this reason, programming languages were designed to simplify the task of converting conceptual operations into specific machine instructions.  Nimur (talk) 15:28, 21 June 2010 (UTC)


 * Ok, I am now convinced that all the answers above (including mine!) tried to do more than answer the OP's question. I'm going to try to answer that succinctly here.


 * Modern computers store and process binary; whether a given piece of binary data is a number, letter, or something else cannot necessarily be determined by looking at that piece of data. Your "97", for instance, could be 'a', or 97, or something else.  In order for the computer (or computer program) to tell the difference, it must have some other information.


 * A computer program can sometimes look at a whole bunch of data and make a reasonable guess as to its use; a long string of bytes encoding letters in a specific encoding, for instance, tends to have values in specific ranges and not in others. But, in fact, most programs that determine a datum's type automatically are using something else -- a file extension, a code embedded near the front of a data file, etc.


 * I hope this answers your original question, and I *know* it doesn't branch out into other areas nearly as much.


 * 204.152.3.37 (talk) 20:59, 21 June 2010 (UTC)
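That "code embedded near the front of a data file" is usually called a magic number. A sketch of sniffing a few common ones in Python; the signatures below are real, but the detection table is deliberately tiny:

```python
# File-type guessing from "magic numbers" at the start of the data.
SIGNATURES = [
    (b'\x89PNG\r\n\x1a\n', 'png'),   # PNG file signature
    (b'\xff\xd8\xff',      'jpeg'),  # JPEG start-of-image marker
    (b'%PDF-',             'pdf'),   # PDF header
]

def sniff(data):
    """Return a type name if the data starts with a known signature."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return None
```

This is essentially what Unix `file` does, with a much larger table.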

Subnet calculator
I am looking for an online netmask calculator that translates a generic IP-range to a minimal list of subnets. For example:
 * Input
 * Min to Max address: 10.112.0.0 - 10.123.255.255


 * Output
 * Subnet_1: 10.112.0.0/13
 * Subnet_2: 10.120.0.0/14


 * Proof
 * Subnet_1 = 10.112.0.0 - 10.119.255.255
 * Subnet_2 = 10.120.0.0 - 10.123.255.255

There are tons of online calculators (see example in proof) that do the opposite. Any idea anyone? DVdm (talk) 09:56, 21 June 2010 (UTC)
 * -- zzuuzz (talk) 10:47, 21 June 2010 (UTC)


 * Bingo. A thousand and twenty-four thanks! - DVdm (talk) 11:10, 21 June 2010 (UTC)
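For reference, the same calculation can also be done offline in a few lines; a sketch using Python's standard ipaddress module:

```python
import ipaddress

def range_to_subnets(first, last):
    """Collapse an arbitrary inclusive IP range into a minimal CIDR list."""
    return list(ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)))

subnets = range_to_subnets('10.112.0.0', '10.123.255.255')
print(subnets)   # [IPv4Network('10.112.0.0/13'), IPv4Network('10.120.0.0/14')]
```

This reproduces the example above: a /13 covering 10.112-10.119 and a /14 covering 10.120-10.123.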

Files to Images
Is it possible to encode a non-image file into an image? Say for example a small mp3 converted to png. Obviously the image would look like gibberish machine code or randomness. Is this possible? What programs can do this? 82.43.90.93 (talk) 11:29, 21 June 2010 (UTC)


 * You can rename the file from test.mp3 to test.png. When you open it with a viewer, it might produce a black screen, or gibberish, or a program or system crash, all depending on your viewer software and operating system. DVdm (talk) 11:46, 21 June 2010 (UTC)


 * Yes. Assume you take the mp3 file byte by byte.  Each byte has a value from 0 to 255.  If you take those three at a time, you have red, green, and blue values for a pixel.  You can write a program to color in the pixels of an image using those values.  Now, the problem is getting the mp3 back.  You cannot use a lossy format for the image because you will lose important data from the mp3.  That is why encoding secret messages inside images isn't such a trivial task.  You cannot have the lossy format messing with the message. -- k a i n a w &trade; 12:20, 21 June 2010 (UTC)
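The three-bytes-per-pixel scheme described above can be sketched in a few lines of Python (writing the result out as an actual PNG is left to an imaging library):

```python
def bytes_to_pixels(data):
    """Group raw bytes into (r, g, b) pixel tuples, zero-padding the tail
    so the last pixel is complete even when len(data) isn't a multiple of 3."""
    padded = data + b'\x00' * (-len(data) % 3)
    return [tuple(padded[i:i+3]) for i in range(0, len(padded), 3)]
```

Reversing the process is just concatenating the pixel bytes again (and trimming the padding), which is why the image format must be lossless.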


 * Container format might help. Both PNG and MP3 use a file format that contains a fixed-format header followed by data.  It sounds like you want to take the data payload out of an MP3 and put it in place of the data payload of the PNG.  (If you just try to use an MP3 as a PNG, you will be missing the necessary metadata, so whatever program you use to interpret or display your PNG will probably simply refuse to do anything.)  You might benefit from using a library that already knows about these file formats so that you end up with a "properly formed" PNG (even if its data payload is gibberish).  Also note that because the PNG and MP3 formats are both compressed, it is not safe to say that "any combination of bytes" will represent a valid PNG (i.e. it might not even be possible to display gibberish - certain combinations of bytes will be "improperly formed" and cannot be mapped back into an image of any kind).  Compare this to uncompressed formats, like WAV files or BMP files.  Any combination of bytes you stuff into the data payload of these uncompressed formats will display as something (though depending on your BMP reader, it may hiccup or die if your payload size doesn't match the metadata's expected size).  Nimur (talk) 15:41, 21 June 2010 (UTC)


 * As an aside - if you're just doing this for the sake of hobbyist exploration, you might find GNU Octave a helpful tool. In exactly three lines of code, you will be able to read an arbitrary file of any type, reshape it to dimensions of your preference ("height x width"), and display it as an image buffer.  Consider the Wikipedia logo: (you can put a URL in the fopen command, or save it to disk)
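The same read-and-reshape trick can be sketched outside Octave too; a rough Python parallel (the function name is hypothetical, and 119 x 131 matches the 15589-byte logo described below):

```python
def reshape_bytes(data, height, width):
    """Reshape a flat byte string into `height` rows of `width` bytes each,
    mirroring Octave's reshape(); raises if the size doesn't factor."""
    if len(data) != height * width:
        raise ValueError("size mismatch: %d != %d x %d" % (len(data), height, width))
    return [data[r * width:(r + 1) * width] for r in range(height)]
```

To mimic the Octave session: read the file with `open(path, 'rb').read()`, call `reshape_bytes(data, 119, 131)`, and hand the rows to any image-display routine in place of imagesc.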


 * In the first line, I open, read, and store the Wikipedia logo to an array, "x". In the second line, I reshape that file into a 119 x 131 matrix (by luck, the file happens to be 15589 bytes, which can be factored into a square-ish matrix).  Then, I display it to screen with the "imagesc" (image scale) command.  If you find an MP3 or OGG file, you can open it in a similar way and display its raw data as an image buffer.  I just tested this technique on this audio of a red panda - hint, the file can be displayed as a 767 x 409 element buffer.  It looks like noise (as we would expect).  (Note that by using this method, we're treating the raw file, including its header, as part of a continuous uncompressed bitmap.)  Nimur (talk) 15:53, 21 June 2010 (UTC)

What's a good tool for Knowledge Engineering?
I'm looking for a readily available tool that can be used for modeling an expert system. Perhaps something like PCPACK, but something I can get quickly, without having to wait for some time. Any ideas? 122.255.2.108 (talk) 12:22, 21 June 2010 (UTC)

Key code of win-key on azerty keyboard
I am installing Ubuntu and want to have the normal functionality of the win-key. Here I found a solution, with the following code:
 * #!/bin/bash
 * xmodmap -e 'keycode 115=Menu'

But it turns out that 115 and 116 are the codes for the right- and end-keys. Does anyone know the code for the win-key on an azerty keyboard or how I can find that out? Also, the script changes those keys into the functionality of the context menu key (next to the win-key). So should I change 'menu' into something else? DirkvdM (talk) 12:56, 21 June 2010 (UTC)


 * Switch to a virtual console that doesn't have an X server on it (e.g. ctrl-alt-f6), login, and run showkey. That displays the raw keycodes for each key you press. On my uk-qwerty keyboard, the left and right windows keys are 125 and 126 respectively. -- Finlay McWalter • Talk 13:37, 21 June 2010 (UTC)


 * But (at least for Ubuntu's normal GNOME desktop environment) this works (the thing you linked to seemed like only half the solution). Note that he's using xev instead of showkey, which for me is returning different results (I think other X keyboard remappings are changing things); for your use the result reported by <tt>xev</tt> will be the one to use. -- Finlay McWalter • Talk 13:43, 21 June 2010 (UTC)


 * Also note that you can put the 'keycode blah=bar' straight to ~/.xmodmap and X will load it at login. --194.197.235.240 (talk) 15:43, 21 June 2010 (UTC)


 * Ah, ok, I have to run xev, keep the little 'event tester' window active and then press the win-key to see info about that in the console. Took me some time to figure that out. :) Turns out it's 133 for Super_L and 134 for Super_R (the left and right win-keys).
 * For the rest, gconf is for Gnome (I use KDE) and I don't have a file ~/.xmodmap. So I just put the number 134 in the script I already wrote, logged out and back in, and now the win-key behaves like the context menu key, as I feared. So I changed 'menu' to 'panel_main_menu', but that didn't work. Nor did 'panel_menu', or 'Panel_Main_Menu' (upper case). DirkvdM (talk) 17:50, 21 June 2010 (UTC)

Ubuntu/Vista Dual-Boot Broken - Reason
Recently I had a problem, in which Vista wouldn't start up on my dual-boot system here. After going through a (fairly) lengthy process of trying to fix it, I ended up reinstalling Windows. The problem I was encountering was that after the Microsoft loading bar had been on for a short while, I would get the blue screen of death and an auto-reboot. Stopping the auto-reboot managed to get me to the error messages on the blue screen, but they were not very informative. Anyway, that's all in the past now. However, I've been wondering how I could avoid such a problem occurring again and going over my actions prior to the problem. The only thing that I remember doing was seeing if I could run a program from my Windows partition whilst in Ubuntu using Wine. This, of course, failed (because the program was a game and not one supported by Wine). It was shortly after this that I had the problem. Could anyone confirm that this might have been the actual cause of the problem, or am I still in danger of this just happening again? -- <font face="Freestyle Script" color="blue">KägeTorä - (影虎) (TALK) 14:42, 21 June 2010 (UTC)


 * Running programs already installed on a Windows partition is usually a bad idea since there will probably be .dll files and registry entries missing and things like that since you didn't install it in Wine. Depending on how you mounted the Windows partition it could inadvertently modify files needed by Windows, but unless you know what broke Windows there isn't much more to say than don't use wine to run Windows installs of programs unless you have to and know it works. --antilivedT 02:34, 22 June 2010 (UTC)

Why do the XHTML specifications "forbid" serving XML syntax as text/html?
I prefer writing markup for the XHTML serialization (i.e. all lower case, expanded attributes, open/closing tags, self closing tags etc) but the HTML specifications actually say it is incorrect to serve this format as text/html. Now, obviously browsers don't actually care if you do but why is this "restriction" in place anyway? 193.122.22.247 (talk) 16:21, 21 June 2010 (UTC)
 * Oh, browsers do care. Serve XHTML as application/xhtml+xml, and watch IE getting confused, and Mozilla/Safari switching to a much stricter DOM model (createElement isn't good enough anymore, and (at least older versions of) Safari would even forget about 'document.title'. On the plus side, serving as xhtml will put browsers in a much stricter parsing mode and they will refuse to process non-wellformed XML documents.
 * As http://www.w3.org/TR/xhtml-media-types/ explains, you should use the XHTML mimetype unless you structure your document to be usable by both HTML and XHTML parsers (i.e., use the short syntax for some tags but the long syntax for others). Whether XHTML actually serves any practical use is probably still a matter of debate. Unilynx (talk) 21:13, 21 June 2010 (UTC)
 * I guess my real point is: why is XML syntax for HTML considered "invalid" HTML unless it's served as XML? Why can't both forms be valid (i.e. &lt;br&gt; vs &lt;br /&gt;)? Obviously, since IE won't render XHTML correctly with the XML mimetype, it's basically pointless for the majority of pages to be written in XHTML syntax, but there are benefits to writing in this way (especially when dealing with things like the &lt;p&gt; element, which has an optional closing tag). Since the browsers seem not to care anyway, I guess it doesn't matter that much...yet. 193.122.22.247 (talk) 12:51, 22 June 2010 (UTC)
 * XML syntax in HTML is invalid. The standards are not compatible. For example, <tt>&lt;br/></tt> is a line break in XHTML but it's a line break followed by a greater-than sign in HTML. No major browser will print that greater-than sign, but that's because they deliberately violate the HTML standard for compatibility reasons. See this crazy document, and the explanation. -- BenRG (talk) 03:45, 23 June 2010 (UTC)

Editing an MPG file
I took a video with my Sony digital camera lasting about 1 minute. When I was finished I forgot to stop recording and slipped the camera into my pocket. As a result, the recording continued for another 10 minutes, stopping only when the memory stick was full. I would like to keep just the first minute of the video and discard the rest. Is there any software that lets me do this? Hemoroid Agastordoff (talk) 16:38, 21 June 2010 (UTC)


 * Try virtualdub -- Finlay McWalter • Talk 22:05, 21 June 2010 (UTC)


 * TMPGEnc has a nice set of tools for cutting and joining MPEGs without recompression. -- BenRG (talk) 03:46, 23 June 2010 (UTC)