Webtags (preserving history): Difference between revisions

Line 32: Line 32:
== Encoding ==
== Encoding ==


Let me explain that technical term: essentially, encoding refers to the character set used by a file. A computer uses binary, and each binary bit can only be in state 0 or state 1, so a combination of 0 and 1 states needs to be defined for every character you want to represent. What you can include in that character set depends to some extent on how many bits are mapped to each character; and if more than one byte's worth of bits is used, the order in which those bytes are used must be defined for each particular encoding. With any fixed number of bits available, there is a limit to how many characters can be defined, and different organisations might select different characters to include. This is what leads to multiple encoding standards. One might use a particular arrangement of bits to represent the degree symbol, while another encoding uses that same arrangement of bits for a different purpose. This means that when you read a file you might find the letters A to Z where you expect them, but unless you know the encoding used, you don't know what character to display for certain combinations of bits.
Let me explain that technical term: essentially, encoding refers to the character set used by a file. A computer uses binary, and each binary bit can only be in state 0 or state 1, so a combination of 0 and 1 states needs to be defined for every character you want to represent. What you can include in that character set depends to some extent on how many bits are mapped to each character; and if more than one byte's worth of bits is used, the order in which those bytes are used must be defined for each particular encoding. With any fixed number of bits available, there is a limit to how many characters can be defined, and different organisations might select different characters to include. This is what leads to multiple encoding standards. One might use a particular arrangement of bits to represent the degree symbol, while another encoding uses that same arrangement of bits for a different purpose. This means that when you read a file you might find the letters A to Z where you expect them, although even that is not guaranteed: some encodings put capital letters at lower binary values than lower case letters, and some put capitals at higher binary values. The general problem is that unless you know the encoding used, you don't know what character to display for certain combinations of bits.
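As a rough illustration of the point above, the following Python sketch decodes one byte under two different encodings, and compares where capitals and lower case letters sit in ASCII and in EBCDIC. The choice of code page 437 and code page 037 as comparison encodings is an assumption made purely for the example; Cumulus itself does not use them.

```python
# One byte value, interpreted under two different single-byte encodings.
raw = b"\xb0"
as_latin1 = raw.decode("iso-8859-1")  # the degree sign in ISO-8859-1
as_cp437 = raw.decode("cp437")        # a shaded block character in code page 437

# Byte values for 'A' and 'a' in ASCII and in EBCDIC (code page 037).
ascii_upper = "A".encode("ascii")[0]
ascii_lower = "a".encode("ascii")[0]
ebcdic_upper = "A".encode("cp037")[0]
ebcdic_lower = "a".encode("cp037")[0]

print(repr(as_latin1), repr(as_cp437))
print("ASCII capitals come first:", ascii_upper < ascii_lower)
print("EBCDIC capitals come last:", ebcdic_upper > ebcdic_lower)
```

The same byte gives two unrelated characters, which is why a reader of the file has to be told which encoding was used to write it.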


If you use 7 bits, you have 128 combinations, enough for the standard 26 letters in both upper case and lower case, plus 10 digits (0 to 9), some punctuation, and some control characters (like new line, end of file, and so on). If you use 8 bits, a whole byte, you have 256 combinations, and you can start coping with accented letters, with alphabets that don't have 26 letters, and even add some symbols. Obviously, once you start using more than one byte, you can have 16, 32, or more bits to use and can include lots more characters.
If you use 7 bits, you have 128 combinations, enough for the standard 26 letters in both capitals and lower case, plus 10 digits (0 to 9), some punctuation, and some control characters (like new line, end of file, and so on). If you use 8 bits, a whole byte, you have 256 combinations, and you can start coping with accented letters, with alphabets that don't have 26 letters, and even add some symbols. Obviously, once you start using more than one byte, you can have 16, 32, or more bits to use and can include lots more characters.
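The arithmetic behind those limits is that n bits give 2 to the power n distinct patterns. A minimal Python sketch follows; the helper name <code>combinations</code> is made up for this example.

```python
import string


def combinations(bits: int) -> int:
    """Number of distinct patterns that the given number of bits can hold."""
    return 2 ** bits


for bits in (7, 8, 16, 32):
    print(bits, "bits:", combinations(bits), "combinations")

# Unaccented letters and digits all have values that fit within 7 bits...
fits_in_7_bits = all(
    ord(ch) < combinations(7) for ch in string.ascii_letters + string.digits
)

# ...whereas an accented letter (e-acute here) needs the eighth bit in ISO-8859-1.
e_acute = "\u00e9".encode("iso-8859-1")[0]

print(fits_in_7_bits, e_acute)
```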
   
   
In April 2014, Steve introduced the choice in Cumulus 1 of either ISO-8859-1 encoding (as he used originally) or UTF-8 encoding (what he migrated his web page templates to) for these reports. This choice remains unchanged in MX. The default selected by Steve Loft is his original ISO-8859-1 encoding, but be aware the encoding you use should match the encoding of any web page used for viewing these reports, and most modern web pages use UTF-8 encoding. The encoding can be selected on the NOAA Settings screen of either Cumulus 1 or MX.
In April 2014, Steve introduced the choice in Cumulus 1 of either ISO-8859-1 encoding (as he used originally) or UTF-8 encoding (what he migrated his web page templates to) for these reports. This choice remains unchanged in MX. The default selected by Steve Loft is his original ISO-8859-1 encoding, but be aware the encoding you use should match the encoding of any web page used for viewing these reports, and most modern web pages (including the standard web templates provided with both flavours of Cumulus) use UTF-8 encoding. The encoding can be selected on the NOAA Settings screen of either Cumulus 1 or MX.
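To see why the report encoding should match the web page encoding, here is a small Python sketch of what happens when they differ. The report line is invented for illustration, and the code only mimics the mismatch; it is not how Cumulus writes or reads its reports.

```python
# A made-up report line containing a degree sign (U+00B0).
line = "Mean temp 12.3\u00b0C"

utf8_bytes = line.encode("utf-8")          # degree sign becomes two bytes: C2 B0
latin1_bytes = line.encode("iso-8859-1")   # degree sign becomes one byte: B0

# A UTF-8 report viewed by a page that assumes ISO-8859-1:
# a stray extra character appears in front of the degree sign.
utf8_read_as_latin1 = utf8_bytes.decode("iso-8859-1")

# An ISO-8859-1 report viewed by a page that assumes UTF-8:
# the lone B0 byte is not valid UTF-8, so a replacement character is shown.
latin1_read_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

print(utf8_read_as_latin1)
print(latin1_read_as_utf8)
```

In both mismatched cases the plain letters and digits survive, and only the degree sign is shown wrongly, which is typically how an encoding mismatch first gets noticed.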


== The format used for naming ==
== The format used for naming ==