Webtags (preserving history)

== Encoding ==


You can skip this section, but if you have problems with a web page not displaying the '''°''' symbol correctly, it will be because the encoding declared in your web page does not match the encoding you have selected for Cumulus to use when generating this report. Put simply, most modern web pages use "utf-8" encoding, but for historical reasons Cumulus defaults to producing files in ISO-8859-1 encoding. This causes the mismatch. With that introduction, you can now choose whether to read the rest of this section.
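The mismatch described above is easy to demonstrate. This short sketch (in Python, which is not part of Cumulus itself and is used here purely for illustration) encodes the degree symbol both ways and shows what happens when bytes written in one encoding are read as the other:

```python
# The degree symbol encoded in the two character sets mentioned above.
text = "Temp: 21.5°C"

latin1_bytes = text.encode("iso-8859-1")  # ° becomes the single byte 0xB0
utf8_bytes = text.encode("utf-8")         # ° becomes two bytes, 0xC2 0xB0

# If a browser is told the page is ISO-8859-1 but the file is really
# UTF-8, each UTF-8 byte is shown as a separate Latin-1 character:
print(utf8_bytes.decode("iso-8859-1"))    # prints "Temp: 21.5Â°C"
```

The stray "Â" before the degree sign is the classic symptom of this particular mismatch, so seeing it on your page is a strong hint that the declared encoding and the actual file encoding disagree.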
 
Let me explain that technical term: essentially, encoding refers to the character set used by a file. A computer uses binary, and each bit can only be in state 0 or state 1, so a combination of bit states needs to be defined for every character you want to represent. What you can include in that character set depends to some extent on how many binary bits are mapped to each individual character; and if more than one byte's worth of bits is used, the order in which the bits within those bytes are read must be defined for each particular encoding. With any fixed number of bits available, there is a limit to how many characters can be defined, and different organisations might select different characters to include. This is what leads to multiple encoding standards. One might use a particular arrangement of bits to represent the degree symbol, while another encoding uses that same arrangement of bits for a different purpose. This means that when you read a file you might find the letters A to Z where you expect them, but some encodings put capital letters at lower binary values than lower-case letters, and some put capitals at higher binary values. The general problem is that unless you know the encoding used, you don't know what character to display for a given combination of bits.


If you use 7 bits, you have 128 combinations, enough for the standard 26 letters in both capitals and lower case, plus the 10 digits (0 to 9), some punctuation, and some control characters (like new line, end of file, and so on). If you use 8 bits, a whole byte, you have 256 combinations, and you can start coping with accented letters, with alphabets that don't have 26 letters, and even add some symbols. Once you start using more than one byte, you have 16, 32, or more bits to use and can include many more characters.
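The counts above, and the point that the same bit pattern can mean different characters in different 8-bit encodings, can be checked in a few lines of Python (the choice of CP437, the old IBM PC character set, as the second encoding is just an illustrative example):

```python
# Number of distinct values representable with n bits is 2 to the power n.
for n in (7, 8, 16, 32):
    print(n, "bits ->", 2 ** n, "combinations")

# One byte, two encodings, two different characters:
b = bytes([0xB0])
print(b.decode("iso-8859-1"))  # prints "°" (the degree sign)
print(b.decode("cp437"))       # prints "░" (a shading block in the IBM PC set)
```

This is exactly the ambiguity described above: the byte 0xB0 by itself tells you nothing; only the declared encoding tells you which character it stands for.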