ASCII and EBCDIC

What is a “Character Set?”

A character set is a collection of characters (letters and symbols), each paired with an identifying number.  Alongside the printable characters (letters, digits and punctuation) are some unprintable yet important control characters.  Characters are used to form messages.

Characters are not fonts.  The character is the underlying identity; the font only determines how that character is drawn.  When you change the font on a document, the A on the screen changes shape, but the underlying character that identifies its meaning remains the same.  You can even convert to Wingdings and the underlying character remains the same.

Imagine that I wrote this post using a character set that I created myself.  If you then tried to read it without knowing what my character set was, you would see a bunch of garbage on the screen, much like visiting a foreign-language web page without the correct fonts loaded.  Even worse would be if we used the same characters but had different identifiers for them.  Say A is 001 in my set (it starts at A and moves on numerically), but in your character set the vowels are numbered after the consonants, so 001 to you is B.  Now all of the letters will be wrong, and we get garbage.
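To make the mismatch concrete, here is a minimal sketch of two made-up character sets that contain the same letters but number them differently. Both sets, and the `encode` and `decode` helpers, are hypothetical and exist only for this illustration.

```python
import string

# Hypothetical set 1: letters numbered alphabetically (A=1, B=2, ...).
alphabetical = string.ascii_uppercase

# Hypothetical set 2: consonants numbered first, vowels last (B=1, C=2, ...).
vowels = "AEIOU"
consonants_first = "".join(c for c in alphabetical if c not in vowels) + vowels


def encode(text, charset):
    """Turn each character into its 1-based number in the given set."""
    return [charset.index(ch) + 1 for ch in text]


def decode(numbers, charset):
    """Turn each number back into the character it identifies in the given set."""
    return "".join(charset[n - 1] for n in numbers)


numbers = encode("HELLO", alphabetical)
print(numbers)                            # [8, 5, 12, 12, 15]
print(decode(numbers, alphabetical))      # HELLO
print(decode(numbers, consonants_first))  # KGPPS
```

The numbers on the wire never changed; only the table used to read them did.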

Fortunately, some people got together early on and created standards for characters.  A committee of the American Standards Association (now ANSI) created the character set we call ASCII, the American Standard Code for Information Interchange.  The Extended Binary Coded Decimal Interchange Code, EBCDIC, was created by IBM, whose systems now support ASCII as well.

What is ASCII?

ASCII is the acronym for American Standard Code for Information Interchange, and is a collection of 128 characters numbered from 0 to 127.  These definitions cover the standard English letters, digits and symbols.  A number of other, unprintable characters are also included.  You use one or two of these each time you hit the “Enter” key on your keyboard: depending on your operating system, this sends a “carriage return”, a “line feed”, or both.

A “carriage return” comes from printers, where the head moves back and forth across the roller.  CR tells the printer to move the head all the way to the left of its printing area.  A “line feed” also comes from the printer world: it tells the printer to roll the paper so that the head will write on the next line.  These are both examples of unprintable characters.  You can probably think of others.  For a complete list of ASCII characters, you can check out this table in my toolbox.
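As a quick illustration, Python's built-in `ord` and `chr` expose the number behind each character, including the two unprintable ones described above. The variable names are just for this sketch.

```python
# Every ASCII character is just a number between 0 and 127.
print(ord("A"))  # 65
print(chr(97))   # a

# The two unprintable characters behind the Enter key.
CR = "\r"  # carriage return
LF = "\n"  # line feed
print(ord(CR), ord(LF))  # 13 10

# The same line of text with two common line-ending conventions,
# shown as raw ASCII byte values.
lf_only_line = "hello\n".encode("ascii")
cr_lf_line = "hello\r\n".encode("ascii")
print(list(lf_only_line))  # [104, 101, 108, 108, 111, 10]
print(list(cr_lf_line))    # [104, 101, 108, 108, 111, 13, 10]
```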

What is EBCDIC?

EBCDIC is the acronym for Extended Binary Coded Decimal Interchange Code.  This was created by IBM back in the day.  IBM mainframes still use EBCDIC natively, although they have long supported ASCII as well, and some legacy communications equipment still expects messages in the EBCDIC character set.
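A short sketch of how differently the two sets number the same letters. EBCDIC comes in many code pages; this example assumes code page 037, which Python ships as the `cp037` codec, and the system you are dealing with may use a different one.

```python
message = "HELLO"

ascii_bytes = message.encode("ascii")
ebcdic_bytes = message.encode("cp037")  # assumption: EBCDIC code page 037

print(ascii_bytes.hex(" "))   # 48 45 4c 4c 4f
print(ebcdic_bytes.hex(" "))  # c8 c5 d3 d3 d6

# Reading EBCDIC bytes with the wrong table produces garbage.
print(ebcdic_bytes.decode("latin-1"))  # ÈÅÓÓÖ

# Reading them with the right table recovers the message.
print(ebcdic_bytes.decode("cp037"))    # HELLO
```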

What is the big deal?

As I said, some systems still use characters from the EBCDIC set.  Even fancy new systems producing XML will sometimes fall into this trap and cause problems.  The one that I have run into is the use of |, called “pipe”.  ASCII and EBCDIC use different character IDs for this character, and I have seen e-commerce systems that use ASCII for everything else throw in an EBCDIC pipe as a control character.  When this happens, other systems will choke on it.
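Here is a small sketch that prints the identifier of the pipe in both sets. The `describe` helper is hypothetical, and `cp037` is again only an assumed EBCDIC code page; the pipe sits at different positions in different EBCDIC code pages, so check which one your trading partner actually uses.

```python
import unicodedata


def describe(ch, encodings=("ascii", "cp037")):
    """Return the character, its Unicode name, and its code in each encoding."""
    parts = [f"{ch!r} ({unicodedata.name(ch)})"]
    for enc in encodings:
        code = ch.encode(enc)[0]  # single-byte encodings: take the one byte
        parts.append(f"{enc}=0x{code:02X}")
    return " ".join(parts)


print(describe("|"))  # '|' (VERTICAL LINE) ascii=0x7C cp037=0x4F

# The same comparison against another EBCDIC code page that Python ships.
print(describe("|", encodings=("ascii", "cp500")))

# The cp037 pipe byte means something else entirely to an ASCII reader.
print(bytes([0x4F]).decode("ascii"))  # O
```

A line like the one `describe` returns is also handy to paste into an email when explaining the problem to someone who has not seen it before.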

When you find yourself getting an invalid character message but the characters look fine, remember that there may be some twists in the underlying character set.  If you can, manually replace the character with the character that it looks like (in my case, the EBCDIC | with an ASCII |) and see if the parser likes the file now.  If it does, you have encountered the character set problem as I have.  This can be a difficult problem to solve if you have never encountered it.
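One way to hunt for the offending character is to scan the raw bytes instead of trusting what the editor shows. The sketch below is only a starting point: `find_suspect_bytes` is a hypothetical helper, the sample record and its stray `0xA6` byte are invented for illustration, and a foreign byte that happens to fall inside the printable ASCII range will not be flagged.

```python
def find_suspect_bytes(data):
    """Return (offset, byte value) pairs for bytes outside printable ASCII.

    Tab, line feed and carriage return are treated as acceptable.
    """
    allowed_controls = {0x09, 0x0A, 0x0D}
    return [
        (offset, value)
        for offset, value in enumerate(data)
        if not (0x20 <= value <= 0x7E or value in allowed_controls)
    ]


# Hypothetical record: ASCII text with one stray non-ASCII byte in it.
sample = b"ORDER\xa6123|OK\r\n"

for offset, value in find_suspect_bytes(sample):
    print(f"offset {offset}: 0x{value:02X}")  # offset 5: 0xA6

# Replace the stray byte with the ASCII character it was meant to be,
# then check again before handing the file back to the parser.
cleaned = sample.replace(b"\xa6", b"|")
print(find_suspect_bytes(cleaned))  # []
```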

If the character that is causing problems is not a pipe, you may want to look at IBM’s ASCII to EBCDIC conversion table.  This problem can be difficult to explain to others who have never encountered it, so quoting the ASCII and EBCDIC identifiers for the character can help make clear what we mean in email and documentation when we are trying to correct the issue.

2 Responses to “ASCII and EBCDIC”

  1. Gary Lee Says:

    A few notes. The VT-100 was an ASCII terminal by Digital Equipment Corporation, who were never known for embracing or using EBCDIC at all (http://en.wikipedia.org/wiki/VT100). That brings up the correct acronym for Extended Binary Coded Decimal Information Code, which is EBCDIC. Don’t believe me, check out http://publib.boulder.ibm.com/infocenter/zos/basics/index.jsp?topic=/com.ibm.zos.zappldev/zappldev_14.htm, one of a few thousand references you could have checked. And the entire z-Series mainframe line still uses EBCDIC, as do selected other machines, although even the large mainframes have also supported ASCII if it was required for the past forty years or so.

    Next time try doing some research before you write an article under your self-aggrandizing banner subtitle, “When it just has to work.” Accuracy will enhance the image the rest of us have of you.

  2. Roy Says:

    Gary,

    Didn’t mean to touch a nerve. When I said that “some systems still want to use some of the characters in the EPSIDIC system,” it looks like I am saying these are only old or legacy systems. You are absolutely correct in that there are new systems that are using EBCDIC. What I was hoping to convey is that this is still a big deal, especially when integrating systems that are using different character sets natively. If you have never seen this, which is common for people new to EDI and data integration, it can be really confusing.

    As for the VT-100, I used to use that terminal to interface our systems all of the time. I hope that we can both agree that it is an old interface and can be an example of a legacy system or tool. (And it does use EBCDIC, but not exclusively; it depends on the system it is installed on.)

    And wow, I can’t believe that I had that acronym wrong, especially when it was spelled out in the text with the correct words. I have corrected it in the above text. Guess that happens sometimes when you are your own editor. I hope the world can forgive the many typos that I make, as there will be many, many more.

    And finally, Gary, let me thank you for taking the time to read my blog and comment on my post. I know you will love the one in my archive where I misspell disaster. It’s kind of an epic fail.
