<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Integration Engineer &#187; File</title>
	<atom:link href="http://www.theintegrationengineer.com/category/data/file/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.theintegrationengineer.com</link>
	<description>When it just has to work.</description>
	<lastBuildDate>Tue, 27 Jul 2010 17:33:47 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=abc</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>What&#8217;s the DIFF?</title>
		<link>http://www.theintegrationengineer.com/whats-the-diff/</link>
		<comments>http://www.theintegrationengineer.com/whats-the-diff/#comments</comments>
		<pubDate>Wed, 18 Nov 2009 14:50:18 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[File]]></category>
		<category><![CDATA[change]]></category>
		<category><![CDATA[compare]]></category>
		<category><![CDATA[diff]]></category>
		<category><![CDATA[difference]]></category>
		<category><![CDATA[directory]]></category>
		<category><![CDATA[tool]]></category>

		<guid isPermaLink="false">http://www.theintegrationengineer.com/?p=517</guid>
		<description><![CDATA[
One of the basic tasks Integration Engineers do is to compare files that we use or receive.  There are some interesting and useful tools that people can get out there to DIFF files.  But on Linux and Unix machines around the world there is a native tool that is almost always present.  Amazingly it is [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-518" title="apple-and-orange_pzl" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/08/apple-and-orange_pzl.jpg" alt="apple and orange pzl Whats the DIFF?" width="191" height="159" /></p>
<p>One of the basic tasks Integration Engineers do is to compare files that we use or receive.  There are some interesting and useful tools that people can get out there to DIFF files.  But on Linux and Unix machines around the world there is a native tool that is almost always present.  Amazingly it is called DIFF.</p>
<p>Like some other command-line tools, its interface is not really intuitive.  Let&#8217;s walk through the basics of how to get use out of this handy file comparison tool.  (If you are working with and comparing EDI files, you might want to look at the <a href="http://www.theintegrationengineer.com/edi-wrapped-and-unwrapped/">post on how to unwrap</a> your EDI file so that your line-by-line comparison is more meaningful.)</p>
<p><span id="more-517"></span></p>
<p><strong>How to use &#8220;diff&#8221;</strong></p>
<p>You can get the basic help and options for this application by executing &#8220;diff --help&#8221;.  But in short, here is the thumbnail: &#8220;diff&#8221; is followed by some options.  Options are designated by a &#8220;-&#8221; and then a letter indicating the option.  Any options are then followed by the names of the two files being compared.  Let&#8217;s look at an example.</p>
<p><span style="text-decoration: underline;"><em>Example:</em></span></p>
<p>We have two files, file1.txt and file2.txt</p>
<table style="height: 151px;" border="1" cellspacing="5" cellpadding="5" width="477">
<tbody>
<tr style="text-align: center;">
<th>File1.txt</th>
<th>File2.txt</th>
</tr>
<tr>
<td width="50%">This is a test file:<br />
And this is the first line of the first file.<br />
Thanks.</td>
<td>This is a test file:<br />
And this is the first line of the second file.<br />
Thanks.<br />
Again.</td>
</tr>
</tbody>
</table>
<p>When we issue this command:  &#8220;diff file1.txt file2.txt&#8221; we get this result.</p>
<p style="padding-left: 60px;">2c2<br />
&lt; And this is the first line of the first file.<br />
---<br />
&gt; And this is the first line of the second file.<br />
3a4<br />
&gt; Again.</p>
<ul>
<li>The first thing we see is &#8220;2c2&#8221;.  This means line 2 of the first file is changed to line 2 of the second file.</li>
<li>Next we have a &lt; indicating the first file, followed by the line from that file.</li>
<li>Following this we have a &#8220;---&#8221; as a separator between the lines compared.</li>
<li>Next we have &gt; indicating the second file, followed by the line from that file.</li>
<li>Together, these four lines are a comparison between two lines that were found to be different.</li>
<li>For the next change, we have &#8220;3a4&#8221;, which indicates that after line 3 of the first file a line was added, and that it is line 4 of the second file.</li>
<li>Finally, &gt; indicates the second file, followed by the added line.</li>
</ul>
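<p>To make the change commands concrete, here is a minimal Python sketch that rebuilds this &#8220;normal&#8221; output format for the two example files.  It leans on difflib.SequenceMatcher from the Python standard library rather than on diff itself, so treat it as an illustration of the notation and not as how diff works internally.  The function name normal_diff is made up for this example.</p>

```python
import difflib


def normal_diff(a, b):
    """Return diff-style "normal" output lines for two lists of lines."""

    def span(start, stop):
        # Convert a 0-based half-open range to diff's 1-based inclusive form.
        if stop - start == 1:
            return str(start + 1)
        return f"{start + 1},{stop}"

    out = []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag == "replace":      # lines changed, e.g. 2c2
            out.append(f"{span(i1, i2)}c{span(j1, j2)}")
        elif tag == "delete":     # lines only in the first file, e.g. 4d3
            out.append(f"{span(i1, i2)}d{j1}")
        else:                     # "insert": lines only in the second file, e.g. 3a4
            out.append(f"{i1}a{span(j1, j2)}")
        out.extend("< " + line for line in a[i1:i2])
        if tag == "replace":
            out.append("---")
        out.extend("> " + line for line in b[j1:j2])
    return out


file1 = [
    "This is a test file:",
    "And this is the first line of the first file.",
    "Thanks.",
]
file2 = [
    "This is a test file:",
    "And this is the first line of the second file.",
    "Thanks.",
    "Again.",
]

print("\n".join(normal_diff(file1, file2)))
```

<p>Calling normal_diff(file2, file1) instead swaps the roles of the two files, which is where a &#8220;d&#8221; command comes from.</p>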
<p>If we were to compare them in the other order, we end with these two lines:</p>
<p style="padding-left: 60px;">4d3<br />
&lt; Again.</p>
<ul>
<li>Here, &#8220;4d3&#8221; means that line 4 of the first file is deleted, and that the files line up again after line 3 of the second file.</li>
<li>Following this is &lt; indicating the first file, followed by the deleted line.</li>
</ul>
<p><strong>Regular Options</strong></p>
<p>Here is the list of options that &#8220;--help&#8221; gives you, with some added explanation.</p>
<p><em>diff [-b] [-i] [-t] [-w] [-c] [-C number] [-e] [-f] [-h] [-n] [-D string] [-l] [-r] [-s] [-S name] [fileone filetwo] [directoryone directorytwo]</em></p>
<table class="mtable" style="width: 100%;" border="0" cellspacing="1" cellpadding="5">
<tbody>
<tr class="tcw">
<td style="width: 120px;" valign="top">-b</td>
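<td valign="top">Ignores differences in the amount of white space.  Trailing spaces and tabs are ignored, and other runs of spaces and tabs are treated as equal.  This is useful when spacing doesn&#8217;t matter in what you are comparing.</td>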
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-i</td>
<td valign="top">Ignores case.  This is useful when case doesn&#8217;t matter in what you are comparing.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-t</td>
<td valign="top">Expands TAB characters in output lines. Normal or -c output adds character(s) to the front of each  line that may adversely affect the indentation of the original source lines and make the output lines difficult to interpret. This option will preserve the original source&#8217;s indentation.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-w</td>
<td valign="top">Ignores all spaces and tabs, so &#8220;a b&#8221; and &#8220;ab&#8221; compare as equal.  Again, for when we don&#8217;t want to include changes in the white-space.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-c</td>
<td valign="top">Produces a listing of differences with three lines of context. With this option the output format is modified slightly: output begins with identification of the files involved and their modification dates, then each change is separated by a line of *&#8217;s. The lines removed from file1 are marked with &#8216;-&#8217;; those added to file2 are marked &#8216;+&#8217;. Lines that are changed from one file to the other are marked in both files with &#8216;!&#8217;.</p>
<p>With our two files we get this output:</p>
<p>*** file1.txt    2009-11-17 10:20:38.000000000 -0700<br />
--- file2.txt    2009-11-17 10:20:51.000000000 -0700<br />
***************<br />
*** 1,3 ****<br />
This is a test file:<br />
! And this is the first line of the first file.<br />
Thanks.<br />
--- 1,4 ----<br />
This is a test file:<br />
! And this is the first line of the second file.<br />
Thanks.<br />
+ Again.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-C number</td>
<td valign="top">Produces a listing of differences identical to that produced by -c, but with number lines of context instead of three.</p>
<p>Our example files are so short that supplying a number gives the same output as plain -c, e.g. diff -C 1 file1.txt file2.txt</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-e</td>
<td valign="top">Output an ed script.  I have looked at these, but really haven&#8217;t used this feature for anything real.  I may later if I have time.</p>
<p>With our files it looks like this:</p>
<p>3a<br />
Again.<br />
.<br />
2c<br />
And this is the first line of the second file.<br />
.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-f</td>
<td valign="top">Produces a similar script, not useful with ed , in the opposite order.  (Really, this is exactly like -e except in reverse order.)</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-h</td>
<td valign="top">Does a fast, half-hearted job. It works only when changed stretches are short and well separated, but does work on files of unlimited length.  Options -c, -e, -f, and -n are unavailable with -h. diff does not descend into directories with this option.</p>
<p>With our example files it produces the same output as with no options.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-n</td>
<td valign="top">Produces a script similar to -e, but in the opposite order and with a count of changed  lines on each insert or delete command.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-D string</td>
<td valign="top">Creates a merged version of file1 and file2 with C preprocessor controls included so that a compilation of the result without defining string is equivalent to compiling file1, while defining string will yield file2.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-l</td>
<td valign="top">Produce output in long format. Before the diff, each text file is piped through &#8216;pr&#8217; to paginate it. Other differences are remembered and summarized after all text file differences are reported.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-r</td>
<td valign="top">Applies diff recursively to common subdirectories encountered.  Just like you would expect if you have ever used this with any other command line tools like grep or rm.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-s</td>
<td valign="top">Reports files that are identical; these would not otherwise be mentioned.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-S name</td>
<td valign="top">Starts a directory diff in the middle, beginning with the file name.  Basically this is a compare directory after a supplied file name.  Make sure this file exists in both directories or you will be disappointed.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">fileone</td>
<td valign="top">File one for comparing.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">filetwo</td>
<td valign="top">File two for comparing.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">directoryone</td>
<td valign="top">Directory one for comparing.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">directorytwo</td>
<td valign="top">Directory two for comparing.</td>
</tr>
</tbody>
</table>
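<p>If you want to see the -c style of output without leaving a script, Python ships a similar formatter.  The sketch below feeds our two example files to difflib.context_diff from the standard library.  This is Python&#8217;s own implementation and not the diff binary, and because no file dates are passed in, the header lines carry only the file names.</p>

```python
import difflib

file1 = [
    "This is a test file:",
    "And this is the first line of the first file.",
    "Thanks.",
]
file2 = [
    "This is a test file:",
    "And this is the first line of the second file.",
    "Thanks.",
    "Again.",
]

context = list(
    difflib.context_diff(
        file1,
        file2,
        fromfile="file1.txt",
        tofile="file2.txt",
        n=3,          # lines of context, comparable to -C 3
        lineterm="",  # our lists hold lines without trailing newlines
    )
)

print("\n".join(context))
```

<p>Changing n plays the same role as the number given to -C.</p>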
<p>For comparing files, the first four options (-b, -i, -t, -w) are the most useful.  I start with no options and add them as I need them to reduce the amount of change noise reported in the result set.</p>
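<p>One way to think about -i, -b and -w is as a clean-up step applied to each line before the comparison.  The Python sketch below approximates that idea; the function names and the sample strings are made up for illustration, and the real diff options may differ in edge cases.</p>

```python
import re


def normalize(line, ignore_case=False, space_change=False, all_space=False):
    """Approximate what -i, -b and -w ignore, applied to a single line."""
    if all_space:
        # Like -w: drop every space and tab.
        line = re.sub(r"[ \t]+", "", line)
    elif space_change:
        # Like -b: collapse runs of blanks and drop trailing blanks.
        line = re.sub(r"[ \t]+", " ", line).rstrip(" ")
    if ignore_case:
        # Like -i: compare without regard to case.
        line = line.lower()
    return line


def same(a, b, **options):
    """True when two lines are equal after the chosen clean-up."""
    return normalize(a, **options) == normalize(b, **options)


print(same("Total:  42 ", "Total: 42", space_change=True))
print(same("Total: 42", "Total:42", space_change=True))
print(same("Total: 42", "Total:42", all_space=True))
print(same("THANKS.", "Thanks.", ignore_case=True))
```

<p>The second call is the interesting one: with -b style clean-up a space still has to be there, only its amount is ignored, while -w style clean-up ignores it completely.</p>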
<p><strong>diff is your friend</strong></p>
<p>Like many basic tools, &#8220;diff&#8221; is almost always there.  And if you know how to use it effectively, it can really save time and frustration.  Sure, there are other cool file comparison tools.  Some are even embedded into other products.  But knowing how to use the basic tools that are always there will be a lifesaver in a crisis situation.  And the only way to know how to use them is to actually use them sometimes.</p>
<p>Do you use &#8220;diff&#8221; with a set of options that does a specific task for you?  If so, please share them.  And what other basic tools do you use?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theintegrationengineer.com/whats-the-diff/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>5 Tools of an Integration Engineer</title>
		<link>http://www.theintegrationengineer.com/5-tools-of-an-integration-engineer/</link>
		<comments>http://www.theintegrationengineer.com/5-tools-of-an-integration-engineer/#comments</comments>
		<pubDate>Sat, 13 Jun 2009 17:30:32 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[Database]]></category>
		<category><![CDATA[File]]></category>
		<category><![CDATA[Calc]]></category>
		<category><![CDATA[editor]]></category>
		<category><![CDATA[Excel]]></category>
		<category><![CDATA[notepad]]></category>
		<category><![CDATA[Open Office]]></category>
		<category><![CDATA[spread sheet]]></category>
		<category><![CDATA[spreadsheet]]></category>
		<category><![CDATA[SQL Worksheet]]></category>
		<category><![CDATA[squirrel]]></category>
		<category><![CDATA[techrepublic]]></category>
		<category><![CDATA[text]]></category>
		<category><![CDATA[text pad]]></category>
		<category><![CDATA[textpad]]></category>
		<category><![CDATA[TOAD]]></category>
		<category><![CDATA[tools]]></category>
		<category><![CDATA[ultra Edit]]></category>
		<category><![CDATA[ultraedit]]></category>
		<category><![CDATA[vi]]></category>
		<category><![CDATA[white board]]></category>
		<category><![CDATA[whiteboard]]></category>

		<guid isPermaLink="false">http://www.theintegrationengineer.com/?p=115</guid>
		<description><![CDATA[There are job or task specific tools that will have a high importance to each integration task.  When working on an SAP system, your SAP tools will be very important.  But there are tools and skills that are also important regardless of the systems and technologies that you are working on.  For me, these are [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-144" title="tool_pile_puzzlepiece1" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/06/tool_pile_puzzlepiece1.jpg" alt="tool pile puzzlepiece1 5 Tools of an Integration Engineer" width="131" height="108" />There are job or task specific tools that will have a high importance to each integration task.  When working on an SAP system, your SAP tools will be very important.  But there are tools and skills that are also important regardless of the systems and technologies that you are working on.  For me, these are the top 5 tools that an Integration Engineer should be able to use proficiently.  Do you use any of these?  Do you have others?<span id="more-115"></span></p>
<p><strong>1.  A big Whiteboard</strong></p>
<p>This is probably my number one requirement.  When I am thinking, I like to draw it out.  I haven&#8217;t found an application that gives me the same creative release and adaptability as my whiteboard.  After starting to <img class="alignright size-medium wp-image-122" title="whiteboard" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/05/whiteboard-300x300.jpg" alt="whiteboard 300x300 5 Tools of an Integration Engineer" width="146" height="146" />do some of my work from home, I went out and acquired a 10 X 4 whiteboard to put in my home office.  Having the extra room is essential.  I can get a call, and walk over to the whiteboard and start drawing out what needs to be built to solve the problems while I am still on the call.</p>
<p>You may not need a whiteboard as big as mine.  And you may have an electronic solution that you like better.  If so, please let me know; I like to try out gadgets.</p>
<p><strong>2.  Spreadsheet</strong></p>
<p>Now this may not sound earth shattering, but we are not just talking about the basics.  You will need to learn to write filters, import data, link cells and perform calculations.  If you think that a spreadsheet is like a ledger, then you are missing the power of a spreadsheet.  If you don&#8217;t think you have the skills you need, here are some links to tutorials for the two most popular spreadsheet programs.</p>
<ul>
<li>Very Basic Open Office Tutorial <a href="http://www.tutorialsforopenoffice.org/tutorial/Spreadsheet_Basics.html">http://www.tutorialsforopenoffice.org/</a></li>
<li>More specific/advanced tutorials <a href="http://openoffice.blogs.com/openoffice/">http://openoffice.blogs.com/openoffice/</a></li>
<li>Some basic and advanced help for MS Excel <a href="http://www.internet4classrooms.com/on-line_excel.htm">http://www.internet4classrooms.com</a></li>
</ul>
<p><strong>3.  Text editor.</strong></p>
<p>Familiarity with more than one is needed, as you will find yourself on Windows and Unix servers and they will have different sets of tools.  One of the most frustrating things is not knowing how to use the native editor of the system you are on.  So get familiar with Notepad, and then get familiar with vi.  You can add in other tools like UltraEdit, TextPad, and more, but you should know the native ones first, if not best.</p>
<ul>
<li>Ultra Edit is a popular tool in some circles.  <a href="http://www.ultraedit.com/">http://www.ultraedit.com/</a></li>
<li>Text Pad is also popular.  <a href="http://www.textpad.com/">http://www.textpad.com/</a></li>
<li>Believe it or not, notepad has some tutorials.  <a href="http://bink.nu/news/notepad-tips-and-tricks.aspx">http://bink.nu/news</a></li>
<li>And here is a cheat sheet for VI commands.  <a href="http://downloads.techrepublic.com.com/abstract.aspx?docid=172404">http://downloads.techrepublic.com</a></li>
</ul>
<p><strong>4.  File compare.</strong></p>
<p>One task that will need to be done is to compare a file with another file to detect changes or differences.  This happens both in new integrations and in troubleshooting or investigating existing ones.  Some systems have native comparison tools, others don&#8217;t.  And there is much variety in how they work.  Here are some of the tools I have seen and used.</p>
<ul>
<li><a href="http://en.wikipedia.org/wiki/Diff">Wikipedia</a> has a great list that I don&#8217;t think I can improve on.</li>
</ul>
<p><strong>5.  DB query </strong></p>
<p>Much of the time, integration involves a database somewhere, sometime.  To work on an integration without a DBA sitting in your lap, you need to be competent at querying the database.  And this involves using some tool.  Sometimes systems will have native tools, other times you will need to connect your own.  Here is a list of DB query tools that I have seen and used.</p>
<ul>
<li>Squirrel SQL <a href="http://squirrel-sql.sourceforge.net/">http://squirrel-sql.sourceforge.net/</a></li>
<li>SQL Worksheet</li>
<li><a href="http://www.toadsoft.com">TOAD</a></li>
</ul>
<p><strong>Summary</strong></p>
<p>This is by no means attempting to be a comprehensive list of tools.  Such a list would be long, if it were possible.  And I don&#8217;t think that it is.  There are however some tools/skills that help us to be more effective as Integration Engineers.  Seeing what others use is helpful, especially when we find that the tool we used to use is no longer around.</p>
<p>What tools, applications, or skills do you find you are falling back on often, on more than one project?  Or what new tools have you found that you think you will use often in the future?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theintegrationengineer.com/5-tools-of-an-integration-engineer/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ASCII and EBCDIC</title>
		<link>http://www.theintegrationengineer.com/ascii-and-epsidic/</link>
		<comments>http://www.theintegrationengineer.com/ascii-and-epsidic/#comments</comments>
		<pubDate>Thu, 19 Mar 2009 01:32:27 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[File]]></category>
		<category><![CDATA[ASCII]]></category>
		<category><![CDATA[character set]]></category>
		<category><![CDATA[compatable]]></category>
		<category><![CDATA[Data]]></category>
		<category><![CDATA[EPSIDIC]]></category>
		<category><![CDATA[format]]></category>
		<category><![CDATA[legacy]]></category>
		<category><![CDATA[pipe]]></category>
		<category><![CDATA[Standard]]></category>
		<category><![CDATA[text]]></category>

		<guid isPermaLink="false">http://www.theintegrationengineer.com/?p=81</guid>
		<description><![CDATA[What is a &#8220;Character Set?&#8221;
A character set is a collection or library of characters, (letters and symbols), and their identifying number.  Included with the printable characters, (letters and punctuation) are some unprintable yet important characters.  Characters are used to form messages.
Characters are not fonts.  Characters exist under the font that represent the definition of the [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-112" title="characters" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/03/characters.jpeg" alt=" ASCII and EPSIDIC" width="129" height="78" /><strong>What is a &#8220;Character Set?&#8221;</strong></p>
<p>A character set is a collection or library of characters, (letters and symbols), and their identifying number.  Included with the printable characters, (letters and punctuation) are some unprintable yet important characters.  Characters are used to form messages.</p>
<p>Characters are not fonts.  A character sits underneath the font: it is the definition of what the font is attempting to display.  When you change the font on a document the <strong>A</strong> is changed to an <em>A</em>, but the underlying character that identifies its meaning remains the same.  The font only determines how the character is displayed.  You can even convert to Wingdings and the underlying character remains the same.<span id="more-81"></span></p>
<p>Imagine that I wrote this post using a character set that I created myself.  If you then tried to read it without knowing what my character set was, you would see a bunch of garbage on the screen, like when you go to a foreign language web page without the correct fonts loaded.  Even worse would be if we used the same characters, but had different identifiers for them.  Say A is 001 for me (my set starts at A and moves on numerically), but in your character set you numbered the vowels after the consonants, so 001 to you is B.  Now all of the letters will be wrong, and we get garbage.</p>
<p>Fortunately, some people got together early on and created a standard for characters.  The American Standards Association (now ANSI) created the character set we call ASCII, the American Standard Code for Information Interchange.  IBM created its own set, the Extended Binary Coded Decimal Interchange Code (EBCDIC), and still uses it on its mainframe and midrange systems, while most other systems, including many from IBM, use ASCII.</p>
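<p>Here is a small Python sketch of what &#8220;same characters, different identifiers&#8221; looks like in practice.  It uses cp037, one of several EBCDIC code pages (the abbreviation usually used for IBM&#8217;s code) that ship with Python, as a stand-in; which code page a given IBM system actually uses is an assumption you would need to check.</p>

```python
text = "ABC"

ascii_bytes = text.encode("ascii")    # identifiers under ASCII
ebcdic_bytes = text.encode("cp037")   # identifiers under EBCDIC code page 037

print(ascii_bytes.hex(" "))    # 41 42 43
print(ebcdic_bytes.hex(" "))   # c1 c2 c3

# Reading EBCDIC bytes with the wrong table gives garbage ...
garbled = ebcdic_bytes.decode("latin-1")
# ... while reading them with the right table gives the text back.
restored = ebcdic_bytes.decode("cp037")

print(repr(garbled), repr(restored))
```

<p>The letters never changed, only the numbers used to identify them, and that is enough to turn a readable file into garbage.</p>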
<p><strong>What is ASCII?</strong></p>
<p>ASCII is the acronym for American Standard Code for Information Interchange, and is a collection of characters defined from 0 to 127.  These definitions represent all of the standard English characters, numbers and symbols.  A number of other, unprintable, characters are also included.  You use one or two of these each time you hit the &#8220;Enter&#8221; key on your keyboard.  Depending on your operating system, this sends a &#8220;carriage return&#8221;, a &#8220;line feed&#8221;, or both.</p>
<p>A &#8220;carriage return&#8221; comes from a printer where the head would move back and forth on the roller.  CR would tell the printer to move the head all the way to the left of its printing area.  A &#8220;line feed&#8221; is also from a printer perspective.  This tells the printer to roll the paper so that the head will be writing on the next line.  These are both examples of unprintable characters.  You can probably think of others.  For a complete list of ASCII characters, you can check out this table in my <a href="http://www.theintegrationengineer.com/tool-box">toolbox</a>.</p>
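<p>As a quick illustration, this Python sketch prints the ASCII identifiers of a letter and of the two unprintable characters just described, and then checks which line ending a piece of data uses.  The helper name line_ending is made up for this example.</p>

```python
CR = "\r"   # carriage return
LF = "\n"   # line feed

print(ord("A"), ord(CR), ord(LF))   # 65 13 10


def line_ending(data):
    """Name the line-ending convention found in a bytes object."""
    if b"\r\n" in data:
        return "CRLF"   # typical of Windows
    if b"\n" in data:
        return "LF"     # typical of Unix and Linux
    if b"\r" in data:
        return "CR"     # classic Mac OS
    return "none"


print(line_ending(b"first line\r\nsecond line\r\n"))
print(line_ending(b"first line\nsecond line\n"))
```

<p>A check like this is handy when a file has moved between Windows and Unix servers and a parser starts complaining about line endings.</p>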
<p><strong>What is EBCDIC?</strong></p>
<p>EBCDIC is the acronym for Extended Binary Coded Decimal Interchange Code.  This was created by IBM back in the day.  Most systems now use ASCII, but there are legacies that are still with us.  IBM mainframes, terminals in the IBM 3270 family, and some legacy communications equipment still expect messages using an EBCDIC character set.</p>
<p><strong>What is the big deal?</strong></p>
<p>As I said, some systems still want to use some of the characters in the EBCDIC system.  Even fancy new systems producing XML will sometimes fall into this trap and cause problems.  The one that I have run into is the use of |, called &#8216;pipe&#8217;.  ASCII and EBCDIC use different character IDs for this character.  And I have seen e-commerce systems that are using ASCII for everything else throw in an EBCDIC pipe as a control character.  When this happens, other systems will choke on it.</p>
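<p>To put numbers on the pipe example, the Python sketch below looks up the identifier of | in ASCII and in cp037.  cp037 is only one EBCDIC code page, picked here as an assumption because it ships with Python; other EBCDIC code pages can place the pipe elsewhere.</p>

```python
PIPE = "|"

ascii_id = PIPE.encode("ascii")[0]    # identifier of the pipe in ASCII
ebcdic_id = PIPE.encode("cp037")[0]   # identifier of the pipe in EBCDIC code page 037

print(f"ASCII  pipe: 0x{ascii_id:02X}")    # 0x7C
print(f"EBCDIC pipe: 0x{ebcdic_id:02X}")   # 0x4F

# The EBCDIC identifier means something else entirely to an ASCII reader.
as_ascii = bytes([ebcdic_id]).decode("ascii")
print(as_ascii)   # O
```

<p>Quoting both identifiers like this is often the quickest way to show a trading partner exactly which character is meant.</p>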
<p>When you find yourself getting an invalid character message but the characters look fine, remember that there are some twists that may exist in the underlying character set.  If you can, manually replace the character with the character that it looks like (in my case the EBCDIC | with an ASCII |) and see if the parser likes the file now.  If it does, you have encountered the character set problem as I have.  This can be a difficult problem to solve if you have never encountered it.</p>
<p>If the character that is causing problems is not a pipe, you may want to look at IBM&#8217;s <a href="http://publib.boulder.ibm.com/infocenter/macxhelp/v6v81/index.jsp?topic=/com.ibm.xlf81m.doc/pgs/lr393.htm">ASCII to EBCDIC conversion table</a>.  The problem can also be difficult to communicate to others who have never encountered it, so using the ASCII and EBCDIC identifiers for the character can help explain what we mean in email and documentation when we are trying to correct the issue.</p>
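<p>Before replacing anything by hand, it can help to let a script point at the bytes that are not plain printable ASCII.  This is a minimal Python sketch of that idea; the function name and the sample record are hypothetical, and it will not catch a stray byte that happens to land on a printable ASCII identifier.</p>

```python
def suspicious_bytes(data):
    """Return (offset, value) pairs for bytes that are not plain printable ASCII."""
    allowed = {0x09, 0x0A, 0x0D}      # tab, line feed, carriage return
    printable = range(0x20, 0x7F)     # space through tilde
    found = []
    for offset, value in enumerate(data):
        if value not in printable and value not in allowed:
            found.append((offset, value))
    return found


# Hypothetical record with one stray byte in the middle.
sample = b"ISA|00|\xa6|ZZ\r\n"

for offset, value in suspicious_bytes(sample):
    print(f"offset {offset}: 0x{value:02X}")
```

<p>With the offset in hand you can open the file in an editor that shows hex values and look at the exact character the parser is rejecting.</p>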
]]></content:encoded>
			<wfw:commentRss>http://www.theintegrationengineer.com/ascii-and-epsidic/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Flat Files</title>
		<link>http://www.theintegrationengineer.com/flat-files/</link>
		<comments>http://www.theintegrationengineer.com/flat-files/#comments</comments>
		<pubDate>Mon, 05 Jan 2009 18:38:47 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[Delimiters]]></category>
		<category><![CDATA[File]]></category>
		<category><![CDATA[Mapping]]></category>
		<category><![CDATA[b2b]]></category>
		<category><![CDATA[Character Delimited]]></category>
		<category><![CDATA[Comma Delimited]]></category>
		<category><![CDATA[csv]]></category>
		<category><![CDATA[Data Export]]></category>
		<category><![CDATA[Data Import]]></category>
		<category><![CDATA[delimiter]]></category>
		<category><![CDATA[EDI]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[Fixed Position]]></category>
		<category><![CDATA[Fixed Width Files]]></category>
		<category><![CDATA[Space]]></category>
		<category><![CDATA[spreadsheet]]></category>
		<category><![CDATA[White Space]]></category>

		<guid isPermaLink="false">http://www.theintegrationengineer.com/?p=5</guid>
		<description><![CDATA[What is a flat file?
Files are called &#8220;Flat Files&#8221; when they contain a single data structure.  Generally this structure is the column and row structure like a spreadsheet or table, but a file in binary or encrypted with a single encryption key could also be called a flat file.  Files that are not flat; marked [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-75" title="Flat File" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/01/flatfile.jpg" alt="flatfile Flat Files" width="160" height="120" /><strong>What is a flat file?</strong></p>
<p>Files are called &#8220;Flat Files&#8221; when they contain a single data structure.  Generally this structure is the column and row structure like a spreadsheet or table, but a file in binary or encrypted with a single encryption key could also be called a flat file.  Files that are not flat include marked up files like XML or HTML, <a href="http://www.theintegrationengineer.com/what-is-edi/">EDI</a> files, and other formats like HL7 or SEF files.  Here I am going to briefly discuss two flat file types: Delimited Files and Fixed Width Files.<span id="more-5"></span></p>
<p><strong>What is a Delimited File?</strong></p>
<p>OK, to describe it briefly, a delimited file is a file where the data is organized in rows and columns.  Each row has a set of data, and each column has a type of data.  If it sounds like I am describing a spreadsheet, you are right on the money.  To make the columns, the fields in each row are separated by a character called a delimiter.  See the example below.</p>
<p><img class="aligncenter size-full wp-image-72" title="Illustration of Delimited Data" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/01/delimitedillustration.jpg" alt="delimitedillustration Flat Files" width="554" height="128" /></p>
<p>Tables of data and spreadsheets are both similar to a delimited file in the way they organize data.  In the delimited file all of the empty space, or white space is removed.  What we see here is a classic example of exporting a spreadsheet table as a comma delimited file.  In theory, this data can be imported by any other application that can read a delimited file.</p>
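<p>As a rough sketch of what &#8220;any other application that can read a delimited file&#8221; does, here is how Python&#8217;s standard csv module reads comma delimited data.  The sample rows and column names are made up for illustration and are not taken from the picture above.</p>

```python
import csv
import io

# Hypothetical sample data; the names and values are assumptions.
data = "Name,Bdate,Department\nAlice,19700131,Sales\nBob,19651204,IT\n"

rows = list(csv.reader(io.StringIO(data), delimiter=","))
header, records = rows[0], rows[1:]

for record in records:
    # Pair each value with its column name.
    print(dict(zip(header, record)))
```

<p>Changing the delimiter argument is all it takes to read a pipe or tab delimited file instead.</p>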
<p><em>Believe it or not, a space is a character, and takes up space in a file.  Back in the day people went out of their way to save space so that files could be sent over slow modem connections.</em></p>
<p><strong>What is a Fixed Width File?</strong></p>
<p>There is another type of file, called a Fixed Width or Fixed Position file.  It is different from a delimited file in that the data fields are defined by their character positions.  See the example below.</p>
<p><img class="aligncenter size-full wp-image-73" title="Fixed Width File Illustration" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/01/wffileillustration.jpg" alt="wffileillustration Flat Files" width="570" height="132" /></p>
<p>In a fixed width file, the delimiter characters are eliminated.  If the data is formulated such that the data fields are the same size, this format can be more compact than a delimited file. You can see here that we know the size of the Birthdate data, so we eliminate all the spaces between the Bdate and Department fields.  If all of the data was formatted for size like this, we could really make this file small, so that it only contains the data.</p>
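<p>Reading a fixed width record comes down to cutting each line at known positions.  The Python sketch below assumes a made-up layout of a 10 character name, an 8 character birthdate and a 5 character department; a real file would come with its own layout specification.</p>

```python
# Hypothetical layout: (field name, start, end) as 0-based character positions.
LAYOUT = [("Name", 0, 10), ("Bdate", 10, 18), ("Department", 18, 23)]


def parse_fixed(line):
    """Cut one fixed width record into fields and trim the padding."""
    return {name: line[start:end].strip() for name, start, end in LAYOUT}


# Build a sample record by padding each value to its field width.
record = "Alice".ljust(10) + "19700131" + "Sales".ljust(5)

print(repr(record))
print(parse_fixed(record))
```

<p>Notice that there is no delimiter anywhere in the record; the layout alone tells the parser where one field stops and the next begins.</p>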
<p>We also eliminate the pesky problem of delimiters found in the data, such as a comma delimited file containing a field that has a comma in it.  How does the parser know that this comma is not really a delimiter, but is part of the data?  That problem is eliminated in a fixed width file.</p>
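<p>For delimited files, one common workaround (not covered in this post, so take it as an aside) is to wrap such a field in quotes.  The Python sketch below shows the problem and the workaround using the standard csv module; the sample record is hypothetical.</p>

```python
import csv
import io

# Hypothetical record; the name field contains a comma.
row = ["Smith, John", "19700131", "Sales"]

buffer = io.StringIO()
csv.writer(buffer).writerow(row)   # the writer quotes the field with the comma
line = buffer.getvalue()
print(line, end="")

# A naive split sees four fields; a CSV-aware reader sees three.
naive = line.strip().split(",")
parsed = next(csv.reader(io.StringIO(line)))

print(len(naive), len(parsed))
```

<p>This only works when both sides agree on the quoting rules, which is exactly the kind of agreement a fixed width file does not need.</p>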
<p><strong>Comparison</strong></p>
<p>This is not a contest of which format is superior.  Both file architectures are useful, and both are used commonly enough that you need to be at ease working with both.  Delimited files are really easy to work with as long as your data is clean of the delimiter character.  For the quick integration of data common in ETL tasks, delimited files are far more common than Fixed Width.  Continuous data integration and import operations often find that Fixed Width or Fixed Position files are more reliable for unattended operation, and that includes ETL if it is unattended.</p>
<p>As with many things in integration work, we want to pick the best option.  Knowing and working with both fixed and delimited files will help you determine which is the right choice for the task you have before you.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theintegrationengineer.com/flat-files/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
