<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Integration Engineer &#187; Data</title>
	<atom:link href="http://www.theintegrationengineer.com/category/data/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.theintegrationengineer.com</link>
	<description>When it just has to work.</description>
	<lastBuildDate>Tue, 27 Jul 2010 17:33:47 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=abc</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>What&#8217;s the DIFF?</title>
		<link>http://www.theintegrationengineer.com/whats-the-diff/</link>
		<comments>http://www.theintegrationengineer.com/whats-the-diff/#comments</comments>
		<pubDate>Wed, 18 Nov 2009 14:50:18 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[File]]></category>
		<category><![CDATA[change]]></category>
		<category><![CDATA[compare]]></category>
		<category><![CDATA[diff]]></category>
		<category><![CDATA[difference]]></category>
		<category><![CDATA[directory]]></category>
		<category><![CDATA[tool]]></category>

		<guid isPermaLink="false">http://www.theintegrationengineer.com/?p=517</guid>
		<description><![CDATA[
One of the basic tasks Integration Engineers do is to compare files that we use or receive.  There are some interesting and useful tools that people can get out there to DIFF files.  But on Linux and Unix machines around the world there is a native tool that is almost always present.  Amazingly it is [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-518" title="apple-and-orange_pzl" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/08/apple-and-orange_pzl.jpg" alt="apple and orange pzl Whats the DIFF?" width="191" height="159" /></p>
<p>One of the basic tasks Integration Engineers do is to compare files that we use or receive.  There are some interesting and useful tools that people can get out there to DIFF files.  But on Linux and Unix machines around the world there is a native tool that is almost always present.  Amazingly it is called DIFF.</p>
<p>Like some other command-line tools, its interface is not really intuitive.  Let&#8217;s walk through the basics of how to get use out of this handy file comparison tool.  (If you are working with and comparing EDI files, you might want to look at the <a href="http://www.theintegrationengineer.com/edi-wrapped-and-unwrapped/">post on how to unwrap</a> your EDI file so that your line-by-line comparison is more meaningful.)</p>
<p><span id="more-517"></span></p>
<p><strong>How to use &#8220;diff&#8221;</strong></p>
<p>You can get the basic help and the list of options by executing &#8220;diff --help&#8221;.  In short, here is the thumbnail: &#8220;diff&#8221; is followed by any options, each designated by a &#8220;-&#8221; and a letter, and then by the names of the two files being compared.  Let&#8217;s look at an example.</p>
<p><span style="text-decoration: underline;"><em>Example:</em></span></p>
<p>We have two files, file1.txt and file2.txt</p>
<table style="height: 151px;" border="1" cellspacing="5" cellpadding="5" width="477">
<tbody>
<tr style="text-align: center;">
<th>file1.txt</th>
<th>file2.txt</th>
</tr>
<tr>
<td width="50%">This is a test file:<br />
And this is the first line of the first file.<br />
Thanks.</td>
<td>This is a test file:<br />
And this is the first line of the second file.<br />
Thanks.<br />
Again.</td>
</tr>
</tbody>
</table>
<p>When we issue this command:  &#8220;diff file1.txt file2.txt&#8221; we get this result.</p>
<p style="padding-left: 60px;">2c2<br />
&lt; And this is the first line of the first file.<br />
---<br />
&gt; And this is the first line of the second file.<br />
3a4<br />
&gt; Again.</p>
<ul>
<li>The first thing we see is &#8220;2c2&#8221;.  This means that line 2 of the first file must be changed to match line 2 of the second file.</li>
<li>Next we have a &lt; indicating the first file, followed by that line echoed.</li>
<li>Following this we have &#8220;---&#8221; as a separator between the lines being compared.</li>
<li>Next we have &gt; indicating the second file, followed by that line echoed.</li>
<li>Together, these lines show a comparison between two lines that were found to be different.</li>
<li>For the next difference, we have &#8220;3a4&#8221;, which indicates that after line 3 of the first file a line was added: line 4 of the second file.</li>
<li>Finally, &gt; indicates the second file, followed by the added line echoed.</li>
</ul>
<p>If we were to compare them in the other order, we end with these two lines:</p>
<p style="padding-left: 60px;">4d3<br />
&lt; Again.</p>
<ul>
<li>Here, &#8220;4d3&#8221; means that the 4th line of the first file is deleted; it would have followed line 3 of the second file.</li>
<li>Following this is &lt; indicating the first file, and echoing the line.</li>
</ul>
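<p>If the change commands still feel cryptic, it can help to generate them yourself.  The following is only a rough sketch in Python, not how diff itself is implemented: it leans on the standard library&#8217;s difflib.SequenceMatcher, and the helper names line_range and normal_diff are made up for this illustration.</p>

```python
from difflib import SequenceMatcher


def line_range(start, stop):
    """Format a 0-based, half-open range as diff's 1-based 'N' or 'N,M'."""
    first, last = start + 1, stop
    return str(first) if first == last else f"{first},{last}"


def normal_diff(old, new):
    """Return the lines of a report in diff's default ("normal") format."""
    report = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag == "replace":
            # Lines of the first file changed into lines of the second file.
            report.append(f"{line_range(i1, i2)}c{line_range(j1, j2)}")
        elif tag == "delete":
            # Lines of the first file that would have followed line j1 of the second.
            report.append(f"{line_range(i1, i2)}d{j1}")
        elif tag == "insert":
            # Lines of the second file added after line i1 of the first.
            report.append(f"{i1}a{line_range(j1, j2)}")
        report.extend("< " + line for line in old[i1:i2])
        if tag == "replace":
            report.append("---")
        report.extend("> " + line for line in new[j1:j2])
    return report


file1 = [
    "This is a test file:",
    "And this is the first line of the first file.",
    "Thanks.",
]
file2 = [
    "This is a test file:",
    "And this is the first line of the second file.",
    "Thanks.",
    "Again.",
]

print("\n".join(normal_diff(file1, file2)))
```

<p>For the two example files this prints the same six lines as the listing above, and swapping the arguments ends with the &#8220;4d3&#8221; pair.  On larger files the real tool may group changes differently than this sketch does.</p>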
<p><strong>Regular Options</strong></p>
<p>Here is the list of options that &#8220;--help&#8221; gives you, with some added explanation.</p>
<p><em>diff [-b] [-i] [-t] [-w] [-c] [-C] [-e] [-f] [-h] [-n] [-D string] [-l] [-r] [-s] [-S name] [fileone filetwo ] [directoryone directorytwo]</em></p>
<table class="mtable" style="width: 100%;" border="0" cellspacing="1" cellpadding="5">
<tbody>
<tr class="tcw">
<td style="width: 120px;" valign="top">-b</td>
<td valign="top">Ignores spacing differences.  This is useful when white-space doesn&#8217;t matter in what you are comparing.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-i</td>
<td valign="top">Ignores case.  This is useful when case doesn&#8217;t matter in what you are comparing.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-t</td>
<td valign="top">Expands TAB characters in output lines. Normal or -c output adds character(s) to the front of each  line that may adversely affect the indentation of the original source lines and make the output lines difficult to interpret. This option will preserve the original source&#8217;s indentation.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-w</td>
<td valign="top">Ignores spaces and tabs.  Again, for when we don&#8217;t want to include changes in the white-space.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-c</td>
<td valign="top">Produces a listing of differences with three lines of context. With this option the output format is modified slightly: output begins with identification of the files involved and their modification times, then each change is separated by a line of *&#8217;s. The lines removed from file1 are marked with &#8216;-&#8217;; those added to file2 are marked &#8216;+&#8217;. Lines that are changed from one file to the other are marked in both files with &#8216;!&#8217;.</p>
<p>With our two files we get this output:</p>
<p>*** file1.txt    2009-11-17 10:20:38.000000000 -0700<br />
--- file2.txt    2009-11-17 10:20:51.000000000 -0700<br />
***************<br />
*** 1,3 ****<br />
This is a test file:<br />
! And this is the first line of the first file.<br />
Thanks.<br />
--- 1,4 ----<br />
This is a test file:<br />
! And this is the first line of the second file.<br />
Thanks.<br />
+ Again.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-C</td>
<td valign="top">Used as &#8220;-C number&#8221;, this produces a listing of differences identical to that produced by -c, but with number lines of context.</p>
<p>With our example files, supplying a number makes no difference compared to plain -c, e.g. diff -C 1 file1.txt file2.txt</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-e</td>
<td valign="top">Output an ed script.  I have looked at these, but really haven&#8217;t used this feature for anything real.  I may later if I have time.</p>
<p>With our files it looks like this:</p>
<p>3a<br />
Again.<br />
.<br />
2c<br />
And this is the first line of the second file.<br />
.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-f</td>
<td valign="top">Produces a similar script, not useful with ed, in the opposite order.  (Really, this is exactly like -e except in reverse order.)</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-h</td>
<td valign="top">Does a fast, half-hearted job. It works only when changed stretches are short and well separated, but does work on files of unlimited length.  Options -c, -e, -f, and -n are unavailable with -h. diff does not descend into directories with this option.</p>
<p>With our example files it produces the same output as with no options.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-n</td>
<td valign="top">Produces a script similar to -e, but in the opposite order and with a count of changed  lines on each insert or delete command.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-D string</td>
<td valign="top">Creates a merged version of file1 and file2 with C preprocessor controls included so that a compilation of the result without defining string is equivalent to compiling file1, while defining string will yield file2.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-l</td>
<td valign="top">Produce output in long format. Before the diff, each text file is piped through &#8216;pr&#8217; to paginate it. Other differences are remembered and summarized after all text file differences are reported.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-r</td>
<td valign="top">Applies diff recursively to common subdirectories encountered.  Just like you would expect if you have ever used this with any other command line tools like grep or rm.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-s</td>
<td valign="top">Reports files that are identical; these would not otherwise be mentioned.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-S name</td>
<td valign="top">Starts a directory diff in the middle, beginning with the file name.  Basically this is a compare directory after a supplied file name.  Make sure this file exists in both directories or you will be disappointed.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">fileone</td>
<td valign="top">File one for comparing.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">filetwo</td>
<td valign="top">File two for comparing.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">directoryone</td>
<td valign="top">Directory one for comparing.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">directorytwo</td>
<td valign="top">Directory two for comparing.</td>
</tr>
</tbody>
</table>
<p>For comparing files, the first four options (-b, -i, -t, -w) are the most useful.  I start with no options and add them as I need them to reduce the amount of change noise reported in the result set.</p>
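<p>To get a feel for what the noise-reducing options do, here is a small Python sketch of the idea: each line is reduced to a normalized form before it is compared.  This is only an approximation of -b, -w and -i, and the function and parameter names are invented for this example.</p>

```python
import re


def normalize(line, ignore_case=False, ignore_space_change=False, ignore_all_space=False):
    """Reduce a line to the form that gets compared."""
    if ignore_all_space:
        # Roughly -w: whitespace never counts.
        line = re.sub(r"\s+", "", line)
    elif ignore_space_change:
        # Roughly -b: runs of whitespace count as one space, trailing whitespace is dropped.
        line = re.sub(r"\s+", " ", line).rstrip()
    if ignore_case:
        # Roughly -i: upper and lower case compare as equal.
        line = line.lower()
    return line


def lines_match(left, right, **options):
    """Compare two lines after applying the same normalization to both."""
    return normalize(left, **options) == normalize(right, **options)


print(lines_match("PO  Number:\t1234", "PO Number: 1234"))
print(lines_match("PO  Number:\t1234", "PO Number: 1234", ignore_space_change=True))
print(lines_match("PO Number: 1234", "PONumber:1234", ignore_all_space=True))
print(lines_match("Thanks.", "THANKS.", ignore_case=True))
```

<p>The first comparison reports a difference and the other three do not, which mirrors the habit of adding one option at a time until only the changes you care about are left.</p>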
<p><strong>diff is your friend</strong></p>
<p>Like many basic tools, &#8220;diff&#8221; is almost always there.  And if you know how to use it effectively, it can really save time and frustration.  Sure, there are other cool file comparison tools.  Some are even embedded into other products.  But knowing how to use the basic tools that are always there will be a lifesaver in a crisis situation.  And the only way to know how to use them is to actually use them sometimes.</p>
<p>Do you use &#8220;diff&#8221; with a set of options that does a specific task for you?  If so, please share them.  And what other basic tools do you use?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theintegrationengineer.com/whats-the-diff/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Canonical Data Animation</title>
		<link>http://www.theintegrationengineer.com/canonical-data-animation/</link>
		<comments>http://www.theintegrationengineer.com/canonical-data-animation/#comments</comments>
		<pubDate>Tue, 06 Oct 2009 12:51:07 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[YouTube Posts]]></category>
		<category><![CDATA[canonical]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[File]]></category>
		<category><![CDATA[YouTube]]></category>

		<guid isPermaLink="false">http://www.theintegrationengineer.com/?p=728</guid>
		<description><![CDATA[Sometimes a picture can bring more clarity to a concept.  For Canonical Data, an animation is what is called for.  I found this animation of canonical data and its implementation.  I think the first minute and a half paint a very good picture of how canonical data is implemented and can be leveraged.  [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-786" title="cannon_pzl" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/10/cannon_pzl.JPG" alt=" Canonical Data Animation" width="129" height="102" />Sometimes a picture can bring more clarity to a concept.  For Canonical Data, an animation is what is called for.  I found this animation of canonical data and its implementation.  I think the first minute and a half paint a very good picture of how canonical data is implemented and can be leveraged.  Later in the animation they start to describe a global vision of implementation.  Unfortunately I must disagree with this vision.  I don&#8217;t think that having a global canonical form of data will ever truly be a solution that works.</p>
<p><span id="more-728"></span></p>
<p>Take a look at this and tell me what you think.<br />
<center><br />
<object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="344" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://www.youtube.com/v/mj-kCFzF0ME&amp;color1=0xb1b1b1&amp;color2=0xcfcfcf&amp;hl=en&amp;feature=player_embedded&amp;fs=1" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="425" height="344" src="http://www.youtube.com/v/mj-kCFzF0ME&amp;color1=0xb1b1b1&amp;color2=0xcfcfcf&amp;hl=en&amp;feature=player_embedded&amp;fs=1" allowscriptaccess="always" allowfullscreen="true"></embed></object><br />
</center><br />
The illustration of data that is passed between two applications, with gaps and overflow relative to the data needed by a third-party application, is all too real.  And this is a good representation of what can be done using a Canonical format to handle the needs of all of the parties.</p>
<p>There are two options with different benefits that I have used.  One is to build an actual file format that holds your canonical data.  Another is to use a database to act as your canonical.  Both of these have trade-offs.</p>
<p><strong>File based Canonical:</strong></p>
<p>To make a canonical file format you will need to pick the type of file that you want all of your applications to be receiving, either directly or through an adapter of some kind.  Building a flat file or XML format from scratch is the most flexible, but requires you to do a bunch of planning.  And after it is done, this format must be maintained.</p>
<p>Using a file based canonical does allow you a fairly easy way to find the state of a failed step, as you can look at the file and identify what is wrong.  You can also correct the data there and allow the process to continue.  You can also make copies of these files for your monitoring so that tracking your data and transactions becomes easy, and your performance metrics become rich with data.</p>
<p><strong>Database based Canonical:</strong></p>
<p>Sometimes people, DBAs especially, get excited when we talk about doing this.  They are visualizing one massive Canonical Database that holds all of the transactions and is accessed by all of the applications.  And there are many products that work this way internally.  But this is not the only approach.</p>
<p>The one database to rule all canonical data, or as I like to call it &#8220;Lord of the Databases&#8221; (LOTDB), requires a DBA to pay attention to optimizing, backing up, and all of the other care and feeding tasks that go along with having a database that you maintain for the long haul.  This is efficient in that you can get all of your performance data from one place, and monitoring is one connection.  However, sometimes applications have limitations in how they talk to a database that is not theirs, and this can make implementation complex.</p>
<p>Another way to use a Database based Canonical model is to use disposable databases.  In the Disposable Database implementation, you create a database that is small, only contains the structures and tables for the one transaction, and gets destroyed at the end of the transaction life cycle.  Using the Disposable Database, you don&#8217;t ever have to optimize them, back them up, or any of the other care and feeding tasks that are part of the LOTDB implementation.</p>
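<p>To make the Disposable Database idea concrete, here is a minimal Python sketch that uses an in-memory SQLite database as the throwaway store.  SQLite, the table layout, and the process_order function are all assumptions made for this illustration; your own transaction structures and database engine will differ.</p>

```python
import sqlite3
from contextlib import closing


def process_order(po_number, line_items):
    """Load one transaction into a throwaway database and return its total in cents."""
    # ":memory:" gives a private database that disappears when the connection closes,
    # so there is nothing to optimize, back up, or clean out afterwards.
    with closing(sqlite3.connect(":memory:")) as db:
        db.execute("CREATE TABLE order_header (po_number TEXT PRIMARY KEY)")
        db.execute(
            "CREATE TABLE order_line ("
            "po_number TEXT REFERENCES order_header(po_number), "
            "sku TEXT, quantity INTEGER, unit_price_cents INTEGER)"
        )
        db.execute("INSERT INTO order_header VALUES (?)", (po_number,))
        db.executemany(
            "INSERT INTO order_line VALUES (?, ?, ?, ?)",
            [(po_number, sku, qty, price) for sku, qty, price in line_items],
        )
        # Any step of the transaction can now query the canonical data with plain SQL.
        (total,) = db.execute(
            "SELECT SUM(quantity * unit_price_cents) FROM order_line WHERE po_number = ?",
            (po_number,),
        ).fetchone()
        return total


total_cents = process_order("PO-1001", [("WIDGET", 2, 1250), ("GADGET", 1, 999)])
print(total_cents)
```

<p>The trade-off shows up right away: once the connection closes, the data is gone, so anything you want for monitoring or metrics has to be copied out before the end of the transaction life cycle.</p>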
<p><strong>Comparative implementations:</strong></p>
<p>I want to examine and compare the File vs DB canonical implementations in more detail in another article.  If you have another Canonical implementation that I haven&#8217;t seen, please let me know.  I would love to examine that as well.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theintegrationengineer.com/canonical-data-animation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Canonical Data</title>
		<link>http://www.theintegrationengineer.com/canonical-data/</link>
		<comments>http://www.theintegrationengineer.com/canonical-data/#comments</comments>
		<pubDate>Tue, 29 Sep 2009 22:18:48 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[canonical]]></category>
		<category><![CDATA[application  data]]></category>
		<category><![CDATA[Data]]></category>
		<category><![CDATA[Data-set]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://www.theintegrationengineer.com/?p=690</guid>
		<description><![CDATA[Like the exchange of data between Sellers and Suppliers, the exchange of data within the company is vital to successful processing of transactions.  This could be the Seller or the Supplier, it doesn&#8217;t really matter.  As data passes from the processing application, to the external interface.  We have a concept that is called a Canonical [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-719" title="emergency_traffic_cone_pzl" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/09/emergency_traffic_cone_pzl.png" alt="emergency traffic cone pzl Canonical Data" width="76" height="85" />Like the exchange of data between Sellers and Suppliers, the exchange of data within the company is vital to successful processing of transactions.  This could be the Seller or the Supplier, it doesn&#8217;t really matter.  As data passes from the processing application, to the external interface.  We have a concept that is called a Canonical form of data.  Canonical data is the data that is required and essential to completing your business.  I have written about this in passing before, but today I want to talk about what goes in it, and what it looks like.<span id="more-690"></span></p>
<p><strong>What is Canonical Data?</strong></p>
<p>Canonical Data is the data that you or your business needs to complete the job you have to do with it.  If we are talking about order processing, then the Canonical Data will comprise the data that is used to complete the order.  This is not the minimal data set; this is the whole data set.  It does not necessarily include the EDI enveloping data: it may instead hold identifiers that relate to the TPIDs, and it may omit control numbers and other things found only in the EDI file.  But it will contain the information that you need for the order: PO Number, Line Item data, and so forth.  All of this data will need to be passed from your integration, translation, and/or communication software to your order processing systems.  And it may be passed around between systems as the order is filled and processed, or whatever else you are doing with it.</p>
<p>What this data is gets defined by the requirements of your internal systems, software and data needs.  One company may take shipping data and stick it into a database outside of the canonical data.  It gets retrieved later, but the internal applications don&#8217;t have to deal with and preserve it.  Others may include it and leverage it in some internal aspects of their order fulfilment.  But the Canonical Dataset that your business needs and uses will be defined by the internal systems so that they can do what you need them to do.</p>
<p><strong>What form should it take?</strong></p>
<p>This is largely a choice that is up to you and your organization.  There will probably be some formats that are native to some applications.  As long as these fill the requirement of encompassing the data that you need to process, they are fine choices.  Also, since you will be working with these internally through upgrades to 3rd party applications and the possible inclusion of new applications, a flexible format that can be updated is important.</p>
<p>In my experience I have seen flat file and XML used with a good deal of success.  EDI may have a comprehensive set of data, but can become really clumsy when working with internal applications that need to be extended by adding data to the canonical file.  So I would discourage using EDI or a canonized XML like cXML as the canonical format.</p>
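<p>As a rough illustration of a home-grown XML canonical, the Python sketch below defines a tiny order record and writes it out as XML.  The element names, the version attribute, and the fields chosen are all hypothetical; a real canonical would carry whatever your internal systems need.</p>

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class LineItem:
    sku: str
    quantity: int


@dataclass
class CanonicalOrder:
    po_number: str
    trading_partner_id: str
    line_items: list = field(default_factory=list)


def to_xml(order):
    """Serialize a CanonicalOrder into a small, internal-only XML document."""
    # A version attribute leaves room to extend the format later.
    root = ET.Element("CanonicalOrder", version="1")
    ET.SubElement(root, "PONumber").text = order.po_number
    ET.SubElement(root, "TradingPartnerId").text = order.trading_partner_id
    items = ET.SubElement(root, "LineItems")
    for item in order.line_items:
        ET.SubElement(items, "LineItem", sku=item.sku, quantity=str(item.quantity))
    return ET.tostring(root, encoding="unicode")


order = CanonicalOrder("PO-1001", "TP-42", [LineItem("WIDGET", 2)])
document = to_xml(order)
print(document)
```

<p>Because the format is your own, adding a field for a new internal application is a small change to one record definition, which is the flexibility argument made above.</p>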
<p><strong>Who sees the Canonical Documents?</strong></p>
<p>Please keep your Canonical Documents for internal consumption only.  It is not that they contain secret information (though they may have confidential data), but they are designed just for you and your applications.  I worked with a supply chain company once that developed their internal XML format, and then published it out to a couple of trading partners.  These partners integrated to this.  Eighteen months later, when the company was doing some growing, they were stuck supporting this one-off format that was no longer used as the internal canonical.</p>
<p>Developers and those supporting the applications are going to see and deal with the canonical data and data files.  Monitoring and reporting systems may represent the canonical data, but it is rare that you need to invest time in bringing people outside of this group up to speed on your canonical file format.</p>
<p><strong>What have you seen done?</strong></p>
<p>Does your company have a formal canonical dataset?  Or what have you seen done that works to fill the need of canonical data?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theintegrationengineer.com/canonical-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mapping Exercises: EDI Invoice to Open Office Tables (part One)</title>
		<link>http://www.theintegrationengineer.com/mapping-excersizes-edi-invoice-to-open-office-tables-part-one/</link>
		<comments>http://www.theintegrationengineer.com/mapping-excersizes-edi-invoice-to-open-office-tables-part-one/#comments</comments>
		<pubDate>Tue, 07 Jul 2009 17:30:24 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[Database]]></category>
		<category><![CDATA[EDI]]></category>
		<category><![CDATA[Mapping]]></category>
		<category><![CDATA[Mapping Exercise]]></category>
		<category><![CDATA[Invoice]]></category>
		<category><![CDATA[Open Office]]></category>
		<category><![CDATA[Paper Map]]></category>
		<category><![CDATA[target]]></category>

		<guid isPermaLink="false">http://www.theintegrationengineer.com/?p=37</guid>
		<description><![CDATA[This is a mapping exercise that will go through the process of creating a paper map, or mapping document.  We will start with an empty paper map that you can get here.  And we will end with a completed paper map document that documents what data from the source goes into what fields on the [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-162" title="math" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/06/math.jpg" alt="math Mapping Excersizes: EDI Invoice to Open Office Tables (part One)" width="173" height="173" />This is a mapping exercise that will go through the process of creating a paper map, or mapping document.  We will start with an empty paper map that you can get <a href="http://www.theintegrationengineer.com/tool-box/#papermap">here</a>.  And we will end with a completed paper map document that documents what data from the source goes into what fields on the target.  This process will take more than one post, and I will link them together so that you can follow from one to the next.  Along the way, we will discuss the things that we are doing so that you can apply this technique in your mapping using the target and source in your own mapping tasks.<span id="more-37"></span></p>
<p><strong>The Target</strong></p>
<p>The Open Office database is divided into two tables: <em>Invoice</em> and <em>Invoice Details</em>.  This can be mapped in two ways.  The first way is to map the data into one common format and rely on whatever ETL tool is importing the data to catch and split the data.  The second is to acquire or construct a key in the transformation and then divide the data into matching input formats.  Then when these inputs are moved into the database, they will relate to each other on this key.<a href="http://65e92d0uv89gefp2xcimn8dp2a.hop.clickbank.net/" target="_top"><img class="alignright size-full wp-image-116" title="ssn_databasejpeg" src="http://www.databasedesign-resource.com/images/NormalizationBook.jpg" alt="Paper Database" width="176" height="338" /></a></p>
<p>The choice of how you will do this will depend on your environment.  Ask questions like, &#8220;Will I have enough data to provide a unique key?&#8221; or &#8220;Is there a way to get a key with an API call or database query?&#8221;  The answers to these questions will determine what course you will take.</p>
<p>If the system ultimately receiving the data is asynchronous to the transformation, and you need to send the invoice and invoice details data separately, some care needs to be taken to ensure that the data can be related after it is separated.</p>
<p>So what data in the invoice can be used to tie the invoice to the invoice details?  The first answer might be, &#8220;The Invoice Number.&#8221;  But this number is not guaranteed to be unique across multiple vendors.  In EDI and cXML there are document unique identifiers.  Since we are using EDI, we can use a combination of the ISA Sender, Receiver, and Control number.  We will also want to use the GS control number, and ST control number in the event that more than one invoice is sent in a single <a href="http://www.theintegrationengineer.com/edi-enveloping-part-one/">envelope</a>.  And we might as well tack on the actual invoice number from the BIG_02.</p>
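<p>As a sketch of how such a key could be assembled, the Python below joins the envelope values and the invoice number into one string.  The class, the field names, and the &#8220;|&#8221; separator are assumptions made for this example, and the values are presumed to have been pulled out of the ISA, GS, ST and BIG segments already.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InvoiceEnvelope:
    isa_sender: str
    isa_receiver: str
    isa_control: str
    gs_control: str
    st_control: str
    invoice_number: str  # taken from BIG_02


def invoice_key(envelope, separator="|"):
    """Build one key that ties an Invoice row to its Invoice Details rows."""
    parts = (
        envelope.isa_sender,
        envelope.isa_receiver,
        envelope.isa_control,
        envelope.gs_control,
        envelope.st_control,
        envelope.invoice_number,
    )
    # ISA sender and receiver IDs are fixed width and space padded, so trim first.
    return separator.join(part.strip() for part in parts)


first = InvoiceEnvelope("SENDERID       ", "RECEIVERID     ", "000000101", "101", "0001", "INV-789")
second = InvoiceEnvelope("SENDERID       ", "RECEIVERID     ", "000000101", "101", "0002", "INV-790")

print(invoice_key(first))
print(invoice_key(second))
```

<p>The two sample invoices travel in the same interchange and group, so only the ST control number and the invoice number tell them apart.  Write the same key to both output &#8220;files&#8221; and the rows can be matched up again after they have been separated.</p>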
<p style="padding-left: 30px;"><em><strong>Database Tables</strong></em></p>
<table border="0" cellspacing="10">
<tbody>
<tr valign="top">
<td>
<p style="padding-left: 30px;">Invoice Table:</p>
</td>
<td>
<p style="padding-left: 30px;">Invoice Details Table:</p>
</td>
</tr>
<tr valign="top">
<td><img class="aligntopsize-full wp-image-54" title="Invoice Table Definition" src="http://www.theintegrationengineer.com/wp-content/uploads/2008/11/invoicetabledef.jpg" alt="invoicetabledef Mapping Excersizes: EDI Invoice to Open Office Tables (part One)" width="203" height="205" /></td>
<td><img class="aligntop size-full wp-image-55" title="Invoice Details Table Definition" src="http://www.theintegrationengineer.com/wp-content/uploads/2008/11/invoicedetailstabledef.jpg" alt="invoicedetailstabledef Mapping Excersizes: EDI Invoice to Open Office Tables (part One)" width="203" height="148" /></td>
</tr>
</tbody>
</table>
<p><strong>The Paper Map</strong></p>
<p>Now that we know what the target looks like, we fill out the target side of the paper map.  We will create two &#8220;files&#8221; in our output, the Invoice file and the InvoiceDetails file, but we can use one paper map for both and distinguish them with a bar between the two &#8220;files&#8221;.  (I am saying files, but this could be a queue, a post, an insert over ODBC, etc.)</p>
<p style="text-align: center;"><img class="size-full wp-image-181 aligncenter" title="invoiceMap_target" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/06/invoiceMap_target.png" alt="invoiceMap target Mapping Excersizes: EDI Invoice to Open Office Tables (part One)" width="281" height="435" /></p>
<p><strong>What&#8217;s Next</strong></p>
<p>Today we went through the process of identifying the target, and creating a paper map with the target format identified.  We talked about some of the strategy that we use in deciding what to map and how to map it.  Next time we will identify the source, and begin mapping data from the source.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theintegrationengineer.com/mapping-excersizes-edi-invoice-to-open-office-tables-part-one/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>5 Tools of an Integration Engineer</title>
		<link>http://www.theintegrationengineer.com/5-tools-of-an-integration-engineer/</link>
		<comments>http://www.theintegrationengineer.com/5-tools-of-an-integration-engineer/#comments</comments>
		<pubDate>Sat, 13 Jun 2009 17:30:32 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[Database]]></category>
		<category><![CDATA[File]]></category>
		<category><![CDATA[Calc]]></category>
		<category><![CDATA[editor]]></category>
		<category><![CDATA[Excel]]></category>
		<category><![CDATA[notepad]]></category>
		<category><![CDATA[Open Office]]></category>
		<category><![CDATA[spread sheet]]></category>
		<category><![CDATA[spreadsheet]]></category>
		<category><![CDATA[SQL Worksheet]]></category>
		<category><![CDATA[squirrel]]></category>
		<category><![CDATA[techrepublic]]></category>
		<category><![CDATA[text]]></category>
		<category><![CDATA[text pad]]></category>
		<category><![CDATA[textpad]]></category>
		<category><![CDATA[TOAD]]></category>
		<category><![CDATA[tools]]></category>
		<category><![CDATA[ultra Edit]]></category>
		<category><![CDATA[ultraedit]]></category>
		<category><![CDATA[vi]]></category>
		<category><![CDATA[white board]]></category>
		<category><![CDATA[whiteboard]]></category>

		<guid isPermaLink="false">http://www.theintegrationengineer.com/?p=115</guid>
		<description><![CDATA[There are job or task specific tools that will have a high importance to each integration task.  When working on an SAP system, your SAP tools will be very important.  But there are tools and skills that are also important regardless of the systems and technologies that you are working on.  For me, these are [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-144" title="tool_pile_puzzlepiece1" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/06/tool_pile_puzzlepiece1.jpg" alt="tool pile puzzlepiece1 5 Tools of an Integration Engineer" width="131" height="108" />There are job or task specific tools that will have a high importance to each integration task.  When working on an SAP system, your SAP tools will be very important.  But there are tools and skills that are also important regardless of the systems and technologies that you are working on.  For me, these are the top 5 tools that an Integration Engineer should be able to use proficiently.  Do you use any of these?  Do you have others?<span id="more-115"></span></p>
<p><strong>1.  A big Whiteboard</strong></p>
<p>This is probably my number one requirement.  When I am thinking, I like to draw it out.  I haven&#8217;t found an application that gives me the same creative release and adaptability as my whiteboard.  After starting to <img class="alignright size-medium wp-image-122" title="whiteboard" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/05/whiteboard-300x300.jpg" alt="whiteboard 300x300 5 Tools of an Integration Engineer" width="146" height="146" />do some of my work from home, I went out and acquired a 10 X 4 whiteboard to put in my home office.  Having the extra room is essential.  I can get a call, and walk over to the whiteboard and start drawing out what needs to be built to solve the problems while I am still on the call.</p>
<p>You may not need a whiteboard as big as mine.  And you may have an electronic solution that you like better.  If so, please let me know; I like to try out gadgets.</p>
<p><strong>2.  Spreadsheet</strong></p>
<p>Now this may not sound earth-shattering, but we are not just talking about the basics.  You will need to learn to write filters, import data, link cells and perform calculations.  If you think that a spreadsheet is just a ledger, then you are missing the power of a spreadsheet.  If you don&#8217;t think you have the skills you need, here are some links to tutorials for two popular spreadsheet packages.</p>
<ul>
<li>Very Basic Open Office Tutorial <a href="http://www.tutorialsforopenoffice.org/tutorial/Spreadsheet_Basics.html">http://www.tutorialsforopenoffice.org/</a></li>
<li>More specific/advanced tutorials <a href="http://openoffice.blogs.com/openoffice/">http://openoffice.blogs.com/openoffice/</a></li>
<li>Some basic and advanced help for MS Excel <a href="http://www.internet4classrooms.com/on-line_excel.htm">http://www.internet4classrooms.com</a></li>
</ul>
<p><strong>3.  Text editor.</strong></p>
<p>Familiarity with more than one is needed, as you will find yourself on Windows and Unix servers and they will have different sets of tools.  One of the most frustrating things is not knowing how to use the native editor of the system you are on.  So get familiar with Notepad, and then get familiar with vi.  You can add in other tools like UltraEdit, TextPad, and more, but you should know the native ones first, if not best.</p>
<ul>
<li>Ultra Edit is a popular tool in some circles.  <a href="http://www.ultraedit.com/">http://www.ultraedit.com/</a></li>
<li>Text Pad is also popular.  <a href="http://www.textpad.com/">http://www.textpad.com/</a></li>
<li>Believe it or not, notepad has some tutorials.  <a href="http://bink.nu/news/notepad-tips-and-tricks.aspx">http://bink.nu/news</a></li>
<li>And here is a cheat sheet for VI commands.  <a href="http://downloads.techrepublic.com.com/abstract.aspx?docid=172404">http://downloads.techrepublic.com</a></li>
</ul>
<p><strong>4.  File compare.</strong></p>
<p>One task that will need to be done is to compare a file with another file to detect changes or differences.  This happens both in new integrations and in troubleshooting or investigating existing ones.  Some systems have native comparison tools, others don&#8217;t.  And there is much variety in how they work.  Here are some of the tools I have seen and used.</p>
<ul>
<li><a href="http://en.wikipedia.org/wiki/Diff">Wikipedia</a> has a great list that I don&#8217;t think I can improve on.</li>
</ul>
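<p>Where no comparison tool is installed, a few lines of script can stand in for one.  Here is a minimal sketch using Python&#8217;s standard <code>difflib</code> module; the sample data and the <code>diff_lines</code> helper are made up for illustration.</p>

```python
import difflib

def diff_lines(old_text, new_text):
    """Return unified-diff lines describing how new_text differs from old_text."""
    return list(difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile="old", tofile="new", lineterm=""))

# Hypothetical sample data, not taken from a real integration.
old = "id,name\n1,Ann\n2,Bob\n"
new = "id,name\n1,Ann\n2,Rob\n"
changes = diff_lines(old, new)
print("\n".join(changes))
```

<p>Lines that start with a minus sign exist only in the old text, and lines that start with a plus sign exist only in the new one.</p>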
<p><strong>5.  DB query </strong></p>
<p>Much of the time, integration involves a database somewhere along the way.  To work on an integration without needing a DBA sitting in your lap, you need to be competent to query the database yourself.  And this involves using some tool.  Sometimes systems will have native tools, other times you will need to connect your own.  Here is a list of DB query tools that I have seen and used.</p>
<ul>
<li>Squirrel SQL <a href="http://squirrel-sql.sourceforge.net/">http://squirrel-sql.sourceforge.net/</a></li>
<li>SQL Worksheet</li>
<li><a href="http://www.toadsoft.com">TOAD</a></li>
</ul>
<p><strong>Summary</strong></p>
<p>This is by no means an attempt at a comprehensive list of tools.  Such a list would be long, if it were possible.  And I don&#8217;t think that it is.  There are, however, some tools and skills that help us to be more effective as Integration Engineers.  Seeing what others use is helpful, especially when we find that the tool we used to use is no longer around.</p>
<p>What tools, applications, or skills do you find you are falling back on often, on more than one project?  Or what new tools have you found that you think you will use often in the future?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theintegrationengineer.com/5-tools-of-an-integration-engineer/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What is a Database</title>
		<link>http://www.theintegrationengineer.com/what-is-a-database/</link>
		<comments>http://www.theintegrationengineer.com/what-is-a-database/#comments</comments>
		<pubDate>Mon, 08 Jun 2009 16:26:56 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[Database]]></category>
		<category><![CDATA[embeded database]]></category>
		<category><![CDATA[external database]]></category>
		<category><![CDATA[Hirarchial Database]]></category>
		<category><![CDATA[query]]></category>
		<category><![CDATA[RDBMS]]></category>
		<category><![CDATA[Relational Database]]></category>
		<category><![CDATA[schema]]></category>
		<category><![CDATA[Single Table]]></category>
		<category><![CDATA[spreadsheet database]]></category>
		<category><![CDATA[Table]]></category>
		<category><![CDATA[tree]]></category>

		<guid isPermaLink="false">http://www.theintegrationengineer.com/?p=106</guid>
		<description><![CDATA[Definition:  A database is a structured collection of records or data.
Many moons ago, I was shown a database that was constructed back in the 50s.  It was hand held and consisted of a stack of cards that contained information about plant biology.  There was a series of holes that wrapped around the cards, some of [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Definition</strong>:  A <a href="http://en.wikipedia.org/wiki/Database"><em>database</em></a> is a structured collection of records or data.</p>
<p>Many moons ago, I was shown a database that was constructed back in the 50s.  It was handheld and consisted of a stack of cards that contained information about plant biology.  There was a series of holes that wrapped around the edge of the cards.  Some of the holes were cut into notches, and others were not.</p>
<p>To query the database, one placed a pin through one of the holes and let the cards that had a notch at that position fall out of the stack.  You could continue this process by removing and adding pins to select different combinations of data.</p>
<p>Once you had narrowed your search, you could read the cards, and retrieve the data.</p>
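<p>The pin-and-notch query can be sketched in a few lines of Python.  The card names and notch positions below are hypothetical; the only assumption carried over from the description is that a card falls out when it is notched at the position the pin goes through.</p>

```python
# Each card records which hole positions have been cut into notches.
# Card names and positions are hypothetical.
cards = {
    "fern": {0, 2},
    "oak": {1, 2},
    "moss": {0},
}

def query(deck, pin_position):
    """Pin through one hole: notched cards fall out, the rest stay on the pin."""
    return {name: notches for name, notches in deck.items()
            if pin_position in notches}

step1 = query(cards, 2)      # cards notched at position 2
step2 = query(step1, 0)      # of those, cards also notched at position 0
print(sorted(step1), sorted(step2))
```

<p>Running the fallen cards through a second pin narrows the result, much like adding another condition to a query.</p>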
<p>This was a very manual process, and I doubt that anyone today would take the time to learn to use such a tool, let alone update or add records to it.<span id="more-106"></span></p>
<p><strong></strong><img class="alignleft size-full wp-image-116" title="ssn_databasejpeg" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/04/ssn_databasejpeg.jpeg" alt="Paper Database" width="250" height="200" /><strong>More Modern Databases</strong></p>
<p>Our electronic concept of a database has replaced card catalogs in libraries, Rolodexes on desktops, and filing rooms in offices.  At a rudimentary level, many simple documents also fit the definition of a database that we started with.</p>
<p>A post-it note is not a database, but a list of names and numbers on a contact list is.  It has regular structure, and the data can be retrieved using a routine.  And this is as far as most people need to go.  They see databases like spreadsheets.  They even use filters and link cells for calculations.</p>
<p><strong>More than just a list</strong></p>
<p>A database that just holds a list of records is a single-table database.  This is like a spreadsheet, and the only real advantage is that databases on computers are generally designed to be queried by other applications.  The contact list in your email client is not much more than a flat table.  But it is queried and updated from the email application.  This makes the application more useful.</p>
<p>Again, long ago, I had a watch that held phone numbers.  This worked pretty well, and had an alpha-numeric keyboard.  But the data was in a silo.  There was no way to get the data in or out other than the small screen and keypad.  Application bound, embedded databases are also silos of data.</p>
<p>Many applications now use, or have the option to use, an embedded database.  But databases that can be used by more than one application, and are treated like an object are even more useful.  When a database is shared between applications, we can gain exponential utility.  Every new application that can query the shared database becomes more efficient.</p>
<p><strong>RDBMS: More than a single table</strong></p>
<p>One of the things that we will discuss more later is the concept of a relational database.  This means that we have more than one table, and the tables have related fields.  An analogous example is an email system with an integrated calendar.  Now we have a table that contains the contacts, and we relate those contacts to the people invited to events.  These systems have been around for quite a while and give us a conceptual model of how related tables bring power to a Relational Database Management System (RDBMS).</p>
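<p>To make the email and calendar analogy concrete, here is a minimal sketch using Python&#8217;s built-in <code>sqlite3</code> module.  The table names, column names and sample rows are assumptions chosen for illustration, not the schema of any real mail system.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE events (id INTEGER PRIMARY KEY, title TEXT);
    -- The related fields: each invitation points at one event and one contact.
    CREATE TABLE invitations (event_id INTEGER REFERENCES events(id),
                              contact_id INTEGER REFERENCES contacts(id));
    INSERT INTO contacts VALUES (1, 'Ann', 'ann@example.com');
    INSERT INTO contacts VALUES (2, 'Bob', 'bob@example.com');
    INSERT INTO events VALUES (10, 'Go-live review');
    INSERT INTO invitations VALUES (10, 1);
    INSERT INTO invitations VALUES (10, 2);
""")

# Follow the related fields from the event to the invited contacts.
invited = conn.execute("""
    SELECT c.name FROM contacts c
    JOIN invitations i ON i.contact_id = c.id
    JOIN events e ON e.id = i.event_id
    WHERE e.title = ? ORDER BY c.name
""", ("Go-live review",)).fetchall()
print(invited)
```

<p>The contact details live in one place only.  The invitations table holds nothing but references, which is where the relational model gets its power.</p>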
<p><strong>Data in a Hierarchy</strong></p>
<p>RDBMS is not the only way to organize data.  Some databases organize data in a hierarchical schema.  Instead of tables, we have a conceptual model like a directory tree.  Data is located in a namespace that is organized like files in directories.  Data in this type of structure gains some implicit structure and relationship, because a parent-child relationship exists in the data naturally.  For small sets of data this can make the database small and fast.  However, building abstract structures, relationships and queries is more difficult with this structure, and the database grows considerably when it is extended to support the kind of cross-references that a relational database provides.</p>
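<p>A rough sketch of the directory-tree idea, using nested Python dictionaries as a stand-in for a hierarchical store.  The tree contents and the <code>lookup</code> helper are hypothetical.</p>

```python
# Hypothetical data organized like directories: parent keys contain child keys.
tree = {
    "company": {
        "engineering": {"roy": {"phone": "555-0100"}},
        "sales": {"kim": {"phone": "555-0101"}},
    }
}

def lookup(root, path):
    """Walk a slash-separated path from the root, one level per segment."""
    node = root
    for part in path.strip("/").split("/"):
        node = node[part]
    return node

print(lookup(tree, "/company/engineering/roy/phone"))
print(sorted(lookup(tree, "/company")))
```

<p>Finding a value by its path is trivial.  Answering a question that cuts across branches, such as &#8220;who has this phone number?&#8221;, means walking the whole tree.</p>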
<p><strong>All Shapes and Sizes</strong></p>
<p>More basic than the most simple table or hierarchy based database is the file based system.  This is really what it sounds like.  A simple text file or an XML file can be used as a database.  These can support queries, inserts and joins.  Even though the database is in a standard file that can be opened and accessed with a common text editor, it has a structure and can be used to keep the application data organized.</p>
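<p>As a small illustration of a file acting as a database, the sketch below queries and inserts into an XML document with Python&#8217;s standard <code>xml.etree.ElementTree</code> module.  The element and attribute names are assumptions, and the document is held in a string here rather than read from disk.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical application data; element and attribute names are assumptions.
doc = ET.fromstring(
    "<contacts>"
    "<contact dept='IT'><name>Ann</name></contact>"
    "<contact dept='HR'><name>Bob</name></contact>"
    "</contacts>"
)

# "Query": find every contact in one department.
it_names = [c.findtext("name") for c in doc.findall("contact[@dept='IT']")]

# "Insert": append a new record to the tree.
new = ET.SubElement(doc, "contact", dept="IT")
ET.SubElement(new, "name").text = "Cy"
all_it = [c.findtext("name") for c in doc.findall("contact[@dept='IT']")]
print(it_names, all_it)
```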
<p><strong>Databases for the Integration Engineer</strong></p>
<p>As Integration Engineers, we will sometimes find ourselves working directly with an application&#8217;s database.  Sometimes this database will be a full-blown enterprise-level database implementation.  Many times, however, it will be something simpler, like an embedded table or XML file that the application treats as its database.  Whatever the case, we need to learn and respect the structure of the data, and work to ensure that our integration preserves the integrity of the data structure.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theintegrationengineer.com/what-is-a-database/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>RFID Data Gathering and Commerce</title>
		<link>http://www.theintegrationengineer.com/rfid-data-gathering-and-commerce/</link>
		<comments>http://www.theintegrationengineer.com/rfid-data-gathering-and-commerce/#comments</comments>
		<pubDate>Mon, 01 Jun 2009 16:03:23 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[Supply Chain]]></category>
		<category><![CDATA[YouTube Posts]]></category>
		<category><![CDATA[Automated Data collection]]></category>
		<category><![CDATA[BPM]]></category>
		<category><![CDATA[RFID]]></category>
		<category><![CDATA[YouTube]]></category>

		<guid isPermaLink="false">http://www.theintegrationengineer.com/?p=57</guid>
		<description><![CDATA[I found this video on YouTube.
RFID is exciting technology for integration engineers.  All of this data coming in will have to be aggregated and integrated by someone.  Also, one of the focuses of integration is to provide more and better information to people and systems.  With more and better information people, businesses and systems can [...]]]></description>
			<content:encoded><![CDATA[<p>I found this video on YouTube.</p>
<p>RFID is exciting technology for integration engineers.  All of this data coming in will have to be aggregated and integrated by someone.  Also, one of the focuses of integration is to provide more and better information to people and systems.  With more and better information people, businesses and systems can make better choices and decisions.<span id="more-57"></span></p>
<p>This is a good thing.</p>
<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="344" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/0VbMr2gnGDE&amp;hl=en&amp;fs=1" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="425" height="344" src="http://www.youtube.com/v/0VbMr2gnGDE&amp;hl=en&amp;fs=1" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>People talk about privacy issues and consumer protection.  I am sure that debate will continue to rage on long after the keyboard and mouse you are using have RFID tags embedded.  But the focus of this blog is how these technologies can be used for the collection and integration of information.  Implementing RFID technologies provides a hugely beneficial and accurate stream of data.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theintegrationengineer.com/rfid-data-gathering-and-commerce/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ASCII and EBCDIC</title>
		<link>http://www.theintegrationengineer.com/ascii-and-epsidic/</link>
		<comments>http://www.theintegrationengineer.com/ascii-and-epsidic/#comments</comments>
		<pubDate>Thu, 19 Mar 2009 01:32:27 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[File]]></category>
		<category><![CDATA[ASCII]]></category>
		<category><![CDATA[character set]]></category>
		<category><![CDATA[compatable]]></category>
		<category><![CDATA[Data]]></category>
		<category><![CDATA[EPSIDIC]]></category>
		<category><![CDATA[format]]></category>
		<category><![CDATA[legacy]]></category>
		<category><![CDATA[pipe]]></category>
		<category><![CDATA[Standard]]></category>
		<category><![CDATA[text]]></category>

		<guid isPermaLink="false">http://www.theintegrationengineer.com/?p=81</guid>
		<description><![CDATA[What is a &#8220;Character Set?&#8221;
A character set is a collection or library of characters, (letters and symbols), and their identifying number.  Included with the printable characters, (letters and punctuation) are some unprintable yet important characters.  Characters are used to form messages.
Characters are not fonts.  Characters exist under the font that represent the definition of the [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-112" title="characters" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/03/characters.jpeg" alt=" ASCII and EPSIDIC" width="129" height="78" /><strong>What is a &#8220;Character Set?&#8221;</strong></p>
<p>A character set is a collection or library of characters, (letters and symbols), and their identifying number.  Included with the printable characters, (letters and punctuation) are some unprintable yet important characters.  Characters are used to form messages.</p>
<p>Characters are not fonts.  The character sits underneath the font and defines the meaning that the font is attempting to display.  When you change the font on a document the <strong>A</strong> is changed to an <em>A</em>, but the underlying character that identifies its meaning remains the same.  The font determines how the character is displayed.  You can even convert to Wingdings and the underlying character remains the same.<span id="more-81"></span></p>
<p>Imagine that I wrote this post using a character set that I created myself, and then you tried to read it without knowing what my character set was.  You would see a bunch of garbage on the screen, much as you do when you visit a foreign-language web page without the correct fonts loaded.  Even worse would be if you used the same characters but had different identifiers for them.  Say A is 001 in my set (my set starts at A and moves on numerically), but in your character set you numbered the vowels after the consonants, so 001 to you is B.  Now all of the letters will be wrong, and we get garbage.</p>
<p>Fortunately, some people got together early on and created standards for characters.  A committee of the American Standards Association created the character set we call ASCII, the American Standard Code for Information Interchange.  IBM created the Extended Binary Coded Decimal Interchange Code (EBCDIC) and still uses it on its mainframe and midrange systems, although most other platforms are built on ASCII.</p>
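<p>The two made-up character sets from that example can be sketched in Python.  Both tables are hypothetical: mine numbers the letters A to Z starting at 1, and yours numbers the consonants first and the vowels last.</p>

```python
import string

letters = string.ascii_uppercase

# My set: A=1, B=2, C=3 and so on.
mine = {ch: number for number, ch in enumerate(letters, start=1)}

# Your set: consonants first, then vowels, so 1 means B.
vowels = "AEIOU"
your_order = [ch for ch in letters if ch not in vowels] + list(vowels)
yours = {number: ch for number, ch in enumerate(your_order, start=1)}

encoded = [mine[ch] for ch in "CAB"]            # what I send
decoded = "".join(yours[n] for n in encoded)    # what you read
print(encoded, decoded)
```

<p>The numbers arrive intact, but the message does not, because sender and reader disagree about what each number means.</p>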
<p><strong>What is ASCII</strong></p>
<p>ASCII is the acronym for American Standard Code for Information Interchange, and is a collection of characters defined from 0 to 127.  These definitions represent all of the standard English characters, numbers and symbols.  A number of other, unprintable, characters are also included.  You use one or two of these each time you hit the &#8220;Enter&#8221; key on your keyboard.  Depending on your operating system, this sends a &#8220;carriage return&#8221;, a &#8220;line feed&#8221;, or both.</p>
<p>A &#8220;carriage return&#8221; comes from a printer where the head would move back and forth on the roller.  CR would tell the printer to move the head all the way to the left of its printing area.  A &#8220;line feed&#8221; is also from a printer perspective.  This tells the printer to roll the paper so that the head will be writing on the next line.  These are both examples of unprintable characters.  You can probably think of others.  For a complete list of ASCII characters, you can check out this table in my <a href="http://www.theintegrationengineer.com/tool-box">toolbox</a>.</p>
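<p>A quick way to see these identifying numbers is to ask Python for them.  This is only a small sketch; the sample strings are made up.</p>

```python
# ASCII code numbers for a printable letter and two unprintable characters.
code_a = ord("A")
code_cr = ord("\r")    # carriage return
code_lf = ord("\n")    # line feed
print(code_a, code_cr, code_lf)

# The same two lines of text with Windows-style and Unix-style line endings.
windows_text = "first\r\nsecond"
unix_text = "first\nsecond"
print(windows_text.encode("ascii"))
same_lines = windows_text.splitlines() == unix_text.splitlines()
print(same_lines)
```

<p>The two strings carry the same lines but are not byte-for-byte identical, which is worth remembering when a file compare reports that every line has changed.</p>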
<p><strong>What is EBCDIC</strong></p>
<p>EBCDIC is the acronym for Extended Binary Coded Decimal Interchange Code.  This was created by IBM back in the day.  Most systems now use ASCII or a superset of it, but there are legacies that are still with us.  IBM mainframes, terminals like the IBM 3270, and some legacy communications equipment still expect messages using the EBCDIC character set.</p>
<p><strong>What is the big deal</strong></p>
<p>As I said, some systems still want to use some of the characters in the EBCDIC system.  Even fancy new systems producing XML will sometimes fall into this trap and cause problems.  The one that I have run into is the use of |, called &#8216;pipe&#8217;.  ASCII and EBCDIC use different character IDs for this character.  And I have seen e-commerce systems that are using ASCII for everything else throw in an EBCDIC pipe as a control character.  When this happens, other systems will choke on it.</p>
<p>When you find yourself getting an invalid character message but the characters look fine, remember that there are some twists that may exist in the underlying character set.  If you can, manually replace the character with the character that it looks like (in my case the EBCDIC | with an ASCII |) and see if the parser likes the file now.  If it does, you have encountered the character set problem as I have.  This can be a difficult problem to solve if you have never encountered it.</p>
<p>If the character that is causing problems is not a pipe, you may want to look at IBM&#8217;s <a href="http://publib.boulder.ibm.com/infocenter/macxhelp/v6v81/index.jsp?topic=/com.ibm.xlf81m.doc/pgs/lr393.htm">ASCII to EBCDIC conversion table</a>.  This problem can be difficult to explain to others who have never encountered it, so using the ASCII and EBCDIC identifier numbers can help make clear what we mean in email and documentation when we are trying to correct the issue.</p>
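<p>When the characters look fine on screen, looking at the raw numbers usually settles it.  The sketch below makes two assumptions: it uses <code>cp037</code>, one of the EBCDIC code pages that ships with Python, to show that the pipe has a different number there, and it assumes the offending byte falls outside the 0 to 127 ASCII range, which will not always be true.  The <code>suspicious_bytes</code> helper and the sample data are hypothetical.</p>

```python
# The same visible character has a different number in each character set.
ascii_pipe = "|".encode("ascii")
ebcdic_pipe = "|".encode("cp037")    # cp037 is one common EBCDIC code page
print(ascii_pipe.hex(), ebcdic_pipe.hex())

def suspicious_bytes(data):
    """Return (offset, byte value) for every byte outside the ASCII range."""
    return [(offset, value) for offset, value in enumerate(data) if value > 127]

# Hypothetical file content with one stray non-ASCII byte at offset 3.
sample = b"A|B" + bytes([0xA6]) + b"C"
found = suspicious_bytes(sample)
print(found)
```

<p>Reporting the offset and the numeric value, rather than the glyph, gives the other party something exact to search for.</p>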
]]></content:encoded>
			<wfw:commentRss>http://www.theintegrationengineer.com/ascii-and-epsidic/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Circular Files</title>
		<link>http://www.theintegrationengineer.com/circular-files/</link>
		<comments>http://www.theintegrationengineer.com/circular-files/#comments</comments>
		<pubDate>Wed, 11 Feb 2009 21:16:08 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[Logging]]></category>
		<category><![CDATA[circular file]]></category>
		<category><![CDATA[error tracking]]></category>
		<category><![CDATA[log]]></category>
		<category><![CDATA[logging]]></category>
		<category><![CDATA[round file]]></category>
		<category><![CDATA[trouble shooting]]></category>
		<category><![CDATA[troubleshooting]]></category>

		<guid isPermaLink="false">http://www.theintegrationengineer.com/?p=79</guid>
		<description><![CDATA[A circular file is not a nickname for the waste can. Circular files, sometimes called round files, are useful in some applications and support tasks.  With a normal log file or repository, the log grows as logged events are added to the log.  The obvious danger is that if the space where the log is [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-92" title="circular-file" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/01/circular-file.jpg" alt="circular file Circular Files" width="86" height="99" />A <a href="http://www.thefreedictionary.com/circular+file">circular file</a> is not a nickname for the waste can. Circular files, sometimes called round files, are useful in some applications and support tasks.  With a normal log file or repository, the log grows as logged events are added to the log.  The obvious danger is that the space where the log is located fills up as the log grows.  Many applications will shut down and refuse to restart if this happens.  For some applications, having the log write to a circular file is the answer.</p>
<p><span id="more-79"></span></p>
<h3>What is a Circular File?</h3>
<p>This is not like sending the log data to /dev/null or some other black hole.  It is a file that can&#8217;t grow beyond a specified size, but never refuses to accept new data.  This is not a paradox.  Visualize this file as a loop of paper instead of a sheet of paper.  On the loop, there is no end; it just goes around and around.</p>
<p>Say you have a small circular file, 10 MB in our example.  Then let&#8217;s say that every minute you add 1 MB of data to your file.  For 9 minutes after you make a log entry, you can see this entry in the file.  Each minute it moves down by one minute&#8217;s worth of space in the log.  At 10+ minutes, it is gone.  The circular file has rolled over and overwritten that location with a more current entry.</p>
<p>Okay, so a 10 minute log may not be that useful, and we would probably want to use a bigger size.  But the point is that if disaster strikes, the log won&#8217;t grow like crazy and create an even bigger problem by filling up the disk.</p>
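<p>The rollover behavior can be sketched in memory with Python&#8217;s <code>collections.deque</code>.  The <code>CircularLog</code> class is hypothetical, and it counts entries rather than megabytes; a capacity of 10 entries stands in for the 10 MB file receiving 1 MB per minute.</p>

```python
from collections import deque

class CircularLog:
    """Keep only the newest `capacity` entries; older ones are overwritten."""

    def __init__(self, capacity):
        self._entries = deque(maxlen=capacity)

    def write(self, entry):
        self._entries.append(entry)    # drops the oldest entry when full

    def read(self):
        return list(self._entries)

log = CircularLog(capacity=10)
for minute in range(10):
    log.write(f"minute {minute}")
still_there = "minute 0" in log.read()    # the log is full, nothing lost yet

log.write("minute 10")                    # one more entry rolls the log over
gone = "minute 0" not in log.read()
print(still_there, gone, len(log.read()))
```

<p>The log never refuses a write and never holds more than ten entries, which is the whole trade: bounded size in exchange for losing the oldest data.</p>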
<h3>Why aren&#8217;t all logs like this?</h3>
<p>Using a circular file for a log is very useful, but if you have an error, and then the log scrolls on for long enough for the error to be lost, all you have are the symptoms, not the cause.  So you have saved yourself from making the &#8220;The SMTP server failed, and the web tool started logging errors that it couldn&#8217;t send emails.  People kept clicking &#8216;resend mail&#8217; until the log filled up the disk, and the app died.&#8221; explanation.  But it is replaced with the &#8220;Yes, we logged the error, and then the app started logging all of the following actions and the error fell out of the log.&#8221; explanation, followed by an account of what a circular file is and why this bad news is really good news.</p>
<p>It&#8217;s best to use more than one logging method.  Say we have a circular log that allows us to see a week of normal activity.  Let&#8217;s pretend that is 10 GB.  And we have a log that is flat, and only logs critical errors.  Now we have a way to not get clobbered by all the actions that start happening when the rain comes, and we don&#8217;t lose that original error that started the mess.</p>
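<p>One way to approximate this pairing is with Python&#8217;s standard <code>logging</code> module.  This is a sketch under assumptions: <code>RotatingFileHandler</code> caps size by rotating between files rather than wrapping around inside one file, so it is a close cousin of a circular file and not the same thing.  The logger name, file names and tiny size limit are made up for the demonstration.</p>

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler

log_dir = tempfile.mkdtemp()
activity_path = os.path.join(log_dir, "activity.log")
critical_path = os.path.join(log_dir, "critical.log")

logger = logging.getLogger("integration_demo")    # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False

# Bounded log: activity.log plus one backup, each under 1000 bytes.
bounded = RotatingFileHandler(activity_path, maxBytes=1000, backupCount=1)
# Flat log: never rotated, but it only receives ERROR and above.
flat = logging.FileHandler(critical_path)
flat.setLevel(logging.ERROR)
logger.addHandler(bounded)
logger.addHandler(flat)

logger.error("smtp server failed")                # the original cause
for n in range(200):
    logger.info("resend mail clicked %d", n)      # the flood that follows

for handler in (bounded, flat):
    handler.close()
with open(critical_path) as f:
    critical_text = f.read()
with open(activity_path) as f:
    activity_text = f.read()
print("smtp server failed" in critical_text, "smtp server failed" in activity_text)
```

<p>The flood pushes the original error out of the bounded log, while the flat log still holds it and stays small because routine activity never reaches it.</p>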
<h3>Utopalog</h3>
<p>Okay, I am making this up, but here is my dream strategy.  First, we use a circular log whose size can be configured on the fly.  Second, we use an error database.  We have two table structures: First Occurrence and Last Occurrence.  Just as they sound, when we have a new error, debug message, or whatever, it gets logged in both the First and Last Occurrence tables.  But the next time it happens, the time on the Last Occurrence table is the only thing that gets updated.</p>
<p>To use this, we periodically truncate or archive the tables.  The log is self cleaning.  We can do this each night, or on demand, or pick your own schedule.  Now when a problem occurs, we can look for errors that happened before the beginning of the log, but we won&#8217;t have a mountain of data that will crush us.</p>
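<p>The two-table idea can be sketched with an in-memory SQLite database.  The table and column names, the <code>record</code> helper, and the use of the message text as the key are all assumptions made for the sketch.</p>

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE first_occurrence (message TEXT PRIMARY KEY, seen_at TEXT);
    CREATE TABLE last_occurrence  (message TEXT PRIMARY KEY, seen_at TEXT);
""")

def record(message, seen_at):
    """First sighting goes in both tables; repeats only move the last-seen time."""
    db.execute("INSERT OR IGNORE INTO first_occurrence VALUES (?, ?)",
               (message, seen_at))
    db.execute("INSERT OR REPLACE INTO last_occurrence VALUES (?, ?)",
               (message, seen_at))

record("smtp server failed", "09:00")
record("smtp server failed", "09:05")
record("smtp server failed", "09:10")

first = db.execute("SELECT seen_at FROM first_occurrence").fetchall()
last = db.execute("SELECT seen_at FROM last_occurrence").fetchall()
print(first, last)
```

<p>However many times the error repeats, each table holds one row for it, so the tables grow with the number of distinct problems and not with the volume of noise.</p>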
]]></content:encoded>
			<wfw:commentRss>http://www.theintegrationengineer.com/circular-files/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Flat Files</title>
		<link>http://www.theintegrationengineer.com/flat-files/</link>
		<comments>http://www.theintegrationengineer.com/flat-files/#comments</comments>
		<pubDate>Mon, 05 Jan 2009 18:38:47 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[Delimiters]]></category>
		<category><![CDATA[File]]></category>
		<category><![CDATA[Mapping]]></category>
		<category><![CDATA[b2b]]></category>
		<category><![CDATA[Character Delimited]]></category>
		<category><![CDATA[Comma Delimited]]></category>
		<category><![CDATA[csv]]></category>
		<category><![CDATA[Data Export]]></category>
		<category><![CDATA[Data Import]]></category>
		<category><![CDATA[delimiter]]></category>
		<category><![CDATA[EDI]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[Fixed Position]]></category>
		<category><![CDATA[Fixed Width Files]]></category>
		<category><![CDATA[Space]]></category>
		<category><![CDATA[spreadsheet]]></category>
		<category><![CDATA[White Space]]></category>

		<guid isPermaLink="false">http://www.theintegrationengineer.com/?p=5</guid>
		<description><![CDATA[What is a flat file?
Files are called &#8220;Flat Files&#8221; when they contain a single data structure.  Generally this structure is the column and row structure like a spreadsheet or table, but a file in binary or encrypted with a single encryption key could also be called a flat file.  Files that are not flat; marked [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-75" title="Flat File" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/01/flatfile.jpg" alt="flatfile Flat Files" width="160" height="120" /><strong>What is a flat file?</strong></p>
<p>Files are called &#8220;Flat Files&#8221; when they contain a single data structure.  Generally this structure is the column and row structure like a spreadsheet or table, but a file in binary or encrypted with a single encryption key could also be called a flat file.  Files that are not flat include marked-up files like XML or HTML, <a href="http://www.theintegrationengineer.com/what-is-edi/">EDI</a> files, and other formats like HL7 or SEF.  Here I am going to briefly discuss two flat file types: Delimited Files and Fixed Width Files.<span id="more-5"></span></p>
<p><strong>What is a Delimited File?</strong></p>
<p>Ok, to describe it briefly, a delimited file is a file where the data is organized in rows and columns.  Each row has a set of data, and each column has a type of data.  If it sounds like I am describing a spreadsheet, you are right on the money.  To form the columns, each row separates its fields with a character called a delimiter.  See the example below.</p>
<p><img class="aligncenter size-full wp-image-72" title="Illustration of Delimited Data" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/01/delimitedillustration.jpg" alt="delimitedillustration Flat Files" width="554" height="128" /></p>
<p>Tables of data and spreadsheets are both similar to a delimited file in the way they organize data.  In the delimited file all of the empty space, or white space is removed.  What we see here is a classic example of exporting a spreadsheet table as a comma delimited file.  In theory, this data can be imported by any other application that can read a delimited file.</p>
<p><em>Believe it or not, a space is a character, and takes up space in a file.  Back in the day people went out of their way to save space so that files could be sent over slow modem connections.</em></p>
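<p>Reading a comma delimited export takes very little code.  Here is a minimal sketch with Python&#8217;s standard <code>csv</code> module; the column names and rows are made up and are not the data from the illustration.</p>

```python
import csv
import io

# Hypothetical export: a header row, then one record per line.
exported = "Name,Bdate,Department\nAnn,1970-01-31,IT\nBob,1965-12-01,HR\n"

# DictReader uses the first row as column names for every following row.
rows = list(csv.DictReader(io.StringIO(exported)))
for row in rows:
    print(row["Name"], row["Department"])
```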
<p><strong>What is a Fixed Width File?</strong></p>
<p>There is another type of file, called a Fixed Width or Fixed Position file.  It is different from a delimited file in that the data fields are defined by character position.  See the example below.</p>
<p><img class="aligncenter size-full wp-image-73" title="Fixed Width File Illustration" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/01/wffileillustration.jpg" alt="wffileillustration Flat Files" width="570" height="132" /></p>
<p>In a fixed width file, the delimiter characters are eliminated.  If the data is formulated such that the data fields are the same size, this format can be more compact than a delimited file. You can see here that we know the size of the Birthdate data, so we eliminate all the spaces between the Bdate and Department fields.  If all of the data was formatted for size like this, we could really make this file small, so that it only contains the data.</p>
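<p>Parsing by character position can be sketched as plain string slicing.  The layout below (field names and start and end positions) is hypothetical and does not match the illustration.</p>

```python
# Hypothetical layout: (field name, start position, end position).
LAYOUT = [("name", 0, 10), ("bdate", 10, 18), ("department", 18, 20)]

def parse_fixed(line):
    """Cut each field out by position and trim the padding."""
    return {name: line[start:end].strip() for name, start, end in LAYOUT}

# Build a 20-character record: name padded to 10, then date, then department.
line = "Ann".ljust(10) + "19700131" + "IT"
record = parse_fixed(line)
print(record)
```

<p>Note that the date and department sit directly against each other with no separator.  Only the layout tells the parser where one ends and the next begins, so both sides must agree on it exactly.</p>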
<p>We also eliminate the pesky problem of delimiters found in the data, such as a comma delimited file containing a field that has a comma in it.  How does the parser know that this comma is not really a delimiter, but is part of the data?  That problem is eliminated in a fixed width file.</p>
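<p>For delimited files, one common convention for this problem is to wrap such fields in quotes.  The sketch below assumes that convention, as implemented by Python&#8217;s <code>csv</code> module, and shows what goes wrong when a parser ignores it.  The sample name is made up.</p>

```python
import csv
import io

# Write one row whose first field contains the delimiter.
buffer = io.StringIO()
csv.writer(buffer).writerow(["Smith, Ann", "IT"])
line = buffer.getvalue()
print(repr(line))

naive = line.strip().split(",")                  # splits the name in two
parsed = next(csv.reader(io.StringIO(line)))     # honors the quotes
print(len(naive), parsed)
```

<p>This only works when the producer and the consumer both follow the quoting rule, which is exactly the kind of detail to confirm before an unattended import goes live.</p>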
<p><strong>Comparison</strong></p>
<p>This is not a contest of which format is superior.  Both file architectures are useful and both are used commonly enough that you need to be at ease working with both.  Delimited files are really easy to work with as long as your data is clean of the delimiter character.  For the quick data integration common in ETL tasks, delimited files are far more common than Fixed Width.  Continuous data integration and import operations often find that Fixed Width or Fixed Position files are more reliable for unattended operation, even for ETL if it is unattended.</p>
<p>As with many things in integration work, we want to pick the best option.  Knowing and working with both fixed and delimited files will help you determine which is the right choice for the task you have before you.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theintegrationengineer.com/flat-files/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Nature of NULL</title>
		<link>http://www.theintegrationengineer.com/the-nature-of-null/</link>
		<comments>http://www.theintegrationengineer.com/the-nature-of-null/#comments</comments>
		<pubDate>Tue, 16 Dec 2008 21:26:35 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[data base]]></category>
		<category><![CDATA[NULL]]></category>
		<category><![CDATA[string]]></category>

		<guid isPermaLink="false">http://www.theintegrationengineer.com/?p=62</guid>
		<description><![CDATA[&#8220;Is not the beginning of wisdom the                words: &#8216;I do not know&#8217;?&#8221;
&#8211; Data, Star Trek: The Next Generation:                   &#8220;Where Silence Has Lease&#8221;
If the [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-63" title="null_modem" src="http://www.theintegrationengineer.com/wp-content/uploads/2008/12/null_modem.jpg" alt="NULL Modem" width="100" height="104" /><span style="font-family: Trebuchet MS,Arial,Univers,Zurich BT,Verdana,Helvetica;"><em>&#8220;Is not the beginning of wisdom the                words: &#8216;I do not know&#8217;?&#8221;</em></span></p>
<blockquote><p>&#8211; Data, Star Trek: The Next Generation:                   &#8220;<em>Where Silence Has Lease</em>&#8221;</p></blockquote>
<p>If the beginning of wisdom is to realize what it is that we do not know, then NULL, by its definition, is this not knowing.  We do not know what the value is, and this is why it is NULL.<span id="more-62"></span></p>
<p><strong>What is this?</strong></p>
<p>NULL is not a number or a letter.  It may not even be a character.  Using Occam&#8217;s razor, things are either NULL or NOT NULL.  Things that are NULL are completely unknown for as long as they are NULL, and things that are NOT NULL have at least something known about them.</p>
<p>I feel the hair splitting on my head, so let me explain one important point.  Things that are NULL are not destined to stay that way.  Where /dev/null is the black hole for output that we don&#8217;t need, NULL is not the data equivalent of a black hole.  Things that are currently NULL may become NOT NULL at any time, as soon as we know something about them, and pretty much anything will do.</p>
<p><strong>Empty String Theory</strong><img class="alignright size-full wp-image-64" title="Ball of String" src="http://www.theintegrationengineer.com/wp-content/uploads/2008/12/ballofstring.jpg" alt="Ball of String" width="100" height="66" /></p>
<p>Unlike string theory, which attempts to define the nature of the universe, the &#8216;empty string theory&#8217; is that empty strings are the same thing as NULL.  They are not.  If they were NULL, we would not know that they were strings.  This seems quite clear.  NULL is NULL and an empty string is NOT NULL.</p>
<p>&#8220;But wait!&#8221;  I hear a Database Developer cry.  &#8220;I can create a field in my database that is a string, and allow it to be NULL.&#8221;  And this is correct, but this is not a contradiction.  When you allow a database field to be NULL you have allowed it to receive NULL as an input.  But it may or may not give NULL as an output.  Because we can put NULL into a field, when we query for that field (or variable, element, whatever) we may get NULL back, or, depending on the database and the client tools, we may get an empty string or some other properly defined value of the field&#8217;s data type that just happens to be empty.</p>
<p>In final reply to this inquiry I must observe that NULL is not equal to anything, not even to itself.  Any conditional statement where NULL is compared to something will fail: in SQL the comparison evaluates to unknown, which a condition treats the same as false.  Let &#8220;&#8221; represent an empty string.  If one writes the statement ["" == ""], or in English, &#8220;if an empty string is equal to an empty string,&#8221; it evaluates as a true statement.  However, if one writes the statement [NULL == NULL], or in English, &#8220;if NULL is equal to NULL,&#8221; it does not evaluate as a true statement.  Since an empty string is equal to another empty string, but NULL is not equal to NULL, NULL cannot be the same thing as an empty string.</p>
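<p>The comparison is easy to try out.  This is a minimal sketch using Python&#8217;s built-in sqlite3 module against an in-memory database; SQLite is only a convenient stand-in here, and other databases may report the result differently.  Note that SQLite hands back the NULL comparison as NULL itself (None in Python), which a WHERE clause treats the same as false.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each comparison comes back as 1 (true), 0 (false) or None (SQL NULL, i.e. unknown).
empty_equals_empty = conn.execute("SELECT '' = ''").fetchone()[0]
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# IS NULL is the test that actually answers "is this NULL?"
empty_is_null = conn.execute("SELECT '' IS NULL").fetchone()[0]
null_is_null = conn.execute("SELECT NULL IS NULL").fetchone()[0]

print(empty_equals_empty)  # 1
print(null_equals_null)    # None
print(empty_is_null)       # 0
print(null_is_null)        # 1

conn.close()
```

<p>The last two lines make the same point from the other side: asked directly, the empty string is NOT NULL.</p>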
<p><strong>Why NULL is important</strong></p>
<p>Just as early civilizations had no concept of zero, early discussions of NULL struggled with it.  We can argue about whether having zero apples is the same as having nothing.  Yet zero can be the absence of something that is known: it is having none of something.  NULL is not even knowing what the something is that you don&#8217;t have.</p>
<p><img class="alignleft size-full wp-image-65" title="apple_bushel_small" src="http://www.theintegrationengineer.com/wp-content/uploads/2008/12/apple_bushel_small.jpg" alt="Bushel of Apples" width="56" height="60" /><em>An apple basket can contain 0 apples as a way to say, &#8220;Hey, this basket is for apples!&#8221;  So 0 is more than NULL, because with NULL we don&#8217;t know what the basket is for.  We may not even have a basket at all.</em></p>
<p>There are a great number of operations that we couldn&#8217;t do well or efficiently if we didn&#8217;t have NULL.  So understanding what we are asking when we look for NULL is important.  Whenever we search for things that don&#8217;t match something else, we are really searching for NULL, even if the stored values themselves are not NULL.  We do this a lot, and we couldn&#8217;t do it without NULL.</p>
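<p>One common reading of &#8220;searching for things that don&#8217;t match&#8221; is an outer join that keeps only the rows with no partner.  The sketch below assumes that reading, again using Python&#8217;s sqlite3 module; the employees and badges tables are invented for the example.  None of the stored values is NULL, yet the query finds its answer by looking for NULL.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables: every stored value is NOT NULL.
conn.executescript("""
    CREATE TABLE employees (id INTEGER, name TEXT);
    CREATE TABLE badges (employee_id INTEGER, badge_no TEXT);
    INSERT INTO employees VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO badges VALUES (1, 'A-100'), (3, 'C-300');
""")

# The LEFT JOIN fills the badge columns with NULL where no badge matched,
# so filtering on IS NULL leaves the employees without a badge.
unmatched = conn.execute("""
    SELECT e.name
    FROM employees AS e
    LEFT JOIN badges AS b ON b.employee_id = e.id
    WHERE b.employee_id IS NULL
""").fetchall()

print(unmatched)  # [('Bob',)]

conn.close()
```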
]]></content:encoded>
			<wfw:commentRss>http://www.theintegrationengineer.com/the-nature-of-null/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
