<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Integration Engineer &#187; Data</title>
	<atom:link href="http://www.theintegrationengineer.com/category/data/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.theintegrationengineer.com</link>
	<description>When it just has to work.</description>
	<lastBuildDate>Fri, 03 Feb 2012 00:21:58 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=</generator>
		<item>
		<title>Change Magento Order Customer Association</title>
		<link>http://www.theintegrationengineer.com/change-magento-order-customer-association/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=change-magento-order-customer-association</link>
		<comments>http://www.theintegrationengineer.com/change-magento-order-customer-association/#comments</comments>
		<pubDate>Wed, 08 Jun 2011 11:50:00 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[Database]]></category>
		<category><![CDATA[Magento]]></category>
		<category><![CDATA[associate order]]></category>
		<category><![CDATA[customer account]]></category>
		<category><![CDATA[mysql]]></category>

		<guid isPermaLink="false">http://www.theintegrationengineer.com/?p=2315</guid>
		<description><![CDATA[Ever have a customer contact you because they forgot to place their order with their registered profile?  Well I have.  A lot. And Magento doesn&#8217;t have an &#8220;out of the box&#8221; way to modify the order to customer profile relationship even though the solution is really simple. Simple that is, if you are comfortable writing queries [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.theintegrationengineer.com/wp-content/uploads/2010/04/magento_button.png"><img class="alignleft size-full wp-image-1128" src="http://www.theintegrationengineer.com/wp-content/uploads/2010/04/magento_button.png" alt="magento button Change Magento Order Customer Association" width="89" height="89" title="Change Magento Order Customer Association" /></a>Ever have a customer contact you because they forgot to place their order with their registered profile?  Well I have.  A lot.</p>
<p>And Magento doesn&#8217;t have an &#8220;out of the box&#8221; way to modify the order to customer profile relationship even though the solution is really simple.</p>
<p>Simple that is, if you are comfortable writing queries in the database.</p>
<p><span id="more-2315"></span></p>
<p>Here are the steps to do this once you are in the MySQL database:</p>
<p><strong>Associate order with Customer Account</strong></p>
<p style="padding-left: 30px">1.  Get customer account entity_id based on email:</p>
<p style="padding-left: 60px">SELECT entity_id FROM customer_entity WHERE email = '[email address]';</p>
<p style="padding-left: 60px">You can also pluck this id from the end of the url in the manage customers section of the admin panel if you are already there.</p>
<p style="padding-left: 30px">2.  Verify Order</p>
<p style="padding-left: 60px">SELECT entity_id, email, customer_id, created_at  FROM sales_order WHERE increment_id = [order number];</p>
<p style="padding-left: 60px">This step is really to verify that it is in fact the right order.  Associating the wrong order with the customer will make your store look less secure.</p>
<p style="padding-left: 30px">3.  Associate Order with Customer</p>
<p style="padding-left: 60px">UPDATE sales_order SET customer_id = [customer entity id] WHERE increment_id = [order number];</p>
<p style="padding-left: 60px">This sets the association up so that the customer can now see the status and order data from their customer profile.</p>
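<p>If you would rather script it than type the queries by hand, here is a minimal sketch of the three steps in Python.  It is only an illustration: it runs against an in-memory SQLite stand-in rather than a real Magento MySQL database, and the function names, the sample rows and the reduced column list are my assumptions.  Only the table and column names used in the queries come from the steps above.</p>

```python
import sqlite3


def associate_order(conn, email, order_number):
    """Point one order at the customer account registered under `email`."""
    cur = conn.cursor()
    # Step 1: get the customer account entity_id based on email.
    cur.execute("SELECT entity_id FROM customer_entity WHERE email = ?", (email,))
    customer = cur.fetchone()
    if customer is None:
        raise LookupError("no customer with that email")
    # Step 2: verify that exactly one order carries this order number.
    cur.execute(
        "SELECT entity_id, customer_id, created_at FROM sales_order WHERE increment_id = ?",
        (order_number,),
    )
    orders = cur.fetchall()
    if len(orders) != 1:
        raise LookupError("expected exactly one matching order")
    # Step 3: associate the order with the customer.
    cur.execute(
        "UPDATE sales_order SET customer_id = ? WHERE increment_id = ?",
        (customer[0], order_number),
    )
    conn.commit()
    return customer[0]


def demo_connection():
    """Build a tiny in-memory stand-in for the two tables (made-up sample rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer_entity (entity_id INTEGER PRIMARY KEY, email TEXT);
        CREATE TABLE sales_order (entity_id INTEGER PRIMARY KEY, increment_id TEXT,
                                  customer_id INTEGER, created_at TEXT);
        INSERT INTO customer_entity VALUES (7, 'pat@example.com');
        INSERT INTO sales_order VALUES (42, '100000123', NULL, '2011-06-01');
        """
    )
    return conn
```

<p>Passing the email and order number as query parameters, instead of pasting them into the SQL text, avoids quoting mistakes.  Refusing to update unless exactly one order matches is the scripted version of the &#8220;verify&#8221; step.</p>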
<p>And that&#8217;s it.  The customer-to-order relationship has been corrected.</p>
<p>(Note, this will not change the listing of how the order was placed.  If placed as guest, the order view will still show this.  But I have never found this to be important.)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theintegrationengineer.com/change-magento-order-customer-association/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Change Magento Order Email</title>
		<link>http://www.theintegrationengineer.com/change-magento-order-email/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=change-magento-order-email</link>
		<comments>http://www.theintegrationengineer.com/change-magento-order-email/#comments</comments>
		<pubDate>Thu, 02 Jun 2011 12:05:37 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[Database]]></category>
		<category><![CDATA[Magento]]></category>
		<category><![CDATA[Customer]]></category>
		<category><![CDATA[ecommerce]]></category>
		<category><![CDATA[email]]></category>
		<category><![CDATA[mysql]]></category>

		<guid isPermaLink="false">http://www.theintegrationengineer.com/?p=2275</guid>
		<description><![CDATA[Sometimes a customer will input a typo when entering their email address in Magento.  If they do this they won&#8217;t get their email notifying them of the order (if you have this set up).  And other things may not happen as expected and/or desired. Out of the box, the Magento admin does not have the [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.theintegrationengineer.com/wp-content/uploads/2010/04/magento_button.png"><img class="alignleft size-full wp-image-1128" src="http://www.theintegrationengineer.com/wp-content/uploads/2010/04/magento_button.png" alt="magento button Change Magento Order Email" width="89" height="89" title="Change Magento Order Email" /></a>Sometimes a customer will input a typo when entering their email address in Magento.  If they do this they won&#8217;t get their email notifying them of the order (if you have this set up).  And other things may not happen as expected and/or desired.</p>
<p>Out of the box, the Magento admin does not have the ability to change this.  But if you are comfortable in the database, then these steps will let you make this change in seconds.</p>
<p><span id="more-2275"></span></p>
<p style="padding-left: 30px">1. Get order entity id</p>
<p style="padding-left: 60px">SELECT entity_id FROM sales_order WHERE increment_id = [order number];</p>
<p style="padding-left: 30px">2. Find order email address and get its value_id</p>
<p style="padding-left: 60px">SELECT value_id, value FROM sales_order_varchar WHERE entity_id = [order entity id];</p>
<p style="padding-left: 60px">This will return many values.  The email address should look distinctively like an email address.   In my current instance, the attribute ID is 124, but that may or may not be the same for you.  In any case, what we are looking for is the value_id for the email address that we will use in the next step.</p>
<p style="padding-left: 30px">3. Update email</p>
<p style="padding-left: 60px">UPDATE sales_order_varchar SET value = '[new email address]' WHERE value_id = [value id for the email address];</p>
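<p>Here is a minimal sketch of the same three steps in Python.  Again, this is an illustration and not a drop-in tool: it uses an in-memory SQLite stand-in for the Magento tables, and the function names, the sample rows and the &#8220;contains an @ sign&#8221; test for spotting the email value are my assumptions.</p>

```python
import sqlite3


def change_order_email(conn, order_number, new_email):
    """Replace the email stored for one order in the varchar value table."""
    cur = conn.cursor()
    # Step 1: get the order entity id.
    cur.execute("SELECT entity_id FROM sales_order WHERE increment_id = ?", (order_number,))
    row = cur.fetchone()
    if row is None:
        raise LookupError("order not found")
    # Step 2: list the order's varchar values and keep the ones that look like an email.
    cur.execute(
        "SELECT value_id, value FROM sales_order_varchar WHERE entity_id = ?", (row[0],)
    )
    candidates = [(vid, val) for vid, val in cur.fetchall() if val and "@" in val]
    if len(candidates) != 1:
        raise LookupError("expected exactly one email-like value")
    value_id, old_email = candidates[0]
    # Step 3: update only that one row, addressed by its value_id.
    cur.execute(
        "UPDATE sales_order_varchar SET value = ? WHERE value_id = ?", (new_email, value_id)
    )
    conn.commit()
    return old_email


def demo_connection():
    """Build a tiny in-memory stand-in for the two tables (made-up sample rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sales_order (entity_id INTEGER PRIMARY KEY, increment_id TEXT);
        CREATE TABLE sales_order_varchar (value_id INTEGER PRIMARY KEY, entity_id INTEGER,
                                          attribute_id INTEGER, value TEXT);
        INSERT INTO sales_order VALUES (42, '100000123');
        INSERT INTO sales_order_varchar VALUES (900, 42, 124, 'pat@exmaple.com');
        INSERT INTO sales_order_varchar VALUES (901, 42, 130, 'Pat Smith');
        """
    )
    return conn
```

<p>The function returns the old address so you can confirm that you replaced the value you meant to replace.  If more than one value looks like an email, it stops instead of guessing.</p>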
<p>And that is it, the email address has been corrected and the customer will get future emails relating to this order.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theintegrationengineer.com/change-magento-order-email/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What is a Database</title>
		<link>http://www.theintegrationengineer.com/what-is-a-database/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=what-is-a-database</link>
		<comments>http://www.theintegrationengineer.com/what-is-a-database/#comments</comments>
		<pubDate>Mon, 08 Jun 2009 16:26:56 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[Database]]></category>
		<category><![CDATA[embeded database]]></category>
		<category><![CDATA[external database]]></category>
		<category><![CDATA[Hirarchial Database]]></category>
		<category><![CDATA[query]]></category>
		<category><![CDATA[RDBMS]]></category>
		<category><![CDATA[Relational Database]]></category>
		<category><![CDATA[schema]]></category>
		<category><![CDATA[Single Table]]></category>
		<category><![CDATA[spreadsheet database]]></category>
		<category><![CDATA[Table]]></category>
		<category><![CDATA[tree]]></category>

		<guid isPermaLink="false">http://www.theintegrationengineer.com/?p=106</guid>
		<description><![CDATA[Definition:  A database is a structured collection of records or data. Many moons ago, I was shown a database that was constructed back in the 1950s.  It was handheld and consisted of a stack of cards that contained information about plant biology.  There was a series of holes that wrapped around the cards, some [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Definition</strong>:  A <a href="http://en.wikipedia.org/wiki/Database"><em>database</em></a> is a structured collection of records or data.</p>
<p>Many moons ago, I was shown a database that was constructed back in the 1950s.  It was handheld and consisted of a stack of cards that contained information about plant biology.  A series of holes wrapped around the cards; some of the holes were cut into notches, and others were not.</p>
<p>To query the database, one placed a pin through one of the holes and let the cards that had a notch fall out of the stack.  You could continue this process by removing and adding pins to select different combinations of data.</p>
<p>Once you had narrowed your search, you could read the cards, and retrieve the data.</p>
<p>This was a very manual process, and I doubt that anyone today would take the time to learn to use such a tool, let alone update or add records to it.<span id="more-106"></span></p>
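<p>The pin-and-notch query is easy to picture in code.  This is a minimal sketch in Python, and the card contents and names are made up for illustration: each card records which holes are notched, and each pin keeps only the cards that fell out on the previous pass.</p>

```python
def pin_query(cards, holes):
    """Return the cards that fall out when a pin is pushed through each hole in turn."""
    stack = list(cards)
    for hole in holes:
        # Cards notched at this hole drop off the pin; the others stay behind.
        stack = [card for card in stack if hole in card["notches"]]
    return stack


# Hypothetical cards: the names and notch labels are made up for this example.
CARDS = [
    {"name": "card A", "notches": {"tree", "deciduous"}},
    {"name": "card B", "notches": {"tree", "evergreen"}},
    {"name": "card C", "notches": {"evergreen"}},
]
```

<p>Calling <code>pin_query(CARDS, ["tree", "evergreen"])</code> narrows the stack twice and leaves only card B, which is the manual version of a query with two conditions.</p>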
<p><strong></strong><img class="alignleft size-full wp-image-116" title="ssn_databasejpeg" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/04/ssn_databasejpeg.jpeg" alt=" What is a Database" width="250" height="200" /><strong>More Modern Databases</strong></p>
<p>Replacing card catalogs in libraries, Rolodexes on desktops, and filing rooms in offices is our new electronic concept of databases.  At a rudimentary level, many simple documents fit the definition of a database that we started with.</p>
<p>A post-it note is not a database, but a list of names and numbers on a contact list is.  It has regular structure, and the data can be retrieved using a routine.  And this is as far as most people need to go.  They see databases like spreadsheets.  They even use filters and link cells for calculations.</p>
<p><strong>More than just a list</strong></p>
<p>A database that just holds a list of records is a single-table database.  This is like a spreadsheet, and the only real advantage is that databases on computers are generally designed to be queried by other applications.  The contact list in your email client is not much more than a flat table.  But it is queried and updated from the email application.  This makes the application more useful.</p>
<p>Again, long ago, I had a watch that held phone numbers.  This worked pretty well, and had an alphanumeric keyboard.  But the data was in a silo.  There was no way to get the data in or out other than the small screen and keypad.  Application-bound, embedded databases are also silos of data.</p>
<p>Many applications now use, or have the option to use, an embedded database.  But databases that can be used by more than one application, and are treated like an object are even more useful.  When a database is shared between applications, we can gain exponential utility.  Every new application that can query the shared database becomes more efficient.</p>
<p><strong>RDBMS: More than a single table</strong></p>
<p>One of the things that we will discuss more later is the concept of a relational database.  What this means is that we have more than one table, and the tables have related fields.  An analogous example is an email system with an integrated calendar.  Now we have a table that contains the contacts, and we relate those contacts to the people invited to the events.  These systems have been around for quite a while and give us a conceptual model of how related tables bring power to a Relational Database Management System (RDBMS).</p>
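<p>To make the email and calendar example concrete, here is a minimal sketch in Python using the built-in sqlite3 module.  The table names, column names and sample rows are my assumptions.  The point is only that the contact and event tables are related through shared id fields.</p>

```python
import sqlite3

# Hypothetical schema: contacts, events, and a table relating the two.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE contacts (contact_id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE events (event_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE invitations (event_id INTEGER, contact_id INTEGER);
    INSERT INTO contacts VALUES (1, 'Ann', 'ann@example.com');
    INSERT INTO contacts VALUES (2, 'Bob', 'bob@example.com');
    INSERT INTO events VALUES (10, 'Kickoff');
    INSERT INTO invitations VALUES (10, 1);
    INSERT INTO invitations VALUES (10, 2);
    """
)


def invitees(conn, title):
    """Names of the contacts invited to an event, found by joining on the shared ids."""
    rows = conn.execute(
        """
        SELECT c.name FROM events e
        JOIN invitations i ON i.event_id = e.event_id
        JOIN contacts c ON c.contact_id = i.contact_id
        WHERE e.title = ? ORDER BY c.name
        """,
        (title,),
    )
    return [name for (name,) in rows]
```

<p>Each contact is stored once, no matter how many events they are invited to.  That is the power the related tables bring.</p>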
<p><strong>Data in a Hierarchy</strong></p>
<p>RDBMS is not the only way to organize data.  One way that some databases organize data is in a hierarchical schema.  In this fashion, instead of tables, we have a conceptual model like a directory tree.  Data is located in a name space that is organized like files in directories.  Data in this type of structure gains some implicit structure and relationship, because a parent-child relationship exists in the data naturally.  For small sets of data this can make the database small and fast.  However, building abstract structures, relationships and queries is more difficult with this structure, and the database really grows when it is extended to cover the kind of cross-references between records that a relational database provides.</p>
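<p>A small sketch in Python shows the idea, assuming nested dictionaries stand in for the hierarchy and a slash-separated path stands in for the name space.  The data and function names are made up for illustration.</p>

```python
# Hypothetical hierarchy: nested dictionaries play the role of directories.
TREE = {
    "plants": {
        "trees": {"oak": {"leaf": "lobed"}, "pine": {"leaf": "needle"}},
        "ferns": {"bracken": {"leaf": "frond"}},
    }
}


def lookup(tree, path):
    """Walk a slash-separated path down the hierarchy, like files in directories."""
    node = tree
    for part in path.strip("/").split("/"):
        node = node[part]  # raises KeyError if the path does not exist
    return node


def children(tree, path):
    """The parent-child relationship comes for free: list the names under a node."""
    return sorted(lookup(tree, path))
```

<p>Finding everything under one parent is trivial here.  A question that cuts across branches, such as &#8220;which plants have needle leaves?&#8221;, means walking the whole tree, which is where the relational model starts to pay off.</p>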
<p><strong>All Shapes and Sizes</strong></p>
<p>More basic than the simplest table or hierarchy-based database is the file-based system.  This is really what it sounds like.  A simple text file or an XML file can be used as a database.  These can support queries, inserts and joins.  Even though the database is in a standard file that can be opened and accessed with a common text editor, it has a structure and can be used to keep the application data organized.</p>
<p><strong>Databases for the Integration Engineer</strong></p>
<p>As Integration Engineers, we will sometimes find ourselves working directly with an application&#8217;s database.  Sometimes this database will be a full-blown enterprise-level database implementation.  Many times, however, it will be something simpler, like an embedded table or XML file that the application treats as its database.  Whatever the case, we need to learn and respect the structure of the data, and work to ensure that our integration preserves the integrity of the data structure.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theintegrationengineer.com/what-is-a-database/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Who Owns Data</title>
		<link>http://www.theintegrationengineer.com/who-owns-data/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=who-owns-data</link>
		<comments>http://www.theintegrationengineer.com/who-owns-data/#comments</comments>
		<pubDate>Wed, 29 Jun 2011 12:25:59 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[b2b]]></category>
		<category><![CDATA[Data]]></category>
		<category><![CDATA[Supply Chain]]></category>

		<guid isPermaLink="false">http://www.theintegrationengineer.com/?p=2380</guid>
		<description><![CDATA[As someone that has been involved in the supply chain industry, this question has come up more than once, and at more than one place of business. For both vendors and retailers, having access to better, broader and more accurate information is worth money.  And is sometimes the difference between life and death of a [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.theintegrationengineer.com/wp-content/uploads/2009/01/delimitedillustration.jpg"><img class="alignleft size-thumbnail wp-image-72" title="Illustration of Delimited Data" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/01/delimitedillustration-150x150.jpg" alt="delimitedillustration 150x150 Who Owns Data" width="150" height="150" /></a>As someone who has been involved in the supply chain industry, I have seen this question come up more than once, and at more than one place of business.  For both vendors and retailers, having access to better, broader and more accurate information is worth money.  It is sometimes the difference between life and death for a business in a competitive market.</p>
<p>When businesses realize that the catalog data that they have may be worth as much as the products or services that they sell, they may be tempted to sell that instead.  Before they do, they need to think about who owns the data.</p>
<p><span id="more-2380"></span></p>
<p>Let&#8217;s say hypothetically that there is an integration tool that helps people that sell widgets get widget pricing and data from people that make widgets.  This integration tool also helps people that sell widgets place orders, and helps people who make widgets invoice and collect payment from those that sell.  The data that passes across this integration tool is in itself valuable to both the people that sell widgets and the people that make widgets.</p>
<p>Other companies making widgets would be interested to know that there are people selling widgets that are getting shipping delays when they order from other widget makers.  And people who sell widgets would like to know if there are people that make widgets more cheaply than those they are buying from.</p>
<p>But if I make widgets, I don&#8217;t want other widget makers to see what my prices are.   If I sell widgets, I don&#8217;t want other widget sellers to know what I am paying for my widgets.  This information is something that makes me able to compete.  It is the type of information that the widget maker and seller want to own, and control when and if it is released.</p>
<p>But there is one person that knows all of this information about all of the widget makers and sellers.  That is the person that has access to the data of the integration tool.  It is also likely that the manufacturer and retailer have signed contracts about the disclosure of the information.  This information is confidential and cannot be disclosed to other parties.</p>
<p>I was involved with one company that sold the information, or information about it, back to the manufacturers and retailers.  Since we had all of the data, we could easily report when there were discrepancies between orders and shipments.  So we reported to the retailer patterns of order anomalies they were having with all of their vendors.  We also reported catalog errors to the vendors so they could see which retailers continued to use bad or old part numbers.</p>
<p>And then the tricky part.  We convinced the vendors and retailers to let us have access to their data for &#8220;Industry Analytic&#8221; products.  For this the information became abstracted so that no one could see information about a specific retailer or vendor.  But they could see industry trends or get scorecard reports of a variety of metrics relating to their supply chain health.</p>
<p>This was a fun project to be a part of.  And the company I was with created whole new products and revenue streams.  But you don&#8217;t have to be a supply chain integration company to make use of data that you own.  If you have vendors, here are three questions that you can ask your database people to answer for you.</p>
<ol>
<li>Order time to Fulfillment time.</li>
<li>Back Order or Out of Stock frequency.</li>
<li>Order Price to Invoice price accuracy.</li>
</ol>
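<p>As a rough illustration of what your database people might hand back, here is a minimal sketch in Python that computes all three numbers from a list of order records.  The record layout, field names and sample values are my assumptions, and the sketch assumes the list is not empty.</p>

```python
from datetime import date

# Hypothetical order records; prices are kept in whole cents to avoid rounding surprises.
ORDERS = [
    {"ordered": date(2011, 6, 1), "fulfilled": date(2011, 6, 4),
     "back_ordered": False, "order_cents": 1000, "invoice_cents": 1000},
    {"ordered": date(2011, 6, 2), "fulfilled": date(2011, 6, 9),
     "back_ordered": True, "order_cents": 500, "invoice_cents": 550},
]


def supply_chain_metrics(orders):
    """Compute the three vendor metrics from a non-empty list of order records."""
    count = len(orders)
    days = [(o["fulfilled"] - o["ordered"]).days for o in orders]
    back_orders = sum(1 for o in orders if o["back_ordered"])
    price_matches = sum(1 for o in orders if o["order_cents"] == o["invoice_cents"])
    return {
        "avg_days_to_fulfill": sum(days) / count,   # 1. order time to fulfillment time
        "back_order_rate": back_orders / count,     # 2. back order frequency
        "price_accuracy": price_matches / count,    # 3. order price to invoice price
    }
```

<p>Run per vendor, these numbers give you a simple scorecard that you can compare from month to month.</p>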
<p>And there are more questions to ask.  Asking them can help you to see places where your supply chain is leaky, and can help you start to leverage data that you own to improve your own bottom line.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theintegrationengineer.com/who-owns-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Learn XML in 4 Minutes</title>
		<link>http://www.theintegrationengineer.com/learn-xml-in-4-minutes/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=learn-xml-in-4-minutes</link>
		<comments>http://www.theintegrationengineer.com/learn-xml-in-4-minutes/#comments</comments>
		<pubDate>Fri, 03 Jun 2011 12:13:47 +0000</pubDate>
		<dc:creator>EvilRobot</dc:creator>
				<category><![CDATA[EvilRobot]]></category>
		<category><![CDATA[XML]]></category>
		<category><![CDATA[Youtube]]></category>
		<category><![CDATA[Learn]]></category>
		<category><![CDATA[Minutes]]></category>
		<category><![CDATA[Video]]></category>
		<category><![CDATA[YouTube]]></category>

		<guid isPermaLink="false">http://www.theintegrationengineer.com/?p=2292</guid>
		<description><![CDATA[If only it were really that simple. Well, maybe it should be. The evil robot found this video, and thinks it is good enough to share. But don&#8217;t get your hopes up. This lays out some foundational information, but there are so many ways to use XML that this is well short of a complete [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.theintegrationengineer.com/wp-content/uploads/2009/05/xml-tag.jpg"><img class="alignleft size-full wp-image-130" title="xml-tag" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/05/xml-tag.jpg" alt="xml tag Learn XML in 4 Minutes" width="282" height="197" /></a>If only it were really that simple.</p>
<p>Well, maybe it should be.  The evil robot found this video, and thinks it is good enough to share.  But don&#8217;t get your hopes up.  This lays out some foundational information, but there are so many ways to use XML that this is well short of a complete overview.</p>
<p>But it was still fun to watch.</p>
<p>Once.<span id="more-2292"></span></p>
<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="355" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="src" value="http://www.youtube.com/v/saQK7SlFiho?fs=1" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="425" height="355" src="http://www.youtube.com/v/saQK7SlFiho?fs=1" allowfullscreen="true"></embed></object></p>
<p><strong>Video Description:</strong></p>
<p>&#8220;Here you will learn what you NEED to know about XML. You will learn: XML syntax XML and CSS You will not learn XML and XSLT or JavaScript, because that is not necessary and it is very complicated. Have fun with YOUR commands.&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theintegrationengineer.com/learn-xml-in-4-minutes/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Mapping Exercises: EDI Invoice to Open Office Tables (part Two)</title>
		<link>http://www.theintegrationengineer.com/mapping-excersizes-edi-invoice-to-open-office-tables-part-two/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=mapping-excersizes-edi-invoice-to-open-office-tables-part-two</link>
		<comments>http://www.theintegrationengineer.com/mapping-excersizes-edi-invoice-to-open-office-tables-part-two/#comments</comments>
		<pubDate>Wed, 22 Jul 2009 03:33:44 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[Mapping Exercise]]></category>
		<category><![CDATA[conversion]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[EDI]]></category>
		<category><![CDATA[Invoice]]></category>
		<category><![CDATA[Mapping]]></category>
		<category><![CDATA[Open Office]]></category>

		<guid isPermaLink="false">http://www.theintegrationengineer.com/?p=149</guid>
		<description><![CDATA[Continuing Mapping Exercise Today we will identify our data source, and begin mapping the source data to the target data.  We identified our target format and placed that in the paper map last time.  If you didn&#8217;t read that post yet, you might want to review it quickly before continuing.  (read part One) The Source [...]]]></description>
			<content:encoded><![CDATA[<p><strong><img class="alignleft size-full wp-image-276" title="mapping_pzl" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/07/mapping_pzl.jpg" alt="mapping pzl Mapping Excersizes: EDI Invoice to Open Office Tables (part Two)" width="201" height="108" />Continuing Mapping Exercise</strong></p>
<p>Today we will identify our data source, and begin mapping the source data to the target data.  We identified our target format and placed that in the paper map last time.  If you didn&#8217;t read that post yet, you might want to review it quickly before continuing.  (<a href="mapping-excersizes-edi-invoice-to-open-office-tables-part-one">read part One</a>)</p>
<p><span id="more-149"></span></p>
<p><strong>The Source</strong></p>
<p>The source is an EDI Invoice.  We will pick an X12 4010 810 as our standard, and I am providing you a link to the Standards document here.  This is a sample standard and does not contain an exhaustive list of elements, just the ones that we are using.   <a href="http://www.theintegrationengineer.com/wp-content/uploads/2009/07/TIE810.pdf">TIE810</a></p>
<p>We then insert the source elements for the data that we have, starting with the most straightforward data.  I am starting at the bottom and going up.  IT1_04 is the Unit Price, and it will map directly.  So do Product ID and Quantity: Quantity comes from IT1_02, and Product ID comes from IT1_07 where IT1_06 is &#8220;VC&#8221;.  (I will also look on down the line at other odd-numbered elements greater than 7 that have a preceding qualifier of &#8220;VC&#8221;.  This allows the execution of the map to be more flexible if the invoice line item is formatted freely.  You may not be able to do this depending on your mapping technology.)</p>
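<p>If your mapping technology lets you write a little code, the line item rule above might look like this minimal Python sketch.  It assumes the segment arrives as a single string with &#8220;*&#8221; as the element separator and the segment terminator already removed.  The function name and the sample segments are made up.</p>

```python
def map_line_item(segment, qualifier="VC"):
    """Map one IT1 segment onto Quantity, UnitPrice and ProductID."""
    # elements[0] is the segment id, so IT1_02 is elements[2], IT1_04 is elements[4].
    elements = segment.split("*")
    if elements[0] != "IT1":
        raise ValueError("not an IT1 segment")
    product_id = None
    # Qualifier/ID pairs start at IT1_06/IT1_07 and repeat: 08/09, 10/11, and so on.
    for pos in range(6, len(elements) - 1, 2):
        if elements[pos] == qualifier:
            product_id = elements[pos + 1]
            break
    return {
        "Quantity": elements[2],
        "UnitPrice": elements[4],
        "ProductID": product_id,
    }
```

<p>For a made-up segment such as <code>IT1*1*12*EA*9.95**BP*B-100*VC*V-200</code>, the loop skips the &#8220;BP&#8221; pair and picks up V-200 from IT1_09.  If no &#8220;VC&#8221; qualifier is present, ProductID comes back empty, and you can decide whether that should fail the document.</p>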
<p>After the easy ones, the direct mapping ones, we get to the concatenated ones.  InvoiceID is the concatenation of the ISA_06, ISA_08, ISA_13, GS_06, ST_02 and BIG_02.  This is straightforward as well.</p>
<p><img class="alignnone size-full wp-image-309" title="InvoiceMapping2" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/07/InvoiceMapping2.jpg" alt="InvoiceMapping2 Mapping Excersizes: EDI Invoice to Open Office Tables (part Two)" width="597" height="438" /></p>
<p><strong>Conversion</strong></p>
<p>The status will be pulled from the BIG_09 and we will want to convert this value.  Here is where the fun begins.  We may say, &#8220;Only this subset of values is accepted,&#8221; list that set in our usage spec, and fail any that don&#8217;t comply.  Or we can try to map all of the invoice statuses that are in the standard to an internal status.  But this also leaves us with a possible gap if our Trading Partner doesn&#8217;t comply with the standard.<a href="http://65e92d0uv89gefp2xcimn8dp2a.hop.clickbank.net/" target="_top"><img class="alignright size-full wp-image-116" title="ssn_databasejpeg" src="http://www.databasedesign-resource.com/images/NormalizationBook.jpg" alt="NormalizationBook Mapping Excersizes: EDI Invoice to Open Office Tables (part Two)" width="176" height="338" /></a></p>
<p>Or we can split the difference.  We map status codes that make sense to our system to their corresponding status.  And for all others, we map them to &#8220;OTHER&#8221; and note what they were in the Notes field.  Doing this retains the data for someone to look at later.  It prevents failures on this point, and keeps the status from becoming abstracted from the original intent.</p>
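<p>The &#8220;split the difference&#8221; rule can be sketched in a few lines of Python.  The codes in the lookup table are placeholders that I made up and are not taken from the standard.  Your own usage spec decides which codes map to which internal status.</p>

```python
# Placeholder codes, not taken from the standard; fill this in from your usage spec.
STATUS_MAP = {"A": "APPROVED", "H": "ON_HOLD"}


def convert_status(big_09, notes=""):
    """Return (status, notes); unknown codes become OTHER and are kept in the notes."""
    code = big_09.strip().upper()
    if code in STATUS_MAP:
        return STATUS_MAP[code], notes
    # Keep the original value so that someone can look at it later.
    extra = "Original status code: " + repr(code)
    return "OTHER", (notes + " " + extra).strip()
```

<p>An unknown code never fails the document.  It lands as OTHER, with the original value preserved in the notes.</p>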
<p><strong>Homework time</strong></p>
<p>Go ahead and map the other straightforward data, and document any of the transformation rules that you can.  When you are done you can check out the map that I completed up to this point.  Your map and mine don&#8217;t have to be exactly the same, and you should only look at the example after you have had a go at doing it yourself.  Next week we will finish the mapping and talk about ways to solve some of the problems faced by the more complex and missing data.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theintegrationengineer.com/mapping-excersizes-edi-invoice-to-open-office-tables-part-two/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mapping Exercises: EDI Invoice to Open Office Tables (part One)</title>
		<link>http://www.theintegrationengineer.com/mapping-excersizes-edi-invoice-to-open-office-tables-part-one/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=mapping-excersizes-edi-invoice-to-open-office-tables-part-one</link>
		<comments>http://www.theintegrationengineer.com/mapping-excersizes-edi-invoice-to-open-office-tables-part-one/#comments</comments>
		<pubDate>Tue, 07 Jul 2009 17:30:24 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[Database]]></category>
		<category><![CDATA[EDI]]></category>
		<category><![CDATA[Mapping]]></category>
		<category><![CDATA[Mapping Exercise]]></category>
		<category><![CDATA[Invoice]]></category>
		<category><![CDATA[Open Office]]></category>
		<category><![CDATA[Paper Map]]></category>
		<category><![CDATA[target]]></category>

		<guid isPermaLink="false">http://www.theintegrationengineer.com/?p=37</guid>
		<description><![CDATA[This is a mapping exercise that will go through the process of creating a paper map, or mapping document.  We will start with an empty paper map that you can get here.  And we will end with a completed paper map document that documents what data from the source goes into what fields on the [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-162" title="math" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/06/math.jpg" alt="math Mapping Excersizes: EDI Invoice to Open Office Tables (part One)" width="173" height="173" />This is a mapping exercise that will go through the process of creating a paper map, or mapping document.  We will start with an empty paper map that you can get <a href="http://www.theintegrationengineer.com/tool-box/#papermap">here</a>.  And we will end with a completed paper map that documents what data from the source goes into what fields on the target.  This process will take more than one post, and I will link them together so that you can follow from one to the next.  Along the way, we will discuss the things that we are doing so that you can apply this technique to the target and source in your own mapping tasks.<span id="more-37"></span></p>
<p><strong>The Target</strong></p>
<p>The Open Office database is divided into two tables: <em>Invoice</em> and <em>Invoice Details</em>.  This can be mapped in two ways.  The first way is to map the data into one common format and rely on whatever ETL tool is importing the data to catch and split the data.  The second way is to acquire or construct a key in the transformation and then divide the data into matching input formats.  Then when these inputs are moved into the database, they will relate to each other on this key.<a href="http://12967i2wnvdrcfsnl1fdmv2gvr.hop.clickbank.net/" target="_top"><img class="alignright size-full" src="http://www.pdf-creator.us/images/m-softbox.jpg" width="176" height="338" title="Mapping Excersizes: EDI Invoice to Open Office Tables (part One)" alt="m softbox Mapping Excersizes: EDI Invoice to Open Office Tables (part One)" /></a></p>
<p>The choice of how you will do this will depend on your environment.  Ask questions like, &#8220;Will I have enough data to provide a unique key?&#8221; or &#8220;Is there a way to get a key with an API call or database query?&#8221;  The answers to these questions will determine what course you will take.</p>
<p>If the system ultimately receiving the data is asynchronous to the transformation, and you need to send the invoice and invoice details data separately, some care needs to be taken to ensure that the data can be related after it is separated.</p>
<p>So what data in the invoice can be used to tie the invoice to the invoice details?  The first answer might be, &#8220;The Invoice Number.&#8221;  But this number is not guaranteed to be unique across multiple vendors.  In EDI and cXML there are document unique identifiers.  Since we are using EDI, we can use a combination of the ISA Sender, Receiver, and Control number.  We will also want to use the GS control number, and ST control number in the event that more than one invoice is sent in a single <a href="http://www.theintegrationengineer.com/edi-enveloping-part-one/">envelope</a>.  And we might as well tack on the actual invoice number from the BIG_02.</p>
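<p>Here is a minimal sketch in Python of building that key and stamping it on both outputs.  The function names, the hyphen separator and the sample values are my assumptions.  The only requirement is that the same key ends up on the invoice row and on every detail row.</p>

```python
def invoice_key(isa_sender, isa_receiver, isa_control, gs_control, st_control, invoice_number):
    """Join the envelope identifiers and the invoice number into one key."""
    parts = [isa_sender, isa_receiver, isa_control, gs_control, st_control, invoice_number]
    # The ISA sender and receiver ids are fixed width and space padded, so trim each part.
    return "-".join(part.strip() for part in parts)


def split_invoice(key, header, lines):
    """Stamp the same key on the invoice row and on every invoice details row."""
    invoice_row = dict(header, InvoiceID=key)
    detail_rows = [dict(line, InvoiceID=key) for line in lines]
    return invoice_row, detail_rows
```

<p>Because the key travels with both halves, the invoice and its details can be sent separately and still be related when they reach the database.</p>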
<p style="padding-left: 30px;"><em><strong>Database Tables</strong></em></p>
<table border="0" cellspacing="10">
<tbody>
<tr valign="top">
<td>
<p style="padding-left: 30px;">Invoice Table:</p>
</td>
<td>
<p style="padding-left: 30px;">Invoice Details Table:</p>
</td>
</tr>
<tr valign="top">
<td><img class="aligntop size-full wp-image-54" title="Invoice Table Definition" src="http://www.theintegrationengineer.com/wp-content/uploads/2008/11/invoicetabledef.jpg" alt="invoicetabledef Mapping Excersizes: EDI Invoice to Open Office Tables (part One)" width="203" height="205" /></td>
<td><img class="aligntop size-full wp-image-55" title="Invoice Details Table Definition" src="http://www.theintegrationengineer.com/wp-content/uploads/2008/11/invoicedetailstabledef.jpg" alt="invoicedetailstabledef Mapping Excersizes: EDI Invoice to Open Office Tables (part One)" width="203" height="148" /></td>
</tr>
</tbody>
</table>
<p><strong>The Paper Map</strong></p>
<p>Now that we know what the target looks like, we fill out the target side of the paper map.  We will create two &#8220;files&#8221; in our output, the Invoice file and the InvoiceDetails file, but we can use one paper map for both and mark the boundary with a bar between the two &#8220;files&#8221;.  (I am saying files, but this could be a queue, a post, an insert over ODBC, etc.)</p>
<p style="text-align: center;"><img class="size-full wp-image-181 aligncenter" title="invoiceMap_target" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/06/invoiceMap_target.png" alt="invoiceMap target Mapping Excersizes: EDI Invoice to Open Office Tables (part One)" width="281" height="435" /></p>
<p><strong>What&#8217;s Next</strong></p>
<p>Today we went through the process of identifying the target, and creating a paper map with the target format identified.  We talked about some of the strategy that we use in deciding what to map and how to map it.  Next time we will identify the source, and begin mapping data from the source.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theintegrationengineer.com/mapping-excersizes-edi-invoice-to-open-office-tables-part-one/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>5 Tools of an Integration Engineer</title>
		<link>http://www.theintegrationengineer.com/5-tools-of-an-integration-engineer/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=5-tools-of-an-integration-engineer</link>
		<comments>http://www.theintegrationengineer.com/5-tools-of-an-integration-engineer/#comments</comments>
		<pubDate>Sat, 13 Jun 2009 17:30:32 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[Database]]></category>
		<category><![CDATA[File]]></category>
		<category><![CDATA[Calc]]></category>
		<category><![CDATA[editor]]></category>
		<category><![CDATA[Excel]]></category>
		<category><![CDATA[notepad]]></category>
		<category><![CDATA[Open Office]]></category>
		<category><![CDATA[spread sheet]]></category>
		<category><![CDATA[spreadsheet]]></category>
		<category><![CDATA[SQL Worksheet]]></category>
		<category><![CDATA[squirrel]]></category>
		<category><![CDATA[techrepublic]]></category>
		<category><![CDATA[text]]></category>
		<category><![CDATA[text pad]]></category>
		<category><![CDATA[textpad]]></category>
		<category><![CDATA[TOAD]]></category>
		<category><![CDATA[tools]]></category>
		<category><![CDATA[ultra Edit]]></category>
		<category><![CDATA[ultraedit]]></category>
		<category><![CDATA[vi]]></category>
		<category><![CDATA[white board]]></category>
		<category><![CDATA[whiteboard]]></category>

		<guid isPermaLink="false">http://www.theintegrationengineer.com/?p=115</guid>
		<description><![CDATA[There are job or task specific tools that will have a high importance to each integration task.  When working on an SAP system, your SAP tools will be very important.  But there are tools and skills that are also important regardless of the systems and technologies that you are working on.  For me, these are [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-144" title="tool_pile_puzzlepiece1" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/06/tool_pile_puzzlepiece1.jpg" alt="tool pile puzzlepiece1 5 Tools of an Integration Engineer" width="131" height="108" />There are job- or task-specific tools that will be very important to each integration task.  When working on an SAP system, your SAP tools will be very important.  But there are tools and skills that are also important regardless of the systems and technologies that you are working on.  For me, these are the top 5 tools that an Integration Engineer should be able to use proficiently.  Do you use any of these?  Do you have others?<span id="more-115"></span></p>
<p><strong>1.  A big Whiteboard</strong></p>
<p>This is probably my number one requirement.  When I am thinking, I like to draw it out.  I haven&#8217;t found an application that gives me the same creative release and adaptability as my whiteboard.  After starting to <img class="alignright size-medium wp-image-122" title="whiteboard" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/05/whiteboard-300x300.jpg" alt="whiteboard 300x300 5 Tools of an Integration Engineer" width="146" height="146" />do some of my work from home, I went out and acquired a 10 X 4 whiteboard to put in my home office.  Having the extra room is essential.  I can get a call, and walk over to the whiteboard and start drawing out what needs to be built to solve the problems while I am still on the call.</p>
<p>You may not need a whiteboard as big as mine.  And you may have an electronic solution that you like better.  If so, please let me know.  I like to try out gadgets.</p>
<p><strong>2.  Spreadsheet</strong></p>
<p>Now this may not sound earth shattering, but we are not just talking about the basics.  You will need to learn to write filters, import data, link cells and perform calculations.  If you think that a spreadsheet is like a ledger, then you are missing the power of a spreadsheet.  If you don&#8217;t think you have the skills you need, here are some links to tutorials for the two most popular spreadsheet programs.</p>
<ul>
<li>Very Basic Open Office Tutorial <a href="http://www.tutorialsforopenoffice.org/tutorial/Spreadsheet_Basics.html">http://www.tutorialsforopenoffice.org/</a></li>
<li>More specific/advanced tutorials <a href="http://openoffice.blogs.com/openoffice/">http://openoffice.blogs.com/openoffice/</a></li>
<li>Some basic and advanced help for MS Excel <a href="http://www.internet4classrooms.com/on-line_excel.htm">http://www.internet4classrooms.com</a></li>
</ul>
<p><strong>3.  Text editor.</strong></p>
<p>Familiarity with more than one is needed, as you will find yourself on Windows and Unix servers and they will have different sets of tools.  One of the most frustrating things is not knowing how to use the native editor of the system you are on.  So get familiar with Notepad, and then get familiar with vi.  You can add in other tools like UltraEdit, TextPad, and more, but you should know the native ones first, if not best.</p>
<ul>
<li>Ultra Edit is a popular tool in some circles.  <a href="http://www.ultraedit.com/">http://www.ultraedit.com/</a></li>
<li>Text Pad is also popular.  <a href="http://www.textpad.com/">http://www.textpad.com/</a></li>
<li>Believe it or not, Notepad has some tutorials.  <a href="http://bink.nu/news/notepad-tips-and-tricks.aspx">http://bink.nu/news</a></li>
<li>And here is a cheat sheet for vi commands.  <a href="http://downloads.techrepublic.com.com/abstract.aspx?docid=172404">http://downloads.techrepublic.com</a></li>
</ul>
<p><strong>4.  File compare.</strong></p>
<p>One task that will need to be done is to compare a file with another file to detect changes or differences.  This happens both in new integrations and in troubleshooting or investigating existing ones.  Some systems have native comparison tools, others don&#8217;t.  And there is much variety in how they work.  Here are some of the tools I have seen and used.</p>
<ul>
<li><a href="http://en.wikipedia.org/wiki/Diff">Wikipedia</a> has a great list that I don&#8217;t think I can improve on.</li>
</ul>
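<p>If you land on a system with no compare tool at all, a scripting language can fill the gap.  Here is a minimal sketch using Python&#8217;s standard difflib module.  The sample EDI-like lines are made up for illustration.</p>

```python
import difflib


def compare_text(old_text, new_text, old_name="old", new_name="new"):
    """Return the unified diff between two blocks of text as a list of lines."""
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    return list(difflib.unified_diff(old_lines, new_lines,
                                     fromfile=old_name, tofile=new_name))


# Made-up sample data: two versions of a small file that differ on one line.
old = "ISA*00*VENDOR01\nBIG*20090224*INV-5001\n"
new = "ISA*00*VENDOR01\nBIG*20090224*INV-5002\n"

for line in compare_text(old, new):
    print(line, end="")
```

<p>Lines starting with a minus sign exist only in the old version, and lines starting with a plus sign exist only in the new one.</p>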
<p><strong>5.  DB query </strong></p>
<p>Much of the time, integration involves a database somewhere sometime.  To work on an integration and not have to have a DBA sitting in your lap, you need to be competent to query the database.  And this involves using some tool.  Sometimes systems will have native tools, other times you will need to connect your own.  Here is a list of tutorials for DB Query tools that I have seen and used.</p>
<ul>
<li>Squirrel SQL <a href="http://squirrel-sql.sourceforge.net/">http://squirrel-sql.sourceforge.net/</a></li>
<li>SQL Worksheet</li>
<li><a href="http://www.toadsoft.com">TOAD</a></li>
</ul>
<p><strong>Summary</strong></p>
<p>This is by no means an attempt at a comprehensive list of tools.  Such a list would be long, if it were possible at all, and I don&#8217;t think that it is.  There are, however, some tools and skills that help us to be more effective as Integration Engineers.  Seeing what others use is helpful, especially when we find that the tool we used to use is no longer around.</p>
<p>What tools, applications, or skills do you find you are falling back on often, on more than one project?  Or what new tools have you found that you think you will use often in the future?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theintegrationengineer.com/5-tools-of-an-integration-engineer/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mapping Exercise:  EDI to Flat file.</title>
		<link>http://www.theintegrationengineer.com/mapping-exercise-edi-to-flat-file/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=mapping-exercise-edi-to-flat-file</link>
		<comments>http://www.theintegrationengineer.com/mapping-exercise-edi-to-flat-file/#comments</comments>
		<pubDate>Tue, 24 Feb 2009 05:32:51 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[Mapping Exercise]]></category>
		<category><![CDATA[EDI]]></category>
		<category><![CDATA[Flat File]]></category>
		<category><![CDATA[Invoice]]></category>
		<category><![CDATA[Mapping]]></category>
		<category><![CDATA[Mapping Excersize]]></category>
		<category><![CDATA[source]]></category>
		<category><![CDATA[target]]></category>

		<guid isPermaLink="false">http://www.theintegrationengineer.com/?p=34</guid>
		<description><![CDATA[Introduction: This is a quick exercise to familiarize you with mapping from an EDI file to a Flat File.  If you are new to mapping, or want an idea of what mapping EDI will be like, this exercise should be a good place to start.  If you are familiar with mapping this should be a [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-98" title="sextant" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/02/sextant.jpg" alt="sextant Mapping Exercise:  EDI to Flat file." width="119" height="117" /><strong>Introduction:</strong> This is a quick exercise to familiarize you with mapping from an EDI file to a Flat File.  If you are new to mapping, or want an idea of what mapping EDI will be like, this exercise should be a good place to start.  If you are familiar with mapping, this should be a quick review with a few tips.  I use Target Based mapping.  Check out my post on Target Mapping <a href="http://www.theintegrationengineer.com/data-mapping/">here</a>.  If EDI is unfamiliar and you need some basic information, my EDI primer is <a href="http://www.theintegrationengineer.com/category/edi/the-edi-primer/">here</a>.  If you are ready to map, and understand EDI basics, then let&#8217;s get started.</p>
<p><span id="more-34"></span></p>
<p><strong>Mapping:</strong></p>
<p>I practice and advocate the <a href="http://www.theintegrationengineer.com/data-mapping/">target oriented mapping</a> approach.  So we will start with defining the target.  We will do this in our 6 column spreadsheet mapping template.  This is also referred to as a Paper Map.  To download a blank template, <a href="http://www.theintegrationengineer.com/wp-content/uploads/2009/02/basepapermap.xls">click here</a>.  Once we have the target defined, we will add the source to the spreadsheet.  After both the source and the target have their fields on the Paper Map, we add the rules.  We will also need to insert some control structures to make everything flow correctly.  Once this is done we will have addressed the data format issues and constructing the map for our translator becomes just a technical exercise.</p>
<p><strong>The Flat File Target:</strong></p>
<p>What are we mapping?  This is the first question that the target answers.  In this example it is an invoice.  So we will want the following information in the following format.</p>
<p>RecordID|VendorID|DateOfInvoice|InvoiceNumber|DateOfPO|PONumber|POLineNumber|QTY|UnitCost|LineTotal|InvoiceTotal</p>
<p>Here are the parameters for these values:</p>
<p><strong>RecordID</strong>:  A unique alphanumeric.  It is derived by concatenating the ISA_Sender, ISA_Control_Number, GS_Sender, GS_Control_Number, ST_Control_Number, and Line_Number where the line number exists.  Max size of 74 characters.</p>
<p><strong>VendorID</strong>:  The ISA_Sender ID.  Alphanumeric with a max size of 15 characters.</p>
<p><strong>DateOfInvoice</strong>:  A long date, max 10 characters, formatted (mmddCCyy).</p>
<p><img class="alignright size-full wp-image-94" title="mappingexcersize1" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/02/mappingexcersize1.jpg" alt="mappingexcersize1 Mapping Exercise:  EDI to Flat file." width="358" height="251" /></p>
<p><strong>InvoiceNumber</strong>:  Invoice number from the Vendor.  Alphanumeric, max 22 characters.</p>
<p><strong>DateOfPO</strong>:   A long date, max 10 characters, formatted (mmddCCyy).</p>
<p><strong>PONumber</strong>:  PO number sent on the PO.  Alphanumeric, max 22 characters.</p>
<p><strong>POLineNumber</strong>:  PO line number must be the same as the invoice line number.  Alphanumeric, max 20 characters.</p>
<p><strong>QTY</strong>:  Quantity of items invoiced.  Numeric, max 10 characters.</p>
<p><strong>UnitCost</strong>: Cost per unit.  Currency, max 17 characters.</p>
<p><strong>LineTotal</strong>:  Total of line cost.  Currency, max 17 characters.  QTY * UnitCost</p>
<p><strong>InvoiceTotal</strong>:  Total of all line totals.  Currency, max 17 characters.  sum(all LineTotals for Invoice)</p>
<p><strong>The EDI Source:</strong></p>
<p><img class="alignleft size-full wp-image-95" title="mappingexcersize2" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/02/mappingexcersize2.jpg" alt="mappingexcersize2 Mapping Exercise:  EDI to Flat file." width="143" height="280" />We are going to get our data directly from the EDI.  We could be doing API calls to resolve VendorID values or getting an incrementing RecordID, but we can get what we need from the EDI under the right circumstances and so that is what we will do here in this example.</p>
<p>For the record ID we are going to do a simple operation and concatenate several values to create a unique value.  This will almost always be unique.  And as a bonus it will refer back to the original document.  If we wanted to ensure that it was always unique, we could include date and time stamps in the concatenation.</p>
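<p>Here is a small sketch of that concatenation.  The function name and sample values are hypothetical, and joining the parts with no separator is my own assumption, chosen so the result stays alphanumeric.  The 74 character limit comes from the target definition above.</p>

```python
MAX_RECORD_ID = 74  # maximum size from the RecordID target definition


def build_record_id(isa_sender, isa_control, gs_sender, gs_control,
                    st_control, line_number=None):
    """Concatenate the envelope values (and line number, if any) into a RecordID."""
    parts = [isa_sender, isa_control, gs_sender, gs_control, st_control]
    if line_number is not None:
        parts.append(line_number)
    # Trim the padding that EDI puts on fixed-width fields before joining.
    record_id = "".join(str(part).strip() for part in parts)
    if len(record_id) > MAX_RECORD_ID:
        raise ValueError("RecordID is longer than %d characters" % MAX_RECORD_ID)
    return record_id


# Made-up sample values for illustration.
record_id = build_record_id("VENDOR01       ", "000000101", "VENDOR01",
                            "101", "0001", "1")
print(record_id)
```

<p>Appending a timestamp, as mentioned above, would simply add one more entry to the list of parts.</p>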
<p>For the dates, we are converting the date from the EDI format into our preferred format.  Many systems understand date format conversions, so it might be as simple as a cast or convert command on the existing string of numbers.  If not, we will have to parse out the separate values and then string them back together with dashes in between.</p>
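<p>As a sketch of the conversion, assuming the EDI date arrives as CCYYMMDD (or the older six digit YYMMDD) and that the target wants mm-dd-CCyy with dashes, Python&#8217;s datetime module can do both the parsing and the validation.  The function name is hypothetical.</p>

```python
from datetime import datetime


def edi_date_to_target(edi_date):
    """Convert an EDI date (CCYYMMDD or YYMMDD) to the mm-dd-CCyy target format."""
    fmt = "%Y%m%d" if len(edi_date) == 8 else "%y%m%d"
    # strptime raises ValueError if the digits do not form a real date.
    return datetime.strptime(edi_date, fmt).strftime("%m-%d-%Y")


print(edi_date_to_target("20090224"))
```

<p>Letting the parser reject impossible dates gives us a simple control for free, since a bad date fails the map instead of landing in the target.</p>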
<p>And there are two values that we will need to calculate. We will calculate the value for the invoice total before we create any lines.  And we will need to create the line totals.  Since these require processing the same data, we might create all of the totals and put them in an array for easy access.  Anyway, this is the time to be creative.</p>
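<p>One way to sketch the totals, assuming quantities and unit costs arrive as strings and that amounts round to two decimal places, is to compute every line total first and then sum the list.  The names and sample values are made up for illustration.</p>

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")  # assumption: currency is rounded to two decimal places


def line_total(qty, unit_cost):
    """LineTotal is QTY times UnitCost, rounded to the cent."""
    return (Decimal(qty) * Decimal(unit_cost)).quantize(CENT, rounding=ROUND_HALF_UP)


def invoice_totals(lines):
    """Return the list of line totals and the invoice total for (qty, cost) pairs."""
    totals = [line_total(qty, cost) for qty, cost in lines]
    return totals, sum(totals, Decimal("0.00"))


# Made-up sample lines: (QTY, UnitCost)
lines = [("2", "19.99"), ("5", "3.50")]
totals, invoice_total = invoice_totals(lines)
print(totals, invoice_total)
```

<p>Decimal is used instead of floating point so that currency values do not pick up rounding noise.</p>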
<p><strong>Constructing Rules:</strong></p>
<p>Now that we have the source and the target we will make the rules, and we already have some in mind.  We know that we are concatenating the record ID, reformatting the dates, and calculating the totals.  We also add in some other rules, like trimming the spaces in the <a href="http://www.theintegrationengineer.com/edi-enveloping-part-two-the-isa">ISA_06</a> to Vendor ID operation.  We will also need to do that in the record ID or it will look odd.</p>
<p><img class="aligncenter size-full wp-image-97" title="mappingexcersize31" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/02/mappingexcersize31.jpg" alt="mappingexcersize31 Mapping Exercise:  EDI to Flat file." width="675" height="286" /></p>
<p><strong>Controls:</strong></p>
<p>Controls are like rules, and they may be written in the same place if we need to.  But where rules describe how we complete our mapping, controls tell us when to map something and when not to.  Sometimes mapping is conditional, and controls tell the map which conditions must be met for the mapping to be valid.</p>
<p>There may be times that we don&#8217;t want the map to go on and complete the process.  The obvious case is missing data.  If there is no invoice or PO number, you may want to stop with an exception.  Generally these types of controls are easy to build, as you just make the target field required.  To indicate this you may want to add a column to the paper map that shows what data is required.  On simple projects you can just put this in the notes.</p>
<p>There will also be other cases.  For instance, it might be that some invoices have their own total.  If it does not match our calculated total, we want the invoice to fail mapping and go to manual resolution.  A use case would be additional charges attached to the invoice. These extra line charges would need to be handled outside of our map.</p>
<p>Controls can be written to make the mapping very powerful and provide a gateway to your data that ensures it stays valid and accurate.  Some controls go beyond stopping the map.  Instead we use them to put conditional logic into our data mapping project.</p>
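<p>The two controls described above could be sketched like this.  The exception name, the field list, and the function are hypothetical, and a real translator will have its own way of declaring required fields.</p>

```python
from decimal import Decimal


class MappingError(Exception):
    """Raised when a control stops the map so the document can go to manual resolution."""


# Assumption: these are the target fields we have marked as required.
REQUIRED_FIELDS = ("InvoiceNumber", "PONumber")


def check_controls(record, calculated_total, stated_total=None):
    """Stop the map if required data is missing or the totals disagree."""
    for field in REQUIRED_FIELDS:
        if not str(record.get(field) or "").strip():
            raise MappingError("required field %s is missing" % field)
    if stated_total is not None and Decimal(stated_total) != calculated_total:
        raise MappingError("invoice total %s does not match calculated total %s"
                           % (stated_total, calculated_total))


# Made-up sample record that passes both controls.
check_controls({"InvoiceNumber": "INV-5001", "PONumber": "PO-77"},
               Decimal("57.48"), stated_total="57.48")
```

<p>An invoice with extra charges would fail the second check, because its stated total would be larger than the sum of the line totals.</p>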
<p><strong>Summary:</strong></p>
<p>This example project provided a short exercise to familiarize you with using a paper map and the fundamentals of target mapping.  In this simple example we had a target specification, and a source specification.  We created rules to move and change data from source to target, and we discussed controls.  These are the fundamental tasks of mapping.</p>
<p>But this is not the end.  I know that as I worked on this project, I saw things that might improve the map.  I didn&#8217;t include them because I wanted to keep this mapping project simple.  But you don&#8217;t have to.  You can add your own improvements to this project and make it do what you want.  I would be happy to respond to these if you post them in the comments.</p>
<p>You can find the resources for this project in the <a href="http://www.theintegrationengineer.com/tool-box">tool box</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theintegrationengineer.com/mapping-exercise-edi-to-flat-file/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Data Integration Theory &amp; Lecture</title>
		<link>http://www.theintegrationengineer.com/data-integration-theory-lecture/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=data-integration-theory-lecture</link>
		<comments>http://www.theintegrationengineer.com/data-integration-theory-lecture/#comments</comments>
		<pubDate>Tue, 02 Dec 2008 16:57:18 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[Mapping]]></category>
		<category><![CDATA[Data Exchange]]></category>
		<category><![CDATA[Data Integration]]></category>
		<category><![CDATA[Schema Mapping]]></category>

		<guid isPermaLink="false">http://www.theintegrationengineer.com/?p=53</guid>
		<description><![CDATA[I found this video on YouTube. Theoretical discussion of integration of data and information. Speaker is Alan Nash Alan discusses two fundamental problems in information integration: (1) How to answer a query over a public interface which combines data from several sources and (2) How to create a single database conforming to the public interface [...]]]></description>
			<content:encoded><![CDATA[<p>I found this video on <a href="http://youTube.com">YouTube</a>.</p>
<p>Theoretical discussion of integration of data and information.</p>
<p><span>Speaker is Alan Nash</span></p>
<p><span id="more-53"></span></p>
<p>Alan discusses two fundamental problems in information integration:</p>
<p>(1) How to answer a query over a public interface which combines data from several sources and</p>
<p><span>(2) How to create a single database conforming to the public interface which combines data from several sources.</span></p>
<p>Alan is using databases as both source and target and uses them in his examples of how data exchange and integration problems are addressed and solved.  This is a little bit heavy, but if you feel up to it, it is 53 minutes long.</p>
<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="344" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/dLo8meG4TJQ&amp;hl=en&amp;fs=1" /><embed type="application/x-shockwave-flash" width="425" height="344" src="http://www.youtube.com/v/dLo8meG4TJQ&amp;hl=en&amp;fs=1" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>This is still a pretty high level discussion, and even the practical examples are abstracted.  I have used a tool that helps with the tasks described here.  If you have this type of task, you might want to check out a company called Convertabase at <a href="http://www.convertabase.com">http://www.convertabase.com</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.theintegrationengineer.com/data-integration-theory-lecture/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>RFID Data Gathering and Commerce</title>
		<link>http://www.theintegrationengineer.com/rfid-data-gathering-and-commerce/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=rfid-data-gathering-and-commerce</link>
		<comments>http://www.theintegrationengineer.com/rfid-data-gathering-and-commerce/#comments</comments>
		<pubDate>Mon, 01 Jun 2009 16:03:23 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[Supply Chain]]></category>
		<category><![CDATA[YouTube Posts]]></category>
		<category><![CDATA[Automated Data collection]]></category>
		<category><![CDATA[BPM]]></category>
		<category><![CDATA[RFID]]></category>
		<category><![CDATA[YouTube]]></category>

		<guid isPermaLink="false">http://www.theintegrationengineer.com/?p=57</guid>
		<description><![CDATA[I found this video on YouTube. RFID is exciting technology for integration engineers.  All of this data coming in will have to be aggregated and integrated by someone.  Also, one of the focuses of integration is to provide more and better information to people and systems.  With more and better information people, businesses and systems [...]]]></description>
			<content:encoded><![CDATA[<p>I found this video on YouTube.</p>
<p>RFID is an exciting technology for integration engineers.  All of this data coming in will have to be aggregated and integrated by someone.  Also, one of the focuses of integration is to provide more and better information to people and systems.  With more and better information, people, businesses, and systems can make better choices and decisions.<span id="more-57"></span></p>
<p>This is a good thing.</p>
<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="344" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/0VbMr2gnGDE&amp;hl=en&amp;fs=1" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="425" height="344" src="http://www.youtube.com/v/0VbMr2gnGDE&amp;hl=en&amp;fs=1" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>People talk about privacy issues and consumer protection.  I am sure that debate will continue to rage on long after the keyboard and mouse you are using have RFID tags embedded.  But the focus of this blog is how these technologies can be used for the collection and integration of information.  Implementing RFID technologies provides a hugely beneficial and accurate stream of data.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theintegrationengineer.com/rfid-data-gathering-and-commerce/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ASCII and EBCDIC</title>
		<link>http://www.theintegrationengineer.com/ascii-and-ebcdic/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=ascii-and-ebcdic</link>
		<comments>http://www.theintegrationengineer.com/ascii-and-ebcdic/#comments</comments>
		<pubDate>Thu, 19 Mar 2009 01:32:27 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[File]]></category>
		<category><![CDATA[ASCII]]></category>
		<category><![CDATA[character set]]></category>
		<category><![CDATA[compatable]]></category>
		<category><![CDATA[Data]]></category>
		<category><![CDATA[EBCDIC]]></category>
		<category><![CDATA[format]]></category>
		<category><![CDATA[legacy]]></category>
		<category><![CDATA[pipe]]></category>
		<category><![CDATA[Standard]]></category>
		<category><![CDATA[text]]></category>

		<guid isPermaLink="false">http://www.theintegrationengineer.com/?p=81</guid>
		<description><![CDATA[What is a &#8220;Character Set?&#8221; A character set is a collection or library of characters, (letters and symbols), and their identifying number.  Included with the printable characters, (letters and punctuation) are some unprintable yet important characters.  Characters are used to form messages. Characters are not fonts.  Characters exist under the font that represent the definition [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-112" title="characters" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/03/characters.jpeg" alt=" ASCII and EBCDIC" width="129" height="78" /><strong>What is a &#8220;Character Set?&#8221;</strong></p>
<p>A character set is a collection or library of characters (letters and symbols), each with its own identifying number.  Along with the printable characters (letters and punctuation), it includes some unprintable yet important characters.  Characters are used to form messages.</p>
<p>Characters are not fonts.  A character is the underlying definition of the symbol that a font is attempting to display.  When you change the font on a document the <strong>A</strong> is changed to an <em>A</em>, but the underlying character that identifies its meaning remains the same.  The font identifies how the character is displayed.  You can even convert to Wingdings and the underlying character remains the same.<span id="more-81"></span></p>
<p>Imagine that I wrote this post using a character set that I created myself.  If you then tried to read it without knowing what my character set was, you would see a bunch of garbage on the screen, like when you go to a foreign language web page without the correct fonts loaded.  Even worse would be if we used the same characters but had different identifiers for them.  Say A is 001 in my set (my set starts at A and moves on numerically), but in your character set you numbered the vowels after the consonants, so 001 to you is B.  Now all of the letters will be wrong, and we get garbage.</p>
<p>Fortunately, some people got together early on and created a standard for characters.  A standards committee created the American Standard Code for Information Interchange, the character set we call ASCII.  IBM created its own set, the Extended Binary Coded Decimal Interchange Code (EBCDIC), though many IBM systems use ASCII now as well.</p>
<p><strong>What is ASCII</strong></p>
<p>ASCII is the acronym for American Standard Code for Information Interchange, and is a collection of characters defined from 0 to 127.  These definitions represent all of the standard English characters, numbers and symbols.  A number of other, unprintable, characters are also included.  You use one or two of these each time you hit the &#8220;Enter&#8221; key on your keyboard.  Depending on your operating system, this sends the &#8220;carriage return&#8221;, the &#8220;line feed&#8221;, or both.</p>
<p>A &#8220;carriage return&#8221; comes from a printer where the head would move back and forth on the roller.  CR would tell the printer to move the head all the way to the left of its printing area.  A &#8220;line feed&#8221; is also from a printer perspective.  This tells the printer to roll the paper so that the head will be writing on the next line.  These are both examples of unprintable characters.  You can probably think of others.  For a complete list of ASCII characters, you can check out this table in my <a href="http://www.theintegrationengineer.com/tool-box">toolbox</a>.</p>
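<p>To see these two unprintable characters for yourself, here is a minimal sketch that prints their ASCII numbers and counts which line endings a chunk of data uses.  The function name and sample bytes are made up for illustration.</p>

```python
CR = "\r"  # carriage return
LF = "\n"  # line feed

print(ord(CR), ord(LF))  # the ASCII numbers for CR and LF


def describe_line_endings(data):
    """Count how many CRLF pairs, bare LFs, and bare CRs appear in the bytes."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "LF": data.count(b"\n") - crlf,
        "CR": data.count(b"\r") - crlf,
    }


# Made-up sample with one of each style of line ending.
print(describe_line_endings(b"a\r\nb\nc\rd"))
```

<p>Windows text files normally end lines with the CRLF pair, while Unix files use a bare LF, which is why a file can look fine on one system and odd on another.</p>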
<p><strong>What is EBCDIC</strong></p>
<p>EBCDIC is the acronym for Extended Binary Coded Decimal Interchange Code.  This was created by IBM back in the day.  Many IBM systems now use ASCII just like everyone else, but there are legacies that are still with us.  IBM mainframes, 3270-style terminals, and some legacy communications equipment still expect messages using the EBCDIC character set.</p>
<p><strong>What is the big deal</strong></p>
<p>As I said, some systems still want to use some of the characters in the EBCDIC system.  Even fancy new systems producing XML will sometimes fall into this trap and cause problems.  The one that I have run into is the use of |, called &#8216;pipe&#8217;.  ASCII and EBCDIC use different character IDs for this character.  And I have seen e-commerce systems that use ASCII for everything else throw in an EBCDIC pipe as a control character.  When this happens, other systems will choke on it.</p>
<p>When you find yourself getting an invalid character message but the characters look fine, remember that there may be some twists in the underlying character set.  If you can, manually replace the character with the character that it looks like (in my case the EBCDIC | with an ASCII |) and see if the parser likes the file now.  If it does, you have encountered the same character set problem that I have.  This can be a difficult problem to solve if you have never encountered it.</p>
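<p>When the characters look fine, it can help to look at the bytes instead.  The Python sketch below is one way to do that, and it assumes the offending byte falls outside printable ASCII, which will not always be true.  The function name and the sample record are made up for illustration; <code>cp037</code> is one of several EBCDIC code pages that ship with Python.</p>

```python
def find_suspect_bytes(data: bytes):
    """Return (offset, value) for each byte outside printable ASCII.

    Tab, line feed and carriage return are allowed; every other
    value outside 32..126 is reported.
    """
    allowed = {9, 10, 13}
    return [
        (offset, value)
        for offset, value in enumerate(data)
        if value not in allowed and value not in range(32, 127)
    ]


# Hypothetical record: the second "pipe" is really byte 0xA6,
# which is not an ASCII character at all.
record = b"SKU123|blue\xa6large\r\n"
for offset, value in find_suspect_bytes(record):
    print(f"offset {offset}: byte 0x{value:02X}")

# The same character has a different ID in each character set.
print(hex("|".encode("ascii")[0]), hex("|".encode("cp037")[0]))
```

<p>Reporting the offset and the byte value gives you something concrete to quote when you ask the sending system to fix its output.</p>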
<p>If the character that is causing problems is not a pipe, you may want to look at IBM&#8217;s <a href="http://publib.boulder.ibm.com/infocenter/macxhelp/v6v81/index.jsp?topic=/com.ibm.xlf81m.doc/pgs/lr393.htm">ASCII to EBCDIC conversion table</a>.  The problem can be difficult to explain to others who have never encountered it, so quoting the ASCII and EBCDIC identifiers of the character makes emails and documentation much clearer when we are trying to correct the issue.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theintegrationengineer.com/ascii-and-ebcdic/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Perl Tricks &#8211; When errors happen</title>
		<link>http://www.theintegrationengineer.com/perl-tricks-when-errors-happen/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=perl-tricks-when-errors-happen</link>
		<comments>http://www.theintegrationengineer.com/perl-tricks-when-errors-happen/#comments</comments>
		<pubDate>Thu, 23 Sep 2010 15:03:08 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[Logging]]></category>
		<category><![CDATA[PerlTips]]></category>
		<category><![CDATA[alert]]></category>
		<category><![CDATA[error]]></category>
		<category><![CDATA[error messages]]></category>
		<category><![CDATA[logging]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[objects]]></category>
		<category><![CDATA[Perl]]></category>

		<guid isPermaLink="false">http://www.theintegrationengineer.com/?p=1071</guid>
		<description><![CDATA[Back in the day, I used to use a program called Windows made by a little company called Microsoft.  I liked windows, and for the most part it did what I needed it to do.  But it also had a nasty habit of crashing when I tried to do something complex.  And this was really [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.theintegrationengineer.com/wp-content/uploads/2009/08/monitoring_pzl.jpg"><img class="alignleft size-thumbnail wp-image-596" title="monitoring_pzl" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/08/monitoring_pzl-150x150.jpg" alt="monitoring pzl 150x150 Perl Tricks   When errors happen" width="150" height="150" /></a>Back in the day, I used to use a program called Windows made by a little company called Microsoft.  I liked Windows, and for the most part it did what I needed it to do.  But it also had a nasty habit of crashing when I tried to do something complex.  And this was really annoying.</p>
<p>Well as time went on, and I became more adept at using Windows, I noticed that it tried to tell me what was wrong.  Windows was writing out a crash report.  And it had a message on the blue screen of death (a mostly incomprehensible message, but at least it was trying).  And I discovered that if you looked up the error codes in Microsoft&#8217;s knowledge base, many of them had corresponding articles (some of which were also incomprehensible).<span id="more-1071"></span></p>
<p>Well, when you write an application in Perl or any other language, you will need to do something other than just crash when you encounter an error.  There are two views on what to do with an error condition: write the message to a log, or write the error out in a separate error message or file.</p>
<p>I guess there could be a third view, and that is to do both.</p>
<p><strong>Logging:</strong></p>
<p>Writing errors to a log can be just part of the normal logging that you do as your application processes.  If you aren&#8217;t logging, you might want to consider it.  There is value in having a log file of what your application is doing.  To keep my logging coherent, I create a logging object and then just pass my log messages to it.  This way they can all have the same format, with timestamps and such, without my having to write it out every time.</p>
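<p>A logging object along those lines might look like the sketch below.  It is written in Python rather than Perl, and the class name, method names and line format are my own assumptions, not the code from my application.</p>

```python
import sys
from datetime import datetime


class SimpleLogger:
    """Hypothetical logging object: one place that owns the line format."""

    def __init__(self, stream=sys.stdout, name="app"):
        self.stream = stream
        self.name = name

    def log(self, level, message):
        # Every line gets the same timestamp / name / level prefix.
        stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        line = f"{stamp} [{self.name}] {level.upper()}: {message}"
        self.stream.write(line + "\n")
        return line

    def info(self, message):
        return self.log("info", message)

    def error(self, message):
        return self.log("error", message)


logger = SimpleLogger(name="csv-transform")
logger.info("picked up orders.csv")
logger.error("row 12 has 7 columns, expected 9")
```

<p>Because the format lives in one method, changing the timestamp style later is a one line change instead of a hunt through the whole application.</p>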
<p><strong>Error Messages:</strong></p>
<p>My application is processing messages, transforming CSV and other types of files.  So when errors in processing happen, I move the files to an error directory, and I also write a special error file with them so that I don&#8217;t have to parse through the log to find it.</p>
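<p>The move-and-annotate step can be sketched in a few lines of Python.  The function name, the <code>.err</code> suffix and the directory layout are assumptions made for the example.</p>

```python
import shutil
import tempfile
from pathlib import Path


def quarantine(path, error_dir, reason):
    """Move a failed file into error_dir and write a sidecar .err file."""
    path, error_dir = Path(path), Path(error_dir)
    error_dir.mkdir(parents=True, exist_ok=True)
    moved = error_dir / path.name
    shutil.move(str(path), str(moved))
    # The sidecar sits next to the failed file, so nobody has to
    # dig through the main log to find out what went wrong.
    sidecar = error_dir / (path.name + ".err")
    sidecar.write_text(reason + "\n")
    return moved, sidecar


# Demo in a throwaway directory; all names here are made up.
work = Path(tempfile.mkdtemp())
incoming = work / "orders.csv"
incoming.write_text("id,qty\n1,two\n")
moved, sidecar = quarantine(incoming, work / "error", "row 1: qty is not a number")
print(moved.name, sidecar.name)
```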
<p><strong>Using alerts:</strong></p>
<p>Writing messages to a log and moving files to an error directory are fairly passive ways to handle errors.  A more active way to handle an error is to send an email or trigger an alert of some kind.  You can also use a monitoring technology like Nagios to watch for errors and send alerts.</p>
<p>What are ways that you have handled errors and logging?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theintegrationengineer.com/perl-tricks-when-errors-happen/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Circular Files</title>
		<link>http://www.theintegrationengineer.com/circular-files/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=circular-files</link>
		<comments>http://www.theintegrationengineer.com/circular-files/#comments</comments>
		<pubDate>Wed, 11 Feb 2009 21:16:08 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[Logging]]></category>
		<category><![CDATA[circular file]]></category>
		<category><![CDATA[error tracking]]></category>
		<category><![CDATA[log]]></category>
		<category><![CDATA[logging]]></category>
		<category><![CDATA[round file]]></category>
		<category><![CDATA[trouble shooting]]></category>
		<category><![CDATA[troubleshooting]]></category>

		<guid isPermaLink="false">http://www.theintegrationengineer.com/?p=79</guid>
		<description><![CDATA[A circular file is not a nickname for the waste can. Circular files, sometimes called round files, are useful in some applications and support tasks.  With a normal log file or repository, the log grows as logged events are added to the log.  The obvious danger is that if the space where the log is [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-92" title="circular-file" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/01/circular-file.jpg" alt="circular file Circular Files" width="86" height="99" />A <a href="http://www.thefreedictionary.com/circular+file">circular file</a> is not just a nickname for the waste can. Circular files, sometimes called round files, are useful in some applications and support tasks.  With a normal log file or repository, the log grows as logged events are added to the log.  The obvious danger is that the log grows until it fills the space where it is located.  Many applications will shut down and refuse to restart if this happens.  For some applications, having the log write to a circular file is the answer.</p>
<p><span id="more-79"></span></p>
<h3>What is a Circular File?</h3>
<p>This is not like sending the log data to /dev/null or some other black hole.  It is a file that can&#8217;t grow beyond a specified size, but never refuses to accept new data.  This is not a paradox.  Visualize this file as a loop of paper instead of a sheet of paper.  On the loop there is no end; it just goes around and around.</p>
<p>Say you have a small circular file, 10 MB in our example.  Then let&#8217;s say that every minute you add 1 MB of data to your file. For 9 minutes after you make a log entry, you can see this entry in the file.  Each minute it moves down by one minute&#8217;s worth of space in the log.  At 10+ minutes, it is gone.  The circular file has rolled over and overwritten that location with a more current entry.</p>
<p>Okay, so a 10 minute log may not be that useful, and we would probably want to use a bigger size.  But the point is that if disaster strikes, the log won&#8217;t grow like crazy and create an even bigger problem by filling up the disk.</p>
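<p>The rollover is easy to model in Python.  This sketch keeps the entries in memory with <code>collections.deque</code> rather than in a real file on disk, so treat it as an illustration of the behaviour and not as a logging implementation.</p>

```python
from collections import deque

# A 10-slot buffer stands in for the 10 MB file; each entry stands
# in for one minute (1 MB) of log data.
circular_log = deque(maxlen=10)

for minute in range(1, 13):
    circular_log.append(f"minute {minute}")

print(len(circular_log))   # never more than 10
print(circular_log[0])     # oldest entry still visible
print(circular_log[-1])    # newest entry
```

<p>After twelve minutes the first two entries have been overwritten, and the buffer is exactly as large as it was at minute ten.</p>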
<h3>Why aren&#8217;t all logs like this?</h3>
<p>Using a circular file for a log is very useful, but if you have an error and the log then scrolls on long enough for the error to be lost, all you have are the symptoms, not the cause.  So you have saved yourself from making explanations like, &#8220;The SMTP server failed, and the Web tool started logging errors that it couldn&#8217;t send emails.  People kept clicking &#8216;resend mail&#8217; until the log filled up the disk, and the app died.&#8221;  But they are replaced with, &#8220;Yes, we logged the error, and then the app started logging all of the following actions and the error fell out of the log,&#8221; followed by an explanation of what a circular file is and why this bad news is really good news.</p>
<p>It&#8217;s best to use more than one logging method.  Say we have a circular log that allows us to see a week of normal activity.  Let&#8217;s pretend that is 10 GB.  And we have a log that is flat, and only logs critical errors.  Now we don&#8217;t get clobbered by all the actions that start happening when the rain comes, and we don&#8217;t lose that original error that started the mess.</p>
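<p>Here is one way the two-log setup could look with Python&#8217;s standard <code>logging</code> module.  Note the assumption: <code>RotatingFileHandler</code> caps the log size by rotating files, which is a stand-in for a true circular file, not the same thing.  The logger name, file names and sizes are invented for the example.</p>

```python
import logging
import tempfile
from logging.handlers import RotatingFileHandler
from pathlib import Path

log_dir = Path(tempfile.mkdtemp())  # throwaway directory for the demo

# Size-capped activity log: rotation is a stand-in for a true circular file.
activity = RotatingFileHandler(
    log_dir / "activity.log", maxBytes=1_000_000, backupCount=1
)
activity.setLevel(logging.DEBUG)

# Flat log that only ever receives critical errors, so it grows slowly.
critical = logging.FileHandler(log_dir / "critical.log")
critical.setLevel(logging.CRITICAL)

log = logging.getLogger("webtool")  # hypothetical application name
log.setLevel(logging.DEBUG)
log.addHandler(activity)
log.addHandler(critical)

log.critical("smtp server failed")
for _ in range(3):
    log.error("could not send email")

print((log_dir / "critical.log").read_text())
```

<p>The flood of follow-on errors lands only in the capped log, while the original failure stays put in the flat one.</p>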
<h3>Utopalog</h3>
<p>Okay, I am making this up, but here is my dream strategy.  First, we use a circular log whose size can be configured on the fly.  Second, we use an error database.  We have two table structures: First Occurrence and Last Occurrence.  Just as they sound, when we have a new error, debug message, whatever, it gets logged in both the First and Last Occurrence tables.  But the next time it happens, the time on the Last Occurrence table is the only thing that gets updated.</p>
<p>To use this, we periodically truncate or archive the tables.  The log is self cleaning.  We can do this each night, or on demand, or pick your own schedule.  Now when a problem occurs, we can look for errors that happened before the beginning of the log, but we won&#8217;t have a mountain of data that will crush us.</p>
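<p>Since this strategy is made up anyway, here is an equally made up sketch of it using Python&#8217;s built-in <code>sqlite3</code> module.  The table names, column names and the choice to key on the message text are all assumptions.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table and column names are illustrative only.
conn.execute("CREATE TABLE first_occurrence (message TEXT PRIMARY KEY, seen_at TEXT)")
conn.execute("CREATE TABLE last_occurrence (message TEXT PRIMARY KEY, seen_at TEXT)")


def record(message, seen_at):
    # First Occurrence keeps the original row; repeats are ignored.
    conn.execute(
        "INSERT OR IGNORE INTO first_occurrence VALUES (?, ?)", (message, seen_at)
    )
    # Last Occurrence is overwritten every time the message repeats.
    conn.execute(
        "INSERT OR REPLACE INTO last_occurrence VALUES (?, ?)", (message, seen_at)
    )


record("smtp server failed", "2009-02-11 09:00")
record("smtp server failed", "2009-02-11 09:05")
record("smtp server failed", "2009-02-11 09:10")

first = conn.execute("SELECT seen_at FROM first_occurrence").fetchall()
last = conn.execute("SELECT seen_at FROM last_occurrence").fetchall()
print(first, last)
```

<p>No matter how many times the error repeats, each table holds a single row for it, which is what keeps the mountain of data from forming.</p>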
]]></content:encoded>
			<wfw:commentRss>http://www.theintegrationengineer.com/circular-files/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Flat Files</title>
		<link>http://www.theintegrationengineer.com/flat-files/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=flat-files</link>
		<comments>http://www.theintegrationengineer.com/flat-files/#comments</comments>
		<pubDate>Mon, 05 Jan 2009 18:38:47 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[b2b]]></category>
		<category><![CDATA[Data]]></category>
		<category><![CDATA[Delimiters]]></category>
		<category><![CDATA[File]]></category>
		<category><![CDATA[Mapping]]></category>
		<category><![CDATA[Character Delimited]]></category>
		<category><![CDATA[Comma Delimited]]></category>
		<category><![CDATA[csv]]></category>
		<category><![CDATA[Data Export]]></category>
		<category><![CDATA[Data Import]]></category>
		<category><![CDATA[delimiter]]></category>
		<category><![CDATA[EDI]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[Fixed Position]]></category>
		<category><![CDATA[Fixed Width Files]]></category>
		<category><![CDATA[Space]]></category>
		<category><![CDATA[spreadsheet]]></category>
		<category><![CDATA[White Space]]></category>

		<guid isPermaLink="false">http://www.theintegrationengineer.com/?p=5</guid>
		<description><![CDATA[What is a flat file? Files are called &#8220;Flat Files&#8221; when they contain a single data structure.  Generally this structure is the column and row structure like a spreadsheet or table, but a file in binary or encrypted with a single encryption key could also be called a flat file.  Files that are not flat; [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-75" title="Flat File" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/01/flatfile.jpg" alt="flatfile Flat Files" width="160" height="120" /><strong>What is a flat file?</strong></p>
<p>Files are called &#8220;Flat Files&#8221; when they contain a single data structure.  Generally this structure is the column and row structure like a spreadsheet or table, but a file in binary or encrypted with a single encryption key could also be called a flat file.  Files that are not flat include marked up files like XML or HTML, <a href="http://www.theintegrationengineer.com/what-is-edi/">EDI </a>files, and other formats like HL7 or SEF files.  Here I am going to briefly discuss two flat file types: Delimited Files and Fixed Width Files.<span id="more-5"></span></p>
<p><strong>What is a Delimited File?</strong></p>
<p>Ok, to describe it briefly, a delimited file is a file where the data is organized in rows and columns.  Each row has a set of data, and each column has a type of data.  If it sounds like I am describing a spreadsheet, you are right on the money.  To mark the columns, each row has its values separated by a character called a delimiter.  See the example below.</p>
<p><img class="aligncenter size-full wp-image-72" title="Illustration of Delimited Data" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/01/delimitedillustration.jpg" alt="delimitedillustration Flat Files" width="554" height="128" /></p>
<p>Tables of data and spreadsheets are both similar to a delimited file in the way they organize data.  In the delimited file all of the empty space, or white space is removed.  What we see here is a classic example of exporting a spreadsheet table as a comma delimited file.  In theory, this data can be imported by any other application that can read a delimited file.</p>
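<p>Reading such a file takes very little code.  The Python sketch below uses the standard <code>csv</code> module on a made up export; the names and values are invented and are not the ones in the illustration.</p>

```python
import csv
import io

# Made-up export of a small spreadsheet; the first row holds column names.
exported = (
    "Name,Birthdate,Department\n"
    "Ann Lee,1975-03-02,Sales\n"
    "Bob Ray,1980-11-17,IT\n"
)

rows = list(csv.DictReader(io.StringIO(exported)))
for row in rows:
    print(row["Name"], row["Department"])
```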
<p><em>Believe it or not, a space is a character, and takes up space in a file.  Back in the day people went out of their way to save space so that files could be sent over slow modem connections.</em></p>
<p><strong>What is a Fixed Width File?</strong></p>
<p>There is another type of file, called a Fixed Width or Fixed Position file.  It is different from a delimited file in that the data fields are defined by character position.  See the example below.</p>
<p><img class="aligncenter size-full wp-image-73" title="Fixed Width File Illustration" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/01/wffileillustration.jpg" alt="wffileillustration Flat Files" width="570" height="132" /></p>
<p>In a fixed width file, the delimiter characters are eliminated.  If the data is formulated such that the data fields are the same size, this format can be more compact than a delimited file. You can see here that we know the size of the Birthdate data, so we eliminate all the spaces between the Bdate and Department fields.  If all of the data was formatted for size like this, we could really make this file small, so that it only contains the data.</p>
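<p>Parsing by position can be sketched in Python with plain string slicing.  The layout below (field names and column positions) is hypothetical and does not match the illustration exactly.</p>

```python
# Hypothetical layout: (field name, start, end) as Python slice positions.
LAYOUT = [("name", 0, 10), ("bdate", 10, 18), ("department", 18, 23)]


def parse_fixed(line):
    """Slice one fixed-width record into a dict, trimming the padding."""
    return {name: line[start:end].strip() for name, start, end in LAYOUT}


records = [
    "Ann Lee   19750302Sales",
    "Bob Ray   19801117IT   ",
]
for record in records:
    print(parse_fixed(record))
```

<p>Notice that the birthdate and department run together with no separator at all; the layout alone tells the parser where one field stops and the next begins.</p>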
<p>We also eliminate the pesky problem of delimiters found in data.  This is the issue of a comma delimited file containing a field that has a comma in the data.  How does the parser know that this comma is not really a delimiter, but is part of the data?  Anyway, that problem is eliminated in a fixed width file.</p>
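<p>To see the problem in action, the Python sketch below splits a row whose first field contains a comma.  It also shows field quoting, which is a common convention for delimited files and the one Python&#8217;s <code>csv</code> module applies by default; whether your trading partner&#8217;s parser honors it is a separate question.</p>

```python
import csv
import io

row = ["Lee, Ann", "Sales"]

# Naive join: the comma inside the name looks like a delimiter.
naive = ",".join(row)
print(naive.split(","))  # three pieces instead of two

# csv.writer quotes any field that contains the delimiter.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
quoted = buffer.getvalue()
print(quoted.strip())
print(next(csv.reader(io.StringIO(quoted))))
```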
<p><strong>Comparison</strong></p>
<p>This is not a contest of which format is superior.  Both file architectures are useful, and both are used commonly enough that you need to be at ease working with both.  Delimited files are really easy to work with as long as your data is clean of the delimiter character.  For the quick data integration common in ETL tasks, delimited files are far more common than Fixed Width.  Continuous data integration and import operations often find that Fixed Width or Position files are more reliable for unattended operation, even ETL if it is unattended.</p>
<p>As with many things in integration work, we want to pick the best option.  Knowing and working with both fixed and delimited files will help you determine which is the right choice for the task you have before you.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theintegrationengineer.com/flat-files/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Nature of NULL</title>
		<link>http://www.theintegrationengineer.com/the-nature-of-null/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=the-nature-of-null</link>
		<comments>http://www.theintegrationengineer.com/the-nature-of-null/#comments</comments>
		<pubDate>Tue, 16 Dec 2008 21:26:35 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[data base]]></category>
		<category><![CDATA[NULL]]></category>
		<category><![CDATA[string]]></category>

		<guid isPermaLink="false">http://www.theintegrationengineer.com/?p=62</guid>
		<description><![CDATA[&#8220;Is not the beginning of wisdom the words: &#8216;I do not know&#8217;?&#8221; &#8211; Data, Star Trek: Next Generation: &#8220;Where Silence Has Lease&#8220; If the beginning of wisdom is to realize what it is that we do not know. NULL, by its definition is this not knowing.  We do not know what NULL is, this is [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-63" title="null_modem" src="http://www.theintegrationengineer.com/wp-content/uploads/2008/12/null_modem.jpg" alt="null modem The Nature of NULL" width="100" height="104" /><span style="font-family: Trebuchet MS,Arial,Univers,Zurich BT,Verdana,Helvetica;"><em>&#8220;Is not the beginning of wisdom the words: &#8216;I do not know&#8217;?&#8221;</em></span></p>
<blockquote><p>&#8211; Data, Star Trek: The Next Generation: &#8220;<em>Where Silence Has Lease</em>&#8221;</p></blockquote>
<p>If the beginning of wisdom is to realize what it is that we do not know, then NULL, by its definition, is this not knowing.  We do not know what NULL is; this is why it is NULL.<span id="more-62"></span></p>
<p><strong>What is this?</strong></p>
<p>Null is not a number, or letter.  It may not even be a character.  Using Occam&#8217;s razor, things are either NULL or NOT NULL.  Things that are NULL are completely unknown at the time they are NULL.  And things that are NOT NULL are not completely unknown when they are NOT NULL.</p>
<p>I feel the hair splitting on my head, so let me explain one important point.  Things that are NULL are not destined to stay that way.  Where /dev/null is the black hole of output that we don&#8217;t need, NULL is not the data equivalent of a black hole.  Things that are currently NULL may become NOT NULL at any time, as soon as we know something about them.  Pretty much anything really.</p>
<p><strong>Empty String Theory</strong><img class="alignright size-full wp-image-64" title="Ball of String" src="http://www.theintegrationengineer.com/wp-content/uploads/2008/12/ballofstring.jpg" alt="ballofstring The Nature of NULL" width="100" height="66" /></p>
<p>Unlike string theory, which attempts to define the nature of the universe, the &#8216;empty string theory&#8217; is that empty strings are the same thing as NULL.  They are not.  If they were NULL, we would not know that they were strings.  This seems quite clear.  NULL is NULL and an empty string is NOT NULL.</p>
<p>&#8220;But wait!&#8221;  I hear a Database Developer cry.  &#8220;I can create a field in my database that is a string, and allow it to be NULL.&#8221;  And this is correct, but this is not a contradiction.  When you allow a database field to be NULL you have allowed it to receive NULL as an input.  But it may or may not give NULL as an output.  Because we can put NULL into a variable, when we query for that variable, element, whatever, we may get NULL, or we may get an empty string or another properly defined value of the field&#8217;s data type that just happens to be empty.</p>
<p>In final reply to this inquiry I must observe that NULL is not equal to anything.  In SQL, any conditional statement where NULL is compared to something will never be true.  Let &#8220;&#8221; represent an empty string.  If one writes the statement ["" == ""], or in English, &#8220;if an empty string is equal to an empty string,&#8221; it evaluates as a true statement.  However, if one writes the statement [NULL == NULL], or in English, &#8220;if NULL is equal to NULL,&#8221; the result is unknown, which a condition treats the same as false.  Since an empty string is equal to another empty string, but NULL is not equal to NULL, NULL cannot be the same thing as an empty string.</p>
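<p>You can check this for yourself.  The sketch below uses SQLite through Python&#8217;s built-in <code>sqlite3</code> module; other databases follow the same SQL rules for these comparisons, though how each one treats empty strings can differ.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
empty_eq, null_eq, null_is = conn.execute(
    "SELECT '' = '', NULL = NULL, NULL IS NULL"
).fetchone()

print(empty_eq)  # 1: an empty string equals an empty string
print(null_eq)   # None: the comparison itself comes back as NULL
print(null_is)   # 1: IS NULL is the test that works

# A WHERE clause only keeps rows whose condition is true,
# so "NULL = NULL" filters the row out.
matches = conn.execute("SELECT 1 WHERE NULL = NULL").fetchall()
print(matches)   # []
```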
<p><strong>Why NULL is important</strong></p>
<p>Just as early civilizations had no concept of zero, early discussions of NULL struggle with it.  We can argue about whether having zero apples is the same as having nothing.  Yet zero can be the absence of something that is known: having none of something.  NULL is not even knowing what the something is that you don&#8217;t have.</p>
<p><img class="alignleft size-full wp-image-65" title="apple_bushel_small" src="http://www.theintegrationengineer.com/wp-content/uploads/2008/12/apple_bushel_small.jpg" alt="apple bushel small The Nature of NULL" width="56" height="60" /><em>An apple basket can contain 0 apples as a way to say, &#8220;Hey, this basket is for apples!&#8221;  In that sense 0 is greater than NULL, because with NULL we don&#8217;t know what the basket is for.  We may not even have a basket at all.</em></p>
<p>There are a great number of operations that we couldn&#8217;t do well or efficiently if we didn&#8217;t have NULL.  So understanding what we are asking when we look for NULL is important.  Whenever we have a search for things that don&#8217;t match something else, we are searching for NULL, even if the values themselves are not NULL.  We do this a lot, and we couldn&#8217;t do it without NULL.</p>
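<p>One common form of that search is an outer join followed by a test for NULL.  The sketch below is a hypothetical example in SQLite via Python; the tables and rows are invented.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: customers, and the orders they have placed.
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1), (11, 3);
""")

# Customers with no matching order: the join fills the order
# columns with NULL, and that NULL is what we search for.
unmatched = conn.execute("""
    SELECT c.name
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    WHERE o.id IS NULL
""").fetchall()
print(unmatched)
```

<p>None of the stored values is NULL here, yet the query finds Bob by looking for the NULL that the missing match leaves behind.</p>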
]]></content:encoded>
			<wfw:commentRss>http://www.theintegrationengineer.com/the-nature-of-null/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Data Mapping</title>
		<link>http://www.theintegrationengineer.com/data-mapping/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=data-mapping</link>
		<comments>http://www.theintegrationengineer.com/data-mapping/#comments</comments>
		<pubDate>Tue, 05 Aug 2008 17:51:32 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[Mapping]]></category>
		<category><![CDATA[b2b]]></category>
		<category><![CDATA[data source]]></category>
		<category><![CDATA[data target]]></category>
		<category><![CDATA[mapping process]]></category>
		<category><![CDATA[source]]></category>
		<category><![CDATA[spreadsheet]]></category>
		<category><![CDATA[target]]></category>

		<guid isPermaLink="false">http://www.theintegrationengineer.com/?p=8</guid>
		<description><![CDATA[<img class="alignleft size-medium wp-image-14" title="bullseye" src="http://www.theintegrationengineer.com/wp-content/uploads/2008/08/bullseye-300x266.jpg" alt="" width="120" height="107" />In the beginning we start with the data target.  This may be strange if you have not done any mapping, but the first, best thing that you can do to make a mapping project successful and fast is to start with a well defined target for your data.]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-medium wp-image-14" title="bullseye" src="http://www.theintegrationengineer.com/wp-content/uploads/2008/08/bullseye-300x266.jpg" alt="bullseye 300x266 Data Mapping" width="120" height="107" />In the beginning we start with the data target.  This may be strange if you have not done any mapping, but the first, best thing that you can do to make a mapping project successful and fast is to start with a well defined target for your data.</p>
<p>I believe that Stephen Covey says it this way, &#8220;Begin with the end in mind.&#8221;  We do the same thing in data mapping, except that our end is the target.  The data target naturally leads us to the data sources that we need.<span id="more-8"></span></p>
<p>If one starts with the source, a lot of work goes into the project, like organizing the source data, before we know if we need any of it.  Sure, you may know that you will need the shipping addresses if you are mapping to a shipping transaction, but spending time gathering the information on the source of this data is still not effective until we know where we will be putting it.</p>
<p><strong>A guide to mapping data from source to target</strong></p>
<p><img class="aligncenter size-full wp-image-9" title="samplemappling-1" src="http://www.theintegrationengineer.com/wp-content/uploads/2008/07/samplemappling-1.gif" alt="samplemappling 1 Data Mapping" width="500" height="41" /></p>
<p>Spreadsheets are great for mapping.  We start with six columns.  From left to right we have source, source data type, rules, target, target data type, and notes.  We start with the target and list the data elements that the target needs, in the order that the target needs them, to approximate the target form.  If this is a database table then this is really simple, and we are just listing the column names in order.<img class="alignright size-full wp-image-11" title="samplemappling-2" src="http://www.theintegrationengineer.com/wp-content/uploads/2008/07/samplemappling-2.gif" alt="samplemappling 2 Data Mapping" width="104" height="106" /></p>
<p>If the target is something less list-like, we will need to add some location information.  For XML we might want to add the XPath to the spreadsheet, either in the cell above or in a logical grouping.  Both approaches work and I will show you an example of each.</p>
<p>As we list the target we will also want to note the data type in the column to the right.  As we note the type of any data with specific formatting, like date or email addresses, we can place that definition here.<br />
<img class="alignleft size-full wp-image-12" title="samplemappling-3" src="http://www.theintegrationengineer.com/wp-content/uploads/2008/07/samplemappling-3.gif" alt="samplemappling 3 Data Mapping" width="429" height="114" /></p>
<p>After we have the target well defined it is time to decide where this data will come from.  We probably already have some idea, so now it is time to work down the source column and fill in the data reference.  There may be more than one field on the left that maps to a single field on the right.  And there may be one field on the left that maps to multiple fields on the right.  There may also be completely derived data that is calculated at runtime.  There can also be multiple data sources.  You will probably encounter all of these and more in your career as an integration engineer.</p>
<p><img class="aligncenter size-full wp-image-13" title="samplemappling-4" src="http://www.theintegrationengineer.com/wp-content/uploads/2008/07/samplemappling-4.gif" alt="samplemappling 4 Data Mapping" width="681" height="101" /></p>
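<p>Once the sheet is filled in, it can drive the transformation directly.  The Python sketch below treats each spreadsheet row as a dictionary keyed by the six columns; the field names and the three rule keywords (<code>join</code>, <code>copy</code>, <code>constant:</code>) are invented for illustration and are not a standard.</p>

```python
# Each dict is one row of the mapping sheet, keyed by the six columns.
# Field names and rules are invented for illustration.
MAPPING = [
    {"source": ["first", "last"], "source_type": "string", "rule": "join",
     "target": "ship_to_name", "target_type": "string",
     "notes": "two source fields into one target field"},
    {"source": ["zip"], "source_type": "string", "rule": "copy",
     "target": "postal_code", "target_type": "string", "notes": ""},
    {"source": [], "source_type": "", "rule": "constant:US",
     "target": "country", "target_type": "string",
     "notes": "derived, no source field"},
]


def apply_mapping(record):
    """Build the target record by walking the mapping in target order."""
    target = {}
    for row in MAPPING:
        rule = row["rule"]
        if rule == "join":
            value = " ".join(record[name] for name in row["source"])
        elif rule == "copy":
            value = record[row["source"][0]]
        elif rule.startswith("constant:"):
            value = rule.split(":", 1)[1]
        else:
            raise ValueError(f"unknown rule: {rule}")
        target[row["target"]] = value
    return target


print(apply_mapping({"first": "Ann", "last": "Lee", "zip": "84043"}))
```

<p>Because the list is in target order, the output comes out in the shape the target expects, which is the whole point of starting from the target.</p>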
<p><strong>Upcoming Articles:</strong></p>
<ol>
<li><a href="http://www.theintegrationengineer.com/mapping-exercise-edi-to-flat-file/">EDI mapping exercise</a>.</li>
<li>XML mapping exercise.</li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://www.theintegrationengineer.com/data-mapping/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Everything takes 2 weeks</title>
		<link>http://www.theintegrationengineer.com/everything-takes-2-weeks/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=everything-takes-2-weeks</link>
		<comments>http://www.theintegrationengineer.com/everything-takes-2-weeks/#comments</comments>
		<pubDate>Thu, 10 Dec 2009 15:14:03 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[Mapping]]></category>
		<category><![CDATA[planning]]></category>
		<category><![CDATA[project]]></category>
		<category><![CDATA[Scope]]></category>
		<category><![CDATA[team]]></category>

		<guid isPermaLink="false">http://www.theintegrationengineer.com/?p=558</guid>
		<description><![CDATA[&#8220;So, how long will that take?&#8221;  Is a question that some of us have grown to hate.  And it seems that it is a question, that in various forms, we are asked daily.  (If not more frequently)  And if you have been doing this for any time, you have probably come up with a way [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-975" title="stopwatch" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/11/stopwatch.jpg" alt="stopwatch Everything takes 2 weeks" width="53" height="71" />&#8220;So, how long will that take?&#8221;  Is a question that some of us have grown to hate.  And it seems that it is a question, that in various forms, we are asked daily.  (If not more frequently)  And if you have been doing this for any time, you have probably come up with a way to answer these questions.  There was a time when I started answering this question with a standard answer of, &#8220;2 weeks.&#8221;  And let me explain why.</p>
<p><strong><span id="more-558"></span>My Answer:</strong></p>
<p>First, let me just say that my answer was really, &#8220;40 hours or 2 calendar weeks.&#8221;  When asked if I could do it in 40 hours but in one week, I would say, &#8220;No,&#8221; and explain that I always had interruptions and other priorities to juggle.  There was never a time that I could focus on just one issue for a whole week.  Thus it would not make sense to give them the expectation that I could be done in a week, even if the actual time worked on the project was 40 hours.</p>
<p>Second, let me say that these were all mapping change requests, or new document mapping requests.  I am extremely good at this, and have a system for keeping things simple and effective.  You can read more about that by reading some of my mapping posts here.  And following that system, I could pull off a changed or new map in 2 weeks with confidence.</p>
<p>When asked if I was sure that it wouldn&#8217;t take longer, I would shrug and say, &#8220;It never has before.&#8221;  While not a detailed justification for my time spent, this was an expression of confidence to get my time estimate accepted and put on the plan.</p>
<p><strong>Other Answers:</strong></p>
<p>There are many ways to come up with a time estimate.  I took a class once that discussed project planning and explained Gantt charts and time lines.  I had been using versions of these for years without knowing they had a name.  I called mine a &#8220;white board&#8221;, but I did learn the lingo of projects, which is useful when communicating with others about how your time line is looking.</p>
<p>Project Planning is a great skill to have, but don&#8217;t get carried away.  Lots of time can be spent on making a project plan instead of actually working.  When you are working as a lone ranger, or small team, extensive planning can actually get in the way.</p>
<p><strong>Team Work:</strong></p>
<p>One of the things that let me get away with the &#8220;2 Week&#8221; time estimate was that I was the only one doing what I was doing.  It&#8217;s not that I am antisocial; it&#8217;s just that I was the only one left from what was once a team of 6.  I was helping support the product, building new features, and writing my own requirements based on tickets and user/manager responses.  And not having to coordinate with other people made me much, much faster.  (But of course you have to know what you are doing to pull this off.)</p>
<p>I remember sitting in a meeting with a group of developers.  They had been working on an integration process for 6 months and were still months away from completion.  I remember feeling really sorry for them when I realised that my boss had asked me to produce an integration solution for the same thing they had been working on.  Without knowing their situation, I responded with my 2 week estimate.</p>
<p>Well, they were floored, and didn&#8217;t believe I could do it.  I did, and they realised why.  I didn&#8217;t have team meetings (it was just me).  I didn&#8217;t follow their coding standards and review processes (I had no team to teach about what I had just done).  I had a test system to build on that looked just like production (they had one dev system that all of their projects were being built on, which meant they were always in each other&#8217;s way).</p>
<p>In the end, (after I had done it my way) they redeveloped it in their way and the project got completed.</p>
<p>This is not a knock on teams.  Teams are important to have.  But sometimes it is smarter and faster, just to give a guy a direction, and then get out of his way.  Teams help with stability and code longevity, but they can slow the process of creation down.  And the bigger the team, the slower it can go.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theintegrationengineer.com/everything-takes-2-weeks/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Mapping Exercise: 832 to DB</title>
		<link>http://www.theintegrationengineer.com/mapping-excersize-832-to-db/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=mapping-excersize-832-to-db</link>
		<comments>http://www.theintegrationengineer.com/mapping-excersize-832-to-db/#comments</comments>
		<pubDate>Tue, 01 Dec 2009 22:54:29 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[Mapping Exercise]]></category>
		<category><![CDATA[832]]></category>
		<category><![CDATA[EDI]]></category>
		<category><![CDATA[map]]></category>
		<category><![CDATA[Mapping]]></category>

		<guid isPermaLink="false">http://www.theintegrationengineer.com/?p=896</guid>
		<description><![CDATA[It&#8217;s time for another mapping exercise.  This time we will receive an EDI 832 in 4010 format, and map the data to a DB or flat file.  Getting catalog data into your procurement system is an important task.  And creating a variety of mapping exercises provides us with a better understanding of how mapping projects work [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-276" title="mapping_pzl" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/07/mapping_pzl.jpg" alt="mapping pzl Mapping Excersize: 832 to DB" width="200" height="108" />It&#8217;s time for another mapping exercise.  This time we will receive an EDI 832 in 4010 format, and map the data to a DB or flat file.  Getting catalog data into your procurement system is an important task.  And creating a variety of mapping exercises provides us with a better understanding of how mapping projects work than just having one that tries to be everything.  And mapping is one of the fun things that we get to do.  By the end of this exercise we will have worked through the common issues and demonstrated how this process comes together.</p>
<p><strong>Mapping Steps </strong></p>
<p><span id="more-896"></span></p>
<p>There are five steps to this mapping process.  Here is a list with a brief description of each one:</p>
<ul>
<li><em>Defining the Target</em>:  This is the first step to mapping.  We must know where we are going if we plan on getting there.  And by using a <a href="http://www.theintegrationengineer.com/data-mapping/">Target based mapping</a> process, we decrease the time and effort of the other steps.</li>
<li><em>Defining the Data Source</em>:  This is the second step to mapping.  Once we know what the target looks like, we know what data we need to complete it.  So this naturally leads us to defining the source(s) of the data.  This can be a single input data file or record, or it can be multiple types of data from multiple sources.</li>
<li><em>Defining the Processes</em>:  Some of the data that we will need in our target will not be in the proper form or format in the source.  We will have to have a process defined in our map to convert the source data into the correct form for the target format.</li>
<li><em>Handling Customisation Points</em>:  When we are mapping data, we will see points where we will want to make a decision on what to do.  Not all data is created or received in an equal form.  Thus we may have points in our mapping process where we will need to do a different process depending on the source or content of the data.</li>
<li><em>Monitoring the Process</em>:  After we have created a mapping process we will need to monitor it so that we can know when something unexpected happens.  To do this we must identify the right places in the mapping process for it to report its status to a monitoring process.  This is the last step in the mapping process.</li>
</ul>
<p><strong>Catalog Target</strong></p>
<p>If you have a catalog, then you will have your own schema.  We are getting our schema from a standard Open Office DB template.  This way you can create your own version for this exercise.</p>
<p>The Product table in the Open Office template has these fields:</p>
<ul>
<li>ID, INT</li>
<li>CategoryID, INT</li>
<li>Discontinued, BOOLEAN</li>
<li>LeadTime, VARCHAR</li>
<li>ProductID, INT</li>
<li>ProductDescription, VARCHAR</li>
<li>ProductName, VARCHAR</li>
<li>ReorderLevel, INT</li>
<li>Serialnumber, VARCHAR</li>
<li>SupplierID, INT</li>
<li>UnitPrice, DECIMAL</li>
<li>UnitsInStock, INT</li>
<li>UnitsOnOrder, INT</li>
</ul>
<p>These are the basic fields that we will be inserting data into.  We will probably have a process that will validate the supplier ID before inserting new records, but that will come in during the process step.</p>
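<p>To make the target concrete, here is a minimal sketch that builds the Product table in an in-memory SQLite database using Python.  The field names and types come from the list above.  The choice of SQLite, the in-memory database, and the helper name create_target are assumptions for practice only, so substitute your own catalog schema where it differs.</p>

```python
import sqlite3

# Field names and types are taken from the list above. Using SQLite and an
# in-memory database is an assumption made only so the sketch runs anywhere.
PRODUCT_DDL = """
CREATE TABLE Product (
    ID                 INT,
    CategoryID         INT,
    Discontinued       BOOLEAN,
    LeadTime           VARCHAR,
    ProductID          INT,
    ProductDescription VARCHAR,
    ProductName        VARCHAR,
    ReorderLevel       INT,
    Serialnumber       VARCHAR,
    SupplierID         INT,
    UnitPrice          DECIMAL,
    UnitsInStock       INT,
    UnitsOnOrder       INT
)
"""


def create_target(conn):
    """Create the target table and return its column names in order."""
    conn.execute(PRODUCT_DDL)
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    return [row[1] for row in conn.execute("PRAGMA table_info(Product)")]


conn = sqlite3.connect(":memory:")
columns = create_target(conn)
print(columns)
```

<p>Printing the column list gives you a quick checklist to copy onto your paper map before you look at the source data.</p>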
<p><strong>Next Steps</strong></p>
<p>After defining our target, we will define our source.  But we will do that in the next instalment of this exercise.   <a href="http://www.theintegrationengineer.com/wp-content/uploads/2009/02/basepapermap.xls">Download</a> the template and start your paper map now; we will show ours in the next post as well.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theintegrationengineer.com/mapping-excersize-832-to-db/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What&#8217;s the DIFF?</title>
		<link>http://www.theintegrationengineer.com/whats-the-diff/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=whats-the-diff</link>
		<comments>http://www.theintegrationengineer.com/whats-the-diff/#comments</comments>
		<pubDate>Wed, 18 Nov 2009 14:50:18 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[File]]></category>
		<category><![CDATA[change]]></category>
		<category><![CDATA[compare]]></category>
		<category><![CDATA[diff]]></category>
		<category><![CDATA[difference]]></category>
		<category><![CDATA[directory]]></category>
		<category><![CDATA[tool]]></category>

		<guid isPermaLink="false">http://www.theintegrationengineer.com/?p=517</guid>
		<description><![CDATA[One of the basic tasks Integration Engineers do is to compare files that we use or receive.  There are some interesting and useful tools that people can get out there to DIFF files.  But on Linux and Unix machines around the world there is a native tool that is almost always present.  Amazingly it is [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-518" title="apple-and-orange_pzl" src="http://www.theintegrationengineer.com/wp-content/uploads/2009/08/apple-and-orange_pzl.jpg" alt="apple and orange pzl Whats the DIFF?" width="191" height="159" /></p>
<p>One of the basic tasks Integration Engineers do is to compare files that we use or receive.  There are some interesting and useful tools that people can get out there to DIFF files.  But on Linux and Unix machines around the world there is a native tool that is almost always present.  Amazingly it is called DIFF.</p>
<p>Like some other command-line tools, its interface is not really intuitive.  Let&#8217;s walk through the basics of how to get use out of this handy file comparing tool.  (If you are working with and comparing EDI files, you might want to look at the <a href="http://www.theintegrationengineer.com/edi-wrapped-and-unwrapped/">post on how to unwrap</a> your EDI file so that your line-by-line comparison is more meaningful.)</p>
<p><span id="more-517"></span></p>
<p><strong>How to use &#8220;diff&#8221;</strong></p>
<p>You can get the real basics by executing &#8220;diff --help&#8221;, which prints the basic help and options for this application.  But in short, here is the thumbnail: &#8220;diff&#8221; is followed by some options.  Options are designated by a &#8220;-&#8221; and then a letter indicating the option.  Any options are then followed by the names of the two files being compared.  Let&#8217;s look at an example.</p>
<p><span style="text-decoration: underline;"><em>Example:</em></span></p>
<p>We have two files, file1.txt and file2.txt:</p>
<table style="height: 151px;" border="1" cellspacing="5" cellpadding="5" width="477">
<tbody>
<tr style="text-align: center;">
<th>file1.txt</th>
<th>file2.txt</th>
</tr>
<tr>
<td width="50%">This is a test file:<br />
And this is the first line of the first file.<br />
Thanks.</td>
<td>This is a test file:<br />
And this is the first line of the second file.<br />
Thanks.<br />
Again.</td>
</tr>
</tbody>
</table>
<p>When we issue the command &#8220;diff file1.txt file2.txt&#8221; we get this result:</p>
<p style="padding-left: 60px;">2c2<br />
&lt; And this is the first line of the first file.<br />
---<br />
&gt; And this is the first line of the second file.<br />
3a4<br />
&gt; Again.</p>
<ul>
<li>The first thing we see is &#8220;2c2&#8221;.  This means line 2 of the first file is changed to line 2 of the second file.</li>
<li>Next we have a &lt; indicating the first file, and the line echoed.</li>
<li>Following this we have a &#8220;---&#8221; as a separator between the lines compared.</li>
<li>Next we have &gt; indicating the second file, and then that line is echoed.</li>
<li>This is a comparison between two lines that were found to be different.</li>
<li>For the next line that is shown, we have &#8220;3a4&#8221;, which indicates that a line is added: after line 3 of the first file comes line 4 of the second file.</li>
<li>Finally, &gt; indicates the second file followed by the line being echoed.</li>
</ul>
<p>If we were to compare them in the other order, we end with these two lines:</p>
<p style="padding-left: 60px;">4d3<br />
&lt; Again.</p>
<ul>
<li>Here, &#8220;4d3&#8221; means that the 4th line of the first file is deleted; it has no counterpart after line 3 of the second file.</li>
<li>Following this is &lt; indicating the first file, and echoing the line.</li>
</ul>
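<p>If you want to see how those change commands are put together, here is a rough Python sketch that rebuilds the same output from the two example files.  It uses the standard difflib module rather than diff itself, and the function names line_range and normal_diff are made up for this illustration.  difflib does not match lines with the same algorithm as diff, so on larger files the hunks may not line up exactly with what diff reports.</p>

```python
from difflib import SequenceMatcher


def line_range(start, stop):
    """Format a 0-based half-open range as diff's 1-based 'n' or 'n,m'."""
    first, last = start + 1, stop
    return str(first) if first == last else f"{first},{last}"


def normal_diff(a, b):
    """Return diff-style 'normal' output lines for two lists of lines."""
    out = []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag == "replace":    # lines changed: "2c2"
            out.append(f"{line_range(i1, i2)}c{line_range(j1, j2)}")
        elif tag == "delete":   # lines only in the first file: "4d3"
            out.append(f"{line_range(i1, i2)}d{j1}")
        else:                   # "insert", lines only in the second file: "3a4"
            out.append(f"{i1}a{line_range(j1, j2)}")
        out.extend("< " + line for line in a[i1:i2])
        if tag == "replace":
            out.append("---")
        out.extend("> " + line for line in b[j1:j2])
    return out


file1 = [
    "This is a test file:",
    "And this is the first line of the first file.",
    "Thanks.",
]
file2 = [
    "This is a test file:",
    "And this is the first line of the second file.",
    "Thanks.",
    "Again.",
]

print("\n".join(normal_diff(file1, file2)))
print("\n".join(normal_diff(file2, file1)))
```

<p>The first print reproduces the &#8220;2c2&#8221; and &#8220;3a4&#8221; result, and the second, with the files swapped, ends with &#8220;4d3&#8221;.</p>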
<p><strong>Regular Options</strong></p>
<p>Here is the list of options that &#8220;--help&#8221; gives you, with maybe some more explanation.  Your version of diff may list a different set.</p>
<p><em>diff [-b] [-i] [-t] [-w] [-c] [-C] [-e] [-f] [-h] [-n] [-D string] [-l] [-r] [-s] [-S name] [fileone filetwo ] [directoryone directorytwo]</em></p>
<table class="mtable" style="width: 100%;" border="0" cellspacing="1" cellpadding="5">
<tbody>
<tr class="tcw">
<td style="width: 120px;" valign="top">-b</td>
<td valign="top">Ignores changes in the amount of white space.  This is useful when spacing doesn&#8217;t matter in what you are comparing.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-i</td>
<td valign="top">Ignores case.  This is useful when case doesn&#8217;t matter in what you are comparing.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-t</td>
<td valign="top">Expands TAB characters in output lines. Normal or -c output adds character(s) to the front of each  line that may adversely affect the indentation of the original source lines and make the output lines difficult to interpret. This option will preserve the original source&#8217;s indentation.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-w</td>
<td valign="top">Ignores all white space (spaces and tabs), even where one line has it and the other has none.  Again, for when we don&#8217;t want to include changes in the white-space.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-c</td>
<td valign="top">Produces a listing of differences with three lines of context. With this option the output format is modified slightly: output begins with identification of the files involved and their modification times, then each change is separated by a line with a dozen *&#8217;s. The lines removed from file1 are marked with &#8216;-&#8217;; those added to file2 are marked &#8216;+&#8217;. Lines that are changed from one file to the other are marked in both files with &#8216;!&#8217;.</p>
<p>With our two files we get this output:</p>
<p>*** file1.txt    2009-11-17 10:20:38.000000000 -0700<br />
--- file2.txt    2009-11-17 10:20:51.000000000 -0700<br />
***************<br />
*** 1,3 ****<br />
This is a test file:<br />
! And this is the first line of the first file.<br />
Thanks.<br />
--- 1,4 ----<br />
This is a test file:<br />
! And this is the first line of the second file.<br />
Thanks.<br />
+ Again.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-C</td>
<td valign="top">Produces a listing of differences identical to that produced by -c, but with the given number of lines of context.</p>
<p>With our example files, supplying a number makes no difference compared to plain -c, because the files are so short.  e.g. diff -C 1 file1.txt file2.txt</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-e</td>
<td valign="top">Output an ed script.  I have looked at these, but really haven&#8217;t used this feature for anything real.  I may later if I have time.</p>
<p>With our files it looks like this:</p>
<p>3a<br />
Again.<br />
.<br />
2c<br />
And this is the first line of the second file.<br />
.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-f</td>
<td valign="top">Produces a similar script, not useful with ed , in the opposite order.  (Really, this is exactly like -e except in reverse order.)</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-h</td>
<td valign="top">Does a fast, half-hearted job. It works only when changed stretches are short and well separated, but does work on files of unlimited length.  Options -c, -e, -f, and -n are unavailable with -h. diff does not descend into directories with this option.</p>
<p>With our example files it produces the same output as with no options.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-n</td>
<td valign="top">Produces a script similar to -e, but in the opposite order and with a count of changed  lines on each insert or delete command.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-D string</td>
<td valign="top">Creates a merged version of file1 and file2 with C preprocessor controls included so that a compilation of the result without defining string is equivalent to compiling file1, while defining string will yield file2.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-l</td>
<td valign="top">Produce output in long format. Before the diff, each text file is piped through &#8216;pr&#8217; to paginate it. Other differences are remembered and summarized after all text file differences are reported.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-r</td>
<td valign="top">Applies diff recursively to common subdirectories encountered.  Just like you would expect if you have ever used this with any other command line tools like grep or rm.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-s</td>
<td valign="top">Reports files that are identical; these would not otherwise be mentioned.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">-S name</td>
<td valign="top">Starts a directory diff in the middle, beginning with the file name.  Basically this is a compare directory after a supplied file name.  Make sure this file exists in both directories or you will be disappointed.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">fileone</td>
<td valign="top">File one for comparing.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">filetwo</td>
<td valign="top">File two for comparing.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">directoryone</td>
<td valign="top">Directory one for comparing.</td>
</tr>
<tr class="tcw">
<td style="width: 120px;" valign="top">directorytwo</td>
<td valign="top">Directory two for comparing.</td>
</tr>
</tbody>
</table>
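<p>The context format described under -c can also be produced without diff.  As a small sketch, Python&#8217;s standard difflib.context_diff gives the same layout for our two example files.  The file names are passed in as plain labels, and no timestamps are supplied, so the two header lines come out shorter than the ones diff prints.</p>

```python
from difflib import context_diff

file1 = [
    "This is a test file:",
    "And this is the first line of the first file.",
    "Thanks.",
]
file2 = [
    "This is a test file:",
    "And this is the first line of the second file.",
    "Thanks.",
    "Again.",
]

# n=3 asks for three lines of context, like -c; lineterm="" because the
# example lines above carry no trailing newline characters.
output = list(
    context_diff(file1, file2, fromfile="file1.txt", tofile="file2.txt",
                 n=3, lineterm="")
)
print("\n".join(output))
```

<p>Changing n plays the same role as the number given to -C.</p>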
<p>For comparing files, the first four options (-b, -i, -t, -w) are the most useful.  I start without any options and add them as I need them to reduce the amount of change noise reported in the result set.</p>
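<p>To get a feel for what the noise-reducing options do, here is a rough Python sketch of the idea behind -b, -i and -w: each line is normalised before the two sides are compared.  The function names normalize and same_line and their keyword arguments are invented for this illustration.  This is only an approximation of the idea, not how diff is implemented.</p>

```python
import re


def normalize(line, ignore_space_change=False, ignore_case=False,
              ignore_all_space=False):
    """Return the form of a line that would be compared under the options."""
    if ignore_all_space:           # roughly -w: drop every space and tab
        line = re.sub(r"\s+", "", line)
    elif ignore_space_change:      # roughly -b: one space per run, none trailing
        line = re.sub(r"\s+", " ", line).rstrip()
    if ignore_case:                # roughly -i: compare without regard to case
        line = line.lower()
    return line


def same_line(left, right, **options):
    """True when two lines count as equal under the given options."""
    return normalize(left, **options) == normalize(right, **options)


print(same_line("Thanks.", "THANKS."))                     # no options
print(same_line("Thanks.", "THANKS.", ignore_case=True))   # like -i
print(same_line("a  b", "a b", ignore_space_change=True))  # like -b
print(same_line("ab", "a b", ignore_space_change=True))    # -b keeps this difference
print(same_line("ab", "a b", ignore_all_space=True))       # like -w
```

<p>The fourth call shows why -b and -w are separate options: -b only ignores how much white space there is, while -w ignores whether it is there at all.</p>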
<p><strong>diff is your friend</strong></p>
<p>Like many basic tools, &#8220;diff&#8221; is almost always there.  And if you know how to use it effectively, it can really save time and frustration.  Sure there are other cool file comparison tools.  Some are even embedded into other products.  But knowing how to use the basic tools that are always there will be a life saver in a crisis situation.  And the only way to know how to use them is to actually use them sometimes.</p>
<p>Do you use &#8220;diff&#8221; with a set of options that does a specific task for you?  If so, please share them.  And what other basic tools do you use?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theintegrationengineer.com/whats-the-diff/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

