Friday, June 29, 2007

Working with Lookup

Lookup is a passive transformation which accept parameters and returns one ore more columns in result. There are different types of lookup transformations like
1. Connected Cached Lookup ( Also referred as Static lookup)
2. Connected Uncached Lookup
3. Unconnected Cached Lookup
4. Unconnected Uncached Lookup
5. Dynamic Lookup

The same lookup transformation can be changed to Cached/uncached/dynamic based on the property selected.




Connected Cached Lookup

As name suggest, this transformation is in the flow of each row and input and output port is connected to other transformation in the mapping. This lookup can return more than one column in the result.
Characteristics of Connected Lookup
  1. A connected Lookup transformation receives input values directly from another transformation in the pipeline.
  2. For each input row, the PowerCenter Server queries the lookup source or cache based on the lookup ports and the condition in the transformation.
  3. If the transformation is uncached or uses a static cache, the PowerCenter Server returns values from the lookup query.
  4. The PowerCenter Server passes return values from the query to the next transformation.
Characteristics of Unconnected Lookup

An unconnected Lookup transformation receives input values from the result of a :LKP expression in another transformation. Unconnected Lookup transformation can be called more than once in a mapping.

  1. An unconnected Lookup transformation receives input values from the result of a :LKP expression in another transformation, such as an Update Strategy transformation.
  2. The PowerCenter Server queries the lookup source or cache based on the lookup ports and condition in the transformation.
  3. The PowerCenter Server returns one value into the return port of the Lookup transformation.
  4. The Lookup transformation passes the return value into the :LKP expression.

Friday, June 22, 2007

Informaitca Lookups

Many people has asked the difference between Dynamic and Static lookups.



1. As the name suggest, dynamic lookup changes in between the process and Static remain the same for the process.

2. Dynamic and Static name are with respect to the Cache. In dynamic lookup, lookup Cache are updated in between the process.

3. Static lookup can be uncached. But Dynamic lookup has to cached lookup.

4. Since Cache is updated in dynamic lookup, there is performance impact in case of Dynamic lookup.



Then there is usual question, can we use uncached static lookup as dynamic lookup? Reason this question comes up because in case of static uncached lookup, informatica will perform select statement from database directly for each row. But still this can not replace dynamic lookup. You can think over this and let me know if you can figure it out.

Error Handling Approach

Error Handling is one of the most important component in any Data warehouse or data integration project. If process is not able to handle and manage ERROR effeciently, then it is very rare that the project will succeed.



Following steps and questions should be answered

1. Who is the business owner of the process/project

2. Who will be responsible for Error correction

3. What will the interface to present the ERROR to business community

4. Error correction will be at source or IT will be responsibile to re-run error correction

5. Error retention period in ERROR table



Above are few points to consider before deciding on the ERROR management process. Any ETL will provide feature to implement ERROR management. And sometime we go outside the ETL tool for presentation and correction of ERROR.



Informatica version 7.x onward provide feature where you can capture Error information at transformation level and Source record in ERROR table or ERROR File. These tables or files can be re-processed to present the error records to business community. But in my opinion it is always better to have a common ERROR management process of all data integration project ( independent of tool). Any ETL tool can support that Error management process.

Wednesday, January 24, 2007

Is ETL solution for all problem!!

Many time I come across a situation where management team want to nail down the issue by using a key world "ETL solution". For them ETL is solution for all data transfer performance, data quality etc etc. I hope one day this will be possible to solve most of issue with one single solution.

ETL is not the solution for all Data integration. Yes ETL tool can help and enable data integration. But the business community has to take the ownership to define the business rule and provide regress data cleanising and data integration rules. And is not the end. They have to keep updating the business rule and keep them self on the top of any ERROR which are posted in the process.

ERROR from the data integration create major challenge to IT and business community. Most of time, there is no specific owner assigned to resolve the ERROR. Each group try to push the bug on other side.

ERROR handling ( correcting ) is one of the most in important component to make successful data integration. In the process, we should try to get managable set of error. Means we should not create ERRORs which is almost unrealistic to correct or manage. This again boils down to effective business rule definition. If at initial test we accept huge number of ERROR records, then data has to cleansed before bringing to Data Integration process. There are many tool available in market to do data profiling and data cleanising.

Next topic we will look at different ERROR handling approach.

Friday, January 19, 2007

Why Informatica

This is first question any technical person should have before they put their time and money to learn any tool. Why to learn Informatica ETL tool. Why organization are moving away from PL/SQL or traditional coding practice.

Informatica is engine based ETL tool. Informatica provide set of pre-build transformations, where you pass an input and you get a output. This requires minimal coding and Informatica code can be understood easily by any new developer. This helps in overall maintenance of Informatica code. Informatica has very sweet developer interface for development and also monitoring the jobs. Informatica provides capability to add comments




Check out my remote training courses at LearnersParadise.com at http://www.learnersparadise.com/mentors/cgi-bin/showProfile.pl?class_id=1471&referred_by=contact_ak@yahoo.com