Thursday, October 30, 2008

Informatica Upgrade to 8.6

Version 8.6 Architecture



Data Profiler Connectivity

Sequence of installation (Informatica 8.6)

1. Install the domain and create the domain tables.
2. Define the domain name and assign nodes.
3. In the PowerCenter Administration Console, create the Repository Service.
4. Create the repository (or upgrade an existing repository).
5. In the PowerCenter Administration Console, create the Integration Service.
6. In the PowerCenter Administration Console, create the Metadata Manager Service.
7. Create the Data Profiling warehouse tables in the Data Profiling schema.
8. In the PowerCenter Administration Console, create the Reporting Service.
9. In the Reporting Service, create the Data Analyzer repository in the Data Analyzer schema.
Note: Installation needs at least 2 GB of temp space.
A tablespace with a 16K default page size is needed for domain creation.
To start the service, run:
../server/tomcat/bin/infaservice.sh startup



Friday, June 29, 2007

Working with Lookup

Lookup is a passive transformation which accepts input values and returns one or more columns as a result. There are different types of lookup transformations:
1. Connected Cached Lookup (also referred to as a static lookup)
2. Connected Uncached Lookup
3. Unconnected Cached Lookup
4. Unconnected Uncached Lookup
5. Dynamic Lookup

The same Lookup transformation can be changed to cached, uncached, or dynamic based on the properties selected.




Connected Cached Lookup

As the name suggests, this transformation sits in the flow of each row: its input and output ports are connected to other transformations in the mapping. This lookup can return more than one column in the result.
Characteristics of Connected Lookup
  1. A connected Lookup transformation receives input values directly from another transformation in the pipeline.
  2. For each input row, the PowerCenter Server queries the lookup source or cache based on the lookup ports and the condition in the transformation.
  3. If the transformation is uncached or uses a static cache, the PowerCenter Server returns values from the lookup query.
  4. The PowerCenter Server passes return values from the query to the next transformation.
Characteristics of Unconnected Lookup

An unconnected Lookup transformation receives input values from the result of a :LKP expression in another transformation. An unconnected Lookup transformation can be called more than once in a mapping.

  1. An unconnected Lookup transformation receives input values from the result of a :LKP expression in another transformation, such as an Update Strategy transformation.
  2. The PowerCenter Server queries the lookup source or cache based on the lookup ports and condition in the transformation.
  3. The PowerCenter Server returns one value into the return port of the Lookup transformation.
  4. The Lookup transformation passes the return value into the :LKP expression.
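
The cached lookup behavior described above can be sketched in plain Python (a simplified illustration, not Informatica code; the table and column names are invented):

```python
# Simplified sketch of a cached lookup: the server reads the lookup
# source once into a cache, then answers each input row from the cache.
lookup_source = [
    {"cust_id": 1, "cust_name": "Acme", "region": "EAST"},
    {"cust_id": 2, "cust_name": "Globex", "region": "WEST"},
]

# Build the cache keyed on the lookup condition column (cust_id).
cache = {row["cust_id"]: row for row in lookup_source}

def connected_lookup(cust_id):
    """Connected lookup: can return more than one column."""
    row = cache.get(cust_id)
    # Unmatched rows return NULLs, as Informatica does by default.
    if row is None:
        return {"cust_name": None, "region": None}
    return {"cust_name": row["cust_name"], "region": row["region"]}

def unconnected_lookup(cust_id):
    """Unconnected lookup: invoked via a :LKP expression,
    returns a single value through the return port."""
    row = cache.get(cust_id)
    return row["cust_name"] if row else None
```

The key contrast: the connected version sits in the pipeline and feeds several columns downstream, while the unconnected version is called like a function and hands back exactly one value.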

Friday, June 22, 2007

Informatica Lookups

Many people have asked about the difference between dynamic and static lookups.



1. As the names suggest, a dynamic lookup changes during the process, while a static lookup remains the same for the whole process.

2. The names dynamic and static are with respect to the cache. In a dynamic lookup, the lookup cache is updated during the process.

3. A static lookup can be uncached, but a dynamic lookup has to be cached.

4. Since the cache is updated as rows flow through, a dynamic lookup has a performance impact.
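
As a rough illustration (plain Python, not Informatica; the names are invented), the essential difference is that a dynamic cache is updated while rows flow through, so a duplicate arriving later in the same run is found in the cache:

```python
# Sketch of static vs dynamic lookup caches (illustrative only).
target_rows = [{"cust_id": 1, "cust_name": "Acme"}]

# Static cache: built once at the start, never changed during the run.
static_cache = {r["cust_id"]: r for r in target_rows}

# Dynamic cache: starts the same, but is updated mid-process.
dynamic_cache = {r["cust_id"]: r for r in target_rows}

def dynamic_lookup(row):
    """Return a NewLookupRow-style flag: 1 = inserted into cache, 0 = found."""
    if row["cust_id"] in dynamic_cache:
        return 0
    dynamic_cache[row["cust_id"]] = row  # cache updated during the run
    return 1

incoming = [{"cust_id": 2, "cust_name": "Globex"},
            {"cust_id": 2, "cust_name": "Globex"}]  # duplicate in same run
flags = [dynamic_lookup(r) for r in incoming]
# The second duplicate is found because the first one updated the cache;
# a static cache would know nothing about either row.
```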



Then there is the usual question: can we use an uncached static lookup as a dynamic lookup? This question comes up because, with a static uncached lookup, Informatica will issue a SELECT against the database for each row. But even so, this cannot replace a dynamic lookup. You can think this over and let me know if you can figure out why.

Error Handling Approach

Error handling is one of the most important components of any data warehouse or data integration project. If the process is not able to handle and manage errors efficiently, then it is very rare that the project will succeed.



The following questions should be answered:

1. Who is the business owner of the process/project?

2. Who will be responsible for error correction?

3. What will be the interface used to present errors to the business community?

4. Will errors be corrected at the source, or will IT be responsible for re-running the error correction?

5. What is the error retention period in the ERROR table?



The above are a few points to consider before deciding on an error management process. Any ETL tool will provide features to implement error management, and sometimes we go outside the ETL tool for the presentation and correction of errors.



Informatica version 7.x onward provides a feature where you can capture error information at the transformation level, along with the source record, in an ERROR table or ERROR file. These tables or files can be reprocessed to present the error records to the business community. But in my opinion it is always better to have a common error management process for all data integration projects (independent of the tool). Any ETL tool can support that error management process.
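
A tool-independent row-level error capture pattern can be sketched like this (plain Python; the schema and business rule are invented for illustration):

```python
# Sketch of a simple row-level error capture pattern: keep the full
# source record plus a reason, so the error set can be reprocessed
# and presented to the business community later.
def load_rows(rows):
    loaded, errors = [], []
    for row in rows:
        # Example business rule: amount must be a non-negative number.
        amount = row.get("amount")
        if isinstance(amount, (int, float)) and amount >= 0:
            loaded.append(row)
        else:
            errors.append({"source_row": row, "error": "invalid amount"})
    return loaded, errors

loaded, errors = load_rows([{"id": 1, "amount": 10},
                            {"id": 2, "amount": -5}])
```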

Wednesday, January 24, 2007

Is ETL the solution for all problems?!

Many times I come across a situation where the management team wants to nail down an issue by using the keyword "ETL solution". For them, ETL is the solution for all data transfer, performance, data quality, and similar issues. I hope one day it will be possible to solve most of these issues with one single solution.

ETL is not the solution for all data integration. Yes, an ETL tool can help and enable data integration. But the business community has to take ownership to define the business rules and provide rigorous data cleansing and data integration rules. And that is not the end: they have to keep updating the business rules and stay on top of any errors that are raised in the process.

Errors from data integration create a major challenge for IT and the business community. Most of the time there is no specific owner assigned to resolve the errors, and each group tries to push the bug to the other side.

Error handling (and correcting) is one of the most important components of successful data integration. In the process, we should try to end up with a manageable set of errors; that is, we should not create errors which are almost unrealistic to correct or manage. This again boils down to effective business rule definition. If initial testing produces a huge number of error records, then the data has to be cleansed before it is brought into the data integration process. There are many tools available in the market for data profiling and data cleansing.

In the next topic we will look at different error handling approaches.

Friday, January 19, 2007

Why Informatica

This is the first question any technical person should ask before they put their time and money into learning a tool: why learn the Informatica ETL tool? Why are organizations moving away from PL/SQL and traditional coding practices?

Informatica is an engine-based ETL tool. Informatica provides a set of pre-built transformations: you pass an input and you get an output. This requires minimal coding, and Informatica code can be understood easily by any new developer, which helps the overall maintenance of the code. Informatica has a very pleasant developer interface for development and also for monitoring jobs. Informatica also provides the capability to add comments.




Check out my remote training courses at LearnersParadise.com at http://www.learnersparadise.com/mentors/cgi-bin/showProfile.pl?class_id=1471&referred_by=contact_ak@yahoo.com

Thursday, October 26, 2006

Router and Filters

The Router and Filter transformations are similar in behavior, but the Router has some additional features. Informatica added the Router transformation in version 5.x. Let's first talk about the Filter transformation.
----------------------
1. The Filter transformation is an active transformation.
2. The Filter transformation allows you to filter rows in a mapping.
3. Only the rows which meet the filter condition pass through the transformation.
4. It has only two types of ports: input and output.
5. Use the Filter transformation as close to the source as possible for better performance. This makes sense, because fewer rows will have to go through the other transformations.
6. Rows from different transformations cannot be merged in a Filter transformation.
7. An expression can be used in the filter condition.
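
As a tiny sketch in plain Python (the rows and the condition are invented), a filter simply drops the rows that fail its single condition:

```python
# Sketch of a Filter transformation: one condition; rows that fail
# the condition are dropped and do not continue downstream.
rows = [{"dept": "SALES", "salary": 5000},
        {"dept": "HR", "salary": 3000}]

# Filter condition (invented): keep only SALES rows.
passed = [r for r in rows if r["dept"] == "SALES"]
```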

Router
-------
A Router transformation is similar to a Filter transformation because both transformations allow you to use a condition to test data; the Router is an enhancement of the Filter transformation, and developers now prefer it over the Filter. In a Filter transformation you can test data against only one condition, whereas in a Router transformation you can test data against more than one condition. The same problem can be solved with either transformation, but with Filters you have to use a separate transformation for each test condition, while with a Router you can give different test conditions in the same transformation and Informatica will create a different group for each condition.

1. The Router is also an active transformation.
2. It has only two types of ports: input and output.
3. There is one default group; if a row does not satisfy any test condition, it falls into the default group.
4. Multiple test conditions can be created as output groups.

Thursday, October 12, 2006

Informatica - Architecture

Informatica provides the following components:


  1. PowerCenter Repository
  2. PowerCenter Repository Server
  3. PowerCenter Client
  4. PowerCenter Server

PowerCenter Repository: It maintains the metadata. A set of tables is created within the repository database to hold the Informatica metadata. The PowerCenter client and server access the repository to read this metadata.

PowerCenter Repository Server: It manages connections to the repository from client applications.

PowerCenter Client: This is used to maintain the powercenter objects such as Source, Targets, mappings etc. It has following client tools

  • Designer : To create mappings that contain transformation instructions for the PowerCenter Server.
  • Workflow manager: To create, schedule, and run workflows. A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data.
  • Repository Manager: To administer the metadata repository. You can create repository users and groups, assign privileges and permissions, and manage folders and locks.
  • Repository Server Administration Console: To administer the Repository Servers and repositories.
  • Workflow Monitor: To monitor scheduled and running workflows for each PowerCenter Server.

PowerCenter Server: With the help of the repository and the Repository Server, the PowerCenter Server executes the business logic for extraction, transformation, and load.

What is Repository ?

The PowerCenter repository is maintained in an RDBMS. It contains the instructions required to extract, transform, and load data. PowerCenter accesses the repository through the Repository Server. Metadata is added to the repository using the PowerCenter client tools. There are two types of repositories:

  1. Global Repository: Objects created in the global repository can be accessed across different repositories.
  2. Local Repository: Objects created in a local repository can only be accessed within that repository. A global repository is registered with a local repository so that global repository objects can be accessed in the local repository using shortcuts.