Thursday, October 26, 2006

Router and Filters

Router and Filter transformation are similar in behavior but router has some additional features. Informatica added Router transformation version 5.x onward. Lets first talk about Filter transformation
----------------------
1. Filter transformation is an active transformation
2. The Filter transformation allows you to filter rows in a mapping.
3. All the rows which meet the filter condition pass through the transformation.
4. It has only two type of ports, input and output.
5. Use filter transformation as close to source as possible for better performance. This make sense, beacuse number of rows to go through other transformation will be less.
6. Can not merge rows from different transformation in Filter transformation.
7. Expression can be used in the filter condition.

Router
-------
A Router transformation is similar to a Filter transformation because both transformations allow you to use a condition to test data. Router is enhancement of Filter transformation. Developer now prefer to use Router transformation rather than filter. In filter transformation, you can test data for only one condition where as in Router transformation you can test data for more than one condition. The same problem can be solved using Filter and Router transformation. But we have to use multiple filter transformation for each test condition. But with Router you can give different test condition in same transformation and informatica will be create different group for each test condition.

1. Router is also active transformation
2. Only two types of port , Input and Output
3. One default group. If row does not satisfy any test condition, it will fall into default group.
4. Mulitple test data condition can be created as output group.

Thursday, October 12, 2006

Informatica - Architecture

Informatica provides following components


  1. PowerCenter Repository
  2. PowerCenter Repository Server
  3. PowerCenter Client
  4. PowerCenter Server

PowerCenter Repository : It maintains the metadata. Set of tables are created within repository database to maintain the Informatica metadata. PowerCenter client and server access the repository to access metadata.

PowerCenter Repository Server: It manages connections to the repository from client applications.

PowerCenter Client: This is used to maintain the powercenter objects such as Source, Targets, mappings etc. It has following client tools

  • Designer : To create mappings that contain transformation instructions for the PowerCenter Server.
  • Workflow manager: To create, schedule, and run workflows. A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data.
  • Repository Manager: To administer the metadata repository. You can create repository users and groups, assign privileges and permissions, and manage folders and locks.
  • Repository Administrator Admin Console: To administer the Repository Servers and repositories
  • Workflow Monitor: To monitor scheduled and running workflows for each PowerCenter Server.

PowerCenter Server: With help of repository and repository server, powercenter server execute the business logic for extraction, transformation and load.

What is Repository ?

Powercenter repository is maintained in RDBMS . It contains the instruction required to extract, transform and load data. PowerCenter access repository using repository server. Metadata in the repository is added using powercenter client tools. There can be two types of repository

  1. Global Repository: Object created in Global repository can be access across different repository.
  2. Local Repository: Objected created in local repository can only be accessed within the repository. Global repository are registered to local repository so that global repository objects are access in local repository using shortcuts.

Wednesday, October 11, 2006

Transformation - Aggregator

Transformation - Aggregator
Aggregator is an active transformation. Output from the aggregator can be different from input. Designer allows aggregator functions only in this transfromation. Following types of function can be used
  1. MIN
  2. MAX
  3. AVG
  4. COUNT
  5. FIRST
  6. LAST
  7. MEDIAN
  8. PERCENTILE
  9. STDDEV
  10. SUM
  11. VARIANCE

Along with these aggregate function, you can use other row level functions such as IIF, DECODE etc.

In Aggregator transformation, at least one port has to be selected as group by column. By default, aggregator will return the last value for a port ( if there are more than one record for group by column). Aggregator will also sort the data in ASC order on group by port.

NOTE: If primary column of the source is used in group by port, then aggregator will work as sorter transformation.

Nested Aggregate ports can not be used in Aggregator. Means, you can not get the count(*) in one port and use this value in other Aggregator port. This will invalidate the mapping.

Aggregtor has a property "SORTED INPUT". If you check this property, then aggregator assumes that data is coming in sorted order ( on group by ports). If not, at run time session will fail. Sorted Input improves the aggregator performance.

Tuesday, October 10, 2006

Transformation - Source Qualifier

Source Qualifier is the default transformation with RDBMS and Flatfile source. When ever you pull RDBMS or Flatfile source in Mapping Design workspace, informatica will link the source definition with Source Qualifier. This transforamtion can be used to perform following tasks

  1. Join data from Homoginious RDBMS sources
  2. Write SQL filter statment for RDBMS source
  3. Define two join conditions between homogenious sources
  4. Specify Sorted Port
  5. Select distinct value from source

Source Qualifier uses "transformation data type". Trasnformation data type determines how the source database binds data while reading. It is not advisible to change the data type in source qualifier. If datatype between source definition and source qualifier does not match, informatica will mark the mapping invalid.

If more than one source qualifier used in mapping loading data to multiple target, "target load order" can be defined. This control the order in which the powercenter load data in target.

Flatfile Source Qualifer Properties

In flatfile every attribute is read as string and converted to source definition data type. Date attribute can also be directly read from flatfile if you define the datatime format in the propereties table.


You can specify the flatfile type as delimited or fixed width. you can specify the different type of delimiter , flatfile code page, Escape Character and number of records to skip ( this is helpful if the first record in the file is the column names).

Enjoy.

Sunday, October 08, 2006

Transformation

"Transformation" is informatica object which is used to read, manupulate and load data. There are two types of transformation namely "active" and "passive". Each transformations perform specific functions. Data pass in and out from the transformation through "ports". Ports can be of following type
1. Input only
2. Output only
3. Input & Output
4. Variable.

Transformation can be connected to the data flow or they can be unconnected. Unconnected transformation are not connected to any other transformation in the mapping. These unconnected transformation are called within other transformation and returns a value to that transformation.

Active Transformation: They change the number of rows or property of the output rows.

List of Active transformation
-----------------------------------
Aggregator
Source Qualifier
Custom ( can be active or passive)
Filter
Joiner
Normalizer
Rank
Router
Sorter
Transaction Control
Union
Update Strategy
XML Generator
XML Parser
XML Source Qualifier

Passive Transformation: They don't change number of rows or property of output rows from the transfromation.

List of Passive Transformation
-------------------------------------
Expression
External procedure
Input ( Mapplet transformation)
Lookup
Output ( Mapplet Transformation)
Sequence Generator

Enjoy. Next we will talk detail about each transformation.

Friday, October 06, 2006

INFORMATICA WAREHOUSE DESIGNER - QUESTIONS

1. Can you create source definition using target definition?
2. Can you execute DDL commands from warehouse designer?
3. What type of connection warehouse designer use to connect to RDBMS ?
4. What is Native and Transformation Data Types ?

INFORMATICA SOURCE DEFINITION QUESTION

1. What are different ways to import Source definition?
2. I have imported the source layout using a flat file. But when I run the mapping using this source layout and flatfile ( used for creating the layout), mapping fails with invalid data type . WHY ?
3. What type of connection informatica uses to connect to RDBMS to import source definition?
4. Can Source Layout be used to create the target definition ?
5. How to read Date column from Flatfile as DATE data type?

Enjoy.

INFORMATICA - WAREHOUESE DESIGNER ( TARGET DEFINITION)

Second step in Mapping development is to create your targets. A target in a mapping can be a flatfile, RDBMS or XML. From version 6.x onward, informatica support heterogeneous target in mapping. (Version 5.x and before, it supported on homogeneous targets)
Following target definitions can be imported
FLAT FILE:The Warehouse Designer uses the Flat File Wizard to import a target definition from a flat file that exactly matches the structure of the flat file. For more information, see Importing Flat Files.

RELATIONAL TABLE: You can import a relational table to create a target definition that exactly matches the structure of the relational table.

XML FILE: You can import an XML target definition from an XML, DTD, or XML schema file.
Once you add a target definition to the repository, you can use it in a mapping.
Importing a Relational Target Definition
NOTE: Before you can import target definition from RDBMS, create an ODBC connection to the RDBMS.
When you import a target definition from a relational table, the Designer imports the following target details:
1. Target name: The name of the target.
2. Database location: You specify the database location when you import a relational source. You can specify a different location when you edit the target definition in the Warehouse Designer and when you configure a session.
3. Column names: The names of the columns.
4. Datatypes: The Designer imports the native datatype for each column.
5. Key constraints: The constraints in the target definition can be critical, since they may prevent you from moving data into the target if the PowerCenter Server violates a constraint during a workflow.
6. Key relationships: You can customize the Warehouse Designer to automatically create primary key-foreign key relationships.
You can also create logical relationships in the repository. Key relationships do not have to exist in the database.
Steps to import a relational target definition

In the Warehouse Designer, choose Targets-Import from Database.
Select the ODBC data source used to connect to the target database.
Enter the user name and password needed to open a connection to the database, and click Connect.
Specify the Schema Name from where you want to get the target definition.
Select the relational table or tables to import the definitions into the repository.

Informaitca - DESIGNER

DESIGNER

Designer is one of the most common tools used by informatica developer. List of basic task performed in designer are


1. Connect to the repository using the Designer
2. View sources, targets, transformations, and mappings
3. Create/edit sources, targets, transformations, and mappings
4. Create shortcuts from a shared folder
5. Copy object from one folder to another
6. Copy and paste objects in mappings or mapplets in the same folder
7. Compare objects
8. Export objects
9. Import objects
10. Edit breakpoints
11. Create a debug session

Designer itself has different development workspace tools

1. Source Analyzer
2. Warehouse Designer
3. Transformation developer
4. Mapplet designer
5. Mapping designer.


Each of above tool has very specific use in development process.

SOURCE ANALYZER
------------------------------
This is used to maintain (Create, update and delete) Source definition. There can be different type of source definition like Flatfile, Oracle, VSAM etc. Source definition can be created manually (you have type all the column names, dataype and precession) or you can import from RDBMS or flat file layout. While importing the source definition from RDBMS it uses ODBC connection to connect to database.

SOURCE DEFENITION CREATION
----------------------------------------------
Source definition can be created in different way. But important thing to remember is that once source definition is created is store in Informatica repository and has no link with the source from where it is created. Source can be created
1. Manually. Go to "SOURCES" menu, and select "CREATE". Give Source name and define Column Name, Data type and precession.
2. Import from Oracle: Before one can import source definition from Oracle, create a RDBMS connection to RDBMS. Select this connection and provide username, password for the database and SCHEMA name. It will show list of tables in the Schema. Select the table and import the table.
3. Import from Flatfile: One can import Source file definition from Flatfile (fixed width or Delimited). If first row of the file is column name header, then it can used to create the column name else you have to manually give the name ( Informatica by default will give FIELD1, FIELD2 etc.). The data type it will derive from the first data row from the file.

Enjoy this.

Wednesday, October 04, 2006

What is Informatica

Informatica is Engine driven ETL tool mostly used in Data Integration and Data warehouse enviornment. It is supported on different environments ( UNIX, WINDOWS, MAINFRAME etc). Current version in Market is 8.x. It has different tools as
1. Designer
2. Workflow Manager
3. Workflow Monitor
4. Repository Manager
5. Administrator Console

Informatica server runs as service called PowerCenter server. Informatica server does the major work at execution time to extract/load/transform the data based on metadata.
Informatica maintains all its metadata in repository ( RDBMS based). At development time, all work done in designer and workflow manager is stored in repository. At run time informatica server get processing logic from repository.

I will talk detail about individual tools in next blog.