Over the last few years, I've been a member of several different systems development teams. Whether working with Java developers on a web portal project or SQL coders on a financial data warehouse, I found that despite using vastly different tools and skillsets, developers were almost always working within a best practices framework. These best practices are coding norms, implemented to ensure code is efficient and can be understood and supported by other developers. These dictate various aspects of how development should be done, and can include anything from tool functionality and coding practices to commenting and documentation guidelines.

When I first started working with Informatica PowerCenter a few years ago, I watched dozens of tutorials to get started. While I was able to learn how to use the functionality of the program, I realized that there was one area that was particularly hard to find much detail – you guessed it…best practices. Unfortunately, there is only a small amount of information provided in the Informatica tutorial, and it was difficult to find guidance outside of my team for many of the scenarios I encountered. Whether you are a new or seasoned Informatica developer, consider implementing these 5 best practices to your development process.


1. Improve Descriptions and Comments

The best way to ensure your code can be understood by others is to use the built-in features in the ETL tool to comment on the transformations. Good annotation will not only ensure that others can more easily interpret your code, but also help them understand why and how your transformations are working. Writing a good description also serves as a good way to double check your logic and ensure what you've created satisfies the requirements.

Type

Guideline

Mapping description

Include a brief explanation of the mapping. This may also include any indirect dependencies that the mapping has.

Transformation description

Brief description of the purpose of the transformation. Include details like why it is necessary, and what functions it performs.

Port level descriptions

If ports within a transformation contain complex logic, use this field to explain what is happening and why.

2. Use Standard Naming Conventions

Perhaps one of the easiest ways to simplify readability of an ETL process is to create and adhere to standards for naming conventions. This should include all aspects of the process, including the process name, the transformation names, and the port or field names within the transformations. Rules should be created for different scenarios – such as input or output ports, lookup ports, etc. If your organization already has naming conventions, then make sure to abide by them. In addition to the naming expressions that Informatica recommends, here are some other standards that I find simple and logical.

type

Guideline

Example

Mapping name

m_(target_table_name)_(source)_(action)

m_dim_employee_pplsoft_insert

Session name

s_(mapping_name)

s_dim_employee_pplsoft_insert

Connected Lookup

lkpc_(table_ name)

lkpc_dim_employee

Unconnected Lookup

lkpu_(value returned)

lkpu_emp_dmn_id

Unaltered port

field_name

employee_id

Input only port

i_field_name

i_employee_phone_home

Variable port

v_field_name

v_employee_phone_home

Output-only port created in mapping

o_field_name

o_employee_phone_home

Field brought in from a lookup

lkp_field_name

lkp_emp_dmn_id

Field which has been created or changed in an earlier transformation in the mapping

x_field_name

x_source_sys_name

3. Improve Lookups Part 1: Remove unused ports

Lookup transformations should only include required ports. For example, if you are looking up an employee's address in an Employee Dimension table based on an employee ID, you would NOT want to include all of the ports from the Employee Dimension table in the lookup transformation. It's most efficient to only include fields in the lookup that you actually require (such as address1, address2, city, state and zip) and remove all unused fields (such as email address, start date, tenure, etc.). This may sound like common sense, but when a lookup table is created in PowerCenter, all fields from that table are included by default. Trimming out unused ports will boost performance by reducing the cache size, and will improve readability as a user can very quickly see what fields actually matter.

4. Improve Lookups Part 2: Use caching strategies and unconnected lookups appropriately

Consider the amount of rows you are processing from the source against the size of the lookup table. If you are only processing a handful of rows from your source and the lookup table is enormous, it is best to use an uncached lookup. Using this method, PowerCenter will perform a direct lookup to the table each time rather than spending the time to build the cache of the huge lookup table, only to process a few rows against that cache.

On the flip side, use unconnected, cached lookups within expression transformations when possible. Like functions in other programming languages, unconnected lookups will return a single output value based on one or multiple inputs. The unconnected lookup is efficient because it creates a cache at the beginning of the session, and this cache can be reused multiple times in the lookup without having to be rebuilt. Keep in mind though, this cache will remain static, so it should only be used when you aren't expecting values in your lookup table to change as you load data.

5. Simplify router logic

When using a router, it's best to minimize the amount of logic for each route group. The method I recommend is to create processing flags that would contain boolean output such as "o_Insert_Flag" or "o_Error_Flag" in an expression that precedes the router. Then you can simply set the logic of each router group to handle the value that is set in the flags. This not only makes your router cleaner and more efficient, but also will improve readability and support.


Different organizations and teams may have unique needs, so over time, they will need to create their own sets of best practices and development norms. When developers understand and adhere to these, overall quality will rise as the ETL processes run more efficiently and can be supported by other developers within the organization. If you're a new Informatica developer, a member of a development team that needs some coding norms, or just looking for some extra resources, I hope you will consider these five points in your next project.