User:Ramesh1442

Coding techniques in Data migration from old to new applications

Following are the coding and environment setup tips and techniques for Data migration from old to new applications 1. We must find the I/O bottlenecks. More than half of I/O operations overall was done on migration data, and here I mean only data, not indexes. Undo, migration indexes, source data, target data each took about 10% of overall I/O and temp ~5%. So the conclusion for us was to reduce I/O contention as much as we could on migration data. If you experience the same problems and haven’t access and/or resources for high level I/O load balancing schemes you can use at least poor man’s striping for migration data. We didn’t care much about source and target data load balancing because source tables where accessed only once when all data moved to migration tables and target tables also where mostly accessed once when inserting migrated data. Only small amount of additional lookup was necessary to target data in migration process.

2. Adjust enough space for temp and undo. Think in gigabytes here, not megabytes.

3. Use some kind of logger to log start and finish of each transformation and validation. You can use autonomous transaction for this purpose. Measure time each transformation and validation requires and focus on tuning only those that run significant time. If overall migration process takes 3 hours you don’t care about transformation that runs 10 seconds even if you can reduce necessary running time 10 times. But you do care if you can reduce a transformation that takes 2 hours by a small 10%.

4. Most probably you don’t care about media failure during migration (it would be easier restart the process from beginning) so you can use INSERT with APPEND hint and CREATE INDEX with NOLOGGING option. This helps reducing a big amount of redo generation. These two options are especially useful when moving data from source tables to migration tables and later when doing inserts in target tables. But remember that inserting with VALUES keyword always generates redo as well as redo is generated if you have any indexes on the table or database is in FORCE LOGGING mode.

5. In initial load from source tables to migration tables try to filter out data that obviously aren’t possible to migrate to reduce the migration data size that will be changed over and over again. Each full scan on table will go through all table blocks even those you’ll never need.

6. Use referential integrity for migration data only where you really need it. This is contrary to normal applications where you have to define referential integrity as much as possible in database. Why? Because your code would be the only one modifying migration data. Because migration data would be interesting only in migration process and shortly after it, normally you don’t need to enforce integrity constraints because in future someone could easily corrupt integrity. In migration process you don’t care about distant future, you do care about current moment and speed. Of course if you are doubtful and think that your code cannot guarantee uniqueness of some column(-s) and parent/child relationship between some of your tables then you should enforce primary keys and foreign keys respectively. Enforcing them in the database is the fastest way if you think they could be broken.

7. Consider seriously PCTFREE parameter for migration tables. If these are subjects of heavily updates then default value of 10 is too less. With default value you’ll end up with so many chained rows that they will very much slow down updates and selects on your tables. After testing the „real migration” ANALYZE your tables and look for CHAIN_CNT in dba/all/user_tables view. If it is larger than some percents of total rows in table you should increase PCTFREE. Of course it all depends and I cannot give you exact value, but I become suspicious for values larger than 5%.

8. Standardize your migration tables e.g. if you use some kind of logging info then use it in a consistent way. If you have a flag that indicates whether row has to be moved to target tables then name it in the same manner and use the same data type everywhere. We used following standard columns for all our migration tables:

Last transformation or validation that processed this row; Timestamp when it was last processed; Does it must be migrated? Timestamp when it has been migrated to target system.

9. Use SQL and set operations as much as you can. Use row by row operations only if these are really necessary. Remember that you have such constructions like DECODE and its younger brother CASE, NVL, various string functions, WITH keyword for SELECTS, INSERT into multiple tables that somehow helps you calculate and manipulate with values based on current row. And become familiar with ANALYTIC functions that give you ability to look on other rows and their products in a result set. Remember that you can create as many migration tables as you really need. Consider row by row operations and cursors only as a last option.

10. If you really cannot avoid loops and row by row operations then use some kind of tool that provides you with information how much work is done already. This tool for example can be DBMS_APPLICATION_INFO.SET_SESSION_LONGOPS. For more information read Oracle9i Supplied PL/SQL Packages and Types Reference or PL/SQL Packages and Types Reference 10g Release 1.

11. Calculate constants only once. Do not recalculate some lookup parameters (for example current user or user’s organization) for every row. Even if you have some classifier that has several values it is better to initialize some variables or constants in the beginning of package or procedure than to query from a lookup classifier table for each row. Your code wouldn’t be dynamic and probably not so elegant, but it would be much faster and that is main goal at least for migration code.

12. Consider using sysdate everywhere. Do you really care whether row was transformed in 10:31:46 when it really was transformed or all rows were transformed in 10:31:00 when transformation started? It is especially important if you use a construction SELECT sysdate INTO variable FROM dual. Why not simply initialize a variable in beginning of each transformation rather than doing this for each row? If you are concerned whether it costs anything just run dbms_profiler and look at results.

13. In your requirements gathering process you should identify some common transformations and/or validations for migration data. How to realize them? There are two possibilities: as a common procedure/function or everybody uses his own solution. Common solution would give consistent results every time but probably wouldn’t be the fastest one, in opposition to everyone’s own solution. So you have to understand what benefits and disadvantages would give each solution (for example how big the risk is that somebody would code transformation in a way that isn’t compatible with others) and make a decision.

14. Be suspicious about user-defined functions that are called for each row and do some queries on their own. Better incorporate them into upper statement.

15. Full scans aren’t bad. Especially in migration process. Migration involves many transformations and validations that affect all or most of the data in a table. If you have to process all data in a table then do it! Do it with full scan! Try hash joins. Nested loops and access path via indexes can give you wonderful results in an OLTP application where you have to select or update several records, but it can be very slow process if you have to select or update all records. And one more benefit from full scans and hash joins - you can monitor them in v$session_longops and some tools like Enterprise Manager even can visualise them. I prefer to know how much work has been done and how much remained.

16. Before loading data into target schema turn off all auditing and drop all indexes if possible. Consider creating audit data later especially if this is some kind of home grown auditing via triggers. Loading audit data in one single move will be much faster than firing a trigger on each row. The other thing is indexes and constraints on target tables. Creating indexes later would avoid possible fragmentation issues as well as with NOLOGGING will save your time. Enabling constraints NOVALIDATE also will save your time.

17. Measure overall migration performance! We have tried to measure and compare many overall statistics and events from v$session_event and v$mystat and found that only reliable measurement is how many rows package migrated in the given time. Our migration rate was from 20 000 till 100 000 rows in a minute for various data groups. Unfortunately it was obvious that data groups with lowest rate wasn’t the most complicated ones with most quantity or most sophisticated transformations. The biggest reason was the programming style of each developer. Developers having the least ratio didn’t follow at least one or several suggestions given here.

Some more advices for migration performance measurement:

Do that on the same hardware with the same environment; You can measure either target or source rows depending on your specificity. We measured target row count; Analyze the best and worst cases (assuming the complexity is comparable) and try to use practices from best case and avoid those used in worst case.

18. Test migration process at least once on all your real data. This would give you following benefits:

Provide with a real case how long it takes to migrate all the data; Ensure that you have dealt with all possible data combinations at least in that given moment; Give you real information about physical parameters such as overall disk space both for migration data and target data, necessary temp and undo space, PCTFREE parameter Provide possibility to test target data whether anything isn’t forgotten; Give you at least a little guarantee that main migration would run without emergency cases.

19. In actual migration be prepared for emergency. You should monitor migration process even if you have tested it several times, especially if data has been changed since then. New unexpected classifier values, changed execution plans – at least these are things that could affect you without notice.