data management professional

February 12, 2009

ORACLE Tuning – When the SQL can’t be modified

Filed under: Architecture, Database Management, Performance Tuning — James Parnitzke @ 3:15 pm

Many times we encounter tuning or performance problems where the underlying data access logic is buried or simply not available to us. The application may, for example, use an Object Broker or Query Server whose SQL statements are unknown because they are generated at runtime and may change with each execution. The application’s query generator may also be unable to deal correctly with specific constructs (for example, a restriction in the WHERE clause that cannot be resolved through the use of an index) that appear in many SQL statements. This is not the complete nightmare scenario; there are still steps you can take to improve performance. In this case, you may have to resort to techniques that solve the problem at the session or system level, not at the SQL statement level.

When you don’t have control of the SQL statements the application creates, because the code is not available or is generated dynamically at runtime, your tuning choices will probably be restricted to optimizing the data access structures (system) or modifying the execution environment (session). Recall that the following techniques can only be used to tune or optimize specific SQL statements; they may not be applicable at the session or system level (a brief example follows the list).
– Alter SQL Statements
– Use of hints
– SQL Profiles
– Stored Outlines
– SQL Plan Baselines
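
To see why these techniques need access to the statement text, consider how an optimizer hint works: it has to be written into the SQL itself. A minimal sketch, where the table and index names are hypothetical:

-- The hint lives inside the statement text; if the application builds
-- this SQL at runtime, there is no place for us to add it.
SELECT /*+ INDEX(o ord_status_ix) */
       o.order_id, o.status
FROM   orders o
WHERE  o.status = 'OPEN';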

So what do we do?
The same tools Oracle provides to save hundreds of hours of labor-intensive manual SQL tuning can also be used in this scenario to help at the system and session level where the code (SQL) is not available. Using these tools you can apply and measure global changes to the database first, potentially tuning hundreds of queries at once. If you tune individual statements first instead of doing global tuning (for example, crafting SQL with optimizer directives), you will almost certainly find that subsequent global changes (e.g., optimizer parameter changes) in effect “un-tune” those carefully crafted individual SQL statements. Start with system-level tuning and establish a baseline before beginning to tune poorly performing individual SQL statements generated on behalf of the application. This means you should carefully apply the following methods, executed in the following order (a brief parameter sketch follows the list).

– Tune Operating System, Disk I/O, and Network environment
– Gather and baseline ORACLE optimizer statistics
– Adjust cost-based optimizer parameters and tune to meet site-specific needs
– Optimize the Application database instance
– Perform Query Optimization and Tune the SQL Access workload
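
A minimal sketch of what the parameter adjustment step looks like in practice is shown below. The parameters are real ORACLE initialization parameters, but the values are illustrative assumptions only; trial any change at the session level and measure it against your baseline before applying it system-wide:

-- Session level: affects only the current session, useful for testing
-- a change against a representative workload.
ALTER SESSION SET optimizer_index_cost_adj = 50;
ALTER SESSION SET optimizer_index_caching  = 90;

-- System level: affects every statement the application generates.
-- SCOPE=BOTH assumes the instance was started with an spfile.
ALTER SYSTEM SET optimizer_index_cost_adj = 50 SCOPE = BOTH;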

Your best friend in this case is the product’s cost-based optimizer. The query optimizer is directly responsible for the performance of SQL statements. The ORACLE optimizer cost model is illustrated in the following high-level diagram.

[Figure: ORACLE cost-based optimizer (CBO) cost model]

Note the sensitivity of the cost-based optimizer (CBO) cost model, which accounts for I/O- and CPU-related data as well as table and index metadata. This is why it is important to tune the operating system, disk I/O, and network environment first and then gather and baseline ORACLE optimizer statistics. The cost model allows the optimizer to choose the best execution plans based on statistics of the underlying data. Decisions made by the CBO depend on the integrity of the statistics available at the time the execution plan is parsed and executed; in this scenario a common cause of poor performance is bad or outdated statistics. Useful tools for monitoring database performance and activity include the Automatic Workload Repository (AWR) and Automatic Database Diagnostic Monitor (ADDM). These tools can be used to pinpoint and isolate bad statistics that might be causing the optimizer to make poor decisions when preparing execution plans. Use the V$SYS_OPTIMIZER_ENV view to examine the current values of the optimizer-related settings in the database instance. See the following URL for more information about this view:

http://download.oracle.com/docs/cd/B28359_01/server.111/b28320/dynviews_3080.htm
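
As a concrete illustration of the statistics step, here is a hedged sketch: inspect the current optimizer environment, gather fresh schema statistics, and export them to a statistics table so the baseline can be restored later. The schema name and statistics table name are hypothetical, and the DBMS_STATS options should be adjusted to your site:

-- Current system-level optimizer settings.
SELECT name, value, isdefault
FROM   v$sys_optimizer_env
ORDER  BY name;

-- Gather and baseline schema statistics.
BEGIN
  DBMS_STATS.GATHER_SCHEMA_STATS(
    ownname          => 'APP_SCHEMA',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    cascade          => TRUE);

  DBMS_STATS.CREATE_STAT_TABLE(
    ownname => 'APP_SCHEMA',
    stattab => 'STATS_BASELINE');

  DBMS_STATS.EXPORT_SCHEMA_STATS(
    ownname => 'APP_SCHEMA',
    stattab => 'STATS_BASELINE');
END;
/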

After understanding the common characteristics of the environment and application, it is time to begin carefully optimizing the application database instance (there is a lot to this as well; I will discuss it in a future post). Then and only then are you ready to perform what query optimization you can by tuning the SQL Access workload.
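
Even though the statement text is generated at runtime, the statements themselves are visible in the shared pool and AWR by SQL_ID, so the advisor tooling can still be pointed at the worst offenders. A minimal SQL Tuning Advisor sketch, where the SQL_ID and task name are placeholders:

-- Create and run a tuning task for a statement captured in the
-- shared pool; replace the SQL_ID with one taken from AWR/ADDM.
DECLARE
  l_task VARCHAR2(64);
BEGIN
  l_task := DBMS_SQLTUNE.CREATE_TUNING_TASK(
              sql_id     => 'abcd1234efgh5',
              scope      => DBMS_SQLTUNE.SCOPE_COMPREHENSIVE,
              time_limit => 60,
              task_name  => 'app_top_sql_1');
  DBMS_SQLTUNE.EXECUTE_TUNING_TASK(task_name => l_task);
END;
/

-- Review the advisor's findings and recommendations.
SELECT DBMS_SQLTUNE.REPORT_TUNING_TASK('app_top_sql_1') FROM dual;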
To summarize, the recommended approach to performance optimization for ORACLE when the application product doesn’t provide access to the underlying query engine is to use the following methods executed in the following order.

– Tune Operating System, Disk I/O, and Network environment
– Gather and baseline ORACLE optimizer statistics
– Adjust cost-based optimizer parameters and tune to meet site-specific needs
– Optimize the Application database instance
– Perform Query Optimization and Tune the SQL Access workload

Performing these steps in this order will maximize the value of the optimization effort and mitigate the risk that a global parameter change in the database will adversely impact the application when you don’t have access to the underlying codebase.

February 11, 2009

Method for an Integrated Knowledge Environment (Mike 2.0)

Filed under: Methodology — James Parnitzke @ 4:48 am

One of the more interesting sites I have kept an eye on is the open source Method for an Integrated Knowledge Environment, better known as MIKE2.0 (http://mike2.openmethodology.org/wiki/MIKE2.0_Methodology). Initially created by a team from BearingPoint, the project started as the development of an approach for Enterprise Information Management in early 2005. Much of the content of the MIKE2.0 Methodology was made available to the Open Source community in late December 2006. The contributor base includes a number of individuals, both from BearingPoint and from the external community. The Data Governance and Management Consortium (DGMC) is being formulated to take ownership of MIKE2.0 and to lead the extension of many aspects of the methodology.

Sean McClowry is the overall lead for the MIKE2.0 Methodology. Sean wrote much of the Overall Implementation Guide, and was the core contributor on a number of MIKE2.0 Solution Offerings as well as the overall collaborative framework that hosts the MIKE2.0 Methodology. Andreas Rindler is the architect of the collaborative framework that hosts MIKE2.0 and primarily focuses on development of content related to Enterprise Content Management, Enterprise 2.0 and the Open Methodology Framework.

There is a wealth of material to explore on this site; you can spend days wading through the details and templates provided. As an example, I encourage you to see the Data Migration solution offering found at http://mike2.openmethodology.org/wiki/Data_Migration_Solution_Offering and I think you will agree it is a terrific exposition on a subject area about which not a lot has been written. Of course there is much more to be done, but this represents an invaluable resource for all of us in this profession. Thank you, Sean and Andreas, for making this readily available to all of us; I very much appreciate all the hard work you have put into it.

Data Profiling – Plan on it

It has always surprised me that data quality and profiling work is rarely planned into data warehouse or migration projects. What is more bewildering is the assumption that multiple data sources will somehow magically transform themselves to meet “implicit canonical rules” and stakeholder expectations, without anyone investing the time and energy to build a common understanding of how to treat the inevitable anomalies and flat-out garbage data that has been spinning for years, buried under a lot of application sleight-of-hand.

For example, note the typical flow of activities in a data migration project as represented in the following diagram.

[Figure: Typical data migration effort]

Planning for and adopting an automated data profiling and mapping set of work streams should be an integral part of the overall program, but this is unfortunately the exception rather than the rule. Unless the source data is pristine (yeah, sure) across multiple sources, want to guess what the result is? A lot of unnecessary rework, data quality issues, and dashed stakeholder expectations when the data values are revealed in the acceptance stage. Hoping the data can be sourced, mapped, and transformation rules applied as a paper exercise (or using Excel, for more advanced organizations) represents a true failure to understand the opportunity. A more robust approach includes planning assumptions that support and enable the fundamental processes needed to ensure success (or at least to reduce the guessing and defects we can expect). The major processes are identified in the following diagram.

[Figure: Data migration processes]

Note how the profiling activity feeds both data mapping and the validation and certification processes with the customer. This more robust and sensible model has evolved to meet an important and urgent need, reducing overall costs and presenting real opportunities for compressing key process cycle times in the overall effort, as illustrated in the following diagram.

[Figure: Improved data migration effort]

So best practice should include the addition of robust data profiling and mapping work streams to be used in a wide variety of front-side activities as well as in the validation and certification processes before release, as the product approaches production. I will address the details of how this works and look forward to hearing from other hands-on practitioners who have experienced the power of this construct in their own practice.
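
In the meantime, to make the profiling idea concrete, the kind of question an automated profiling pass answers for every column can be sketched with a simple query; the table and column names here are hypothetical:

-- Basic column profile: volume, completeness, cardinality, and range.
SELECT COUNT(*)                        AS row_count,
       COUNT(cust_birth_date)          AS non_null_count,
       COUNT(DISTINCT cust_birth_date) AS distinct_count,
       MIN(cust_birth_date)            AS min_value,
       MAX(cust_birth_date)            AS max_value
FROM   src_customer;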

The best of intentions…

Filed under: About — James Parnitzke @ 12:57 am

Welcome to the data management professional blog. I will try to keep this as entertaining as possible and encourage all of you to contribute and share your personal and professional experiences in data management here. I will try to cover a number of topics ranging from master data management to analytics, with a dash of common sense and business experience thrown in to keep my reputation as a seemingly over-priced management consultant intact. This eclectic collection of posts will in many ways reflect the kind of chaos and wide variety of issues we all encounter in our daily professional lives dealing with data, and by necessity it will require a little patience from time to time. Just like our stakeholders, I will try to contribute my understanding of the news, events, and invaluable contributions others have made to our profession. And thank you for your interest; I hope this will meet your expectations and that you will become a regular reader and contributor.
