Posts Tagged ‘indexes’
The Rumble Down Under
Looks like Richard Foote won’t be on Howard Rogers’ Christmas card list for much longer (see the comments).
Functions Can Trump Indexes
Using a function on a B*Tree (normal) indexed column in your WHERE clause means that Oracle won’t use that index. For example:
select * from emp
where UPPER(lastname) = 'JONES';
will most likely result in a full table scan, despite an index on lastname. However, you could create a function-based index on UPPER(lastname), like this:
create index emp_upper_lastname on
emp(upper(lastname));
Of course, the whole point of Jeff’s post was to describe an alternative to using a function-based index by using UPPER() check constraints that the CBO can utilize.
The world becomes clearer every day.
Compound Primary Keys Must Die
It turns out that a lot of the headaches of the past few weeks at work are all due to an incredibly shitty primary key on a large warehouse table. Before I get into specifics there, I’d like to formally proclaim my allegiance to the use of surrogate keys and my hatred for compound keys. Many of the Oracle “gurus” (including Tom Kyte) agree as well. You can use the Google to find reasons to use them. I’ll list smaller indexes on both the table in question AND tables with foreign keys as a big reason off the top.
Now, on this table at work, which I’ll call FOO, there is currently a 10-field compound primary key. Yeah, that’s right. I take some shame in having this in my database, although the design was inherited, and I’m just now feeling comfortable and confident enough to start suggesting radical changes to our developers. So this primary key begins with columns LOC, CUST_NO, YEAR, and MONTH. Then it includes 5 fields that are rarely, if ever queried (I turned on FGA on this table and verified that). Then the last field is the relatively new FOO_SEQ_NO, which is unique for a LOC, CUST_NO, YEAR and MONTH. Virtually every query against this table uses LOC, CUST_NO, YEAR and MONTH. FYI, this table is also range partitioned on YEAR and MONTH.
The problem is that after we load a new large batch of records from a previously unloaded market into the current partition, the optimizer no longer uses the big PK index, instead choosing to use a second index which doesn’t even use CUST_NO. This causes a LOT of I/O and painfully slow query times, until we are able to gather_table_stats() on it, which can take up to 2 hours.
What became painfully obvious after I turned on fine-grained auditing, is that 95% of the queries only use LOC, CUST_NO, YEAR and MONTH, so the massively compound primary key index is almost 50% useless. On a development instance, I dropped the PK constraint and index and built a new non-unique index on just LOC, CUST_NO, YEAR and MONTH. I do plan to add a unique constraint on the former PK fields just to keep things sane in case some part of the application relied on that. The sample query ran fine before and after the dataload, using the new index right away without having to gather stats.
Now, the case of the FOO table was an EXTREME of an ugly compound key/index, and the problem wasn’t so much the compoundedness of it as much as the index not being reflective of what user queries needed. But I still loathe compound keys.
Reverse Key Indexes
So I started reading the Oracle 10g DBA Handbook because I’ve deluded myself into thinking that I’ll just memorize everything in there. The first chapter seemed to cover very basic stuff that even I had a grasp on, but it didn’t take long for me to learn something new. 18 pages in, I read of the magic of reverse key indexes. This is a pretty fundamental index type that I had somehow never heard about before this, so I thought I’d share it here.
A reverse key index does just that, stores the indexed values in reverse, so an id value of 123456 would be stored as 654321. One immediate advantage in doing so is reducing contention for blocks by distributing inserts across all leaf keys of the index. For example, inserting orders 1234, 1235, 1236 and 1237 would normally probably all go into the same block (if space is available), and the writers would wait for each other to finish with it. With a reverse key index, they would be spread to 4321, 5321, 6321 and 7321, which are not close together and probably not on the same node.
For a complete analysis, the link to Jonathan Lewis’ FAQ above is quality stuff.



