Moving [Back] to Blogger
Hello folks. Just a note to let you know that, on the heels of Google’s recent Google Apps integrations, I’ve decided to go back to blogger and make my life a bit simpler.
die Seilerwerks will now be found at http://www.seiler.us.
Beware the /var/tmp/.oracle Hidden Directory!
NOTE: This post originally appeared on the Pythian blog.
A few months ago, we had a test instance complaining that it couldn’t write to ASM. This was an 11.1.0.7 single (non-RAC) instance on Oracle Enterprise Linux 5, using ASM for the storage. We first saw these errors in the alert log:
ORA-15032: not all alterations performed ORA-29702: error occurred in Cluster Group Service operation ORA-29702: error occurred in Cluster Group Service operation ERROR: error ORA-15032 caught in ASM I/O path
Uh-oh, that doesn’t look good. So I log into the ASM instance and try to see if the disks are OK:
SQL> select path, mount_status from v$asm_disk;
select path, mount_status from v$asm_disk
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-29702: error occurred in Cluster Group Service operation
ORA-29702: error occurred in Cluster Group Service operation
Installing Oracle 11gR2 Enterprise Edition on Ubuntu 10.04 (Lucid Lynx)
Note: This post originally appeared on The Pythian Group blog.
I recently reformatted my laptop with the latest Ubuntu LTS release, 10.04, aka Lucid Lynx. Since I like to have a native client installation as well as a portable sandbox server, I decided to install the latest version of Oracle EE, 11.2.0.1.
Rather than re-invent the wheel, I’m going to direct you to the previous Oracle-on-Ubuntu post by my colleague Augusto Bott. Many of the directions there hold true here (even with 32-bit vs 64-bit), with a few exceptions.
Read the rest of this entry »
Applying Oracle 11.2 April 2010 PSU for Single-Instance ASM and DBMS
Note: This post originally appeared at The Pythian Group blog.
When news of the April 2010 PSU for Oracle 11.2 came out, I was excited to see it, since it marked the first non-one-off patch release for the 11.2 database software. I happened to have an 11gR2 test system running on 11gR2 ASM via standalone Grid Infrastructure. I applied PSU 9352237 to the DBMS home and fired it up, only to see the folly of my ways when any ASM file operations like disk resizing (or auto-extending) failed with ORA-1653. This was due to the DBMS component now having a higher version number than the ASM component, which ASM does not allow. The Grid Infrastucture PSU would need to be applied to bring the ASM component up to snuff, but that patch (9343627) was, at that time, only “announced” with no ETA. Alas, the patch was rolled back and we continued testing without it.
Then this week I check again and saw that PSU 9343627 was released and gave it a whirl. I was a little confused when the README seemed to contain a lot of instructions that always assumed it to be on a clustered, RAC install. My setup was a single-instance Grid Infrastructure installation just to provide ASM. I soon met problem upon problem when going through first this setup step: Read the rest of this entry »
A Drupal Backup Script
I maintain a drupal codebase that hosts multiple sites. I’ve been shamefully lax in getting regular backups of those files and databases, until today. Here is a pretty basic bash script that will create a bzip2 archive of the drupal codebase (including site-specific dirs), and then create mysqldump exports of each site’s database and gzip those files. This particular script requires that the drupal install directory be named “drupal”, but you can change this easily enough to suit your needs.
This does require a ~/.drupalsites file that contains the needed database login info (suggest chmod 600). Like I said, nothing too clever.
#!/bin/bash
DATESTAMP=`date +%Y%m%d`
SITESFILE=${HOME}/.drupalsites
# SITESFILE contents are in the following format:
# SITENAME:DBNAME:DBUSER:DBPASSWD
DRUPALDIR=/path/to/drupal
BACKUPDIR=${HOME}/backups/${DATESTAMP}
mkdir -p ${BACKUPDIR}
# Backup the drupal codebase
echo -n "Backing up ${DRUPALDIR} ... "
cd ${DRUPALDIR}/../
tar -cjpf ${BACKUPDIR}/drupal_${DATESTAMP}.tar.bz2 drupal
echo "Done."
cd ${BACKUPDIR}
# Backup MySQL Databases
cat $SITESFILE | while read line; do
#echo $line
line=(${line//:/ })
echo -n "Backing up MySQL db ${line[1]} ... "
DUMPFILE=${line[0]}_${DATESTAMP}.sql
mysqldump -u ${line[2]} -p${line[3]} ${line[1]} > ${DUMPFILE}
gzip ${DUMPFILE}
echo "Done."
done
echo "Backup completed, all files are in ${BACKUPDIR}."
This script runs fine from cron, I have it scheduled for a weekly run. I then plan to rsync this to my home server for an offsite copy, even though the hosting service provides backups as well.
HOWTO: Oracle Cross-Platform Migration with Minimal Downtime
Originally posted at The Pythian Group blog.
I recently performed a migration from Oracle 10gR2 on Solaris to the same version on Linux, immediately followed by an upgrade to 11g. Both platforms were x86-64. Migrating to Linux also included migrating to ASM, whereas we had been using ZFS to hold the datafiles on Solaris. Restoring files into ASM meant we would have to use RMAN (which we would probably choose to use anyway).
As with many databases, the client wanted minimal downtime. It was obvious to us that the most time-consuming operation would be the restore and recovery into the new instance. We were basically doing a restore and recovery from production backups and archived redo logs. It quickly dawned on me that we could start this operation well before the scheduled cutover time and downtime window, chopping at least six hours from the downtime window. The client would only need to keep the new instance in mount mode after the initial restore/recovery finished, periodically re-catalog the source instance’s FRA (which was mounted via NFS), and then re-run the recover database command in RMAN. Once the time comes to cutover, simply archivelog current the original instance and shutdown immediate. Then open the new instance with the RESETLOGS option, and voila! Migration complete!
I’ll try to recreate a simple example here. Read the rest of this entry »
Moving Oracle Datafiles to a ZFS Filesystem with the Correct Recordsize
Originally posted on The Pythian Group blog.
Full credit for this tale should go to my colleague Christo Kutrovsky for the inspiration and basic plan involved.
We recently migrated a large database from Solaris SPARC to Solaris x86-64. All seemed to go well with the migration, but in the next few weeks, we noticed some I/O issues cropping up. Some research led us to find that the ZFS filesystem used to hold the datafiles was killing us on I/O. The default “recordsize” setting for ZFS was 128k.
$ /usr/sbin/zfs get recordsize zfs-data NAME PROPERTY VALUE SOURCE zfs-data recordsize 128K default
An Oracle database typically uses 8k for the block size, but in this case it was 16k. We saw basically the same thing that Neelakanth Nadgir described in his blog post, Databases and ZFS:
With ZFS, not only was the throughput much lower, but we used more [than] twice the amount of CPU per transaction, and we are doing 2x times the IO. The disks are also more heavily utilized. We noticed that we were not only reading in more data, but we were also doing more IO operations [than] what is needed.
The fix is to set the ZFS recordsize for a datafile filesystem to match the Oracle instance’s db_block_size. We also read in the ZFS Best Practices Guide that redo logs should be in a separate filesystem with the default ZFS recordsize of 128k. We already had them separate, so we just needed to get our datafiles on a ZFS filesystem with a 16k recordsize.
Read the rest of this entry »
Setting up Network ACLs in Oracle 11g… For Dummies
Originally posted on The Pythian Group blog.
Having recently performed a test upgrade for a client from Oracle RDBMS 10g to 11g, I can tell you that one of the big changes that will likely require action on your part as DBA is the new fine-grained access control for the packages UTL_SMTP, UTL_TCP, UTL_MAIL, UTL_HTTP and UTL_INADDR. Part of the Oracle 11g pre-upgrade tool will notify you of users that will require new privileges.
Of course, Oracle’s post-upgrade network ACL setup documentation is much more confusing than it needs to be, at least for simple minds like me. A colleague stepped forward with a simple set of commands for a basic setup that even the tired and stressed can understand.
I’ll share that here, with some basic explanation:
BEGIN
-- Create the new ACL, naming it "netacl.xml", with a description.
-- Also, provide one starter privilege, granting user FOO
-- the privilege to connect.
DBMS_NETWORK_ACL_ADMIN.CREATE_ACL('netacl.xml',
'Allow usage to the UTL network packages', 'FOO', TRUE, 'connect');
-- Now grant privilege to resolve DNS names for FOO,
-- and then grant connect and resolve to user BAR
DBMS_NETWORK_ACL_ADMIN.ADD_PRIVILEGE('netacl.xml' ,'FOO', TRUE, 'resolve');
DBMS_NETWORK_ACL_ADMIN.ADD_PRIVILEGE('netacl.xml' ,'BAR', TRUE, 'connect');
DBMS_NETWORK_ACL_ADMIN.ADD_PRIVILEGE('netacl.xml' ,'BAR', TRUE, 'resolve');
-- Specify which hosts this ACL applies to,
-- for simplicity, we're saying all (*)
-- You might want to specify certain hosts to lock this down.
DBMS_NETWORK_ACL_ADMIN.ASSIGN_ACL('netacl.xml','*');
END;
/
As you can see, this example will let the
FOO and BAR database users connect and resolve to any host. The ASSIGN_ACL section in the full package documentation (see link below) details how this can be used to lock down a user’s ability to make outside connections.
Of course, nothing beats reading the Oracle 11g DBMS_NETWORK_ACL_ADMIN documentation, where you can see some examples of stricter ACL setups.
Turn Off db_cache_advice To Avoid Latch Contention Bugs
Originally posted on The Pythian Group blog.
A couple of weeks ago, we noticed some timeouts in some of our standard Oracle RDBMS health check scripts on a new instance. I had just migrated this instance to bigger, better, badder hardware and so it had been given more SGA to use, namely a bigger buffer cache. The software version was still Oracle 10.2.0.2, as we wanted to introduce as few variables as possible (we were already moving to a new platform with an endian change).
At first the timeouts were infrequent, but over the course of a week started to grow in frequencey until the point where none of the checks were finishing in the allowed timeframe. We ran an AWR report, and tucked far down in the “Latch Activity” section, a colleague noticed this:
Pct Avg Wait Pct
Get Get Slps Time NoWait NoWait
Latch Name Requests Miss /Miss (s) Requests Miss
------------------------ -------------- ------ ------ ------ ------------ ------
...
simulator lru latch 10,032,617 3.3 0.7 44950 336,837 0.3
...
Latch Activity DB/Inst: FOO/foo Snaps: 156-157
-> "Get Requests", "Pct Get Miss" and "Avg Slps/Miss" are statistics for
willing-to-wait latch get requests
-> "NoWait Requests", "Pct NoWait Miss" are for no-wait latch get requests
-> "Pct Misses" for both should be very close to 0.0
Pct Avg Wait Pct
Get Get Slps Time NoWait NoWait
Latch Name Requests Miss /Miss (s) Requests Miss
------------------------ -------------- ------ ------ ------ ------------ ------
transaction branch alloc 112,412 0.0 0.0 0 0 N/A
undo global data 466,321 0.0 0.0 0 0 N/A
user lock 7,440 0.8 0.4 1 0 N/A
-------------------------------------------------------------
The “simulator lru latch” event brought us to MetaLink note 5918642.8 and bug 5918642. Affecting 10g and 11g prior to 10.2.0.4 and 11.1.0.7, respectively. The bug is with the database buffer cache advisor, controlled by the parameter db_cache_advice, which defaults to ON (depending on statistics_level). The note simply states:
High simulator lru latch contention can occur when db_cache_advice is set to ON if there is a large buffer cache.
We simply set db_cache_advice to OFF (thankfully it is a dynamic parameter), and pretty quickly our checks were running just fine.
My suggestion is to simply turn this off unless you are actively using the cache advisor to tune an instance. Once you are done tuning, and are no longer using the advisor, turn it off.
NOTE: As Mladen Gogola pointed out in the comments, turning this off will cause problems if you are using automatic memory management (i.e. sga_target > 0). Re-pasting his post here:
The problem with that advice is that it will prevent automatic memory management from resizing the buffer cache and the instance will end up with a huge, mostly empty, shared pool and default buffer cache. Automatic memory management is biased toward shared pool even with the cache dvice turned on, without it, buffer cache will be reduced to the minimum size, usually only 64MB. If you disable cache advice, I would also recommend disabling the automatic memory management and configuring SGA manually.
Sending Timezone-Aware Email with UTL_SMTP
Originally posted at The Pythian Group blog.
I’m back again with another in what I hope will be a long line of “Quick Tips for Newbies” series.
At The Pythian Group, we have employees all over the globe, from our headquarters in Ottawa to regional offices in Boston, Prague, India and Sydney, and a few scattered remote workers in Seattle, Paris, Kiev, Brazil, South Africa and Wisconsin, among other places. In other words, we are spread across multiple timezones, and since it wasn’t too long ago that everyone was in Ottawa, this is something that still presents little quirks.
One such quirk involved email generated by one of our internal Oracle instances—via a stored procedure that used UTL_SMTP to send the messages—did not have timezone information in the “Date” email header. As a result, they would be stamped with the hour in Eastern timezone (Ottawa time), but the mail clients would think that hour was local. Depending on where you are relative to Ottawa, this could be many hours in the past or future. Of course, this wouldn’t be noticed if you were in Ottawa or even Boston, both in Eastern. For the rest, it was at the very least, an annoyance—but one that is easily fixed.
Looking at the PL/SQL stored procedure that we used to generate email messages, I saw that the “Date” header was being built with this code:
date_hdr := 'Date: '||to_char(sysdate,'dd Mon yy hh24:mi:ss');
The fix is almost trivial—just use SYSTIMESTAMP instead of SYSDATE, and include the timezone in the TO_CHAR function:
date_hdr := 'Date: '||to_char(systimestamp,'dd Mon yy hh24:mi:ss tzhtzm');
Voila! Emails now had a full Date header. And there was much rejoicing from around the world.
Here’s a quick query to highlight the difference:
SQL> select to_char(sysdate,'dd Mon yy hh24:mi:ss') from dual; TO_CHAR(SYSDATE,'D ------------------ 01 Dec 08 18:50:02 SQL> select to_char(systimestamp,'dd Mon yy hh24:mi:ss tzhtzm') from dual; TO_CHAR(SYSTIMESTAMP,'DD ------------------------ 01 Dec 08 18:50:10 -0500
Even if you aren’t sending email to all the ends of the Earth, it won’t hurt to make your messages timezone-aware. I’m sure it will save some confusion and frustration down the line.
Note: I discovered the fix via this blog post, which seems to be invite-only at the time of this writing.



