Saturday, November 28, 2009

Distributed Locks in Oracle & ORA-01591

Distributed locks are created in Oracle by transactions involving a two-phase commit, also called XA or distributed transactions. It is important to note that it is NOT necessary to have two databases to have a distributed transaction. For example, a WebLogic Server datasource can be configured as an XA datasource when a transaction spans JMS as well as the database; the database portion of that transaction is then classified as a distributed transaction. Sometimes these transactions create locks in the database, called distributed locks, which can survive a restart of the database. The effect of a distributed lock is the error below in some other part of the application that tries to access the locked data.

ORA-01591: lock held by in-doubt distributed transaction 1.2.456789

The above error means that a distributed lock is present in the database and is locking the data requested by the end user. A transaction has progressed to some state in this database and to some stage in the second resource participating in the distributed transaction. In most cases, Oracle's background processes are able to resolve the condition. When they cannot, the result is an in-doubt distributed lock that must be resolved manually. Unless the transaction is manually committed or rolled back, Oracle will not allow any other transaction, including reads, to happen on the data held by the lock. Reads are also prevented because Oracle does not know which version of the data should be shown to the end user.

By querying the pending_trans$ table (under the SYS schema), the details of the distributed lock can be obtained. The Local Tran Id present in the table should be used to force a commit or rollback of the transaction. The data in pending_trans$ should NOT be deleted in an attempt to release the locks; deleting it will not release them. The table is only a means to show the DBA which transactions are to be manually cleared.
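For reference, a minimal query sketch: DBA_2PC_PENDING is the documented view built over pending_trans$ and shows the same in-doubt transactions.

-- list in-doubt distributed transactions awaiting manual resolution
SELECT local_tran_id, global_tran_id, state, fail_time
FROM dba_2pc_pending;

The syntax for forcing a commit or rollback using the Local Tran Id is shown below.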

SQL> COMMIT FORCE '1.2.456789';

Or

SQL> ROLLBACK FORCE '1.2.456789';

Once the transaction is committed or rolled back, the lock is cleared and the data can be accessed. The DBA should decide whether the transaction should be committed or rolled back based on the data in the participating datasources. The columns in the pending_trans$ table can also help decide which command to use (I will cover the details in the next blog; place a comment if you need this info and I would be more than happy to add it on priority).

In case the data has been deleted from the pending_trans$ table, some extra steps are required to clear the locks. Again, I will cover this in the next blog.

Thursday, August 20, 2009

Uncommitted Read in Oracle

There are cases where we would like to find out how many rows have been processed by a long-running SQL query. Many times, while a long-running query was still executing, I wanted to know what percentage was complete in order to decide whether to terminate it or let it drag on to completion. There are a few ways to determine the completion status or the number of rows processed by long-running queries.

From TOAD
In TOAD, open the Session Browser (an icon in the top right corner with a lens symbol). Select the session that is executing the query; it can be identified using the userid and the tool from which the query was executed. Navigate to the Long Ops tab and it will list the percentage completion of the query.
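If you don't have TOAD handy, the same information comes from the V$SESSION_LONGOPS view, which is what such tools read from. A minimal sketch (Oracle populates this view only for operations that run longer than about six seconds):

-- percentage completion of currently running long operations
SELECT sid, opname, target, sofar, totalwork,
       ROUND(sofar / totalwork * 100, 2) pct_complete
FROM v$session_longops
WHERE totalwork > 0
AND sofar <> totalwork;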

Using V$SQLAREA
This approach should be used when the long-running query is a select query and no writes are happening through it. Here, we use the query below to find the number of rows processed for the SQL address and hash value.

SELECT rows_processed, sql_text
FROM v$sqlarea
WHERE (address, hash_value) IN
      (SELECT sql_address, sql_hash_value
       FROM v$session
       WHERE sid = &1);
Note: The sid can be obtained by querying the v$session view using the details at hand, like login userid, server name, logon timestamp etc.

More aggregate-level details about the query can be obtained from the v$sql view. The SQL_ID of the statement currently being executed can be obtained from v$session.

Select * from v$sql where sql_id = '&1'; (obtain the sql_id from the v$session view for the corresponding session)

Using UNDO Segments

This approach can be applied when the SQL query is performing a DML operation. The changed data gets written to the UNDO segments, and the query below reads the UNDO usage to show progress.

select a.sid,
       a.username,
       b.xidusn undoseg,
       b.used_urec totrecords,
       b.used_ublk totblocks,
       to_char(sysdate, 'mmdd hh24:mi:ss') currtime
from v$session a,
     v$transaction b
where a.saddr = b.ses_addr
  and a.sid = &1;

Wednesday, August 19, 2009

Sequence NoCache Significance

It's a common habit to use sequence values for the primary key of a table. In this approach, you create a sequence and then access it as mysequence.nextval in the insert statement. In this blog, I'm trying to explain the importance of the CACHE value in the sequence create command.
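As a quick illustration of the usage (the employee table and its columns here are hypothetical):

-- typical pattern: sequence supplies the primary key value
INSERT INTO employee (empid, name)
VALUES (my_sequence_pk.NEXTVAL, 'Myname');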

A sequence can be created using the two commands below, one without a cache and another with a cache.

CREATE SEQUENCE MY_SEQUENCE_PK
  INCREMENT BY 1
  START WITH 1
  NOCYCLE
  NOCACHE
  NOORDER;

CREATE SEQUENCE MY_SEQUENCE_PK
  INCREMENT BY 1
  START WITH 1
  NOCYCLE
  CACHE 1000
  NOORDER;


How does CACHE work:

When sequence.nextval is invoked on a sequence created with NOCACHE, an update query is fired to record the current value of the sequence in the system tables, apart from the select query. If the sequence is accessed 1000 times, the update query is fired 1000 times as well.

When sequence.nextval is invoked on a sequence created with CACHE 1000, the update query is fired only once and sets the recorded value to a number incremented by 1000. The pre-allocated sequence values are held in memory (the shared pool), and insert queries can use those values directly. The gain is that we save 999 out of every 1000 update queries while inserting data into the table.
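You can watch this behavior from the data dictionary. With NOCACHE, LAST_NUMBER advances by 1 on every nextval; with CACHE 1000 it jumps by 1000 each time the cache is replenished. A sketch against the standard USER_SEQUENCES view:

-- LAST_NUMBER reflects the value recorded in the dictionary, not nextval itself
SELECT sequence_name, cache_size, last_number
FROM user_sequences
WHERE sequence_name = 'MY_SEQUENCE_PK';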

Where this should be used:

The CACHE value should be used, for sure, in data migration cases where a large number of records are inserted into the database.

Where this should not be used:

  1. As a rule of thumb, you should use the cache pretty much everywhere; the two cases below are the exceptions.
  2. If the sequence is seldom used, a cache value need not be provided. For example, the primary key of an Airplane Model table: new models don't get added daily, so access to this sequence would be very limited.
  3. If you need almost continuous (gap-free) values from the sequence on a multi-user system, then the cache should probably not be used, since cached values are lost on an instance restart and leave gaps. Be aware, however, that this costs performance in every statement that makes a nextval call.

Friday, June 5, 2009

Oracle Truncate Table - Storage Clause Significance

This week I performed a truncate on a table in an Oracle database to recover some space in my tablespace. Even after the truncate, the space was not released back to the tablespace. That is when I realized that the storage clause, both on the table and in the truncate command, is important.

A storage clause can be specified as part of the truncate command (DROP STORAGE is the default if none is given). The storage clause determines what happens to the space released by the rows that are removed. When a truncate is issued on a table, Oracle deallocates all space used by the removed rows except that specified by the MINEXTENTS storage parameter (be aware that the deallocation depends on the MINEXTENTS clause, and be wary when the table is large and was imported).

The options are

Drop Storage – All space used by the object is released except for the space allocated by the MINEXTENTS parameter. This space can then be used by any other object, or by the same object from which the data was truncated. However, if MINEXTENTS (along with the INITIAL value) is large enough, this space is NOT released even after the truncate. For example, see the storage parameters of a table that was imported into my database:

STORAGE (
INITIAL 2432M
MINEXTENTS 1
MAXEXTENTS UNLIMITED
PCTINCREASE 0
BUFFER_POOL DEFAULT
)

If the table with the above definition is truncated, the 2432M is not released back to the tablespace for other objects to use. However, if you perform ALTER TABLE ... MOVE TABLESPACE x or DROP TABLE, the space is released back to the tablespace.

Reuse Storage – The space is not released back to the tablespace; instead it stays reserved for the same table. Hence if the table occupies 50M, a truncate will not release the 50M back to the tablespace; the space remains reserved for the same table. This option should be used when you are sure you are going to reload similar data into the same table.
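For reference, the two variants look like this (mytable is a placeholder name; omitting the clause is equivalent to DROP STORAGE):

TRUNCATE TABLE mytable DROP STORAGE;
TRUNCATE TABLE mytable REUSE STORAGE;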


Monday, June 1, 2009

Oracle ROWID - its Meaning

The ROWID pseudo-column provides the address of a row of a table, and hence is the fastest way a row can be accessed in Oracle. When you look at the execution plan of a query that uses an index, it shows "TABLE ACCESS BY INDEX ROWID". This means the index scan has produced a set of ROWIDs, and the row data is then accessed through the ROWID value of each row.

The ROWID value contains the following information

The data object number of the object – For example, if the row belongs to the table "EMPLOYEE", the data object number can be obtained from the query "select data_object_id from dba_objects where object_name='EMPLOYEE'".

The data block number in the data file that holds the row – The block number is relative to the data file that holds the block, not to the tablespace.

Position of the row in the data block – This is the position of the row inside the block. If the row is the first one to get inserted in the block, the position would be zero.

The data file in the database that holds the row – This points to the data file that holds the block. The number returned is the relative file number, which in most databases matches the File# in the V$DATAFILE view (strictly, it maps to RELATIVE_FNO in DBA_DATA_FILES).

The ROWID value, when selected directly in a query, is not displayed in an understandable format. We need to use the dbms_rowid package to retrieve the component values.

select rowid,
       dbms_rowid.rowid_row_number(rowid) rowposition,
       dbms_rowid.rowid_object(rowid) objectnumber,
       dbms_rowid.rowid_relative_fno(rowid) datafilenum,
       dbms_rowid.rowid_block_number(rowid) blocknumber,
       name
from EMPLOYEE;

ROWID               ROWPOSITION  OBJECTNUMBER  DATAFILENUM  BLOCKNUMBER  NAME
AAAcv2AAOAAANHIAAA  0            117750        14           53704        Myname

The above result, for the row with the name "Myname", means:

The row is placed in the data file identified by File# 14.

The row is part of the object with data object number 117750.

The row holds the first position (position 0) in the block.

The row is placed in block 53704 of that data file.

Monday, May 25, 2009

Oracle KEEP Pool - How Does it help?

When a session requests data, Oracle first tries to find it in the buffer pool. If it's not present, Oracle fetches the data from disk (a physical read) and places it in the buffer pool. Once the data is in the buffer pool, different sessions can access it. However, the entire database cannot be placed in the buffer pool (unless you have abundant memory, and even then it is not advisable). So Oracle runs a cleanup mechanism to keep the buffer pool usage controlled and within the set size limits.

Suppose you are sure that the data of specific objects will always be accessed by the application, like a User table. When the first session accesses this data, the data is moved to the buffer pool. If, at a later point, Oracle decides for some reason to evict this data from the buffer pool to make space for other incoming data, the next access of the User table data will be slower as it incurs a physical read. In such cases, we use a feature called the KEEP pool.

The KEEP pool is also a buffer pool, but it is specifically reserved for objects that are assigned to it, either at object creation or later via ALTER. When an object is assigned to the KEEP pool, its data is pulled into the KEEP pool rather than the DEFAULT buffer pool. This way, the data is not evicted even when the cleanup mechanism of the default buffer pool kicks in. The KEEP pool therefore gives a higher probability for the data to stay in the buffer, since all objects go to the default buffer pool by default and not to the KEEP pool. The syntax for attaching an object to the KEEP pool is shown below.

ALTER TABLE MYOBJECT storage (buffer_pool keep);

For example, if only 20 objects (tables, indexes) are defined to be in the KEEP pool, only the data from these 20 objects will be present in the KEEP pool. Even if the default buffer pool evicts data via its LRU algorithm for any reason, the data in the KEEP pool is not impacted.

In practice, the small tables in the application are usually the candidates for the KEEP pool. Oracle tends to prefer full table scans for smaller tables, so the chances of their data getting evicted from the default pool are high. Attaching these small tables to the KEEP pool prevents that eviction.
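To check which objects are currently assigned to the KEEP pool, the BUFFER_POOL column of DBA_SEGMENTS can be queried; a minimal sketch:

-- list all segments assigned to the KEEP buffer pool
SELECT owner, segment_name, segment_type
FROM dba_segments
WHERE buffer_pool = 'KEEP';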

When Automatic Shared Memory Management (SGA_TARGET) is turned on in Oracle 10g, keep in mind that the KEEP pool is not auto-tuned; it must be sized manually with DB_KEEP_CACHE_SIZE. (On a side note, the same holds for the RECYCLE pool; I will cover the RECYCLE pool in later blogs.)

Trip to Mackinac Island



During the weekend, I made a quick trip to Mackinac Island. We spent around 6 hours on the island and a night in St. Ignace (the city on the mainland). Well, I'm not quite sure how I spent the 6 hours on the island :-) .

The Mackinac Bridge connects the Lower Peninsula and Upper Peninsula of the state of Michigan. The bridge is the 12th longest suspension bridge in the world (wiki). The view of the bridge from the road and from the water (while on the ferry) is spectacular. The ferry took us from the mainland to Mackinac Island. On the island, just like at other tourist attractions, the market is impressive: lots of chocolate fudge shops, restaurants, bike rentals and tour operators.
We first visited the butterfly museum. The museum had two rooms, one with many butterflies flying around and the other with a few glass cases of insects. Then we visited the museum that describes the history of the Mackinac Bridge and the island. There were other places that were good but that we did not visit, including Mackinac Fort (we saw it from outside and did not feel it was worth climbing the stairs) and Arch Rock (again a walk and then a climb up stairs).
The most impressive part of the journey was the ferry to the island and back. The speed jets give you a view of the Mackinac Bridge and also of the beautiful lake (one of the five Great Lakes).

Saturday, May 16, 2009

High CPU usage of database

In my experience, an Oracle database is capable of taking a good amount of load and can utilize the hardware very effectively for scaling. However, in some cases you might find that the CPU usage by the database is considerably high. In my experience, these factors can cause high CPU utilization:
  • Queries performing excessive work to achieve the desired results: scanning more data blocks, more nested loops etc.
  • Table setup can be a cause of concern if there is a large amount of row chaining etc.
  • A large amount of housekeeping work on the database, like archival sweeps, RMAN processes etc.
  • The application keeps opening new physical connections to the database instead of using the connection pool. Even when the application server configures a connection pool, a wrongly set up pool can open new physical connections for every transaction.
  • XA transactions have a higher impact on the database, as it needs to perform housekeeping activities to manage the transaction stages like prepare, commit etc.
  • When there are few CPUs and many processes on the system, usage goes up due to context switching.
  • As a practical matter, CPU utilization figures stop being meaningful and the system starts behaving erratically once utilization stays above roughly 85%.

In some cases, correcting the above symptoms will fix the high CPU usage. In many other cases, however, the application may genuinely need that much processing power to support the database operations. A first step is to find out which sessions are consuming the CPU.
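A minimal sketch using the standard session statistics views (the 'CPU used by this session' statistic is measured in centiseconds):

-- rank connected sessions by accumulated CPU time
SELECT s.sid, s.username, st.value cpu_centiseconds
FROM v$session s, v$sesstat st, v$statname sn
WHERE st.sid = s.sid
AND st.statistic# = sn.statistic#
AND sn.name = 'CPU used by this session'
ORDER BY st.value DESC;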

Friday, May 15, 2009

Article on Cloud Computing

My first article on the web got published a few weeks back. The article can be found at

http://ajax.sys-con.com/node/938581

The article concentrates on the online software distribution using the cloud.

Hope you enjoy reading it.

Monday, April 20, 2009

Buffer Busy Waits in Oracle

Last week, I was working with a sample program to test the performance of an Oracle query under load. The program spawns multiple threads, and each thread executes the same select statement against the database. When I tried the sample program with just one thread, the select statement responded in 20 milliseconds. When the number of threads was increased to 60, the response time increased to 1 second per query execution. The reason: buffer busy waits.

When data is read by a query, it is first read from the physical disk into the buffer pool. On any subsequent execution of the same query accessing the same data, Oracle reads the data from the buffer pool instead of from the physical disk. When 60 threads tried to execute the same query, all of them were trying to access the same set of blocks at the same time, creating HOT BLOCKS. Whenever Oracle accesses a block in the buffer pool, the process obtains a latch on the memory structure pointing to the block. For the next process (session) to access the same block, it needs to acquire that latch first. Buffer busy waits occur while a process or session waits to get access to the block in the buffer. When all 60 sessions tried to execute the same query, the response time was higher because of these wait events happening in the background.

Apart from this, while one session reads the block from the disk to the buffer pool, other sessions trying to access the same block will need to wait till the buffer population activity by the first session is over. This is yet another cause for buffer busy waits.
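To confirm the diagnosis, the system-wide wait statistics can be checked; a sketch against the standard V$SYSTEM_EVENT view (TIME_WAITED is in centiseconds):

-- cumulative buffer busy waits since instance startup
SELECT event, total_waits, time_waited
FROM v$system_event
WHERE event = 'buffer busy waits';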

Removing buffer busy waits is not simple or straightforward. The only real option is to remove the hot blocks, i.e. the blocks causing the contention. One possible fix is to reduce the waits by tuning the queries so that they execute faster and don't cause noticeable waits. Another option is to avoid executing the query against the same blocks repeatedly; this can be done by building an application-level cache so that the data is not requested from the database as frequently.

Saturday, April 18, 2009

Finding bind parameter values in queries in Oracle

When we write SQL queries inside an application or use an ORM product, almost all queries are executed with bind parameters rather than with the actual values in the query string itself. This approach has many benefits, but it has certain limitations when debugging. If application logging is turned off, it's difficult to find the bind parameter values passed to the query, and hence difficult to find what the query plan was and what data it returned. (Of course a trace would help, but there are simpler ways.) With Oracle 10g, we can use the system views to gather this information.

When a query gets executed, an entry is placed in the V$SQL view. For the query whose bind parameters you need, find the SQL_ID from V$SQL. For example, suppose the query I'm trying to find is "select * from employee where empno = :1". The steps to follow are given below.

  1. Identify the SQL ID by querying the V$SQL table

SELECT SQL_ID FROM V$SQL WHERE SQL_FULLTEXT LIKE '%employee%' ORDER BY LAST_ACTIVE_TIME;

The above select may return many records, but it's easy to find your SQL based on the timestamp. You can also get the SQL_ID from an AWR report in case you use it.

  2. Based on the SQL_ID value, query the V$SQL_BIND_CAPTURE system view

SELECT POSITION, DATATYPE,VALUE_STRING, VALUE_ANYDATA FROM V$SQL_BIND_CAPTURE WHERE SQL_ID='anysqlidfromvsql'

From the above query, the positions of the bind parameters and their values (value_string) can be obtained. The original query with the bind positions can be obtained from the V$SQL view; the sql_fulltext column has the full SQL query text.
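The two steps can also be combined into one query; a sketch, reusing the placeholder SQL_ID from above:

-- bind positions and values alongside the full statement text
SELECT b.position, b.value_string, s.sql_fulltext
FROM v$sql s, v$sql_bind_capture b
WHERE b.sql_id = s.sql_id
AND b.child_number = s.child_number
AND s.sql_id = 'anysqlidfromvsql'
ORDER BY b.position;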

Now you have the actual query and the bind parameters too. Enjoy!

Thursday, April 9, 2009

Clustering Factor for Indexes in Oracle

Indexes are a good way to increase the performance of a query; they avoid a table scan and directly lead us to the data we need. However, there are cases when the CBO will not consider the indexes associated with a table. The usual scenario I see is that the index is not used because of poor selectivity or cardinality. There is another factor, called the Clustering Factor, which also determines whether the index gets used by a query. If the clustering factor is close to the number of rows in the table, the clustering is poor and the CBO might not use the index.

To solve the problem, the best way is to reload the table and rebuild the index, making sure during the reload that the data is ordered by the column referenced by the index. For example, for the Employee table, the index XPKEMP is on the EMPID column. The steps to follow are:

create table EMPLOYEE_BKP as select * from EMPLOYEE order by empid asc;

drop table EMPLOYEE;

Create table EMPLOYEE as select * from EMPLOYEE_BKP;

Create index XPKEMP on EMPLOYEE(empid);

Now the index is freshly created and the clustering factor will be good (low). This can be verified by

Select clustering_factor from all_indexes where index_name='XPKEMP';

The value should be close to the number of blocks occupied by the table. Now that the index has a good clustering factor, the chances of the CBO using it are relatively high.
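To compare the clustering factor against the table's row and block counts in one query, the dictionary views can be joined; a sketch (statistics must be gathered for BLOCKS and NUM_ROWS to be populated):

-- clustering factor near BLOCKS is good; near NUM_ROWS is poor
SELECT i.index_name, i.clustering_factor, t.blocks, t.num_rows
FROM user_indexes i, user_tables t
WHERE i.table_name = t.table_name
AND i.index_name = 'XPKEMP';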

Wednesday, March 11, 2009

Unused Indexes in Oracle

Developers keep adding indexes to the database to improve performance, and later find it difficult to identify and remove the ones that are unused. The index monitoring feature of Oracle can be used to identify the unused indexes so they can be removed.

alter index indexname monitoring usage;

The above command marks the index for monitoring and adds a record to the V$OBJECT_USAGE view. The columns in this view are

  1. Index Name
  2. Table Name
  3. Used (this says if the index is used after the monitoring was enabled)
  4. Monitoring (this says that monitoring is enabled or not)
  5. Start_monitoring (start time of monitoring)
  6. End_monitoring (end time of monitoring)

When the index gets used, the USED flag against the index name is set to YES. So the best approach is to enable monitoring on all indexes in the database, including the primary keys (the statements can be generated from the dictionary, as sketched below). After this step, exercise the application through its screens, trying to cover all functionality so that the maximum number of indexes is used.
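A sketch against USER_INDEXES that generates the monitoring statements; spool the output and run it:

-- generate one ALTER INDEX ... MONITORING USAGE statement per index
SELECT 'alter index ' || index_name || ' monitoring usage;'
FROM user_indexes;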

Once the above exercise is completed, find all the indexes in v$object_usage that have the USED flag as NO. These indexes were not used by the application during the monitoring window and can be dropped.

Select index_name from v$object_usage where used='NO';

After completing the exercise, all the monitoring on the indexes should be removed. The script for the same is

alter index indexname nomonitoring usage;

Wednesday, January 28, 2009

Multithreading - Downstream Impacts

A multi-threaded model will, in most cases, give better performance for batch applications. Along with re-engineering the application code, I would look at the points below.
  • Processing power – There should be enough CPU power available on the machine.
  • Memory – Since all threads work in parallel, the amount of memory used will also be considerably higher. This can be reduced by not placing too many objects in the JVM; still, the amount of memory required will be proportional to the number of threads in the application.
  • Network load – If the application has network interactions like database queries, FTP etc., the network load will also be high. In some cases this is solved by adding gigabit connections between the communicating servers; in most cases, normal NICs can handle the load.
  • Disk speed – If the application does a lot of file processing, disk access also needs to be tuned. It is better to read the files from SAN rather than NAS, as the disk response time is better on SAN (my experience).
  • Database setup – If there is a lot of database interaction, the database should also be made aware of the change in the application. The load on the database will increase as multiple threads fetch data concurrently, and it will most probably end up requiring increases in the database parameters to accept more load.
  • GC processing – GC tuning should be performed. Since many threads work in parallel, the amount of garbage created will also be high. An effective GC tune-up is required, without which the application will end up with an OutOfMemoryError. You can consider the parallel GC, but ensure that the number of GC threads is specified.
  • Synchronization – Objects created in JVM scope should be accessed with proper synchronization. If this is not worked out properly, it can cause dreaded issues like data corruption, deadlocks etc.

Multithreading Multiprocessor Relation

A batch application is made scalable by ensuring that the executable can use the full power provided by the machine. The batch application should be designed on a multi-threaded model if it is possible to break the work into multiple smaller units. That way, each thread can work on its own piece of work and complete it. For example, suppose that in a single-threaded model a batch process takes 10 hours to process 2000 customer records. If the same code is written in a multi-threaded model using 10 threads, the job can be split into 10 units, each processing 200 customers, and the same work can be completed in 1 hour, the caveat being that the machine has the necessary processing power (CPU).

Any Java process executes in threads of execution. A thread can be doing multiple things: performing its task, waiting on IO, waiting on a socket, waiting for a lock release etc. While a thread is waiting on something, the CPU removes it from its cycle and takes up another thread that can perform work. At any point in time, a CPU core can execute only one thread. So if the machine has 4 CPU cores, a reasonable starting point would be around 3 threads per core for the application to use. The number of threads per core depends on the application, the primary driving factor being what is done in each thread. If considerable waiting happens in the thread of execution (file IO, database reads, socket reads), the number of threads per CPU can be increased; if the thread performs operations within the process area without any waiting, the number of threads per CPU should be reduced. This is because the machine always pushes out the threads that are waiting and takes in threads that are ready for execution.

Impact of the number of threads

Too many threads/core for an application with little waiting:
Take a machine with 2 CPUs and an application configured to use 10 threads. Since there is not much wait time involved, the CPU forces the executing thread out of its cycle to give a fair chance to the remaining 9 threads. The thread that got pushed out comes back to execution after a certain number of CPU cycles, and at that point its working state has to be rebuilt to the point where it was pushed out. If there were only 1 thread of execution per CPU, this would not happen, and the single thread per CPU could complete the operation without heavy context switching. In such a scenario, the extra threads are detrimental to the application. This case is evident when the batch completes in less time after the number of threads is reduced.
Too few threads/core for an application with considerable waiting:
Take a machine with 8 CPUs and an application configured to use 8 threads. Each CPU executes one thread, and whenever that thread goes into a WAIT state, the CPU lies idle. This case is evident when the batch completes in less time after the number of threads is increased; at any point in time, the CPU usage will stay low, often not even near 50% or 60%.

As described above, the number of threads per CPU should be decided based on the application characteristics.

Tuesday, January 6, 2009

HAAS - Hardware As A Service

SaaS, or Software as a Service, has picked up momentum: the user pays for the software only as much as they use it. The concept caught on in the industry because of its many advantages. HAAS, Hardware as a Service, is catching up slowly. But did this not already exist in the form of hosting services? Is it something new? In a hosting service, the user pays for the hosting space for a period of time irrespective of usage. Agreed, hosting services existed, but HAAS is not the same.

In HAAS, the user pays for the actual usage, not for a usage level decided at the beginning. With a hosting service, the end user has to pay, say, X USD for 1GB of space and 1 web application for 1 year; irrespective of how much the site is used, the user pays the hosting provider. What if the user could pay based on the amount of data transferred to the site, or the amount of processing power the site uses, instead of a fixed amount decided up front? What if the application could take any amount of load, i.e. scalability on demand? This is possible through HAAS: the user pays for actual usage and gets the features of an enterprise-class application.

Amazon has opened up this arena with Amazon Web Services. They have multiple products, like Amazon EC2 (Elastic Compute Cloud), Amazon S3 (Simple Storage Service) and Amazon SQS (Simple Queue Service). The whole idea is to achieve the application functionality using these three services: the units of work are stored in the S3 area, and EC2 uses those units to process the application in a scalable manner. And the benefit to the user: they get their application functional without spending heavily to set up datacenters.