- Processing Power – There should be enough CPU power available in the machine.
- Memory – Since all threads will be working in parallel, the amount of memory used also will be considerably high. This can be however be reduced by not placing too many objects in JVM. However, the amount of memory required will be proportional to the number of threads in the application.
- Network Load – If the application is having network interactions like Database queries, FTP etc, the network load also will be high. In some cases, it would be solved by adding GBit connections between servers communicating. In most of the cases, the normal NIC’s itself can process the load.
- Disk Speed – If the application has lot of File processing, the disk accesses also need to be tuned. It would be better to read the files from SAN rather than NAS as the disk response time is better on SAN (my experience).
- Database Setup – If there is lot of database interactions, the database also should be made aware of such a change in the application. The load on the database will increase as there will be multiple threads that will try to fetch data from database. Most probably, it will end up increasing the database parameters to accept more loads.
- GC processing – The GC tuning should be performed. Since many threads are working in parallel, the amount of garbage created also will be high. An effective tune up of GC is required, without which, application will end up in OutofMemory Error. You can consider Parallel GC but ensure that the number of GC threads is mentioned.
- Synchronization – Objects that are created in JVM scope should be accessed with proper synchronization. If this is not worked out properly, it can cause dreaded issues like data corruption, deadlocks etc…
Wednesday, January 28, 2009
Multithreading - Downstream Impacts
Multithreading Multiprocessor Relation
Any java process executes in a thread of execution. The thread can perform multiple activities like performing the task, waiting on IO, waiting on socket, waiting for lock release etc. While the thread is waiting on something, CPU is intelligent enough to remove the thread from its cycle and take up another thread which can perform the work. Any point in time, a CPU core can execute only one thread. So if the machine has 4 cores of CPU, an ideal count would be to provide 3 threads/core for the application to use. The number of threads per core is dependent on the application, primary driving factor being what is done in each thread. If there considerable wait that will happen in the thread of execution (like File IO, database read, Socket read) , the number of threads per CPU can be increased and if the thread is going to perform operations within the process area without any wait, the number of threads per CPU should be reduced. This is because the machine always pushes out the threads that are waiting and takes in thread that is ready for execution.
Impact on number of threads
More threads/core for application that has less amount of wait.
Let’s take an example where the machine has 2 CPU’s and application is configured to use 10 threads. Since there is not much wait time involved, the CPU will force the executing thread out of its cycle to give fair chance for the remaining 9 threads to execute. The thread that got pushed out will come back to execution after certain CPU cycles. At this time, it needs to rebuild till the point where it was pushed out. If there were only 1 thread of execution per CPU, this type of activity won’t happen and the single thread /CPU can complete the operation without heavy context switching. In such a scenario, it will be detrimental for the application. Such a case will be evident if the batch completes in lesser time when the number of threads for the application is reduced.
Less threads/core for application that has considerable amount of wait.
Let’s take an example where the machine has 8 CPU’s and application is configured to use 8 threads. Each CPU will execute a thread of execution and when any one of the thread goes into WAIT state, the CPU lies idle. Such a case will be evident if the batch completes in lesser time when the number of threads for the application is increased. At any point in time, the CPU usage will not be near 50% or 60%.
As mentioned in the above description, the number of threads per CPU should be decided based on the application characteristics.
Tuesday, January 6, 2009
HAAS - Hardware As A Service
In HAAS, the user pay for the actual usage and not the usage decided in the beginning. In a hosting service, the end user for e.g. have to pay XUSD for say 1GB of space and 1 Web Application for 1 year. Irrespective of the usage of the site, the user need to pay the hosting provider. What if the user need to pay based on the amount of data that is transferred to the site or the amount of processing power the site uses instead of a fixed amount decided in the beginning. What if the application can take any amount of load i.e. scalability is available on- demand. This is possible through HAAS. User pays for the actual usage and gets the features of an enterprise class application.
Amazon has opened up the arena using Amazon Web Services. They have multiple products like Amazon EC2(Elastic Compute Cloud), Amazon S3(Simple Storage Service), Amazon SQS(Simple Queue Service). The whole idea is to achieve the application functionality by using the 3 above services. The unit of work is stored in the S3 area and the EC2 will use that unit to process the application in a scalable manner. And the benefits to the user, they get their application functional without spending heavily to setup the datacenters.
Saturday, December 27, 2008
Outer joins on non-preserved columns
There are 3 tables, MYUSER, TEST1, TEST2. MYUSER is related to TEST1 table and TEST1 table is related to TEST2 table. The requirement is to fetch all the users from MYUSER table and if present fetch the related data from the TEST1 table and related data from TEST2 table.
The query would be
SELECT USERV.USERID ,USERV.NAME,
TEST1.ID, TEST1.TEST1NAME,
TEST2ID , TEST2.TEST2NAME
FROM MYUSER USERV,
TEST1 TEST1,
TEST2 TEST2
WHERE USERV.USERID = TEST1.ID(+)
AND TEST1.ID = TEST2.TEST2ID(+)
order by 1
The point to note is that “AND TEST1.ID = TEST2.TEST2ID(+)“ condition should be with + sign as shown. This is required because the first join will return all the users even if there is no corresponding record in Test1 table. For those preserved rows, the value will be null for TEST1.ID column. For this row to be preserved when joined with TEST2 table, the + on TEST2 should be maintained.
This can be seen by the following example
MYUSER Table
USERID | NAME |
1 | Bill |
2 | Purva |
3 | James |
5 | Meij |
TEST1 Table
ID | TEST1NAME |
1 | test1bill |
2 | test1purva |
TEST2 Table
TEST2ID | TEST2NAME |
2 | test2purva |
3 | test2james |
SELECT USERV.USERID ,USERV.NAME,
TEST1.ID, TEST1.TEST1NAME,
TEST2ID , TEST2.TEST2NAME
FROM MYUSER USERV,
TEST1 TEST1,
TEST2 TEST2
WHERE USERV.USERID = TEST1.ID(+)
AND TEST1.ID = TEST2.TEST2ID(+)
order by 1
ANSI Syntax
TEST1.ID, TEST1.TEST1NAME,
TEST2ID , TEST2.TEST2NAME
FROM MYUSER USERV
LEFT OUTER JOIN TEST1 TEST1 ON (USERV.USERID = TEST1.ID)
LEFT OUTER JOIN TEST2 TEST2 ON (TEST1.ID = TEST2.TEST2ID)
ORDER BY 1
USERID | NAME | ID | TEST1NAME | TEST2ID | TEST2NAME |
1 | Bill | 1 | test1bill | | |
2 | Purva | 2 | test1purva | 2 | test2purva |
3 | James | | | | |
5 | Meij | | | | |
The query2 without outer join on the second clause is
SELECT USERV.USERID ,USERV.NAME,
TEST1.ID, TEST1.TEST1NAME,
TEST2ID , TEST2.TEST2NAME
FROM MYUSER USERV,
TEST1 TEST1,
TEST2 TEST2
WHERE USERV.USERID = TEST1.ID(+)
AND TEST1.ID = TEST2.TEST2ID
order by 1
USERID | NAME | ID | TEST1NAME | TEST2ID | TEST2NAME |
2 | Purva | 2 | test1purva | 2 | test2purva |
To be precise, if a table is part of the outer join and the data from the table is outer joined to any other table, that condition also should be outer joined.
Thursday, December 18, 2008
Consistent Read of the Data in Oracle Queries
In some cases, you might want different queries to return same set of data as far as the where criteria defined on the query are the same. For e.g.
Select price from vehicle_price where status=’A’
Select sum(price) from vehicle_price where status=’A’
In the same transaction, if I try to do these queries, there is a possibility that the second query might give a sum that will not be equal to the sum added up by the price returned in the first query. This is possible if new records get added to the vehicle_price table with status as ‘A’ or any of the existing record status changes from ‘A’ to something else.
To ensure that you always get the same data, there are multiple ways to get this done
1. Lock the table and perform queries on it. This ensures that no one else can modify the data while your transaction is in progress. I wont be surprised if you get shouted by other folks who uses this table in some other part of the application.
2. Make the data time sensitive. For every record that gets into the table, there should be a effective start time and effective end time. The query should always query on these columns. So the row doesn’t get modified but a new row gets added with new status. As long as the query uses the effective start date and effective end date in the query, the data returned in subsequent queries will be same. This is a better approach as other users are not impacted. But there could be size and performance implications.
3. Use Serializable isolation level for the transaction which queries the tables. This ensures that all queries are made to wait serially. So till the read transaction doesn’t get over, the new transaction cannot insert data. This also impacts the table usage.
4. Use Transaction read only feature. There are few limitations in using this feature of transaction. This level can be set only for transactions that have only read statements. Oracle maintains a snapshot of the data and all queries are worked upon the snapshot effective the transaction start time. This should be used if its viable to be. A commit or rollback on this transaction will cause the flag to be turned off. Set Transaction Read only ; is the command.
These are few of the ways by which we can ensure that queries return the same data within a same transaction.
Friday, December 5, 2008
Use Decode and Sign function together
Decode statements are useful in queries to a large extent. Sometimes people write the PLSQL code to achieve somethings that are possible through a normal decode statement.
For e.g. the requirement is to add two columns retrieved from the table and if its less than zero, make it zero and if its greater than zero, provide the sum of the two values as the output. This can be done using the decode statement
select column1 , coulmn2, column1+column2 ,
decode(SIGN(column1+column2), -1, 0, (column1+column2))
from tableA
The above query means that if the sum of column1 and column2 value is less than zero, then output the value as zero and if its greater than zero, output the sum as the value. We had to use SIGN function as decode does not support checks like less than or greater than.
JDK Logging
Log4j is widely accepted as a logging component. There is yet another one, the logging component inbuilt in java. This article will discuss about the logging provided by JDK.
Ideal scenario, the entire configuration can be done through the logging.properties file. The logging.properties file should be present in the classpath of the application. In case you want to provide a different properties file, the same can be set using the system property. The command is as shown below
java –Djava.util.logging.config.file=/home/users/mypath/mylogging.properties
#Defines the handlers which can be used by the logger. It can be comma separated. For
# each handler, definition should be provided in the properties file
handlers= java.util.logging.FileHandler, java.util.logging.ConsoleHandler
#Default level of logging in case specific logging level as shown in the last line is not
# mentioned.
.level= INFO
#Details about the filehandler.
java.util.logging.FileHandler.pattern = %h/java%u.log
java.util.logging.FileHandler.limit = 50000
java.util.logging.FileHandler.count = 1
java.util.logging.FileHandler.formatter = java.util.logging.XMLFormatter
#custom logging level
com.xyz.foo.level = SEVERE
private void initialize(String name)
{
Handler fileHandler;
try {
fileHandler = new FileHandler(filePath, true);
fileHandler.setFormatter(new MyFormatter());
logger=Logger.getLogger(name);
logger.addHandler(fileHandler);
}
public static MyLogger getInstance(String someName)
{
//add the necessary synchronization blocks
mylogger = new MyLogger();
mylogger.initializer(someName);
return mylogger;
}
//this is sample method. Like this you need to provide implementation for
//all other methods
public void debug(String sourceClass, String sourceMethod, String msg)
{
logger.logp(Level.FINEST,sourceClass,sourceMethod,msg);
}
//in your application code, use it this way
MyLogger logger = MyLogger.getInstance(“namethatidentifiestheloglevel”);
logger.debug(“classname”,”methodname”,”msg”);