Dusko Koncaliev
koncadu@earlham.edu
Kennedy Mutio
mutioke@earlham.edu
Log
Nov. 07, 1998 | V. 1.1., revised with specifics about invocation of Transaction Manager, results from Transaction Manager, Error Code contents, Busy Wait, Write Ahead Log, Recovery, and other minor changes. |
Nov. 04, 1998 | V. 1.0. Original version of the project document. Needs revision. |
At this point in the information age, numerous firms, institutions, and
even individuals run into problems of trying to manage their data, which
may be stored in different types of databases. They are often faced with a
dilemma. They have the choice to transform their old data into a new
database in order to be able to use the old data along with new data, or
try to find a way of integrating the data from multiple different
databases.
With this project, we chose to explore the second option and tackle the
various issues that are involved in Cross Database Data Integration and
Transaction Processing.
This project document is aimed at presenting the essence of the Cross
Database Data Integration project, and doing so in an organized and
concise manner. The essence and backbone of the project is to enable one
to use multiple databases such as Oracle7, Postgres, Sybase, etc., and be
able to store and retrieve data from all of them simultaneously. At this
point, the project is only concentrating on the use of Oracle7 and
Postgres databases, but once this goal is accomplished, it would not be
hard to provide for the use of multiple other databases.
Functional Specification
The primary problems in the domain of this project are:
Usage Models
Transactions to Oracle and Postgres databases
The user will be able to carry out transactions in the Oracle7 and the
Postgres databases. At this point the project concentrates only on DML
transactions such as storing or manipulating data from an
existing database. The primary transaction not talked about in this
document is the selection of data, which does not necessarily need to
use the two-phase commit protocol and could be implemented otherwise. The
user will be able to send the SQL statements that define what the
transaction is.
The user will invoke the Transaction Manager [TM] by sending the TM a
string that is composed of information about the databases used and the
SQL statements that are to be carried out. The TM will be a server
sitting on a specific port and waiting for clients to invoke it. The Busy
Wait mechanism will be implemented, which means that the client will have
to wait for the server (the TM) to finish its work and report back to the
client.
The user (client) will receive results from the TM that clearly state
whether the transaction requested was successful or not. The result
will be an array of strings which, in the case of success will have
only two elements (i.e. OK = 1, result = commit or rollback) and in the
case of a failure will have multiple elements (i.e. NOT OK = -1, and
reason of failure including the SQL statement that caused the failure).
The user will have faith that the transactions are carried out with all
the ACID properties, whether the transactions requested were committed or
rolled back.
Design Specification
Applications and tools used:
Data representation:
MAX_TRANS = 10 | (the maximum number of transactions supported at the same time) |
MAX_SQL = 8 | (the maximum number of SQL statements a transaction can support, i.e. the maximum number of resource managers under a transaction) |
trans_array [MAX_TRANS] | (an array of pointers which holds the transaction manager; points to the array that holds the resource managers) |
res_arary [MAX_SQL] | (an array of pointers which hold the resource managers; points to a structure (an array of strings) that holds important information relevant to the resource managers, like $rid, $dbh, $rc, $sql, $res_vote, $shm_key, $child_pid) |
db_array[MAX_SQL] | (an array that holds the $dbh handles for the different databases at this point we choose to have that in an array of 2 because we know the databases we are supporting.) |
$trid | (a transaction id = [0..9] for its place in the trans_array + localtime_in_seconds) |
$rid | (a resource id = first_digit_of_trid + [0..9] for its place in the res_array + localtime_in_seconds) |
Flow of instructions:
The transaction manager is invoked with a $string that holds DB-SQL
pairs. The DB holds the information about the database where the SQL
will be executed. At this point (to avoid string manipulation for
getting the database information used by DBI) we will have it hardcoded
that when
Below is a diagram of the flow of instructions:
1. TM invoked with $string 2. TM logs - transaction started 3. TM decides on number of resource managers and forks a process for each RM. It logs before it starts each RM and then invokes RMs with SQL to be executed 4. Execution of SQL against the specific database 5. RM decides on commit/rollback 6. After all RMs have voted, TM decides on overall decision and logs it 7. TM invokes each RM with the decision 8. RMs commit/rollback 9. RMs inform TM of work completed 10. After all RMs have completed work, TM logs transaction completion |
Recovery process:
The Transaction Manager makes sure that the log is kept up to date and
durable at all times in case of a system crash or any other kind of
failure. At the point of failure the transaction may be at a state where
it is not atomic, which goes against the ACID properties of transactions
that we strive to provide. To battle this, a recovery process needs to
take place.
With the initial startup, the Transaction Manager checks the transaction
log to see whether there are any transactions that haven't finished prior
to a possible failure. If there is a transaction that hasn't been
finished, the Transaction Manager gets all the information about that
transaction from the transaction log and makes sure the transaction is
executed until finished.