The database dilemma - and the solution you’ve been waiting for
The database is a vital and critical element of business
infrastructure—much more than a basic system
for information storage and retrieval. Unfortunately,
the database has also become the single largest point
of pain in the enterprise. Generations of applications
have been written to allow users to access the business
logic and information held within the database. However,
over time the performance of these applications has
degraded to the point of being unusable. Why is that?
Databases 101: the problem
Let’s look at what a database does,
and how it does it. The database's primary role is
to store and retrieve information. Database performance
degrades as the amount of data stored grows. The
relationship between the amount of information stored
and how long it takes to find the proverbial "needle
in a haystack" is a logical but poorly understood
phenomenon. A quick, basic review of computer architecture
helps to explain the relationship between the amount
of information stored in the database and response
time.
A computer’s core is a high-priced and extremely
fast calculator called a central processing unit (CPU).
However, unlike a calculator, which requires direct
user input to produce results, a CPU is bound to a
hierarchy of inputs—a pyramid of dependency,
storage size, and performance:
When the computer is asked to retrieve
a piece of information, it starts at the top of the
pyramid (the L1 cache) and moves toward the bottom (the
hard disk) until it finds the desired data.
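The top-down search described above can be sketched in a few lines of Python. This is only an illustration: the tier list is limited to the levels named in this paper, the latency figures are rough assumed orders of magnitude rather than measurements, and `TIERS`, `lookup`, and `contents` are hypothetical names.

```python
# Hypothetical tiers, ordered from the top of the pyramid to the bottom.
# Latencies (in seconds) are assumed orders of magnitude, not measurements.
TIERS = [
    ("L1 cache", 1e-9),
    ("main memory", 1e-7),
    ("hard disk", 1e-2),
]


def lookup(key, contents):
    """Search each tier from the top down.

    Returns (tier name, total seconds spent), or (None, total seconds)
    if the key is not stored anywhere.
    """
    elapsed = 0.0
    for name, latency in TIERS:
        elapsed += latency  # every tier that is checked costs time
        if key in contents.get(name, set()):
            return name, elapsed
    return None, elapsed


# Assumed example contents: smaller tiers hold a subset of larger ones.
contents = {
    "L1 cache": {"a"},
    "main memory": {"a", "b"},
    "hard disk": {"a", "b", "c"},
}

for key in ("a", "b", "c"):
    tier, seconds = lookup(key, contents)
    print(f"{key!r} found in {tier} after {seconds:.9f} s")
```

Under these assumed numbers, a lookup that falls through to the hard disk is dominated almost entirely by the disk's latency, which is the point the pyramid makes.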
Storing data is a relatively simple operation of
writing the provided information to persistent storage—the
hard drive. Although hard drive performance is poor,
hard drives are required for database storage. Information
storage performance is not proportional to the amount
of information in a database or set. Performance is
linearly proportional to the speed of the hard drive
itself, not to the information in the target set.
Read operations constitute approximately 80 percent
of database traffic and consume the bulk of computing
horsepower for any given database transaction. The
primary reason? The time it takes to find a needle
in a haystack is proportional to how many people are
looking for the needle and how large the haystack
is. The relationship between information retrieval
performance and the size of the search set is nonlinear.
In other words, query execution time does not become
slower incrementally; it becomes VERY slow VERY quickly.
Database performance curve
The curve of database performance in proportion
to data set size can be generalized into an "S"
curve with three distinct regions. The beginning of
the curve, when the data set is small, appears linear.
In this region, information lookups can be achieved
directly from main memory without hitting the hard
drive. The slope of the curve in this region is directly
proportional to the speed of the host CPU and main
memory.
Moving into the second region of the "S"
curve, main memory is depleted and performance quickly
decays; in fact, it decays exponentially. Initially,
because only a portion of the data set must be retrieved
from the hard disk, performance is tolerable. But
as more of the data set is fetched from the hard disk,
performance becomes slower and slower.
Eventually, this decrease in performance tapers
off into the third region of the "S" curve.
In the final region, database performance decreases
linearly at a rate proportional to the speed of the
hard drive. When the data set is too large to fit
into main memory, database performance is no longer
a function of buying a faster CPU; it is dependent
on the speed of the hard drive.
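A simple two-tier cost model gives a feel for the first and third regions of the curve. It is a sketch under stated assumptions, not a measurement of any real database: the memory size and the per-gigabyte costs are invented for illustration, `scan_time` is a hypothetical name, and the gradual transition of the second region is collapsed into a sharp knee at the point where main memory runs out.

```python
def scan_time(data_gb, memory_gb=4.0, mem_s_per_gb=0.1, disk_s_per_gb=10.0):
    """Estimated seconds to search a data set of data_gb gigabytes.

    Assumed model: whatever fits in main memory is searched at memory
    speed, and the remainder is searched at hard-drive speed.
    """
    in_memory = min(data_gb, memory_gb)
    on_disk = max(data_gb - memory_gb, 0.0)
    return in_memory * mem_s_per_gb + on_disk * disk_s_per_gb


for size in (1, 2, 4, 5, 8, 16):
    total = scan_time(size)
    print(f"{size:>3} GB: {total:7.1f} s total, {total / size:5.2f} s per GB")
```

Below the assumed 4 GB of memory, each extra gigabyte costs the memory rate. Above it, each extra gigabyte costs the hard-drive rate, and a faster CPU changes nothing in that term.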
The solution: make the nonlinear linear!
The solution is simple in concept, but extremely
difficult in practice. For years, computer scientists
have searched for a way to provide a linear solution
to database performance, but all efforts failed to
provide an earth-shattering solution…until
the XPrime Database Accelerator (XDA). Built on
two key patent-pending technologies, the XDA provides
pure linear scalability of database
performance up to 1 terabyte.
Cooperative Database Processing
The XPrime patent-pending system for Cooperative
Database Processing (CDP) is built around the modern
database notion of heterogeneous queries, allowing
a client application written for Microsoft® SQL
Server™ 2000 direct access to information stored
on a foreign data source. Instead of the database
system looking into a local table to retrieve the
requested data, the non-native database is queried
for the required information.
CDP uses this feature to exploit the capabilities
of a non-native database to provide increased performance
for user applications.
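The general idea of answering a query from a non-local data source can be illustrated with a small analogy in Python. This is not SQL Server syntax and it does not show how CDP or the XDA works internally. It uses SQLite's `ATTACH DATABASE` statement purely as a stand-in for a foreign data source, and the `remote` schema, `orders` table, and `order_total` function are hypothetical.

```python
import sqlite3

# The primary connection stands in for the local database.
conn = sqlite3.connect(":memory:")

# A second, separate in-memory database stands in for the foreign source.
conn.execute("ATTACH DATABASE ':memory:' AS remote")
conn.execute("CREATE TABLE remote.orders (id INTEGER PRIMARY KEY, total REAL)")
conn.executemany(
    "INSERT INTO remote.orders (id, total) VALUES (?, ?)",
    [(1, 19.5), (2, 5.0)],
)


def order_total(order_id):
    """Client-facing lookup; the data comes from the attached database."""
    row = conn.execute(
        "SELECT total FROM remote.orders WHERE id = ?", (order_id,)
    ).fetchone()
    return None if row is None else row[0]


print(order_total(1))
```

The caller uses a single connection and a single function, yet no `orders` table exists in the local database; every row is served by the attached one.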