Database management system
A database management system (DBMS) is a computer program (or more typically, a suite of them) designed to manage a database (a large set of structured data), and run operations on the data requested by numerous clients. Typical examples of DBMS use include accounting, human resources and customer support systems. Originally found only in large organizations with the computer hardware needed to support large data sets, DBMSs have more recently emerged as a fairly standard part of any company back office.
DBMSs are found at the heart of most database applications. Sometimes DBMSs are built around a private multitasking kernel with built-in networking support, although nowadays these functions are usually left to the operating system.
Terminology
A database management system (DBMS) is a system, usually automated and computerized, for the management of any collection of compatible, and ideally normalized, data.
A relational database is a collection of data structured in accordance with the relational model. Database management systems are often used to implement relational databases in software.
A database application is computer software written to manage the data of a particular application or problem.
Features and Abilities
One can characterize a DBMS as an "attribute management system" where attributes are small chunks of information that describe something. For example, "color" is an attribute of a car. The value of the attribute may be a color such as "red", "blue", "silver", etc. Lately databases have been modified to accept large or unstructured (pre-digested or pre-categorized) information as well such as images and text documents. However, the main focus is still on descriptive attributes.
A DBMS rolls together frequently needed services or features of attribute management. This allows one to get powerful functionality "out of the box" rather than programming each feature from scratch or adding and integrating them incrementally. Such features include:
- Persistence - Attributes are permanently stored on a hard-drive or other fast, reliable medium until explicitly removed or changed.
- Query Ability - Querying is the process of requesting attribute information from various perspectives and combinations of factors. Example: "How many 2-door cars in Texas are green?"
- Concurrency - Many people may want to change and read the same attributes at the same time. If there are no organized, predetermined rules for sharing changes, then the attributes may grow inconsistent or misleading. For example, if you change the color attribute of car 7 to "blue" at the very same time somebody else is changing it to "red", then you may not see your change when you go to view the attributes of the car you thought you just changed. A DBMS provides various tools and techniques to deal with such issues. "Transactions" and "locking" are two common techniques for concurrency management.
- Backup and Replication - Often copies of attributes need to be made in case primary disks or equipment fail, or because a distant organization needs a periodic copy of attributes it cannot readily access in the original. A DBMS usually provides utilities to facilitate the process of extracting and disseminating attribute sets.
- Rule Enforcement - Often one wants to apply rules to attributes so that the attributes are clean and reliable. For example, we may have a rule that says each car can have only one engine identification number. If somebody tries to put a second engine number attribute on a given car, we want the DBMS to deny such a request.
- Security - Often it is desirable to limit who can see or change which attributes or groups of attributes. After all, you don't want anybody on the street to be able to change your license plate number in government automobile databases.
- Computation - There are common computations requested on attributes such as counting, summing, averaging, sorting, grouping, cross-referencing, etc. Rather than have each computer application implement these from scratch, they can rely on the DBMS to supply such calculations.
- Change and Access Logging - Often one wants to know who accessed what attributes, what was changed, and when it was changed. Logging services allow this by keeping a record of access occurrences and changes.
- Automated optimization - If there are frequently occurring usage patterns or requests, some DBMSs can adjust themselves to improve the speed of those interactions. In other cases the DBMS merely provides tools to monitor performance, allowing a human expert to make the necessary adjustments after reviewing the statistics collected.
- Meta-data Repository - Meta-data is information about information. For example, a listing that describes what attributes are allowed to be in data sets is called "meta-information".
- Modeling Tool - A DBMS can also act as a modeling tool. It can be used to model various nouns found in the environment by describing the attributes associated with such nouns and how the nouns and attributes relate to each other.
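Three of the features listed above (query ability, computation and rule enforcement) can be seen in miniature with Python's built-in sqlite3 module. This is only a sketch: SQLite stands in for a full DBMS, and the `cars` and `engine_numbers` tables, their columns and the sample rows are assumptions invented for the example.

```python
import sqlite3

# SQLite stands in for a DBMS; ":memory:" keeps the example self-contained.
conn = sqlite3.connect(":memory:")

# Hypothetical schema. Because car_id is the primary key of engine_numbers,
# a given car can have at most one engine number row.
conn.execute(
    "CREATE TABLE cars ("
    " id INTEGER PRIMARY KEY, state TEXT, doors INTEGER, color TEXT)"
)
conn.execute(
    "CREATE TABLE engine_numbers ("
    " car_id INTEGER PRIMARY KEY REFERENCES cars(id),"
    " engine_number TEXT NOT NULL)"
)

with conn:  # commit the made-up sample rows as one unit
    conn.executemany(
        "INSERT INTO cars (id, state, doors, color) VALUES (?, ?, ?, ?)",
        [
            (1, "Texas", 2, "green"),
            (2, "Texas", 2, "green"),
            (3, "Texas", 4, "green"),
            (4, "Ohio", 2, "green"),
            (5, "Texas", 2, "red"),
        ],
    )
    conn.execute("INSERT INTO engine_numbers VALUES (1, 'ENG-001')")

# Query ability: "How many 2-door cars in Texas are green?"
green_two_door_texas = conn.execute(
    "SELECT COUNT(*) FROM cars WHERE state = ? AND doors = ? AND color = ?",
    ("Texas", 2, "green"),
).fetchone()[0]

# Computation: counting and grouping are done by the DBMS, not the application.
cars_per_color = conn.execute(
    "SELECT color, COUNT(*) FROM cars GROUP BY color ORDER BY color"
).fetchall()

# Rule enforcement: a second engine number for car 1 is denied.
try:
    with conn:
        conn.execute("INSERT INTO engine_numbers VALUES (1, 'ENG-002')")
    second_engine_rejected = False
except sqlite3.IntegrityError:
    second_engine_rejected = True

print(green_two_door_texas)
print(cars_per_color)
print(second_engine_rejected)
```

The application never loops over rows or checks the rule itself; it states the question or the constraint and the DBMS does the work.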
History
Databases have been in use since the earliest days of electronic computing, but the vast majority of these were custom programs written to access custom databases. Unlike modern systems which can be applied to widely different databases and needs, these systems were tightly linked to the database in order to gain speed at the expense of flexibility.
Navigational DBMS
As computers grew in capability, this tradeoff became increasingly unnecessary and a number of general-purpose database systems emerged; by the mid-1960s there were a number of such systems in commercial use. Interest in a standard began to grow, and Charles Bachman, author of one such product, IDS, founded the Database Task Group within CODASYL, the group responsible for the creation and standardization of COBOL. In 1971 they delivered their standard, which generally became known as the Codasyl approach, and soon there were a number of commercial products based on it available.
The Codasyl approach was based on the "manual" navigation of a linked dataset which was formed into a large network. When the database was first opened, the program was handed back a link to the first record in the database, which also contained pointers to other pieces of data. To find any particular record the programmer had to step through these pointers one at a time until the required record was returned. Simple queries like "find all the people in Sweden" required the program to walk the entire data set and collect the matching results. There was, essentially, no concept of "find" or "search". This might sound like a serious limitation today, but in an era when the data was most often stored on magnetic tape such operations were too expensive to contemplate anyway.
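The record-at-a-time navigation described above can be sketched in a few lines of Python. This is a simplified illustration, not the actual Codasyl interface: the `Record` class, its single `next` pointer and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    name: str
    country: str
    next: Optional["Record"] = None  # pointer to the next record in the chain


def build_chain(rows):
    """Link (name, country) rows into a chain and return the first record."""
    first = None
    for name, country in reversed(rows):
        first = Record(name, country, first)
    return first


def find_by_country(first, country):
    """Step through every pointer from the first record, collecting matches."""
    matches = []
    current = first
    while current is not None:
        if current.country == country:
            matches.append(current.name)
        current = current.next
    return matches


# The program is handed only the first record and must navigate from there.
first_record = build_chain(
    [("Anna", "Sweden"), ("Bob", "Canada"), ("Lars", "Sweden")]
)
people_in_sweden = find_by_country(first_record, "Sweden")
print(people_in_sweden)
```

Every query of this kind visits the whole chain, because the only operation available is "follow the next pointer".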
IBM also had their own DBMS system in 1968, known as IMS. IMS was a development of software written for the Apollo program on the System/360. IMS was generally similar in concept to Codasyl, but used a strict hierarchy for its model of data navigation instead of Codasyl's network model.
Both concepts later became known as navigational databases due to the way data was accessed, and Bachman's 1973 Turing Award presentation was titled The Programmer as Navigator. IMS is classified as a hierarchical database. IDS and IDMS (both CODASYL databases) as well as Cincom's TOTAL database are classified as network databases.
Edgar Codd worked at IBM in San Jose, California, in one of their offshoot offices that was primarily involved in the development of hard disk systems. He was unhappy with the navigational model of the Codasyl approach, notably the lack of a "search" facility which was becoming increasingly useful when the database was stored on disk instead of tape. In 1970 he wrote a number of papers that outlined a new approach to database construction that eventually culminated in the groundbreaking A Relational Model of Data for Large Shared Data Banks.
In this paper he described a new system for storing and working with large databases. Instead of records being stored in some sort of linked list of free-form records as in Codasyl, Codd's idea was to use a "table" of fixed-length records. A linked-list system would be very inefficient when storing "sparse" databases where some of the data for any one record could be left empty. The relational model solved this by splitting the data into a series of normalized tables, with optional elements being moved out of the main table to where they would take up room only if needed.
For instance, a common use of a database system is to track information about users, their name, login information, various addresses and phone numbers. In the navigational approach all of these data would be placed in a single record, and unused items would simply not be placed in the database. In the relational approach, the data would be normalized into a user table, an address table and a phone number table (for instance). Records would be created in these optional tables only if the address or phone numbers were actually provided.
Linking the information back together is the key to this system. In the relational model some bit of information was used as a "key", uniquely defining a particular record. When information was being collected about a user, information stored in the optional (or related) tables would be found by searching for this key. For instance, if the login name of a user is unique, addresses and phone numbers for that user would be recorded with the login name as its key. This "re-linking" of related data back into a single collection is something that traditional computer languages are not designed for.
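To make the key-based linking concrete, here is a minimal sketch in plain Python, with lists of dictionaries standing in for tables. The table layouts, field names and sample users are assumptions made up for illustration. Note that the application itself has to search each table for the key.

```python
# Three hypothetical "tables"; login is the key that ties the rows together.
users = [
    {"login": "alice", "name": "Alice Example"},
    {"login": "bob", "name": "Bob Example"},
]
addresses = [
    {"login": "alice", "address": "1 Main Street"},
]
phones = [
    {"login": "alice", "phone": "555-0100"},
    {"login": "alice", "phone": "555-0101"},
]
# bob supplied no address or phone number, so no rows exist for him.


def collect_user(login):
    """Re-link one user's rows from all three tables by searching for the key."""
    user = next(row for row in users if row["login"] == login)
    return {
        "name": user["name"],
        "addresses": [r["address"] for r in addresses if r["login"] == login],
        "phones": [r["phone"] for r in phones if r["login"] == login],
    }


alice = collect_user("alice")
bob = collect_user("bob")
print(alice)
print(bob)
```

The optional tables take up room only for users who actually supplied the data, at the cost of this explicit re-linking step.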
Just as the navigational approach would require programs to loop in order to collect records, the relational approach would require loops to collect information about any one record. Codd's solution to the necessary looping was a set-oriented language, a suggestion that would later spawn the ubiquitous SQL. Using a branch of mathematics known as tuple calculus, he demonstrated that such a system could support all the operations of normal databases (inserting, updating etc.) as well as providing a simple system for finding and returning sets of data in a single operation.
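As a rough illustration of the set-oriented idea, the sketch below uses SQL (the later descendant mentioned above, not Codd's own notation) through Python's sqlite3 module. The table names, column names and sample data are assumptions. A single declarative statement returns the whole set of matching rows, and the looping happens inside the DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users  (login TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE phones (login TEXT NOT NULL REFERENCES users(login),
                         phone TEXT NOT NULL);
    INSERT INTO users  VALUES ('alice', 'Alice Example'), ('bob', 'Bob Example');
    INSERT INTO phones VALUES ('alice', '555-0100'), ('alice', '555-0101');
    """
)

# One statement re-links users and phone numbers by key. LEFT JOIN keeps
# users who have no phone rows; their phone column comes back as None.
rows = conn.execute(
    """
    SELECT u.name, p.phone
    FROM users AS u
    LEFT JOIN phones AS p ON p.login = u.login
    ORDER BY u.login, p.phone
    """
).fetchall()
print(rows)
```

The application code contains no loop over tables at all; it only describes which set of rows it wants.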
Codd's paper was picked up by two people at Berkeley, Eugene Wong and Michael Stonebraker. They started a project known as INGRES using funding that had already been allocated for a geographical database project, with student programmers producing the code. Beginning in 1973, INGRES delivered its first test products, which were generally ready for widespread use in 1979. During this time a number of people moved "through" the group: perhaps as many as 30 people worked on the project, about five at a time. INGRES was similar to System R in a number of ways, including the use of a "language" for data access, known as QUEL. QUEL was in fact relational, having been based on Codd's own Alpha language, but has since been corrupted to follow SQL, thus violating much the same concepts of the relational model as SQL itself.
IBM itself did only one test implementation of the relational model, PRTV, and a production one, Business System 12, both now discontinued. Honeywell did MRDS for Multics, and now there are two new implementations: Alphora Dataphor and Rel. All other DBMS implementations usually called relational are actually SQL DBMSs.
SQL DBMS
IBM started working on a prototype system loosely based on Codd's concepts as System R in the early 1970s. Unfortunately, System R was conceived as a way of proving Codd's ideas unimplementable, and thus the project was delivered to a group of programmers who weren't under Codd's supervision, never understood his ideas fully and ended up violating several fundamentals of the relational model. The first "quickie" version was ready in 1974 or 1975, and work then started on multi-table systems in which the data could be broken down so that all of the data for a record (much of which is often optional) didn't have to be stored in a single large "chunk". Subsequent multi-user versions were tested by customers in 1978 and 1979, by which time a standardized computer language, SQL, had been added. Codd's ideas were establishing themselves as both workable and superior to Codasyl, pushing IBM to develop a true production version of System R, known as SQL/DS, and, later, Database 2 (DB2).
Many of the people involved with INGRES became convinced of the future commercial success of such systems, and formed their own companies to commercialize the work but with an SQL interface. Sybase, Informix, NonStop SQL and eventually Ingres itself were all being sold as offshoots of the original INGRES product in the 1980s. Even Microsoft SQL Server is actually a re-built version of Sybase, and thus of INGRES. Only Larry Ellison's Oracle started from a different chain: it was based on IBM's papers on System R, and beat IBM to market when its first version was released in 1978.
Stonebraker went on to apply the lessons from INGRES to develop a new database, Postgres, now known as PostgreSQL. PostgreSQL is now one of the most widely used databases in the world, primarily for global mission critical applications (the .org and .info domain name registries use it as their primary data store, as do many large companies and financial institutions).
Codd's paper was also read in Sweden, where Mimer SQL was developed from the mid-1970s at Uppsala University; in 1984 this project was consolidated into an independent enterprise. In the early 1980s Mimer introduced transaction handling for high robustness in applications, an idea that was subsequently implemented in most other DBMSs.
Multidimensional DBMS did have one lasting impact on the market: they led directly to the development of object database systems. Based on the same general structure and concepts as the multidimensional systems, these new systems allowed the user to store objects directly in the database. That is, the programming constructs being used in the object oriented (OO) programming world could be used directly in the database, instead of first being converted to some other format.
This could happen because of the multidimensional system's concepts of ownership. In an OO program a particular object will typically contain others; for example, the object representing Bob may contain a reference to a separate object referring to Bob's home address. Adding support for various OO languages and polymorphism re-created the multidimensional systems as object databases, which continue to serve a niche today.
Description
A DBMS can be an extremely complex set of software programs that controls the organization, storage and retrieval of data (fields, records and files) in a database. The basic functionalities that a DBMS must provide are:
- A modeling language to define the schema of each database hosted in the DBMS, according to the DBMS data model.
  - The three most common organizations are the hierarchical, network and relational models. A database management system may provide one, two or all three methods. Inverted lists and other methods are also used. The most suitable structure depends on the application and on the transaction rate and the number of inquiries that will be made.
The dominant model in use today is the ad hoc one embedded in SQL, a corruption of the relational model that violates several of its fundamental principles. Many DBMSs also support the Open Database Connectivity (ODBC) API, which gives programmers a standard way to access the DBMS.
When a DBMS is used, information systems can be changed much more easily as the organization's information requirements change. New categories of data can be added to the database without disruption to the existing system.
Organizations may use one kind of DBMS for daily transaction processing and then move the detail onto another computer that uses another DBMS better suited for random inquiries and analysis. Overall systems design decisions are performed by data administrators and systems analysts. Detailed database design is performed by database administrators.
Database servers are specially designed computers that hold the actual databases and run only the DBMS and related software. Database servers are usually multiprocessor computers, with RAID disk arrays used for stable storage. Connected to one or more servers via a high-speed channel, hardware database accelerators are also used in large volume transaction processing environments.
See also
- Data warehouse
- Directory service
- Distributed Database Management System
- Navigational database management system
- Hierarchical database management system
- Network database management system
- Object-oriented database management system (OODBMS)
- Relational database management system (RDBMS)
- Object-relational database management system (ORDBMS)
- SQL is a language for database management.
- This article was originally based on material from the Free On-line Dictionary of Computing, which is licensed under the GFDL.
From Wikipedia, the Free Encyclopedia.
All text is available under the terms of the GNU Free Documentation License.
