The Problem
- You have a business.
- Your business has lots of data.
- You need to store, and (later) add to, modify and retrieve this
data.
- You may need your customers to access some of this data as
well.
- You need to think about how to store your data so that all of this
will be "easy".
Possible Solutions (neither very good)
- Store it in files.
- Store it in one gigantic spreadsheet.
A Better Solution
Use a "database". More specifically, use a relational
database, which is currently the most widely used of the various
choices.
Origin of the Relational Database
E. F. Codd of IBM "invented" the relational database about 1970.
Subsequently he developed the concept of normalization to help
organize relational databases.
What is a relational database?
- It's a collection of tables with, of course, rows and columns.
- Each row corresponds to an entity or record of some kind.
- Each column corresponds to an attribute or property of the
entity.
- Tables have keys (special attributes) to identify their rows.
- Each value in a primary key column must uniquely identify its
row.
- A primary key from one table may appear as a "foreign" key in
another table.
- Tables can be related (by their keys) in various ways:
- one-to-one: SIN number and person, for example
- one-to-many: gender to person, for example
- many-to-many: CD to artist, for example (but it's better to
have several one-to-many)
Database Design
- The design makes or breaks the database ... you need a good
one.
- Your design may work now ... but will it work later?
- It's like building your own house ... you need to be both architect
and builder.
Database Design Goals
- Minimize redundancy.
- Minimize anomalies (insert, update, delete).
- Minimize storage.
- Maximize speed.
- Maximize user-friendliness for all users (both corporate and
client).
Questions to Ask
- How many tables do we need?
- What should go into each table?
- What are the relationships between the tables?
Normalization (not necessarily the answer to our prayers, but a good
place to start)
The goal of "normalization" is to ensure that every non-key column in
every table is directly dependent on the key, the whole key, and nothing
but the key. Normalization may help us to achieve our design goals.
Here is a brief description of the first three "normal forms"
developed by Codd:
- First Normal Form (1NF) Each row must contain the same
number of columns and each column must contain only "atomic" values
(each value represents a single piece of information).
- Second Normal Form (2NF) Each attribute in a row represents
a fact about the entity identified by the primary key in that row. So,
no non-key attributes can depend on a portion of the primary key. This
may happen if the primary key is a compound key (composed of more than
a single atrribute), which is possible. Thus 2NF is only relevant if
there is a compound key.
- Third Normal Form (3NF) No attributes depend on other
non-key attributes.
Normal forms are cumulative. This means that to be in second
normal form, a table (or database) must first be in first normal form ...
and so on, for higher normal forms.