Db Design Tips



Database design guide

Copyright 2001 TechRepublic, Inc. www.techrepublic.com. All rights reserved.


TechRepublic's database design guide

If an enterprise's data is its lifeblood, then the database design can be the most important part of an application. Volumes have been written on this topic, and entire college degree programs have been built around it. However, as has been said time and time again on TechRepublic, there's no teacher like experience. So we recently asked our members to share their experience by providing their favorite database design tips. Developer Republic's editors selected the 60 best tips from the more than 130 responses we received. We then compiled them into this document, organized into five sections for ease of reference:

Section 1: Before you build
Here are 12 tips on laying the groundwork for your project, from naming conventions to gathering business requirements.

Section 2: Table design
These 24 guidelines cover everything from fields you should include in every table to common pitfalls and how to avoid them.

Section 3: Key selection
What should your keys be? Here are 10 tips on the correct use of system-generated primary keys and when (and how) to index fields for best performance.

Section 4: Ensuring data integrity
Find out how to help your database keep itself clean and healthy. These eight tips focus on keeping bad data to a minimum.

Section 5: Miscellaneous tips
The collection of tips wraps up with everything that didn't fit into the first four sections: six general rules of thumb to make your life easier.

Enjoy!






Section 1: Before you build

1. Do your homework

Not only should you research your business needs when designing a new database, but you should also check out the existing system. Few database projects are built from scratch; there is almost always an existing system (maybe not computerized) that the organization is using to fulfill its needs. Obviously, the existing system is not perfect; otherwise, you wouldn't be building a new one. But by studying it, you may discover nuances that you would otherwise have missed. If nothing else, examining the existing system is usually good for a chuckle or two.

Lamont Adams

I took on a side project, for a local transportation company, to develop a simple storage app in Access. I laid out the parameters of the project, reviewed them with the customer, previewed a working model that worked perfectly in our development environment, and finally deployed the app, which promptly developed terminal whooping cough and died right in front of my eyes! It took hours of hair pulling before I realized the company had two database apps running on the network that required explicit and very restrictive user accounts and permissions. More hours and less available hair later, a solution was finally created (using the customer's system), but not before some very embarrassing moments. Moral of the story: Do your homework and remember that if you're developing an app in a common environment such as Access or Interbase, dig a little deeper than the surface.

kg

2. Define a standard object-naming scheme

Always define a strategy for naming your DB objects. For tables, at the start of a project, decide whether they will all be plural or singular and stick to it. Define simple rules for aliases of tables (for example, the first four letters from tables with one-word names, the first two from each word with two-word names, one letter from the first two words and two from the last for three-word tables, and so on). For work tables, prefix the table name with WORK_ and append the name of the program that uses it. For columns, use a set of rules for keys. For example, if it is numeric, use _NO as the suffix; if it is character, use _CODE. Use standard prefixes and suffixes for column names. For instance, if you have a lot of money fields, add the suffix _AMT to each column. A useful rule for date columns is to always start the column name with DATE_.

richard






Watch your naming conventions between table names, report names, and query names. It can get very confusing very fast as to what you are working with and where it's at. If you insist on naming these components identically, at least identify them with table, query, or report at the beginning of the name of the object.

rrydenm

With Microsoft Access, it is acceptable to use qry, rpt, tbl, and mod to identify objects (e.g., tbl_Employees). When I deal with SQL Server (or Oracle) I still use tbl to reference tables, but I use sp_company (currently sp_feft_) for stored procedures, because I sometimes keep copies of special ones I've written if I've found a clever way to do something, and v_ for views. When we implement SQL Server 2000, I will use udf_ (or something similar) for the functions I write.

Timothy J. Bruce

3. Plan ahead

Back in the early 1980s, when working with an asset ledger system and a System 38, I had an opportunity to make all the date fields so they would handle the year 2000 problem without a lot of extra work. Many people said that I should forget about working on the problem because it would take too much effort to deal with it. (This was long before it became known as the Y2K problem.) I bit the bullet back then, planning ahead. It took a couple of weeks to make all the changes in the set of programs. But because of that preplanning, Y2K mods should have been minimal. (The last I heard, the programs were going strong on an AS/400 in 1995. The only problem with them at that time was the removal of comments from the source code.)

generalist

4. Get The Data Model Resource Book

For those looking for sample models, The Data Model Resource Book by Len Silverston, W. H. Inmon, and Kent Graziano is the best data modeling book you can have. This book includes chapters on many common data areas, such as people, organizations, and work effort.

minstrelmike

5. Think about the future, but don't forget past lessons

I have always found it useful to ask the users how they see their requirements changing in the future. This accomplishes two things: First, it gives you a good idea of where the design has to be especially flexible and avoid performance bottlenecks, and second, you know that if changes occur that were not on this list, the user group will be as surprised as you are.

chrisdk

Remember the past as well! This is where experience really pays off, and we developers may be able to help each other by sharing our own. Even if users think that they will never need to have more than one phone number or need to separate first name from middle name, we should try to sell them on it. We all have had those "if only I'd done it this way" moments.

dhattrem

6. Do logical design before physical design

Use logical design before diving into physical design. With the number of CASE tools available that allow design at a logical level, you can usually get a better overall understanding of what is needed in the database as a whole.

chardove

7. Know the business

Echoing [tip number 1] a little, do not put a single table on your ER model [you do have a model, right? See tip number 9] before you are 100 percent sure about what the system is supposed to do from the client's point of view. This knowledge will save you a lot of time on the next stages. Once you know the business requirements, you can make a lot of decisions on your own.

rangel

If I may expand on this a bit: Once you think you know the business, do a quick whiteboard ERD with the client. Use the client's terms and try to explain back to them what you think you heard. By also expressing the cardinality of each relation, in terms of may, will, must, etc., you can get the client to correct your understanding and then put in a much better starting ERD.

teburlew

8. Create a data dictionary and ER diagram

Always take the time to create an ER diagram and data dictionary. They should contain at least the data types of each field and what the primary and foreign keys are in each table. These are time-consuming to create but are essential for other developers to understand the design. Creating one early helps avoid a lot of confusion later and allows anyone who understands databases to figure out how to retrieve data from the database.

bgumbert

I can't stress enough how important it is to keep up-to-date documentation: ER diagrams, which are very useful for showing relationships between tables, and a data dictionary that describes what each field is used for and any aliases that may exist. Documenting SQL statements is a must as well.

vanduin.chris.cj






9. Always create a model

A picture is worth a thousand words: Not only can the developer read and implement it, but it can be used to talk with the user. That helps promote a collaborative approach and it's less likely that large holes will be present in the first database design. The model doesn't have to be grandiose; it could even be simply handwritten on a piece of paper. Just making sure the logical relationships hang together will yield a huge benefit later on.

Dana Daigle

10. Design from the output in

When defining database table and field requirements (inputs), first look at existing or desired reports, queries, and screens (outputs) to determine what tables and fields will be necessary to support these outputs. A simple example would be: If the customer will require a report that sorts, breaks, or subtotals by ZIP code, be sure to include a separate ZIP code field rather than lump the ZIP code into an address field.

peter.marshall
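To illustrate the ZIP code point, here is a minimal sketch using Python's built-in sqlite3 module. The customer table, its columns, and the sample rows are hypothetical and chosen only for illustration; the idea is simply that a report subtotaled by ZIP code becomes a plain GROUP BY once the ZIP code has its own field.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE customer (
           customer_id INTEGER PRIMARY KEY,
           street      TEXT,
           city        TEXT,
           zip_code    TEXT,   -- kept separate so reports can group on it
           balance     REAL
       )"""
)
conn.executemany(
    "INSERT INTO customer (street, city, zip_code, balance) VALUES (?, ?, ?, ?)",
    [
        ("1 Main St", "Louisville", "40202", 100.0),
        ("2 Oak Ave", "Louisville", "40202", 50.0),
        ("3 Elm Rd", "Nashville", "37201", 75.0),
    ],
)

# The "subtotal by ZIP code" report is a simple GROUP BY on the separate field.
subtotals = conn.execute(
    "SELECT zip_code, SUM(balance) FROM customer "
    "GROUP BY zip_code ORDER BY zip_code"
).fetchall()
print(subtotals)
```

Had the ZIP code been buried inside a free-form address field, the same report would need string parsing before it could group anything.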

11. Reporting tips

Understand how users will report on the data most often: batch or online? By day, week, month, quarter, or year? Consider creating summary tables if needed. System-generated primary keys are difficult to manage for reporting. Users perform lookups against secondary keys on tables with system-generated primary keys, often returning many duplicates. The performance is generally awful, and the confusion is high.

kol

12. Make sure you understand the customer

This may seem obvious, but requirements come from the customer (think both internal and external customers here). Don't depend on what the customer writes in requirements being what he/she really wants. Ask for the interpretations of what the requirements "say" and, as development proceeds, check back with the customer to ensure that his/her needs are still being met. Invariably, an "I'll know what I want when I see it" approach will cause major rework when the database doesn't deliver something that the customer never wrote down. Worse yet, your interpretation of their requirements only belongs to you and might be incorrect.

kgilson






Section 2: Table design and field selection

1. Remember to audit changes over time

Whenever I design a new database, I consider which data fields may change over time. The obvious example here, and one that is very commonly overlooked, is last name. Whenever I build a system to store customer information, I tend to store the last name field (and other transitory data items) in a separate table along with additional data fields for Date From and Date To, in order to track all changes to that data item.

Shropshire Lad
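One way the Date From / Date To idea could look in practice is sketched below with Python's sqlite3 module. The table name customer_last_name, the column names, and the two helper functions are assumptions made for this example, not part of the tip; a NULL date_to is used here to mark the name currently in effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE customer_last_name (
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        last_name   TEXT NOT NULL,
        date_from   TEXT NOT NULL,   -- ISO dates compare correctly as text
        date_to     TEXT             -- NULL marks the current name
    );
    """
)

def change_last_name(conn, customer_id, new_name, on_date):
    """Close the currently open row, then add the new name as the open row."""
    conn.execute(
        "UPDATE customer_last_name SET date_to = ? "
        "WHERE customer_id = ? AND date_to IS NULL",
        (on_date, customer_id),
    )
    conn.execute(
        "INSERT INTO customer_last_name (customer_id, last_name, date_from) "
        "VALUES (?, ?, ?)",
        (customer_id, new_name, on_date),
    )

def last_name_on(conn, customer_id, on_date):
    """Return the last name in effect on the given date, or None."""
    row = conn.execute(
        "SELECT last_name FROM customer_last_name "
        "WHERE customer_id = ? AND date_from <= ? "
        "AND (date_to IS NULL OR date_to > ?)",
        (customer_id, on_date, on_date),
    ).fetchone()
    return row[0] if row else None

conn.execute("INSERT INTO customer (customer_id) VALUES (1)")
change_last_name(conn, 1, "Jones", "1998-01-01")
change_last_name(conn, 1, "Smith", "2001-06-15")
print(last_name_on(conn, 1, "2000-01-01"))
```

Because old rows are closed rather than overwritten, a question such as "what was this customer called when the order was placed?" stays answerable.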

2. Use meaningful field names

I once worked on a project I inherited from another programmer who liked to name fields using the name of the on-screen control that displayed the data from that field. That's all well and good, but unfortunately, she also liked to name her controls using some strange convention that combined Hungarian notation with the order in which she added the controls to the UI: cbo1, txt2, txt2_b, and so on.

Unless you are using a system that restricts you to short field names, make them as descriptive as possible (within reason, of course). It's possible to go overboard with this. Customer_Shipping_Address_Street_Line_1 is very descriptive and meaningful, but no one would want to have to type it more than once.

Lamont Adams

3. Use prefixes for recurring names

If you have fields of the same type (like a FirstName) in multiple tables, name them with a table-specific prefix (CusLastName). This helps keep your sanity when you start doing joins.

notoriousDOG

4. Provide auditing for time-sensitive data

For time-sensitive data, include a "last updated date/time" field. Time stamps can be useful for debugging data problems, reprocessing/reloading data by date, and purging old data.

kol

5. Normalize and data-drive

Normalize to at least 3rd Normal Form. You will make life so much easier for yourself and others down the road if you keep your data normalized. Put as much in the database as you can. For example, if your UI accesses outside data sources (flat files, XML documents, other databases, etc.), store the connection or path information in support tables for your UI. Also, if the UI performs tasks like workflow (sends e-mails, prints letters, changes record status), store the data to generate the workflow in the database as well. It takes a little more effort up front, but if those processes are data-driven rather than hard-coded in the UI, policy changes and maintenance are much easier. In fact, if the process is data-driven, you can give much of that responsibility back to the users and let them maintain their own workflow processes and change them without coming to you.

tduvall

6. Normalize, but don't overnormalize

For those unfamiliar with the term, normalization helps eliminate the redundancy of data in a database by ensuring that all fields in a table are atomic. There are several forms of normalization, but the Third Normal Form (3NF) is generally regarded as providing the best compromise between performance, extensibility, and data integrity. Briefly, 3NF states that:

Each value in a table is to be represented only once.

Each row in a table should be uniquely identifiable. (It should have a unique key.)

No nonkey information that relies upon another key should be stored in the table.

Databases in 3NF are characterized by a group of tables storing related data that is joined together through keys. For example, a 3NF database for storing customers and their related orders would likely have two tables: Customer and Order. The Order table would not contain any information about an order's related customer. Instead, it would store the key that identifies the row containing the customer's information in the Customer table.

Higher levels of normalization exist, but is more normal necessarily better? Not always. In fact, for some projects, even 3NF may introduce too much complexity into the database to be worth the rewards.

Lamont Adams

There are many legitimate instances where a denormalized table is necessary for speed. I'm in the middle of a financial analyzer in which some 40-second queries were reduced to a couple of seconds with a denormalized table. When I have to do that, I never put the denormalized tables in the basic design. Instead, I make them derivative, so that it is always possible to regenerate the denormalized table from the original if it gets corrupted. It is not terribly difficult to keep the denormalized tables up to date with triggers and the like or even to do a union of the denormalized table to a certain date and do joins on the normalized tables later.

epepke
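The "derivative" approach described above might be sketched as follows, using Python's sqlite3 module. The table names (customer, customer_order, order_summary) and the rebuild_order_summary helper are hypothetical; the point is only that the denormalized table is built entirely from the normalized ones, so it can be dropped and regenerated at any time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Normalized base tables (illustrative names).
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        total       REAL NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO customer_order VALUES (10, 1, 25.0), (11, 1, 75.0), (12, 2, 40.0);
    """
)

def rebuild_order_summary(conn):
    """Recreate the derived table entirely from the normalized tables."""
    conn.executescript(
        """
        DROP TABLE IF EXISTS order_summary;
        CREATE TABLE order_summary AS
            SELECT c.customer_id, c.name,
                   COUNT(o.order_id) AS order_count,
                   COALESCE(SUM(o.total), 0) AS order_total
            FROM customer AS c
            LEFT JOIN customer_order AS o ON o.customer_id = c.customer_id
            GROUP BY c.customer_id, c.name;
        """
    )

rebuild_order_summary(conn)
summary = conn.execute(
    "SELECT name, order_count, order_total FROM order_summary ORDER BY name"
).fetchall()
print(summary)
```

A full rebuild is the simplest refresh strategy; the trigger-based or date-bounded refreshes mentioned in the tip would be refinements on top of it.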






7. Microsoft Access reporting tip

If you're using Microsoft Access, use user-friendly field names instead of coded names: Customer Name instead of txtCNaM. That way, when you use the wizards for forms and reports, the names will be something people can read, not geek-speak.

jwoodruf

8. Inactive or unused indicator

One thing I have found helpful is to add a field to indicate if the record is no longer active in the business. Be it a customer, an employee, or a widget, it helps to be able to filter on active or inactive status when running queries. This eliminates a lot of questions when a new user is working on the data and prevents problems associated with deleting records once they are no longer used.

theoden

9. Use role entities to define columns belonging to a category

When you need to define things as belonging to a specific category or having a specific role, use a role entity intersection to create specific relations that are time-bound and therefore self-documenting.

Rather than having a PERSON entity with a Title field in it, why not have a PERSON entity and a PERSON_TYPE entity to describe that person. Then, when John Smith, Engineer gets promoted to John Smith, Director and finally to John Smith, CIO, all you need do is change the key of the relationship between two tables, PERSON and PERSON_TYPE, and add a date/time field to know when the change occurred. This way, your PERSON_TYPE table contains all possible types of PERSON, such as: Associate, Engineer, Director, CIO, CEO, etc.

The alternative is to always change the PERSON record to reflect new titles, and you lose your audit trail as to what timeframe each individual was in which position.

teburlew
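One possible reading of the role entity idea is sketched below with Python's sqlite3 module. The intersection table person_role and the effective_from column are assumptions introduced for this example; the tip itself names only PERSON and PERSON_TYPE. Each promotion adds a dated row instead of overwriting a title.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        full_name TEXT NOT NULL
    );
    CREATE TABLE person_type (
        person_type_id INTEGER PRIMARY KEY,
        type_name      TEXT NOT NULL UNIQUE
    );
    -- Hypothetical intersection table: one dated row per role held.
    CREATE TABLE person_role (
        person_id      INTEGER NOT NULL REFERENCES person(person_id),
        person_type_id INTEGER NOT NULL REFERENCES person_type(person_type_id),
        effective_from TEXT NOT NULL
    );
    INSERT INTO person VALUES (1, 'John Smith');
    INSERT INTO person_type VALUES (1, 'Engineer'), (2, 'Director'), (3, 'CIO');
    INSERT INTO person_role VALUES
        (1, 1, '1995-03-01'), (1, 2, '1998-07-01'), (1, 3, '2001-01-15');
    """
)

# The full title history doubles as the audit trail.
history = conn.execute(
    """SELECT t.type_name, r.effective_from
       FROM person_role AS r
       JOIN person_type AS t ON t.person_type_id = r.person_type_id
       WHERE r.person_id = 1
       ORDER BY r.effective_from"""
).fetchall()
current_title = history[-1][0]
print(current_title, history)
```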

10. Use generic entity names to organize data

The simplest way to organize data is by using generic names: PERSON, ORGANIZATION, ADDRESS, PHONE, etc. When you combine these or create specific unique secondary (subtype) entities of these, you can get specific. The main reason for using generic terms to start is that all business people can conceptualize in the abstract.

Once you have these generic abstracts, you can get very specific in the secondaries. For instance, PERSON can be Employee, Spouse, Patient, Client, Customer, Vendor, Teacher, etc. Likewise, ORGANIZATION can be MyCompany, MyDepartment, Competitor, Hospital, Warehouse, Government, etc. And finally, ADDRESS can be Site, Location, Home, Work, Client, Vendor, Corporate, FieldOffice, etc.

By using generic abstract terms to identify classes of "things," you gain the greatest flexibility in relating the data to meet business needs while at the same time reducing the amount of redundant storage you need for the data.

teburlew

11. Remember that there may be users outside the United States

When designing a database that will be used on the Web or other international stage, remember that most countries have a different format for fields like ZIP/post codes and some, like New Zealand, do not use these codes.

billh

12. If it's repetitive, it needs a separate table

If you find yourself repeating an entry, make a new table and a new relationship.

Alan Rash

13. Three useful fields that should be added to every table

dRecordCreationDate, default to Now() in VB or GETDATE() in SQL Server

sRecordCreator, default to NOT NULL DEFAULT USER in SQL Server

nRecordVersion, the version identifier of the record; helps to accurately interpret any missing or null data in that record

Peter Ritchie
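As a rough sketch of these three fields, the example below uses Python's sqlite3 module rather than SQL Server, so the defaults differ from those named in the tip: SQLite's CURRENT_TIMESTAMP stands in for GETDATE(), and because SQLite has no USER function the application passes the creator in explicitly. The widget table and the 'app_user' value are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE widget (
           widget_id           INTEGER PRIMARY KEY,
           name                TEXT NOT NULL,
           dRecordCreationDate TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
           sRecordCreator      TEXT NOT NULL,
           nRecordVersion      INTEGER NOT NULL DEFAULT 1
       )"""
)

# SQLite has no USER function, so the creator is supplied by the application.
conn.execute(
    "INSERT INTO widget (name, sRecordCreator) VALUES (?, ?)",
    ("sprocket", "app_user"),
)

created, creator, version = conn.execute(
    "SELECT dRecordCreationDate, sRecordCreator, nRecordVersion FROM widget"
).fetchone()
print(created, creator, version)
```

The insert names only the business column and the creator; the creation date and record version are filled in by the column defaults.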

14. Multiple fields for Address & Phone

One line for the street address is no longer enough. Address_Line1, Address_Line2, Address_Line3 offers more flexibility. Also, telephone numbers and e-mail addresses are no longer address-specific. They probably need their own tables, with type and some kind of preferred flag.

dwnerd

Be careful not to overnormalize, which can lead to performance problems. While separate address and phone tables commonly are best, it may be appropriate to store the preferred information in a parent table (e.g., Customer) if you will need to access it often. The trade-off between normalization and speed of access can be significant.

dhattrem

15. Use multiple name fields

I'm amazed by how many people make name one field in a database. I tend to think that is the sign of a beginning developer, but having seen it enough times on Web sites, I'm not so sure. So enter the first name and last name as separate fields (include the middle initial field if it's appropriate); then concatenate the fields later in your queries.

klempan

Klempan isn't the only one to notice widespread use of a single name field. You have several options for making it user-friendly. One of my favorites is to simply create a computed column in the same table that will automatically concatenate the normalized fields yet change when the data changes. However, this can get tricky when using modeling software. A view is also a great way to insulate users/developers from the tedium of concatenating fields.

damon
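The view option could look something like the sketch below, written with Python's sqlite3 module. The customer table, the customer_display view, and the formatting of the middle initial are all assumptions for illustration. The view concatenates the separate name fields on demand, so it always reflects the current data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (
        customer_id    INTEGER PRIMARY KEY,
        first_name     TEXT NOT NULL,
        middle_initial TEXT,
        last_name      TEXT NOT NULL
    );
    -- The view hides the concatenation from users and developers.
    CREATE VIEW customer_display AS
        SELECT customer_id,
               first_name
               || CASE WHEN middle_initial IS NULL THEN ''
                       ELSE ' ' || middle_initial || '.' END
               || ' ' || last_name AS full_name
        FROM customer;
    INSERT INTO customer (first_name, middle_initial, last_name)
        VALUES ('Ada', NULL, 'Lovelace'), ('John', 'Q', 'Public');
    """
)

names = [
    row[0]
    for row in conn.execute(
        "SELECT full_name FROM customer_display ORDER BY customer_id"
    )
]

# Changing a base field is picked up by the view with no extra work.
conn.execute("UPDATE customer SET last_name = 'King' WHERE customer_id = 1")
updated = conn.execute(
    "SELECT full_name FROM customer_display WHERE customer_id = 1"
).fetchone()[0]
print(names, updated)
```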

16. Watch out for mixed-case object names and special characters

Something that has caused me grief in the past is when an existing database I had to work with had mixed-case object names (CustomerData). The problem I ran into was porting from Access to Oracle. I didn't want mixed-case objects, so I had to change them manually. Will this database/application grow to need a larger, more powerful database someday? Use uppercase and include the underscore character for better readability (CUSTOMER_DATA). Another big no-no is putting spaces in object names.

bfren

17. Watch out for reserved words

Make sure that none of your field names are reserved words, either for your database system or commonly used access methods. As an example, I recently ODBC-ed to a table that used DESC as the field name for description. Choke! DESC is a reserved word abbreviation for DESCENDING. A SELECT * on the table worked, but I wound up pulling a bunch of extra useless stuff across the wire.

Daniel Jordan

18. Be consistent with field names and types

When naming fields and specifying their data types, be consistent. If the field is called agreement_number in one table, don't change the name to ref1 in another. If the data type is integer in one table, don't make it char in another. Remember other people will have to work and understand the database after you've moved to the well-paying job where you're more appreciated.

setanta

19. Choose numeric types carefully

Beware using smallint and tinyint types in SQL. It may be tempting, but remember that the field type must accommodate any calculations you wish SQL to perform. For example, if you want to see total sales for a month, and your invoice total field is smallint, you won't be able to perform the calculation if the total is over $32,767.

egermain
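The arithmetic behind the $32,767 limit can be checked with a few lines of Python. This is only a sketch of the range problem: the invoice figures are made up, and whether a given database raises an overflow error or silently widens the result depends on the product.

```python
# A signed 16-bit smallint holds values from -32768 to 32767.
SMALLINT_MIN = -(2 ** 15)
SMALLINT_MAX = 2 ** 15 - 1

def fits_smallint(value):
    """Return True if the value can be stored in a signed 16-bit integer."""
    return SMALLINT_MIN <= value <= SMALLINT_MAX

# Hypothetical invoice totals: each one fits on its own...
invoice_totals = [12000, 15000, 9000]
# ...but the monthly total does not.
monthly_total = sum(invoice_totals)

print(all(fits_smallint(v) for v in invoice_totals))
print(monthly_total, fits_smallint(monthly_total))
```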






20. Flag for deletion rather than delete

Include a "delete flag" field so rows can be flagged for deletion. Never delete rows individually in a relational database; always use purge programs and be careful to maintain referential integrity.

kol
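A minimal sketch of the flag-then-purge pattern follows, using Python's sqlite3 module. The is_deleted column, the table names, and the flag_deleted and purge helpers are hypothetical. The purge removes dependent rows before their parents so that referential integrity holds throughout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        is_deleted  INTEGER NOT NULL DEFAULT 0   -- the "delete flag"
    );
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
    );
    INSERT INTO customer (customer_id, name) VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO customer_order VALUES (10, 1);
    """
)

def flag_deleted(conn, customer_id):
    """Mark a row for deletion without removing it."""
    conn.execute(
        "UPDATE customer SET is_deleted = 1 WHERE customer_id = ?",
        (customer_id,),
    )

def purge(conn):
    """Remove flagged customers, child rows first, as one unit of work."""
    with conn:
        conn.execute(
            "DELETE FROM customer_order WHERE customer_id IN "
            "(SELECT customer_id FROM customer WHERE is_deleted = 1)"
        )
        conn.execute("DELETE FROM customer WHERE is_deleted = 1")

flag_deleted(conn, 1)
active = conn.execute(
    "SELECT name FROM customer WHERE is_deleted = 0"
).fetchall()
purge(conn)
remaining_orders = conn.execute(
    "SELECT COUNT(*) FROM customer_order"
).fetchone()[0]
remaining_customers = conn.execute(
    "SELECT COUNT(*) FROM customer"
).fetchone()[0]
print(active, remaining_orders, remaining_customers)
```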

21. Avoid triggers

There are usually other ways to accomplish what a trigger does. Triggers can become gotchas later when trying to debug a problem. If you absolutely have to use a trigger, document it centrally.

kol

22. Include a versioning mechanism

One suggestion that has always served me well is to include some mechanism in the database to determine which "version" of the database you are using. No matter how hard you try to fix requirements, over time your users' requirements will almost always change. Eventually, this may require a change in your database structure.

Although you can determine what version of the database structure you are looking at by checking for the existence of new fields or indexes, I have always found it most useful to store this explicitly in a table.

Richard Foster
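Storing the version explicitly might look like the sketch below, written with Python's sqlite3 module. The schema_version table, its columns, and the upgrade_to_2 step are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE schema_version (
        version    INTEGER NOT NULL,
        applied_on TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        note       TEXT
    );
    INSERT INTO schema_version (version, note) VALUES (1, 'initial structure');
    """
)

def current_version(conn):
    """Return the highest structure version recorded in the database."""
    return conn.execute("SELECT MAX(version) FROM schema_version").fetchone()[0]

def upgrade_to_2(conn):
    """Apply a structural change, then record the new version."""
    with conn:
        conn.execute(
            "CREATE TABLE customer_phone (customer_id INTEGER, phone TEXT)"
        )
        conn.execute(
            "INSERT INTO schema_version (version, note) "
            "VALUES (2, 'added customer_phone')"
        )

# An application can check the version at startup and upgrade if needed.
if current_version(conn) < 2:
    upgrade_to_2(conn)
print(current_version(conn))
```

Keeping one row per upgrade, rather than a single overwritten value, also leaves a small history of when each structural change was applied.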

23. Make text fields larger than you need

ID-type text fields, such as customer ID or purchase order number, should be made larger than you think you need because you'll end up needing the extra characters before long. For instance, suppose that your customer ID is based upon a 10-digit sequential value. Make the field 12 or 13 characters long instead. Does this waste space? A little, but not as much as you think: A field with three extra characters would only increase the database size by about 3 megabytes if there were a million records in it, plus a little more for a larger index. But the extra space will allow for growth without needing to restructure the entire database at some future date. How many megabytes would you be willing to sacrifice now to avoid having to restructure a dozen, or two dozen, data tables, or having to update a bunch of programs whose subroutines rely on the length of the field a year from now?

tlundin

24. Column naming tip

We find that if you use a unique prefix with a column name for each table, it can greatly simplify the writing of SQL statements. This does have the drawback of breaking those automatic table-linking tools that link on common column names that some databases come with now, but even these tools can sometimes get the join wrong. As a simple example, consider two tables: Customer and Order. The Customer table is given prefix cu_, so its fields would be named: cu_name_id, cu_surname, cu_initials, cu_address, etc. We'll give the Order table the prefix or_ and name its fields or_order_id, or_cust_name_id, or_quantity, or_description, etc.

So the SQL to select a trivial given row from this database looks like this:

    Select * from Customer, Order
    Where cu_surname = 'MYNAME'
    and cu_name_id = or_cust_name_id
    and or_quantity = 1;

While without those field prefixes, it would look like this:

    Select * from Customer, Order
    Where Customer.surname = 'MYNAME'
    and Customer.name_id = Order.cust_name_id
    and Order.quantity = 1;

There is a lot less typing involved in the first SQL statement, even in this trivial example. Expand this to a query with five tables and many more columns and this tip's usefulness becomes more apparent.

Bryce Stenberg






Section 3: Key selection and indexing

1. Plan ahead for data mining

I learned the hard way. After our marketing department called over 80,000 contacts and filled out the necessary data on each customer (no small task, I might add), I decided to target-market certain groups of clients. When I initially designed the tables for the form fields, I tried not to add too many fields to the primary index so as to speed the database up. I then realized that specific group lookups and mining were inaccurate and slow. I rebuilt and remerged the data with the proper fields in the primary index. I found that index planning is critical: why have fax number as a primary indexed field when I want to create lookups for system type instead? I can still search by fax number, but it is not nearly as important to me as the system type. By making the latter a primary field, the database is reindexed as it is updated and searches are much faster.

hscovell

That's the difference between indexing in an operational data store (ODS) environment vs. in a data warehouse (DW) environment. In a DW environment, you need to consider how your marketing department is going to construct their campaigns. They, not the DBA, should be defining what constitutes key information. This is a case where an architect or database marketer should analyze the structure to determine best case for both performance and validity of output.

teburlew

2. Use system-generated primary keys

This is similar to [tip number 1], but I feel it's important to repeat. If you always design your database to use system-generated keys as the primary keys, you control the referential integrity of the database. This way, the databases and not humans control access to each row of data stored.

An added advantage in using system-generated keys for primary keys is that it is easier to identify logic flaws when going through dumps when you have a consistent key structure.

teburlew

3. Break up fields for indexing

Along with separating name fields and inclusion of fields to support user-defined reports, consider breaking other fields, even primary keys, into their component elements so that they may then be indexed. Indexing will increase the speed of execution of SQL and report generator scripts. For example, I routinely create reports where I have to use a SQL LIKE expression because a case number field was not separated into its basic parts of year, serial number, case type, and defendant code. Performance is generally bad, and these reports would run much faster if the year and type fields were separate indexed fields.

rdelval
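The effect described above can be observed in SQLite through Python's sqlite3 module, as in the sketch below. The court_case table, the case number format in the comment, and the index name are all invented for illustration, and query-plan wording varies between SQLite versions, so the example only inspects whether the index is mentioned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE court_case (
        case_id        INTEGER PRIMARY KEY,
        case_number    TEXT NOT NULL,      -- hypothetical form: '2001-000123-CR-A'
        case_year      INTEGER NOT NULL,   -- the same parts, stored separately
        serial_number  INTEGER NOT NULL,
        case_type      TEXT NOT NULL,
        defendant_code TEXT NOT NULL
    );
    CREATE INDEX idx_case_year_type ON court_case (case_year, case_type);
    """
)

def plan(conn, sql):
    """Return SQLite's query-plan detail strings for a statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Pattern matching on the combined field cannot use the index.
like_plan = plan(
    conn,
    "SELECT case_id FROM court_case WHERE case_number LIKE '2001-%-CR-%'",
)
# Equality tests on the separated, indexed fields can.
split_plan = plan(
    conn,
    "SELECT case_id FROM court_case WHERE case_year = 2001 AND case_type = 'CR'",
)
print(like_plan)
print(split_plan)
```

On a table of any real size, the first plan means reading every row, while the second lets the database jump straight to the matching year and type.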

4. Four key rules for keys

Always create foreign keys for linked fields.

All keys should be unique.

Avoid compound keys.

A foreign key should always link to a unique key field.

Peter Ritchie
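A small sketch of these rules in action, using Python's sqlite3 module with hypothetical customer and customer_order tables: the foreign key points at a single-column unique key, and with enforcement switched on the database itself rejects an order for a customer that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,   -- unique, single-column key
        name        TEXT NOT NULL
    );
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        -- The linked field is declared as a foreign key to a unique key.
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
    );
    INSERT INTO customer VALUES (1, 'Acme');
    INSERT INTO customer_order VALUES (10, 1);
    """
)

# An order for a nonexistent customer is refused by the database.
try:
    conn.execute("INSERT INTO customer_order VALUES (11, 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```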

5. Don't forget the index

Indexing is one of the most efficient ways to retrieve your data. Ninety-five percent of my performance and tuning issues have been resolved with an index. As a rule of thumb, I generally use a unique clustered index on the logical primary key, a unique nonclustered index on the system key (for stored procedures), and nonclustered indexes on any foreign key columns. Remember though, indexes are like salt: too much can be a bad thing. Consider how much room you have for the database, how the table is going to be accessed, and whether that access will primarily be for read or write.

tduvall

Most databases index primary key fields automatically upon creation, but don't forget to index foreign key fields; they'll be used every time you want to run a query that shows a record from the primary table and all related tables. Also, don't index memo/notes fields and try to stay away from indexing large (many characters) fields, because this will make your indexes take up more space.

gbrayton

6. Don't index small, high-activity tables

Do not give any keys to small tables, especially if they have high amounts of insert and delete activity. The index maintenance on those inserts and deletes may cost you more time than a table space scan.

kbpatel

7. Never use Social Security Number (SSN) as a key

One should never use SSN as a database key. Aside from the privacy angle and the fact that the government is moving toward disallowing the use of SSN except for income-related purposes, it needs to be hand entered. Never ever use a hand-entered key as the primary key since once you enter it wrong, the only choice you have is to delete the entire record and start over.

teburlew






When I was in college in the 1970s, I recall that the SSN was used as the student ID despite the fact that such usage was illegal. People knew that it was illegal, but they used it anyway. Decades later, as identity theft increases, the college campus I'm on now is going through the pains of removing SSN from those screens and reports that use them but don't need them. It is a major problem, mandated by the state but not funded.

generalist

8. Take the user's keys away

When deciding which field or fields to use as keys in a table, always consider the fields that users will be editing. It's usually a bad idea to choose a user-editable field as a key. Doing so forces you to take one of these two actions:

Restrict the user from editing the field after the record's creation. If you do so, you may discover that your application isn't flexible enough when business requirements suddenly change and users need to edit that uneditable field. What happens when a user makes a mistake in data entry and doesn't notice until the record is saved? Delete and re-create? What if the record isn't re-creatable; suppose the customer left?

Provide some way of detecting and correcting key collisions. Usually, this can be done with some effort, but it is expensive in terms of performance. Also, a key correction may wind up being possible only from outside the data layer, forcing you to break the isolation between your data and business/UI layers.

The underlying maxim here is this: Make your design fit the user; don't make the user fit the design.

Lamont Adams

The reason we don't make primary keys updateable is that in a relational model,they provide the links between the various tables. For example, the Customer tablewill have a primary key (say, CustomerID) and customers will place orders, kept in aseparate table. The primary key of the Order table may well be something likeOrderNo (a unique number) or a composite of OrderNo, CustomerID, and date.Whichever key you choose, you will need to store the CustomerID on the Ordertable to ensure that you can find the record for the customer who placed eachorder.

If you change the CustomerID in the Customer table, you must find all relatedrecords in the Order table and change them too. Otherwise, you will have ordersthat don't belong to a customeryou will upset the referential integrity of yourdatabase.

If referential integrity rules are enforced at table level, which they should be, then it can be almost impossible to change the key of one record and all associated records throughout the database without a lot of code and appending and deleting of records. This process is frequently prone to errors and should be avoided.

ljboast
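
One way to picture the advice above is a small sketch using Python's built-in sqlite3 module. The table and column names (customer_code, order_no, and so on) are assumptions made up for illustration, not part of the original tip. The user-editable code is kept as an ordinary unique column, while a system-generated CustomerID-style key carries the relationship:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.executescript("""
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,   -- system-generated, never offered for editing
    customer_code TEXT NOT NULL UNIQUE   -- hypothetical user-editable identifier
);
CREATE TABLE orders (
    order_no    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);
INSERT INTO customer (customer_id, customer_code) VALUES (1, 'ACME-01');
INSERT INTO orders (order_no, customer_id) VALUES (100, 1);
""")

# The user corrects a typo in the editable code; no order rows need to change.
conn.execute("UPDATE customer SET customer_code = 'ACME-001' WHERE customer_id = 1")

code_for_order = conn.execute("""
    SELECT c.customer_code
    FROM orders o JOIN customer c ON c.customer_id = o.customer_id
    WHERE o.order_no = 100
""").fetchone()[0]

# Changing the key itself is rejected while an order still references it.
try:
    conn.execute("UPDATE customer SET customer_id = 2 WHERE customer_id = 1")
    key_change_rejected = False
except sqlite3.IntegrityError:
    key_change_rejected = True
```

In this sketch the edit to customer_code touches one row, whereas the attempted key change fails outright unless every related order is handled as well.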

9. Candidate keys sometimes make the best primary key

Remember, humans are the ones who have to query the data.

Although not always possible, if you have a candidate key, go ahead and use it as a primary key. That way, you have the value everywhere it is referenced. This keeps people using the database from having to join tables to properly filter data. On a database with tightly controlled domain tables, this overhead can be significant. If something is a true candidate key, it meets criteria for a primary key!

My point is if you have a candidate key, such as state_code in a state table, don't create a sequential key on top of the existing key that cannot change and is unique. You've done nothing but create extra worthless data. Consider the example below:

Now instead of:

SELECT COUNT(*)
FROM address, state_ref
WHERE address.state_id = state_ref.state_id
  AND state_ref.state_code = 'TN'

I do:

SELECT COUNT(*)
FROM address
WHERE state_code = 'TN'

If you get several of these simple joins caused by the overuse of sequential keys in a table, the overhead can really mount.

Stocker
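
For anyone who wants to try the join-free query, here is a minimal runnable sketch using Python's sqlite3 module. The state_ref and address names come from the tip; the remaining columns and the sample rows are assumptions added for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE state_ref (
    state_code TEXT PRIMARY KEY,   -- the candidate key doubles as the primary key
    state_name TEXT NOT NULL
);
CREATE TABLE address (
    address_id INTEGER PRIMARY KEY,
    city       TEXT NOT NULL,
    state_code TEXT NOT NULL REFERENCES state_ref(state_code)
);
INSERT INTO state_ref VALUES ('TN', 'Tennessee'), ('KY', 'Kentucky');
INSERT INTO address (city, state_code) VALUES
    ('Nashville', 'TN'), ('Memphis', 'TN'), ('Louisville', 'KY');
""")

# The filter value is already stored on the address row, so no join is needed.
tn_count = conn.execute(
    "SELECT COUNT(*) FROM address WHERE state_code = 'TN'"
).fetchone()[0]
```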

10. Don't forget your foreign keys

Most databases index primary key fields automatically upon creation. But don't forget to index foreign key fields; they'll be used every time you want to run a query that shows a record from the primary table and its related records. Also, don't index memo/notes fields, and try to stay away from indexing large (many characters) text fields; this will make your indexes take up more space.

gbrayton
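
As a rough illustration, the sketch below uses Python's sqlite3 module, since SQLite is one system that does not index foreign key columns on its own. The table, column, and index names are assumptions. It creates the index explicitly and then asks the query planner whether a lookup by customer would use it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_no    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    notes       TEXT                 -- long free text: deliberately left unindexed
);
-- The foreign key column gets its own index.
CREATE INDEX idx_orders_customer_id ON orders (customer_id);
""")

# Ask SQLite how it would find one customer's orders.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT order_no, notes FROM orders WHERE customer_id = ?",
    (1,),
).fetchall()
plan_text = " ".join(row[-1] for row in plan)  # last column holds the description
```

The exact wording of the plan varies between SQLite versions, but it should name idx_orders_customer_id rather than describe a full scan of orders.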






Section 4: Ensuring data integrity

1. Use constraints to enforce data integrity, not business rules

If you are dealing with requirements that are based on business rules, they should be validated in the business layer/UI: If the business rules later change, updates need only be made in one place.

If the requirements are based on the need to maintain data integrity, they should be validated through constraints in the database layer.

If you do use constraints in the data layer, make sure there is a way to communicate the reason why an update failed a constraint check back to the UI, in language the user understands. Unless you have been very verbose in your field naming, field names themselves are rarely sufficient.

Lamont Adams

Whenever possible, use the database system for data integrity. This not only includes integrity by design through normalization but also by functionality. Add triggers to ensure that data is correct when written. Do not rely on the business layer to ensure data integrity; it can't ensure cross-table (foreign key) integrity, so don't force other integrity rules onto it.

Peter Ritchie
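
Here is one possible sketch of passing a constraint failure back to the UI in words the user understands, written with Python's sqlite3 module. The customer table, the add_customer function, and the message wording are all assumptions for illustration. The error strings it matches are the ones SQLite reports; other database systems word theirs differently:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    email       TEXT NOT NULL UNIQUE
);
""")

# Hypothetical mapping from SQLite's constraint errors to user-facing wording.
FRIENDLY_MESSAGES = [
    ("UNIQUE constraint failed: customer.email",
     "That e-mail address is already registered."),
    ("NOT NULL constraint failed: customer.name",
     "Please enter the customer's name."),
]

def add_customer(name, email):
    """Insert a customer; return None on success or a user-readable message."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO customer (name, email) VALUES (?, ?)", (name, email)
            )
        return None
    except sqlite3.IntegrityError as exc:
        detail = str(exc)
        for fragment, message in FRIENDLY_MESSAGES:
            if fragment in detail:
                return message
        return "The record could not be saved."

first = add_customer("Ann", "ann@example.com")
duplicate = add_customer("Bob", "ann@example.com")
missing_name = add_customer(None, "carl@example.com")
```

The database still does the checking; the lookup list only translates what it reports.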

2. Distributed data systems

For distributed systems, estimate your amount of data after five years (medium) or 10 years (large) before you decide whether to replicate all your data at every site or keep your data only at one place. When you transfer data to other sites, it's better to set some flags in a database field. Update your flags after the targeted sites have received your data. To carry out the transfer, write your own batch processing or scheduling program to run at specific time intervals rather than asking a user to post it at the end of the day. Copy your maintenance data, like calculation constants and interest rates, locally and set a version number to make sure that the data is the same at every site.

SuhairTechRepublic
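
The flag idea might be sketched as follows with Python's sqlite3 module. Everything named here (the rate_change table, the transferred column, and the send_to_site stand-in for a real network transfer) is an assumption for illustration only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rate_change (
    change_id   INTEGER PRIMARY KEY,
    rate        REAL NOT NULL,
    transferred INTEGER NOT NULL DEFAULT 0   -- 0 = not yet received by the target site
);
INSERT INTO rate_change (rate) VALUES (4.25), (4.50), (4.75);
""")

def send_to_site(rows):
    """Hypothetical transport: pretend the remote site acknowledges every row."""
    return [change_id for change_id, _rate in rows]

def run_transfer_batch():
    """Send unflagged rows, then flag only those the target acknowledged."""
    pending = conn.execute(
        "SELECT change_id, rate FROM rate_change WHERE transferred = 0"
    ).fetchall()
    acknowledged = send_to_site(pending)
    with conn:
        conn.executemany(
            "UPDATE rate_change SET transferred = 1 WHERE change_id = ?",
            [(change_id,) for change_id in acknowledged],
        )
    return len(acknowledged)

first_run = run_transfer_batch()   # sends all three rows
second_run = run_transfer_batch()  # nothing left to send
```

A scheduler would call run_transfer_batch at fixed intervals; because the flag is set only after acknowledgement, a failed transfer is simply retried on the next run.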

3. Enforce referential integrity

There is no good way to eliminate bad data after it's in the database, so you should attempt to eliminate it before it's in the database. Activate the database system's referential integrity feature. This will keep your data clean but will force developers to put more time into handling error conditions.

kol
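
To make the trade-off concrete, here is a small sketch with Python's sqlite3 module, where referential integrity has to be switched on per connection. The tables and the place_order function are assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # activate SQLite's referential integrity
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    order_no    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);
INSERT INTO customer VALUES (1, 'Ann');
""")

def place_order(order_no, customer_id):
    """Return True if the order was stored, False if the database rejected it."""
    try:
        with conn:
            conn.execute("INSERT INTO orders VALUES (?, ?)", (order_no, customer_id))
        return True
    except sqlite3.IntegrityError:
        # This is the extra error handling the tip warns developers about.
        return False

valid = place_order(100, 1)
orphan = place_order(101, 999)   # no such customer
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

The orphaned order never reaches the table, at the cost of one more error path in the calling code.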






4. Relationships

If there is a many-to-one relationship between two entities, and there is any remote possibility that it could turn into a many-to-many relationship, make it many-to-many to start with. It is harder to go from an existing many-to-one relationship to a many-to-many relationship than it is to have a many-to-many relationship to begin with.

CS Data Architect
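
A sketch of what starting with many-to-many can look like, using Python's sqlite3 module. The employee and department example is an assumption chosen for illustration. Each employee starts out in one department, but the link lives in its own table, so a second department later needs only one more row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee   (employee_id   INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE department (department_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Link table: one row per (employee, department) pairing.
CREATE TABLE employee_department (
    employee_id   INTEGER NOT NULL REFERENCES employee(employee_id),
    department_id INTEGER NOT NULL REFERENCES department(department_id),
    PRIMARY KEY (employee_id, department_id)
);

INSERT INTO employee   VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO department VALUES (10, 'Sales'), (20, 'Support');
INSERT INTO employee_department VALUES (1, 10), (2, 10);
""")

# Later the rule changes: Ann now also works in Support. No schema change needed.
with conn:
    conn.execute("INSERT INTO employee_department VALUES (1, 20)")

ann_departments = [row[0] for row in conn.execute("""
    SELECT d.name
    FROM employee_department ed
    JOIN department d ON d.department_id = ed.department_id
    WHERE ed.employee_id = 1
    ORDER BY d.name
""")]
```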

5. Use views

To provide another layer of abstraction between your database and your application's code, try building views specifically for the use of your application rather than let it access tables directly. This also provides you with a little more freedom when handling database changes.

Gay Howe
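
The freedom a view buys can be sketched like this with Python's sqlite3 module. The app_customer view and both table layouts are assumptions for illustration. The application function reads only from the view, so the tables underneath can be reorganized without touching it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    first_name  TEXT NOT NULL,
    last_name   TEXT NOT NULL
);
INSERT INTO customer VALUES (1, 'Ann', 'Lee');

CREATE VIEW app_customer AS
    SELECT customer_id, first_name || ' ' || last_name AS full_name
    FROM customer;
""")

def customer_names():
    """Application code reads only from the view, never from the base table."""
    return [row[0] for row in conn.execute(
        "SELECT full_name FROM app_customer ORDER BY customer_id")]

before = customer_names()

# The schema changes: names move to a new table. Redefine the view, not the app.
conn.executescript("""
DROP VIEW app_customer;
CREATE TABLE person (person_id INTEGER PRIMARY KEY, display_name TEXT NOT NULL);
INSERT INTO person SELECT customer_id, first_name || ' ' || last_name FROM customer;
DROP TABLE customer;
CREATE VIEW app_customer AS
    SELECT person_id AS customer_id, display_name AS full_name FROM person;
""")

after = customer_names()
```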

6. Plan for data retention and recovery

Think through the data retention policy and build it into the design. Design your data recovery processes up front; you will need them. Use a data dictionary that can be published to users/developers for easy data identification, and be sure to document data sources. Write online updates to "update queues" that can be used later to reprocess updates in case of data loss.

kol

7. Use stored procedures to let the system do the hard work

Having gone to lots of trouble to generate a high-integrity DB solution, my team (rightly!) decided to encapsulate small groups of functionally related tables by providing a suite of regular stored procedures to access each group in order to speed up and simplify client code development. During this, we found the common approach from 3GL coders was to trap all possible error conditions, as per standard 3GL good practice:

DECLARE @Cnt INT

SELECT @Cnt = COUNT(*)
FROM [<table name>]
WHERE [<primary key column>] = <value>

IF @Cnt = 0
BEGIN
    INSERT INTO [<table name>]
        ( [<primary key column>] )
    VALUES ( <value> )
END
ELSE
BEGIN
    <handle the duplicate key>
END

Whereas one non-3GL coder would rather do the following:

INSERT INTO [<table name>]
    ( [<primary key column>] )
VALUES
    ( <value> )

IF @@ERROR = 2627 -- Literal error code for Primary Key Constraint
BEGIN
    <handle the duplicate key>
END

The second is a lot simpler, and in fact, utilizes the power we have given the database by all that integrity-ensuring effort. Although I personally don't like the use of the embedded literals (2627), that can be easily replaced with a bit of preprocessing. Remember, the DB is not just a repository for data; it can empower and simplify your coding.

a-smith
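
The same insert-first pattern can be tried outside a stored procedure. The sketch below uses Python's sqlite3 module, where a primary key violation surfaces as an IntegrityError exception rather than an error number. The part table and the add_part function are assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part (part_no TEXT PRIMARY KEY)")

def add_part(part_no):
    """Attempt the insert and let the primary key constraint detect duplicates."""
    try:
        with conn:
            conn.execute("INSERT INTO part (part_no) VALUES (?)", (part_no,))
        return "inserted"
    except sqlite3.IntegrityError:
        return "duplicate"

results = [add_part("A-1"), add_part("A-1"), add_part("B-2")]
```

Besides being shorter, this form leaves no gap between a check and the insert in which another session could add the same key.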

8. Make use of lookups

The best way to control data integrity is to limit a user's choices. Wherever possible, have a distinct list of values for a user to select from. This will cut down on typing errors and misunderstandings and provide consistency in the data. Some common data that's good for turning into lookups: state codes, status codes, titles, etc.

CS Data Architect






Section 5: Miscellaneous tips

1. Document, document, document

Document and explain any shorthand, naming conventions, restrictions, functions, etc.

nickypendragon

Use the database facility of commenting tables, columns, triggers, etc. Yes, it is more work but pays huge dividends in the long run, for further development, support, and tracking of modifications.

chardove

Depending on what database system you use, there may be some software that will give you a decent start on the documentation. You might want to start with the largest pieces and work inward, getting more and more detailed. Or you might want to do a lifecycle walkthrough, starting when new data is entered and detailing each piece as you go. No matter how you choose to do it, always document your databases, either within the database itself or in a separate document. That way, when you come back a year from now to do "version 2," or when another developer steps in, you'll be less likely to make any blunders.

mrs_helm

2. Use plain English (or whatever your language is) instead of codes

There are a number of good reasons why we use codes for things (e.g., 9935A might be the supply code for a box of ink pens, 4XF788-Q might be the accounting code for a business you buy things from). That's great, but users tend to think in English, not codes. The accountant who's been there for five years probably knows exactly who 4XF788-Q is, but the new accountant won't have a clue. When creating pull-down menus, lists, reports, etc., sort them by the plain-English names. If you need a code, show the user the plain-English names beside the codes. I also put a pop-up help statement telling the user that after they make their selection, only the code will appear.

amasa
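
One way to build such a pull-down list is sketched below with Python's sqlite3 module. The vendor table, the vendor names, and the second code are invented for illustration; only the code 4XF788-Q comes from the tip. The list is sorted by name, shows the code beside it, and still hands the bare code back to the application:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vendor (
    vendor_code TEXT PRIMARY KEY,
    vendor_name TEXT NOT NULL
);
-- Vendor names and the second code are made up for this example.
INSERT INTO vendor VALUES
    ('4XF788-Q', 'Northside Office Supply'),
    ('2AB100-Z', 'Acme Paper Company');
""")

def vendor_choices():
    """Return (label, code) pairs sorted by name, with the code shown beside it."""
    rows = conn.execute(
        "SELECT vendor_name, vendor_code FROM vendor ORDER BY vendor_name"
    )
    return [(f"{name} ({code})", code) for name, code in rows]

choices = vendor_choices()
```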

3. Keep some general information around

I have also found it most useful to have a table containing general database information. In that table, I place information such as the current version of the database, the date it was last checked/repaired (for Access), the name(s) of related design documents, customer information, etc. This provides a simple mechanism for keeping track of the database, especially useful in non-client/server situations when customers complain their database is not working as expected and e-mail or FTP the file to you.

Richard Foster
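
Such a table can be as simple as a list of name/value pairs. The sketch below uses Python's sqlite3 module; the db_info table, its keys, and the sample values are assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE db_info (
    info_key   TEXT PRIMARY KEY,
    info_value TEXT NOT NULL
);
-- Sample values only; a real file would carry its own.
INSERT INTO db_info VALUES
    ('schema_version', '1.3'),
    ('last_checked',   '2001-06-15'),
    ('design_doc',     'orders-design.doc');
""")

def get_info(key):
    """Look up one piece of general information, or None if it is not recorded."""
    row = conn.execute(
        "SELECT info_value FROM db_info WHERE info_key = ?", (key,)
    ).fetchone()
    return row[0] if row else None

schema_version = get_info("schema_version")
```

When a customer sends in a problem file, a single query against this table tells you which version you are looking at.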



4. Test, test, and test again

After building or revising a database, always test the data fields with live input from users. Most importantly, run user tests and work with users to ensure that the data types you chose fit the needs and requirements of the business. Testing needs to be completed before the new database is put into live service on the system.

juneebug

5. Validating the design

A general technique for validating the database design during development is to look at the database through the prototype of the application it is supporting. In other words, for each area of the prototyped application that will eventually show data, make sure that you can look at the data model and see how the data will be extracted.

jgootee

6. Access-specific design tip

For complex Microsoft Access database applications, put all of the primary tables into one database file; then add other database files that carry out specific functions relating to the original tables. Link to the primary tables in the primary file as needed to carry out those functions. Examples include data entry, data QC, statistical analysis, reports to management or governmental agencies, and various types of read-only queries. This approach simplifies assignment of user and group permissions, and it also groups and compartmentalizes application functions, making them easier to manage when it becomes necessary to modify them.

Dennis Walden
