3 Ways to Annoy Your Database Administrator

August 17th, 2010
  1. Select * from verylargetable, anotherlargetable, reallybigtable;
  2. Delete from verylargetable;
  3. Update verylargetable set keyfield = null;

Database Design – Part 8: Security

August 16th, 2010

Security is an important part of the database design. With more applications being moved to the internet, databases are now being exposed to more opportunities of intrusion. According to Vigna et al. (2009), “the percentage of web-related vulnerabilities increased from 15.1% in 1999 to 49.1% in 2005, and this trend is projected to increase during 2006.” Solid securities measures help prevent hacking.

Default administrator user names such as “sa” for Microsoft SQL or “root” for MySQL, should be deleted or deactivated. Default settings like these open the door for anyone with knowledge of these database engines to have an opportunity to attempt intrusion into the database. A new less obvious account should be created with administrator rights.

A password business rule should be created and followed. Passwords should have a minimum length with mixture of types of characters such as 8 to 12 characters and at least one number, alpha and upper/lower case. The password should be changed every 60 to 90 days with a rule to not allow a password to be a repeat of the last 5 or 6 passwords used.

Accounts should be created for each application or user accessing the system. The account should only have the minim access needed to do its role. For example, a book application should not have access to a music collection database or worse an accounting database. If the application needs access to other databases, its right should only include what actions it can perform. For example, the book application may need access to music collection to check if the author is also a musician. The application should only have read rights and not update or delete rights.

Database Design – Part 7: Integrity

August 3rd, 2010

Good database design should account for referential integrity. By creating rules to enforce referential integrity, the database administrator will increase the quality of the data being held by the database. Referential integrity defines that a record cannot be deleted if there are other records that depend on it. All dependent or child records must be deleted with the primary or parent record. If the child records are not deleted with the parent records than they are considered orphan records. This causes tables to contain useless data that will impact the performance of the database engine.

Triggers are one way to create referential integrity. A trigger can be activated to do a task when another action takes place in a table. For instance, a record is deleted from the author table; a trigger can then delete the corresponding book records in the book table.

If triggers are not used, then it will be left to the application to create referential integrity. Extra planning would be needed to have the business rules included into all systems that used the data. If more than one application used the book collection database, each of these applications will need the code to delete the book records when the author records are deleted. If one application does not, then referential integrity is lost.

Database Design – Part 6: Indexes

August 2nd, 2010

Indexes can be created to add better performance to the database engine. Indexes are a collection of fields that the database engine maintains to better return results to both queries and applications. By indexing this information the database engine can retrieve the data more efficiently.

The primary key is automatically an index. The primary index is composed of the data that uniquely identifies each record. For example the primary key on the book table would be the ISBN field. A primary key can also be an auto sequence number generated by the database engine. The author id field and the genre id field would both be auto sequence numbers.

For performance purposes it is best to create indexes on all foreign keys. Any data that will relate one table to another should have its on index. This allows the database engine to efficiently gather related data together quickly for queries and applications.

Indexes do not have to be foreign or primary keys. Any data that is used regularly to filter or sort data should be within an index. An example of this would be book title field in the book table, author name field in the author table and genre title field in the genre table. These pieces of information will be filtered and sorted repeatedly and performance can be gained by creating their own indexes.

Database Design – Part 5 : Keys

July 23rd, 2010

Keys create uniqueness within a table and form the relationship between tables. They also increase the performance of the database engine and any applications that use the data. Keys can be composed of one or more fields.

The candidate key is the primary key within a table. Borthick, Jones and Kim (2001) state that, “Primary keys must uniquely identify the row in the table.” This key gives the row an unique identity that can be used to both retrieve and update the data within the row without changing the data within other rows. Performance is increased by using candidate keys because an application or query will only retrieve one record and not have to sort through several to find the data needed.
Using the book collection example a primary key could be the international standard book number (ISBN), author id, and genre id

Foreign keys are keys that refer to data in another table. Second form defines that data should only be located once in a database. The foreign key defines the relationship between two or more tables and limits the need of duplication.

Foreign keys in the book collection database would be the author id field inside the book table. The book genre table would have two foreign keys: ISBN and genre id.

Database Design – Part 4 : Normalization

July 22nd, 2010

Normalization is the process of breaking down the data into more usable formats. There are three levels to the process: first, second and third normal form.
First normal form determines how each piece of data is broken into columns within the table. Some pieces of data would need to be separated out into its own column. An example is the author’s name. The name could be stored into one field called name. Following the process of first normalization the author’s name would be broken into first name, middle name, last name, pre-fix and suffix.

By breaking the data down into more specific pieces of information allows anyone querying the data to have more flexibility. For instance, someone could search by the author’s last name and sort by their first name. Combining the data into one field would make this process almost impossible. This also allows for better index creation which in turns creates better performance for both an application and the database engine.

Second normal form must follow the rules of first form as well as each row must be capable of being uniquely identified. To achieve second normal form each record must have a column that gives the record an identity and the data inside of that field cannot be repeated in the same field in other rows. A good example of second normal form is the author id field. No other record should have a duplicate author id. The column or columns that uniquely identify records is called the candidate key.
Second normal form gives the table integrity. When an application or query statement must update a record it is necessary for that record to be uniquely identified. If the record cannot be uniquely identified than all records have the possibility of being changed.

In the third normal form the tables must comply with both first and second normal forms and eliminate duplicate information across the database. To achieve third normal form data cannot be duplicated in multiple tables. The book table and author table will both hold the author’s name. Without third normal form, the name would be repeated in both. With third normal form, the name would be contained in the author table with a reference from the book table to the author table. The reference columns within the tables are the foreign keys.

An example to further show the extent of the third normal form using the book collection example is the genre table. A book could have multiple genres. A table could be created that tracked the book genre information. The table would have a unique identifier like a sequence number, the international standard book number and the genre identification number. At first glance this table would seem to make the database overly complicated, but this table would hold multiple references for an individual book to multiple genres. This limits the need for duplicate information, in this case genre, from being stored within the database.

Third normal forms create efficiencies within the database. When data is to be changed, it should only be changed in one table and referenced by the others. This makes both the database and any applications using the data more efficient. This also minimizes any errors that may occur if duplicate data is not changed in all tables.

Database Design – Part 3 : Good Schema Design

July 21st, 2010

A database is a collection of tables that are loosely related. Examples include financial information, product information, inventory, and customer contact details. A database should only hold the data for its particular purpose. Accounting, inventory and customer contact details should not be contained in the same database. This will decrease the scalability, performance and security of the database. If the data is mixed, an application like a web page that would use the customer details information would also have access to the accounting data. This would open security risks to the company. Furthermore, having the data mixed would hamper performance and scalability due to having too much information within one database. Each database should be named appropriately for future database administrators to easily recognize what data is contained within it.

An example of a good database would be book collection. Only information about books like author, genre and year published would be included in this database. Information about a music collection would have its own separate database.

A table is a collection of tightly related data contained within one file. Tables are composed of rows and columns. Rows are made up of a single entity of data split into columns. All data within the row is in some way related to each other. Each column is a single piece of data with the same type and length.
An example of good table design is a book table. Only the information about the book would be included in the table. Fields would include title, published date, condition, genre, international standard book number (ISBN) and author id. Author information would be its own table with an author id that would be referenced by other tables in the book collection database. This decreases the amount of duplicated data.

Fields in a table make up the pieces of data that is collected in each row. A field has predefined types of valid data that it can contain. With the field type there is also field length which determines how much data each field will store. An example of a field type is “varchar (25)” or variable character and the length is 25 characters.

Each piece of data in the book collection table would have its own field and type. The field for book title would be a character field with a possible length of 100 characters. The field could contain any character, number or special character up to 100 characters. The field for published date would be a date field and only contain date information; alpha characters are not a valid entry for this field.
The database engine used will decide what types of fields are allowed. Most common types are string, numeric, data and Boolean. Some database engines will have unique types like blob and binary.

Database Design – Part 2: Designing To Specifications

July 20th, 2010

The database specifications can be gathered from various sources. Most common source includes use cases, existing applications and business forms such as a printed application, invoice or order form or a digital document such as an Excel document, Word document or PDF. Most likely several of these sources will be used in designing the database.

A use case is useful for gathering the specifications from the end-user. A system’s analyst can use a use case to better document the business process and intent of the application in development. Included in the use case will be descriptions of the data that the user is expecting to gather, track and reproduce. A database administrator can use the use case to infer some if not all of the data that will be stored within the database.
When upgrading an application that calls for a new schema, the old schema can be reused. The major advantage of reusing the old schema shortens the development time. Major disadvantages include loosing benefits of new technology when moving from one technology to another, and adhering to proper database design. In some cases, it is better to create a new database schema than to reuse an existing one.

The best source for gathering data requirements is the forms that either is used to gather the information or will be used to produce a printed copy of the information gathered. These forms will have a true representation of what will be stored within the database.

Once the specifications have been gathered, a data model will need to be created. The data model will break the data down into its respected tables, fields and field types. A data model is useful for a database administrator to see the layout of the data and how to best create the final database schema. The data model will than need to be normalized to create the final schema. Initial indexes, keys, triggers and stored procedures can be implied from the data model.

The final database schema can be created on paper or with various programs. Depending on the database engine there may be a native tool that can be used to better create it. Various techniques such as UML can be used to better understand that needs of the end-user and what data will be stored.

The importance of good database design – Part 1

July 19th, 2010

This is part one of several posts on good database design.

It is vital that good database design is adhered to both during and after development of a project. During the project, good design will not only impact the performance but add more security to an application. Using features within a database server can greatly increase the performance of any business application. After the development is complete, good design will allow for the continued performance and scalability of the application as the data needs of the business grows over the years.

Connect to a Foxpro table using C#

May 26th, 2010

It would be nice to always use the newest technology when doing a new project, but that is not always an option. Sometimes we have to <gasp> use legacy data <cue creepy music>. In my case, the offending old technology is Foxpro. With a little frustration and some luck I was able to drag the data I needed from the dreaded Foxpro. Below, I share one option with you…

Namespaces:

using System.Data.OleDb;
using System.Data;

Code:

/* Foxpro Connection String */
string cxConnectionString = @"Provider=vfpoledb;Data Source=C:\test\;Collating Sequence=machine;";

/* Connect and open*/
OleDbConnection cxConnection = new OleDbConnection(cxConnectionString);
cxConnection.Open();

/* Query statement */
string strQuery = "select fxString, str(fxInt) as fxInt, fxDate from foxprotable";

/* Prepare the data adapter and fill the data set */
OleDbDataAdapter daFoxpro = new OleDbDataAdapter(strQuery,cxConnection);
DataSet dsFoxpro = new DataSet("FoxproTable");
daFoxpro.Fill(dsFoxpro,"FoxproTable");

/* Loop through the records in the Data Set */
foreach (DataRow drFoxpro in dsFoxpro.Tables["fxString"].Rows)
{
/* Write to the console window */
Console.WriteLine(drFoxpro[""]);
}

/* Pause */
Console.ReadLine();

If you get this error:

The provider could not determine the Decimal value. For example, the row was just created, the default for the Decimal column was not available, and the consumer had not yet set a new Decimal value.

You’ll need to find the numeric field and add “str()”. Example: str(fxInt) as fxInt

Tip: Keep in mind that blank Foxpro dates will default to 12/30/1899.

Not much to it really. Oledb takes care of all the work. If you have any questions, please leave a comment below.

Thanks for reading!