Friday, April 24, 2009

NORMALIZATION

Database Design Process

There are several processes that must be done to design a database, namely:

  • Gathering user needs / business
  • Develop the ER model based on user needs / business
  • Convert ER Model to set the relation (table)
  • normalization of relations, for the anomaly
  • to implement the database for each table to create relationships that have been in the normalization

Database Normalization

Normalization process is the establishment of the database structure so that most of the ambiguity can be removed. Normalization process of decomposition is also a relationship that is still "bad" with break the attributes to build some relationships. Normalization stage, starting from the most mild (1NF) to most stringent (5NF). Normalization is usually only up to the level of 3NF or BCNF because already sufficient to generate the table-a table of good quality.

Normalization is done at the design stage, before the database is implemented. Purpose to avoid update anomalies. To find out how to do normalization, we must first understand the concept of functional dependency and key.

Why do normalization?

  • Optimization table structures

  • Increase the speed
  • Eliminate income data the same
  • More efficient use of storage media
  • Reduce redundancy
  • Avoiding anomalies (insertion anomalies, deletion anomalies, update anomalies).
  • Improved data integrity

A table saying good (efficient) or if the normal 3 to meet the following criteria:

  • If there is decomposition (decomposition) table, then the decomposition will be guaranteed safe (Lossless-Join Decomposition). That is, after the table is described / in the decomposition into a new table-table, the table-table can generate a new table with the same exact.
  • Maintain the functional dependence on the change data (Dependency preservation)
  • No violate Boyce-Code Normal Form (BCNF)

If the three criteria (BCNF) can not be met, then at least the table does not violate the Normal Form of the third stage (3rd Normal Form / 3NF).

Normalization is required because of the redundancy relations, not relations of "good". Why?

  • The main reason is the possibility of "update anomalies" (when the insert, delete, update) because can impact on the data inconsistencies
  • The reason is other waste storage space (hard disk)

How to Handle anomaly?

1. Anomaly in the handle by programming language used to create the application database. Designer should note this anomaly and to tell a programmer.

2. Anomaly does not handle the system, but submitted to the operator to be careful in making modifications, inserts and deletes. It is human error risk.

3. Anomalies be avoided, with ways to secure the normalization.

Denormalization

Denormalization is a reverse process of normalization, ie, combine several relations, brought to normal form of a lower.

Functional Dependency (FD)

Functional dependency (abbreviated FD) is a restriction that comes from the meaning of attributes and relationships between attributes. Functional Dependency attributes describe the relationship in a relationship. An attribute said functionally dependant on the other, if we use the value attribute to determine the value of the other attributes.

Symbol that is used to represent --> functional dependency. --> read the functional set. FD from the fact that there is (obtained at the analysis system).

There is a functional dependency from A to B indicates that a value attribute to uniquely determine the value of attribute B.

  • notation: A --> B

A and B are attributes of a table. A means of determining the functional B or B depends on A, if and only if there are 2 rows of data with the same value of A, then B is also the same value

  • notation: A --> B or A x--> B

  • It is the opposite of the previous notation.

Example:

Functional Dependency of the table value

  • Nrp --> Nama

Because for each value Nrp the same, then the value of the same Nama

  • {Mata_kuliah, NRP} --> Nilai

Because the Nilai of attributes depending on the NRP and Mata_kuliah together. In another sense Mata_kuliah for the NRP and the same, they also the same Nilai, because Mata_kuliah and the NRP is a key (is unique).

  • Mata_kuliah --> NRP
  • NRP --> Nilai

INFERENCE RULE TO FD

Example FD:


Suppose we are given a set of FD F, then we can reduce the FD-FD of F for the rule of inference. Example:

F = {SSN --> {EName, BDate, Address, DNumber}, {SSN, PNumber} --> Hours }

What is the FD that has a meaning similar to the F?

  • SSN --> EName
  • SSN --> BDate
  • SSN --> Address
  • SSN --> DNumber
  • SSN, PNumber --> Hours

Amstrong’s Inference Rule

1. Reflective (IR1): If X --> Y, then X --> Y.

2. Augmentation (IR2): If {X --> Y} then XZ --> YZ.

3. Transitive (IR3): If {X--> Y, Y--> Z} then X --> Z.

Derived Inference Rule

Decomposition: If {X--> YZ} then X --> Y.

Additive (Union): If {X--> Y, X--> Z} then X --> YZ.

Pseudotransitive: If {X -->Y, WY -->Z} then WX --> Z.

  • closure (cover) from the set of functional dependency F, where F is F+ the set of all the FD has that can be derived from F.
  • closure of the set of attribute X in relation to F is a set of X+ has all the attribute which is determined by the functional X.
  • X+ can be searched by repeatedly applying IR1, IR2, IR3 with FD in the F

NORMAL FORM

Normal form is a condition (using the FD and key) that determines whether a scheme relationships meet certain criteria. There are several normal forms based on a number of criteria:

  • Primary keys (1NF, 2NF, 3NF)
  • All Candidate Keys (2NF, 3NF, BCNF)
  • Multivalued dependencies (4NF)
  • Join dependencies (5NF)

FIRST NORMAL FORM (First Normal Form - 1NF)

A table on the form said to be normal if I did not reside in the unnormalized form of a table, where there is a kind of field multiplication and field that allows a null (empty) 1NF is not allowed on the:

  • Attribute values, many (Multivalued attributes).
  • Attribute a composite or a combination of both.
  • nested relations.

So:
  • Price is the domain attribute must be atomic rates
  • Ex Student Data as follows:

Or:

Table-top table in 1NF is not eligible, so the decomposition will be:

Mahasiswa Table

Hobby Table

Second NORMAL FORM (Second Normal Form - 2NF)

Normal form 2NF met in a table if it meets the form of 1NF, and all the attributes than the primary key, have a full Functional Dependency on primary key. A table does not meet 2NF said, if there are attributes that Functional Dependency are only partial (only depending on the part of the primary key). If there are attributes that have no dependence on the primary key, then the attributes must be moved or removed.

Functional dependency X --> Y if it is said of a remove attribute A from X means that Y is no longer dependent functional.

Functional dependency X --> Y if it is said partial delete an attribute A from X means that Y is functionally dependent.

Relation scheme R in the form 2NF if every non-primary key attribute A Î R is functionally dependent on the full primary key R.

The following table meet 1NF, 2NF, including but not


The table above does not meet 2NF, because (NIM, KodeMk) is regarded as the primary key:

{NIM, KodeMk} --> NamaMhs

{NIM, KodeMk} --> Alamat

{NIM, KodeMk} --> Matakuliah

{NIM, KodeMk} --> Sks

{NIM, KodeMk} --> NilaiHuruf

Table in the decomposition needs to be some of the table so that eligible 2NF, becomes:

  • Functional dependency is as follows:

{NIM, KodeMk} --> NilaiHuruf (fd1)
NIM --> {NamaMhs, Alamat} (fd2)
KodeMk --> {Matakuliah, Sks} (fd3)

  • So
fd1 (NIM, KodeMk, NilaiHuruf) --> Tabel Nilai
fd2 (NIM, NamaMhs, Alamat) --> Tabel Mahasiswa
fd3 (KodeMk, Matakuliah, Sks) --> Tabel MataKuliah


Third NORMAL FORM (Third Normal Form - 3NF)

Normal form 3NF fulfilled if the form meets 2NF, and if there are no non-primary key attribute that has a dependence on non-primary key attributes of the other (transitive dependencies).

Table following students eligible 2NF, 3NF, but does not meet


Because the table above there are still non-primary key attribute (ie, Kota and Provinsi), which has a dependence on non-primary key attributes of the other (ie KodePos), namely:

KodePos --> {Kota, Provinsi}

So that the table in the decomposition needs to be:

  • Mahasiswa (NIM, NamaMhs, Jalan, KodePos)

KodePos (KodePos, Provinsi, Kota)

Boyce-Codd Normal Form (BNCF)

Boyce-Codd Normal Form constraint has a stronger form of the Normal third. To be BNCF, relations must be in the First Normal Form and forced each of the attributes depends on the function in the super key attributes.

In the example below there is a relationship seminar, is the Primary Key NPM + Seminar.

Students may take one or two seminars. Each seminar requires 2 each of the students and led by one of the 2 seminar. Each leader can only take one seminar course. NPM and Seminar in this example and show a Pembimbing.

Seminar relations is a Third Normal Form, but not BCNF because Code Seminar is still dependent on the function Pembimbing, if any Pembimbing can only teach a seminar. Depending on the seminar is not a super key attributes such as required by BCNF. But relations Seminar should be parsed into two namely:

Normal Form of the Fourth and Fifth

  • Relations in fourth normal form (4NF) if the relation in BCNF and does not contain a lot of dependence values. To remove the dependency of many values from a relation, we divide the relationship into two new relations. Each relation contains two attributes that have a lot of relationship value.

  • Relations in fifth normal form (5NF) deal with the property called the join without any loss of information (lossless join). Fifth normal form also called the 5 NF PJNF (projection join normal form). The case is very rare and appear difficult to detect in practice.

References:
  1. ER Ngurah Agus Sanjaya. Slide Part 6 - NORMALISASI.
  2. Siti Aminah. Slide Review_Normalisasi_untuk_Vcon_7_des.

Saturday, April 18, 2009

DATABASE AND ER-Diagram

DATABASE DEFINITIONS

The database is a set of data stored in the magnetic disk, optical disk or other secondary storage. Besides, the data base can also be defined as the collection of data, which can be described as the activities of one or more organizations that be relations. The database can be a collection of integrated data-related data of an enterprise (company, government or private). For example:

  1. Company data à manufacturing production planning, actual production data, data, material ordering, and so forth.
  2. à hospital patient data, doctor, nurse, and so forth.

DBMS (Database Management System)

DBMS is a collection of databases or a combination of software-based database applications. DBMS is a software designed to assist in the maintenance and utility data collection in large numbers. Application programs are used to access and maintain databases. The main purpose DBMS is to provide an environment that is efficient and easy to use, withdrawal and storage of data and information.

BIT, BYTE, Field


1. Data bit is the part that contains the smallest value of 0 or 1
2. Bytes-bit set of bit similar
3. Field set of byte-byte similar, in the database used the term attribute

ATTRIBUTE OR FIELD
Attribute is the nature or characteristics of an entity that provides provide detail on these entities. A relationship can also have attributes. Example attributes:

  1. STUDENTS: NIM, NAME, ADDRESS
  2. CAR: NOMOR_PLAT, COLOR, TYPE, CC

ATTRIBUTE TYPES

  • Single attribute vs multivalue attribute

Single attribute is an attribute that can only be filled at most one value.
Multivalue attribute is an attribute that can be filled with more than one value with the same type.

  • Atomic vs composition

Atomic attribute is an attribute that can not be divided into smaller attributes.
Composite attribute is a combination of several attributes of a smaller.

  • Derived attribute is an attribute whose value can be derived from the value of other attributes, such as age that resulted from birth date attribute.
  • Null Value attribute is an attribute that has no value to a record.
  • Value attribute is a mandatory attribute that must have a value.

RECORD / TUPLE
Record is a data line in a relationship. Record consists of a set of attributes where the attribute is an attribute-related entity or to inform the full relationship.

ENTITY / FILE
Entity is the object or objects in a certain mini world represented in the database. Entity can be:

  1. Things that have physical (people, cars, houses, etc.)
  2. think that there are conceptual (enterprise, employment, discipline, etc.)

File is a collection of similar records and have the same elements, the same attributes but different data values. In processing applications, files can categories with several types as follows:

- Master File
- Transaction Files
- File Reports
- File History
- File Protection
- File Work

DOMAIN
Domain is the set of values that are allowed to reside in one or more attributes. Each attribute in a database relational is defined as a domain

KEY DATA ELEMENT
Key elements of record which is used to find these records at the time of access, or can also be used to identify each entity / record / line.

SPECIES OF KEY
There are several types of key in the ERD, namely:

  • Superkey is one or more attributes of a table that can be used to identify entityty / record of the table are unique (not all attributes can be superkey)
  • Cadidate Key is a super key with minimal attributes. Candidate must not contain a key attribute of the table so that the other candidate key is certain superkey but not necessarily vice versa.
  • Primary Key One of the key attributes of the candidate can be selected / specified a primary key with the three criteria, namely:
1. Key is more natural to use as reference
2. Key is more simple
3. Key is guaranteed unique

  • Alternate Key is the attribute of the candidate key is not selected to be primary key.
  • Foreign Key is any attribute that points to the primary key in another table. Foreign key will be going on a relationship that has cardinality one to many or many to many. Foreign key is usually always put on the table that point to many.
  • External Key is a lexical attribute (or set of lexical attributes) that values are always identify an object instance.

ERD (Entity Relationship Diagram)
ERD is a model of a network that uses word order is stored in the abstract system.
Differences between the DFD and ERD, namely:

1. DFD is a model of network functions that will be implemented by the system.
2. ERD is a model that emphasizes the network data on the structure and relationship data.

Elements of the ERD

  • Entity

In the ER Diagram, Entity is described with the rectangular shape. Entity is something that exists in the real system and the abstract where the data stored or where there are data.

  • Relationship

In the ER diagram, relationship can be described with a lozenge. Relationship is a natural relationship that occurs between entities. In general, the name given to the verb base making it easier to do the reading it relations.

  • Relationship Degree

Relationship Degree is the number of entities participating in a relationship. Degree which is often used in the ERD.

  • Attribute

Attribute is the nature or characteristics of each entity and relationship.

  • Cardinality

Cardinality show tuple maximum amount that can be relations with entities on the other entity.

Relationship Degree
There are several types of degree of relationship, namely:

  • Unary Relationship

Unary Relationship model is the relationship between the entity originating from the same entity set.

  • Binary Relationship

Binary Relationship model is a relationship between 2 entities.

  • Ternary Relationship

Ternary Relationship is a relationship between the instance of 3 types of entities are unilateral.

CARDINALITY
There are 3 cardinality relations, namely

  • One to One: Level one to one relationship with the one stated in the entity's first event, only had one relationship with one incident in which the two entities and vice versa.
  • One to Many or Many to One: Level one to many relationship is the same as the one to many depending on the direction from which the relationship view. For an incident on the first entity can have any relationship with many incident on the second entity, if the one incident on the second entity can only have one relation with the incident on the first entity.
  • Many To Many: if any incident occurs in many entity have relationships with other entities in the incident.
EXEMPLARY CARDINALITY


Notation (E-R diagram)

Symbolic notation in the ER diagram, namely:

1. Rectangle represent the collective entity
2. Circle represent the attributes
3. Rhomb represent collective relationships
4. Line as the set of relationships between the entity and the collective entity with the attribute


Reference:

  1. ER Ngurah Agus Sanjaya. Slide Part 5 - DATABASE DAN ER-DIAGRAM.
  2. Elmasri & Navathe, Fundamental of Database Systems, 5th Edition, Chapter 3, 2007. ¨

Thursday, April 2, 2009

DATA FLOW DIAGRAM

DATA FLOW DIAGRAM

Data flow diagram also called Data Flow Diagram (DFD). DFD is to describe the system modules in a smaller and less easy for the user to understand to understand the field of computer systems that will be done.

DFD also serves to describe the existing system or the new system will be developed logically without considering the physical environment where the data flows, or where data are stored. DFD is a tool that is used in the system development methodology is structured (structured analysis and design). DFD can describe the flow of data within the system with a structured and clear.

CONTEXT DIAGRAM

Context diagram of a process and describe the scope of a system and is the highest level of the DFD that describes the entire input to the system and output of the system. System in the context diagram is limited by Boundary (depicted by broken line). In the context diagram can not have storage (storage).

ZERO DIAGRAM

Zero diagram is a diagram describing the process of the DFD. This diagram provides a view of the overall system in which, showing the main function or process that is, the flow of data and external entity. At this level of data storage possible.

For the process that is not explained in the next level then added to the symbol '*' or 'P' at the end of the process. Balance of input and output (balancing) between 0 to diagram context diagram should be maintained.

DETAILED DIAGRAM

Diagram is a detailed diagram of the process of decipher what is in the diagram zero level or above.

Rules on numbering level DFD:

In one level there should be no more than 7 units and the maximum of 9, when more should be done in the decomposition.

SPECIFICATION PROCESS

Each process in the DFD must have a specification process. At the top level method is used to describe the process you can use with descriptive sentences. At a more detailed level, namely on the bottom (functional primitive) require a more structured specification.

EXTERNAL ENTITY

Specification process will be the guideline for the programmer to make this program (coding). Method used in the specification process: the process of disintegration in the form of a story, decision table, decision tree.

Unit outside is something that is outside the system, but it provides data in the system or to provide data from the system. A symbol of unity in outside the box with the notation. External entity does not include part of the system.

Rules for naming:

1. The name of the noun form.

2. Terminal may not have the same name except the same object (described twice, is intended to more clearly make the diagram). If so, then the terminal should be a forward slash in the top left corner.

3. Name of the flow of data into a process may not be the same as the name of the data flow out of the process.

4. Data flow into or out of the data store does not need to be given a name if:

  • Flow data is simple and easily understood.
  • Describes the data flow of all data items (one record intact).

5. There can be no flow of data from the terminal to store data or on the contrary, because the terminal is not part of the system. Relationship with the terminal must store data through the process.

External entity symbols:

DATA FLOW

Data flow is place where the information flow. Depicted with the straight line that connects the components of the system. Data flow direction indicated by arrows and the given name on the flow of data flow.

Data flow, flow between processes, data storage and data flow indicates that the form of data input to the system.

Guidelines of the name:

1. Name of the flow of data that consists of some words associated with the flow line speed.

2. No flow data for the same and the name should reflect its contents.

3. The flow of data that consists of several elements can be expressed with the group elements.

4. Avoid using the word 'data' and 'information' to give a name to the flow of data.

5. Wherever possible the complete flow of data is written.

Other provisions:

1. Name of the flow of data into a process may not be the same as the name of the data flow out of the process.

2. Data flow into or out of data storage does not need to be given a name if:

  • Flow data is simple and easily understood.
  • Describes the data flow of all data items (one record intact).

3. There can be no flow of data from the terminal to the data storage, or vice versa because the terminal is not part of the system, the relationship with the terminal data storage must be through the process.

Data flow symbols:

There are some concepts of data flow that must be considered, namely:

1. The concept of packet data (packet of data).

When two or more data flows from a source to the same destination the same, so should be considered as a single data flow.

2. The concept of data flow spread (diverging data flow).

Shows a number of copies of the same data flow from the same source to a different destination.

3. Gather the data flow concept.

Shows some of the different data flows from different sources join together towards the same goal.

4. The concept of destination and source of flow data.

All data flows must be generated from a process or to a process (can be one or both, that is derived from a process leading to a process is not derived from or is not a process leading to a process or a process come from and go to a process).

PROCESS

The process is also often called bubble is what is done by the system. The process can process the data or incoming data flows into outgoing data flow. Transform the process of working one or more of the input data into one or more of the output data in accordance with the desired specifications. Each process has one or more inputs and produce one or several outputs.

Guidelines of the process:

1. Name of the process consists of a verb and noun, which reflects the function of the process.

2. Do not use the process as part of the name of a bubble.

3. May not have some process that has the same name.

4. The process should be given a number. Order number wherever possible to follow the flow of the process or sequence, but the sequence number does not mean that the absolute is a process in chronological order.

5. Numbering process on the first level (the diagram is zero) is 1.0, 2.0, 3.0, etc.

6. Numbering process on the second level of 1.0 (the details of the process 1.0) is 1.1, 1.2, 1.3, etc.

7. Context diagram (context diagram) does not need to be given a number.

8. The process is the lowest 2.x, not explain again.

Process symbols:


DATA STORAGE

Data storage is a storage place for data that exists in the system. In a symbol with a pair of parallel lines or two lines with one of the side open. The process can retrieve data from or provide data to the database.

Form of data storage are as follows:

• A file or database in the computer system.

• An archive, or manual entry.

• One box of data in a table.

• A table reference manual.

• A book or agenda.

Guidelines of the name:

1. The name should reflect the data storage.

2. When his name more than one word must be marked with the number.

Data storage symbols:


DFD SYMBOLS



DATA DICTIONARY

Data dictionary is also called with a system data dictionary is a catalog of facts and data information needs of an information system. Working data dictionary to help the system to interpret the application in detail and organize all elements of the data used in the system precisely so that the system analyst and have a basic understanding of the same input, output, storage and processing.

In the analysis phase, the data dictionary is used as a means of communication between the systems analyst with the user. While in the system design phase, the data dictionary is used to design input, reports and databases.

Flow data on the global DAD, further details can be seen in the data dictionary. Load the data dictionary as follows:

1. Name of data flow: must note that readers who need further explanation about a flow of data can find it easily.

2. Alias: alias or other name of the data can be written if there is.

3. Forms of data: used to segment the data dictionary to use when designing the system.

4. Flow data: indicates from which data flows and where the data.

5. Description: to give an explanation of the meaning of the data flow.


BALANCING IN DFD


There are some balancing between input and output in the DFD, such as:

1. The flow of data into and out of a process must be the same as the flow of data into and out of the details of the process on the level / levels below it.

2. Name of the data flow into and out of the process must match the name of the flow of data into and out of the details of the process.

3. Number and the name of an entity outside the process must be equal to the number of names and entities outside of the details of the process.

4. The issues that must be considered in the DFD which have more than one level:

  • There must be a balance between input and output of one level and the next level.
  • Balance between level 0 and level 1 at the input / output of stream data to or from the terminal on level 0, while the balance between level 1 and level 2 is seen on the input / output of stream data to / from the process concerned.
  • Name of the flow of data, data storage and terminals at each level must be the same if the object is same.

RESTRICTIONS IN DFD

To drawing DFD, there are a few rules or restrictions that must be so in its drawing is not an error occurs, these rules are:

1. Flow data may not be from outside the entity directly to other outside entities without going through a process.

2. Flow data may not be from the savings directly to the data to outside entities without going through a process.

3. Flow data may not be saving the data directly from the savings and other data without going through a process.

4. Flow data from one process directly to the other without going through the process of saving data should or be avoided as much as possible.

DIAGRAM CASH INVESTMENT DATA

There are 2 forms of the DAD:

a. Physical Data Flow Diagram (DADF), more appropriately used to describe the existing system (old system). The emphasis is how the processes of the system is applied (in a way what, where and by whom), including manual processes.

b. Data Flow Diagram Logic (DADL), more appropriately used to describe the system that will be proposed (new system), with no emphasis on how the system implemented, but the emphasis only on the logic of the demand system, that is what the process logic needed by the system.

GUIDELINES TO DRAWING DFD

How to draw the following guidelines for good DAD, DADF or DADL. Example below is for DADL but can also be used for the DADF.

1. Identify the first unit outside of all involved in the system.

For example, sales for the system have involved foreign entity such as customer, credit manager, the warehouse and the delivery.

2. Identification of all input and output involved in union with the outside.

3. First image context diagram (context diagram). DAD is a tool for structured analysis. The structure of this system to try to describe the first time the outline (top level) and split them into the more detailed (lower level) is called overview diagram (level 0). Each overview will be drawn more detailed again called level 1, and each process in level 1 will be drawn back with a detailed level with a 2 and so on until each of these processes can not be drawn again.

4. Chart image tiered process for all that is in the system first. Tiered chart (hierarchy chart) used to prepare DAD drawing to more levels down again. Tiered chart can be drawn using the notation in the DAD.

5. Image to sketch DAD overview diagram (level 0) process based on the tiered chart.

6. DAD-level image to the next level-1 level and beyond.

For example, application DAD sales transactions that must be drawn is a level 1 to process 1, process 2 and process 3.

7. After all drawn level DAD, DAD is drawing to a management reporting drawn separately.

8. After all drawing is wrong, then all this DAD combined in one diagram.


Reference:

  1. ER Ngurah Agus Sanjaya. Slide Part 4 - DATA FLOW DIAGRAM.
  2. HM, Jogiyanto. 2007. Analisis & Desain Sistem Informasi. Yogyakarta: ANDI.