Journal Page

This is the journal page for our group in CS165. Here is where we write about our experiences in learning about databases.

= = =**11/18/10 - 11/23/10: Introduction to Database Systems**=

Throughout the discussion, several points were cited about Software Engineering in general, not just the database part of it. One of these is the Software Architecture:

As presented above, the User Interface handles the presentation of the software, the Data Processing handles the logic of the software, and the Data Structures handle the data of the software. In class, the data approach will be the focus. The data layer is subdivided into:

So when change comes, only the Database Objects and the Database itself have to be modified, as the Data Objects passed to the Logic and Presentation Layers remain their representatives.

//But why the need for a database?// According to the concept of **Information Resource Management (IRM),** information is a very important resource for any corporation or business organization; so important that it should be managed, guided by basic principles, the same way as other corporate resources such as people, equipment, and finances. The basic principles driving the concept of IRM are:

//Data vs. Information// Data are raw facts that carry no meaning by themselves; information is data that has been processed so it can be interpreted. Turning data into information involves five activities:
 * 1) Acquisition
 * 2) Storage
 * 3) Manipulation
 * 4) Retrieval
 * 5) Distribution

The key enabler for performing these five activities is the **Database**. Formally, it is "a shared collection of logically related data, designed to meet the information needs of multiple users in an organization." A database has two generic architectures:
 * 1) Centralized Database - stored in a single location
 * 2) Distributed Database - a single logical database spread across multiple locations

//ISA (Information System Architecture)// It is a framework used in developing information systems. It provides a basis for strategic planning, development, and use, in order to support the overall goals of the business organization. It can be used regardless of the type of development process employed.

ISA major components:
 * 1) Data - generally composed of data entities and their corresponding relationships; answers **__What__ is inside the information systems?**
 * 2) Processes - a series of steps that converts inputs into outputs; answers **__How__ is data inside the information systems manipulated?**
 * 3) Network - the locations of data storage, the locations where processes are performed, and the links connecting them; answers **__Where__ are the data placed?** and **__Where__ are the processes executed?**

ISA architectural layers:
 * 1) Business Scope - gives the overall use of information systems in the organization such as scope, mission, business direction and support.
 * 2) Business Model - develops models, which will represent the business scope, mission and direction, that defines business entities and their interrelationships
 * 3) Information System Model - develops the more detailed model representations of business entities
 * 4) Technology Model - designs the information systems models in accordance with the desired technological features and constraints
 * 5) Technology Definitions - implements technology models to produce statements that will generate the actual information system (database schemas)
 * 6) Information System - handles, maintains, and operates the complete information system

Note: The **Data Component** is the focus when building databases.

The ISA framework defines models for the information system, but it only defines them; it does not provide the means to develop these models. Hence the need for **methodologies** and related sets of **modeling tools** to develop these models into architectural representations.

Methodology - defines the sequence of steps that accomplishes a design goal by manipulating a set of design objects through a defined process.

Modeling tool - a manual or automated tool that supports the development of information systems. Examples:
 * 1) Computer-aided Software Engineering (CASE) tools - software products that offer automated support for some portions of the system development process
 * 2) Integrated CASE (I-CASE) tools - sets of CASE tools that can support all portions of the system development process

One widely used methodology is the __Information Engineering methodology__. It is a formal methodology that aids in the creation and maintenance of information systems. The general idea is that business models are acquired first, then used to derive the data and process models, which in turn are used to develop the information system.

//But why Information Engineering?// Because...
 * 1) it is enterprise-wide
 * 2) it is data-driven
 * 3) it is compatible with the ISA framework

As noted earlier, the data component is the focus when building databases, so Information Engineering suits that need because it is data-driven.

__Information Engineering Development Phases:__



//Database Management Systems have been cited many times in the previous paragraphs, but what exactly are they?//

**Database Management Systems (DBMS)** are high-level, non-trivial software systems intended to provide services for managing enterprise data and storing it efficiently. Such features include:
 * 1) Giving users the ability to __create__, __retrieve__, and __update__ data in the database
 * 2) Giving users the ability to view descriptions of data items stored through the use of a __catalog__
 * 3) Giving users the ability to make __transactions__ with the assurance that the database will always be in a consistent and stable state
 * 4) Giving users the ability to update database __concurrently__ with the assurance that the database is updated correctly
 * 5) Giving users the assurance of database __recovery__ upon damage
 * 6) Giving users the assurance of database __security__ upon unauthorized attacks/access
 * 7) Having the capability of communication software __integration__
 * 8) Giving users the ability to __constrain__ the data in the database

= = =**12/01/10: Installing NetBeans and Configuring JavaDB**=

I have found a very straightforward website that guided me through the process; []. Though, I really did not have a very hard time following the instructions because I already had NetBeans installed. Now, I know how to use databases in NetBeans. =) There is another site that might be useful for future reference; [http://java.sun.com/developer/technicalArticles/J2SE/Desktop/javadb].
 * **-Reynald**

I had no problems installing NetBeans and configuring JavaDB, since I already had NetBeans installed some time ago, and fortunately the installation already included Java DB. If I recall correctly, I didn't modify any options in the CD installer given to me last semester (and something about installing GlassFish = installs JavaDB too). Last Thursday, I learned how to use the SELECT statement properly in SQL, and now I am looking forward to learning how to actually make tables and such using SQL. :)
 * **-Kat**

I downloaded the Netbeans installer at [], I waited for more than an hour to finish the download, 318 MB download took quite a long time in my computer, haha. After the successful download, I installed the Netbeans in my usual way (just clicking the next button, and sometimes the agree checklist, etc). Then for the first time I was able to see the greetings Netbeans 6.9.1 on my screen. I checked the service tab and looked for the JavaDB and it was there so I downloaded the sample database organicshop in order to test and it was a success because I was able to do everything we did last time in class. From that I concluded that my Netbeans installation that I will use for my CS 165 was finished. I call it a day. ^_^
 * **-Mar**

= = =**12/8/10: Organizing the project amongst the members**=
 * **Summary:** We had no problems installing NetBeans and configuring Java DB, since the latter comes with the NetBeans package. :D

It was hard, very hard. Mainly because the Christmas break is coming and deadlines will be bombarding us first thing in 2011. Organizing and setting schedules was a challenge because having to give assignments regarding the project in the vacation is somewhat inconsiderate. We are all people who have our own lives as everyone should know. Anyway, we fixed the schedule and hopefully it would be a good call. We also have edited the home page and the transcript is soon to follow.
 * **-Reynald**

**12/9/10:** I have added detailed information to the home page and a Gantt chart for the project. [[file:gantt chart.docx]]

=**11/25/10 - 12/11/10: Entity Relationship Diagram (ERD)**=

Entity-Relationship Modeling - "a technique for defining the information needs of an organization providing a foundation for high-quality, appropriate systems that meet business needs"

We learned about Entity Relationship Diagrams (ERDs) in class. During Week 2, we were given ideas regarding the basic notation used for ERDs. Primarily, an ERD is composed of //entities//, //attributes//, and //relationships//.

//Entities// An entity is an important person, place, thing, event, or concept. It is represented by a rectangle. Example:

//Attributes// Attributes, on the other hand, are properties or characteristics that describe entities. The notation used for this is a circle/oblong. There are six special attributes.
 * 1) **Candidate key(s)** are attributes or a combination of attributes that uniquely identify instances of an entity.
 * 2) **Primary key(s)** are attributes that uniquely identify an instance of an entity, distinguished by __solid underlines__.
 * 3) A **Composite key** is a primary key composed of two or more attributes.
 * 4) **Foreign key(s)** are attributes or a set of attributes that is the primary key of another entity. This is represented by a __broken underline__.
 * 5) **Derivative data** are values that are computed or derived, i.e. counts and totals.
 * 6) **Multivalued Attributes** are attributes that have more than one value for each instance.

Example:

//Relationships// Relationships are associations between instances of one entity with another entity. **Verbs** are used when specifying relationships, and the symbol used for this is a **diamond**.

We also learned about //degrees of relationship//: **unary**, **binary**, and **ternary**; and about //cardinality//: **mandatory one** (one-and-only-one), **optional one** (zero or one), **many cardinality** (one or many), and **optional many** (zero or many).

//But why identify relationships?// Because...
 * 1) It gives __data integrity__.
 * 2) It provides easy access of __dependent data.__

//Unified Modeling Language (UML) Class Diagrams// UML represents a unification of concepts and notations, with the goal of becoming a common language for creating models of object-oriented computer software.

//Class Diagram// It depicts the classes within a model.

//Composition Relationships// It depicts an instance of one type containing an instance of another type, and is represented by a **black diamond**.

//Inheritance// It depicts the passing of attributes and behaviors from previously created objects to new objects, and is represented by a **triangular arrowhead**.

//Aggregation// It depicts the relationship between the aggregate class (the class with the **white diamond** touching it), which is in some way the "whole", and the other class in the relationship, which is somehow a "part" of that whole.

//Association// It depicts the relationship of a parent class to its child class.

//Dependency// It depicts a weak relationship between classes and is represented by a **dashed arrow**.

//Interfaces// It depicts classes that have nothing but pure virtual functions, and is represented by the **lollipop notation**. Reference: Martin, Robert C. []

//Situation Analysis// It is a way of discovering entities, attributes, and relationships by analyzing **situations,** which are "well defined sets of circumstances that can be described using a sufficiently complete natural language."

//Advanced Entity-Relationship Modeling// Gerunds - many-to-many relationships that are chosen to model entity types with several associated one-to-many relationships, represented by a **box with a diamond** inside

Modeling Multivalued Attributes - a way for modeling **multivalued attributes,** attributes that contain more than one value, by converting each multivalued attribute to a separate entity type that has a relationship to the entity type from which it was removed

Modeling Related Groups - a way of modeling a set of two or more multivalued attributes that are logically related, commonly known as a **related group**, by removing the repeating group and creating another entity to represent it. Then, a primary key can be created or chosen from existing attributes, and the primary key of the original entity becomes a foreign key of the new entity.

Modeling Time-Dependent Data - a way of modeling **time stamp**, a time value (e.g. date or time) that is associated with a data value, by removing repeating group then creating a new entity along with required composite key(s).

//Generalization and Categorization// Generalization - concept that some things are subtypes of more general things. Example, Finals is a subtype of Exam.

Categorization - concept that some things come in various types. Example, Cakes come in different flavors.

//Supertype Subtype Relationship (a.k.a. Is-a Relationship)// This models the relationship between **supertypes**, generic entities subdivided into subtypes, and **subtypes,** which are subsets of a supertype; it is represented by a **rectangle with rounded ends**.

Discussed briefly was a layered view of software engineering: tools, methods, process and quality.

During lab hours, we were taught how to use the WHERE, BETWEEN, DATE, EXISTS, ANY, GROUP BY, HAVING, CARTESIAN PRODUCT, JOIN and set operations (UNION, AND, NOT IN). We were also taught how to create tables, delete tables and add/remove records from existing tables.

GROUP BY and HAVING are often used for report generation, and a CARTESIAN PRODUCT must be avoided, because it pairs every row of one table with every row of the other, producing a huge and mostly meaningless result set. Instead, a JOIN must be employed when merging/linking tables.
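
To make the GROUP BY/HAVING behavior concrete, here is a minimal sketch. We use JavaDB in class, but for a self-contained demo this uses Python's sqlite3 module instead; the orders table and its data are made up for illustration.

```python
import sqlite3

# In-memory database with a hypothetical orders table.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (customer TEXT, amount INTEGER);
INSERT INTO orders VALUES
  ('Ana', 100), ('Ana', 250), ('Ben', 80), ('Ben', 40), ('Cai', 500);
""")

# GROUP BY aggregates rows per customer; HAVING then filters the groups,
# e.g. keep only customers whose total exceeds 300.
rows = con.execute("""
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) > 300
    ORDER BY customer
""").fetchall()
print(rows)  # [('Ana', 350), ('Cai', 500)]
```

Note the difference from WHERE: WHERE filters individual rows before grouping, while HAVING filters whole groups after aggregation.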

An equi-join is a query structured so that records in the joined tables must have equal values in the columns on which they are joined. A natural join, on the other hand, is a query structured so that the join column is not displayed redundantly; it is also termed an inner join. A self-join is a query that joins a table to itself, and it makes use of aliases on tables. A left outer join preserves the unmatched rows from the first (left) table, joining them with a NULL row of the second (right) table. A right outer join preserves the unmatched rows from the second (right) table, joining them with a NULL row of the first (left) table.
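
The inner join versus left outer join behavior can be seen side by side in a small sketch (again using Python's sqlite3 rather than JavaDB; the emp and dept tables are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dept (dept_id INTEGER, name TEXT);
CREATE TABLE emp  (emp_id INTEGER, emp_name TEXT, dept_id INTEGER);
INSERT INTO dept VALUES (1, 'Sales'), (2, 'IT');
INSERT INTO emp  VALUES (10, 'Ana', 1), (11, 'Ben', NULL);
""")

# Equi-join (inner join): only rows with matching dept_id values appear,
# so Ben, who has no department, is dropped.
inner = con.execute("""
    SELECT e.emp_name, d.name
    FROM emp e JOIN dept d ON e.dept_id = d.dept_id
""").fetchall()
print(inner)  # [('Ana', 'Sales')]

# Left outer join: unmatched rows of the left table (emp) are preserved,
# paired with NULL values from the right table.
left = con.execute("""
    SELECT e.emp_name, d.name
    FROM emp e LEFT JOIN dept d ON e.dept_id = d.dept_id
""").fetchall()
print(left)   # [('Ana', 'Sales'), ('Ben', None)]
```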

During Week 3, we tackled advanced ERD modeling. This included gerunds, multivalued attributes, repeating groups, time-dependent data, generalization and categorization, supertypes and subtypes.
 * **-Kat**

Just to add, we also learned about requirements gathering; what to ask in an interview and what to look for in their response. I have nothing more to say. =)
 * **-Reynald**

=**01/11/11 - 01/13/11: Formal Technical Review**= This week, we learned how to make a formal technical review of software engineering projects. The main goal of the technical review is to uncover errors before the product is released to the customer. The people involved in the review are the review leader, producer, reviewer, recorder, user representative, and standards bearer. We were also given guidelines on how to conduct a formal technical review, and learned what goes on before, during, and after a review. The review participants decide what to do with the project: 1) accept without further modification, 2) reject and rework, or 3) correct minor errors.

In addition, we learned what makes a good ERD model. Criteria for this include: completeness, data reusability, stability, flexibility, elegance, communication and integration. More often than not, the listed objectives will conflict with one another, and this conflict also needs to be considered when judging ERDs. Also, discussed briefly was the requirements profile test, system designer scale and system tester scale. If the results of the tests are 1s and 2s, then the requirements can be passed on to designers and testers. Otherwise, requirements need to be reassessed and rewritten.

Data models can be validated by using a data flow diagram or an entity life history.

=**01/18/11 - 01/21/11: Normalization**= We learned about logical database design (LDD) this week. Normalization, the process of removing anomalies and redundancy from data, is an integral part of the LDD. We also learned about other database models, like the hierarchical model, network model, relational model, and object model. Examples of relational database systems include JavaDB, MySQL, PostgreSQL, Oracle, DB2, and Sybase.

Forms, reports and conceptual data models can be normalized. There are steps to follow in normalization:
 * 1) The first normal form (FNF) contains no repeating groups.
 * 2) The second normal form (SNF) is in FNF and every nonkey attribute should be fully functionally dependent on the primary key.
 * 3) The third normal form (TNF) is in SNF and has no transitive dependencies.

For most database applications (specifically businesses), the TNF is enough. Other applications may require additional steps, like:


 * Boyce-Codd Normal Form (BCNF): every determinant is a candidate key.
 * Fourth Normal Form: is in BCNF and has no nontrivial multivalued dependencies.
 * Fifth Normal Form: is in Fourth Normal Form and has no join dependencies.
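
The steps above can be illustrated with a small sketch of removing a transitive dependency to reach TNF. This uses Python's sqlite3 rather than JavaDB, and the student/department tables are made up for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Unnormalized: dept_head depends on dept, which depends on stud_id
# (a transitive dependency), so the head's name is stored redundantly.
con.executescript("""
CREATE TABLE student_flat (stud_id INTEGER, dept TEXT, dept_head TEXT);
INSERT INTO student_flat VALUES
  (1, 'CS', 'Dr. Cruz'), (2, 'CS', 'Dr. Cruz'), (3, 'Math', 'Dr. Reyes');
""")

# TNF: move the transitive dependency into its own table, with dept as
# its primary key and student.dept as a foreign key referencing it.
con.executescript("""
CREATE TABLE department (dept TEXT PRIMARY KEY, dept_head TEXT);
CREATE TABLE student (stud_id INTEGER PRIMARY KEY, dept TEXT REFERENCES department);
INSERT INTO department SELECT DISTINCT dept, dept_head FROM student_flat;
INSERT INTO student SELECT stud_id, dept FROM student_flat;
""")

# Joining the normalized tables reproduces the original rows, so no
# information was lost; only the redundancy was removed.
rows = con.execute("""
    SELECT s.stud_id, s.dept, d.dept_head
    FROM student s JOIN department d ON s.dept = d.dept
    ORDER BY s.stud_id
""").fetchall()
print(rows)
# [(1, 'CS', 'Dr. Cruz'), (2, 'CS', 'Dr. Cruz'), (3, 'Math', 'Dr. Reyes')]
```

Now a department head's name is stored once, so updating it cannot leave the data inconsistent.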

=**01/25/11 - 01/27/11: Physical Database Design**=

Physical Database Design is the process of mapping the logical database structures developed in previous stages into an internal model, or a set of physical database structures.

__Three Major Inputs to Physical Database Design:__
 * 1) Logical database structure (Relational Schema)
 * 2) User processing requirements that were identified during requirements definition, including size and frequency of use of the database.
 * 3) Characteristics of the database management system (DBMS) and other components of the computer operating environment.

To specify the physical design of the tables, we need to consider the following:
 * Business Rules or Integrity Constraints
 * Data Volume and Usage Analysis
 * Data Distribution Strategies
 * File Organization and File Access Methods
 * Indexes
 * Denormalization

__Business Rules or Integrity Constraints__ Business Rules or Integrity Constraints are specifications that preserve the integrity of the logical data model. The term business rules is usually used in the context of the analysis phase of the database, while the term integrity constraints is used in the context of the design phase.

Four Categories of Integrity Constraints:
 * 1) Domain Constraints
 * 2) Entity Integrity
 * 3) Referential Integrity
 * 4) Triggering Operations

__Domain Constraints:__ A domain is a set of all data types and ranges of values that attributes may assume.

//Data types//
 * //Numeric Data Types//: SMALLINT, INTEGER, BIGINT, REAL, DOUBLE PRECISION, DECIMAL
 * //Character Data Types//: CHAR, CHAR FOR BIT DATA, VARCHAR, LONG VARCHAR, VARCHAR FOR BIT DATA, LONG VARCHAR FOR BIT DATA
 * //Date and Time Data Types//: DATE, TIME, TIMESTAMP
 * //Large Objects Data Types:// CLOB, BLOB
 * //Other Data Types:// XML

//Allowable Values//
 * 1) //Null Values// - a missing or unknown value in a column of a table. They are not the same as blanks.
 * 2) //Duplicate Values// - a value in a column of a table that exactly matches some other value in that same column.
 * 3) //Changeable Value// - a value in a table that may vary with time.
 * 4) //Key// - The primary key (PK) of a table is the column or group of columns whose values uniquely identify each row of that table. Every table must have a primary key. Primary key values must never be null (NN), never be duplicated (ND), and never be changed (NC). The primary key can be system-assigned (SA) or user-assigned (UA). A foreign key (FK) is a column or group of columns that is the primary key of another table.
 * 5) //Derivative or Calculated Data// - Derivative data are data that are calculated from data defined elsewhere in a model.
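
Most of these rules can be declared right in the table definition and the DBMS will enforce them. A minimal sketch, using Python's sqlite3 in place of JavaDB (the course/enrollment tables and the grade range are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs

# A PK is NOT NULL and unique by definition; CHECK limits the domain of grade.
con.executescript("""
CREATE TABLE course (
    course_id TEXT PRIMARY KEY
);
CREATE TABLE enrollment (
    stud_id   INTEGER NOT NULL,
    course_id TEXT    NOT NULL REFERENCES course(course_id),
    grade     REAL    CHECK (grade BETWEEN 1.0 AND 5.0),
    PRIMARY KEY (stud_id, course_id)      -- a composite key
);
INSERT INTO course VALUES ('CS165');
INSERT INTO enrollment VALUES (1, 'CS165', 1.25);
""")

# A grade outside the declared domain is rejected by the DBMS itself.
try:
    con.execute("INSERT INTO enrollment VALUES (2, 'CS165', 7.0)")
    ok = True
except sqlite3.IntegrityError:
    ok = False
print(ok)  # False: the CHECK constraint blocked the insert
```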

//Format and Code Design// Codes are sets of integers or letters used to represent item descriptions in short form for efficient processing. They are used for: identification and retrieval, classification, efficient sorting and indexing, and cost saving in data preparation.

//Types of Coding Scheme://
 * 1) Alphanumeric Coding Schemes
 * 2) Mixed Mode Coding Schemes
 * 3) Numeric Coding Schemes
 * 4) Protected Numeric Code

__Entity Integrity__ Entity integrity means that a base relation's primary key (whether single or composite) cannot be null. A base relation or base table corresponds to an entity in the conceptual schema whose tuples or rows are physically stored in the database. Entity integrity is also known as the primary key constraint.

__Referential Integrity__ Referential integrity defines the constraints that address the validity of references by one table in a database to some other table or tables in the database. It relates to how foreign keys are defined.

//Referential Integrity Rules://
 * //Insertion Rule// - This states that a row should not be inserted in the referencing table unless a matching entry already exists in the referenced table.
 * //Deletion Rule// - This states that a row should not be deleted from the referenced table if there are any matching rows in the referencing table. There are three things that can be done: //Restrict// (the deletion request is denied), //Nullify// (the foreign key values in the referencing table are set to null), or //Cascade// (the matching rows in the referencing table are also deleted).
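
The Cascade option of the deletion rule can be demonstrated with a short sketch (Python's sqlite3 standing in for JavaDB; dept and emp are made-up tables). Note that SQLite needs `PRAGMA foreign_keys = ON` to enforce foreign keys at all, and that without an ON DELETE clause its default behaves like Restrict:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")

con.executescript("""
CREATE TABLE dept (dept_id INTEGER PRIMARY KEY);
-- Cascade: deleting a department also deletes its employees.
CREATE TABLE emp (
    emp_id  INTEGER PRIMARY KEY,
    dept_id INTEGER REFERENCES dept(dept_id) ON DELETE CASCADE
);
INSERT INTO dept VALUES (1);
INSERT INTO emp  VALUES (10, 1), (11, 1);
""")

con.execute("DELETE FROM dept WHERE dept_id = 1")
remaining = con.execute("SELECT COUNT(*) FROM emp").fetchone()[0]
print(remaining)  # 0: the matching rows in the referencing table were also deleted
```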

__Triggering Operations__ Triggering Operation is an assertion or rule that governs the validity of data manipulation operations such as insert, update and delete. The components of a triggering operation are:
 * //User Rule// which is a concise statement of the business rule to be enforced by the triggering operation;
 * //Event// which is the data manipulation operation (insert, update or delete) that initiates the operation;
 * //Entity Name// which is the name of the entity being accessed or modified;
 * //Condition// that causes the operation to be triggered; and
 * //Action// that should be taken when the operation is triggered.
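
A triggering operation containing all of these components can be sketched as follows (Python's sqlite3 rather than JavaDB; the account table and the no-overdraft rule are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE account (acct_id INTEGER PRIMARY KEY, balance INTEGER);
INSERT INTO account VALUES (1, 100);

-- User rule: an account balance must never go negative.
-- Event: UPDATE on account.  Entity name: account.
-- Condition: the new balance would be below zero.
-- Action: abort the data manipulation operation.
CREATE TRIGGER no_overdraft
BEFORE UPDATE ON account
WHEN NEW.balance < 0
BEGIN
    SELECT RAISE(ABORT, 'balance cannot go negative');
END;
""")

try:
    con.execute("UPDATE account SET balance = balance - 500 WHERE acct_id = 1")
    blocked = False
except sqlite3.IntegrityError:
    blocked = True
balance = con.execute("SELECT balance FROM account").fetchone()[0]
print(blocked, balance)  # True 100: the trigger rejected the update
```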

=**02/01/11 - 02/03/11: JDBC**= JDBC is short for Java Database Connectivity. It is a Java API that enables Java programs to execute SQL statements.

__JDBC Design Pattern__

__Static View of Design Pattern of Persistence__ The static view is an object model of the pattern. It is illustrated using a class diagram.

//JDBC Design Pattern Classes://
 * //PersistentClient// - It is a class requesting data from the database. It works with a DBClass.
 * //DBClass// - It is a class that is responsible for communicating with the database. It works with the classes found in the java.sql package.
 * //PersistentClassList// - It is a class that is used to return a set of persistent objects as a result of a database query. One record corresponds to one PersistentClass in the list.
 * //PersistentClass// - It is a class that maps a record in the database.

__Dynamic View of Design Pattern of Persistence__ The dynamic view of the design pattern of persistence shows how the classes from the static view interact with one another. A sequence diagram is used to illustrate this dynamic behavior.

Dynamic Behavior:
 * 1) JDBC Initialization - establish connection with the underlying Relational Database Management System (RDBMS)
 * 2) JDBC Create - creates a record by executing the INSERT-statement
 * 3) JDBC Read - retrieves records from the database by executing the SELECT-statement
 * 4) JDBC Update - changes the values of an existing record in the database by executing the UPDATE-statement
 * 5) JDBC Delete - deletes records in the database by executing the DELETE-statement
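
JDBC itself is a Java API (the java.sql package), but the same initialize/create/read/update/delete flow can be sketched with Python's DB-API and sqlite3; the person table here is made up for illustration:

```python
import sqlite3

# Initialization: establish a connection with the underlying RDBMS
# (JDBC would use DriverManager.getConnection instead).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")

# Create: an INSERT-statement with parameter placeholders
# (the analog of JDBC's PreparedStatement).
con.execute("INSERT INTO person VALUES (?, ?)", (1, "Ana"))

# Read: a SELECT-statement; each fetched row maps to one persistent object.
row = con.execute("SELECT id, name FROM person WHERE id = ?", (1,)).fetchone()
print(row)    # (1, 'Ana')

# Update: an UPDATE-statement changes the values of an existing record.
con.execute("UPDATE person SET name = ? WHERE id = ?", ("Ana Cruz", 1))

# Delete: a DELETE-statement removes the record.
con.execute("DELETE FROM person WHERE id = ?", (1,))
count = con.execute("SELECT COUNT(*) FROM person").fetchone()[0]
print(count)  # 0
```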

__Database Server-side Programming__ Database-side procedures are methods invoked within the database itself, e.g., in JavaDB.

__JDBC Stored Procedures and Functions__ A stored procedure is a subroutine available to applications accessing the relational database system. A stored function computes scalar results and can be used to enforce domain constraints. Both are used to access and manipulate data in the DB.

=**02/15/11 - 02/24/11: Transactions**=

During this span of time, we learned about **transactions**. A transaction is a series of operations on a database that the user wants completed in its entirety, or not at all. It is an action, or series of actions, carried out by a single user or application program, which accesses or changes the contents of the database.

A transaction can have one of two outcomes. It can be //committed//, i.e., the transaction completes successfully and the database reaches a new consistent state. Or, it can be //rolled back (undone)//, i.e., the transaction does not execute successfully or aborts, and the database is restored to the consistent state it was in before the transaction started.

__Properties of Transactions__
 * 1) //Atomicity// - A transaction is a single unit that is either performed in its entirety or it is not performed at all.
 * 2) //Consistency// - A transaction must transform the database from one consistent state to another consistent state.
 * 3) //Isolation// - Transactions execute independently of one another. In other words, the partial effects of incomplete transactions should not be visible to other transactions.
 * 4) //Durability// - The effects of a successfully completed (committed) transaction are permanently recorded in the database, and must not be lost because of a subsequent failure.
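
Atomicity and the two commit/rollback outcomes can be sketched with a toy funds transfer (Python's sqlite3 standing in for JavaDB; the account table and the no-negative-balance rule are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (acct_id INTEGER PRIMARY KEY, balance INTEGER)")
con.execute("INSERT INTO account VALUES (1, 100), (2, 100)")
con.commit()

# A transfer is one transaction: both updates happen, or neither (atomicity).
try:
    con.execute("UPDATE account SET balance = balance - 150 WHERE acct_id = 1")
    # Suppose a consistency rule forbids negative balances: abort the transaction.
    (bal,) = con.execute("SELECT balance FROM account WHERE acct_id = 1").fetchone()
    if bal < 0:
        raise ValueError("insufficient funds")
    con.execute("UPDATE account SET balance = balance + 150 WHERE acct_id = 2")
    con.commit()      # committed: the new consistent state becomes durable
except ValueError:
    con.rollback()    # rolled back: restored to the state before the transaction

balances = con.execute("SELECT balance FROM account ORDER BY acct_id").fetchall()
print(balances)  # [(100,), (100,)] - the partial update was undone
```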


**//Concurrency control//** is the process of managing simultaneous operations on the database without having them interfere with one another. This sounds good; however, concurrency gives rise to possible problems.

__Potential Problems Caused by Concurrency:__
 * 1) The //Lost Update Problem// is a problem where an apparently successfully completed update operation by one user can be overwritten by another update operation of another user.
 * 2) The //Uncommitted Dependency Problem// is the problem that occurs when the transaction is allowed to see the intermediate results of another transaction before it has committed.
 * 3) The //Inconsistent Analysis Problem// occurs when a transaction reads several values from the database, but a second transaction updates some of them during the execution of the first. Reading the uncommitted partial results of an incomplete transaction is also known as a __dirty read__, and re-reading values that another transaction has changed in the meantime is known as an __unrepeatable read__.

__Serializability and Recoverability__ The concept of serializability is used to achieve concurrency in database transactions.

To properly provide concurrency control, a schedule needs to be established to ensure consistency of our data. A **schedule** is a sequence of operations by a set of concurrent transactions that preserves the order of the operations in each of the individual transactions.

A //nonserial schedule// is a schedule where the operations from a set of concurrent transactions are interleaved. A //serial schedule// is a schedule where the operations of each transaction are executed consecutively without interleaved operations from other transactions.

Our goal is to achieve concurrency with a nonserial schedule.

__Concurrency Control Techniques__ There are two basic concurrency control techniques that allow transactions to execute safely in parallel. These are also known as //conservative or pessimistic// approaches, since they cause transactions to be delayed in case they conflict with other transactions at some time in the future.
 * //Locking// is a procedure used to control concurrent access to data. When a transaction is accessing the database, a lock may deny access to other transactions to prevent incorrect results. It is the most widely used approach to ensure serializability.
 * //Timestamp method// guarantees serializability by using transaction timestamps to order transaction execution for an equivalent serial schedule.

There are also //optimistic// concurrency control protocols, which are based on the assumption that conflicts are rare.

__Basic Locking concepts__
 * A lock is a mechanism that prevents another transaction from modifying (and in some cases, reading) a data item.
 * //Read lock (shared lock)// - transaction can read the data item but not write it.
 * //Write lock (exclusive lock)// - transaction can both read and write the data item.

Locks are used in the following manner:
 * Any transaction that needs to access a data item, whether for reading or writing, must first lock it.
 * If the item is not locked by another transaction, the lock will be granted.
 * If the item is currently locked, the DBMS determines whether the request is compatible with the existing lock. If a read lock is requested on an item that already has a read lock on it, the request is granted. Otherwise, the transaction must wait until the existing lock is released.
 * A transaction continues to hold a lock until it explicitly releases it, either during execution or at its end. Only when a write lock has been released can others see the effects of the write operation, i.e., the changes become visible to other users or applications.

Locks by themselves do not guarantee the serializability of schedules. To guarantee serializability, an additional protocol concerning the positioning of lock and unlock operations is needed. The best known protocol is two-phase locking (2PL).

In //two-phase locking (2PL)//, a transaction follows the protocol that all locking operations precede the first unlock operation in the transaction. In this protocol, a transaction is divided into two phases:
 * //Growing phase// - a transaction acquires all the locks needed but cannot release any locks
 * //Shrinking phase// - a transaction releases locks but cannot acquire new locks

There are two forms/variants of 2PL:
 * Rigorous 2PL - leaves the release of all locks until the end of the transaction
 * Strict 2PL - holds only write locks until the end of the transaction
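
To see how the read/write lock compatibility rules play out, here is a toy lock table in Python. This is only an illustration of the shared/exclusive rules described above, not how a real DBMS implements locking:

```python
# A toy lock table: item -> (mode, set of transaction ids holding it).
class LockManager:
    def __init__(self):
        self.locks = {}

    def acquire(self, txn, item, mode):
        """mode is 'read' (shared) or 'write' (exclusive).
        Returns True if granted, False if the transaction must wait."""
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        # Only read locks are compatible with each other.
        if mode == "read" and held_mode == "read":
            holders.add(txn)
            return True
        # A sole holder may re-acquire (and upgrade) its own lock.
        if holders == {txn}:
            new_mode = "write" if "write" in (mode, held_mode) else "read"
            self.locks[item] = (new_mode, holders)
            return True
        return False

    def release(self, txn, item):
        mode, holders = self.locks[item]
        holders.discard(txn)
        if not holders:
            del self.locks[item]

lm = LockManager()
g1 = lm.acquire("T1", "x", "read")    # first lock on x: granted
g2 = lm.acquire("T2", "x", "read")    # shared locks are compatible: granted
g3 = lm.acquire("T3", "x", "write")   # exclusive conflicts with readers: wait
lm.release("T1", "x")
lm.release("T2", "x")
g4 = lm.acquire("T3", "x", "write")   # granted once the readers released
print(g1, g2, g3, g4)  # True True False True
```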

A //deadlock// is a situation in which no progress is possible because two or more transactions are each waiting for locks held by the other to be released.

Two general techniques for handling deadlocks:
 * 1) //Deadlock Prevention// - The DBMS looks ahead to determine if a given schedule will cause a deadlock, and never allows a deadlock to occur.
 * 2) //Deadlock Detection and Recovery// - The DBMS allows deadlocks to occur but recognizes occurrences of deadlock, and breaks it.
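
Deadlock detection is commonly done by building a //wait-for graph// (an edge T1 -> T2 means T1 is waiting for a lock held by T2) and checking it for cycles. A minimal sketch in Python; the graph representation is invented for illustration:

```python
def has_deadlock(wait_for):
    """Detect a cycle in the wait-for graph using depth-first search.
    wait_for maps a transaction id to the list of transactions it waits on."""
    visiting, done = set(), set()

    def dfs(txn):
        visiting.add(txn)
        for blocker in wait_for.get(txn, []):
            if blocker in visiting:
                return True            # back edge: a cycle (deadlock) exists
            if blocker not in done and dfs(blocker):
                return True
        visiting.discard(txn)
        done.add(txn)
        return False

    return any(txn not in done and dfs(txn) for txn in wait_for)

# T1 waits for T2, and T2 waits for T1: a classic deadlock.
print(has_deadlock({"T1": ["T2"], "T2": ["T1"]}))  # True
# T1 waits for T2, but T2 waits for no one: no deadlock.
print(has_deadlock({"T1": ["T2"]}))                # False
```

On detecting a cycle, the DBMS breaks the deadlock by aborting (rolling back) one of the transactions in the cycle.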

__Timestamping Methods__
 * A //timestamp// is a unique identifier created by the system that indicates the relative starting time of a transaction. It can be generated by using the system clock at the time the transaction started, or by incrementing a logical counter every time a new transaction starts.
 * //Timestamping// is a concurrency control protocol whose fundamental goal is to order transactions globally in such a way that older transactions, i.e., transactions with smaller timestamps, get priority in the event of conflict.
 * //read-timestamp// - timestamp of the last transaction to read the item
 * //write-timestamp// - timestamp of the last transaction to write or update the item

__Three Phases of an Optimistic Concurrency Control Protocol__
 * 1) //Read Phase// - The transaction reads all data items it needs from the database and uses local variables to hold them. Updates are applied to the local copies, not to the database.
 * 2) //Validation Phase// - Checks are done to ensure serializability. For read-only transactions, checks are done to ensure that the data values read are still the current data values. For transactions that have updates, checks are done to ensure that the transaction would leave the database in a consistent state.
 * 3) //Write Phase// - This follows a successful validation phase. Updates made to the local copy are applied to the database.
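The three phases can be sketched against a plain dict standing in for the database. The validation rule shown here (re-check that the values read are still the current ones) is a simplification of real OCC validation, and all names are illustrative:

```python
# Sketch of optimistic concurrency control: read into local copies,
# validate, and only then write back to the "database".

db = {"x": 10, "y": 20}

def run_optimistic(updates):
    # 1) Read phase: copy the needed items into local variables
    snapshot = {k: db[k] for k in updates}
    local = dict(snapshot)
    local.update(updates)            # updates go to the local copy only

    # 2) Validation phase: the items read must still hold their old values
    if any(db[k] != v for k, v in snapshot.items()):
        return False                 # conflict detected: abort and retry

    # 3) Write phase: apply the validated local copy to the database
    db.update(local)
    return True

print(run_optimistic({"x": 11}))     # True: no conflicting writer
print(db["x"])                       # 11
```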

__Database Recovery__ Database Recovery is the process of restoring the database to a correct state in the event of a failure.

The DBMS must provide the following mechanisms to assist in database recovery:
 * 1) //Backup Mechanism// - This provides backup copies of the database. A full backup, also known as a complete backup, is a backup of the entire database, while an incremental backup consists only of the modifications made since the last complete or incremental backup.
 * 2) //Logging Mechanism// - This keeps track of the current state of transactions and of database changes. A log file, also known as a journal, is a file containing information about all updates to the database.
 * 3) //Checkpoint Mechanism// - This enables updates to the database that are in progress to be made permanent. A checkpoint is the point of synchronization between the database and the transaction log file.
 * 4) //Recovery Manager// - This allows the system to restore the database to a consistent state following a failure.
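The full vs. incremental backup distinction can be illustrated over a dict standing in for the database; the function names and data are made up for the example:

```python
# Sketch of full vs incremental backup: the full backup copies everything,
# the incremental backup stores only items changed since the last backup.

db = {"a": 1, "b": 2}

def full_backup():
    return dict(db)

def incremental_backup(since):
    # Keep only items that are new or changed relative to the last backup
    return {k: v for k, v in db.items() if since.get(k) != v}

base = full_backup()
db["b"] = 3
db["c"] = 4
inc = incremental_backup(base)
print(inc)               # {'b': 3, 'c': 4}

# Restoring = replaying the full backup, then the incremental on top of it
restored = dict(base)
restored.update(inc)
print(restored == db)    # True
```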

Recovery Techniques:
 * //Deferred Update// - In this protocol, updates are not written to the database until after the transaction has reached its commit point.
 * //Immediate Update// - Using this protocol, updates are applied to the database as they occur, without waiting for the transaction to reach its commit point.
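Deferred update can be sketched as writes that go only to a per-transaction log, which is redone against the database at the commit point; an aborted transaction's log entries are simply discarded. The class and names are illustrative:

```python
# Sketch of deferred update: no database write happens before commit,
# so abort needs no undo at all.

db = {"balance": 100}

class DeferredTxn:
    def __init__(self):
        self.log = []                # (item, new_value) records

    def write(self, item, value):
        # Not applied to db yet: only recorded in the transaction's log
        self.log.append((item, value))

    def commit(self):
        # Commit point reached: redo the logged updates against the database
        for item, value in self.log:
            db[item] = value

    def abort(self):
        # Nothing was written to db, so aborting just drops the log
        self.log.clear()

t = DeferredTxn()
t.write("balance", 50)
print(db["balance"])     # 100: the update is still deferred
t.commit()
print(db["balance"])     # 50
```

Immediate update, by contrast, would write to `db` right away and therefore needs undo information in the log to roll back on failure.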


**Reflection**: It is a big relief that this is mostly taken care of by the DBMS. Imagine trying to implement all of that by yourself!

=**03/01/11 - 03/03/11: Query Processing**=


**Query Processing** involves the activities that retrieve data from the database. One of its aims is to transform a query written in a high-level language, typically SQL, into a correct and efficient execution strategy expressed in a low-level language, and to execute that strategy to retrieve the required data.

Four Main Phases Of Query Processing:
 * 1) //Query Decomposition// - a high-level query is converted or transformed into a relational algebra query
 * 2) //Query Optimization// - involves the choosing of an efficient execution strategy for processing the query
 * 3) //Code Generation// - transforms the execution strategy into low-level operations that can be executed by the next phase
 * 4) //Runtime Query Execution// - executes the low-level operations to retrieve data from the database
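As a toy illustration of the first phase, a simple SELECT/FROM/WHERE query can be mapped to a relational algebra expression (a projection over a selection). This hand-rolled translator is only for illustration, not how a real parser works:

```python
# Toy query decomposition: build a relational algebra string from the
# parts of a simple SQL query (σ = selection, π = projection).

def to_algebra(select, frm, where=None):
    expr = frm
    if where:
        expr = f"σ[{where}]({expr})"     # selection for the WHERE clause
    return f"π[{select}]({expr})"        # projection for the SELECT list

print(to_algebra("name", "Staff", "salary > 10000"))
# π[name](σ[salary > 10000](Staff))
```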

__Query Decomposition__ The stages of query decomposition are:
 * //Analysis// - the query is analyzed for syntax errors
 * //Normalization// - the query is converted into a normalized form that can be easily manipulated: either Conjunctive Normal Form, a sequence of conjuncts connected with the ^ (AND) operator, or Disjunctive Normal Form, a sequence of disjuncts connected with the v (OR) operator
 * //Semantic Analysis// - rejects normalized queries that are incorrectly formulated or contradictory
 * //Simplification// - detects redundant qualifications, removes common sub-expressions, and transforms the query into a semantically equivalent but simpler and more efficient form
 * //Query Restructuring// - the query is restructured to provide a more efficient implementation

__Query Optimization__ The //Heuristical Approach// to query optimization uses transformation rules to convert one relational algebra expression into an equivalent form that is known to be more efficient.

//Cost Estimation for Relational Algebra Operations:// the DBMS uses formulae that estimate the cost of each of a number of execution options, and selects the one with the lowest estimated cost.
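For a concrete example, consider a selection on a relation stored in some number of disk blocks: the standard textbook estimates are b block accesses for a linear scan and roughly log2(b) for a binary search when the file is sorted on the selection attribute. The numbers below are made up for illustration:

```python
# Sketch of cost-based plan choice: estimate the block accesses of two
# access paths for a selection and pick the cheaper one.

import math

def cost_linear_scan(blocks):
    # Worst case: every block of the file is read
    return blocks

def cost_binary_search(blocks):
    # File sorted on the selection attribute: halve the blocks each step
    return math.ceil(math.log2(blocks))

b = 1024                              # blocks in the relation (illustrative)
options = {
    "linear scan": cost_linear_scan(b),
    "binary search": cost_binary_search(b),
}
best = min(options, key=options.get)
print(options)                        # {'linear scan': 1024, 'binary search': 10}
print("chosen plan:", best)           # chosen plan: binary search
```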


**Reflection:** Similar to transactions, it is a big relief that these are handled by the DBMS. It makes you appreciate it even more. :D