Evolutionary database design

From HandWiki

Evolutionary database design involves incremental improvements to the database schema so that it can be continuously updated with changes, reflecting the customer's requirements. People across the globe work on the same piece of software at the same time hence, there is a need for techniques that allow a smooth evolution of database as the design develops. Such methods utilize automated refactoring and continuous integration so that it supports agile methodologies for software development. These development techniques are applied on systems that are in pre-production stage as well on systems that have already been released. These techniques not only cover relevant changes in the database schema according to customer's changing needs, but also migration of modified data into the database and also customizing the database access code accordingly without changing the data semantics.[1]

History

After using the waterfall model for a long time, the software industry has witnessed a rise in adoption of agile methods for software development. Agile methodologies don’t assume requirements to be permanent at any stage of the software life cycle. These methods are designed to support sporadic changes in contrast to waterfall design technique. An important part of this approach is iterative development, where the entire software life-cycle is run multiple times during the life of a project. Every iteration witnesses the complete software development life cycle despite the iterations being of short duration that can vary between weeks to a few months.[1]

Before the adoption of these methodologies, the entire system was designed before starting to develop the code. The same principle was applied to the database schema as well where it was considered to be derived out of the software requirements which were in turn developed by collaboration between the customer, end-users, business analysts, etc. and these requirements were not expected to change with the progress in the software development. This approach proved to be cumbersome because as time progressed, the redundancies in the existing database schema in the form of unused rows or columns were evident. This redundancy along with data quality problems went on to become a costly affair. It was concluded that the practice of not having design interleaved with construction and testing was highly inefficient.[1]

Techniques

As mentioned in the previous section evolutionary methods are iterative in nature and these methods have become immensely popular over last two decades. Evolutionary database design aims to construct the database schema over the course of the project instead of building the entire database schema at the beginning of the project. This method of database design can capture and deal effectively with the changing requirements of projects.

There are five evolutionary database design techniques that can aid developers in building their database in an iterative fashion. A brief overview about the five techniques are provided below.

Database refactoring

Refactoring is the process of making changes to the program without affecting the functionality of the program. Database refactoring is the technique of implementing small changes to the database schema without affecting the functionality and information stored in the database.[2] The main purpose of database refactoring is to improve the database design so that the database is more in-sync with the changing requirements. The user can modify tables, views, stored procedures and triggers. Dependency between the database and external applications make database refactoring a challenge.

Evolutionary data modeling

Data modeling is the technique of identifying entities, associating attributes to the entities and deciding the data structure to represent the attributes.[3] In the traditional database scenario, a logical data model is created at the beginning to represent the entities and their associated attributes. In evolutionary data modeling the technique of data modeling is performed in an iterative manner, that is multiple data models are developed, each model representing a different aspect of the database. This kind of data modeling technique is practiced in an agile environment and it is one of the main principles of agile development.[4]

Database regression testing

Whenever a new functionality is added to a system, it is essential to verify that the update does not corrupt or render the system unusable. In a database, the business logic is implemented in stored procedures, data validation rules and referential integrity and they have to be tested thoroughly when any change is implemented in the system. Regression testing is the process of executing all the test cases whenever a new feature is added to the system. test-first development (TFD) is a form of regression testing followed in evolutionary database design. The steps involved in TFD approach are,[3]

  • Before adding a new function to the system, add a test to the test case suite such that the system fails the test
  • Run the tests, either the entire set of test cases or just a subset and ensure that the newly added test does indeed fail
  • Update the function such that the test passes
  • Run the tests again to ensure that all they succeed and that the system is not broken

Configuration management of database artifacts

Configuration management is a detailed recording of versions and updates that have been applied to any system. Configuration management is useful in rolling back updates and changes which have impacted the system in a negative manner. To ensure that any updates made in database refactoring can be rolled back, it is important to maintain database artifacts like data definition language scripts, data model files, reference data, stored procedures, etc. in a configuration management system.[5]

Developer sandboxes

A sandbox is a fully functional environment in which the system can be built, tested and executed. In order to make changes to the database schema in an evolutionary manner it is ideal for every developer to have his/her own physical sandbox, copy of source code and a copy of database. In a sandbox environment the developer can make changes to the database schema and run tests without affecting the work of other developers and other environments. Once the change has been implemented successfully, it is promoted to pre-production environment where in acceptance testing is performed and after the acceptance tests succeed it is deployed into production.

Advantages and disadvantages

Advantages

  1. High quality of database design: In evolutionary database design, the developer makes small changes to the database schema in an incremental manner and this achieves a highly optimized database schema.
  2. Handling change: In a traditional database approach, a lot of time is spent in remodeling and restructuring the database when the requirements change. In evolutionary database technique, the schema of the database is adjusted periodically to keep up with the changing requirements. Hence, evolutionary database design technique is better suited in handling the changing requirements.
  3. Guaranteed working of system at all times: The evolutionary database design approach follows test-first development model, in which the complete working of a system is tested before and after implementing an update. Hence, it is guaranteed that the system always works.
  4. Compatible with software development: The IT industry is progressing towards agile method of software development and evolutionary database design ensures that data development is in sync with software development.
  5. Reduced overall effort: In an evolutionary environment only the functionality that is required at that moment is implemented and no more.

Disadvantages

  1. Cultural impediments: Evolutionary database design approach is relatively a newer concept and many well qualified data professionals still advocate the traditional approach. Therefore, most of the databases are still being designed in a serial fashion and evolutionary database design is yet to gain support and traction among experienced data professionals.
  2. Requires a learning curve: Most of the developers are more familiar with the traditional approach and it takes time to learn evolutionary design as it is not intuitive.
  3. Complex: When the database has many external dependencies, making changes to the schema becomes all the more complicated as the external dependencies should also be updated to cope up with the changes made in the database schema. With the increase in number of dependencies, Evolutionary Database Design approach becomes extremely complex.

Comparison with traditional database design

Traditional database design technique does not support changes like evolutionary database design technique.'Unfortunately, the traditional data community assumed that evolving database schema is a hard thing to do and as a result never thought through how to do it.'[1] In a way, the evolutionary design is better for application developers and traditional design is better for data professionals.[6]

Properties Traditional Database Design Evolutionary Database Design
Design Traditional databases were developed by collaboration between business analysts and users. Evolutionary Databases were designed by software developers and data professionals.
Design Issues They demonstrate some design issues in databases. Commercially available databases were developed by experienced individuals, but are now being serviced by the database and not data professionals.[6] They are developed by a close alliance of software developers and data professionals. The database evolves with the development and hence is processed by the same individuals who were responsible for development.
Approach towards change Any change requested by the user is incorporated in the logical model followed by the physical model and then tested to ensure perfect functionality.[6] Change is an integral part of evolutionary database design. Any change requested by the user is immediately implemented in the database as well as the code. The data migration scripts need to be updated as well.
Dependency on ER diagram Traditional design is methodological and due to its dependency on ER diagram and its detailed design phases such as user, logic, and physical we can track data as well as its meaning.[6] Design is interleaved between stages for evolutionary database design. Thus, the relationship between entities can change over software development cycle and so does the ER diagram.

Tools

Given below are a list of tools that provide the functionality of designing and developing a database in an evolutionary manner.

See also

References

  1. 1.0 1.1 1.2 1.3 "Evolutionary Database Design". http://martinfowler.com/articles/evodb.html. 
  2. Vial, G. (2015-11-01). "Database Refactoring: Lessons from the Trenches". IEEE Software 32 (6): 71–79. doi:10.1109/MS.2015.131. ISSN 0740-7459. 
  3. 3.0 3.1 Ambler, Scott; Sadalage, Pramod J (2006). Refactoring Database: Evolutionary Database Design. Addison Wesley Professional. ISBN 978-0-321-29353-4. 
  4. "The Principles of Agile Modeling (AM)". http://www.agilemodeling.com/principles.htm#MultipleModels. 
  5. "Evolutionary/Agile Database Best Practices". http://agiledata.org/essays/bestPractices.html. 
  6. 6.0 6.1 6.2 6.3 Data, Big; data, Thank heavens for the silicon chip: A. BRIEF history of; motor, Car hacker secrets revealed: Clutching up a tank engine in a classic; Beyond the genome: YOU'VE BEEN DECODED, again. "Evolutionary vs. traditional database design". https://www.theregister.co.uk/2007/08/06/edbd_vs_tdbd/.