Dealing with Database coupling / 开发进行时...

The business world runs on information systems, and yet we still can't seem to build them right. We learn about modularity and coupling early on, but most of us seem to think the notion applies only to code, so when a database comes into the mix, we don't think twice about the fact that all of our modules will depend on one central storage structure.

Here's a hypothetical example. Say a startup supplier company, XYZ, builds an information system to handle customer information, inventory, financial reporting, and payroll. All of these components are modules of the system, but due to the lazy nature of developers (why do you think we strive to get computers to do everything for us?), data storage is handled by one central database. Later, additional modules are built to satisfy new business requirements as the company expands, and these modules grow the database structure further.

Now suppose a small change is introduced into the schema. Let's say the phone number field is widened because the firm now deals with customers overseas. This tiny change will cost the company an enormous amount of money. Why? Because the source code (both business and presentation logic) in all the modules now needs attention, the data storage and retrieval procedures require changes, all of the existing automated tests are now invalid, and the documentation needs an update. You must go through everything you have to find every place where the structural change affects the system. All of this because of one small modification. If the initial schema was badly designed, expect the maintenance cost of an otherwise perfectly modular system to skyrocket.

It surprises me that most systems built today still go with a central data management module. I guess we never learn; but if we did, how would we avoid database coupling? There are a few ways to deal with it.

You can't effectively "modularize" a database schema simply by creating a separate database for each system module. The complexity of such a system would be enormous, since any given module would need to work with several databases at once, unless you duplicate the information, in which case you must keep the copies consistent with one another. All in all, it would be a gigantic mess.

One solution is to split the data across multiple databases without duplication and put a central data retrieval system in front of them. All storage and retrieval commands go through this module, which dispatches them to the appropriate databases. Each module can be built to work only with the database it needs, so changes to the structure of other databases won't affect it. Any change will require tweaking the storage and retrieval module as well as the source code of the affected module. It isn't a perfect solution, but it is far from the apocalyptic scenario described above.
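A minimal Python sketch of such a central storage and retrieval layer follows. It assumes each module's data lives in its own SQLite database (in-memory here) and that records are plain dicts; the `DataRouter` class, its `register`/`save`/`fetch` methods, and the table names are hypothetical illustrations, not an established API.

```python
import sqlite3


class DataRouter:
    """Hypothetical central storage/retrieval layer.

    Every read and write goes through this object, which forwards it to the
    one database that owns that kind of record (no data is duplicated).
    """

    def __init__(self):
        # record kind -> (connection, table name, column names)
        self._owners = {}

    def register(self, kind, conn, table, columns):
        # Table and column names come from trusted configuration, never from
        # user input, because they are interpolated into the SQL below.
        self._owners[kind] = (conn, table, tuple(columns))

    def save(self, kind, record):
        conn, table, columns = self._owners[kind]
        placeholders = ", ".join("?" for _ in columns)
        conn.execute(
            f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})",
            [record.get(col) for col in columns],
        )
        conn.commit()

    def fetch(self, kind, key_column, key):
        conn, table, columns = self._owners[kind]
        if key_column not in columns:
            raise ValueError(f"unknown column {key_column!r} for {kind!r}")
        row = conn.execute(
            f"SELECT {', '.join(columns)} FROM {table} WHERE {key_column} = ?",
            (key,),
        ).fetchone()
        return dict(zip(columns, row)) if row else None


# Each module owns its own database; in-memory SQLite stands in for them here.
customers_db = sqlite3.connect(":memory:")
customers_db.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")
inventory_db = sqlite3.connect(":memory:")
inventory_db.execute("CREATE TABLE item (sku TEXT PRIMARY KEY, qty INTEGER)")

router = DataRouter()
router.register("customer", customers_db, "customer", ["id", "name", "phone"])
router.register("item", inventory_db, "item", ["sku", "qty"])

# Modules talk only to the router, never to a database directly.
router.save("customer", {"id": 1, "name": "Acme", "phone": "+1 555 0100"})
router.save("item", {"sku": "A-100", "qty": 12})
print(router.fetch("customer", "id", 1))
# {'id': 1, 'name': 'Acme', 'phone': '+1 555 0100'}
```

With this arrangement, changing the customer table (say, adding a country-code column) touches only the customer database and its `register` call; the inventory module never sees the change.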

Another way is to use a central database alongside a number of local data stores. A module can keep a minimal duplicate data set locally for frequent operations, and the main database includes a trigger that updates the local data set whenever the data changes in central storage, and vice versa. This is a good solution if the various system components mostly use distinct parts of the database for their standard operations.
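The sketch below models this arrangement in plain Python. To keep it self-contained, it substitutes in-process callbacks for the database trigger the text describes; `CentralStore`, `LocalStore`, and the key naming scheme are hypothetical.

```python
class CentralStore:
    """Authoritative copy of all data.

    Stands in for the central database; calling the subscribers on every
    update plays the role of the trigger described above.
    """

    def __init__(self):
        self._data = {}
        self._subscribers = []  # (local store, keys that store caches)

    def subscribe(self, local, keys):
        self._subscribers.append((local, set(keys)))
        # Seed the new local copy with the current central values.
        for key in keys:
            if key in self._data:
                local.apply(key, self._data[key])

    def update(self, key, value, origin=None):
        self._data[key] = value
        for local, keys in self._subscribers:
            # Push the change to every local copy that caches this key,
            # except the one the change came from (it is already current).
            if key in keys and local is not origin:
                local.apply(key, value)

    def get(self, key):
        return self._data.get(key)


class LocalStore:
    """Minimal per-module copy of the data a module reads most often."""

    def __init__(self, central, keys):
        self._central = central
        self._data = {}
        central.subscribe(self, keys)

    def apply(self, key, value):
        # Called by the central store; it never calls back, so no update loop.
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def update(self, key, value):
        # "Vice versa": a local write updates the local copy, then the central
        # store, which forwards it to the other local copies.
        self._data[key] = value
        self._central.update(key, value, origin=self)


central = CentralStore()
central.update("customer:1:phone", "555-0100")
billing = LocalStore(central, ["customer:1:phone"])
shipping = LocalStore(central, ["customer:1:phone"])

central.update("customer:1:phone", "+1 555 0100")   # central -> local copies
shipping.update("customer:1:phone", "+1 555 0199")  # local -> central -> other local copies
print(central.get("customer:1:phone"), billing.get("customer:1:phone"))
# +1 555 0199 +1 555 0199
```

Skipping the originating store when a local write is propagated is what keeps the two-way synchronization from bouncing updates back and forth; a trigger-based version would likely need an equivalent guard.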

My final proposition, the wildcard, is to use an entity-attribute-value (EAV) database schema. If done right, this type of implementation is very forgiving when it comes to change. Since the schema is never "final" at any point in its existence, the modules are built with this constraint in mind, so changes are absorbed much more easily. The cost of developing a system with an EAV model is usually higher (at least it was for the two projects I directed), but if you anticipate constant evolution of your model, the future savings are worth it.
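Here is a minimal EAV sketch using Python's built-in `sqlite3` module. The single `eav` table and the helper names `set_attr` and `get_entity` are assumptions for illustration; a production EAV design often adds typed value columns, attribute metadata, and indexes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One generic table replaces a fixed table with one column per attribute.
conn.execute(
    """CREATE TABLE eav (
           entity    TEXT NOT NULL,
           attribute TEXT NOT NULL,
           value     TEXT,
           PRIMARY KEY (entity, attribute)
       )"""
)


def set_attr(entity, attribute, value):
    # Insert a new attribute or overwrite an existing one for this entity.
    conn.execute(
        "INSERT OR REPLACE INTO eav (entity, attribute, value) VALUES (?, ?, ?)",
        (entity, attribute, value),
    )


def get_entity(entity):
    # Rebuild the entity as a dict from all of its attribute rows.
    rows = conn.execute(
        "SELECT attribute, value FROM eav WHERE entity = ?", (entity,)
    ).fetchall()
    return dict(rows)


set_attr("customer:1", "name", "Acme")
set_attr("customer:1", "phone", "555-0100")
# Overseas customers arrive: store a longer number and a new attribute
# without any ALTER TABLE on the underlying schema.
set_attr("customer:1", "phone", "+44 20 7946 0000")
set_attr("customer:1", "country", "GB")
print(get_entity("customer:1"))
```

The overseas phone number and the new `country` attribute need no schema change; the trade-off is that type checking and most query logic move into application code, which may be part of why EAV systems tend to cost more to build.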

If you are already stuck in a coupling mess, the best you can do is make sure additional modules are built using one of the techniques mentioned above, and don't forget to learn from your mistakes.