Caffeinated Bitstream

Bits, bytes, and words.

Foreign Key Constraint Discovery

When developing database-oriented business webapps, the data model can sometimes become quite complex with many database tables referencing other database tables. To maintain referential integrity, you can (and should) declare foreign key constraints so rows cannot be deleted if other rows depend on them. However, if the application is a basic CRUD webapp that provides the user a direct, non-abstract view of the data, the user may be surprised to get a confusing constraint violation error when he or she attempts to delete a row. Such errors seldom contain useful details, so even if you wrap the error in a pretty page, the user may still be left wondering exactly why that row could not be deleted. Even if your database provides such details when a constraint is violated, an important usability question must still be asked: Why was the user given the option to delete the row in the first place, if deleting the row would result in a constraint violation?

The ideal solution would be to remove the option to delete rows that cannot be deleted, gray out the delete button, or otherwise indicate that deletion can't happen. Better yet, provide some information about why the row is important and can't be deleted yet. This requires the webapp to determine which rows can be deleted when the rows are being rendered in the web page view.

The goal is to write a routine for preemptively checking foreign key constraints. Such a routine must be very general, because we don't want to hard-code data model logic that must be maintained every time we change the schema, and prevent us from easily reusing the code in other projects. It would be great if databases supported an SQL query such as "CHECK DELETE CONSTRAINTS FROM table_name WHERE id IN (42,43,44,...)". Unfortunately, I haven't come across any such feature. Fortunately, databases do allow the schema to be examined and foreign key constraints discovered. Therefore, our problem can be handled with a simple, two-step solution: (1) At application startup time, perform foreign key constraint discovery, and (2) when delete options are being rendered, call our routine to query the deletability (deleteability?) of each row.

In Java, for instance, a schema's foreign keys can be examined in a database-neutral manner with JDBC using the DatabaseMetaData.getExportedKeys() method on each table, building a list of all foreign key relationships. (The list of tables can be queried with DatabaseMetaData.getTables().) The foreign key relationships defined with ON DELETE CASCADE must be flagged based on the presence of the importedKeyCascade flag in the DELETE_RULE.

When the webapp needs to check the deletability of displayed rows, it can call the routine with the table name and a list of id's to check. The routine can check each foreign key relationship with a SELECT to see if any foreign key constraints would be thrown if any of the specified rows were deleted. Relationships defined with ON DELETE CASCADE do not throw constraint violations for the initial delete, but the cascaded delete might violate a constraint, so the "check deletability" routine must be recursed into for each of these cascade relationships. Because the directed graph of foreign key relationships is allowed to contain circular references, the set of processed rows must be passed into each recursion and checked against to prevent infinite recursion. After processing, you should have a list of row id's that cannot be deleted, along with the reasons. Implementation of this scheme is left as an exercise for the reader. For bonus points, integrate with Hibernate so the webapp can pass a collection of persisted objects instead of row id's.

I've found that this approach results in a nice, easily reusable module that seems to work well with PostgreSQL and SQL Server. For some webapps, there may be some performance issues with executing these additional queries for every page view. However, if all the relevant criteria fields are indexed, it's probably not a big deal. Many people solve this problem by never actually deleting rows, but merely setting a "deleted" boolean flag to indicate that the rows should no longer be displayed. This can be a fine solution, except when there are business reasons for making sure the user is aware of all the outstanding dependencies.