The goal: Compare two tables

When switching from one source of geo data to a different source which provided the data in a more suitable structure I wanted to get an overview of how similar the result was.

I was looking for a simple SQL snippet which I could run directly in the command line client psql.

About the case

In this case I had already a database which contained all zip codes for Germany together with names. There was no information about the name though, so that it was not possible to distinguish a city from a district. Since the data was based on the OpenGeoDB I tried to find a different export which would provide mode information. With this additional information it should be possible to clearly find out which name is the city name for a given zip code and which other names exist.

After implementing everything and importing the new data, I wanted to find out how big the difference was to prevent that there would be any late surprise with the new data.

Use a full outer join

One nice option which I found is to use a full outer join and limit the result to the rows where the columns in one of the tables would be NULL. This has the benefit of giving a nice side-by-side view.


SELECT old.zip_code, new.zip_code
  FROM old_geo_table AS old
       FULL OUTER JOIN new_geo_table AS new
       ON old.zip_code = new.zip_code
 WHERE old.zip_code is NULL
    OR new.zip_code is NULL

The result would then look somewhat like the following excerpt:

 zip_code | zip_code
 15751    |
 15758    |
 15758    |
 15758    |
          | 08344
          | 99820
          | 71631
          | 65926

This is really nice, especially using SELECT * puts the data side by side and one gets an immediate impression of the difference.

Using except

Another nice option which I discovered afterwards is to use EXCEPT. Its usage is straight forward:

SELECT zip_code from old_geo_data
SELECT zip_code from new_geo_data

The output looks like the following example:




comments powered by Disqus