The goal: Compare two tables
When switching from one source of geo data to a different source which provided the data in a more suitable structure I wanted to get an overview of how similar the result was.
I was looking for a simple SQL snippet which I could run directly in the command line client psql.
About the case
In this case I had already a database which contained all zip codes for Germany together with names. There was no information about the name though, so that it was not possible to distinguish a city from a district. Since the data was based on the OpenGeoDB I tried to find a different export which would provide mode information. With this additional information it should be possible to clearly find out which name is the city name for a given zip code and which other names exist.
After implementing everything and importing the new data, I wanted to find out how big the difference was to prevent that there would be any late surprise with the new data.
Use a full outer join
One nice option which I found is to use a full outer join and limit the result to the rows where the columns in one of the tables would be NULL. This has the benefit of giving a nice side-by-side view.
Example:
SELECT old.zip_code, new.zip_code FROM old_geo_table AS old FULL OUTER JOIN new_geo_table AS new ON old.zip_code = new.zip_code WHERE old.zip_code is NULL OR new.zip_code is NULL
The result would then look somewhat like the following excerpt:
zip_code | zip_code ----------+---------- 15751 | 15758 | 15758 | 15758 | | 08344 | 99820 | 71631 | 65926
This is really nice, especially using SELECT * puts the data side by side and one gets an immediate impression of the difference.
Using except
Another nice option which I discovered afterwards is to use EXCEPT. Its usage is straight forward:
SELECT zip_code from old_geo_data EXCEPT SELECT zip_code from new_geo_data
The output looks like the following example:
zip_code ---------- 15758 15751
Pointers
OpenGeoDB Homepage
Download of datasets