license: other
Whereabouts: Reference databases
This is a space containing reference databases to be used by whereabouts.
Whereabouts is a Python geocoding package that implements record linkage algorithms in SQL using DuckDB. The package itself is available at whereabouts and can be installed with pip:
pip install whereabouts
Installation of reference databases
Once the package is installed you will need to install a geocoding database, which has been built from a country's or region's address data. This repo contains a collection of these databases for different countries and regions. Currently it has files for
Australia:
- Whole of country
- Victoria, Australia
- New South Wales, Australia
United States:
- Florida, United States
- California, United States
- Massachusetts, United States
More are being added as I get around to cleaning the data and creating the corresponding databases. Database file names follow the pattern
<country_abbreviation>_<states>_<size>
where <size> is either sm or lg, depending on whether the inverted index was built from pairs of consecutive tokens (sm) or from trigrams (lg). The large databases can handle lower-quality address data at the expense of speed.
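To make the naming pattern concrete, here is a minimal sketch that splits a database name into its three parts. The helper `parse_db_name` and the `DbName` tuple are hypothetical names introduced for illustration; they are not part of the whereabouts package.

```python
from typing import NamedTuple


class DbName(NamedTuple):
    """Parts of a '<country_abbreviation>_<states>_<size>' database name."""
    country: str
    states: str
    size: str


def parse_db_name(name: str) -> DbName:
    """Split a database name such as 'au_all_sm' into its components.

    Hypothetical helper: it only assumes the pattern described above,
    with the country first and the size (sm or lg) last.
    """
    # The country abbreviation is everything before the first underscore.
    country, rest = name.split("_", 1)
    # The size is everything after the last underscore.
    states, size = rest.rsplit("_", 1)
    if size not in ("sm", "lg"):
        raise ValueError(f"unexpected size {size!r}; expected 'sm' or 'lg'")
    return DbName(country, states, size)
```

For the database used in the example below, `parse_db_name('au_all_sm')` yields country `au`, states `all` and size `sm`.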
Example (install the small Australian geocoding database)
python -m whereabouts download au_all_sm
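If you prefer to trigger the download from a Python script, the same command can be wrapped with the standard library's `subprocess` module. This is only a sketch: `build_download_command` and `download_database` are hypothetical helpers, and the only thing they assume about whereabouts is the `python -m whereabouts download <name>` command shown above.

```python
import subprocess
import sys
from typing import List


def build_download_command(db_name: str) -> List[str]:
    """Return the argument list for 'python -m whereabouts download <db_name>'."""
    # sys.executable ensures the command runs with the current interpreter,
    # i.e. the environment where whereabouts was installed.
    return [sys.executable, "-m", "whereabouts", "download", db_name]


def download_database(db_name: str) -> None:
    """Run the download command and raise CalledProcessError if it fails."""
    subprocess.run(build_download_command(db_name), check=True)
```

Calling `download_database('au_all_sm')` would then run the same command as the example above.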
Start geocoding
Once you have installed the package and a database you can start geocoding your data.
from whereabouts.Matcher import Matcher
addresslist = ['122 station st fairfield vic', '643-645 sydney road brsunwick', '504 sydney rd brunswick']
matcher = Matcher(db_name='au_all_sm')
matcher.geocode(addresslist, how='standard')
License Disclaimer for Third-Party Data
Note that while the code from this package is licensed under the MIT license, the pre-built databases use data from data providers that may have restrictions for particular use cases:
- The Australian databases are built from the Geocoded National Address File, with conditions of use based on the End User License Agreement
- The US databases are still a work in progress but are based on data from OpenAddresses, so any work with whereabouts based on US address data should adhere to the OpenAddresses license.
Users of this software must comply with the terms and conditions of the respective data licenses, which may impose additional restrictions or requirements. By using this software, you agree to comply with the relevant licenses for any third-party data.