Geoloc: fast geolocation from the command line
I wrote geoloc to help me query my access logs, fast.
geoloc is a command-line tool, written in C++, for bulk geolocation queries. Once its binary database has been built, geoloc performs geolocation queries offline.
Bulk lookup from an Apache access.log:
$ cat access.log | awk '{print $1}' | geoloc -f - | column -t
AU  02  Sydney         -33.8001  151.3123   AS1610581   BIGCableCo
AU  07  Melbourne      -37.8266  144.7834   AS1370775   Micronode+PTY+LTD
US  CA  San+Francisco  37.6777   -122.2221  AS49335653  Big+Flare,+Inc
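Each line fed to `geoloc -f -` is an address in dotted-quad form. A binary search over address ranges needs a sortable numeric key, and the usual choice for IPv4 is the address as a 32-bit integer. A minimal sketch of that conversion using POSIX `inet_pton`, assuming IPv4 input only; the `parse_ipv4` name is hypothetical, and this illustrates the general technique rather than geoloc's own parser:

```cpp
// Hypothetical helper (not geoloc's actual code): convert one dotted-quad
// IPv4 address into a host-order 32-bit integer usable as a search key.
#include <arpa/inet.h>   // inet_pton, ntohl
#include <netinet/in.h>  // in_addr
#include <sys/socket.h>  // AF_INET
#include <cstdint>
#include <optional>
#include <string>

std::optional<std::uint32_t> parse_ipv4(const std::string& text) {
    in_addr addr{};
    // inet_pton returns 1 only for a well-formed a.b.c.d with each part <= 255.
    if (inet_pton(AF_INET, text.c_str(), &addr) != 1) {
        return std::nullopt;  // not a valid IPv4 address
    }
    // s_addr is in network byte order; convert so integer order == address order.
    return ntohl(addr.s_addr);
}
```

Converting with `ntohl` puts the integer in host byte order, so comparing keys numerically matches address order; for example, 1.2.3.4 becomes 0x01020304.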
Query some IPs:
$ geoloc -q --headers | column -t
ip country region city latitude longitude as_num as_text
US CA Mountain+View 37.3860 -122.0838 AS15169 Google+Inc.
US CA San+Francisco 37.7697 -122.3933 AS36459 GitHub,+Inc.
geoloc is designed to run fast and to load fast:
$ wc -l /tmp/ip_list
1000000 /tmp/ip_list
$ time geoloc -f /tmp/ip_list > /tmp/res1
real 0m6.131s
user 0m5.662s
sys 0m0.369s
$ time geoloc -q > /tmp/res2
real 0m0.010s
user 0m0.002s
sys 0m0.005s
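Read as throughput, the first run works out to roughly 6 µs per address, measured end to end (reading input, searching, and writing output), not just the search itself:

```latex
\frac{6.131\,\text{s}}{10^{6}\ \text{addresses}} \approx 6.1\,\mu\text{s per address}
\qquad\Longrightarrow\qquad
\frac{10^{6}\ \text{addresses}}{6.131\,\text{s}} \approx 1.6 \times 10^{5}\ \text{addresses/s}
```

The second run completes in about 10 ms of wall-clock time in total, which presumably includes opening the database, so that is an upper bound on the load cost of a single invocation.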
The program is designed as a portable application that runs out of ~/bin, with its database stored in ~/var/db/geoloc/geodata.bin.
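A program that keeps its data under the home directory has to resolve `~` itself, since tilde expansion is done by the shell, not by the kernel or the C library. A minimal C++17 sketch of that step, assuming the home directory is taken from the `HOME` environment variable; the `database_path` function is hypothetical and not taken from geoloc's source:

```cpp
// Hypothetical sketch: build ~/var/db/geoloc/geodata.bin from $HOME.
// The function name and the "no HOME" fallback are assumptions.
#include <cstdlib>      // std::getenv
#include <filesystem>
#include <optional>

std::optional<std::filesystem::path> database_path() {
    const char* home = std::getenv("HOME");
    if (home == nullptr || *home == '\0') {
        return std::nullopt;  // nothing to resolve "~" against
    }
    return std::filesystem::path(home) / "var" / "db" / "geoloc" / "geodata.bin";
}
```

Returning an empty optional when `HOME` is unset lets the caller report an error instead of silently opening a relative path.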
To install:
$ git clone && cd geoloc
$ ./configure
$ make
$ make install
The configure script will check for these dependencies:
- iconv
- unzip
- wget
- make
- c++
During installation, data will be downloaded from MaxMind to create the database.
An update script will be installed into ~/bin/. Run this script whenever you want to update your geolocation database; MaxMind updates its source data once a month.
I have tested on OS X 10.9.5 and Ubuntu 14.04. Other Unixes are likely to work with minimal or no changes. It is unlikely to work on Windows because it relies on mmap.
Design and Implementation
I plan to do a longer write-up on the design and implementation of the tool, to share some C++ tips and tricks.
The short version is that the code operates in two phases: packing and query. The packing phase converts the data into a machine-optimal format, namely relocatable sorted vectors: flat arrays with no embedded pointers, so they stay valid wherever they are loaded in memory. The query phase simply mmaps that data and performs a std::upper_bound binary search on it to look up each IP.
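The two phases can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not geoloc's source: the `Range` record, its `location_id` field, and the `pack` and `RangeTable` names are hypothetical; keys are assumed to be 32-bit IPv4 addresses; and ranges are assumed not to overlap. Packing sorts fixed-size records and writes their raw bytes; querying maps the file read-only and uses `std::upper_bound` to find the first range starting after the address, then steps back one.

```cpp
// Hypothetical sketch of packing + query over a relocatable sorted vector.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <type_traits>
#include <vector>
#include <fcntl.h>      // open
#include <sys/mman.h>   // mmap, munmap
#include <sys/stat.h>   // fstat
#include <unistd.h>     // close

struct Range {
    std::uint32_t first;        // first address in the block (host order)
    std::uint32_t last;         // last address in the block, inclusive
    std::uint32_t location_id;  // hypothetical index into a location table
};
// No pointers inside, so the bytes are meaningful at any load address.
static_assert(std::is_trivially_copyable_v<Range>, "Range must be raw-copyable");

// Packing phase: sort by first address and dump the vector's bytes to disk.
bool pack(std::vector<Range> ranges, const std::string& path) {
    std::sort(ranges.begin(), ranges.end(),
              [](const Range& a, const Range& b) { return a.first < b.first; });
    std::FILE* out = std::fopen(path.c_str(), "wb");
    if (!out) return false;
    std::size_t written = std::fwrite(ranges.data(), sizeof(Range), ranges.size(), out);
    bool ok = written == ranges.size();
    return std::fclose(out) == 0 && ok;  // fclose always runs
}

// Query phase: map the packed file read-only and binary-search it in place.
class RangeTable {
public:
    explicit RangeTable(const std::string& path) {
        int fd = ::open(path.c_str(), O_RDONLY);
        if (fd < 0) return;  // leaves an empty table
        struct stat st{};
        if (::fstat(fd, &st) == 0 && st.st_size > 0) {
            std::size_t len = static_cast<std::size_t>(st.st_size);
            void* p = ::mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p != MAP_FAILED) {
                base_ = p;
                bytes_ = len;
            }
        }
        ::close(fd);  // the mapping stays valid after the descriptor is closed
    }
    ~RangeTable() { if (base_) ::munmap(base_, bytes_); }
    RangeTable(const RangeTable&) = delete;
    RangeTable& operator=(const RangeTable&) = delete;

    // Returns the location id of the range containing ip, or -1 if none does.
    long long lookup(std::uint32_t ip) const {
        const Range* begin = static_cast<const Range*>(base_);
        const Range* end = begin + bytes_ / sizeof(Range);
        // First range whose start is strictly greater than ip...
        const Range* it = std::upper_bound(begin, end, ip,
            [](std::uint32_t value, const Range& r) { return value < r.first; });
        if (it == begin) return -1;  // ip is below every range
        --it;                        // ...so the only candidate is the one before it
        return ip <= it->last ? static_cast<long long>(it->location_id) : -1;
    }

private:
    void* base_ = nullptr;
    std::size_t bytes_ = 0;
};
```

Because the file holds nothing but fixed-size integers, loading it is a single mmap call with no parsing step. Two caveats of this sketch: the file is only valid on machines with the same byte order and struct layout, and viewing mapped bytes as `Range` objects relies on the common practice of reading trivially copyable types from raw memory.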
There is an outline of the code here, in rough topological order, with a summary of each module.
This software includes GeoLite data created by MaxMind available from