Google Maps provides several useful APIs for accessing data: a geocoding API to convert addresses to latitude and longitude, a search API to provide locations matching a term, and a details API for retrieving location metadata.
For many mapping tasks it is valuable to get a large list of locations (restaurants, churches, etc.). Because this data is valuable, Google rate-limits access to it and encourages caching query results.
You can load a specific area of a map. The easiest way to find a starting latitude and longitude is to run an address through a geocoding API:
map = new google.maps.Map(document.getElementById('map-canvas'), {
  mapTypeId: google.maps.MapTypeId.ROADMAP,
  center: new google.maps.LatLng(curLat, curLong),
  zoom: 15,
  styles: [
    {
      stylers: [
        { visibility: 'simplified' }
      ]
    },
    {
      elementType: 'labels',
      stylers: [
        { visibility: 'off' }
      ]
    }
  ]
});
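As a rough sketch of the geocoding step, the snippet below builds a request URL for the Google Geocoding web service and reads the first latitude/longitude pair out of a response. The helper names (`buildGeocodeUrl`, `firstLatLng`) and the `YOUR_API_KEY` placeholder are assumptions for illustration, and the sample response is hand-written to match the documented shape rather than real output.

```javascript
// Hypothetical helper: build a Geocoding web service URL for an address.
function buildGeocodeUrl(address, apiKey) {
  return 'https://maps.googleapis.com/maps/api/geocode/json' +
    '?address=' + encodeURIComponent(address) +
    '&key=' + encodeURIComponent(apiKey);
}

// Hypothetical helper: return {lat, lng} for the first result, or null.
function firstLatLng(response) {
  if (!response || response.status !== 'OK' ||
      !response.results || response.results.length === 0) {
    return null;
  }
  return response.results[0].geometry.location;
}

var url = buildGeocodeUrl('Philadelphia, PA', 'YOUR_API_KEY');

// Hand-written sample in the documented response shape (not real data).
var sample = {
  status: 'OK',
  results: [{ geometry: { location: { lat: 39.95, lng: -75.16 } } }]
};
var start = firstLatLng(sample);
var curLat = start.lat;   // starting latitude for the map center
var curLong = start.lng;  // starting longitude for the map center
```

The resulting `curLat` and `curLong` are the values the map constructor above expects for its `center`.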
To run a search, you can use the radarSearch API, which appears to return up to 200 results. However, it only returns latitudes and longitudes, not place names or anything else you’d really want in a full application.
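// service is assumed to be a google.maps.places.PlacesService created for this map.
// Wait until the map knows its bounds before searching.
google.maps.event.addListenerOnce(map, 'bounds_changed', performSearch);
function performSearch() {
  var request = {
    bounds: map.getBounds(),
    keyword: 'church'
  };
  service.radarSearch(request, callback);
}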
Once the search finishes, it runs a callback. In the callback we save the results so far and set up a timer to fetch the full details of each entity. I determined experimentally that the Maps API won’t let you run a query more than once every two seconds. The code below waits 2.2 seconds between requests, because I’d rather the script keep running a little slower than risk an error from sending a request slightly too soon.
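function callback(results, status) {
  // The status and empty-result checks shown later in this post go here.
  for (var i = 0, place; place = results[i]; i++) {
    createMarker(place);
    // Stagger the detail lookups 2.2 seconds apart.
    setTimeout(loadPlace, 2200 * i);
  }
}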
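The staggering idea can be pulled out into a small helper. This is a minimal sketch, not the script's actual code: `staggerCalls` is a hypothetical name, and the optional `timer` argument is an assumption added so the spacing can be checked without waiting.

```javascript
// Hypothetical helper: run fn(item) for each item, spaced intervalMs apart.
// timer defaults to setTimeout; passing a fake makes the spacing testable.
function staggerCalls(items, intervalMs, fn, timer) {
  var schedule = timer || setTimeout;
  items.forEach(function (item, i) {
    schedule(function () { fn(item); }, intervalMs * i);
  });
}

// Example with a fake timer that records delays instead of waiting.
var delays = [];
var seen = [];
staggerCalls(['a', 'b', 'c'], 2200, function (item) {
  seen.push(item);
}, function (cb, ms) {
  delays.push(ms);
  cb();
});
// delays is now [0, 2200, 4400] and seen is ['a', 'b', 'c']
```

One design difference from the loop above: each scheduled call receives its own item, so nothing depends on a shared index being in the right state when the timer fires.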
Each “place” is hydrated using the getDetails function on the maps API, then saved back to a server:
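function loadPlace() {
  // Take the next place from the shared list filled in by the search callback.
  var place = places[placeIdx++];
  service.getDetails(place, function(result, status) {
    if (status != google.maps.places.PlacesServiceStatus.OK) {
      return;
    }
    // Send the full record to the server, then check whether to move on.
    $.post("save.php", {text: JSON.stringify(result)}, function() {
      next();
    });
  });
}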
This requires a simple PHP file. The saved results can be extracted later or used as a cache.
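$text = $_POST['text'];
// Reject bodies that are not valid JSON before writing anything.
$json = json_decode($text, true);
if ($json === null) { http_response_code(400); exit; }
// Name the file by content hash so identical records overwrite each other.
$id = md5($text);
file_put_contents('db/' . $id, $text);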
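Because the file name is a hash of the whole record, the same place can be saved more than once if any field differs between requests. When extracting the results later, one option is to de-duplicate on a stable identifier. The sketch below assumes each saved record carries a `place_id` field; `dedupeByPlaceId` is a hypothetical helper and the records are made up for illustration.

```javascript
// Hypothetical helper: keep one record per place_id; the last one seen wins.
function dedupeByPlaceId(records) {
  var byId = {};
  records.forEach(function (record) {
    if (record && record.place_id) {
      byId[record.place_id] = record;
    }
  });
  return Object.keys(byId).map(function (id) { return byId[id]; });
}

// Made-up records for illustration only.
var saved = [
  { place_id: 'abc', name: 'First Church' },
  { place_id: 'def', name: 'Second Church' },
  { place_id: 'abc', name: 'First Church' }
];
var unique = dedupeByPlaceId(saved);
// unique now holds two records, one per place_id
```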
Up to this point, we can only script a single segment of a map. In practice we want to sweep across a whole area. I found a bounding box that covers Philadelphia and the surrounding counties reasonably well by trial and error, loading the map in several places until I found good edges.
Interestingly, one map view does not span the same number of degrees in each direction: I found that a single view covers about 20 times as many degrees of longitude as latitude. Part of this comes from the shape of the map viewport, and part from the fact that a degree of longitude covers less ground than a degree of latitude away from the equator. Ideally each step is slightly smaller than one view, which gives a little overlap and records a few entries twice.
var minLat = 39.873;
var minLong = -75.483;
var maxLat = 40.453;
var maxLong = -75.163;
var dLat = 0.01;
var dLong = 0.2;
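To get a feel for how many map positions these values imply, the grid can be enumerated up front. This is a sketch under the assumption that the sweep starts at (`minLat`, `minLong`) and steps through latitude first; `gridCenters` is a hypothetical helper, not part of the original script.

```javascript
// Hypothetical helper: list every map center the sweep would visit,
// column by column (all latitude steps, then one step in longitude).
function gridCenters(b) {
  var eps = 1e-9; // guards against floating-point rounding at the edges
  var nLat = Math.floor((b.maxLat - b.minLat) / b.dLat + eps) + 1;
  var nLong = Math.floor((b.maxLong - b.minLong) / b.dLong + eps) + 1;
  var centers = [];
  for (var j = 0; j < nLong; j++) {
    for (var i = 0; i < nLat; i++) {
      centers.push({
        lat: b.minLat + i * b.dLat,
        lng: b.minLong + j * b.dLong
      });
    }
  }
  return centers;
}

var centers = gridCenters({
  minLat: 39.873, maxLat: 40.453, dLat: 0.01,
  minLong: -75.483, maxLong: -75.163, dLong: 0.2
});
```

With up to 200 results per view and roughly 2.2 seconds per details request, `centers.length * 200 * 2.2` gives a rough upper bound, in seconds, on how long a full sweep could take.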
Finally, we need a function that moves the current map location one step up in latitude, and at the end of each column shifts over in longitude and starts again from the minimum latitude, until we have read the entire area we want:
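function next() {
  // Only advance once every place in the current view has been handed out.
  if (placeIdx >= places.length) {
    curLat += dLat;
    if (curLat > maxLat) {
      // Finished a column: shift in longitude and restart the latitude sweep.
      curLong += dLong;
      curLat = minLat;
    }
    if (curLong <= maxLong) {
      // initialize is assumed to rebuild the map at (curLat, curLong) and search again.
      setTimeout(initialize,
        Math.max(
          2100,
          2100 * (places.length - placeIdx)));
    }
  }
}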
This function must be called in a few places: anywhere an error or a finished task would otherwise stop the script. If we don't do this, the script will stop partway through:
if (status != google.maps.places.PlacesServiceStatus.OK) {
  placeIdx = 1000000;
  next();
  return;
}
places = results;
if (!results) {
  next();
  return;
}
if (results.length == 0) {
  next();
  return;
}
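The three checks above can be folded into one predicate. This is only a sketch: `hasResults` and `handleSearch` are hypothetical names, and the OK status is passed in as a parameter so the snippet runs outside a browser (in the page it would be `google.maps.places.PlacesServiceStatus.OK`).

```javascript
// Hypothetical helper: true only when the search succeeded and returned
// at least one result.
function hasResults(results, status, okStatus) {
  return status === okStatus && Array.isArray(results) && results.length > 0;
}

// Sketch of how a search callback might use it. The next argument stands in
// for the function that advances the map position.
function handleSearch(results, status, okStatus, next) {
  if (!hasResults(results, status, okStatus)) {
    next(); // nothing to load here, so move on
    return false;
  }
  return true;
}
```

In the original script, `placeIdx` would still need to be pushed past `places.length` before calling `next()`, as the error branch above does; otherwise `next()` does nothing.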
Interesting approach. How do you plan on using it? I still object to the title. “Scraping” would mean getting the data from a regular website built for humans. You are using an API.
Fair enough. Scraping usually entails something like “extracting the contents of a div”, usually involving some pain. While this is more of an API processing thing, it’s an API that appears to be designed to torture the person using it so there are some overlapping concepts.
Aha awesome Gary 🙂