Geospatial Analytics on AWS: Tools and Patterns

By Infra IT Consulting · 10 min read

Location is one of the most underused dimensions in business analytics. For organisations in retail, logistics, field services, real estate, agriculture, and the public sector, where something happens is often as important as what happened and when. Yet most analytics platforms treat geography as a text field (province, postal code, city) rather than a first-class spatial dimension with radius searches, boundary intersections, and route calculations built in.

AWS has invested significantly in geospatial capabilities across its analytics stack. Amazon Location Service handles mapping, geocoding, and routing. Amazon Athena and Amazon Redshift both support spatial SQL functions natively. Amazon QuickSight renders point maps and choropleth visualisations. Together, these services enable sophisticated location intelligence without the operational burden of managing specialised GIS infrastructure.

This post covers the AWS geospatial stack, key SQL patterns for spatial analysis, and practical architecture patterns for organisations building location-aware analytics.

The AWS Geospatial Stack

Understanding which service handles which part of the problem prevents the common mistake of using the wrong tool for the job:

Amazon Location Service — managed APIs for mapping (map tiles), places (geocoding/reverse geocoding), routes (driving directions, travel time estimation), geofences (boundary monitoring), and trackers (real-time asset location). This is the right service for operational location use cases: tracking fleet vehicles, sending geofence alerts when a field technician enters a customer site, or embedding a map in your application.

Amazon Athena (spatial functions, with Apache Sedona for heavier workloads) — Athena supports a subset of OpenGIS-style spatial SQL functions natively. For more complex spatial operations, Apache Sedona (formerly GeoSpark) runs on Apache Spark, for example on Amazon EMR. Use Athena for batch spatial analysis over large datasets in S3 (e.g., finding all points of interest within 5 km of each store location, or identifying which sales territory each customer falls in).

Amazon Redshift (spatial) — Redshift has supported the GEOMETRY data type since 2019, later added GEOGRAPHY, and implements much of the OpenGIS standard. For organisations already using Redshift as their analytics warehouse, spatial queries can run directly against their existing data without a separate spatial database. Redshift spatial functions are particularly well-suited to point-in-polygon queries and distance calculations at warehouse scale.

Amazon QuickSight — QuickSight supports point maps, filled maps (choropleths), and heat maps. It geocodes location fields automatically using postal codes, city names, or coordinates. For standard location visualisation — plotting customer locations on a map, colouring provinces by sales volume — QuickSight handles the use case without external GIS tools.

Amazon SageMaker Geospatial — a newer service that provides managed geospatial ML capabilities including satellite imagery analysis, terrain modelling, and Earth observation data access. Relevant for agriculture, environmental monitoring, and infrastructure planning use cases.

Spatial SQL in Amazon Athena

Athena’s spatial functions follow the OpenGIS standard and operate on WKT (Well-Known Text) geometry representations. The most commonly used functions are ST_Point, ST_Distance, ST_Contains, ST_Intersects, and ST_Within.

Finding customers within a radius of store locations:

-- Find all customers within 10 km of each store
-- to_spherical_geography makes ST_Distance return metres rather than degrees
SELECT
    s.store_id,
    s.store_name,
    c.customer_id,
    c.customer_name,
    ST_Distance(
        to_spherical_geography(ST_Point(s.longitude, s.latitude)),
        to_spherical_geography(ST_Point(c.longitude, c.latitude))
    ) / 1000.0 AS distance_km
FROM stores s
CROSS JOIN customers c
WHERE ST_Distance(
    to_spherical_geography(ST_Point(s.longitude, s.latitude)),
    to_spherical_geography(ST_Point(c.longitude, c.latitude))
) <= 10000  -- 10,000 metres
ORDER BY s.store_id, distance_km

Note that ST_Distance on plain point geometries in Athena returns distances in the units of the coordinates, which means degrees for longitude/latitude data. It returns metres only when both arguments are spherical geographies, where a spherical earth model is applied. For distance-based queries, always convert the points with to_spherical_geography:

-- Using spherical geography for metre-based distances
SELECT store_id, customer_id, distance_metres
FROM (
    SELECT
        store_id,
        customer_id,
        ST_Distance(
            to_spherical_geography(ST_Point(s_lon, s_lat)),
            to_spherical_geography(ST_Point(c_lon, c_lat))
        ) AS distance_metres
    FROM store_customer_pairs
)
WHERE distance_metres <= 10000
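
As a cross-check on metre-based results, the great-circle distance can also be computed directly from the coordinates. The sketch below uses the haversine formula; the function name and the mean Earth radius of 6,371,008.8 m are assumptions for illustration, and a spatial engine may use a slightly different radius, so expect close rather than identical values.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # assumed mean Earth radius in metres


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodal points
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


# One degree of longitude spans far fewer metres at 60°N than at the equator,
# which is why a threshold expressed in degrees cannot stand in for metres.
at_equator = haversine_m(0.0, 0.0, 1.0, 0.0)
at_60_north = haversine_m(0.0, 60.0, 1.0, 60.0)
```

Comparing `at_equator` with `at_60_north` shows the same one-degree difference covering roughly half the ground distance at the higher latitude.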

Point-in-polygon: assigning customers to sales territories:

-- territories table has a 'boundary_wkt' column with polygon WKT
SELECT
    c.customer_id,
    c.customer_name,
    t.territory_id,
    t.territory_name,
    t.sales_rep
FROM customers c
JOIN territories t
    ON ST_Contains(
        ST_GeometryFromText(t.boundary_wkt),
        ST_Point(c.longitude, c.latitude)
    )
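
To make the join condition concrete, the following sketch shows the kind of test a point-in-polygon predicate performs, using the classic ray-casting rule on a single ring without holes. It is an illustration only and not a description of how Athena implements ST_Contains; the function name and the sample territory are hypothetical.

```python
def point_in_ring(lon, lat, ring):
    """Return True if (lon, lat) lies inside a polygon ring.

    `ring` is a list of (lon, lat) vertices; the closing vertex may be omitted.
    Points lying exactly on an edge are not handled specially.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed
        # by a horizontal ray cast from the point towards +longitude.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside  # each crossing flips inside/outside
    return inside


# Hypothetical rectangular territory (lon, lat) for illustration
territory = [(-79.45, 43.60), (-79.30, 43.60), (-79.30, 43.70), (-79.45, 43.70)]
customer_in = point_in_ring(-79.38, 43.65, territory)
customer_out = point_in_ring(-79.10, 43.65, territory)
```

An odd number of crossings means the point is inside the ring; the spatial engine performs an equivalent test for every candidate customer and territory pair.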

Territory boundaries can be loaded into S3 from GeoJSON files and converted to WKT using AWS Glue or a simple Python Lambda. The ST_Contains function handles the spatial join — a query that would be extremely difficult to express in standard SQL without spatial extensions.
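
The GeoJSON-to-WKT conversion step could look something like the sketch below, which uses only the Python standard library and handles Polygon and MultiPolygon geometries. The function names and the sample feature (including its `territory_id` property) are hypothetical; a production job would more likely rely on a geometry library and validate the input.

```python
import json


def ring_to_wkt(ring):
    """Format one linear ring as '(x y, x y, ...)', ignoring any Z values."""
    return "(" + ", ".join(f"{x} {y}" for x, y, *_ in ring) + ")"


def polygon_geometry_to_wkt(geometry):
    """Convert a GeoJSON Polygon or MultiPolygon geometry dict to WKT."""
    gtype = geometry["type"]
    coords = geometry["coordinates"]
    if gtype == "Polygon":
        return "POLYGON (" + ", ".join(ring_to_wkt(r) for r in coords) + ")"
    if gtype == "MultiPolygon":
        polys = ("(" + ", ".join(ring_to_wkt(r) for r in poly) + ")" for poly in coords)
        return "MULTIPOLYGON (" + ", ".join(polys) + ")"
    raise ValueError(f"Unsupported geometry type: {gtype}")


# Hypothetical territory feature for illustration
feature = json.loads("""
{
  "type": "Feature",
  "properties": {"territory_id": "T-001"},
  "geometry": {
    "type": "Polygon",
    "coordinates": [[[-79.45, 43.6], [-79.3, 43.6], [-79.3, 43.7],
                     [-79.45, 43.7], [-79.45, 43.6]]]
  }
}
""")
boundary_wkt = polygon_geometry_to_wkt(feature["geometry"])
```

The resulting string can be written alongside the territory attributes so that the `boundary_wkt` column is ready for ST_GeometryFromText.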

Spatial Queries in Amazon Redshift

Redshift’s spatial implementation is similar to PostGIS and supports both GEOMETRY and GEOGRAPHY types natively. One advantage of Redshift over Athena for spatial work is that Redshift can store spatial data in its own columnar storage, enabling zone-map pruning on bounding boxes that dramatically accelerates spatial queries on large datasets.

-- Load spatial data into Redshift
CREATE TABLE store_locations (
    store_id    VARCHAR(20),
    store_name  VARCHAR(100),
    province    CHAR(2),
    location    GEOGRAPHY
);

-- Insert with geography from coordinates
INSERT INTO store_locations
SELECT
    store_id,
    store_name,
    province,
    ST_GeogFromText('SRID=4326;POINT(' || longitude || ' ' || latitude || ')')
FROM stage_store_locations;

-- Spatial join: aggregate sales by census division
SELECT
    cd.division_name,
    cd.province,
    COUNT(DISTINCT o.customer_id) AS unique_customers,
    SUM(o.amount_cad) AS total_revenue_cad
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN census_divisions cd
    ON ST_Within(c.location, cd.boundary)
WHERE o.order_date >= '2024-01-01'
GROUP BY 1, 2
ORDER BY total_revenue_cad DESC

For Canadian organisations, Statistics Canada publishes census division and dissemination area boundaries as Shapefiles, which can be converted to GeoJSON and loaded into Redshift as boundary polygons. This enables direct joins between your customer data and Statistics Canada demographic data — a powerful combination for market analysis.

Geocoding with Amazon Location Service

Raw addresses in your CRM or order management system need to be converted to latitude/longitude coordinates before you can do spatial analysis. Amazon Location Service’s Places API handles geocoding at scale:

import boto3
import pandas as pd

location = boto3.client('location', region_name='ca-central-1')

def geocode_address(address: str, country: str = 'CAN') -> dict:
    """Geocode a single address using Amazon Location Service."""
    # Returned on failure so every row yields a dict with the coordinate keys
    empty = {'longitude': None, 'latitude': None}
    try:
        response = location.search_place_index_for_text(
            IndexName='my-place-index',
            Text=address,
            FilterCountries=[country],
            MaxResults=1
        )
    except Exception as e:
        return {**empty, 'error': str(e)}
    if not response['Results']:
        # No candidate found: return nulls rather than an implicit None
        return {**empty, 'error': 'no match found'}
    result = response['Results'][0]
    place = result['Place']
    point = place['Geometry']['Point']  # [longitude, latitude]
    return {
        'longitude': point[0],
        'latitude': point[1],
        'formatted_address': place.get('Label', ''),
        'municipality': place.get('Municipality', ''),
        'region': place.get('Region', ''),
        'postal_code': place.get('PostalCode', ''),
        'geocode_confidence': result.get('Relevance', 0)
    }

# Process a batch of addresses
df = pd.read_csv('customers_to_geocode.csv')
geocoded = df['address'].apply(geocode_address)
# Reuse df's index so the new columns align row by row
df = pd.concat([df, pd.DataFrame(list(geocoded), index=df.index)], axis=1)

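The data provider behind a place index is chosen when the index is created. HERE, one of the available providers, offers good coverage of Canadian addresses, including rural routes and postal codes. If the results are going to be stored, the place index should be created with its intended use set to storage. Geocoded coordinates should then be written back to S3 and loaded into your spatial tables in Athena or Redshift.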

Pricing for Amazon Location Service geocoding is approximately $0.50 per 1,000 requests for the first 100,000 requests per month in ca-central-1. For an organisation geocoding its customer address history (say, 200,000 unique addresses), the one-time cost is approximately $100.
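
The arithmetic behind that estimate is a sketch along the following lines. It assumes the flat rate quoted above applies to every request and ignores any pricing tiers; the function name is hypothetical, and current pricing should be checked before budgeting.

```python
def geocoding_cost_usd(num_requests, usd_per_thousand=0.50):
    """Estimate geocoding cost at a flat per-1,000-request rate (assumed, tiers ignored)."""
    return num_requests / 1000 * usd_per_thousand


# 200,000 unique addresses at the quoted rate
one_time_cost = geocoding_cost_usd(200_000)
```

Deduplicating addresses before geocoding reduces `num_requests` directly, so it is usually the first lever on cost.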

Architecture Pattern: Location Intelligence Pipeline

A complete location intelligence pipeline for a Canadian retailer or field services company looks like this:

[Source Systems]
  CRM (customer addresses)
  ERP (order locations, delivery addresses)
  Field App (technician GPS tracks)
       │
       ▼
[AWS Glue / Lambda]         (extract & stage to S3)
       │
       ▼
[Amazon Location Service]   (geocoding: address → lat/lon)
       │
       ▼
[Amazon S3]                 (geocoded data in Parquet)
       │
       ├──► [Amazon Athena]     (ad-hoc spatial SQL)
       ├──► [Amazon Redshift]   (warehouse-scale spatial joins)
       │
       ▼
[Amazon QuickSight]         (point maps, choropleths, filtered by territory)

The geocoding step is typically a one-time batch for historical data, followed by incremental geocoding of new records. Storing coordinates in S3 as Parquet avoids re-geocoding the same addresses repeatedly.
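
The incremental step can be sketched as a simple filter that compares incoming addresses against those already geocoded. The normalisation rule, function names, and sample addresses below are assumptions for illustration; real address matching usually needs more careful standardisation.

```python
def normalise_address(address):
    """Collapse whitespace and case so trivially different spellings match."""
    return " ".join(address.upper().split())


def addresses_to_geocode(incoming, already_geocoded):
    """Return new addresses, in first-seen order, that are not yet geocoded.

    `already_geocoded` holds normalised addresses loaded from the stored results.
    """
    seen = set(already_geocoded)
    pending = []
    for address in incoming:
        key = normalise_address(address)
        if key and key not in seen:
            seen.add(key)  # also removes duplicates within the batch
            pending.append(address)
    return pending


# Hypothetical data: one address is already stored, one is new but repeated
stored = {normalise_address("123 Example St, Toronto, ON")}
batch = [
    "123 example st,   Toronto, ON",
    "45 Sample Ave, Calgary, AB",
    "45 Sample Ave, Calgary, AB",
]
pending = addresses_to_geocode(batch, stored)
```

Only the addresses in `pending` need to be sent to the geocoder; the rest can be joined back to the stored coordinates.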

Visualising Spatial Data in QuickSight

QuickSight’s map visualisations cover the most common requirements:

  • Point maps — plot customer or asset locations as dots, sized or coloured by a measure (e.g., revenue, recency)
  • Filled maps — colour geographic regions (provinces, postal code forward sortation areas) by a metric
  • Heat maps — show density of events or customers across a geographic area

For QuickSight to geocode automatically, your dataset needs a field containing either: coordinates (latitude/longitude), postal codes (QuickSight understands Canadian FSA format), province names, or city names. QuickSight resolves these to geometries internally using its built-in geocoding. For custom boundaries (sales territories, custom regions), QuickSight supports GeoJSON uploads as custom map layers.
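
If the dataset holds full postal codes and a filled map by forward sortation area is wanted, the FSA can be derived before the data reaches QuickSight. The sketch below relies on the standard Canadian postal code pattern (letter, digit, letter, then digit, letter, digit), where the FSA is the first three characters; the function name and sample values are hypothetical.

```python
import re

# Canadian postal codes follow the pattern A1A 1A1; the FSA is the first three characters.
POSTAL_CODE_RE = re.compile(r"^([A-Z]\d[A-Z])\s?(\d[A-Z]\d)$")


def to_fsa(postal_code):
    """Return the three-character FSA, or None if the value is not a postal code."""
    if not postal_code:
        return None
    match = POSTAL_CODE_RE.match(postal_code.strip().upper())
    return match.group(1) if match else None


# Mixed-quality sample input for illustration
fsas = [to_fsa(pc) for pc in ["m5v 3l9", "K1A0B1", "not a code", None]]
```

Values that fail the pattern come back as `None`, which makes it easy to count how many rows will not appear on the map.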

Conclusion

Geospatial analytics on AWS is more accessible than most organisations realise. The combination of Amazon Location Service for geocoding, Athena or Redshift for spatial SQL, and QuickSight for visualisation covers the majority of business location intelligence use cases without requiring a dedicated GIS platform or specialised GIS expertise.

For organisations with more advanced needs — satellite imagery analysis, real-time asset tracking, complex terrain modelling — Amazon SageMaker Geospatial and Amazon Location Service Trackers extend the stack into territory traditionally served by specialised GIS vendors.

Whether you are trying to understand your customer density by region, optimise field service territories, or identify store locations underserved by your current network, geospatial analytics on AWS provides the infrastructure to answer those questions at scale.

Infra IT Consulting has implemented geospatial analytics solutions for clients in retail, logistics, and field services across Canada and the UK. Contact us to discuss your location intelligence requirements.
