Infra IT Consulting logo Infra ITC
Data Analytics & BI marketinganalyticsattribution

Marketing Analytics on AWS: Connecting Ad Spend to Revenue

By Infra IT Consulting · · 8 min read

Marketing teams spend significant budgets across Google Ads, Meta, LinkedIn, programmatic display, and email — yet most organisations still cannot answer the most important question: which of those investments actually drives revenue? The gap between marketing spend and revenue attribution is not a tools problem; it is a data architecture problem. When your ad platform data lives in disconnected silos and your CRM or e-commerce platform stores transactions separately, building a unified view requires deliberate engineering effort.

This post walks through how to build a marketing analytics stack on AWS that consolidates multi-channel ad data, joins it with transactional revenue data, and surfaces actionable attribution insights to marketing and growth teams.

The Attribution Problem and Why Most Setups Fail

Standard platform-reported metrics are unreliable for cross-channel attribution. Google Analytics and each ad platform apply their own attribution models — last-click, first-click, or data-driven — and they all claim credit for the same conversion. A customer who saw a LinkedIn ad, clicked a retargeting display ad, opened an email, and then converted through a Google Search ad will appear as a conversion in all four platforms. Summing platform-reported conversions routinely inflates reported performance by 30–60%.

The solution is to build attribution on your own data: server-side conversion events paired with a unified customer identity. AWS provides the building blocks to do this at scale without proprietary lock-in.

Ingesting Multi-Channel Marketing Data

The first step is centralising raw data from every ad platform and your transactional systems into Amazon S3 as your data lake. For ad platforms, you have two practical options:

Managed connectors via AWS Glue or a third-party ELT tool (Fivetran, Airbyte) pull data from the Google Ads API, Meta Marketing API, and LinkedIn Campaign Manager API on a daily or hourly schedule. These connectors handle OAuth refresh, pagination, and API versioning so your engineering team does not maintain platform-specific integration code.

Event streaming for high-frequency data. If you run your own website or app, push server-side click and impression events to Amazon Kinesis Data Streams. A Kinesis consumer (typically an AWS Lambda function or Kinesis Data Firehose) writes these events to S3 in Parquet format, partitioned by date and event type. This is the canonical approach described in detail in the post on real-time dashboards with Kinesis.

Your S3 structure should separate raw ingestion from cleaned, query-ready data:

s3://your-data-lake/
  raw/
    google-ads/year=2024/month=04/day=27/
    meta-ads/year=2024/month=04/day=27/
    crm-orders/year=2024/month=04/day=27/
  curated/
    marketing/ad_performance/
    marketing/attributed_conversions/

Transforming and Modelling with AWS Glue and dbt

Raw ad platform data is messy. Metric definitions differ across platforms (Google’s “conversions” ≠ Meta’s “results”), campaign hierarchies vary, and currency and timezone inconsistencies are common. The transformation layer is where you standardise everything.

AWS Glue jobs handle the heavy lifting: reading raw JSON or CSV files from S3, applying schema normalisation, deduplicating records (ad APIs frequently return overlapping data across pulls), and writing Parquet output to the curated layer. A Glue Crawler then updates the AWS Glue Data Catalog so Amazon Athena can query the curated data immediately.

On top of the Athena-queryable curated layer, dbt models define your marketing semantic layer. A well-structured dbt project for marketing analytics typically includes:

  • Staging models — one per source (stg_google_ads_campaigns, stg_meta_ads_campaigns) that standardise column names and types
  • Intermediate models — unified ad performance (int_ad_performance_daily) that unions all platform data into a single spine
  • Mart models — fact_attributed_conversions and dim_campaigns that business users and BI tools query directly

For a deeper look at dbt on AWS, see the post on dbt and AWS analytics engineering.

Building the Attribution Model

Attribution is where the technical depth matters most. A simple last-touch attribution model in SQL looks straightforward, but production attribution requires handling several edge cases.

Here is a simplified multi-touch attribution query using Amazon Athena that assigns equal fractional credit across all touchpoints in a conversion window:

WITH touchpoints AS (
  SELECT
    customer_id,
    session_id,
    channel,
    campaign_id,
    event_time,
    LEAD(event_time) OVER (
      PARTITION BY customer_id ORDER BY event_time
    ) AS next_event_time
  FROM curated.marketing.customer_touchpoints
  WHERE event_date >= DATE_ADD('day', -30, CURRENT_DATE)
),
conversions AS (
  SELECT
    customer_id,
    order_id,
    revenue,
    converted_at
  FROM curated.crm.orders
  WHERE order_date >= DATE_ADD('day', -30, CURRENT_DATE)
),
attributed AS (
  SELECT
    t.channel,
    t.campaign_id,
    c.order_id,
    c.revenue,
    COUNT(*) OVER (
      PARTITION BY c.order_id
    ) AS total_touchpoints,
    c.revenue / COUNT(*) OVER (
      PARTITION BY c.order_id
    ) AS attributed_revenue
  FROM conversions c
  JOIN touchpoints t
    ON t.customer_id = c.customer_id
    AND t.event_time <= c.converted_at
    AND t.event_time >= DATE_ADD('day', -30, c.converted_at)
)
SELECT
  channel,
  campaign_id,
  COUNT(DISTINCT order_id) AS attributed_conversions,
  ROUND(SUM(attributed_revenue), 2) AS attributed_revenue
FROM attributed
GROUP BY 1, 2
ORDER BY attributed_revenue DESC;

This linear attribution distributes revenue equally across all touchpoints within the 30-day look-back window. You can extend the model to time-decay (weighting touchpoints closer to conversion more heavily) or position-based (40% first, 40% last, 20% middle) by adjusting the window function logic.

Visualising Marketing Performance in Amazon QuickSight

With attributed conversion data modelled in Athena, Amazon QuickSight connects directly to Athena as a data source via QuickSight SPICE or direct query mode. Marketing dashboards typically need:

  • Channel ROAS (Return on Ad Spend) — attributed revenue divided by platform spend, by channel and campaign
  • CPL and CAC by channel — cost per lead and customer acquisition cost using your own attribution rather than platform-reported numbers
  • Conversion path analysis — most common touchpoint sequences for high-value customers
  • Budget pacing — daily spend vs. monthly budget targets with alerts when campaigns overspend

QuickSight’s ML-powered anomaly detection can surface unusual drops in attributed conversions or sudden spikes in CPC without requiring a separate alerting setup. For detailed QuickSight configuration guidance, the Amazon QuickSight guide covers dataset setup, SPICE refresh scheduling, and dashboard sharing with external stakeholders.

Practical Considerations for Canadian and Multi-Region Teams

If your business serves customers in Canada, the UK, or Africa, your attribution pipeline needs to handle multi-currency normalisation (exchange rates pulled daily and applied at the fact table level), GDPR and CASL compliance for customer identity resolution (hashed email rather than raw PII in the data lake), and time zone alignment when comparing campaigns running across regions.

Store exchange rates in a small reference table in Amazon RDS or DynamoDB, join them into your attribution mart at transformation time, and report in a single base currency. For compliance, apply SHA-256 hashing to email identifiers in your Glue transformation jobs before writing to S3 — never land raw PII in your data lake unless you have explicit data processing agreements and proper access controls via AWS Lake Formation.

Conclusion

A purpose-built marketing analytics stack on AWS eliminates the double-counting and platform bias that plague standard marketing measurement. By centralising raw ad data in S3, standardising it with Glue and dbt, computing attribution in Athena, and visualising results in QuickSight, your marketing and finance teams share a single, trustworthy view of what is actually driving revenue.

Building this properly requires data engineering expertise across ingestion, transformation, and modelling layers. If your team is struggling to connect ad spend to revenue or needs a scalable attribution infrastructure, get in touch with Infra IT Consulting to discuss a technical assessment and implementation roadmap.

Related posts