Data Engineering for Canadian Financial Services: Compliance and Scale
Canadian financial institutions operate under one of the most demanding regulatory environments in the world. Between OSFI’s escalating technology risk guidelines, PIPEDA’s personal data obligations, and FINTRAC’s mandatory transaction reporting, a data platform that can merely store and query data is not enough. It must also enforce access controls at the column level, generate auditable lineage records, and respond to regulatory queries within hours — not days.
This post walks through how Infra IT Consulting designs compliant, scalable data engineering architectures for Canadian banks, credit unions, insurance companies, and fintechs on AWS. We cover the regulatory context that shapes every design decision, the AWS services that carry the load, and the patterns that keep auditors satisfied without crippling analyst productivity.
Understanding the Canadian Financial Regulatory Stack
Before a single AWS service is chosen, the team must understand which rules apply and what they demand of the data layer.
OSFI Guidelines B-10 (Third-Party Risk Management) and B-13 (Technology and Cyber Risk Management) apply to federally regulated financial institutions (FRFIs), including banks, insurance companies, and trust companies. The 2023 revision of B-10 significantly expanded expectations around third-party and cloud risk, while B-13 sets expectations for technology resilience, cyber security, and incident response. For data engineering teams, the most consequential requirements are: data must be recoverable within defined RTOs and RPOs; access to sensitive data must be logged and auditable; and third-party providers (including AWS) must be subject to formal risk assessments and contractual obligations. Hosting data in AWS's Canadian Region (ca-central-1), backed by contractual terms reviewed as part of the third-party risk assessment, addresses the residency question; the platform design must address the rest.
PIPEDA (and Quebec Law 25) governs how personal financial data is collected, used, and disclosed. In practice this means data minimisation — you should not store more personal data than you need — pseudonymisation of PII in analytics environments, and data subject access rights that require you to locate and export an individual’s data on request. Quebec’s Law 25 goes further, requiring privacy impact assessments (PIAs) for new systems and stricter consent mechanisms.
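Pseudonymisation in the analytics environment can be as simple as replacing each identifier with a keyed hash, so analysts can still join and count by customer without seeing the raw value. A minimal sketch, assuming HMAC-SHA256 with a secret key held in a secrets store outside the analytics environment; the field names are hypothetical.

```python
import hashlib
import hmac


def pseudonymise(value: str, key: bytes) -> str:
    """Return a stable, keyed pseudonym for a PII value such as a SIN.

    A keyed HMAC (rather than a plain hash) stops anyone without the key from
    hashing every possible 9-digit SIN and reversing the pseudonyms by brute force.
    """
    # Normalise formatting so "123 456 789" and "123-456-789" map to the same token.
    normalised = value.replace(" ", "").replace("-", "")
    return hmac.new(key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymise_record(record: dict, pii_fields: set, key: bytes) -> dict:
    """Copy a record, replacing each non-null PII field with its pseudonym."""
    return {
        field: pseudonymise(str(value), key)
        if field in pii_fields and value is not None
        else value
        for field, value in record.items()
    }
```

Because the same input always yields the same token under a given key, pseudonymised tables remain joinable; rotating the key deliberately breaks that linkage.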
FINTRAC requires reporting entities to submit Suspicious Transaction Reports (STRs) and Large Cash Transaction Reports (LCTRs) within tight timelines. From a data engineering standpoint, this means your pipeline must be able to identify, aggregate, and report on specific transaction patterns in near real time; a batch-only architecture leaves compliance teams working from stale data and shrinks the time available to investigate before a deadline.
Amazon Redshift for Transactional Analytics
The analytical core of most financial data platforms we build is Amazon Redshift. Its columnar storage and massively parallel processing (MPP) architecture make it the right choice for the query patterns that dominate in financial services: aggregating millions of transactions by account, date range, counterparty, or product, often joining across large dimension tables.
For a regional Canadian bank processing roughly 50 million transactions per month, a typical Redshift architecture looks like this: a raw schema holds immutable ingested data with transaction_date as the sort key (Redshift prunes scans with sort keys and zone maps rather than partitions); a conformed schema holds cleansed, typed data built by dbt transformations, with referential integrity checked by dbt tests because Redshift does not enforce primary or foreign key constraints; and a reporting schema exposes pre-aggregated views for regulatory and business intelligence consumers. Redshift Serverless is increasingly preferred for variable workloads, because analyst query bursts during month-end close do not justify provisioned capacity year-round.
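Because Redshift treats primary and foreign key constraints as informational only, orphaned keys in the conformed schema have to be caught by tests rather than by the database. The function below is a plain-Python illustration of what a dbt relationships test asserts; in a real project the test compiles to SQL and runs inside Redshift, and the table and column names used here are hypothetical.

```python
from typing import Iterable


def orphaned_keys(child_rows: Iterable[dict], child_key: str,
                  parent_rows: Iterable[dict], parent_key: str) -> set:
    """Return child key values that have no matching parent row.

    Mirrors a dbt `relationships` test: every non-null foreign key in the
    child table must exist in the parent table. Null keys are ignored.
    """
    parent_values = {row[parent_key] for row in parent_rows}
    return {
        row[child_key]
        for row in child_rows
        if row[child_key] is not None and row[child_key] not in parent_values
    }
```

A non-empty result fails the build before bad data reaches the reporting schema.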
Redshift Spectrum extends the architecture further, allowing queries to reach back into the S3 data lake for historical claims data or archival transaction records without loading them into the cluster. A credit union, for example, can maintain five years of full transaction history in S3 Parquet and query it alongside hot data in Redshift without managing a separate archive cluster.
For a deeper look at data lake patterns that feed into Redshift, see our post on building a data lake on S3.
AWS Lake Formation for Column-Level Security
The single most important access control requirement in financial services data platforms is preventing analysts from seeing data they are not authorised to see — specifically SINs, account numbers, and income data — while still allowing them to run the aggregate queries their jobs require.
AWS Lake Formation addresses this with fine-grained access control that operates at the database, table, column, and row level within the AWS Glue Data Catalog. In a compliant financial data platform, the configuration looks like this:
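- Column-level security excludes columns containing SIN, date of birth, and full account numbers for all roles except the compliance team and automated regulatory reporting jobs. (Lake Formation includes or excludes columns rather than masking them; where analysts need a stand-in value, expose a pseudonymised column instead.)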
- Row-level security (via Lake Formation data filters) restricts analysts in the retail banking division from seeing corporate banking records and vice versa.
- Tag-based access control (TBAC) allows the security team to classify columns with tags like PII_HIGH, PII_MEDIUM, and INTERNAL, then define policies at the tag level rather than maintaining per-table ACLs.
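The three controls above can be sketched as keyword arguments for two boto3 calls: lakeformation.grant_permissions with an LF-Tag policy, and lakeformation.create_data_cells_filter for the combined row and column restriction. This is a minimal sketch under stated assumptions: the LF-Tag key classification, the division column, and all database, table, and role names are hypothetical, and the request shapes should be checked against the boto3 Lake Formation reference before use.

```python
def lf_tag_grant_request(principal_arn: str, tag_key: str, tag_values: list) -> dict:
    """Build kwargs for lakeformation.grant_permissions using an LF-Tag policy.

    Grants SELECT on whatever matches the tag expression, so newly tagged
    tables and columns are covered without writing new per-table grants.
    """
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "LFTagPolicy": {
                "ResourceType": "TABLE",
                "Expression": [{"TagKey": tag_key, "TagValues": tag_values}],
            }
        },
        "Permissions": ["SELECT"],
    }


def division_filter_request(catalog_id: str, database: str, table: str,
                            division: str, pii_columns: list) -> dict:
    """Build kwargs for lakeformation.create_data_cells_filter.

    Keeps only one division's rows and exposes every column except the PII
    columns. `division` is interpolated into the filter expression, so it
    must come from trusted configuration, never from user input.
    """
    return {
        "TableData": {
            "TableCatalogId": catalog_id,
            "DatabaseName": database,
            "TableName": table,
            "Name": f"{table}_{division}_no_pii",
            "RowFilter": {"FilterExpression": f"division = '{division}'"},
            "ColumnWildcard": {"ExcludedColumnNames": list(pii_columns)},
        }
    }


# Usage (requires AWS credentials):
#   lf = boto3.client("lakeformation")
#   lf.grant_permissions(**lf_tag_grant_request(analyst_role_arn, "classification", ["INTERNAL"]))
#   lf.create_data_cells_filter(**division_filter_request(account_id, "conformed",
#                                                         "transactions", "retail", ["sin"]))
```

The data cells filter still has to be granted to the retail analyst role with a further grant_permissions call whose resource is the filter itself.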
All Lake Formation permission grants are logged to CloudTrail, giving the compliance team a complete audit trail of who was granted access to what, and when. When OSFI examiners or internal auditors request evidence of access controls, this trail is the primary artifact.
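Assembling that evidence for an examiner can be as simple as filtering the CloudTrail log files delivered to S3 for Lake Formation permission changes. A minimal sketch, assuming the files have already been downloaded and decompressed; because the shape of requestParameters varies by API call, it is passed through untouched rather than parsed.

```python
import json
from typing import Iterable

LF_PERMISSION_EVENTS = {
    "GrantPermissions", "RevokePermissions",
    "BatchGrantPermissions", "BatchRevokePermissions",
}


def permission_change_evidence(cloudtrail_files: Iterable[str]) -> list:
    """Extract Lake Formation grant and revoke events, oldest first.

    Each input is the JSON text of one CloudTrail log file, which holds a
    top-level "Records" list of events.
    """
    evidence = []
    for raw in cloudtrail_files:
        for event in json.loads(raw).get("Records", []):
            if (event.get("eventSource") == "lakeformation.amazonaws.com"
                    and event.get("eventName") in LF_PERMISSION_EVENTS):
                evidence.append({
                    "when": event.get("eventTime"),
                    "who": event.get("userIdentity", {}).get("arn"),
                    "action": event.get("eventName"),
                    "request": event.get("requestParameters"),
                })
    # ISO-8601 timestamps sort correctly as strings.
    return sorted(evidence, key=lambda e: e["when"] or "")
```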
For a comprehensive treatment of Lake Formation patterns, see Lake Formation Best Practices.
FINTRAC Reporting Pipelines with Kinesis and Lambda
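FINTRAC reporting requirements impose firm deadlines: an STR must be submitted as soon as practicable after the institution has completed the measures that establish reasonable grounds to suspect, and an LCTR is due within 15 calendar days of receiving $10,000 CAD or more in cash, whether in a single transaction or in several that total that amount within 24 consecutive hours. Building detection as a batch process introduces risk; a streaming architecture is more defensible.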
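The LCTR aggregation rule and its deadline can be sketched in a few lines. This is a simplified illustration, not a compliance implementation: it assumes amounts are already in CAD and that every cash transaction is attributed to a single customer_id, and it ignores the finer points of FINTRAC's guidance on transactions conducted by or on behalf of the same person or entity.

```python
from datetime import date, datetime, timedelta

LCTR_THRESHOLD = 10_000  # reportable at $10,000 CAD or more
LCTR_WINDOW = timedelta(hours=24)


def lctr_triggers(cash_txns):
    """Return (customer_id, timestamp) pairs at which cash received from one
    customer within the preceding 24 hours reaches the LCTR threshold.

    cash_txns is an iterable of (customer_id, timestamp, amount_cad) tuples.
    A customer can be flagged more than once while the running total stays
    at or above the threshold; case management would deduplicate these.
    """
    by_customer = {}
    for customer_id, ts, amount in cash_txns:
        by_customer.setdefault(customer_id, []).append((ts, amount))

    triggers = []
    for customer_id, txns in by_customer.items():
        txns.sort()
        start, running = 0, 0.0
        for ts, amount in txns:
            running += amount
            # Evict transactions 24 hours or more older than the current one.
            while ts - txns[start][0] >= LCTR_WINDOW:
                running -= txns[start][1]
                start += 1
            if running >= LCTR_THRESHOLD:
                triggers.append((customer_id, ts))
    return triggers


def lctr_deadline(transaction_date: date) -> date:
    """LCTRs are due within 15 calendar days after the transaction."""
    return transaction_date + timedelta(days=15)
```

Two deposits of $6,000 and $4,000 twenty hours apart trigger a report; the same two deposits thirty hours apart do not.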
A pattern we implement for credit unions and Schedule II banks uses Amazon Kinesis Data Streams to capture every transaction in real time, fed by change data capture (CDC) on the core banking system's transactional database. A Lambda function applies configurable per-record rules (large cash threshold, simple structuring heuristics such as amounts just below the threshold, high-risk jurisdiction flags) to each transaction record. Matches are written to a DynamoDB table for case management and simultaneously trigger an SNS notification to the compliance team.
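A minimal sketch of that Lambda function follows. The rule thresholds, the placeholder jurisdiction codes, the transaction field names, and the CASES_TABLE and ALERT_TOPIC_ARN environment variables are all hypothetical; the decoding logic is split out so it can be tested without AWS.

```python
import base64
import json
import os
from decimal import Decimal

# Hypothetical rule parameters; in practice these would be loaded from configuration.
LARGE_CASH_THRESHOLD = Decimal("10000")
NEAR_THRESHOLD_FLOOR = Decimal("9000")
HIGH_RISK_JURISDICTIONS = {"XX", "YY"}  # placeholder country codes


def evaluate(txn: dict) -> list:
    """Apply per-record rules to one transaction; return the names of matched rules."""
    amount = Decimal(str(txn["amount"]))
    hits = []
    if txn.get("channel") == "cash":
        if amount >= LARGE_CASH_THRESHOLD:
            hits.append("large_cash")
        elif amount >= NEAR_THRESHOLD_FLOOR:
            hits.append("just_below_threshold")
    if txn.get("counterparty_country") in HIGH_RISK_JURISDICTIONS:
        hits.append("high_risk_jurisdiction")
    return hits


def process(event: dict) -> list:
    """Decode a Kinesis batch and return one case dict per matching transaction."""
    cases = []
    for record in event["Records"]:
        # Kinesis delivers each record's payload base64-encoded.
        txn = json.loads(base64.b64decode(record["kinesis"]["data"]))
        rules = evaluate(txn)
        if rules:
            cases.append({
                "transaction_id": txn["transaction_id"],
                "customer_id": txn["customer_id"],
                "amount": Decimal(str(txn["amount"])),  # DynamoDB needs Decimal, not float
                "rules": rules,
            })
    return cases


def handler(event, context):
    """Lambda entry point: persist each case and alert the compliance team."""
    import boto3  # imported here so process() can be tested without AWS

    table = boto3.resource("dynamodb").Table(os.environ["CASES_TABLE"])
    sns = boto3.client("sns")
    cases = process(event)
    for case in cases:
        table.put_item(Item=case)
        sns.publish(
            TopicArn=os.environ["ALERT_TOPIC_ARN"],
            Subject="FINTRAC rule match",
            Message=json.dumps({**case, "amount": str(case["amount"])}),
        )
    return {"matched": len(cases)}
```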
The aggregation layer uses Amazon Managed Service for Apache Flink (formerly Kinesis Data Analytics) for stateful computations. Detecting structuring patterns means looking at transaction sequences across a 10-day rolling window per customer, which calls for stateful stream processing rather than per-record Lambda evaluation.
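To show what that per-customer state holds, here is a plain-Python model of a keyed rolling window. It is not Flink code, and the heuristic itself (three or more cash deposits between $9,000 and $10,000 within 10 days) is a hypothetical example rather than a FINTRAC-defined rule.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Hypothetical structuring heuristic parameters.
STRUCTURING_WINDOW = timedelta(days=10)
MIN_NEAR_THRESHOLD_DEPOSITS = 3
BAND_LOW, BAND_HIGH = 9_000, 10_000


class StructuringDetector:
    """Keeps per-customer state, the role Flink keyed state plays in production."""

    def __init__(self):
        # customer_id -> timestamps of near-threshold deposits still in the window
        self._recent = defaultdict(deque)

    def observe(self, customer_id: str, ts: datetime, amount: float) -> bool:
        """Process one cash deposit; return True when the pattern is present.

        Assumes deposits arrive in timestamp order per customer. Flink handles
        out-of-order events with event-time watermarks, which this sketch omits.
        """
        if not (BAND_LOW <= amount < BAND_HIGH):
            return False
        window = self._recent[customer_id]
        window.append(ts)
        # Drop deposits older than the rolling window.
        while ts - window[0] > STRUCTURING_WINDOW:
            window.popleft()
        return len(window) >= MIN_NEAR_THRESHOLD_DEPOSITS
```

In Flink the same logic would live in a keyed process function, with state checkpointed so a restart does not lose the window.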
The final FINTRAC report submission is generated by a Step Functions workflow that queries the flagged records, formats the output to the prescribed XML schema, and delivers it via secure file transfer. Every step is logged, every report is archived in S3 with S3 Object Lock enabled, and the entire workflow is auditable end to end.
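The archiving step can be sketched as the keyword arguments for an s3.put_object call that places the submitted report under Object Lock. A minimal sketch, assuming the destination bucket was created with Object Lock enabled; the key layout and the report-sha256 metadata name are hypothetical.

```python
import hashlib
from datetime import datetime, timedelta, timezone


def archive_put_kwargs(bucket: str, report_id: str, body: bytes,
                       retain_days: int, now=None) -> dict:
    """Build kwargs for s3.put_object that store a submitted report in
    COMPLIANCE mode, so no user, including the root user, can delete or
    modify this object version before the retain-until date.
    """
    now = now or datetime.now(timezone.utc)
    return {
        "Bucket": bucket,
        # Hypothetical key layout: one prefix per submission date.
        "Key": f"fintrac-reports/{now:%Y/%m/%d}/{report_id}.xml",
        "Body": body,
        "ChecksumAlgorithm": "SHA256",  # S3 verifies integrity on upload
        "ObjectLockMode": "COMPLIANCE",
        "ObjectLockRetainUntilDate": now + timedelta(days=retain_days),
        # Recorded with the case so auditors can re-verify the archived copy later.
        "Metadata": {"report-sha256": hashlib.sha256(body).hexdigest()},
    }


# Usage (requires AWS credentials):
#   boto3.client("s3").put_object(**archive_put_kwargs(bucket, report_id, xml_bytes, retain_days))
```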
For related streaming patterns, see our post on real-time data streaming with Kinesis.
Audit Trail Patterns and Data Lineage
Regulatory examinations in financial services routinely ask a question that is deceptively hard to answer: “Show me exactly which source records contributed to this reported figure.” Without data lineage, answering that question requires days of manual investigation. With it, the answer is a query.
We implement lineage at two levels. At the pipeline level, AWS Glue job run history, Glue job bookmarks (which record the input each job has already processed), and Step Functions execution histories provide a record of what ran, when, and over which data. At the data level, every record written to the conformed and reporting layers carries metadata columns: _source_system, _ingested_at, _transformed_by, and _pipeline_run_id. These columns allow any reported figure to be traced back to its source record in minutes.
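Stamping those metadata columns is a small, uniform step at the end of each transformation. A minimal sketch, assuming rows are dicts; the example source system and transformation names in the usage are hypothetical. An existing _ingested_at is preserved so the original ingestion time survives each hop from raw to conformed to reporting.

```python
import uuid
from datetime import datetime, timezone


def stamp_lineage(rows, source_system, transformed_by, pipeline_run_id=None, now=None):
    """Return copies of rows carrying the four lineage metadata columns.

    _ingested_at is only set when a row does not already carry one, so the
    timestamp from the original ingestion is kept through later layers.
    """
    run_id = pipeline_run_id or str(uuid.uuid4())
    stamped_at = (now or datetime.now(timezone.utc)).isoformat()
    return [
        {
            **row,
            "_source_system": source_system,
            "_ingested_at": row.get("_ingested_at", stamped_at),
            "_transformed_by": transformed_by,
            "_pipeline_run_id": run_id,
        }
        for row in rows
    ]


# Usage:
#   stamp_lineage(rows, "core_banking", "dbt:conformed.transactions", pipeline_run_id=run_id)
```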
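For immutability, S3 Object Lock in Compliance mode prevents modification or deletion of raw ingested data for the required retention period. Seven years is a common policy for financial records, but the actual period depends on the record type and the governing rule; FINTRAC, for example, requires most records to be kept for at least five years. Combined with versioning and replication to a second AWS Region for disaster recovery (ca-west-1 in Calgary keeps the replica in Canada), this gives the platform a defensible data integrity posture.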
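The bucket-level setup can be sketched as two boto3 requests: creating the raw-zone bucket in the Canadian Region with Object Lock enabled (which also turns on versioning), then setting a default COMPLIANCE retention that applies to every new object version. The bucket name is hypothetical, and the seven-year figure in the test is simply the common policy value mentioned above.

```python
def create_bucket_kwargs(bucket: str, region: str = "ca-central-1") -> dict:
    """kwargs for s3.create_bucket; enabling Object Lock also enables versioning."""
    return {
        "Bucket": bucket,
        "CreateBucketConfiguration": {"LocationConstraint": region},
        "ObjectLockEnabledForBucket": True,
    }


def default_retention_kwargs(bucket: str, years: int) -> dict:
    """kwargs for s3.put_object_lock_configuration: new object versions get
    COMPLIANCE-mode retention for `years` unless they specify their own."""
    return {
        "Bucket": bucket,
        "ObjectLockConfiguration": {
            "ObjectLockEnabled": "Enabled",
            "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Years": years}},
        },
    }


# Usage (requires AWS credentials; create the client in the target Region):
#   s3 = boto3.client("s3", region_name="ca-central-1")
#   s3.create_bucket(**create_bucket_kwargs("raw-zone"))
#   s3.put_object_lock_configuration(**default_retention_kwargs("raw-zone", 7))
```

Compliance mode cannot be shortened or removed once set, so the retention value is worth confirming with the compliance team before the first object lands.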
Column-level encryption using AWS KMS customer managed keys adds another layer: even if an attacker gained access to the S3 bucket, they could not read the encrypted PII columns without permission to use the key for decryption. Key rotation policies are configured to comply with OSFI's cryptographic controls guidance.
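A minimal sketch of column-level encryption, assuming a boto3 KMS client and a customer managed key alias supplied by the caller. It calls KMS Encrypt directly, which accepts up to 4 KB of plaintext and therefore suits small fields such as SINs; a high-volume pipeline would more likely use envelope encryption with data keys from GenerateDataKey. The column encryption context is an illustrative choice, not a required convention.

```python
import base64


def encrypt_columns(record: dict, columns: set, kms_client, key_id: str) -> dict:
    """Return a copy of record with the named non-null columns encrypted under key_id.

    The encryption context binds each ciphertext to its column name: Decrypt
    must present the same context, and the context is recorded in CloudTrail.
    """
    out = dict(record)
    for col in columns:
        value = record.get(col)
        if value is None:
            continue
        resp = kms_client.encrypt(
            KeyId=key_id,
            Plaintext=str(value).encode("utf-8"),
            EncryptionContext={"column": col},
        )
        # Store ciphertext as base64 text so it fits a string column in Parquet or Redshift.
        out[col] = base64.b64encode(resp["CiphertextBlob"]).decode("ascii")
    return out


# Usage (requires AWS credentials):
#   kms = boto3.client("kms", region_name="ca-central-1")
#   safe = encrypt_columns(row, {"sin", "date_of_birth"}, kms, "alias/pii-columns")
```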
Conclusion
Building a data platform for Canadian financial services is not a matter of picking the right database and calling it done. It requires a deliberate layering of AWS services — Redshift for analytics, Lake Formation for access control, Kinesis for real-time regulatory reporting, KMS for encryption, and CloudTrail for auditability — each configured to satisfy specific regulatory obligations.
Infra IT Consulting has deep experience navigating OSFI B-10 and B-13, PIPEDA, and FINTRAC requirements on behalf of Canadian financial institutions. If your team is building or modernising a financial data platform and needs to get compliance right the first time, reach out to us to discuss your requirements.