Data Engineering for African Telecom Operators: Scale, Cost, and Connectivity

By Infra IT Consulting · 9 min read

African mobile network operators (MNOs) face a data engineering challenge that has no direct parallel elsewhere in the world. An operator with 20 million subscribers in West Africa may generate 2 billion call detail records (CDRs) per month, serve customers across regions with wildly uneven connectivity, operate mobile money platforms that process more transactions per day than many commercial banks, and do all of this on infrastructure budgets that are a fraction of what European or North American operators spend per subscriber.

Building data platforms for African telecoms requires a genuine understanding of these constraints. Architectures that work well for a European operator (always-on connectivity to cloud, abundant compute budgets, homogeneous customer device profiles) fail in the African context for reasons that are entirely predictable and entirely solvable if the design accounts for them from the start.

This post covers the data engineering patterns Infra IT Consulting uses for African MNOs: CDR processing at scale with Kinesis and EMR, cost optimisation strategies appropriate for African AWS regions, mobile money analytics, and edge processing considerations for connectivity-constrained environments.

CDR Processing at Scale with Kinesis and EMR

Call detail records are the foundational data asset of any telecom operator. Every voice call, SMS, data session, and mobile money transaction generates a CDR containing subscriber identifiers, timestamps, duration, volume, network node identifiers, and charging information. At the volumes generated by a large African operator (MTN Nigeria alone has over 70 million subscribers), CDR processing is a genuine big data problem.

The real-time path uses Amazon Kinesis Data Streams. Mediation systems on the network side format CDRs as JSON or ASN.1-encoded records and stream them to Kinesis via a producer agent running on the mediation servers. The stream is provisioned with enough shards to sustain peak throughput: for a 20-million-subscriber operator, traffic typically peaks at around 50,000 CDRs per second in the evening. Kinesis Data Firehose buffers these records and delivers them to S3 in Parquet format, partitioned by event_date, event_hour, and record_type (voice, data, SMS, USSD, mobile money). Firehose's Parquet conversion expects JSON input, so ASN.1-encoded records must be decoded to JSON first, either on the mediation side or in a Firehose transformation Lambda.
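As a rough sizing aid for the stream described above, the sketch below estimates a minimum shard count from the per-shard write limits that AWS publishes for provisioned Kinesis Data Streams (1,000 records per second and 1 MiB per second). The average record size of 500 bytes and the `headroom` multiplier are illustrative assumptions, not figures from a real deployment.

```python
import math

# Per-shard write limits for Kinesis Data Streams in provisioned mode.
SHARD_MAX_RECORDS_PER_SEC = 1_000
SHARD_MAX_BYTES_PER_SEC = 1024 * 1024  # 1 MiB per second


def shards_needed(peak_records_per_sec: int, avg_record_bytes: int,
                  headroom: float = 1.0) -> int:
    """Return the smallest shard count that satisfies both write limits.

    avg_record_bytes and headroom are assumptions the caller must supply;
    measure real CDR sizes before relying on the result.
    """
    by_count = peak_records_per_sec / SHARD_MAX_RECORDS_PER_SEC
    by_bytes = peak_records_per_sec * avg_record_bytes / SHARD_MAX_BYTES_PER_SEC
    # Whichever limit is hit first determines the shard count.
    return math.ceil(max(by_count, by_bytes) * headroom)


if __name__ == "__main__":
    # 50,000 CDRs/s at an assumed 500 bytes each: the record-count limit binds.
    print(shards_needed(50_000, 500))
    # Same load with 25% headroom for traffic spikes.
    print(shards_needed(50_000, 500, headroom=1.25))
```

With small records the record-count limit binds before the byte limit, so the shard count tracks the CDR rate rather than data volume.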

The batch processing path uses Amazon EMR. For an African MNO, the most computationally intensive regular batch job is the monthly billing reconciliation: joining CDRs against the rating engine output, applying tariff plans per subscriber segment, calculating roaming charges from partner operator data files, and producing invoices for postpaid accounts and reconciliation reports for prepaid top-ups. EMR on EC2 using Spot Instances executes this job at a cost that is typically 60–70% lower than equivalent on-demand compute, a meaningful saving at African operator budget levels.

A practical EMR configuration for CDR batch processing uses a master node (m5.xlarge, on-demand for availability) and a core fleet of r5.4xlarge Spot Instances for memory-intensive join operations. Checkpointing intermediate results to S3 ensures that a Spot interruption does not require restarting the entire job from scratch.
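The configuration above could be expressed as an EMR instance-fleet definition along the following lines. This is a sketch only: it builds the `InstanceFleets` list in the shape that boto3's `run_job_flow` accepts but does not call AWS. The capacity of eight Spot units, the 20-minute Spot timeout, and the fallback to on-demand are illustrative assumptions.

```python
def build_instance_fleets(core_spot_units: int = 8) -> list[dict]:
    """Build an InstanceFleets list for emr.run_job_flow(Instances={...}).

    core_spot_units is a hypothetical sizing parameter; tune it to the
    volume of CDRs in the reconciliation job.
    """
    primary = {
        "Name": "primary",
        "InstanceFleetType": "MASTER",
        # On-demand so the cluster coordinator is never reclaimed.
        "TargetOnDemandCapacity": 1,
        "InstanceTypeConfigs": [{"InstanceType": "m5.xlarge"}],
    }
    core = {
        "Name": "core-spot",
        "InstanceFleetType": "CORE",
        "TargetSpotCapacity": core_spot_units,
        "InstanceTypeConfigs": [
            # Memory-heavy instances for the large CDR joins.
            {"InstanceType": "r5.4xlarge", "WeightedCapacity": 1},
        ],
        "LaunchSpecifications": {
            "SpotSpecification": {
                # Assumed policy: wait 20 minutes for Spot capacity,
                # then fall back to on-demand rather than fail the job.
                "TimeoutDurationMinutes": 20,
                "TimeoutAction": "SWITCH_TO_ON_DEMAND",
                "AllocationStrategy": "capacity-optimized",
            }
        },
    }
    return [primary, core]


if __name__ == "__main__":
    for fleet in build_instance_fleets():
        print(fleet["Name"], fleet["InstanceFleetType"])
```

The returned list would be passed as `Instances={"InstanceFleets": build_instance_fleets(), ...}`. Adding further entries to `InstanceTypeConfigs` is how a fleet is diversified across instance types.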

For related batch and streaming pipeline patterns, see Real-Time Data Streaming with Kinesis.

Cost Optimisation for African Deployments

Cloud costs are not a minor consideration for African telecom data teams; they are often the primary constraint that determines whether a data platform gets built at all. AWS pricing is expressed in USD, while operator revenue is in local currencies (NGN, GHS, KES, ZAR) that are subject to ongoing devaluation pressure. A platform that is economically viable at 800 naira to the dollar may become unaffordable at 1,500 if cost controls are not built in from the start.
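The currency effect is easy to quantify. The sketch below converts a fixed USD bill into local currency at the two exchange rates mentioned above; the monthly bill of 10,000 USD is a made-up figure for illustration.

```python
def local_cost(usd_bill: float, local_per_usd: float) -> float:
    """Convert a USD cloud bill into local currency at a given rate."""
    return usd_bill * local_per_usd


def cost_increase_pct(old_rate: float, new_rate: float) -> float:
    """Percentage rise in local-currency cost when the rate moves."""
    return (new_rate / old_rate - 1) * 100


if __name__ == "__main__":
    bill_usd = 10_000  # hypothetical monthly AWS bill
    print(local_cost(bill_usd, 800))     # naira cost at 800 per dollar
    print(local_cost(bill_usd, 1_500))   # naira cost at 1,500 per dollar
    print(cost_increase_pct(800, 1_500)) # rise with an unchanged USD bill
```

An unchanged USD bill costs 87.5% more in naira after a move from 800 to 1,500, which is why the controls that follow target the USD bill itself.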

Several strategies specifically relevant to African telecom deployments:

S3 Intelligent-Tiering for CDR archives. CDRs older than 90 days are rarely queried except for dispute resolution and regulatory audits. S3 Intelligent-Tiering automatically moves objects that have not been accessed to lower-cost tiers: the Infrequent Access tier after 30 days (up to 40% cheaper than Standard) and the Archive Instant Access tier after 90 days (up to 68% cheaper), without requiring the engineering team to manage lifecycle rules per object type. For an operator storing 50TB of CDR history, most of it older than 90 days, this can cut the storage bill for the archive by more than half; the exact figure depends on region pricing, access patterns, and the per-object monitoring fee that Intelligent-Tiering charges.

EMR Spot Fleet for batch workloads. All non-latency-sensitive batch jobs (CDR rating, network quality reporting, subscriber analytics) run on Spot Instances. A diversified instance fleet that requests capacity across r5.2xlarge, r5.4xlarge, r4.4xlarge, and m5.4xlarge simultaneously achieves Spot availability rates above 95% while maintaining the cost discount.

Redshift pause and resume. Analytics clusters used by the business intelligence team are paused outside business hours (17:00–08:00 local time on weekdays, and all weekend). A cluster that runs 45 hours per week (nine hours on each of five weekdays) instead of 168 reduces its compute costs by about 73%; storage is still billed while the cluster is paused.

AWS region selection. AWS does not yet have a region on the African continent north of Cape Town (where af-south-1 is located). Operators in West Africa typically use eu-west-1 (Ireland) or eu-west-3 (Paris) for latency reasons, while East African operators prefer eu-central-1 (Frankfurt) or me-south-1 (Bahrain). South African operators use af-south-1. The region choice affects both latency and pricing: af-south-1 carries a premium compared to European regions, which is a trade-off that must be evaluated against data residency requirements and latency budgets.
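Two of the savings listed above reduce to simple arithmetic, sketched below. The running hours per week and the share of data in each storage tier are parameters to replace with measured values. The default tier discounts (40% and 68%) are the "up to" savings AWS publishes for the Intelligent-Tiering Infrequent Access and Archive Instant Access tiers, and the example tier shares are assumptions.

```python
HOURS_PER_WEEK = 7 * 24  # 168

# Published "up to" savings relative to S3 Standard; treat as assumptions.
DEFAULT_TIER_DISCOUNT = {
    "frequent": 0.0,
    "infrequent": 0.40,
    "archive_instant": 0.68,
}


def pause_saving_fraction(running_hours_per_week: float) -> float:
    """Fraction of compute cost avoided by pausing a cluster when idle."""
    return 1 - running_hours_per_week / HOURS_PER_WEEK


def blended_storage_saving(tier_share: dict[str, float],
                           tier_discount: dict[str, float] = DEFAULT_TIER_DISCOUNT) -> float:
    """Weighted storage saving given the share of bytes in each tier."""
    if abs(sum(tier_share.values()) - 1.0) > 1e-9:
        raise ValueError("tier shares must sum to 1")
    return sum(share * tier_discount[tier] for tier, share in tier_share.items())


if __name__ == "__main__":
    # Weekdays 08:00 to 17:00 only: 9 hours x 5 days = 45 hours.
    print(round(pause_saving_fraction(9 * 5), 3))
    # Hypothetical archive: 10% hot, 20% infrequent, 70% archive tier.
    shares = {"frequent": 0.1, "infrequent": 0.2, "archive_instant": 0.7}
    print(round(blended_storage_saving(shares), 3))
```

Neither function accounts for storage billed while a cluster is paused or for Intelligent-Tiering monitoring fees, so the results are upper bounds.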

For a full treatment of AWS cost optimisation patterns, see AWS Cost Optimisation for Data Teams.

Mobile Money Analytics

Mobile money is the defining financial services innovation of sub-Saharan Africa. M-PESA in Kenya, MTN MoMo across fourteen African markets, Airtel Money, and Orange Money collectively process hundreds of millions of transactions per day. For operators running these platforms, transaction analytics is not a reporting function; it is a fraud detection and regulatory compliance function that operates under real-time pressure.

The mobile money analytics pipeline on AWS uses a separate Kinesis stream for USSD and API-driven transaction events, isolated from the voice/data CDR stream for both security and scalability reasons. A Kinesis Data Analytics (Apache Flink) application applies fraud detection rules in real time: transaction velocity anomalies (a subscriber sending 50 P2P transfers in 10 minutes), geographic impossibility (a transaction originating from a cell tower 500km from the subscriber’s home location 2 minutes after a previous transaction), and structuring patterns (multiple transactions just below the regulatory reporting threshold).
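The three rule families above can be illustrated in plain Python. This is not a Flink application: it is a sketch of the rule logic only, assuming events arrive in event-time order per subscriber. The `Txn` fields, the 900 km/h speed ceiling, and the 10% structuring band are hypothetical choices. The geographic rule here compares consecutive transactions rather than distance from a home location.

```python
import math
from collections import deque
from dataclasses import dataclass


@dataclass
class Txn:
    subscriber_id: str
    ts: float      # event time, seconds since epoch
    amount: float
    lat: float     # originating cell tower latitude, degrees
    lon: float     # originating cell tower longitude, degrees


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


class VelocityRule:
    """Flag a subscriber who reaches max_txns within a sliding window."""

    def __init__(self, max_txns: int = 50, window_s: float = 600.0):
        self.max_txns = max_txns
        self.window_s = window_s
        self._seen: dict[str, deque] = {}

    def check(self, txn: Txn) -> bool:
        q = self._seen.setdefault(txn.subscriber_id, deque())
        q.append(txn.ts)
        # Drop timestamps that have fallen out of the window.
        while q and q[0] <= txn.ts - self.window_s:
            q.popleft()
        return len(q) >= self.max_txns


def impossible_travel(prev: Txn, cur: Txn, max_kmh: float = 900.0) -> bool:
    """True if the implied speed between two towers exceeds max_kmh."""
    hours = max(cur.ts - prev.ts, 1.0) / 3600.0  # guard against zero gaps
    return haversine_km(prev.lat, prev.lon, cur.lat, cur.lon) / hours > max_kmh


def is_structuring(amounts: list[float], threshold: float,
                   band: float = 0.10, min_count: int = 3) -> bool:
    """True if min_count or more amounts sit just below the threshold."""
    near = [a for a in amounts if threshold * (1 - band) <= a < threshold]
    return len(near) >= min_count
```

In a streaming job the per-subscriber deque would live in keyed state so that it survives restarts. The thresholds would normally come from configuration, not constants.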

Flagged transactions are written to a DynamoDB table with a TTL of 7 days for the fraud operations team to review. Clean transaction records land in S3 and are loaded into Redshift for regulatory reporting; most African central banks require operators to submit mobile money transaction aggregates weekly or monthly, formatted to country-specific schemas.
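DynamoDB TTL works from a Number attribute holding an expiry time in epoch seconds, so the 7-day retention is set per item at write time. The sketch below builds such an item in DynamoDB's low-level attribute-value format. The attribute names (`txn_id`, `rule`, `flagged_at`, `expires_at`) are hypothetical, and the TTL attribute name must match whatever is enabled on the table.

```python
import time
from typing import Optional

SEVEN_DAYS_S = 7 * 24 * 60 * 60  # 604,800 seconds


def flagged_item(txn_id: str, rule: str, now: Optional[float] = None) -> dict:
    """Build a DynamoDB item for a flagged transaction with a 7-day TTL.

    All attribute names are illustrative; 'expires_at' must be the
    attribute configured as the table's TTL attribute.
    """
    now = time.time() if now is None else now
    flagged_at = int(now)
    return {
        "txn_id": {"S": txn_id},
        "rule": {"S": rule},
        "flagged_at": {"N": str(flagged_at)},
        # TTL expects epoch seconds stored as a Number.
        "expires_at": {"N": str(flagged_at + SEVEN_DAYS_S)},
    }


if __name__ == "__main__":
    print(flagged_item("txn-001", "velocity", now=1_700_000_000))
```

The dict can be passed as `Item` to a boto3 DynamoDB client's `put_item`. TTL deletion runs as a background process, so expired items may remain visible for some time, and review tooling should filter on `expires_at` if that matters.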

Subscriber segmentation analytics (identifying high-value mobile money users for premium product targeting, detecting churning agents in the agent banking network, measuring merchant payment adoption by geography) runs as daily Redshift queries against the transaction history, with outputs delivered to QuickSight dashboards for the commercial team.

Edge Processing and Connectivity Constraints

Not all African operator infrastructure is connected to the cloud in real time. Remote base stations in rural Nigeria, DRC, or Tanzania may have satellite backhaul with 600 ms latency and 1 Mbps throughput, which is enough for operational traffic but inadequate for streaming CDR data to the cloud continuously. Edge processing at the base station controller or regional aggregation point becomes necessary.

AWS IoT Greengrass provides a managed runtime for edge compute that synchronises with AWS IoT Core in the cloud. For telecom use cases, a Greengrass deployment on a regional mediation server can buffer CDRs locally during connectivity outages, compress them (CDRs typically compress at roughly 8:1 with gzip), and batch-upload to S3 when connectivity is restored, with deduplication logic to handle records that may have been partially transmitted. This pattern maintains data completeness even on unreliable backhaul links.
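The buffer, compress, and deduplicate steps can be sketched with the Python standard library alone. The code below uses no Greengrass or S3 API. It assumes each CDR is a dict carrying a unique `cdr_id` field, which is a hypothetical name, and that the set of already-delivered IDs is kept somewhere durable on the receiving side.

```python
import gzip
import json


def pack_batch(records: list[dict]) -> bytes:
    """Serialise CDRs as JSON lines and gzip them for upload."""
    lines = "\n".join(json.dumps(r, sort_keys=True) for r in records)
    return gzip.compress(lines.encode("utf-8"))


def unpack_batch(blob: bytes) -> list[dict]:
    """Inverse of pack_batch: decompress and parse JSON lines."""
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line]


def dedupe(records: list[dict], seen_ids: set) -> list[dict]:
    """Return only records whose cdr_id has not been delivered before.

    seen_ids is updated in place; in production it would need to be
    persisted, which this sketch does not do.
    """
    fresh = []
    for rec in records:
        rid = rec["cdr_id"]  # hypothetical unique record identifier
        if rid not in seen_ids:
            seen_ids.add(rid)
            fresh.append(rec)
    return fresh


if __name__ == "__main__":
    batch = [{"cdr_id": f"c{i}", "type": "voice", "duration_s": 60} for i in range(1000)]
    blob = pack_batch(batch)
    raw_size = len("\n".join(json.dumps(r, sort_keys=True) for r in batch))
    print(f"raw={raw_size} bytes, gzipped={len(blob)} bytes")
    delivered: set = set()
    print(len(dedupe(unpack_batch(blob), delivered)))  # first delivery
    print(len(dedupe(unpack_batch(blob), delivered)))  # retransmission
```

Deduplicating on a record identifier instead of on file names means a batch that was cut off mid-upload can simply be resent in full. The compression ratio printed for this synthetic batch is not representative of real CDRs.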

AWS DataSync accelerates file transfer from on-premises network operations centres to S3, using a multi-threaded transfer protocol that saturates available bandwidth far more efficiently than naive FTP or SFTP transfers. For operators with large historical CDR archives that need to be migrated to S3 for analytics access, DataSync is typically 5–10x faster than alternatives on the same link.

Conclusion

African telecom data platforms must be engineered for the specific constraints of the market: massive CDR volumes, cost sensitivity driven by currency dynamics, mobile money fraud detection requirements, and connectivity constraints that demand edge-aware architectures. Generic cloud data platform templates designed for European or North American operators fail these requirements predictably.

Infra IT Consulting works with African MNOs and their technology partners to design data platforms that are cost-efficient, operationally resilient under connectivity constraints, and capable of supporting the real-time fraud detection and regulatory reporting obligations that African regulators increasingly require. Get in touch to discuss your telecom data engineering requirements.
