Architect · 2022–present

Encount Data Lake

Automated survey processing for Norwegian transit

Five transit authorities. Three incompatible APIs. One unified output.

Norwegian transit authorities needed automated, weighted survey datasets built from multiple raw API sources. Each authority ran its own survey program through different vendors, yet all of them needed results in one consistent format for reporting and analysis.

The challenge was unifying data that came from fundamentally incompatible systems, applying statistically rigorous weighting, and delivering outputs that matched the metadata fidelity of professionally produced SPSS files.

Three incompatible upstream APIs

Decipher, WALR, and QuenchTec, each with its own data model, authentication scheme, and pagination pattern.
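One common way to hide differing pagination patterns is to put every upstream behind the same small interface. The sketch below is illustrative only: `SurveySource`, `OffsetPagedSource`, `CursorPagedSource`, and the in-memory fetch functions are hypothetical names, and it makes no claim about which vendor uses which pagination style.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Optional, Protocol


class SurveySource(Protocol):
    """Common interface every upstream adapter exposes (hypothetical)."""

    def iter_responses(self) -> Iterator[dict]: ...


@dataclass
class OffsetPagedSource:
    """Adapter for an API that pages by offset and limit."""

    fetch: Callable[[int, int], list[dict]]  # (offset, limit) -> rows
    page_size: int = 2

    def iter_responses(self) -> Iterator[dict]:
        offset = 0
        while True:
            rows = self.fetch(offset, self.page_size)
            if not rows:  # an empty page signals the end
                return
            yield from rows
            offset += len(rows)


@dataclass
class CursorPagedSource:
    """Adapter for an API that pages by an opaque cursor token."""

    fetch: Callable[[Optional[str]], tuple[list[dict], Optional[str]]]

    def iter_responses(self) -> Iterator[dict]:
        cursor: Optional[str] = None
        while True:
            rows, cursor = self.fetch(cursor)
            yield from rows
            if cursor is None:  # no next cursor means the last page
                return


def collect(sources: list[SurveySource]) -> list[dict]:
    """Drain every source into one flat list of response records."""
    return [row for source in sources for row in source.iter_responses()]


# In-memory stand-ins for real HTTP calls, so the sketch runs offline.
_OFFSET_DATA = [{"id": 1}, {"id": 2}, {"id": 3}]
_CURSOR_PAGES = {None: ([{"id": 10}], "p2"), "p2": ([{"id": 11}], None)}


def fake_offset_fetch(offset: int, limit: int) -> list[dict]:
    return _OFFSET_DATA[offset:offset + limit]


def fake_cursor_fetch(cursor: Optional[str]) -> tuple[list[dict], Optional[str]]:
    return _CURSOR_PAGES[cursor]


unified = collect([
    OffsetPagedSource(fake_offset_fetch),
    CursorPagedSource(fake_cursor_fetch),
])
```

Downstream stages only ever see `iter_responses()`, so a new pagination scheme means a new adapter rather than changes to the rest of the pipeline.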

Complete SPSS metadata fidelity

Variable labels, value labels, and measurement levels preserved through every stage of the pipeline.

Time-windowed statistical weighting

Weights must account for departure times, contract periods, and seasonal patterns simultaneously.

Multi-operator contract mapping

Each authority has different contract structures that shape how data is attributed across operators.

A pipeline that carries metadata through every transformation.

A processing pipeline that ingests from 3 upstream APIs, transforms data through 4–6 pipeline stages, and applies RIM weighting with time-windowed, contract-based, and departure-limited dimensions. The pipeline runs automatically, producing clean SPSS-compatible datasets for each transit authority.
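RIM weighting (raking) adjusts respondent weights one dimension at a time until the weighted sample matches target shares on every dimension. The following is a minimal, dependency-free sketch of that iteration, not the production implementation. The variable names `period` and `contract`, the target shares, and the sample rows are made up for illustration, and the sketch assumes every target category appears at least once in the sample.

```python
def rim_weights(rows, targets, max_iter=100, tol=1e-9):
    """Rake weights so weighted shares match `targets` on each variable.

    rows:    list of dicts, one per respondent
    targets: {variable: {category: target share}}, shares sum to 1
    """
    weights = [1.0] * len(rows)
    for _ in range(max_iter):
        max_shift = 0.0
        for var, target_shares in targets.items():
            total = sum(weights)
            # Current weighted total per category of this variable.
            current = {}
            for w, row in zip(weights, rows):
                current[row[var]] = current.get(row[var], 0.0) + w
            # Scale each category so it hits its target share.
            factors = {
                cat: share * total / current[cat]
                for cat, share in target_shares.items()
            }
            weights = [w * factors[row[var]] for w, row in zip(weights, rows)]
            max_shift = max(
                max_shift, max(abs(f - 1.0) for f in factors.values())
            )
        if max_shift < tol:  # no dimension needed a meaningful adjustment
            break
    return weights


def weighted_share(rows, weights, var, category):
    """Share of total weight held by respondents in `category`."""
    hit = sum(w for w, row in zip(weights, rows) if row[var] == category)
    return hit / sum(weights)


# Hypothetical sample: a time window and a contract per respondent.
respondents = [
    {"period": "peak", "contract": "A"},
    {"period": "peak", "contract": "B"},
    {"period": "offpeak", "contract": "A"},
    {"period": "offpeak", "contract": "B"},
    {"period": "peak", "contract": "A"},
]
targets = {
    "period": {"peak": 0.6, "offpeak": 0.4},
    "contract": {"A": 0.5, "B": 0.5},
}
weights = rim_weights(respondents, targets)
```

In a time-windowed setup, the same routine would plausibly run once per window with that window's targets, though how the pipeline actually partitions windows is not described here.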

A key internal component is the DataFrameWithMeta library — a custom data structure that carries SPSS metadata (variable labels, value labels, measurement levels) alongside the raw data through every stage of the pipeline. This ensures that the output datasets have the same professional quality as hand-produced SPSS files.
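The DataFrameWithMeta source is not shown here, so the following is only a sketch of the general idea: a table whose operations return both the data and the matching metadata. `TableWithMeta` and `VariableMeta` are hypothetical names, and plain Python lists stand in for a pandas DataFrame to keep the example dependency-free.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass(frozen=True)
class VariableMeta:
    """SPSS-style metadata for one variable."""

    label: str
    value_labels: dict[Any, str] = field(default_factory=dict)
    measure: str = "nominal"  # SPSS levels: nominal, ordinal, scale


@dataclass
class TableWithMeta:
    """Column-oriented table that keeps metadata attached to its data."""

    columns: dict[str, list[Any]]
    meta: dict[str, VariableMeta]

    def filter_rows(self, keep: Callable[[dict[str, Any]], bool]) -> "TableWithMeta":
        """Return a new table with only the rows where `keep(row)` is true."""
        names = list(self.columns)
        n_rows = len(next(iter(self.columns.values()), []))
        kept = [
            i for i in range(n_rows)
            if keep({n: self.columns[n][i] for n in names})
        ]
        data = {n: [self.columns[n][i] for i in kept] for n in names}
        return TableWithMeta(data, dict(self.meta))

    def select(self, names: list[str]) -> "TableWithMeta":
        """Return a new table with only `names`, metadata included."""
        return TableWithMeta(
            {n: self.columns[n] for n in names},
            {n: self.meta[n] for n in names},
        )

    def labelled(self, name: str) -> list[str]:
        """Render a column through its value labels, falling back to str()."""
        labels = self.meta[name].value_labels
        return [labels.get(v, str(v)) for v in self.columns[name]]


survey = TableWithMeta(
    columns={"satisfaction": [5, 3, 4], "mode": [1, 2, 1]},
    meta={
        "satisfaction": VariableMeta(
            "Overall satisfaction",
            {1: "Very dissatisfied", 5: "Very satisfied"},
            "ordinal",
        ),
        "mode": VariableMeta("Travel mode", {1: "Bus", 2: "Tram"}),
    },
)
bus_only = survey.filter_rows(lambda row: row["mode"] == 1).select(["satisfaction"])
```

Because every operation returns a `TableWithMeta`, no stage can produce data without its labels, which is the property the export step relies on.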

The abstraction layer was the key design decision. By building DataFrameWithMeta early, metadata fidelity became a property of the data itself rather than something to retrofit at export time. Adding new transit authorities is configuration, not code.
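To make "configuration, not code" concrete, here is one hedged sketch of what a per-authority configuration entry could look like. The field names, the `authority_a` and `authority_b` placeholders, and their pairing with a source are all invented for illustration; the real configuration schema is not documented here.

```python
from dataclasses import dataclass

# Upstream names come from the case study; everything else is hypothetical.
KNOWN_SOURCES = {"decipher", "walr", "quenchtec"}


@dataclass(frozen=True)
class AuthorityConfig:
    """Everything the pipeline needs to know about one authority."""

    name: str
    source: str
    weight_dimensions: tuple[str, ...]

    def __post_init__(self) -> None:
        # Fail early on a typo instead of deep inside the pipeline.
        if self.source not in KNOWN_SOURCES:
            raise ValueError(f"unknown source: {self.source}")


def load_configs(raw: list[dict]) -> dict[str, AuthorityConfig]:
    """Validate raw config entries and index them by authority name."""
    configs = {}
    for entry in raw:
        cfg = AuthorityConfig(
            entry["name"],
            entry["source"],
            tuple(entry["weight_dimensions"]),
        )
        configs[cfg.name] = cfg
    return configs


# Placeholder entries; a new authority would be one more dict in this list.
RAW = [
    {"name": "authority_a", "source": "decipher",
     "weight_dimensions": ["time_window", "contract"]},
    {"name": "authority_b", "source": "walr",
     "weight_dimensions": ["time_window"]},
]
configs = load_configs(RAW)
```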

510K+

Survey responses processed

5

Transit authorities served

3

Incompatible upstream APIs unified

Python, FastAPI, PostgreSQL, Pandas

The pipeline serves five Norwegian transit authorities: Ruter, Brakar, AtB, Statens vegvesen (SVV), and Avinor.
