Projects
A practiced model for senior architects using AI as a force multiplier: owning architecture, interaction patterns, lifecycle, and security decisions while delegating implementation to AI, producing substantial throughput gains without trading away engineering judgment.
Problem
Senior architects and technology leaders are not primarily constrained by design decisions. They are constrained by implementation throughput: the time between "here is the architecture" and "here is working code" that can be reviewed, tested, and validated. As AI coding tooling matured, the question became how to use it in a way that amplified the high-value work rather than creating a new class of output to babysit.
The same pattern applied to infrastructure operations. DevOps and database maintenance work at the VP and architect level involves a specific kind of friction: not deep complexity, but retrieval cost. Knowing the right Azure CLI sequence for a repeatable migration, or the relevant system tables for a SQL maintenance task, is a lookup problem disguised as an expertise problem.
Approach
Code acceleration model. The VP of Engineering and Principal Architect developed a working model for AI-assisted implementation: own the architecture, interaction patterns, resource management, lifecycle design, and security posture; stub out the functions; hand the implementation of each stub to the AI assistant. Comparative analysis of architectural approaches became substantially faster because competing variants could be prototyped from stubs rather than written from scratch. API surface discovery became interactive rather than documentation-driven.
The key principle is that the architect's judgment is the input and the constraint. AI raises the output of people who already know what they are designing. It does not substitute for the upstream decisions about what to build and why.
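A minimal sketch of the stub-first pattern, with hypothetical names (`Lease`, `acquire_lease` are illustrative, not from any actual codebase): the architect authors the signature, the contract in the docstring, and the error behavior; the body below the docstring is the part delegated to the AI assistant.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Lease:
    resource_id: str
    holder: str
    expires_at: datetime


# Architect-authored stub: signature, contract, and failure semantics are
# the design decisions. The body is what gets handed to the AI assistant.
def acquire_lease(resource_id: str, holder: str, ttl_seconds: int,
                  active: dict[str, Lease]) -> Lease:
    """Grant a lease on resource_id to holder for ttl_seconds.

    Must raise RuntimeError if an unexpired lease is held by a
    different holder; renewing one's own lease is allowed.
    """
    now = datetime.utcnow()
    existing = active.get(resource_id)
    if existing and existing.expires_at > now and existing.holder != holder:
        raise RuntimeError(f"{resource_id} is leased by {existing.holder}")
    lease = Lease(resource_id, holder, now + timedelta(seconds=ttl_seconds))
    active[resource_id] = lease
    return lease
```

Because the contract is explicit, the generated body can be reviewed and tested against it directly, which is where the human validation work stays.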
Azure VM migration (colo to Azure). AI-assisted CLI scripting allowed the team to design and execute a repeatable, auditable VM migration from colocation infrastructure to Azure without engaging an implementation partner or navigating vendor documentation gaps. Each migration step was generated as a reviewable CLI sequence, executable in order, with no black-box automation. The result was a migration that could be rerun, audited, and documented without the overhead of an external engagement.
SQL maintenance. AI assistance shifted the mode of database maintenance work from syntax retrieval to intent expression. Providing the AI with system table context and relational structure allowed focus on what the maintenance operation needed to accomplish rather than how to write it against specific system catalog tables. The same dynamic as the CLI work: the practitioner owns the intent; the AI handles the reference lookup and boilerplate.
Outcome
At the architect level, where implementation had been the primary throughput constraint, the team observed roughly 100x acceleration on stubbed-function implementation. The gain is concentrated at the senior end of the skill distribution precisely because the upstream architectural decisions that determine whether the output is useful are still human work.
The Azure migration was completed without an implementation partner, on a timeline and at a cost that would not have been achievable through traditional documentation-driven execution.
The broader observation is directional but concrete: the DevOps and DBA functions at the principal architect level have been fundamentally changed by AI tooling. Tribal knowledge retrieval, boilerplate generation, and documentation gap-filling are largely solved problems. What remains as distinctly human work is intent, architecture, and validation judgment.
Designed and led an AI-assisted ETL migration framework in 2022, translating years of analyst-authored SPSS syntax into cloud-native PySpark pipelines on AWS Glue, with a workflow architecture built specifically around the constraints of the AI tooling available at the time.
Problem
HCE's core analytics workflows ran on IBM SPSS scripts authored by higher education analysts over many years. Each script encoded deep institutional knowledge: field mappings, edge case handling, multi-dataset join logic, and institution-specific scoring models. Moving to a cloud-native AWS platform required translating that body of work into PySpark, but the volume made a manual rewrite impractical, and the institutional knowledge embedded in the scripts made a fully automated approach equally unreliable.
The challenge was designing a workflow that could handle the mechanical 70% of transformations with AI assistance while preserving human judgment for the complex 30% that required domain expertise to validate.
Approach
The migration was designed as a structured, multi-layer framework rather than a direct translation effort.
ETL config format. The core artifact was a declarative, JSON-serializable schema capturing the most common SPSS transformation patterns as four typed operations: Rename (column mapping), SimpleLookup (value mapping, replacing chains of SPSS if statements), Conditional (column assignment with PySpark expressions), and Inline (computed fields). This format covered the high-frequency transforms cleanly and gave AI tooling a precise target schema to produce against.
@register decorator escape hatch. Complex transforms that did not fit the declarative format were handled via a first-class @register pattern, allowing custom PySpark functions to execute in sequence alongside config-driven transforms. The escape hatch was intentional and architectural, not a workaround: the config did not need to be comprehensive, only reliable for the common cases.
AI-assisted conversion workflow. A cookbook document provided side-by-side SPSS/PySpark/ETL config comparisons across all transform types, designed simultaneously as a training reference for analysts and as priming context for LLM-assisted conversion. SPSS files were parsed into variable-level chunks by a purpose-built parser script, enabling serial LLM conversion within the context limits of 2022 tooling.
Pretransformation generator. A separate Glue script read Bronze S3 data directly, inferred column types from actual data, and auto-generated the type-casting section of each ETL config. This required no analyst judgment and no AI assistance, and eliminated a category of onboarding error by reading what the data actually was rather than what someone assumed it would be.
Infrastructure. Jobs ran on AWS Glue (G.1X workers, PySpark, Python 3), versioned in CodeCommit with sourceControlDetails linking commits to live Glue job definitions. CloudWatch and Spark UI instrumentation was wired in from the start. A DynamoDB metadata layer tracked client configurations and snapshot state.
Outcome
The framework produced fully functional PySpark pipelines for converted clients, including complex implementations covering GPA normalization across multiple scoring subscales, cost-of-attendance assignment, admissions index calculation with z-score normalization and test-optional weighting, financial aid award code parsing across 20 award slots, and event classification logic. The reusable Python library and cookbook remained active as shared platform assets.
The framework design also prefigured how this class of problem is solved with current AI tooling. The 2022 approach was constrained by session-scoped context windows (4-8K tokens), no cross-session persistence, and no structured output validation. The architecture was designed around those constraints: variable-level chunking, explicit cookbook priming, and a declarative config format that gave the LLM a precise and bounded output target.
A current implementation with Claude (200K+ context, persistent memory, structured output, tool use, and agentic task decomposition) would read the full SPSS file in a single context window, surface implicit variable dependencies automatically, compare generated output against known Bronze data without manual diffing, and maintain institutional context across sessions. The architectural pattern built in 2022 maps directly to that workflow; the tooling has caught up to the design.
Computer vision magic mirror for GrandVision's China flagship store launch — customers without corrected vision can see themselves wearing any eyeglass frame via a real-time 3D face scan superimposed on the product catalog.
Problem
Customers who need prescription glasses face a fundamental problem when shopping for frames: they can't see clearly while trying them on, making it genuinely difficult to evaluate how they look. GrandVision's China market launch needed a centerpiece interactive experience for two Shanghai flagship stores that solved this problem — and that bridged the in-store interaction with their new ecommerce backend.
Approach
I designed and built a computer vision magic mirror that gave customers a high-fidelity view of themselves wearing any frame without needing corrected vision. A camera system scanned the customer's face; the imaging data was shared back to an associate's tablet to generate a scrubbable 3D scan with selected frames superimposed on the customer's face — merging the try-on experience with the ecommerce product catalog.
I built the C++ camera server from scratch, including prototyping materials, lighting, and networking for the mirror installation. The tablet application was a WPF-containerized app with native hooks into both the camera server and the ecommerce backend.
Performance engineering was the critical constraint. Acceptance criteria required the full head scan to complete within 15 seconds and the result to upload to the tablet within 5 seconds. Meeting this required a triple-buffer architecture: the camera ran at a high capture framerate to ensure scan quality, while a separate streaming pipeline — decoupled from the capture loop — transmitted frames to the tablet at a lower, transfer-optimized framerate. I used libjpeg-turbo for fast JPEG encoding, a custom UDP interface for low-latency image transfer, and thread pools to parallelize capture, encode, and send stages, keeping the camera pipeline unblocked during transmission.
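The core decoupling idea can be illustrated with a simplified latest-frame buffer. The production system was C++ and used three buffers (write, ready, read) to avoid lock contention at capture framerates; this Python sketch collapses that to a single overwrite-on-put slot purely to show the invariant that matters: the capture side never blocks on a slow network consumer.

```python
import threading


class LatestFrameBuffer:
    """Single-slot frame buffer: capture always overwrites, streaming
    always reads the newest frame. A stand-in for the triple-buffer
    hand-off between the capture loop and the streaming pipeline."""

    def __init__(self):
        self._lock = threading.Lock()
        self._frame = None
        self._seq = -1

    def put(self, frame, seq: int) -> None:
        # Called from the capture thread at full framerate; never waits
        # on the consumer, stale frames are simply replaced.
        with self._lock:
            self._frame, self._seq = frame, seq

    def latest(self):
        # Called from the streaming thread at its own, lower framerate.
        with self._lock:
            return self._frame, self._seq
```

The streaming side sampling `latest()` at a transfer-optimized rate is what lets the two pipelines run at independent framerates without back-pressure.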
Outcome
Shipped and installed at both Shanghai flagship stores. I traveled to Shanghai to commission both locations, working on-site with GrandVision's team for the launches. Delivered through Second Story / Publicis Groupe as a consulting engagement for a major brand market entry, with direct client relationship ownership through delivery. The experience served as the centerpiece customer interaction for GrandVision's China ecommerce and retail entry.
Proximity-aware IoT retail platform connecting BLE, MQTT, and Adobe Experience Manager for real-time personalized content delivery across global retail touchpoints.
Problem
Second Story / Publicis Groupe's global retail clients needed a way to deliver personalized, location-aware content to customers in-store without requiring staff involvement or customers to take any manual action. The challenge was bridging physical proximity (where is this customer standing right now?) with a content management system designed for web delivery — connecting the physical store environment to a digital content layer at scale across multiple retail locations.
Approach
I architected a proximity-aware, device-agnostic IoT retail platform that used BLE beacons and MQTT messaging over AWS IoT to detect customer location and trigger personalized Adobe Experience Manager content delivery to their device in real time.
The mobile layer was built in Xamarin, enabling 70% UI code reuse across iOS and Android while preserving the ability to drop into native Java and Objective-C for environment-specific integrations. Physical location detected by BLE triggered targeted content delivery through the mobile client, which acted as the bridge between the IoT proximity layer and the AEM content engine.
The backend used MQTT pub/sub over AWS IoT for low-latency event propagation — allowing content targeting decisions to happen server-side against AEM's content tree, keeping business logic out of the device. New store configurations and content rules could be pushed without app updates.
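As a sketch of how server-side targeting subscribes to proximity events, here is standard MQTT topic-filter matching over a hypothetical topic scheme (the `store/<id>/zone/<zone>/enter` naming is illustrative, not the deployed scheme; per the MQTT spec, `#` is only valid as the final level):

```python
def topic_matches(pattern: str, topic: str) -> bool:
    """MQTT-style filter matching: '+' matches one level, '#' the rest."""
    p_segs, t_segs = pattern.split("/"), topic.split("/")
    for i, seg in enumerate(p_segs):
        if seg == "#":
            return True
        if i >= len(t_segs) or (seg != "+" and seg != t_segs[i]):
            return False
    return len(p_segs) == len(t_segs)


# A targeting service subscribes once and receives every zone-entry
# event from every store, with store and zone recoverable from the topic.
TARGETING_FILTER = "store/+/zone/+/enter"
```

Because routing is by topic structure rather than device identity, adding a store means publishing to new topics, not shipping new client logic, which is what allowed configuration pushes without app updates.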
Outcome
Platform contributed to Publicis Groupe winning Adobe Digital Marketing Partner of the Year in 2015 and 2016. Deployed across multiple client engagements as a shared consulting platform, serving as the practice's reusable infrastructure for proximity-aware retail experiences. Technical work was presented at Adobe Summit Sydney and NRF Big Show.
AWS backend handling 100,000+ concurrent sessions and 10,000+ user-generated images per hour for T-Mobile's Times Square New Year's Eve Selfie Cam with 100% uptime during peak global traffic.
Problem
T-Mobile needed a public-facing web experience for their Times Square New Year's Eve activation that would let hundreds of thousands of users simultaneously submit selfies and see them displayed in real time. The traffic profile was extreme and highly concentrated: global NYE countdown traffic means a predictable spike to maximum load with no ramp — the system had to be at full capacity the moment the experience went live, with zero tolerance for downtime during the event window. The engagement was delivered through Second Story / Publicis Groupe's consulting practice, with T-Mobile brand visibility and direct client accountability concentrated in a single, non-negotiable event window.
Approach
I architected and deployed an elastic AWS backend designed specifically for this burst-traffic profile. The image upload path used a .NET backend hosted on Elastic Beanstalk fronting S3 for direct object storage — keeping the application tier stateless and independently scalable from storage. Elastic Beanstalk's auto-scaling handled the surge load without manual intervention during the event.
The in-store experience at Times Square ran a C++ application that polled S3 for new images and pulled them into the video wall display loop, decoupling the display system from the upload backend entirely. This meant a failure in the display path couldn't back-pressure the upload pipeline, and vice versa.
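The polling logic reduces to a seen-set diff per cycle. A simplified sketch, with the S3 listing call abstracted as an injected function (the real C++ client listed the bucket over the S3 API):

```python
def poll_new_images(list_keys, seen: set) -> list:
    """One poll cycle: return keys not yet shown on the video wall.

    list_keys stands in for an S3 ListObjects call; seen persists
    across cycles so each image enters the display loop exactly once.
    """
    fresh = [key for key in list_keys() if key not in seen]
    seen.update(fresh)
    return fresh
```

Since the display side only ever reads from S3, nothing it does (including crashing and restarting with an empty seen set) can slow or block the upload path.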
Outcome
Handled 100,000+ concurrent user sessions and 10,000+ user-generated images per hour with 100% uptime throughout the event. The system performed without incident during peak global NYE traffic.
Applied Claude to generate the complete universe of identity test cases for a DOE-required Net Price Calculator platform: a permutation space too large for human authorship, fully automatable because the underlying award logic is deterministic.
Problem
The Net Price Calculator (NPC) is a DOE-mandated public-facing tool that all federally funded institutions must publish. Students enter academic and financial profile data to receive an estimated net price for an upcoming enrollment cycle. HCE operates in this market from an unusual position: the company runs the financial aid regimens for merit and need awards across its client institutions, providing deep visibility into the underlying calculation logic.
The award structure follows a two-axis lookup model. An Academic Index (AI) is normalized into a ranked band (AIRK). A Student Aid Index (SAI) is normalized into a need rank band (NDRK), with bands for non-filers and no-aid cases included. These two normalized indices resolve as coordinates in a 10x10 award matrix, yielding the institution-specific award for that student persona. At baseline, this produces 100 merit identities and 100 need identities. Institutions with special populations add additional permutation layers on top.
Because every student persona maps deterministically to a unique combination of academic and financial profile inputs, the full universe of valid test cases is enumerable. But enumerating it manually is not practical: the case count across merit, need, and special population permutations exceeds what a human QA process can author and maintain.
Approach
Claude was used to generate the complete permutation set for each identity dimension. Each identity maps to a static, unique academic and financial profile; Claude generated every combination within the defined universe, producing a test case for each persona with the expected award as the deterministic output.
The strategy treats the test suite as a first-class artifact rather than a byproduct of manual QA work. Because the mapping is deterministic, generated test cases can be validated against the live NPC and incorporated into CI/CD pipelines, running against every build to catch regressions across the full award matrix before deployment.
Web scraper validation was designed as the verification layer: automated browser-level tests drive each persona's inputs through the student-facing UI and confirm the returned estimate matches the expected award from the generated test case.
Outcome
Full identity coverage across the merit and need award matrices, with special population permutations included. The approach converts a test coverage problem that scales with institutional complexity into a generation problem that scales with compute time. Prior coverage at this permutation count was not feasible through manual authorship.
The CI/CD integration model means coverage does not degrade as award logic evolves: regenerating the test suite for an updated regimen is the same operation as the initial generation, with the same cost.
Multi-year transformation from legacy on-premise infrastructure to a cloud-native AWS data platform — eliminating $200K annual CapEx, reducing critical process latency by 80%, and establishing the shared data layer for all analytics, reporting, and AI/ML capabilities.
Problem
HCE's data operations ran on legacy on-premise infrastructure with a $200K annual CapEx refresh cycle. Data processing relied on SPSS, Excel, and shared drives — single-threaded compute, long runtimes, ad hoc scripts with limited visibility, and consultant-heavy manual workflows that couldn't scale with growing client and product volume. There was no shared data layer: each product and reporting cycle operated in isolation, with no auditability, no reprocessing capability, and no path toward AI/ML readiness.
Approach
I spearheaded a multi-year cloud-native transformation covering both infrastructure migration and a shared data engineering platform built on top of it. The two efforts ran in parallel: infrastructure migration moved everything to AWS and eliminated the CapEx cycle; the ETL platform became the shared data foundation for all analytics, reporting, and downstream product capabilities.
The data model follows a Bronze/Silver/Serving layer architecture. Bronze ingests faithfully from source (SurveyMonkey APIs, Salesforce/Slate exports, census datasets) partitioned by client, context, and date. Silver normalizes, schema-validates, and loads into Redshift Serverless via metadata-driven pipelines. The Serving/Contract layer exposes views and exports for Tableau, APIs, and downstream consumers — decoupled from ingestion logic to enable backfills and controlled schema evolution without breaking downstream.
The most architecturally significant piece was Dynamic Schema Management: attribute-level metadata, context-aware schemas, and versioned data contracts that enable dynamic Redshift table creation and validation without hardcoded schemas. This replaced brittle SPSS text-matching with stable identifier-based mappings. I authored shared Python libraries — API fetch, pagination/retry, JSON normalization, schema enforcement, Redshift load/merge helpers — used across all pipelines, so new workflows are built through composition rather than reinvention.
All infrastructure is standardized on AWS CDK with composable stacks: environment-safe deployments across dev/test/prod, clear separation of infrastructure and pipeline logic, and zero environment drift.
Outcome
Eliminated the $200K annual CapEx cycle (reduced to $4K/month OpEx). Reduced critical business process latency by 80% (2 hours → 10 minutes) through automated pipelines. Parallelized Glue and Redshift workloads materially reduced Snapshot and Census generation runtimes. Platform is the shared foundation for all current analytics products and is positioned for AI/ML workloads without re-architecture.
Three-stage delivery toolchain migration over six years: locally hosted TFS to Azure DevOps to AWS CodeCommit to GitHub, establishing modern CI/CD pipelines and shared delivery standards across a distributed 25+ person engineering organization.
Problem
HCE's engineering workflow was anchored to a locally hosted Team Foundation Server instance — on-premise source control and build tooling that required infrastructure maintenance, limited remote access, and could not support a distributed delivery model. As the organization scaled an offshore team and began modernizing toward cloud-native delivery, the on-premise toolchain became a constraint on team velocity, code review discipline, and operational reliability.
Approach
The migration happened in three stages, each driven by the direction the broader technology platform was heading.
Stage 1 — TFS to Azure DevOps. Moved source control and CI pipelines from the locally hosted TFS instance to Azure DevOps, eliminating the on-premise dependency and enabling cloud-hosted build pipelines. This was the first step in treating the delivery toolchain as a managed service rather than owned infrastructure.
Stage 2 — Azure DevOps to AWS CodeCommit. As the primary cloud platform consolidated on AWS, source control moved to AWS CodeCommit to align with the broader cloud strategy and reduce cross-cloud toolchain complexity.
Stage 3 — AWS CodeCommit to GitHub. GitHub became the standard for source control and CI/CD across the full organization, including the offshore development team. Established shared branching standards, pull request workflows, and automated build pipelines. GitHub's ecosystem coverage for integrations, code review tooling, and Actions-based CI consolidated previously fragmented delivery practices across onshore and offshore staff.
Outcome
Modern, cloud-hosted CI/CD pipelines across all engineering teams. The 20+ person offshore team operates on the same delivery toolchain and standards as onshore staff, with shared branching discipline and automated build validation in place before review. Each stage reduced operational overhead from the previous one — no on-premise toolchain infrastructure remains. The resulting delivery foundation supports blue/green and canary deployment patterns across the production environment.
Two-stage infrastructure migration over six years: on-premise to colocation, then colocation to cloud-native AWS, eliminating a $200K annual CapEx cycle and reducing operational overhead to $4K per month.
Problem
HCE operated on aging on-premise infrastructure with a $200K+ annual CapEx hardware refresh cycle. Physical hardware dependencies created operational risk, limited scalability, and locked the organization into a cost model that could not flex with growth or contraction. The path to cloud-native architecture could not happen in a single step — the infrastructure had to move in stages while maintaining continuity for active client workloads at each transition point.
Approach
Stage 1 — On-Premise to Colocation. The first transition moved infrastructure from on-site hosting into a managed colocation facility. This step removed hardware ownership risk, improved physical redundancy, and established managed connectivity — creating a stable intermediate state before the subsequent cloud migration.
Stage 2 — Colocation to Cloud-Native AWS. The second transition retired the colocation environment and rebuilt all workloads on AWS cloud-native services. I owned the strategy, vendor approach, and execution end-to-end. Infrastructure was rebuilt as code using AWS CDK and Terraform, replacing equivalent on-premise workloads across compute, database, storage, and networking with cloud-native counterparts.
Both transitions were executed without engaging an implementation partner and while maintaining continuity for business-critical production systems.
Outcome
Eliminated the $200K annual CapEx infrastructure cycle, reducing operational overhead to approximately $4K per month. Established the cloud-native foundation that the HCE data platform, ETL pipelines, and NPC modernization were subsequently built on top of. Infrastructure as Code across all environments eliminated configuration drift and enabled repeatable, auditable deployments.
Three-tier integrated digital experience for SFO Terminal 3 connecting a large-scale beacon projection, six multi-touch kiosks, and a mobile takeaway for 35M+ annual passengers. Won 1st Place Creative Innovations Award.
Problem
San Francisco International Airport needed a large-scale digital experience for Terminal 3 that connected multiple independent display systems — a major beacon projection, interactive kiosks, and passenger mobile devices — into a single coherent experience. The challenge was real-time synchronization across physically separate systems in a high-traffic public venue serving 35M+ annual passengers, where failure is visible and recovery needs to be seamless. The project was a Second Story / Publicis Groupe consulting engagement for SFO, a permanent public infrastructure installation with direct client relationship ownership across architecture, delivery, and on-site commissioning.
Approach
I architected a three-tier integrated system with a real-time synchronization layer connecting all components: a large-scale beacon projection visible across the terminal, six multi-touch interactive kiosks positioned throughout the space, and a mobile takeaway component passengers could use on their own devices.
The synchronization layer kept the three tiers in sync: content shown on the kiosks reflected what was active on the projection, and the mobile experience could pick up where a passenger left off at a kiosk. The architecture isolated failure boundaries so that a kiosk issue couldn't affect the projection or mobile path.
Outcome
Installed and operational at SFO Terminal 3. The project won 1st Place Creative Innovations Award at the 2014 Peggy G. Hereford Award ceremony.