Apache Flink interview questions for data engineer roles at streaming-heavy companies (Netflix, Uber, Lyft, Pinterest, Twitter/X, AWS managed Flink customers). 35+ questions covering Flink architecture (TaskManagers, JobManagers, slot-based parallelism), stateful streaming (RocksDB state backend, checkpointing, savepoints), exactly-once semantics (two-phase commit, transactional sinks), event-time processing (watermarks, allowed lateness, side outputs), and Flink SQL. Pair with the data engineer interview prep guide and the streaming data engineer interview guide.
From 124 reported streaming data engineer loops in 2024-2026.
| Topic | Test Frequency | Depth Expected |
|---|---|---|
| Exactly-once via two-phase commit | 94% | How it differs from at-least-once + idempotent |
| State backend choice (heap vs RocksDB) | 82% | When each is right; state size implications |
| Checkpointing and savepoints | 87% | Frequency tuning, incremental checkpoints, recovery |
| Event-time and watermarks | 89% | Watermark generation strategies, allowed lateness |
| Window types (tumbling, sliding, session) | 76% | When to pick each, window assigners |
| Keyed state vs operator state | 67% | ValueState, ListState, MapState, broadcast state |
| Backpressure handling | 58% | Detection via metrics, mitigation strategies |
| Job parallelism and slot allocation | 62% | Sizing TaskManagers and slots |
| Flink SQL (Table API) | 44% | When to use vs DataStream API |
| Side outputs for late data | 53% | Routing late events to dead-letter |
| Async I/O for external lookups | 37% | Pattern for enriching from external services |
| Connectors (Kafka, Kinesis, Iceberg, JDBC) | 71% | Connector-level exactly-once semantics |
| Schema evolution in stateful jobs | 41% | POJO migration, Avro evolution |
| Flink on Kubernetes vs YARN vs standalone | 39% | Deployment trade-offs |
Flink's exactly-once guarantee is end-to-end, achieved via a two-phase commit between source and sink. The source provides replayable offsets (e.g., Kafka). Processing produces deterministic output. The sink participates in a transaction that commits atomically with the offset commit. On failure, the entire transaction rolls back; on retry, the same input produces the same output, committed atomically.
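The pre-commit/commit/abort lifecycle described above can be reduced to a plain-Java simulation. All names here are illustrative sketches of the protocol, not Flink's actual `TwoPhaseCommitSinkFunction` API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Minimal simulation of a two-phase-commit sink lifecycle (hypothetical names). */
class TwoPhaseCommitSink {
    private final List<String> openTxn = new ArrayList<>();          // writes in the current transaction
    private final Map<Long, List<String>> pending = new HashMap<>(); // pre-committed, keyed by checkpoint id
    private final List<String> committed = new ArrayList<>();        // durably visible output

    void write(String record) { openTxn.add(record); }

    // On checkpoint barrier: flush the open transaction and hold it as "pre-committed".
    void preCommit(long checkpointId) {
        pending.put(checkpointId, new ArrayList<>(openTxn));
        openTxn.clear();
    }

    // On checkpoint-complete notification: make the pre-committed transaction visible.
    void commit(long checkpointId) {
        List<String> txn = pending.remove(checkpointId);
        if (txn != null) committed.addAll(txn);
    }

    // On failure before commit: discard pre-committed and in-flight writes; the
    // source rewinds to the last committed offsets and replays the same input.
    void abort() { openTxn.clear(); pending.clear(); }

    List<String> committedOutput() { return committed; }
}
```

The key property: writes become visible only after `commit`, which fires only after the checkpoint (including source offsets) has completed, so an abort-and-replay never produces duplicate visible output.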
Most candidates can recite this. Fewer can explain the practical implications. True exactly-once has costs: latency increases (the transaction adds 100-500 ms per commit), throughput decreases (commits are synchronous with checkpoints), and sink choice is constrained (the sink must support transactions; Kafka, JDBC, and Iceberg do, plus a few others). For sinks that don't support transactions (HTTP services, legacy databases), Flink's exactly-once degrades to effectively-once, which requires idempotent consumers.
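The idempotent-consumer fallback amounts to at-least-once delivery plus deduplication on a unique event id. A minimal sketch, assuming each event carries such an id (all names hypothetical):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Effectively-once: replayed deliveries are detected by id and skipped. */
class IdempotentConsumer {
    private final Set<String> seenIds = new HashSet<>();  // ids already applied
    private final List<String> applied = new ArrayList<>();

    // Safe to call repeatedly with the same event: a replay after failure
    // (at-least-once delivery) produces no duplicate side effect.
    void apply(String eventId, String payload) {
        if (seenIds.add(eventId)) {
            applied.add(payload);
        }
    }

    List<String> appliedPayloads() { return applied; }
}
```

In production the seen-id set would live in the sink system itself (e.g., a unique key constraint or an upsert), not in consumer memory; the in-memory set is only for illustration.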
The senior (L5) signal in the interview: stating the exactly-once guarantee explicitly, with its constraints. "The Flink-Kafka pipeline is end-to-end exactly-once because the Kafka producer participates in the two-phase commit; the Flink-HTTP sink would degrade to at-least-once, with consumer-side idempotency required."
Flink stores keyed state in a state backend. Two production options, with different performance and operational characteristics.
Heap state backend: state lives in JVM heap. Reads and writes are fast (microseconds). State size limited by heap size (typically 4-32 GB per TaskManager). Best for small-state workloads where speed matters and total state fits in memory.
RocksDB state backend: state lives in an embedded RocksDB instance with disk spillover. Reads and writes are slower (milliseconds). State size limited by disk (typically 100s of GB to TB per TaskManager). Required for large-state workloads (sessionization with long TTLs, feature pipelines with many keys). Supports incremental checkpoints, which is critical at scale.
Choosing between them is a state-size question. Roughly: under 1 GB state per TaskManager, use heap; over 10 GB, use RocksDB; in between, depends on latency requirements. Strong candidates can estimate state size from the workload (number of keys * size per key * retention) and pick accordingly.
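The back-of-envelope sizing above can be written down directly. Thresholds are the ones stated in the text; the helper names are hypothetical:

```java
/** Back-of-envelope state sizing to choose a Flink state backend. */
class StateSizing {
    // Total keyed state: number of keys x bytes per key (per retained window/TTL).
    static long estimateStateBytes(long numKeys, long bytesPerKey) {
        return numKeys * bytesPerKey;
    }

    // Under ~1 GB per TaskManager: heap. Over ~10 GB: RocksDB.
    // In between: decide by latency requirements.
    static String recommendBackend(long stateBytesPerTaskManager) {
        long gb = 1L << 30;
        if (stateBytesPerTaskManager < 1 * gb) return "heap";
        if (stateBytesPerTaskManager > 10 * gb) return "rocksdb";
        return "depends-on-latency";
    }
}
```

Worked example: 50 million keys at 200 bytes each is ~10 GB of total state; spread over 8 TaskManagers that is ~1.25 GB each, which lands in the "depends on latency" band.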
A representative stateful-streaming exercise: sessionize a keyed event stream with a 30-minute inactivity gap, using `KeyedProcessFunction` with `ValueState` and event-time timers. (`Event`, `Session`, and `SessionAccumulator` are domain types assumed to be defined elsewhere in the job.)

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class SessionizeFunction extends KeyedProcessFunction<String, Event, Session> {

    private ValueState<SessionAccumulator> sessionState;
    private static final long GAP_MS = 30 * 60 * 1000; // 30-minute inactivity gap

    @Override
    public void open(Configuration parameters) {
        // One accumulator per key, checkpointed by the configured state backend.
        sessionState = getRuntimeContext().getState(
            new ValueStateDescriptor<>("session", SessionAccumulator.class));
    }

    @Override
    public void processElement(Event event, Context ctx, Collector<Session> out) throws Exception {
        SessionAccumulator current = sessionState.value();
        if (current == null || event.ts - current.lastEventTs > GAP_MS) {
            // Gap exceeded: emit the finished session (if any) and start a new one.
            if (current != null) {
                out.collect(current.toSession());
            }
            current = new SessionAccumulator(event);
        } else {
            current.add(event);
        }
        sessionState.update(current);
        // Fire once the watermark passes the end of the gap. Registering a timer
        // per event is simple but creates many timers; coalescing is an optimization.
        ctx.timerService().registerEventTimeTimer(event.ts + GAP_MS);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<Session> out) throws Exception {
        SessionAccumulator current = sessionState.value();
        // Only emit if no newer event has extended the session past this timer.
        if (current != null && current.lastEventTs + GAP_MS <= timestamp) {
            out.collect(current.toSession());
            sessionState.clear();
        }
    }
}
```

All three are production stream processors. The choice depends on workload, team expertise, and ecosystem.
| Dimension | Flink | Spark Structured Streaming | Kafka Streams |
|---|---|---|---|
| Processing model | True streaming | Micro-batch (configurable) | Streaming |
| State management | Heap or RocksDB, well-documented | RocksDB, less explicit tuning | RocksDB, application-embedded |
| Exactly-once | True end-to-end via 2PC | End-to-end with transactional sinks | End-to-end with Kafka transactions |
| Window semantics | Most flexible, all window types | Tumbling, sliding, session | Tumbling and hopping |
| Connectors | Most extensive | Spark ecosystem | Kafka-native only |
| Operational complexity | Highest | Moderate (Spark expertise transfers) | Lowest (embedded in JVM apps) |
| Best fit | Complex stateful, high throughput | Spark-native teams, mixed batch+stream | Kafka-only ecosystems, embedded |
Flink is the most-tested stream processor in dedicated streaming data engineer loops (see the streaming data engineer interview guide). The system design framework from the system design round prep guide applies to streaming architectures with Flink as the primary stream processor.
For broader streaming context, see the streaming guide. For the message-broker decision, see the Kafka vs Kinesis page (Flink works equally well with both). Companies most likely to test deep Flink knowledge: see the Netflix, Uber, Lyft, and Pinterest data engineer interview guides.
Drill Flink, exactly-once, and stateful streaming patterns in our practice sandbox.
- The full streaming role framework with Flink as primary stream processor.
- Message broker decision relevant to every Flink deployment.
- Pillar guide covering every round in the Data Engineer loop, end to end.
- The full SQL interview question bank, indexed by topic, difficulty, and company.
- BigQuery internals, slot-based pricing, partitioning, and clustering interview prep.
- Redshift sort keys, dist keys, compression, and RA3 architecture interview prep.
- Postgres MVCC, indexing, partitioning, and replication interview prep.
- Hadoop ecosystem (HDFS, MapReduce, YARN, Hive) interview prep, including modern relevance.
- AWS Glue ETL jobs, crawlers, Data Catalog, and PySpark-on-Glue interview prep.
Continue your prep
50+ guides covering every round, company, role, and technology in the data engineer interview loop. Grounded in 2,817 verified interview reports across 929 companies, collected from real candidates.