Class StatefulProcessorWithInitialState<K,I,O,S>

java.lang.Object
  org.apache.spark.sql.streaming.StatefulProcessor<K,I,O>
    org.apache.spark.sql.streaming.StatefulProcessorWithInitialState<K,I,O,S>
All Implemented Interfaces:
Serializable

public abstract class StatefulProcessorWithInitialState<K,I,O,S> extends StatefulProcessor<K,I,O>
Stateful processor with support for specifying an initial state. Accepts a user-defined type as initial state, which is applied in the first batch of the query. This can be used to start a new streaming query with existing state carried over from a previous streaming query.
Type Parameters:
K - type of the grouping key
I - type of the input rows
O - type of the output rows
S - type of the user-defined initial state
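For context, a hedged sketch (not part of this reference) of how such a processor might be wired into a query: with Spark's transformWithState API, the initial state is supplied as a KeyValueGroupedDataset alongside the processor. MyProcessor, the checkpoint path, the column expressions, and the state data source reader options below are all hypothetical placeholders.

```scala
import org.apache.spark.sql.streaming.{OutputMode, TimeMode}

// Read the keyed state left behind by a previous query. The "statestore"
// reader and the column layout are assumptions; adjust to the actual
// checkpoint contents.
val initialState = spark.read
  .format("statestore")
  .load("/path/to/old/checkpoint")
  .selectExpr("key.value AS key", "value.count AS count")
  .as[(String, Long)]
  .groupByKey(_._1)
  .mapValues(_._2)

// MyProcessor is a hypothetical StatefulProcessorWithInitialState subclass.
val output = inputStream
  .groupByKey(identity[String])
  .transformWithState(new MyProcessor(), TimeMode.None(), OutputMode.Update(), initialState)
```

The initial state dataset must be grouped by the same key type as the input stream; keys absent from it simply never receive a handleInitialState call.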
  • Constructor Details

    • StatefulProcessorWithInitialState

      public StatefulProcessorWithInitialState()
  • Method Details

    • handleInitialState

      public abstract void handleInitialState(K key, S initialState, TimerValues timerValues)
      Function that will be invoked only in the first batch, allowing users to process the initial state. The provided initial state can be an arbitrary dataframe with the same grouping key schema as the input rows, e.g. a dataframe read from the checkpoint of an existing streaming query via a data source reader.

      Note that in micro-batch mode, this function can be called one or more times per grouping key. If a grouping key does not appear in the initial state dataframe rows, the function will not be invoked for that key.

      Parameters:
      key - grouping key
      initialState - a row of the initial state to be processed
      timerValues - instance of TimerValues that provides access to the current processing/event time, if available
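As a hedged sketch (not taken from this page), a subclass might seed per-key state from the initial state and then keep updating it on input rows. The class name, state variable name, and counting logic below are illustrative, and the surrounding calls assume the transformWithState API in recent Spark, where getHandle exposes state variables:

```scala
import org.apache.spark.sql.Encoders
import org.apache.spark.sql.streaming.{
  OutputMode, StatefulProcessorWithInitialState, TTLConfig, TimeMode, TimerValues, ValueState
}

// Illustrative processor: maintains a running count per key, seeded in the
// first batch from the initial state of a previous query.
class CountProcessor
    extends StatefulProcessorWithInitialState[String, String, (String, Long), Long] {

  @transient private var count: ValueState[Long] = _

  override def init(outputMode: OutputMode, timeMode: TimeMode): Unit = {
    // Register a per-key value state variable (the name "count" is illustrative).
    count = getHandle.getValueState[Long]("count", Encoders.scalaLong, TTLConfig.NONE)
  }

  // Invoked in the first batch, one or more times per key present in the
  // initial state dataframe; keys absent from it are never passed here.
  override def handleInitialState(
      key: String, initialState: Long, timerValues: TimerValues): Unit = {
    count.update(initialState)
  }

  override def handleInputRows(
      key: String, inputRows: Iterator[String],
      timerValues: TimerValues): Iterator[(String, Long)] = {
    val updated = (if (count.exists()) count.get() else 0L) + inputRows.size
    count.update(updated)
    Iterator.single((key, updated))
  }
}
```

Seeding state in handleInitialState rather than in handleInputRows keeps the migration logic separate from steady-state processing: after the first batch, only handleInputRows runs.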