Purpose
- Map Function:
  - The Map function’s primary purpose is to process individual input data points. Each piece of data retrieved in the Get Input Data stage is passed to the Map function for processing.
  - This stage is ideal for operations that are performed on each record or data point independently, such as transforming, filtering, or enriching data before aggregation.
- Reduce Function:
  - The Reduce function aggregates and processes data that has been output by the Map stage. It is designed for cases where multiple related records or data points need to be consolidated, summarized, or further processed as a group.
  - This stage is suitable for tasks like summing values, calculating averages, or performing operations that need to see every record sharing a key at once, such as deduplication.
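The split between the two stages can be sketched in plain JavaScript. This is a minimal illustration, not a deployable script: the context objects are hand-built stand-ins that only mimic the `key`, `value`, `values`, and `write` members, and the email data is made up for the example.

```javascript
// Map: handles one record at a time (filtering and transformation).
function map(context) {
  const email = String(context.value).trim().toLowerCase();
  if (email === "") return;                      // filter: emit nothing for blank input
  context.write({ key: email, value: email });   // transform: emit the cleaned value
}

// Reduce: handles every value that shares a key (deduplication).
function reduce(context) {
  // One output per key, however many duplicates arrived.
  context.write({ key: context.key, value: context.values.length });
}

// Hand-built stand-in for the framework: call map per record, group, call reduce per key.
const emitted = [];
["A@x.com ", "a@x.com", "  ", "b@y.com"].forEach((value, i) =>
  map({ key: String(i), value, write: (pair) => emitted.push(pair) })
);

const groups = {};
for (const { key, value } of emitted) {
  groups[key] = groups[key] || [];
  groups[key].push(value);
}

const results = [];
for (const key of Object.keys(groups)) {
  reduce({ key, values: groups[key], write: (pair) => results.push(pair) });
}

console.log(results);
```

The blank entry never leaves the Map stage, and the two spellings of the same address collapse into one Reduce output.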
Input and Output
- Map Function:
  - Input: The Map function receives key-value pairs, where the key is a unique identifier and the value is a data point or record. Each Map function call processes a single key-value pair.
  - Output: The Map function emits key-value pairs that will be passed to the Reduce function. The key typically groups related records together, and the value is the processed data.
- Reduce Function:
  - Input: The Reduce function receives key-value pairs, where each key represents a group of related data points output by the Map function. The value is an array of data points or records associated with that key.
  - Output: The Reduce function typically produces a single output per key, which is often a summary or aggregated result of the group of records.
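The input and output shapes above can be traced with a small local simulation. `runStages` is a hypothetical helper written for this example, not part of NetSuite. It converts emitted values to strings on the assumption that values are serialized between stages, and the `category` and `qty` fields are illustrative.

```javascript
// Hypothetical stand-in for the grouping that happens between Map and Reduce.
function runStages(inputPairs, map, reduce) {
  const grouped = {};                              // key -> array of emitted values
  for (const [key, value] of inputPairs) {
    map({
      key,
      value,
      write: ({ key: outKey, value: outValue }) => {
        grouped[outKey] = grouped[outKey] || [];
        grouped[outKey].push(String(outValue));    // assumption: values travel as strings
      },
    });
  }
  const output = {};
  for (const key of Object.keys(grouped)) {
    reduce({
      key,
      values: grouped[key],
      write: ({ key: outKey, value: outValue }) => {
        output[outKey] = String(outValue);
      },
    });
  }
  return output;
}

// Map input: one key-value pair. Map output: a pair keyed by the grouping field.
function map(context) {
  const record = JSON.parse(context.value);
  context.write({ key: record.category, value: record.qty });
}

// Reduce input: one key plus an array of values. Reduce output: one pair per key.
function reduce(context) {
  const total = context.values.reduce((sum, v) => sum + Number(v), 0);
  context.write({ key: context.key, value: total });
}

const output = runStages(
  [
    ["1", JSON.stringify({ category: "widgets", qty: 2 })],
    ["2", JSON.stringify({ category: "gadgets", qty: 5 })],
    ["3", JSON.stringify({ category: "widgets", qty: 3 })],
  ],
  map,
  reduce
);

console.log(output); // { widgets: '5', gadgets: '5' }
```

Three Map calls produce two groups, so Reduce runs twice and emits one total per key.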
Use Cases
- Map Function:
  - Example: Suppose you are processing a list of sales orders and need to normalize the customer names. The Map function is ideal for this task because it can process each sales order independently and modify the customer name accordingly.
  - Other Use Cases: Data filtering, transformation, validation, enrichment.
- Reduce Function:
  - Example: After normalizing the customer names in the Map stage, you might want to calculate the total sales amount per customer. The Reduce function would sum the sales amounts for each customer (grouped by customer ID) to produce a final total.
  - Other Use Cases: Aggregating totals, summarizing data, removing duplicates.
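The sales order example can be written out as a short sketch. The sample orders, the field names (`customerId`, `customerName`, `amount`), and the normalization rule in `normalizeName` are all assumptions made for illustration. The loops at the bottom stand in for the framework and are not NetSuite code.

```javascript
// Hypothetical sample data.
const salesOrders = [
  { id: "SO1", customerId: "C1", customerName: "  acme corp ", amount: 100.5 },
  { id: "SO2", customerId: "C2", customerName: "GLOBEX", amount: 40 },
  { id: "SO3", customerId: "C1", customerName: "ACME Corp", amount: 19.5 },
];

// Assumed rule: trim, collapse whitespace, then title-case each word.
function normalizeName(name) {
  return name
    .trim()
    .replace(/\s+/g, " ")
    .toLowerCase()
    .replace(/\b[a-z]/g, (c) => c.toUpperCase());
}

// Map: normalize one order and key it by customer ID.
function map(context) {
  const order = JSON.parse(context.value);
  context.write({
    key: order.customerId,
    value: JSON.stringify({
      name: normalizeName(order.customerName),
      amount: order.amount,
    }),
  });
}

// Reduce: sum the amounts of all orders that share a customer ID.
function reduce(context) {
  const rows = context.values.map((v) => JSON.parse(v));
  const total = rows.reduce((sum, row) => sum + row.amount, 0);
  context.write({
    key: context.key,
    value: JSON.stringify({ name: rows[0].name, total }),
  });
}

// Local stand-in for the framework: map per order, group by key, reduce per key.
const grouped = {};
for (const order of salesOrders) {
  map({
    key: order.id,
    value: JSON.stringify(order),
    write: ({ key, value }) => {
      grouped[key] = grouped[key] || [];
      grouped[key].push(value);
    },
  });
}

const totals = {};
for (const key of Object.keys(grouped)) {
  reduce({
    key,
    values: grouped[key],
    write: ({ key: outKey, value }) => {
      totals[outKey] = JSON.parse(value);
    },
  });
}

console.log(totals);
```

Keying the Map output by customer ID rather than by name keeps the grouping stable even if two spellings of a name normalize differently.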
Error Handling and Recovery
- Map Function:
  - Errors in the Map stage affect only the key-value pair currently being processed. NetSuite’s Map/Reduce Script framework can automatically retry failed map operations or log the error for further investigation.
  - Because each Map call is isolated, an error can be handled without affecting the rest of the data processing.
- Reduce Function:
  - Errors in the Reduce stage can be more complex to handle because they may involve groups of records. Recovering from an error might require reprocessing the entire group or adjusting the logic to handle specific edge cases.
  - The Reduce stage is less tolerant of errors because it usually deals with aggregated or related data, so a single failure can cost the result for a whole group.
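The difference in blast radius can be illustrated with a local simulation. `runWithRetry` and its `maxAttempts` parameter are hypothetical and only imitate retry-then-log behavior; they are not the framework's own mechanism. The sample data and the "negative total" rule are made up, and values are kept as numbers for brevity.

```javascript
// Hypothetical helper: try a stage call up to maxAttempts times. Output is
// buffered per attempt so a failed attempt leaves nothing behind.
function runWithRetry(stageFn, key, input, maxAttempts, errors) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const buffered = [];
    try {
      stageFn({ key, ...input, write: (pair) => buffered.push(pair) });
      return buffered;
    } catch (e) {
      if (attempt === maxAttempts) errors.push({ key, message: e.message });
    }
  }
  return [];
}

function map(context) {
  const record = JSON.parse(context.value);
  if (typeof record.amount !== "number") throw new Error("amount is not numeric");
  context.write({ key: record.customerId, value: record.amount });
}

function reduce(context) {
  const total = context.values.reduce((sum, v) => sum + v, 0);
  if (total < 0) throw new Error("negative total");
  context.write({ key: context.key, value: total });
}

const input = [
  ["1", { customerId: "C1", amount: 10 }],
  ["2", { customerId: "C1", amount: "n/a" }], // fails in map: only this pair is lost
  ["3", { customerId: "C2", amount: 5 }],
  ["4", { customerId: "C2", amount: -9 }],    // passes map, but fails the C2 group in reduce
  ["5", { customerId: "C3", amount: 7 }],
];

const mapErrors = [];
const reduceErrors = [];
const grouped = {};
for (const [key, record] of input) {
  const pairs = runWithRetry(map, key, { value: JSON.stringify(record) }, 2, mapErrors);
  for (const { key: outKey, value } of pairs) {
    grouped[outKey] = grouped[outKey] || [];
    grouped[outKey].push(value);
  }
}

const totals = {};
for (const key of Object.keys(grouped)) {
  const pairs = runWithRetry(reduce, key, { values: grouped[key] }, 2, reduceErrors);
  for (const { key: outKey, value } of pairs) {
    totals[outKey] = value;
  }
}

console.log({ totals, mapErrors, reduceErrors });
```

The Map failure drops one record and customer C1 still gets a total, while the Reduce failure drops the entire C2 group, including the valid order in it.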