Input: dirty data files

Output: preprocessing results

Map (Object, Data_Text, NullWritable, Data_Text)

Input: key = offset, value = tuple

FOR each (key, value) DO

output_key := null

output_value := key + value
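
The assignments above can be illustrated with a minimal Python sketch that runs outside any MapReduce framework. The function name `map_records`, the `sep` parameter, the tab separator, and the sample records are all assumptions made for illustration. The sketch also assumes that each (output_key, output_value) pair is emitted once per input record, which the listing does not show at this point.

```python
from typing import Iterable, Iterator, Optional, Tuple


def map_records(
    records: Iterable[Tuple[int, str]], sep: str = "\t"
) -> Iterator[Tuple[Optional[str], str]]:
    """Hypothetical stand-in for the Map step: one output pair per input record."""
    for offset, value in records:
        # The output key is null; None plays the role of NullWritable here.
        output_key = None
        # output_value := key + value. The separator is an assumption,
        # added so the offset and the tuple stay distinguishable.
        output_value = f"{offset}{sep}{value}"
        # Emitting the pair is assumed; the listing above stops before this step.
        yield output_key, output_value


if __name__ == "__main__":
    # Made-up (offset, tuple) records standing in for lines of a dirty data file.
    sample = [(0, "alice,23"), (9, "bob,31")]
    for key, value in map_records(sample):
        print(key, repr(value))
```

Because every output key is null, the original offset survives only inside the output value, so a later stage can still recover where each tuple came from.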