Input: dirty data files
Output: preprocessing results
Map (Object, Data_Text, NullWritable, Data_Text)
Input: key = offset, value = tuple
FOR each (key, value) DO
    output_key := null
    output_value := key + value
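
The map step above can be sketched in plain Java, without the Hadoop dependency, to make the key/value handling concrete. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `PreprocessMapSketch` and the methods `map` and `mapAll` are hypothetical, `Void` stands in for `NullWritable`, `String` stands in for `Data_Text`, and `key + value` is read as plain string concatenation because the listing does not name a delimiter.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class PreprocessMapSketch {

    // Mirrors one map call: the output key is null (the role NullWritable
    // plays in the signature) and the output value is key + value.
    // Plain concatenation is an assumption; the listing does not say
    // whether a delimiter separates the offset from the tuple.
    static Map.Entry<Void, String> map(long offset, String tuple) {
        return new AbstractMap.SimpleEntry<Void, String>(null, offset + tuple);
    }

    // Applies map to every (offset, tuple) pair, as the FOR loop does.
    static List<Map.Entry<Void, String>> mapAll(List<Map.Entry<Long, String>> records) {
        List<Map.Entry<Void, String>> out = new ArrayList<>();
        for (Map.Entry<Long, String> r : records) {
            out.add(map(r.getKey(), r.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample records: byte offset of the line, then the tuple text.
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        records.add(new AbstractMap.SimpleEntry<Long, String>(0L, "alice,30"));
        records.add(new AbstractMap.SimpleEntry<Long, String>(9L, "bob,"));
        for (Map.Entry<Void, String> e : mapAll(records)) {
            System.out.println(e.getValue());
        }
    }
}
```

Because the output key is always null, each emitted record carries all of its information in the value, with the original offset folded in as a prefix. A real job would likely insert a delimiter between the offset and the tuple so the two can be separated again downstream.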