Extract data from semi-structured text files and preprocess the text into numerical representations.
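
As a rough illustration of those two steps, the sketch below parses records from a small text sample and turns one field into bag-of-words count vectors. The "Key: value" record layout, the sample data, the helper names (`extract_records`, `tokenize`, `build_vocabulary`, `vectorize`), and the choice of bag-of-words counts are all assumptions made for the example; the source does not say which file format or numerical representation is actually used.

```python
import re
from collections import Counter

# Hypothetical sample: records of "Key: value" lines separated by blank lines.
SAMPLE = """\
Title: Pump failure
Body: pump stopped after power loss

Title: Valve leak
Body: valve leak near pump
"""

# Matches a line such as "Title: Pump failure" (assumed field syntax).
FIELD_RE = re.compile(r"^(\w+):\s*(.*)$")


def extract_records(text):
    """Split text on blank lines and parse each 'Key: value' line into a dict."""
    records = []
    for block in re.split(r"\n\s*\n", text.strip()):
        record = {}
        for line in block.splitlines():
            match = FIELD_RE.match(line.strip())
            if match:
                record[match.group(1).lower()] = match.group(2)
        if record:
            records.append(record)
    return records


def tokenize(text):
    """Lowercase the text and keep runs of letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(documents):
    """Map each distinct token to a column index, in sorted order."""
    vocab = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(vocab)}


def vectorize(text, vocabulary):
    """Return a list of token counts aligned with the vocabulary indices."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector


records = extract_records(SAMPLE)
bodies = [r["body"] for r in records]
vocabulary = build_vocabulary(bodies)
vectors = [vectorize(b, vocabulary) for b in bodies]
```

Here `records` holds one dictionary per parsed record and `vectors` holds one count vector per `body` field. Only the standard library is used so the sketch stays self-contained; a real pipeline might swap in a dedicated parser or a different representation such as TF-IDF or embeddings.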