Input: A call to update_model(), which loads the old model, updates the vocabulary, and calls the training function on the new corpus after preprocessing

Output: Get_model() returns the Word2vec model and saves it

Step 1: Convert the corpus from GBK to UTF-8 encoding, extract the content, and save it to news.txt.

Step 2: Strip extraneous tags, load the custom dictionary into the Jieba word segmenter, segment the text, and remove stop words.

Step 3: Read the file contents and use split_file() to slice the document into several smaller documents, using 10,000 as the partition size, then save the split contents.

Step 4: Call Continue_train() to convert the corpus into a set of sentences, load the old model, continue training it on the updated corpus, and save the resulting model.
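
The four steps above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the original implementation: the helper names (convert_to_utf8, strip_tags, segment, drop_stop_words, load_sentences) and all file paths are hypothetical, the tags are assumed to be HTML/XML-style, the partition size of 10,000 is interpreted as a line count, and the preprocessed files are assumed to hold one sentence per line with tokens separated by spaces. The jieba and gensim packages are third-party and are imported only inside the functions that need them.

```python
import re
from pathlib import Path


def convert_to_utf8(src_path, dst_path="news.txt"):
    """Step 1: read a GBK-encoded corpus and save it as UTF-8."""
    # errors="ignore" silently drops undecodable bytes (an assumption).
    text = Path(src_path).read_text(encoding="gbk", errors="ignore")
    Path(dst_path).write_text(text, encoding="utf-8")
    return text


def strip_tags(text):
    """Step 2a: remove markup, assuming HTML/XML-style tags."""
    return re.sub(r"<[^>]+>", "", text)


def segment(text, user_dict_path=None):
    """Step 2b: segment text with Jieba, optionally with a custom dictionary."""
    import jieba  # third-party; imported lazily

    if user_dict_path:
        jieba.load_userdict(user_dict_path)
    return jieba.lcut(text)


def drop_stop_words(tokens, stop_words):
    """Step 2c: discard stop words and whitespace-only tokens."""
    return [t for t in tokens if t.strip() and t not in stop_words]


def split_file(src_path, out_dir, lines_per_part=10_000):
    """Step 3: save the corpus in parts of `lines_per_part` lines each."""
    lines = Path(src_path).read_text(encoding="utf-8").splitlines(keepends=True)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    for start in range(0, len(lines), lines_per_part):
        part = out_dir / f"part_{start // lines_per_part:04d}.txt"
        part.write_text("".join(lines[start:start + lines_per_part]), encoding="utf-8")
        parts.append(part)
    return parts


def load_sentences(part_paths):
    """Step 4a: turn the split files into a list of token lists."""
    sentences = []
    for path in part_paths:
        for line in Path(path).read_text(encoding="utf-8").splitlines():
            tokens = line.split()
            if tokens:
                sentences.append(tokens)
    return sentences


def continue_train(old_model_path, sentences, new_model_path):
    """Step 4b: load the old model, extend its vocabulary, train, and save."""
    from gensim.models import Word2Vec  # third-party; imported lazily

    model = Word2Vec.load(old_model_path)
    # update=True adds new words to the existing vocabulary instead of rebuilding it.
    model.build_vocab(sentences, update=True)
    model.train(sentences, total_examples=model.corpus_count, epochs=model.epochs)
    model.save(new_model_path)
    return model
```

Here `sentences` is kept as a list rather than a generator because gensim iterates over the corpus more than once (once to update the vocabulary, then once per training epoch).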