生产运营/行业运营-word-rnn-PROSAGA-码农传奇

=== INTRODUCTION ===

This project builds a Word-level Recurrent Neural Network in Python3 using TensorFlow that can be trained to:
(i) generate text similar to the training corpus and (ii) find lines in a test file that are of not the same
“style” as the lines found in the training corpus. It supports optional pre-trained word embeddings from the Stanford Glove project.

https://nlp.stanford.edu/projects/glove/

If pre-trained word embeddings are not used, the embeddings will be learned as part of training.

Applications:

Case 1: Can be used to generate new poems, essays, source code etc. depending on the training set
Case 2: Can be used to detect “fakes” that are similar in style to the training set on cursory glance

To run the project, execute main.py from a unix-style command line shell and follow the help section.

=== Basic Command Line Usage Examples ===

Print help:

./main.py -h

Train the Word RNN model (for 10 epochs):

./main.py -c train —num-epochs=10

Generate text using a trained Word RNN model:

./main.py -c generate —num-words=100

Compute anomaly lines for an input test file using trained model:

./main.py -c anomaly-detect —test-input-file=”./myfile.txt” —anomaly-threshold=95

Use pre-trained word embeddings with any of the above commands:

./main.py -c train -e glove —num-epochs=10
./main.py -c generate -e glove —num-words=100
./main.py -c anomaly-detect -e glove —test-input-file=”./myfile.txt” —anomaly-threshold=95