# LSTM-TensorSpark

**Repository Path**: fcy_gitee/LSTM-TensorSpark

## Basic Information

- **Project Name**: LSTM-TensorSpark
- **Description**: Implementation of an LSTM with TensorFlow, distributed on Apache Spark
- **Primary Language**: Python
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 0
- **Created**: 2020-07-08
- **Last Updated**: 2022-05-12

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# LSTM-TensorSpark

Implementation of an LSTM with [TensorFlow](https://www.tensorflow.org/), distributed on [Apache Spark](http://spark.apache.org/).

Two different implementations are provided:

- distributed on Spark;
- standalone.

Detailed explanation here: [Distributed implementation of a LSTM on Spark and Tensorflow](http://www.slideshare.net/emanueldinardo/distributed-implementation-of-a-lstm-on-spark-and-tensorflow-69787635)

Developed for academic purposes.

## Dependencies

The distributed model needs:

- Python 2.6+
- PySpark
- TensorFlow 1.0+
- NumPy
- argparse
- tqdm

The standalone model needs:

- Python 2.6+
- TensorFlow 1.0+
- NumPy
- argparse
- tqdm

## Usage

### Example using Spark

From the `src` directory:

```
spark-submit rnn.py --training_path ../dataset/iris.data --labels_path ../dataset/labels.data --output_path train_dir_iris --partitions 4
```

```
usage: rnn.py [-h] [--master MASTER] [--spark_exec_memory SPARK_EXEC_MEMORY]
              [--partitions PARTITIONS] [--epochs EPOCHS]
              [--hidden_units HIDDEN_UNITS] [--batch_size BATCH_SIZE]
              [--num_classes NUM_CLASSES] [--in_features IN_FEATURES]
              [--evaluate_every EVALUATE_EVERY] [--learning_rate LEARNING_RATE]
              [--training_path TRAINING_PATH] [--labels_path LABELS_PATH]
              [--output_path OUTPUT_PATH] [--mode MODE]
              [--checkpoint_path CHECKPOINT_PATH]
```

```
optional arguments:
  -h, --help            show this help message and exit
  --master MASTER       Host or master node location (can be a node name)
  --spark_exec_memory SPARK_EXEC_MEMORY
                        Spark executor memory
  --partitions PARTITIONS
                        Number of distributed partitions
  --epochs EPOCHS       Number of epochs
  --hidden_units HIDDEN_UNITS
                        List of hidden units per layer (separated by commas)
  --batch_size BATCH_SIZE
                        Mini-batch size
  --num_classes NUM_CLASSES
                        Number of classes in the dataset
  --in_features IN_FEATURES
                        Number of input features
  --evaluate_every EVALUATE_EVERY
                        Number of steps between evaluations
  --learning_rate LEARNING_RATE
                        Learning rate
  --training_path TRAINING_PATH
                        Path to the training set
  --labels_path LABELS_PATH
                        Path to the training labels
  --output_path OUTPUT_PATH
                        Path where the network state is stored
  --mode MODE           Execution mode
  --checkpoint_path CHECKPOINT_PATH
                        Directory where the network model and logs are saved
```

### Example without Spark

From the `src` directory:

```
python lstm-no-spark.py --training_path ../dataset/iris.data --labels_path ../dataset/labels.data --output_path train_dir_iris
```

```
usage: rnn.py [-h] [--hidden_units HIDDEN_UNITS] [--batch_size BATCH_SIZE]
              [--num_classes NUM_CLASSES] [--in_features IN_FEATURES]
              [--evaluate_every EVALUATE_EVERY] [--learning_rate LEARNING_RATE]
              [--training_path TRAINING_PATH] [--labels_path LABELS_PATH]
              [--output_path OUTPUT_PATH] [--mode MODE]
              [--checkpoint_path CHECKPOINT_PATH]
```

```
optional arguments:
  --epochs EPOCHS       Number of epochs
  --hidden_units HIDDEN_UNITS
                        List of hidden units per layer (separated by commas)
  --batch_size BATCH_SIZE
                        Mini-batch size
  --num_classes NUM_CLASSES
                        Number of classes in the dataset
  --in_features IN_FEATURES
                        Number of input features
  --evaluate_every EVALUATE_EVERY
                        Number of steps between evaluations
  --learning_rate LEARNING_RATE
                        Learning rate
  --training_path TRAINING_PATH
                        Path to the training labels
  --output_path OUTPUT_PATH
                        Path where the network state is stored
  --mode MODE           Execution mode
  --checkpoint_path CHECKPOINT_PATH
                        Directory where the network model and logs are saved
```
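Note that `--hidden_units` takes a comma-separated list, one value per LSTM layer (e.g. `--hidden_units 64,32` for two layers). The repository's actual argument handling is in `rnn.py`; as a hedged illustration only, a comma-separated flag like this is typically parsed with `argparse` along these lines (function names here are hypothetical, not from the repo):

```python
import argparse

def build_parser():
    # Hypothetical sketch mirroring a few of the flags listed above;
    # rnn.py may define them with different defaults or types.
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, default=100,
                        help="Number of epochs")
    parser.add_argument("--hidden_units", type=str, default="32",
                        help="List of hidden units per layer (separated by commas)")
    parser.add_argument("--batch_size", type=int, default=16,
                        help="Mini-batch size")
    parser.add_argument("--learning_rate", type=float, default=0.01,
                        help="Learning rate")
    return parser

def parse_hidden_units(value):
    # "64,32" -> [64, 32]: one layer size per comma-separated entry
    return [int(v) for v in value.split(",")]

if __name__ == "__main__":
    args = build_parser().parse_args(["--hidden_units", "64,32", "--epochs", "10"])
    print(parse_hidden_units(args.hidden_units))  # -> [64, 32]
```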