# streaming **Repository Path**: xiwanggit/streaming ## Basic Information - **Project Name**: streaming - **Description**: kafka+spark_streaming/pyflink data to dashboard of html - **Primary Language**: Unknown - **License**: Not specified - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 2 - **Created**: 2020-04-22 - **Last Updated**: 2021-05-28 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README ### purpose:[spark/pyflink + kafka to Dashbord by xmu](http://dblab.xmu.edu.cn/blog/1532/) ### SparkStreaming * env * download install package * jdk=1.8.0_181: https://www.oracle.com/cn/java/technologies/javase-jdk8-downloads.html * kafka=2.11-2.4.1: * http://kafka.apache.org/downloads * https://mirrors.tuna.tsinghua.edu.cn/apache/kafka/2.4.1/kafka_2.11-2.4.1.tgz * Scala=2.11.12: * https://www.scala-lang.org/download/ * https://www.scala-lang.org/download/2.11.12.html * https://downloads.lightbend.com/scala/2.11.12/scala-2.11.12.tgz * Python=3.7.7:https://www.python.org/downloads/ * Spark=2.4.5:http://spark.apache.org/downloads.html/https://archive.apache.org/dist/spark/ * spark-streaming-kafka-0-8-assembly_2.11-2.4.5.jar(python version) * spark-streaming-kafka-0-8_2.11-2.4.5.jar(scala version) * https://search.maven.org #推荐 * https://jar-download.com/maven-repository-class-search.php * https://mvnrepository.com * https://repo1.maven.org/maven2/org/apache/spark/spark-streaming-kafka-0-8-assembly_2.11/2.4.5/spark-streaming-kafka-0-8-assembly_2.11-2.4.5.jar * data_format.zip:https://pan.baidu.com/s/1cs02Nc * steps * docker build: `docker-compose build` * docker start: `docker-compose up -d` * docker exec: `docker exec -it streaming_test_streaming_1 bash` * start server * start zk server: `zookeeper-server-start.sh $KAFKA_HOME/config/zookeeper.properties &` * start kafka server: `kafka-server-start.sh $KAFKA_HOME/config/server.properties &` * test kafka * producer:`python3 /work/code/py/kafka_test/scripts/producer.py ` * consumer:`python3 /work/code/py/kafka_test/scripts/consumer.py` * run * python * producer:`python3 /work/code/py/kafka_test/scripts/producer.py ` * spark streaming demo test: `spark-submit $SPARK_HOME/examples/src/main/python/streaming/direct_kafka_wordcount.py localhost:9092 sex` * spark streaming deal data and producer to kafka new topic: `spark-submit /work/code/py/stream_kafka_test/kafka_test.py localhost:2181 1 sex 1` * consumer:`python3 /work/code/py/stream_kafka_test/consumer.py` * scala * package * `cd /work/code/scala/kafka` * `sbt package` * producer:`python3 /work/code/py/kafka_test/scripts/producer.py` * spark streaming demo test: `run-example streaming.DirectKafkaWordCount localhost:9092 1 sex` * spark streaming deal data and producer to kafka new topic: `spark-submit /work/code/scala/kafka/target/scala-2.11/simple-project_2.11-1.0.jar localhost:2181 1 sex 1` * consumer:`python3 /work/code/py/stream_kafka_test/consumer.py` * start Dashboard * start server * start producer: `python3 /work/code/py/kafka_test/scripts/producer.py` * start spark streaming: `spark-submit /work/code/py/stream_kafka_test/kafka_test.py localhost:2181 1 sex 1` * start consumer:`spark-submit /work/code/py/stream_kafka_test/consumer.py` * start app: `python3 /work/code/py/kafka_test/app.py` * url: `http://host_ip:8008/` ![输入图片说明](https://images.gitee.com/uploads/images/2020/0423/174458_bac14dde_379161.png "微信截图_20200423174429.png") ### pyflink * Reference * [flink官网](https://ci.apache.org/projects/flink/flink-docs-stable/zh/dev/python/installation.html) * 依赖jar包的处理 * git clone git@gitee.com:apache/flink.git && mvn install * download * [flink maven](https://repo1.maven.org/maven2/org/apache/flink/) * [jar search](https://mvnrepository.com/) * 依赖包 * 所有的:https://gitee.com/apache/flink/blob/release-1.12/pom.xml * connect: https://gitee.com/apache/flink/blob/release-1.12/flink-connectors/pom.xml * step * start server * start zk server: `zookeeper-server-start.sh $KAFKA_HOME/config/zookeeper.properties &` * start kafka server: `kafka-server-start.sh $KAFKA_HOME/config/server.properties &` * start producer: `python3 /work/code/py/kafka_test/scripts/producer_json_time.py` * start spark streaming: `python3 /work/code/py/flink_code/read_write_kafka/test.py` * 查看consumer(print data):`python3 /work/code/py/stream_kafka_test/consumer.py` * start app: `python3 /work/code/py/kafka_test/app_flink.py` * url: `http://host_ip:8008/`