# presto **Repository Path**: wolfcub.com/presto ## Basic Information - **Project Name**: presto - **Description**: presto作业 - **Primary Language**: Unknown - **License**: Not specified - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2021-10-07 - **Last Updated**: 2021-10-07 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # presto #### 介绍 presto作业 1. 搜索HyperLogLog算法相关内容,了解其原理,写出5条 HyperLogLog的用途或大数据场景下的实际案例。 1. 原理:目的是做基数统计,故不是集合,不会保存元数据,只记录数量而不是数值。耗空间极小,支持输入非常体积的数据量。核心是基数估算算法,主要表现为计算时内存的使用和数据合并的处理。最终数值存在一定误差。对海量数据进行unique统计。 2. 案例: 1. 统计网站的每日UV。 2. 统计总的注册用户。 3. 统计企业每月GMV。 2. 在本地docker环境或阿里云e-mapreduce环境进行SQL查询, 要求在Presto中使用HyperLogLog计算近似基数。(请自行创 建表并插入若干数据) 1. 建表 ```sql CREATE TABLE visit_user ( user_id int, visit_date date, media varchar ); ``` 2. 插入数据 ```sql INSERT INTO visit_user VALUES (101, DATE('2021-09-27'), 'organic search'), (101, DATE('2021-09-27'), 'paid search'), (102, DATE('2021-09-27'), 'organic search'), (103, DATE('2021-09-28'), 'organic search'), (103, DATE('2021-09-29'), 'organic search'), (104, DATE('2021-09-29'), 'paid search'), (105, DATE('2021-09-29'), 'referral'), (106, DATE('2021-09-30'), 'organic search'), (107, DATE('2021-09-30'), 'referral'), (101, DATE('2021-10-01'), 'organic search'), (102, DATE('2021-10-01'), 'organic search'), (102, DATE('2021-10-01'), 'paid search'), (103, DATE('2021-10-01'), 'organic search'), (103, DATE('2021-10-01'), 'referral'), (104, DATE('2021-10-01'), 'organic search'), (105, DATE('2021-10-01'), 'organic search'), (106, DATE('2021-10-01'), 'organic search'), (107, DATE('2021-10-01'), 'organic search'), (111, DATE('2021-10-02'), 'organic search'), (112, DATE('2021-10-02'), 'organic search'), (112, DATE('2021-10-02'), 'paid search'), (113, DATE('2021-10-02'), 'organic search'), (113, DATE('2021-10-02'), 'organic search'), (114, DATE('2021-10-02'), 'organic search'), (115, DATE('2021-10-02'), 'organic search'), (116, DATE('2021-10-02'), 'organic search'), (117, DATE('2021-10-02'), 'organic search'), (121, DATE('2021-10-02'), 'organic search'), (122, DATE('2021-10-02'), 'organic search'), (122, DATE('2021-10-02'), 'organic search'), (123, DATE('2021-10-02'), 'organic search'), (123, DATE('2021-10-03'), 'organic search'), (124, DATE('2021-10-03'), 'organic search'), (125, DATE('2021-10-03'), 'organic search'), (126, DATE('2021-10-03'), 'organic search'), (127, DATE('2021-10-03'), 'organic search'), (211, DATE('2021-10-03'), 'organic search'), (212, DATE('2021-10-03'), 'organic search'), (212, DATE('2021-10-03'), 'paid search'), (213, DATE('2021-10-03'), 'organic search'), (213, DATE('2021-10-04'), 'organic search'), (214, DATE('2021-10-04'), 'organic search'), (215, DATE('2021-10-04'), 'organic search'), (216, DATE('2021-10-04'), 'organic search'), (217, DATE('2021-10-04'), 'organic search'), (221, DATE('2021-10-04'), 'organic search'), (222, DATE('2021-10-04'), 'referral'), (222, DATE('2021-10-04'), 'organic search'), (223, DATE('2021-10-04'), 'organic search'), (223, DATE('2021-10-04'), 'organic search'), (224, DATE('2021-10-04'), 'organic search'), (225, DATE('2021-10-04'), 'organic search'), (226, DATE('2021-10-04'), 'organic search'), (227, DATE('2021-10-04'), 'organic search'), (311, DATE('2021-10-04'), 'organic search'), (312, DATE('2021-10-05'), 'organic search'), (312, DATE('2021-10-05'), 'organic search'), (313, DATE('2021-10-05'), 'organic search'), (313, DATE('2021-10-05'), 'organic search'), (314, DATE('2021-10-05'), 'organic search'), (315, DATE('2021-10-05'), 'organic search'), (316, DATE('2021-10-05'), 'organic search'), (317, DATE('2021-10-05'), 'organic search'), (321, DATE('2021-10-05'), 'organic search'), (322, DATE('2021-10-05'), 'organic search'), (322, DATE('2021-10-05'), 'organic search'), (323, DATE('2021-10-05'), 'organic search'), (323, DATE('2021-10-05'), 'organic search'), (324, DATE('2021-10-05'), 'organic search'), (325, DATE('2021-10-05'), 'organic search'), (326, DATE('2021-10-05'), 'organic search'), (327, DATE('2021-10-05'), 'organic search') ; ``` 3. 执行查询 ```sql SELECT COUNT(DISTINCT user_id) AS last3d_user_count FROM visit_user WHERE visit_date >= current_date - interval '3' day; ``` ![执行结果](https://images.gitee.com/uploads/images/2021/1007/224940_c856ac65_330140.jpeg "20211007224727.jpg")