# cdl

**Repository Path**: micooz/cdl

## Basic Information

- **Project Name**: cdl
- **Description**: crawler description language
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2015-05-17
- **Last Updated**: 2020-12-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# cdl

### Description

cdl is a framework for building custom crawlers. It provides interfaces for crawling a specific website and extracting structured data from its pages.

The framework is not designed for ordinary end users. It needs the precise location of the data and links on a page, so the user has to analyse the structure of the target website, which is not easy for a non-programmer. The main interface offered to callers is an XML-based language that we call "cdl: crawler description language", so the caller's main job is to write a cdl document that describes the crawling rules for the target website.

The simulated login module may stand apart from cdl, because login can involve complex logic. Instead, we will provide a cookie environment: you only need to produce cookie objects and put them into the cookie container. We are also considering a way to define a login module inside cdl using JavaScript, although that is likely to remain a fairly weak feature.

### Build

```
$ gradle build
```

The compiled library is located at `build/libs/cdl.jar`.

### Contributors

邓维佳, Micooz
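
### Example: filling a cookie container

The Description section says that a login step only has to produce cookies and place them in a cookie container. The README does not show cdl's own cookie API, so the sketch below is an assumption: it uses the JDK's standard `java.net.CookieManager` as a stand-in container. The class name `CookieJarSketch`, its methods, and the `example.com` URLs are hypothetical and are not part of cdl.

```java
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpCookie;
import java.net.URI;
import java.util.List;

// Hypothetical helper; cdl's real cookie container may look different.
class CookieJarSketch {

    // Create an in-memory container that accepts every cookie.
    static CookieManager newContainer() {
        return new CookieManager(null, CookiePolicy.ACCEPT_ALL);
    }

    // Store a cookie obtained by some external login step.
    static void storeSessionCookie(CookieManager container, String siteUrl,
                                   String name, String value) {
        HttpCookie cookie = new HttpCookie(name, value);
        cookie.setPath("/");
        container.getCookieStore().add(URI.create(siteUrl), cookie);
    }

    // Look up the cookies a crawler could send when fetching pageUrl.
    static List<HttpCookie> cookiesFor(CookieManager container, String pageUrl) {
        return container.getCookieStore().get(URI.create(pageUrl));
    }

    public static void main(String[] args) {
        CookieManager container = newContainer();
        // Placeholder values; a real login would supply the actual session cookie.
        storeSessionCookie(container, "http://example.com/login", "session", "abc123");
        for (HttpCookie c : cookiesFor(container, "http://example.com/list")) {
            System.out.println(c.getName() + "=" + c.getValue());
        }
    }
}
```

Keeping login outside the crawl description, as the README proposes, means the login logic can be as complex as the site requires while the crawler only reads cookies from the shared container.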