# expression-engine **Repository Path**: yuantao99/expression-engine ## Basic Information - **Project Name**: expression-engine - **Description**: 公式解析功能学习 - **Primary Language**: Unknown - **License**: Not specified - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2022-03-30 - **Last Updated**: 2022-07-12 ## Categories & Tags **Categories**: Uncategorized **Tags**: Java ## README ### 项目介绍 项目是为实现一个简单的公式解析引擎。目前进度完成如下: - [x] 了解基础知识 - [x] 学习项目[the-super-tiny-compiler](https://github.com/jamiebuilds/the-super-tiny-compiler),并把它改为java项目。 - [ ] 自实现一个简单公式解析引擎。(下次一定,😢最近在恶补数据库。) 在一切的开始,我会从介绍Formal Language这个单词的翻译开始。不是在作秀,这并非是无用,在项目`the-super-tiny-compiler`的代码学习和表达式解析功能的设计实现时,我很庆幸自己从这个词开始学习。 #### Formal Language 我关注到这个词是因为翻译软件对其解释为"形式语言","正式语言"等。这个解释让我无法理解由它扩展的概念:Formal Grammer、Anaylytic Formal Grammar、Lexer等。于是,我觉得需要对这个词的中文进行自我理解定义。在维基百科中,有下面的内容: > In logic, mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules. > > 在逻辑学、数学、计算机语言和语言学中,一个formal language由wrod组成,这些wrod的letters来自alphabet,并根据一组特定的rules组成所以well-formed。 > > The alphabet of a formal language consists of symbols, letters, or tokens that concatenate into strings of the language. Each string concatenated from symbols of this alphabet is called a word, and the words that belong to a particular formal language are sometimes called well-formed words or well-formed formulas. A formal language is often defined by means of a formal grammar such as a regular grammar or context-free grammar, which consists of its formation rules. > > formal language的alphabet由symbols、letters或tokens组成,这些成员连接成该语言的字符串。由该字母表的成员连接起来的每个字符串称为word,属于特定formal language的words有时称为well-formal words或well-formed formulas。formal language通常是通过formal grammar来定义的,如规则语法或上下文无关语法,这些formal grammar由他的formation rules组成。 在上文这两段引言中,翻译部分没有将全部意思翻译为中文,但是我们可以根据上面找到下面这个等式: ![](./picture/equation.png) 读引言原内容,自我进行了联想,用具体的东西,去回答自己"什么是Formal Language"这个词。 在我理解下,Java可以属于Formal Language,LaTex数学公式可以属于Formal Language,中文同样可以是属于Formal Language。 ![](./picture/concrete_object.png) Java有自己的规定符号集,它包括是符号`{ `、`;`等,它还包括字母`a`、`b`、`c`等。在编码时,符号的编排需要符合它的规则,既我们的语句需要符合Java的语法,否则它无法编译。上句话中,Java符合上面的等式,LaTex、中文细想之下也可以说明符合等式。 规定的符号、规则,这些词透露着限制,而且这种限制是有规律的,所以,对于Formal Language的翻译我想能够体现这两层意思便可,我想"定义语言"、"规则语言"也许可以,这里我喜欢把它叫为"结构化语言",结构听着感觉很有美感。 > ps:The first use of formal language is thought to be Gottlob Frege's 1879 Begriffsschrift, meaning "concept writing", which described a "formal language, modeled upon that of arithmetic, for pure thought." 在我想法中,结构化语言不只是我们看得见的东西,这里我从音乐来讲: ![](./picture/music.png) 常念叨的`do、re、mi、fa、sol、la、si`,我认为实则是以某个声为基础,并根据某个规则连起而形成的声音,这也是为什么电视上有人通过几个水杯就能够躺出我们熟悉的`do、re、mi、fa、sol、la、si`这个旋律。声具体的东西,且有很多种声,流水声、钢琴声等。所以我用音来作为音乐的单位,音通过律动形成了好听的音乐。所以,音乐也是一种语言,在懂它规则的人看来,乐曲可以有情绪,会有故事。 > 音乐这个例子里的知识是作为门外汉的我的说法。 所以,结构化语言的范围很大很大,我认为能够承载起信息传递的事物,都可以是结构化语言。 到这里,我认为符合上面提到的等式,可以自己定义出一门结构化语言。如下面这个例子: >**Examples** > >The following rules describe a formal language L over the alphabet Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, +, =}: > >- Every nonempty string that does not contain "+" or "=" and does not start with "0" is in L. >- The string "0" is in L. >- A string containing "=" is in L if and only if there is exactly one "=", and it separates two valid strings of L. >- A string containing "+" but not "=" is in L if and only if every "+" in the string separates two valid strings of L. >- No string is in L other than those implied by the previous rules. > >现有名为"L"的一门结构化语言,它有字母表 Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, +, =} 并且有规则如下: > >- 在L中每一个非空字符串不能以"+"和"="开头并且也不能以0开头。 >- L中"0"字符串合法。 >- L中字符串只能有一个"="且"="要在两个合法字符串中间。 >- 在L中"+"需要在两个有效字符串中间,且这两个字符串不包括"="。 >- 除了符合前面规则的字符串外,L中没有其他字符串。 > >Under these rules, the string "23+4=555" is in L, but the string "=234=+" is not. This formal language expresses natural numbers, well-formed additions, and well-formed addition equalities, but it expresses only what they look like (their syntax "Syntax")), not what they mean (semantics). For instance, nowhere in these rules is there any indication that "0" means the number zero, "+" means addition, "23+4=555" is false, etc. > >要特别注意的是,我们这里的字母,并不是我们理解的自然数字。如"0"不是我们平时说的数字0,它知识一个符合L语言的字符串。 #### Formal Grammar 上节我们看到,Formal Grammar是非常重要的一个组成部分。在维基百科中,有下面内容: > In formal language theory, a **grammar** (when the context is not given, often called a **formal grammar** for clarity) describes how to form strings from a language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context—only their form. A formal grammar is defined as a set of production rules for such strings in a formal language. > > 在结构化语言学说中,语法描述了如何通过字母表组织成能通过语言语法校验的字符串。语法不描述字符串的含义,也不描述在任何上下文中可以用它们做什么——只描述它们的结构。结构化语法被定义为一组在结构化语言中针对此类字符串的生成规则。 > > A formal grammar is a set of rules for rewriting strings, along with a "start symbol" from which rewriting starts. Therefore, a grammar is usually thought of as a language generator. However, it can also sometimes be used as the basis for a "recognizer"—a function in computing that determines whether a given string belongs to the language or is grammatically incorrect. > > 结构化语法是为重写字符串的一组规则。结构语法重新定义一个"start symbol"。因此,grmmar通常被认为是语言生成器。然而,它有时也可以用作“校验器”的基础。“校验器”是计算中的一个函数,用于确定给定的字符串是否属于该语言或语法错误。 Formal Grammar能够帮助我们传递信息,他将字符组成有特殊含义的字符串,又通过规则把字符串解析成信息。最熟悉莫过于我们写作,而读者可以通过头脑理解文字读取我们要表达的意思。有时我们的文字可能会是病句,那么就无法解析成正确的信息。 ![](./picture/grammar_fuction.png) 我们可以看到上图中语法起到了两个不同的作用,所以Formal Grammar这里需要清楚的是[generative grammars and analytic grammars](https://cs.stackexchange.com/questions/28541/generative-grammars-and-analytic-grammars)。 ![](./picture/generative_grammars_and_analytic_grammars.png) #### The Super Tiny Compiler 上述的知识点意义在于后续代码设计能够站在更高的角度,Java是面向对象,没有一个很好的概念概念,代码设计即使功能实现,代码质量也可能是糟糕的。`The Super Tiny Compiler`的学习在于了解编译器的思路,为实现提供一些帮助。在项目 `cn.yuanyuan.guide`中,写着由原项目转化而来的Java版`The Super Tiny Compiler`。 下面是词法解析、AST生成和AST转化的流程图及其运行结果: ![](./picture/super-tiny-compiler-flow.png) 词法解析: ```java /** TokenizerTest */ @Test public void test() throws JsonProcessingException { Tokenizer tokenizer = new Tokenizer(); List tokens = tokenizer.tokenize("(add 2 (subtract 4 2))"); ObjectMapper objectMapper = new ObjectMapper(); String tokenString = objectMapper.writerWithDefaultPrettyPrinter().writeValueAsString(tokens); System.out.println(tokenString); } ``` ```json [ { "type" : "paren", "value" : "(" }, { "type" : "name", "value" : "add" }, { "type" : "number", "value" : "2" }, { "type" : "paren", "value" : "(" }, { "type" : "name", "value" : "subtract" }, { "type" : "number", "value" : "4" }, { "type" : "number", "value" : "2" }, { "type" : "paren", "value" : ")" }, { "type" : "paren", "value" : ")" } ] ``` 生成AST结构: ```java /** ParserTest */ @Test public void test() throws JsonProcessingException { Tokenizer tokenizer = new Tokenizer(); List tokens = tokenizer.tokenize("(add 2 (subtract 4 2))"); Parser parser = new Parser(); ASTRootNode parse = parser.parse(tokens); ObjectMapper objectMapper = new ObjectMapper(); String astString = objectMapper.writerWithDefaultPrettyPrinter().writeValueAsString(parse); System.out.println(astString); } ``` ```json { "type" : "Program", "body" : [ { "type" : "CallExpression", "name" : "add", "params" : [ { "type" : "NumberLiteral", "value" : "2", }, { "type" : "CallExpression", "name" : "subtract", "params" : [ { "type" : "NumberLiteral", "value" : "4", }, { "type" : "NumberLiteral", "value" : "2", } ], } ], } ] } ``` AST再次转化为更加便于计算的AST结构,这里正如高级语言和低级语言的转化: ```java /** TransformerTest */ @Test public void test() throws JsonProcessingException { Tokenizer tokenizer = new Tokenizer(); List tokens = tokenizer.tokenize("(add 2 (subtract 4 2))"); Parser parser = new Parser(); ASTRootNode parse = parser.parse(tokens); Transformer transformer = new Transformer(); ASTRootNode transferNode = transformer.transformer(parse); ObjectMapper objectMapper = new ObjectMapper(); String astString = objectMapper.writerWithDefaultPrettyPrinter().writeValueAsString(transferNode); System.out.println(astString); } ``` ```json { "type" : "Program", "body" : [ { "type" : "ExpressionStatement", "expression" : { "type" : "CallExpression", "callee" : { "type" : "Identifier", "name" : "add" }, "arguments" : [ { "type" : "NumberLiteral", "value" : "2" }, { "type" : "CallExpression", "callee" : { "type" : "Identifier", "name" : "subtract" }, "arguments" : [ { "type" : "NumberLiteral", "value" : "4" }, { "type" : "NumberLiteral", "value" : "2" } ], } ], } } ] } ``` AST生成JS代码: ```java @Test public void test() { Tokenizer tokenizer = new Tokenizer(); List tokens = tokenizer.tokenize("(add 2 (subtract 4 2))"); Parser parser = new Parser(); ASTRootNode parse = parser.parse(tokens); Transformer transformer = new Transformer(); ASTRootNode transferNode = transformer.transformer(parse); CodeGenerator codeGenerator = new CodeGenerator(); String code = codeGenerator.codeGenerator(transferNode); System.out.println(code); } ```