# StataRegex

**Repository Path**: arlionn/StataRegex

## Basic Information

- **Project Name**: StataRegex
- **Description**: A Stata implementation of the Java regular expression utilities
- **Primary Language**: HTML
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 3
- **Created**: 2020-02-23
- **Last Updated**: 2023-01-16

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

[![Project Status: WIP - Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.](http://www.repostatus.org/badges/latest/wip.svg)](http://www.repostatus.org/#wip)

- Note from 连玉君 (Lian Yujun) (`2017/11/8 15:45`): extends the regular expression functionality in Stata by calling Java.
- Reference post: [爬虫俱乐部 (Crawler Club): Dotall mode in regular expressions](http://mp.weixin.qq.com/s/BWqjVGMGbqMuFKRHmwqzSg)

# Java-based Regular Expression Utilities for Stata

This project is still very early in development. It will provide methods for using Java's regular expression capabilities from Stata, including methods for applying regular expressions across `varlists`, `macros`, and/or single variables. This should give users considerably more flexibility in how they use regular expressions in their workflows.

One of the more important features this brings to Stata is support for the traditional regular expression metacharacters, e.g., `{2,5}` (curly braces that set the minimum and maximum number of times a pattern may repeat), `\d` (a digit), `\D` (a non-digit), etc. For additional information on the available metacharacters and their usage, see the [Pattern class javadocs](https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html).

## Additional information

This is a single planned component of a slightly larger body of work for munging and working with string data in Stata.
Additional capabilities will include different string distance/fuzzy matching algorithms based on the Metaphone, Double Metaphone, and other phonetics-based string encoding algorithms.
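
To make the metacharacters mentioned earlier more concrete, here is a minimal Java sketch of how `{2,5}`, `\d`, `\D`, and Dotall mode behave in `java.util.regex`, the engine this project builds on. The class name `RegexSketch` and its helper methods are hypothetical and exist only for illustration; they are not part of StataRegex, and the Stata-side command syntax is not shown because it is not described here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of the StataRegex code base.
public class RegexSketch {

    // True when the whole input consists of 2 to 5 digits.
    // \d matches one digit; {2,5} bounds the number of repetitions.
    static boolean isTwoToFiveDigits(String input) {
        return Pattern.matches("\\d{2,5}", input);
    }

    // Removes every non-digit character (\D) from the input.
    static String stripNonDigits(String input) {
        return input.replaceAll("\\D", "");
    }

    // Returns the text between <p> and </p>, or null when nothing matches.
    // With Pattern.DOTALL the dot also matches line terminators.
    static String firstParagraph(String html, boolean dotall) {
        Pattern p = Pattern.compile("<p>(.*?)</p>", dotall ? Pattern.DOTALL : 0);
        Matcher m = p.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(isTwoToFiveDigits("2017"));    // true
        System.out.println(isTwoToFiveDigits("7"));       // false
        System.out.println(stripNonDigits("2017/11/8"));  // 2017118
        // Without DOTALL the dot stops at the newline, so nothing matches.
        System.out.println(firstParagraph("<p>a\nb</p>", false)); // null
        // With DOTALL the captured group spans both lines.
        System.out.println(firstParagraph("<p>a\nb</p>", true) != null); // true
    }
}
```

Note that backslashes are doubled inside Java string literals (`"\\d"`), and that the quantifier is written without a space between the bounds.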