# univocity-parsers
**Repository Path**: mirrors_gspandy/univocity-parsers
## Basic Information
- **Project Name**: univocity-parsers
- **Description**: uniVocity-parsers is a suite of extremely fast and reliable parsers for Java. It provides a consistent interface for handling different file formats, and a solid framework for the development of new parsers.
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-09-24
- **Last Updated**: 2025-10-19
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README

Welcome to uniVocity-parsers
============================
uniVocity-parsers is a collection of extremely fast and reliable parsers for Java. It provides a consistent interface for handling different file formats,
and a solid framework for the development of new parsers.
### Table of contents ###
* [Introduction](#introduction)
    * [Parsers](#parsers)
    * [Installation](#installation)
    * [Background](#background)
* [Examples](#examples)
    * [Reading CSV](#reading-csv)
        * [To read all rows of a CSV (the quick and easy way)](#to-read-all-rows-of-a-csv-the-quick-and-easy-way)
        * [To read all rows of a CSV (iterator-style)](#to-read-all-rows-of-a-csv-iterator-style)
        * [Escaping quote escape characters](#escaping-quote-escape-characters)
        * [Read all rows of a CSV (the powerful version)](#read-all-rows-of-a-csv-the-powerful-version)
    * [Using annotations to map your java beans](#using-annotations-to-map-your-java-beans)
    * [Using your own conversions in annotations](#using-your-own-conversions-in-annotations)
    * [Reading master-detail style files](#reading-master-detail-style-files)
    * [Parsing fixed-width files](#parsing-fixed-width-files)
    * [Parsing TSV files](#parsing-tsv-files)
    * [Column selection](#column-selection)
    * [Reading columns instead of rows](#reading-columns-instead-of-rows)
        * [Parsing columns from a CSV file](#parsing-columns-from-a-csv-file)
        * [Using the batched column processor in a Fixed-Width input](#using-the-batched-column-processor-in-a-fixed-with-input)
        * [Reading columns from a TSV while converting the parsed content to Objects](#reading-columns-from-a-tsv-while-converting-the-parsed-content-to-objects)
    * [Processing rows in parallel](#processing-rows-in-parallel)
    * [Parsing individual Strings](#parsing-individual-strings)
    * [Settings](#settings)
        * [Fixed-width settings](#fixed-width-settings)
    * [Format Settings](#format-settings)
        * [CSV format](#csv-format)
        * [Fixed width format](#fixed-width-format)
        * [TSV format](#tsv-format)
    * [Writing](#writing)
        * [Quick and simple CSV writing example](#quick-and-simple-csv-writing-example)
        * [TSV writing example](#tsv-writing-example)
        * [Writing row by row, with comments](#writing-row-by-row-with-comments)
        * [Writing with column selection](#writing-with-column-selection)
        * [Writing with value conversions (using ObjectRowWriterProcessor)](#writing-with-value-conversions-using-objectrowwriterprocessor)
        * [Writing annotated java beans](#writing-annotated-java-beans)
        * [Writing value by value](#writing-value-by-value)
## Introduction ##
The project was started and coded by [uniVocity Software](http://www.univocity.com), an Australian company that develops
[uniVocity](http://www.univocity.com), a commercial data integration API for Java.
It soon became apparent that many parsers out there didn't provide enough flexibility, throughput or reliability for massive and diverse (a nice word for messy) inputs.
Another inconvenience was the difficulty in extending these parsers and dealing with a different beast for each format.
We then decided to build our own architecture for parsing text files from the ground up.
The main goal of this architecture is to provide maximum performance and flexibility while making it easy for anyone to create new parsers.
### Parsers ###
uniVocity-parsers currently provides parsers for:
- CSV files (it's the fastest CSV parser for Java you can find)
- Fixed-width files
- TSV files

We will introduce more parsers over time. Note that many delimiter-separated formats, such as pipe-separated files, are subsets of CSV, and our CSV parser should handle them.
We are planning to introduce parsers for this and other specific formats to uniVocity-parsers later on.
Please let us know what you need the most by sending an e-mail to `parsers@univocity.com`.
We will introduce parsers for formats that are of public interest.
We also documented every single class for you, so you can try to create your own parsers for your own particular purposes.
We will help anyone building their own parsers, and offer commercial support for all parsers included in the API (send us an e-mail to `support@univocity.com`;
a dedicated team of experts is ready to assist you).
### Installation ###
Just download the jar file from [here](http://oss.sonatype.org/content/repositories/releases/com/univocity/univocity-parsers/2.2.0/univocity-parsers-2.2.0.jar).
Or, if you use maven, simply add the following to your `pom.xml`
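```xml
...
<dependency>
    <groupId>com.univocity</groupId>
    <artifactId>univocity-parsers</artifactId>
    <version>2.2.0</version>
    <type>jar</type>
</dependency>
...
```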
### Background ###
uniVocity-parsers has the following functional requirements:

1. Support parsing and writing of text files in tabular format, especially:
    1. CSV files
    2. Fixed-width files
    3. TSV files
2. Handle common non-standard features such as:
    1. File comments
    2. Partial reads
    3. Record skipping
3. Column selection
4. Annotation based mapping with data conversions
5. Handle edge cases such as multi-line fields and portable newlines
6. Process the input in parallel.

And these non-functional requirements:

1. Be fast and flexible.
2. Have no external dependencies on existing libraries.
3. Be simple to use.
4. Provide a consistent API for different parsers.
5. Be flexible and heavily configurable.
6. Be extremely fast and memory efficient - yes, we micro optimize.
7. Provide an extensible architecture: You should be able to write your own parser using ~200 lines of code and have all of the above for free.
## Examples ##
### Reading CSV ###
In the following examples, the [example.csv](http://github.com/uniVocity/univocity-parsers/tree/master/src/test/resources/examples/example.csv) file will be used as the input. It is not as simple as you might think.
We've seen some well-known CSV parsers fail to read this one correctly:
```
# This example was extracted from Wikipedia (en.wikipedia.org/wiki/Comma-separated_values)
#
# 2 double quotes ("") are used as the escape sequence for quoted fields, as per the RFC4180 standard
#
Year,Make,Model,Description,Price
1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
# Look, a multi line value. And blank rows around it!
1996,Jeep,Grand Cherokee,"MUST SELL!
air, moon roof, loaded",4799.00
1999,Chevy,"Venture ""Extended Edition, Very Large""",,5000.00
,,"Venture ""Extended Edition""","",4900.00
```
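To see what makes this input tricky, here is a minimal, self-contained sketch of RFC 4180-style quote handling: delimiters inside quotes, `""` as an escaped quote, and line breaks inside a quoted value. It only illustrates the rules and is not uniVocity's implementation. `QuotedCsvSketch` is a hypothetical name, it treats only `\n` as a record terminator, it does not skip comment lines or blank rows, and unlike the parser output shown below it returns empty strings instead of `null` for empty values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of RFC 4180 quoting rules; not uniVocity's parser.
class QuotedCsvSketch {

    // Splits the input into records, honoring quotes, "" escapes and
    // line breaks inside quoted values. Only '\n' ends a record here.
    static List<List<String>> parse(String input) {
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        StringBuilder value = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < input.length(); i++) {
            char ch = input.charAt(i);
            if (inQuotes) {
                if (ch == '"') {
                    if (i + 1 < input.length() && input.charAt(i + 1) == '"') {
                        value.append('"'); // "" inside quotes is a literal quote
                        i++;
                    } else {
                        inQuotes = false;  // closing quote
                    }
                } else {
                    value.append(ch);      // commas and newlines are kept as data
                }
            } else if (ch == '"') {
                inQuotes = true;
            } else if (ch == ',') {
                record.add(value.toString());
                value.setLength(0);
            } else if (ch == '\n') {
                record.add(value.toString());
                value.setLength(0);
                records.add(record);
                record = new ArrayList<>();
            } else {
                value.append(ch);
            }
        }
        // flush the last record when the input does not end with '\n'
        if (value.length() > 0 || !record.isEmpty()) {
            record.add(value.toString());
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        String csv = "1996,Jeep,Grand Cherokee,\"MUST SELL!\nair, moon roof, loaded\",4799.00\n"
                + "1999,Chevy,\"Venture \"\"Extended Edition\"\"\",\"\",4900.00\n";
        for (List<String> row : parse(csv)) {
            System.out.println(row);
        }
    }
}
```

A hand-written state machine like this quickly grows once comments, other line endings and configurable characters are added, which is the kind of work the library's settings take care of.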
All parsers work with an instance of `java.io.Reader`, so you will see calls such as `getReader("/examples/example.csv")` everywhere. This is just a helper method we use to build the examples ([source code here](https://github.com/uniVocity/univocity-parsers/tree/master/src/test/java/com/univocity/parsers/examples)):
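```java
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UnsupportedEncodingException;

public Reader getReader(String relativePath) {
    try {
        // loads the file from the classpath and decodes it as UTF-8
        return new InputStreamReader(this.getClass().getResourceAsStream(relativePath), "UTF-8");
    } catch (UnsupportedEncodingException e) {
        throw new IllegalStateException("Unable to read input", e);
    }
}
```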
So let's get started!
#### To read all rows of a CSV (the quick and easy way) ####
```java
CsvParserSettings settings = new CsvParserSettings();
//the file used in the example uses '\n' as the line separator sequence.
//the line separator sequence is defined here to ensure systems such as Windows (which uses '\r\n')
//and classic Mac OS (which used '\r') are able to process this file correctly.
settings.getFormat().setLineSeparator("\n");
// creates a CSV parser
CsvParser parser = new CsvParser(settings);
// parses all rows in one go.
List<String[]> allRows = parser.parseAll(getReader("/examples/example.csv"));
```
The output will be:
```
1 [Year, Make, Model, Description, Price]
-----------------------
2 [1997, Ford, E350, ac, abs, moon, 3000.00]
-----------------------
3 [1999, Chevy, Venture "Extended Edition", null, 4900.00]
-----------------------
4 [1996, Jeep, Grand Cherokee, MUST SELL!
air, moon roof, loaded, 4799.00]
-----------------------
5 [1999, Chevy, Venture "Extended Edition, Very Large", null, 5000.00]
-----------------------
6 [null, null, Venture "Extended Edition", null, 4900.00]
-----------------------
```
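The example above hard-codes `\n` because line endings differ between platforms: Windows uses `\r\n`, classic Mac OS used `\r`, and Unix-like systems (including modern macOS) use `\n`. As a rough illustration of the alternative, the hypothetical helper below guesses the separator from the first line ending it finds in a sample of the input. This is a simplification under that assumption (for example, it ignores line breaks inside quoted values) and is not the library's detection logic.

```java
// Hypothetical helper: guesses the line separator from the first line ending found.
class LineSeparatorSketch {

    static String detect(String sample, String fallback) {
        for (int i = 0; i < sample.length(); i++) {
            char ch = sample.charAt(i);
            if (ch == '\r') {
                // "\r\n" is Windows style; a lone '\r' is classic Mac OS style
                boolean crlf = i + 1 < sample.length() && sample.charAt(i + 1) == '\n';
                return crlf ? "\r\n" : "\r";
            }
            if (ch == '\n') {
                return "\n"; // Unix style, including modern macOS
            }
        }
        return fallback; // no line ending found in the sample
    }

    public static void main(String[] args) {
        System.out.println(detect("a,b\r\nc,d", "\n").equals("\r\n")); // true
        System.out.println(detect("a,b\rc,d", "\n").equals("\r"));     // true
        System.out.println(detect("a,b", "\n").equals("\n"));          // true (fallback)
    }
}
```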
#### To read all rows of a CSV (iterator-style) ####
```java
// creates a CSV parser
CsvParser parser = new CsvParser(settings);
// call beginParsing to read records one by one, iterator-style.
parser.beginParsing(getReader("/examples/example.csv"));
String[] row;
while ((row = parser.parseNext()) != null) {
println(out, Arrays.toString(row));
}
// The resources are closed automatically when the end of the input is reached,
// or when an error happens, but you can call stopParsing() at any time.
// You only need to use this if you are not parsing the entire content.
// But it doesn't hurt if you call it anyway.
parser.stopParsing();
```
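The `stopParsing()` comment matters mostly for partial reads. The sketch below shows that pattern with a hypothetical `RowSource` interface that mimics the `parseNext()`/`stopParsing()` pair used above (it is not the library's type): read only the first `n` rows, then release the input in a `finally` block so resources are freed even if processing fails.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a pull-style parser: parseNext() returns null at the end.
interface RowSource {
    String[] parseNext();
    void stopParsing();
}

class PartialReadSketch {

    // Reads at most 'limit' rows, then stops the source early.
    static List<String[]> readFirst(RowSource source, int limit) {
        List<String[]> rows = new ArrayList<>();
        try {
            String[] row;
            while (rows.size() < limit && (row = source.parseNext()) != null) {
                rows.add(row);
            }
        } finally {
            source.stopParsing(); // harmless even if the end of input was reached
        }
        return rows;
    }

    // In-memory RowSource for the demo; counts how often stopParsing() is called.
    static class ListSource implements RowSource {
        private final Iterator<String[]> it;
        int stopCalls = 0;

        ListSource(List<String[]> rows) {
            this.it = rows.iterator();
        }

        public String[] parseNext() {
            return it.hasNext() ? it.next() : null;
        }

        public void stopParsing() {
            stopCalls++;
        }
    }

    public static void main(String[] args) {
        List<String[]> data = new ArrayList<>();
        data.add(new String[]{"1997", "Ford"});
        data.add(new String[]{"1999", "Chevy"});
        data.add(new String[]{"1996", "Jeep"});
        ListSource source = new ListSource(data);
        System.out.println(readFirst(source, 2).size()); // 2
        System.out.println(source.stopCalls);            // 1
    }
}
```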
#### Escaping quote escape characters ####
In CSV, quotes inside quoted values must be escaped. For example, when the quote escape character is a backslash, the sequence `\"` represents a quote character inside a quoted value. But what if your quoted value ends with a backslash?
In this case you need to escape the escape character. Consider the following input in [escape.csv](http://github.com/uniVocity/univocity-parsers/tree/master/src/test/resources/examples/escape.csv):
``` escape.csv
"You are \"beautiful\""
"Yes, \\\"in the inside\"\\"
```
To parse this properly, you need to define the *CharToEscapeQuoteEscaping*:
```java
// quotes inside quoted values are escaped as \"
settings.getFormat().setQuoteEscape('\\');
// but if two backslashes are found before a quote symbol they represent a single backslash.
settings.getFormat().setCharToEscapeQuoteEscaping('\\');
```
This way the data will be correctly processed as:
```
[You are "beautiful"]
[Yes, \"in the inside"\]
```
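To make the two settings concrete, here is a simplified, self-contained sketch of how a single quoted value could be decoded when the quote escape is `\` and the character that escapes it is also `\`. It assumes, for simplicity, that a `\\` pair means one backslash anywhere in the value; it is an illustration of the rule, not how uniVocity-parsers implements it, and `QuoteEscapeSketch` is a hypothetical name.

```java
// Hypothetical sketch: decodes one quoted value where \" is an escaped quote
// and \\ is an escaped backslash. Not the library's implementation.
class QuoteEscapeSketch {

    // 'line' must start with the opening quote; returns the unescaped content.
    static String readQuoted(String line) {
        if (line.isEmpty() || line.charAt(0) != '"') {
            throw new IllegalArgumentException("value must start with a quote");
        }
        StringBuilder out = new StringBuilder();
        for (int i = 1; i < line.length(); i++) {
            char ch = line.charAt(i);
            if (ch == '\\' && i + 1 < line.length()) {
                char next = line.charAt(i + 1);
                if (next == '\\' || next == '"') {
                    out.append(next); // a backslash pair or an escaped quote
                    i++;
                    continue;
                }
            }
            if (ch == '"') {
                return out.toString(); // an unescaped quote closes the value
            }
            out.append(ch);
        }
        throw new IllegalArgumentException("unterminated quoted value");
    }

    public static void main(String[] args) {
        // the two lines of escape.csv, written as Java string literals
        System.out.println(readQuoted("\"You are \\\"beautiful\\\"\""));
        System.out.println(readQuoted("\"Yes, \\\\\\\"in the inside\\\"\\\\\""));
    }
}
```

Without the escape-escape rule, the final `\"` of the second line would be read as an escaped quote and the value would never close.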
#### Read all rows of a CSV (the powerful version) ####
To have greater control over the parsing process, use a [RowProcessor](http://github.com/uniVocity/univocity-parsers/tree/master/src/main/java/com/univocity/parsers/common/processor/RowProcessor.java). uniVocity-parsers provides some useful default implementations but you can always provide your own.
The following example uses [RowListProcessor](http://github.com/uniVocity/univocity-parsers/tree/master/src/main/java/com/univocity/parsers/common/processor/RowListProcessor.java), which just stores the rows read from a file into a List:
```java
// The settings object provides many configuration options
CsvParserSettings parserSettings = new CsvParserSettings();
//You can configure the parser to automatically detect what line separator sequence is in the input
parserSettings.setLineSeparatorDetectionEnabled(true);
// A RowListProcessor stores each parsed row in a List.
RowListProcessor rowProcessor = new RowListProcessor();
// You can configure the parser to use a RowProcessor to process the values of each parsed row.
// You will find more RowProcessors in the 'com.univocity.parsers.common.processor' package, but you can also create your own.
parserSettings.setRowProcessor(rowProcessor);
// Let's consider the first parsed row as the headers of each column in the file.
parserSettings.setHeaderExtractionEnabled(true);
// creates a parser instance with the given settings
CsvParser parser = new CsvParser(parserSettings);
// the 'parse' method will parse the file and delegate each parsed row to the RowProcessor you defined
parser.parse(getReader("/examples/example.csv"));
// get the parsed records from the RowListProcessor here.
// Note that different implementations of RowProcessor will provide different sets of functionalities.
String[] headers = rowProcessor.getHeaders();
List<String[]> rows = rowProcessor.getRows();
```
Each row will contain:
```
[Year, Make, Model, Description, Price]
=======================
1 [1997, Ford, E350, ac, abs, moon, 3000.00]
-----------------------
2 [1999, Chevy, Venture "Extended Edition", null, 4900.00]
-----------------------
3 [1996, Jeep, Grand Cherokee, MUST SELL!
air, moon roof, loaded, 4799.00]
-----------------------
4 [1999, Chevy, Venture "Extended Edition, Very Large", null, 5000.00]
-----------------------
5 [null, null, Venture "Extended Edition", null, 4900.00]
-----------------------
```
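The RowProcessor approach is a callback design: the parser pushes each row to your object instead of you pulling rows. The sketch below shows the idea with a hypothetical `SimpleRowProcessor` interface, which is much simpler than the library's `RowProcessor` (the real callbacks also receive a `ParsingContext`). `ListCollector` treats the first row as headers, similar in spirit to `RowListProcessor` with header extraction enabled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified callback interface; the library's RowProcessor differs.
interface SimpleRowProcessor {
    void processStarted();
    void rowProcessed(String[] row);
    void processEnded();
}

// Collects rows into a list, treating the first row as headers.
class ListCollector implements SimpleRowProcessor {
    private String[] headers;
    private final List<String[]> rows = new ArrayList<>();

    public void processStarted() {
        headers = null;
        rows.clear();
    }

    public void rowProcessed(String[] row) {
        if (headers == null) {
            headers = row; // header extraction: the first row is not a data row
        } else {
            rows.add(row);
        }
    }

    public void processEnded() {
        // nothing to release in this in-memory example
    }

    String[] getHeaders() {
        return headers;
    }

    List<String[]> getRows() {
        return rows;
    }
}

class CallbackDemo {
    // Plays the role of the parser: pushes every row to the processor.
    static void feed(List<String[]> parsed, SimpleRowProcessor processor) {
        processor.processStarted();
        for (String[] row : parsed) {
            processor.rowProcessed(row);
        }
        processor.processEnded();
    }

    public static void main(String[] args) {
        List<String[]> parsed = new ArrayList<>();
        parsed.add(new String[]{"Year", "Make"});
        parsed.add(new String[]{"1997", "Ford"});
        ListCollector collector = new ListCollector();
        feed(parsed, collector);
        System.out.println(collector.getHeaders()[1] + " / " + collector.getRows().size()); // Make / 1
    }
}
```

Because the processor only sees one row at a time, an implementation that does not store rows can handle inputs larger than memory.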
You can also use an [ObjectRowProcessor](http://github.com/uniVocity/univocity-parsers/tree/master/src/main/java/com/univocity/parsers/common/processor/ObjectRowProcessor.java), which will produce rows of objects. You can convert values using an implementation of the [Conversion](http://github.com/uniVocity/univocity-parsers/tree/master/src/main/java/com/univocity/parsers/conversions/Conversion.java) interface.
The [Conversions](http://github.com/uniVocity/univocity-parsers/tree/master/src/main/java/com/univocity/parsers/conversions/Conversions.java) class provides some useful defaults for you.
For convenience, the [ObjectRowListProcessor](http://github.com/uniVocity/univocity-parsers/tree/master/src/main/java/com/univocity/parsers/common/processor/ObjectRowListProcessor.java) can be used to store all rows into a list.
```java
// ObjectRowProcessor converts the parsed values and gives you the resulting row.
ObjectRowProcessor rowProcessor = new ObjectRowProcessor() {
@Override
public void rowProcessed(Object[] row, ParsingContext context) {
//here is the row. Let's just print it.
println(out, Arrays.toString(row));
}
};
// converts values in the "Price" column (index 4) to BigDecimal
rowProcessor.convertIndexes(Conversions.toBigDecimal()).set(4);
// converts the values in columns "Make, Model and Description" to lower case, and sets the value "chevy" to null.
rowProcessor.convertFields(Conversions.toLowerCase(), Conversions.toNull("chevy")).set("Make", "Model", "Description");
// converts the values at index 0 (year) to BigInteger. Nulls are converted to BigInteger.ZERO.
rowProcessor.convertFields(new BigIntegerConversion(BigInteger.ZERO, "0")).set("year");
CsvParserSettings parserSettings = new CsvParserSettings();
parserSettings.getFormat().setLineSeparator("\n");
parserSettings.setRowProcessor(rowProcessor);
parserSettings.setHeaderExtractionEnabled(true);
CsvParser parser = new CsvParser(parserSettings);
//the rowProcessor will be executed here.
parser.parse(getReader("/examples/example.csv"));
```
After applying the conversions, the output will be:
```
[1997, ford, e350, ac, abs, moon, 3000.00]
[1999, null, venture "extended edition", null, 4900.00]
[1996, jeep, grand cherokee, must sell!
air, moon roof, loaded, 4799.00]
[1999, null, venture "extended edition, very large", null, 5000.00]
[0, null, venture "extended edition", null, 4900.00]
```
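The order in which conversions are declared matters: in the example, `toLowerCase()` runs before `toNull("chevy")`, which is why `Chevy` ends up as `null`. Here is a self-contained sketch of that chaining idea in which `java.util.function.Function` stands in for the library's `Conversion` interface; the class and method names are hypothetical and mirror the library's only for readability.

```java
import java.util.function.Function;

// Sketch only: java.util.function.Function stands in for the library's Conversion interface.
class ConversionChainSketch {

    // Lower-cases a value, keeping nulls as null.
    static Function<String, String> toLowerCase() {
        return s -> s == null ? null : s.toLowerCase();
    }

    // Replaces one specific value with null and leaves every other value untouched.
    static Function<String, String> toNull(String nullValue) {
        return s -> nullValue.equals(s) ? null : s;
    }

    // Applies the conversions left to right, i.e. in declaration order.
    @SafeVarargs
    static String applyAll(String value, Function<String, String>... conversions) {
        String result = value;
        for (Function<String, String> conversion : conversions) {
            result = conversion.apply(result);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(applyAll("Chevy", toLowerCase(), toNull("chevy"))); // null
        System.out.println(applyAll("Chevy", toNull("chevy"), toLowerCase())); // chevy
    }
}
```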
### Using annotations to map your java beans ###
Use the [Parsed](http://github.com/uniVocity/univocity-parsers/tree/master/src/main/java/com/univocity/parsers/annotations/Parsed.java) annotation to map the property to a field in the CSV file. You can map the property using a field name as declared in the headers,
or the column index in the input.
Each annotation maps to a [Conversion](http://github.com/uniVocity/univocity-parsers/tree/master/src/main/java/com/univocity/parsers/conversions/Conversion.java), and conversions are executed in the same order in which they are declared.
This example works with the CSV file [bean_test.csv](http://github.com/uniVocity/univocity-parsers/tree/master/src/test/resources/examples/bean_test.csv):
```java
class TestBean {
// if the value parsed in the quantity column is "?" or "-", it will be replaced by null.
@NullString(nulls = { "?", "-" })
// if a value resolves to null, it will be converted to the String "0".
@Parsed(defaultNullRead = "0")
private Integer quantity; // The attribute type defines which conversion will be executed when processing the value.
// In this case, IntegerConversion will be used.
// The attribute name will be matched against the column header in the file automatically.
@Trim
@LowerCase
// the value for the comments attribute is in the column at index 4 (0 is the first column, so this means fifth column in the file)
@Parsed(index = 4)
private String comments;
// you can also explicitly give the name of a column in the file.
@Parsed(field = "amount")
private BigDecimal amount;
@Trim
@LowerCase
// values "no", "n" and "null" will be converted to false; values "yes" and "y" will be converted to true
@BooleanString(falseStrings = { "no", "n", "null" }, trueStrings = { "yes", "y" })
@Parsed
private Boolean pending;
// ... remaining members (such as toString()) omitted
}
```
Instances of annotated classes are created by [BeanProcessor](http://github.com/uniVocity/univocity-parsers/tree/master/src/main/java/com/univocity/parsers/common/processor/BeanProcessor.java) and [BeanListProcessor](http://github.com/uniVocity/univocity-parsers/tree/master/src/main/java/com/univocity/parsers/common/processor/BeanListProcessor.java):
```java
// BeanListProcessor converts each parsed row to an instance of a given class, then stores each instance into a list.
BeanListProcessor<TestBean> rowProcessor = new BeanListProcessor<TestBean>(TestBean.class);
CsvParserSettings parserSettings = new CsvParserSettings();
parserSettings.setRowProcessor(rowProcessor);
parserSettings.setHeaderExtractionEnabled(true);
CsvParser parser = new CsvParser(parserSettings);
parser.parse(getReader("/examples/bean_test.csv"));
// The BeanListProcessor provides a list of objects extracted from the input.
List<TestBean> beans = rowProcessor.getBeans();
```
Here is the output produced by the `toString()` method of each [TestBean](http://github.com/uniVocity/univocity-parsers/tree/master/src/test/java/com/univocity/parsers/examples/TestBean.java) instance:
```
[TestBean [quantity=1, comments=?, amount=555.999, pending=true], TestBean [quantity=0, comments=" something ", amount=null, pending=false]]
```
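Annotation-based mapping relies on reading annotations through reflection at runtime. The sketch below shows the general technique with a hypothetical `@CsvColumn` annotation and a tiny mapper that copies values from a parsed row into fields by index, applying a default when the value is null and converting based on the field's type. It is not the library's `@Parsed` annotation or `BeanProcessor`; the bean and its fields are made up for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// Hypothetical annotation, loosely modeled on the idea behind @Parsed(index = ..., defaultNullRead = ...).
@Retention(RetentionPolicy.RUNTIME) // must be visible at runtime for reflection
@Target(ElementType.FIELD)
@interface CsvColumn {
    int index();
    String defaultNullRead() default "";
}

// Example bean; field names and types are made up for this sketch.
class SketchItem {
    @CsvColumn(index = 0, defaultNullRead = "0")
    Integer quantity;

    @CsvColumn(index = 1)
    String comments;
}

class AnnotationMapperSketch {

    // Creates an instance of 'type' and fills every @CsvColumn field from 'row'.
    static <T> T map(Class<T> type, String[] row) {
        try {
            T bean = type.getDeclaredConstructor().newInstance();
            for (Field field : type.getDeclaredFields()) {
                CsvColumn column = field.getAnnotation(CsvColumn.class);
                if (column == null) {
                    continue; // field is not mapped
                }
                String value = column.index() < row.length ? row[column.index()] : null;
                if (value == null && !column.defaultNullRead().isEmpty()) {
                    value = column.defaultNullRead(); // default applied before conversion
                }
                field.setAccessible(true);
                // the field's declared type decides which conversion runs
                if (field.getType() == Integer.class) {
                    field.set(bean, value == null ? null : Integer.valueOf(value.trim()));
                } else {
                    field.set(bean, value);
                }
            }
            return bean;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot map row to " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        SketchItem item = map(SketchItem.class, new String[]{null, "something"});
        System.out.println(item.quantity + " " + item.comments); // 0 something
    }
}
```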
### Using your own conversions in annotations ###
Any implementation of [Conversion](http://github.com/uniVocity/univocity-parsers/tree/master/src/main/java/com/univocity/parsers/conversions/Conversion.java) can be used in fields annotated with [Parsed](http://github.com/uniVocity/univocity-parsers/tree/master/src/main/java/com/univocity/parsers/annotations/Parsed.java). The following class converts delimited Strings to a set of words (when reading) and a set of words to a delimited String with all words in the set (when writing). To do this, all you need to do is add a varargs constructor to your class, so it can also be initialized with `String... args`:
```java
class WordsToSetConversion implements Conversion<String, Set<String>> {
private final String separator;
private final boolean toUpperCase;
public WordsToSetConversion(String... args) {
String separator = ",";
boolean toUpperCase = true;
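// the first argument (if any) is the separator
if (args.length >= 1) {
separator = args[0];
}
// the second argument (if any) tells whether words are converted to upper case
if (args.length >= 2) {
toUpperCase = Boolean.valueOf(args[1]);
}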
this.separator = separator;
this.toUpperCase = toUpperCase;
}
public WordsToSetConversion(String separator, boolean toUpperCase) {
this.separator = separator;
this.toUpperCase = toUpperCase;
}
@Override
public Set<String> execute(String input) {
if (input == null) {
return Collections.emptySet();
}
if (toUpperCase) {
input = input.toUpperCase();
}
Set<String> out = new TreeSet<String>();
for (String token : input.split(separator)) {
//extracting words separated by white space as well
for (String word : token.trim().split("\\s")) {
out.add(word.trim());
}
}
return out;
}
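// ... revert(Set<String>) omitted: it turns a set of words back into a delimited String when writing
}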
```
Let's use our beaten-up example to create instances of [Car](http://github.com/uniVocity/univocity-parsers/tree/master/src/test/java/com/univocity/parsers/examples/Car.java) from all entries in [example.csv](http://github.com/uniVocity/univocity-parsers/tree/master/src/test/resources/examples/example.csv). Now we want to split the words in the `description` field and add them to a set of words. All we have to do is this:
```java
class Car {
@Parsed
private Integer year;
@Convert(conversionClass = WordsToSetConversion.class, args = { ",", "true" })
@Parsed
private Set<String> description;
// ... other fields and methods omitted
}
```
uniVocity-parsers will create an instance of [WordsToSetConversion](http://github.com/uniVocity/univocity-parsers/tree/master/src/test/java/com/univocity/parsers/examples/WordsToSetConversion.java) using the given arguments. Now, let's use the good old [BeanListProcessor](http://github.com/uniVocity/univocity-parsers/tree/master/src/main/java/com/univocity/parsers/common/processor/BeanListProcessor.java) to parse and generate a list of [Car](http://github.com/uniVocity/univocity-parsers/tree/master/src/test/java/com/univocity/parsers/examples/Car.java)s from our file:
```java
BeanListProcessor<Car> rowProcessor = new BeanListProcessor<Car>(Car.class);
parserSettings.setRowProcessor(rowProcessor);
CsvParser parser = new CsvParser(parserSettings);
parser.parse(getReader("/examples/example.csv"));
//Let's get our cars
List<Car> cars = rowProcessor.getBeans();
for (Car car : cars) {
// Let's get only those cars that actually have some description
if (!car.getDescription().isEmpty()) {
println(out, car.getDescription() + " - " + car.toString());
}
}
```
After executing this to print only those cars that have a description, the output will be:
```
[ABS, AC, MOON] - year=1997, make=Ford, model=E350, price=3000.00
[AIR, LOADED, MOON, MUST, ROOF, SELL!] - year=1996, make=Jeep, model=Grand Cherokee, price=4799.00
```
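The `@Convert(conversionClass = ..., args = ...)` mapping works because the conversion class can be built from `String... args`. Below is a self-contained sketch of how a framework can do that with plain reflection. `WordsToSetSketch` is a hypothetical standalone version of the reading logic above (it does not implement the library's `Conversion` interface and additionally skips empty words), so the results can be compared with the output shown.

```java
import java.lang.reflect.Constructor;
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

// Standalone version of the reading logic of WordsToSetConversion (hypothetical, for illustration).
class WordsToSetSketch {
    private final String separator;
    private final boolean toUpperCase;

    public WordsToSetSketch(String... args) {
        this.separator = args.length >= 1 ? args[0] : ",";
        this.toUpperCase = args.length >= 2 ? Boolean.parseBoolean(args[1]) : true;
    }

    public Set<String> execute(String input) {
        if (input == null) {
            return Collections.emptySet();
        }
        if (toUpperCase) {
            input = input.toUpperCase();
        }
        Set<String> out = new TreeSet<>(); // sorted, without duplicates
        for (String token : input.split(separator)) {
            for (String word : token.trim().split("\\s")) {
                if (!word.isEmpty()) {
                    out.add(word);
                }
            }
        }
        return out;
    }
}

class ReflectiveConversionSketch {

    // Roughly what a framework can do with args = {...}: call the String... constructor.
    static <T> T instantiate(Class<T> type, String... args) {
        try {
            Constructor<T> constructor = type.getConstructor(String[].class);
            return constructor.newInstance((Object) args); // pass the array as ONE argument
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        WordsToSetSketch conversion = instantiate(WordsToSetSketch.class, ",", "true");
        System.out.println(conversion.execute("ac, abs, moon")); // [ABS, AC, MOON]
    }
}
```

The `(Object) args` cast matters: without it, `newInstance` would spread the array into separate constructor arguments.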
### Reading master-detail style files ###
Use [MasterDetailProcessor](http://github.com/uniVocity/univocity-parsers/tree/master/src/main/java/com/univocity/parsers/common/processor/MasterDetailProcessor.java) or [MasterDetailListProcessor](http://github.com/uniVocity/univocity-parsers/tree/master/src/main/java/com/univocity/parsers/common/processor/MasterDetailListProcessor.java) to produce [MasterDetailRecord](http://github.com/uniVocity/univocity-parsers/tree/master/src/main/java/com/univocity/parsers/common/processor/MasterDetailRecord.java) objects.
A simple example of a master-detail file is in the [master_detail.csv](http://github.com/uniVocity/univocity-parsers/tree/master/src/test/resources/examples/master_detail.csv) file.
Each [MasterDetailRecord](http://github.com/uniVocity/univocity-parsers/tree/master/src/main/java/com/univocity/parsers/common/processor/MasterDetailRecord.java) holds a master record row and its list of associated detail rows.
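Conceptually, when master rows come after their details (`RowPlacement.BOTTOM`), detail rows are accumulated until a master row (here, one whose first column is `Total`) closes the group. A minimal pure-Java sketch of that grouping, with hypothetical names, invented sample rows and no type conversions, purely to illustrate the idea:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration of master-detail grouping where the master row comes last.
class MasterDetailSketch {

    static class Group {
        final String[] masterRow;
        final List<String[]> detailRows;

        Group(String[] masterRow, List<String[]> detailRows) {
            this.masterRow = masterRow;
            this.detailRows = detailRows;
        }
    }

    // Accumulates detail rows until a master row is found, then closes the group.
    static List<Group> groupBottom(List<String[]> rows, Predicate<String[]> isMaster) {
        List<Group> groups = new ArrayList<>();
        List<String[]> pending = new ArrayList<>();
        for (String[] row : rows) {
            if (isMaster.test(row)) {
                groups.add(new Group(row, pending));
                pending = new ArrayList<>();
            } else {
                pending.add(row);
            }
        }
        // detail rows left without a master row are ignored in this sketch
        return groups;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[]{"Item1", "50"});
        rows.add(new String[]{"Item2", "40"});
        rows.add(new String[]{"Total", "90"});
        List<Group> groups = groupBottom(rows, row -> "Total".equals(row[0]));
        System.out.println(groups.size() + " group(s), " + groups.get(0).detailRows.size() + " detail rows"); // 1 group(s), 2 detail rows
    }
}
```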
```java
// 1st, Create a RowProcessor to process all "detail" elements
ObjectRowListProcessor detailProcessor = new ObjectRowListProcessor();
// converts values in the "Amount" column (position 1 in the file) to integer.
detailProcessor.convertIndexes(Conversions.toInteger()).set(1);
// 2nd, Create MasterDetailProcessor to identify whether or not a row is the master row.
// the row placement argument indicates whether the master row occurs before or after a sequence of "detail" rows.
MasterDetailListProcessor masterRowProcessor = new MasterDetailListProcessor(RowPlacement.BOTTOM, detailProcessor) {
@Override
protected boolean isMasterRecord(String[] row, ParsingContext context) {
//Returns true if the parsed row is the master row.
//In this example, rows that have "Total" in the first column are master rows.
return "Total".equals(row[0]);
}
};
// We want our master rows to store BigIntegers in the "Amount" column
masterRowProcessor.convertIndexes(Conversions.toBigInteger()).set(1);
CsvParserSettings parserSettings = new CsvParserSettings();
parserSettings.setHeaderExtractionEnabled(true);
// Set the RowProcessor to the masterRowProcessor.
parserSettings.setRowProcessor(masterRowProcessor);
CsvParser parser = new CsvParser(parserSettings);
parser.parse(getReader("/examples/master_detail.csv"));
// Here we get the MasterDetailRecord elements.
List<MasterDetailRecord> rows = masterRowProcessor.getRecords();
MasterDetailRecord masterRecord = rows.get(0);
// The master record has one master row and multiple detail rows.
Object[] masterRow = masterRecord.getMasterRow();
List