Core Components

This document attempts to give you an overview of the core classes that make up DbUnit. Following are the core interfaces or abstract classes:

Class Description
IDatabaseConnection Interface representing a DbUnit connection to a database.
IDataSet Interface representing a collection of tables.
DatabaseOperation Abstract class representing an operation performed on the database before and after each test.
ValueComparer Interface representing a strategy for comparing values, typically columns.
VerifyTableDefinition Interface representing how to verify a table meets expectations.

IDatabaseConnection

The IDatabaseConnection interface represents a DbUnit connection to a database.

Class Description
DatabaseConnection Wraps a JDBC connection.
DatabaseDataSourceConnection Wraps a JDBC DataSource.

IDataSet

The IDataSet interface represents is a collection of tables. This is the primary abstraction used by DbUnit to manipulate tabular data.

Most commonly used implemetations:

Implementation Description
FlatXmlDataSet Reads and writes flat XML dataset document. Each XML element corresponds to a table row. Each XML element name corresponds to a table name. The XML attributes correspond to table columns.
Flat XML dataset document sample:
<!DOCTYPE dataset SYSTEM "my-dataset.dtd">
<dataset>
    <TEST_TABLE COL0="row 0 col 0" 
                COL1="row 0 col 1"
                COL2="row 0 col 2"/> 
    <TEST_TABLE COL1="row 1 col 1"/>
    <SECOND_TABLE COL0="row 0 col 0" 
                  COL1="row 0 col 1" />
    <EMPTY_TABLE/>
</dataset>

To specify null values, omit corresponding attribute. In the above example, missing COL0 and COL2 attributes of TEST_TABLE second row represents null values.
Table metadata is deduced from the first row of each table by default, whereas it is possible to enable the column sensing feature as described in differentcolumnnumber Beware you may get a NoSuchColumnException if the first row of a table has one or more null values. Because of that, this is highly recommended to use DTD. DbUnit will use the columns declared in the DTD as table metadata. DbUnit only support external system URI. The URI can be absolute or relative.
Another way to cope with this problem is to use the ReplacementDataSet.
XmlDataSet Reads and writes original XML dataset document. This format is very verbose and must conform to the following DTD:
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT dataset (table+) | ANY>
<!ELEMENT table (column*, row*)>
<!ATTLIST table
    name CDATA #REQUIRED
>
<!ELEMENT column (#PCDATA)>
<!ELEMENT row (value | null | none)*>
<!ELEMENT value (#PCDATA)>
<!ELEMENT null EMPTY>

XML dataset document sample:
<!DOCTYPE dataset SYSTEM "dataset.dtd">
<dataset>
    <table name="TEST_TABLE">
        <column>COL0</column>
        <column>COL1</column>
        <column>COL2</column>
        <row>
            <value>row 0 col 0</value>
            <value>row 0 col 1</value>
            <value>row 0 col 2</value>
        </row>
        <row>
            <null/>
            <value>row 1 col 1</value>
            <null/>
        </row>
    </table>
    <table name="SECOND_TABLE">
        <column>COLUMN0</column>
        <column>COLUMN1</column>
        <row>
            <value>row 0 col 0</value>
            <value>row 0 col 1</value>
        </row>
    </table>
    <table name='EMPTY_TABLE'>
        <column>COLUMN0</column>
        <column>COLUMN1</column>
    </table>
</dataset>

YamlDataSet Reads and writes Yaml dataset documents. Each toplevel key corresponds to a table, where the value is a list of rows. Each row is a sequence of key:value pairs, where the key corresponds to the table column.
Yaml dataset document sample:
TEST_TABLE:
  - COL0: row 0 col 0
    COL1: row 0 col 1
    COL2: row 0 col 2
  - COL1: row 1 col 1
SECOND_TABLE:
  - COL0: row 0 col 0
    COL1: row 0 col 1
EMPTY_TABLE:

To specify null values, just omit the corresponding column key. In the above example, missing COL0 and COL2 keys of TEST_TABLE second row represents null values.
In contrast to a FlatXmlDataSet, the Table metadata is deduced from the sum of all rows for each table.
StreamingDataSet Consumes a producer and expose its content as a dataset. Provides cursor like forward only access to it and only keeps the active row in memory. Can be used with FlatXmlProducer and XmlProvider.
This is a very efficient way to load XML dataset document when working with forward only database operations (UPDATE, INSERT, REFRESH).
Following sample shows how to load a flat XML dataset with the StreamingDataSet:
    IDataSetProducer producer = new FlatXmlProducer(
            new InputSource("dataset.xml"));
    IDataSet dataSet = new StreamingDataSet(producer);
DatabaseDataSet Adapter that provides access to a database instance as a dataset. This class is not usually instantiated directly but from the factory method IDatabaseConnection.createDataSet().
QueryDataSet Holds collection of tables resulting from database queries.
Following sample snippet creates a dataset containing two tables: FOO, resulting from specified query and BAR, resulting from generated query "SELECT * FROM BAR".
    QueryDataSet dataSet = new QueryDataSet(connection);
    dataSet.addTable("FOO", "SELECT * FROM TABLE WHERE COL='VALUE'");
    dataSet.addTable("BAR");
DefaultDataSet Uses to create datasets programmatically.
CompositeDataSet Combines multiple datasets into a single logical dataset.
FilteredDataSet Decorator that exposes only some tables from decorated dataset. Can be used with different filtering strategies. Some strategies can include/exclude tables without altering their order while others expose tables with a different order.
Strategy Description
IncludeTableFilter Exposes only matching tables pattern without modifying the original table order. Support wildcards.
ExcludeTableFilter Hides matching tables pattern without modifying the original table order. Support wildcards.
SequenceTableFilter Exposes a configured table sequence and can be used to reorder dataset table. This is the original filtering strategy from DbUnit 1.x.
DatabaseSequenceFilter Automatically determine the tables order using foreign/exported keys information. This strategy is vendor independent and should work with any JDBC driver that implement the DatabaseMetaData.getExportedKeys() method.
Support simple multi-level dependency like this:
   A
  / \
 B   C
    / \
   D   E
XlsDataSet Read and writes MS Excel dataset documents. Each sheet represents a table. The first row of a sheet defines the columns names and remaining rows contains the data.
ReplacementDataSet Decorator that replaces placeholder objects from the decorated dataset with replacement objects. Substring substitution is also possible.
Interestingly this provides a new way to specify null values in flat XML datasets. For example you can use a placeholder value, like "[NULL]" in your flat XML dataset and replace it with null at runtime.
<?xml version="1.0"?>
<dataset>
    <TEST_TABLE COL0="row 0 col 0" 
                COL1="[null]"/> 
    <TEST_TABLE COL1="row 1 col 0"
                COL2="row 1 col 1"/> 
</dataset>

Loading the flat XML dataset:
ReplacementDataSet dataSet = new ReplacementDataSet(
        new FlatXmlDataSet(…)); 
dataSet.addReplacementObject("[NULL]", null);

You can choose to use a fail-fast replacement to ensure that all placeholders are actually set and no one is missing in the replacement map. If one is missing the replacement will fail immediately throwing an exception. (Note that the default behaviour is to leave the non-replaced placeholder there and proceeding work silently):
replacementDataSet.setStrictReplacement(true);

CsvDataSet Reads and writes CSV dataset documents. Each CSV file represents a table - one CSV file per table.
  • CSV base filename must match table name.
  • Requires a file named table-ordering.txt containing all table names (same as the base file name; case insensitive but best to match case), one per line, sorted in insert order.
  • All data CSV files reside in the same directory as table-ordering.txt
  • Numeric CSV data file fields must not have trailing spaces before the comma-separator as will interpret it as a String instead of a number.
  • Specify null values as null without quotes.

DatabaseOperation

DatabaseOperation is an abstract class that represents an operation performed on the database before and after each test.

The two most useful operations are REFRESH and CLEAN_INSERT. They are the ones you will deal usually with. They represent two testing strategies with different benefits and tradeoffs.

Operation Description
DatabaseOperation.UPDATE This operation updates the database from the dataset contents. This operation assumes that table data already exists in the target database and fails if this is not the case.
DatabaseOperation.INSERT This operation inserts the dataset contents into the database. This operation assumes that table data does not exist in the target database and fails if this is not the case. To prevent problems with foreign keys, tables must be sequenced appropriately in the dataset.
DatabaseOperation.DELETE This operation deletes only the dataset contents from the database. This operation does not delete the entire table contents but only data that are present in the dataset.
DatabaseOperation.DELETE_ALL Deletes all rows of tables present in the specified dataset. If the dataset does not contains a particular table, but that table exists in the database, the database table is not affected. Table are truncated in reverse sequence.
DatabaseOperation.TRUNCATE_TABLE Truncate tables present in the specified dataset. If the dataset does not contains a particular table, but that table exists in the database, the database table is not affected. Table are truncated in reverse sequence.
DatabaseOperation.REFRESH This operation literally refreshes dataset contents into the target database. This means that data of existing rows are updated and non-existing row get inserted. Any rows which exist in the database but not in dataset stay unaffected. This approach is more appropriate for tests that assume other data may exist in the database.
if they are correctly written, tests using this strategy can even be performed on a populated database like a copy of a production database.
DatabaseOperation.CLEAN_INSERT This composite operation performs a DELETE_ALL operation followed by an INSERT operation. This is the safest approach to ensure that the database is in a known state. This is appropriate for tests that require the database to only contain a specific set of data.
DatabaseOperation.NONE Empty operation that does absolutely nothing.
CompositeOperation This operation combines multiple operations into a single one.
TransactionOperation This operation decorates an operation and executes it within the context of a transaction.
InsertIdentityOperation This operation decorates an insert operation and disables the MS SQL Server automatic identifier generation (IDENTITY) during its execution. Use following constants InsertIdentityOperation.INSERT, InsertIdentityOperation.CLEAN_INSERT or InsertIdentityOperation.REFRESH instead of those defined in DatabaseOperation.

ValueComparer

ValueComparer is a strategy for comparing values, a data comparison definition.

VerifyTableDefinition directly supports using ValueComparer in the definition..

Refer to Data Comparisons for more information.

ValueComparers

ValueComparers is a class of static ValueComparer instances. Please create a feature request for new ValueComparer ideas we can add to this class.

VerifyTableDefinition

VerifyTableDefinition defines a database table to verify (assert on data), specifying include and exclude column filters and optional ValueComparers.

PrepAndExpectedTestCase directly supports using VerifyTableDefinitions in the test definition and execution.