Adding Sample Data

Quite often I need to add sample data to tables for testing or demonstration. I built this small MySQL database complete with names from the U.S. Social Security Administration to help me populate other tables with sample data. The sample database has the following design.

An entity relationship diagram showing tables that can be used to add sample data to a relational database

Creation Script

You can use this SQL script to build the sample database in a schema named sample which will have these functions:

randBool()
Returns either TRUE or FALSE.
randInt(low INT, high INT)
Returns a pseudo random integer between low and high inclusive.
roundToFrac(value DOUBLE, frac DOUBLE)
Returns a value rounded to a fractional value. For example roundToFrac(12.167, 0.25) returns 12.25
gender()
Returns either 'M' or 'F'.
givenName()
Returns a person’s given name, often called the first name.
givenMaleName()
Returns a male person’s given name, often called the first name.
givenFemaleName()
Returns a female person’s given name, often called the first name.
familyName()
Returns a person’s family name, often called the last name.
email(givenName TEXT, familyName TEXT)
Returns a pseudo random email address.
phone(area TEXT)
Returns a telephone area code for given zip code.
street1()
Returns the first part of a street address such as “4708 Ridge Way”.
street2()
Returns the optional second part of a street address such as “Apt 25”.
zipCode()
Returns a ZIP code chosen pseudo randomly.
zipCodeFromState(abbrev TEXT)
Returns a ZIP code picked pseudo randomly from the given state.
city(zip TEXT)
Returns a city for given zip code.
state(zip TEXT)
Returns the state for given zip code.
areaCode(zip TEXT)
Returns a telephone area code for given zip code.
password()
Returns a psuedo random password. This is NOT a secure way to generate a password.

The functions that pseudo randomly choose a row, such as givenName(), familyName(), and zipCode(), choose the row using a query similar to this one:

-- Select one name from the familyName table.
SELECT name
FROM familyName AS t
INNER JOIN (SELECT 1 + FLOOR(RAND() *
    (SELECT MAX(id) FROM familyName)) AS id) AS r
WHERE t.id >= r.id
ORDER BY t.id
LIMIT 1;

This query to pseudo randomly choose a row from a table came from an article about ORDER BY RAND() written by Jan Kneschke. As long as the sample database contains an index for the familyName.id column, which it will because the id column is the primary key of the familyName table, this query will run much faster than the naive solution to pseudo randomly choose a row which is:

-- NAIVE SOLUTION
-- Select one name from the familyName table.
SELECT name
FROM familyName
ORDER BY RAND()
LIMIT 1;

Unfortunately, if someone deletes some of the rows from a table, the very fast solution from Jan Kneshke will not select each row with the same probability. If you must have a pseudo randomly chosen row with equal probability that each row is chosen, then you will need to rewrite some of the functions to use two queries like this:

-- Pseudo randomly choose a row number.
DECLARE size, rowNum INT;
SET size = (SELECT COUNT(*) FROM familyName);
SET rowNum = 1 + FLOOR(RAND() * size);

-- Select one name from the familyName table.
SELECT name
FROM familyName
ORDER BY id
LIMIT rowNum, 1;

Using the Sample Database

Here is an example SQL script that shows how to use the functions in the sample schema to populate a customer table in a schema named company.

-- -----------------------------------------------------
-- Schema company
-- -----------------------------------------------------
DROP SCHEMA IF EXISTS company ;

CREATE SCHEMA IF NOT EXISTS company DEFAULT CHARACTER SET utf8mb4 ;
USE company ;

-- -----------------------------------------------------
-- Table company.customer
-- -----------------------------------------------------
DROP TABLE IF EXISTS customer ;

CREATE TABLE IF NOT EXISTS customer (
    customer_id INT NOT NULL AUTO_INCREMENT,
    givenName VARCHAR(45) NOT NULL,
    familyName VARCHAR(45) NOT NULL,
    phone VARCHAR(45) NULL DEFAULT NULL,
    email VARCHAR(45) NULL DEFAULT NULL,
    emailList TINYINT NULL DEFAULT NULL,
    `password` VARCHAR(45) NULL DEFAULT NULL,
    address VARCHAR(45) NOT NULL,
    city VARCHAR(45) NOT NULL,
    state CHAR(2) NOT NULL,
    zip CHAR(5) NOT NULL,
    PRIMARY KEY (customer_id))
ENGINE = InnoDB
AUTO_INCREMENT = 1 ;


-- Inserts N people into the customer table.
DROP PROCEDURE IF EXISTS addCustomers;
DELIMITER //
CREATE PROCEDURE addCustomers(n INT)
    NOT DETERMINISTIC
    MODIFIES SQL DATA
BEGIN
    REPEAT
        CALL addCustomer();
        SET n = n - 1;
    UNTIL n = 0
    END REPEAT;
END//
DELIMITER ;


-- Inserts one person into the customer table.
DROP PROCEDURE IF EXISTS addCustomer;
DELIMITER //
CREATE PROCEDURE addCustomer()
    NOT DETERMINISTIC
    MODIFIES SQL DATA
BEGIN
    DECLARE gn, fn, e, p, el, pw, a, c, s, z TEXT;

    -- Get a given name and family name.
    SET gn = sample.givenName();
    SET fn = sample.familyName();

    -- Possibly get an email address and password.
    SET el = sample.randBool();
    IF el THEN
        SET e = sample.email(gn, fn);
        SET pw = sample.password();
    END IF;

    -- Get a physical address.
    SET a = sample.street1();
    SET z = sample.zipCode();
    SET c = sample.city(z);
    SET s = sample.state(z);

    -- Get a phone number.
    SET p = sample.phone(sample.areaCode(z));

    -- Insert one row into the customer table.
    INSERT INTO company.customer
        (givenName, familyName, phone, email, emailList,
	`password`, address, city, state, zip)
    VALUES
        (gn, fn, p, e, el, pw, a, c, s, z);
END//
DELIMITER ;


-- Call the addCustomers stored procedure which
-- will insert 1000 people into the customer table.
CALL addCustomers(1000) ;