SeedBase, a Faker alternative for relational, foreign-key-consistent test data
Faker is the library most developers reach for when they need a fake name or email. But the moment you need a whole database filled, with every foreign key resolving and row counts that look real, you start hand-wiring factories. This is an honest comparison: keep Faker for single values, and reach for SeedBase when the schema should do the work. SeedBase reads the same schema you already seed through Prisma, Django or raw SQL.
Where Faker is genuinely the right tool
This is not a takedown. Faker (both faker.js and Python Faker) is excellent at exactly what it claims: generating realistic individual values. Keep it for that.
- Perfect for unit tests: one fake email here, one fake address there.
- Zero infrastructure, runs entirely in your process, and almost every language has a port.
- Infinitely flexible because it is just a function call inside your own code.
- Deterministic per value with faker.seed(42) for stable snapshots.
The catch is structural, not quality: Faker produces values, with no awareness of your tables, columns or foreign keys. Everything relational is left to you.
The problem: Faker has no schema and no foreign keys
A call like faker.person.fullName() returns a string. It does not know that a user has many orders, that an order needs a valid user_id, or that those rows must be inserted parent-first. Watch the work pile up when you need 10 users, each with 3 posts, in faker.js:
// seed.ts with faker.js, the part nobody budgets for
import { faker } from "@faker-js/faker";
faker.seed(42);
// 1. parents first, and you must capture their ids
const users = Array.from({ length: 10 }, () => ({
  id: faker.string.uuid(),
  name: faker.person.fullName(),
  email: faker.internet.email(),
}));
// 2. children, and you hand-wire the foreign key yourself
const posts = users.flatMap((u) =>
  Array.from({ length: 3 }, () => ({
    id: faker.string.uuid(),
    userId: u.id, // FK wired by hand, every time
    title: faker.lorem.sentence(),
    status: faker.helpers.arrayElement(["draft", "published"]),
  }))
);
// 3. you still have to insert users BEFORE posts, in order,
//    and repeat this for comments, tags, every other table...
For three tables that is tolerable. For a real schema with dozens of tables and layered relationships, it becomes a parallel codebase: one factory per model, foreign keys wired by hand, insertion order tracked manually, and the whole thing breaks silently the day someone adds a column or a relation. Python Faker has the same boundary: fake.name(), fake.email() and friends return values one at a time, and the relationships are your job.
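The "insertion order tracked manually" part is, in effect, a topological sort of the tables by their foreign keys. The sketch below shows one way to compute it with Kahn's algorithm. The `TableDependencies` shape and the `insertionOrder` function are hypothetical names for illustration; they are not part of Faker or SeedBase.

```typescript
// Hypothetical schema summary: each table maps to the tables its foreign keys reference.
type TableDependencies = Record<string, string[]>;

// Kahn's algorithm: order tables so every parent comes before its children.
function insertionOrder(deps: TableDependencies): string[] {
  const tables = Object.keys(deps);
  const unmetParents: Record<string, number> = {};
  const children: Record<string, string[]> = {};
  for (const table of tables) {
    unmetParents[table] = 0;
    children[table] = [];
  }
  for (const table of tables) {
    for (const parent of deps[table]) {
      if (deps[parent] === undefined) {
        throw new Error(`${table} references unknown table ${parent}`);
      }
      if (parent === table) continue; // a self-reference does not affect table order
      unmetParents[table] += 1;
      children[parent].push(table);
    }
  }
  // Start with tables that have no parents, then release children as parents are placed.
  const ready = tables.filter((table) => unmetParents[table] === 0);
  const order: string[] = [];
  while (ready.length > 0) {
    const table = ready.shift() as string;
    order.push(table);
    for (const child of children[table]) {
      unmetParents[child] -= 1;
      if (unmetParents[child] === 0) ready.push(child);
    }
  }
  if (order.length !== tables.length) {
    throw new Error("cyclic foreign keys: no parent-first order exists");
  }
  return order;
}

// Example: comments reference posts and users, posts reference users.
const exampleOrder = insertionOrder({
  comments: ["posts", "users"],
  posts: ["users"],
  users: [],
});
// exampleOrder is ["users", "posts", "comments"]
```

With hand-written factories, this ordering lives implicitly in the order of your loops, which is why a new relation can silently break a seed script.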
factory_boy and FactoryBot do not remove the wiring
The common answer is a factory library on top of Faker: factory_boy in Python, FactoryBot in Ruby. They help, but they shift the work rather than remove it. You write one factory per model and express each relationship with a SubFactory:
# factories.py, factory_boy on top of Faker
import factory
from factory import Faker, SubFactory
from myapp.models import Post, User  # hypothetical module path, adjust to your app
class UserFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = User
    name = Faker("name")
    email = Faker("email")
class PostFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = Post
    title = Faker("sentence")
    author = SubFactory(UserFactory)  # the FK, declared by hand
That is still a factory per table, a SubFactory per foreign key, and no awareness of the schema itself. factory_boy does not introspect your database, so every new table or relation is new factory code you write and keep in sync. SeedBase derives all of that from the schema you already have.
How SeedBase reads the schema and keeps foreign keys consistent
SeedBase starts from your schema instead of your factory code. Point it at a SQL CREATE TABLE dump (PostgreSQL or MySQL), a Django models.py, a Prisma schema.prisma, or a live database, and it parses the tables, columns and relations. From there it generates rows where every foreign key references an existing parent, in foreign-key-safe insertion order, with realistic distributions rather than flat uniform randomness. The same 10-users-with-posts example becomes a single call:
// the SeedBase equivalent: rows keyed by table, FK-safe order
import { SeedbaseClient } from "@seedbase/client";
const client = new SeedbaseClient({ token: process.env.SEEDBASE_TOKEN });
const rows = await client.seededRows(process.env.SEEDBASE_PROJECT, {
  seed: 42,
  rows: 100,
});
// { users: [...], posts: [...], comments: [...] }
// every posts[i].userId already points at a real users row
No id capture, no manual ordering, no per-model factory. The schema is the source of truth, and the foreign keys are correct by construction across hundreds of tables. We tested SeedBase against a real 20-app Django project with 226 tables, the kind of scale where hand-wired factories stop being worth maintaining.
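"Correct by construction" is a claim you can check yourself in a test. The following is a minimal sketch of such a check, assuming rows arrive keyed by table as in the comment above. The `ForeignKey` description and the `danglingReferences` helper are hypothetical and written for this example; they are not part of the SeedBase client.

```typescript
type Row = Record<string, unknown>;
type Dataset = Record<string, Row[]>;

// Hypothetical foreign-key description; in practice it would come from your schema.
interface ForeignKey {
  table: string;
  column: string;
  parentTable: string;
  parentColumn: string;
}

// Returns one message per child row whose foreign key has no matching parent row.
function danglingReferences(data: Dataset, foreignKeys: ForeignKey[]): string[] {
  const problems: string[] = [];
  for (const fk of foreignKeys) {
    const parentKeys = new Set<string>();
    for (const parent of data[fk.parentTable] || []) {
      parentKeys.add(String(parent[fk.parentColumn]));
    }
    const rows = data[fk.table] || [];
    for (let i = 0; i < rows.length; i++) {
      const value = rows[i][fk.column];
      if (value === null || value === undefined) continue; // nullable FK, nothing to resolve
      if (!parentKeys.has(String(value))) {
        problems.push(`${fk.table}[${i}].${fk.column} has no match in ${fk.parentTable}.${fk.parentColumn}`);
      }
    }
  }
  return problems;
}

// Example: the second post points at a user that does not exist.
const sampleData: Dataset = {
  users: [{ id: "u1" }, { id: "u2" }],
  posts: [
    { id: "p1", userId: "u1" },
    { id: "p2", userId: "u9" },
  ],
};
const sampleProblems = danglingReferences(sampleData, [
  { table: "posts", column: "userId", parentTable: "users", parentColumn: "id" },
]);
// sampleProblems contains a single entry, for posts[1]
```

An empty result means every child row resolves to a parent, which is the property a hand-wired seed script has to maintain on its own.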
Faker vs SeedBase: an honest comparison
Faker generates values. SeedBase generates relational datasets. The table below is deliberately fair: each tool is strong where it was designed to be.
| | Faker (faker.js / Python Faker) | SeedBase |
|---|---|---|
| What it generates | Single fake values (name, email, address) one call at a time | Whole tables of rows from your schema, value pools included |
| Schema awareness | None, it does not know your tables or columns | Reads SQL, Django models.py, Prisma, or a live DB |
| Foreign keys | Hand-wired in your loops or factory code; breaks when the schema changes | Resolved from the schema; children always reference existing parents, 226-table schemas included |
| Insertion order | You track parent-before-child ordering yourself | Rows returned and pushed in foreign-key-safe order |
| Distributions | Uniform randomness unless you code the skew yourself | Realistic skew built in: long-tail child counts, smart per-table row volumes |
| Determinism | Per value via faker.seed(42) | Per dataset via seed; reference_date pins timestamps |
| Masking production data | Not what Faker does | PII detection plus format-preserving, consistent masking |
| Output | In-process values you assemble into rows | SQL, CSV, JSON, or a direct push into your database |
| Where it runs | Inside your codebase, every language | Web, CLI, Python and Node SDKs, pytest plugin, VS Code and JetBrains, MCP for AI assistants |
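The "Distributions" row is easier to picture with numbers. The sketch below illustrates what "coding the skew yourself" can look like: a seeded generator feeding a Pareto-style draw, so most parents get few children and a few get many. It is an illustration under assumed parameters (`alpha`, `cap`), not SeedBase's actual algorithm, and `makeRng` and `longTailChildCounts` are hypothetical names.

```typescript
// Lehmer (Park-Miller) generator: small, seedable and fully deterministic.
function makeRng(seed: number): () => number {
  const modulus = 2147483647;
  let state = (Math.abs(Math.floor(seed)) % (modulus - 1)) + 1; // keep state in 1..modulus-1
  return () => {
    state = (state * 48271) % modulus;
    return state / modulus; // strictly between 0 and 1
  };
}

// Pareto-style child counts: a smaller alpha gives a heavier tail.
function longTailChildCounts(seed: number, parents: number, alpha = 1.2, cap = 50): number[] {
  const rng = makeRng(seed);
  const counts: number[] = [];
  for (let i = 0; i < parents; i++) {
    const draw = Math.floor(1 / Math.pow(rng(), 1 / alpha)) - 1; // 0, 1, 2, ... with a long tail
    counts.push(Math.min(cap, draw));
  }
  return counts;
}

// Example: how many posts each of 10 users should get, reproducible for seed 42.
const postsPerUser = longTailChildCounts(42, 10);
```

A flat `Array.from({ length: 3 })` per parent, as in the Faker loop earlier, gives every user exactly three posts, which rarely matches production data.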
A small conftest.py with Faker is fine, keep it. The crossover comes with schema size and churn: when factory maintenance becomes its own backlog item, generation should move out of your codebase and onto the schema. SeedBase even uses Faker-style value pools under the hood for leaf values, so the realism you like is still there. We tested it against a real 20-app Django project with 226 tables, the scale it was built for.
Seeding a Prisma database instead of hand-wiring Faker
If your Faker seed script lived in prisma/seed.ts, the SeedBase replacement runs through the same prisma db seed entry point. Your migrations still own the schema; SeedBase fills it with foreign-key-consistent rows in dependency order:
// prisma/seed.ts
import { PrismaClient } from "@prisma/client";
import { SeedbaseClient } from "@seedbase/client";
import { seedPrisma } from "@seedbase/client/prisma";
const prisma = new PrismaClient();
const client = new SeedbaseClient({ token: process.env.SEEDBASE_TOKEN });
// FK-consistent rows, inserted in dependency order, no factories
await seedPrisma(prisma, client, {
  project: process.env.SEEDBASE_PROJECT,
  seed: 42,
});
Want the rows in memory for a test fixture instead of a direct insert? seededRows returns them keyed by table, already in foreign-key-safe order, the way the earlier example showed. The full Prisma path, including a runnable offline demo, is on the Prisma test data page.
Exporting a SQL file for CI instead of a Faker loop
Faker in CI means running your seed script and trusting it still wires every foreign key correctly. SeedBase can produce a deterministic SQL file you load before the test suite, with the same seed guaranteeing the same rows every run:
// seed-sql.mjs, run in CI
import { SeedbaseClient } from "@seedbase/client";
import { writeFile } from "node:fs/promises";
const client = new SeedbaseClient({ token: process.env.SEEDBASE_TOKEN });
const gen = await client.generate(process.env.SEEDBASE_PROJECT, { seed: 42, wait: true });
await writeFile("seed.sql", await client.download(gen.id, { format: "sql" }));
# .github/workflows/test.yml
- run: node seed-sql.mjs
- run: psql "$DATABASE_URL" -f seed.sql
Because the data is keyed off the seed, a failing test reproduces locally with the same seed: 42. The SQL path is covered in detail on the SQL test data page, and the Django and pytest workflow on the Django test data page.
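A seed alone does not make timestamps reproducible if they are derived from the wall clock, which is why the comparison table lists reference_date next to the seed. The sketch below shows the general idea: offsets come from a seeded generator and are applied to a fixed reference date instead of the current time. The `seededTimestamps` function and its parameters are hypothetical and do not describe SeedBase's API.

```typescript
// Lehmer (Park-Miller) generator: small, seedable and fully deterministic.
function makeRng(seed: number): () => number {
  const modulus = 2147483647;
  let state = (Math.abs(Math.floor(seed)) % (modulus - 1)) + 1; // keep state in 1..modulus-1
  return () => {
    state = (state * 48271) % modulus;
    return state / modulus; // strictly between 0 and 1
  };
}

// Timestamps within `windowDays` before a pinned reference date, never Date.now().
function seededTimestamps(
  seed: number,
  referenceDate: string,
  count: number,
  windowDays = 90
): string[] {
  const rng = makeRng(seed);
  const referenceMs = Date.parse(referenceDate);
  if (Number.isNaN(referenceMs)) {
    throw new Error(`invalid reference date: ${referenceDate}`);
  }
  const windowMs = windowDays * 24 * 60 * 60 * 1000;
  const stamps: string[] = [];
  for (let i = 0; i < count; i++) {
    const offsetMs = Math.floor(rng() * windowMs);
    stamps.push(new Date(referenceMs - offsetMs).toISOString());
  }
  return stamps;
}

// Example: the same three created_at values today, tomorrow and in CI.
const createdAt = seededTimestamps(42, "2024-01-01T00:00:00Z", 3);
```

Had the function used the current time as its anchor, the same seed would still produce different rows on different days, and a snapshot test would fail for no real reason.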
When to pick Faker and when to pick SeedBase
Keep Faker when you need a handful of fake values inside a unit test, when you are constructing a single in-memory object, or when a small conftest.py or seed script already covers a tiny schema. It is the right tool for single-field fakes and always will be.
Pick SeedBase when the test needs a populated, foreign-key-consistent database: many tables, real relationships, realistic distributions, deterministic seeds for CI, and optionally masked production data for staging. SeedBase uses Faker-style value pools internally, then adds the schema, foreign-key and distribution logic that a value library does not, so you stop maintaining factory code and let the schema drive the data.
Faker alternative: FAQ
Is SeedBase a good alternative to Faker?
For relational test data, yes. Faker (both faker.js and Python Faker) is a library that generates single fake values, like a name or an email, with no awareness of your schema or foreign keys. SeedBase reads your schema and generates whole tables of rows where every foreign key resolves, with realistic distributions, so you do not hand-wire factories and parent-child ordering yourself.
Does Faker handle foreign keys?
No. Faker generates values, not relationships. To fill 10 users each with 3 posts, you write the loop yourself: generate the users, capture their ids, then generate posts that point at those ids in the right order. Faker has no concept of a schema or a foreign key, so referential integrity is entirely your code. SeedBase reads the schema and inserts rows in foreign-key-safe order automatically.
What about factory_boy or FactoryBot?
Factory libraries wrap Faker and add SubFactory declarations to express relationships, but you still write and maintain one factory per model and wire every foreign key by hand. factory_boy does not introspect your database schema. SeedBase derives the same relationships from the schema itself, so there is no per-model factory code to keep in sync as tables change.
Does SeedBase work with pytest?
Yes. SeedBase ships a pytest plugin and a Python SDK. A fixture pulls deterministic, seeded rows into your test database in foreign-key-safe order. CI runs are reproducible because the same seed produces the same rows every time.
Are SeedBase values as realistic as Faker's?
SeedBase generates individual field values from Faker-style value pools, so the leaf values feel just as realistic. The difference is everything around them: SeedBase adds schema parsing, foreign-key-consistent generation across hundreds of tables, realistic distributions, and deterministic seeding, which a value library does not provide.
Is SeedBase output deterministic?
Yes. Pass a seed (for example seed: 42) and the same schema produces the same rows on every run, the same idea as faker.seed(42) but applied to the whole relational dataset instead of one value at a time. A pinned reference date makes timestamps reproducible too.
Is there a free tier?
Yes, a free tier without a credit card, including schema import and generation. Paid plans start at €19/month.
Stop hand-wiring foreign keys, free.
Import a schema (SQL, Django models, Prisma, or connect a database), generate FK-consistent data with realistic distributions, and pull it into your dev or CI database. No card required, no sales call.