Summary of "Database Testing Strategy" (webinar)
Webinar overview
This summary covers a webinar by Alexander (database engineer / team lead at Bestos Technologies) about database testing strategies for large schemas and migration projects (source → target DB). The talk focused on types of database tests, migration-specific challenges, test generation and automation, running tests at scale, measuring code coverage, and recommended processes and tooling.
Key concepts and testing types
- Test categories:
- Unit tests: functions and procedures (see the unit-test sketch after this list).
- CRUD tests: insert/update/select/delete.
- Integration tests: database ↔ other components.
- Stress / load tests and performance tests.
- Permission and configuration tests.
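A minimal sketch of the unit-test category: call a stored function with fixed inputs and compare against the expected value. The function calc_discount and its semantics are hypothetical, not from the webinar.

```sql
-- Hypothetical unit test for a stored function calc_discount:
-- call it with known inputs and assert on the returned value.
-- (On Oracle, append "FROM dual" to the SELECT.)
SELECT CASE
         WHEN calc_discount(100, 'GOLD') = 15 THEN 'PASS'
         ELSE 'FAIL'
       END AS result;
```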
- Test structure:
- Single test scripts (SQL) with parameterization (a minimal sketch follows this list).
- Grouping of tests into projects or suites for individual or batch runs.
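A sketch of one parameterized single-test script of the kind described above. The table, columns, and bind-parameter names are hypothetical; the test runner is assumed to substitute :run_id and :test_name at execution time.

```sql
-- Hypothetical single-test script: a CRUD-style check on one table.
-- :run_id and :test_name are bind parameters supplied by the runner.
INSERT INTO customers (id, name) VALUES (:run_id, :test_name);

UPDATE customers SET name = :test_name || '_upd' WHERE id = :run_id;

-- Expected result: exactly one row carries the updated value.
SELECT COUNT(*) AS rows_found
FROM customers
WHERE id = :run_id AND name = :test_name || '_upd';

-- Clean up so the test is repeatable and does not disturb other tests.
DELETE FROM customers WHERE id = :run_id;
```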
- Migration-specific checks:
- Object presence and validity after migration.
- Data comparisons, including type and whitespace/format issues.
- Behavioral equivalence of stored code (procedures/functions).
- Preservation and formatting of user-facing error messages.
- Differences between DBMS dialects and transaction modes.
Test-first approach recommended: write tests for the source and expected target behavior before migration where possible.
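As a hedged illustration of the data-comparison checks above, here is a sketch comparing one table between assumed src (source) and tgt (target) schemas. The schema and table names are hypothetical, and EXCEPT is spelled MINUS on Oracle.

```sql
-- Row counts should match after migration.
SELECT (SELECT COUNT(*) FROM src.orders) AS src_rows,
       (SELECT COUNT(*) FROM tgt.orders) AS tgt_rows;

-- Rows present in the source but missing or altered in the target.
-- TRIM guards against the whitespace/format mismatches noted above.
SELECT id, TRIM(status) AS status FROM src.orders
EXCEPT
SELECT id, TRIM(status) AS status FROM tgt.orders;
```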
Large-schema / migration project realities
- Example project scale cited: ~6,000 tables/views, ~5,000 triggers, ~100,000 stored procedures/functions across layers.
- Team and environment constraints:
- Small test teams (2–5 people) vs huge object counts.
- Limited test environments and many customer configurations.
- Repository size limits (e.g., Bitbucket 2 GB) and VPN/access constraints.
- Maintenance cost: tests must be updated as refactorings and conversion rules change; this is a recurring project expense.
Test coverage strategies and scoping
- Scoping choices to decide with stakeholders:
- Test only objects actually called by applications (top-layer).
- Cover layer-by-layer (progressively deeper).
- Cover only migrated objects.
- Attempt whole-database coverage (very costly).
- Prioritization guidance:
- Start with objects exercised by applications (triggers, tables, views) to get early value.
- Expand deeper by layer as time and budget allow.
- Decide up front whether to include auxiliary objects (DB-admin, third-party).
Test creation approaches and automation
- Creation approaches:
- Manual authoring (slow; ~15 minutes per test on average).
- Template-based project generation.
- Intelligent automated generation using schema metadata.
- Automation practices described:
- Inspect the data dictionary (column types, defaults, constraints) to generate inputs (see the generation sketch after this list).
- Use customer-provided test data and application trace logs to derive parameters and sequences.
- Use Perl scripts to generate project/test XML templates; store individual tests as editable JSON for quick fixes.
- Use templates so mass changes (e.g., conversion-rule changes) can be applied programmatically.
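A sketch of the dictionary-driven idea, assuming an information_schema-style dictionary (PostgreSQL/SQL Server style; ALL_TABLES/ALL_TAB_COLUMNS would be the Oracle equivalents). It emits one trivial smoke test per table, which a generation script could then expand into full templates; the webinar's actual Perl templates are not reproduced here.

```sql
-- Generate a minimal SELECT smoke test for every base table.
SELECT 'SELECT COUNT(*) AS cnt FROM '
       || table_schema || '.' || table_name || ';' AS generated_test
FROM information_schema.tables
WHERE table_type = 'BASE TABLE'
  AND table_schema NOT IN ('pg_catalog', 'information_schema');

-- Column types, defaults, and nullability drive the generated inputs.
SELECT column_name, data_type, column_default, is_nullable
FROM information_schema.columns
WHERE table_name = 'orders';  -- hypothetical table
```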
- Benefits:
- Faster ramp-up and less tedious than fully manual work.
- Easier bulk adaptation after schema or type changes.
Test execution at scale
- Orchestration:
- Jenkins runs test projects across cloud VMs/instances with multiple parallel runners to reduce wall time.
- Performance example:
- Serial run on one 16 GB / 8-core machine: ~15–17 hours.
- Parallel runs (2 machines) roughly halve run time; 4+ parallel instances can bring runs under a working day.
- Test design goal: minimize test mutual interference (locking) so concurrency is effective.
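One common way to reduce such interference, sketched under the assumption that each Jenkins executor can be pointed at its own schema. CREATE SCHEMA is PostgreSQL/SQL Server syntax (Oracle creates schemas via CREATE USER), and all names are hypothetical.

```sql
-- Give each parallel runner a private schema and a private copy of the
-- tables it mutates, so runners never contend for the same row locks.
CREATE SCHEMA runner_01;
CREATE TABLE runner_01.customers AS
  SELECT * FROM app.customers WHERE 1 = 0;  -- structure only, no rows
```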
Code coverage: measuring and reporting
- Coverage metrics:
- Count of covered objects.
- Line-level vs basic code-block coverage.
- Coverage ratio = covered blocks / total blocks.
- Best practices:
- Start collecting coverage from day 0 to measure progress.
- Define what “100%” means before collecting statistics.
- Exclude unreachable code using pragmas/annotations so coverage stays meaningful (see the pragma sketch below).
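A sketch of the exclusion idea using Oracle 12.2's COVERAGE pragma, which marks blocks as infeasible so they are not counted as uncovered. The function and its defensive branch are hypothetical.

```sql
CREATE OR REPLACE FUNCTION order_state(p_code NUMBER) RETURN VARCHAR2 AS
BEGIN
  IF p_code = 0 THEN
    RETURN 'NEW';
  ELSIF p_code = 1 THEN
    RETURN 'SHIPPED';
  ELSE
    -- Defensive branch that callers can never reach; mark it infeasible
    -- instead of chasing unreachable coverage.
    PRAGMA COVERAGE ('NOT_FEASIBLE_START');
    RAISE_APPLICATION_ERROR(-20001, 'unknown state code: ' || p_code);
    PRAGMA COVERAGE ('NOT_FEASIBLE_END');
  END IF;
END order_state;
/
```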
- Instrumentation and tooling examples:
- Oracle: recent versions (12.2 and later) ship built-in coverage collection: enable collection, run the tests, disable collection, and store the statistics (see the sketch after this list).
- Piggly (Ruby; targets PL/pgSQL): recompiles functions/procedures with embedded trace calls to log execution and produce HTML coverage reports. The speaker’s team adapted it to write the trace data to database tables.
- Custom pipeline: collect logs/traces, filter by test-run sources (IP addresses) or use separate schemas to distinguish test vs application traffic, then merge coverage results.
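A sketch of the Oracle flow, based on the documented DBMS_PLSQL_CODE_COVERAGE package shipped with 12.2. The run comment and the reporting query are ours, in SQL*Plus style; the DBMSPCC_* tables are created by the setup call.

```sql
-- One-time setup: create the DBMSPCC_* collection tables.
EXEC DBMS_PLSQL_CODE_COVERAGE.CREATE_COVERAGE_TABLES;

-- Start a run, execute the test projects, then stop collection.
VARIABLE run_id NUMBER
EXEC :run_id := DBMS_PLSQL_CODE_COVERAGE.START_COVERAGE('nightly-tests');
-- ... run the test projects here ...
EXEC DBMS_PLSQL_CODE_COVERAGE.STOP_COVERAGE;

-- Coverage ratio = covered blocks / total blocks, per program unit,
-- ignoring blocks marked infeasible by the COVERAGE pragma.
SELECT u.name,
       SUM(b.covered) AS covered_blocks,
       COUNT(*)       AS total_blocks,
       ROUND(SUM(b.covered) / COUNT(*), 2) AS ratio
FROM   dbmspcc_units  u
JOIN   dbmspcc_blocks b
       ON b.run_id = u.run_id AND b.object_id = u.object_id
WHERE  u.run_id = :run_id
  AND  b.not_feasible = 0
GROUP  BY u.name;
```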
- Reporting:
- Jenkins + test-runner output provides pass/fail counts and code-coverage per schema/project.
- Example reported scale: ~24k CRUD tests for ~3.5k tables, ~25k unit tests for ~15k functions, 39 test projects; fail rate ~0.2% for “core” tests in their setup.
Operational recommendations and best practices
- Define measurement scope and what “100%” means before collecting statistics.
- Decide scope with stakeholders early to avoid wasted effort.
- Automate test generation and project creation from day 1 where possible.
- Use metadata (dictionary) and application traces to generate realistic test inputs.
- Keep tests independent to enable parallel runs and reduce runtime.
- Plan and budget for test maintenance — updates are required as conversion rules or DB logic change.
- Separate test traffic from application traffic using separate schemas/databases, session tagging, or filtering, so that parallel execution and coverage merging stay clean (see the sketch below).
- Store tests in editable formats (XML/JSON) so individual tests can be quickly inspected and edited without reloading whole projects.
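For the traffic-separation point, a hedged Oracle-flavoured sketch: tag runner sessions with DBMS_APPLICATION_INFO, then filter on the tag. The module and action names are hypothetical; IP-based filtering, as described in the webinar, is an alternative.

```sql
-- Executed by the test runner once per connection.
BEGIN
  DBMS_APPLICATION_INFO.SET_MODULE(module_name => 'db-test-runner',
                                   action_name => 'crud-suite');
END;
/

-- Later: isolate test sessions (and their trace/coverage rows) by tag.
SELECT sid, username, machine
FROM   v$session
WHERE  module = 'db-test-runner';
```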
Tools and product mentions
- Bestos Technologies internal “test organizer” module (free mode available on their website).
- Jenkins for orchestration and parallel execution.
- Perl scripts for project/test generation from dictionary metadata.
- Piggly (open-source Ruby project) for instrumentation and coverage reporting (adapted).
- Bitbucket (mentioned for repository-size constraints).
- General use of cloud VMs / CI runners.
Practical outcomes and numbers from the presented project
- Object counts: thousands of DB objects (6k tables/views, 5k triggers, ~100k SPs/functions).
- Tests created: ~24k CRUD tests (covering ~3.5k tables), ~25k unit tests (covering ~15k functions), 39 projects.
- Test execution times: single-server serial runs took ~15–17 hours; parallelization reduces runtime roughly proportionally.
- Coverage behavior: adding new code increased the count of covered objects but lowered the block-level percentage, illustrating the need for a consistent measurement baseline.
Challenges highlighted
- Limited staff and environments for very large projects.
- Repository size and access constraints (e.g., Bitbucket 2 GB cap).
- Mass refactorings requiring sweeping test updates unless templated/automated.
- Sparse initial documentation; tests often become the best living documentation.
- Difficulty convincing customers to invest in coverage and ongoing maintenance.
Actionable steps and tutorial-style guide
- Define scope: decide which schemas and objects to include.
- Prefer tests-first for new or changed objects when possible.
- Use dictionary metadata and production traces to auto-generate realistic tests.
- Store tests in editable formats (JSON/XML) to support quick edits and templated mass changes.
- Automate project/test generation and CI execution early in the project.
- Collect coverage metrics from day 0 and agree on what is measured.
- Use pragmas/annotations to exclude unreachable code blocks from coverage calculations.
Speakers and sources
- Alexander — database engineer and team lead, Bestos Technologies (primary presenter).
- Bestos Technologies — provider of migration and testing services; maintains an internal test/organizer product.
- Other mentions: Katya (answered a Q&A question), Piggly, Jenkins, Bitbucket.