Version: Next

JDBC

Incubating

Important Capabilities

  • Asset Containers: Enabled by default
  • Detect Deleted Entities: Optionally enabled via stateful_ingestion.remove_stale_metadata
  • Platform Instance: Enabled by default
  • Table-Level Lineage: Enabled by default

The JDBC source plugin provides comprehensive metadata extraction capabilities for JDBC-compliant databases. It supports:

  • Extraction of database structure (tables, views, columns)
  • Schema metadata including data types and constraints
  • View definitions and dependencies
  • Stored procedures (optional)
  • SSL connections with certificate management
  • Maven-based driver management
  • Flexible pattern matching for schema/table filtering

The plugin uses Java Database Connectivity (JDBC) APIs through JPype and JayDeBeApi, allowing it to support any database with a JDBC driver. It handles connection pooling, retries, and proper resource cleanup to ensure reliable metadata extraction.
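The schema/table filtering listed above follows allow/deny regex semantics. The sketch below illustrates the general behavior under assumptions made for this example (patterns anchored at the start of the name, case-insensitive by default, a name passing only if some allow pattern matches and no deny pattern does); it is not the plugin's actual code.

```python
import re

def is_allowed(name, allow=(".*",), deny=(), ignore_case=True):
    """Illustrative allow/deny check: allow must match, deny must not."""
    flags = re.IGNORECASE if ignore_case else 0
    # re.match anchors at the start of the string, so "temp_.*" only
    # excludes names that begin with "temp_".
    if not any(re.match(p, name, flags) for p in allow):
        return False
    return not any(re.match(p, name, flags) for p in deny)

print(is_allowed("schema1.table1", allow=[r"schema1\..*"], deny=[r"schema1\.temp_.*"]))  # True
print(is_allowed("schema1.temp_x", allow=[r"schema1\..*"], deny=[r"schema1\.temp_.*"]))  # False
```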

Setup

This integration pulls metadata from databases via JDBC connections. It supports various database systems through their respective JDBC drivers.

You'll need:

  1. A running database instance that you want to connect to
  2. The appropriate JDBC driver for your database
  3. Valid credentials with permissions to read metadata

Steps to Get Started

  1. JDBC Driver Setup:

    • Option 1: Download the JDBC driver JAR file for your database
    • Option 2: Use Maven coordinates to automatically download the driver
  2. Permissions Required:

    • READ access to system catalogs/metadata views
    • Ability to execute metadata queries
    • Access to relevant schemas and tables
  3. Connection Information:

    • JDBC connection URL
    • Username and password (if using basic authentication)
    • SSL configuration (if required)
    • Any additional JDBC properties needed for your specific database
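The connection URL in step 3 generally follows the shape jdbc:protocol://host:port/database. As a quick illustration of those pieces, here is a hypothetical parsing helper (not part of the plugin; some drivers use other URL shapes that this regex would not cover):

```python
import re

# Matches the common jdbc:protocol://host:port/database form.
JDBC_URI_RE = re.compile(
    r"^jdbc:(?P<protocol>\w+)://(?P<host>[^:/]+):(?P<port>\d+)/(?P<db>[^/?]+)"
)

def parse_jdbc_uri(uri: str):
    """Split a common-form JDBC URI into (protocol, host, port, database)."""
    m = JDBC_URI_RE.match(uri)
    if m is None:
        raise ValueError(f"Not a recognized jdbc URI: {uri}")
    return m["protocol"], m["host"], int(m["port"]), m["db"]

print(parse_jdbc_uri("jdbc:postgresql://localhost:5432/mydb"))
```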

CLI based Ingestion

Install the Plugin

The jdbc source works out of the box with acryl-datahub.

Starter Recipe

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

source:
  type: jdbc
  config:
    # JDBC Driver Configuration
    driver:
      driver_class: org.postgresql.Driver # Replace with your database's driver class
      # Either specify maven_coordinates or driver_path
      maven_coordinates: org.postgresql:postgresql:42.7.1
      # driver_path: "/path/to/driver.jar"

    # Connection Configuration
    connection:
      uri: "jdbc:postgresql://localhost:5432/mydb" # Replace with your database URI
      username: "user"
      password: "pass"

      # Optional SSL Configuration
      ssl_config:
        cert_path: "/path/to/cert"
        # cert_type: "pem" # pem, jks, or p12
        # cert_password: ""

      # Additional JDBC Properties
      properties:
        applicationName: "datahub_jdbc_ingestion"

    # Additional JVM Arguments
    jvm_args:
      - "-Xmx1g"
      - "-Djavax.net.ssl.trustStore=/etc/ssl/certs/java/cacerts"

    # Optional: SQL dialect for query parsing
    sqlglot_dialect: "postgres" # Replace with your database's dialect

    # Optional Filters
    schema_pattern:
      allow:
        - "schema1"
        - "schema2"

    table_pattern:
      allow:
        - "schema1.table1"
      deny:
        - "schema1.temp_.*"

    # Feature flags
    include_tables: true
    include_views: true
    include_stored_procedures: false

sink:
  # sink configs
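With the recipe above saved locally (jdbc_recipe.yml is just an example filename), a typical CLI run looks like the following; any extras you need for your chosen sink are in addition to the core package:

```shell
# Install the DataHub CLI; the jdbc source ships with the core package.
pip install acryl-datahub

# Run ingestion using the recipe above (saved here as jdbc_recipe.yml).
datahub ingest -c jdbc_recipe.yml
```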

Config Details

Note that a . is used to denote nested fields in the YAML recipe.

Field (type): Description

platform (string): Name of the platform being ingested.
connection (JDBCConnectionConfig): Database connection configuration.
connection.uri (string): JDBC URI (jdbc:protocol://host:port/database).
connection.username (string): Database username.
connection.password (string): Database password.
connection.properties (map(str,string)): Additional JDBC connection properties.
connection.ssl_config (SSLConfig): SSL configuration.
connection.ssl_config.cert_content (string): Base64-encoded certificate content.
connection.ssl_config.cert_password (string): Certificate password, if required.
connection.ssl_config.cert_path (string): Path to the SSL certificate file.
connection.ssl_config.cert_type (string): Certificate type (pem, jks, p12). Default: pem
driver (JDBCDriverConfig): JDBC driver configuration.
driver.driver_class (string): Fully qualified JDBC driver class name.
driver.driver_path (string): Path to the JDBC driver JAR.
driver.maven_coordinates (string): Maven coordinates (groupId:artifactId:version).
include_stored_procedures (boolean): Include stored procedures in extraction. Default: False
include_tables (boolean): Include tables in extraction. Default: True
include_views (boolean): Include views in extraction. Default: True
platform_instance (string): The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://datahubproject.io/docs/platform-instances/ for more details.
profile_table_row_counts (boolean): Include table row counts. Default: True
profile_table_size (boolean): Include table size metrics. Default: False
sqlglot_dialect (string): sqlglot dialect to use for SQL transpiling.
env (string): The environment that all assets produced by this connector belong to. Default: PROD
jvm_args (array(string)): JVM arguments for the JDBC driver. Default: []
schema_pattern (AllowDenyPattern): Regex patterns for schemas. Default: {'allow': ['.*'], 'deny': [], 'ignoreCase': True}
schema_pattern.allow (array(string)): List of regex patterns to include in ingestion. Default: ['.*']
schema_pattern.deny (array(string)): List of regex patterns to exclude from ingestion. Default: []
schema_pattern.ignoreCase (boolean): Whether to ignore case sensitivity during pattern matching. Default: True
table_pattern (AllowDenyPattern): Regex patterns for tables. Default: {'allow': ['.*'], 'deny': [], 'ignoreCase': True}
table_pattern.allow (array(string)): List of regex patterns to include in ingestion. Default: ['.*']
table_pattern.deny (array(string)): List of regex patterns to exclude from ingestion. Default: []
table_pattern.ignoreCase (boolean): Whether to ignore case sensitivity during pattern matching. Default: True
view_pattern (AllowDenyPattern): Regex patterns for views. Default: {'allow': ['.*'], 'deny': [], 'ignoreCase': True}
view_pattern.allow (array(string)): List of regex patterns to include in ingestion. Default: ['.*']
view_pattern.deny (array(string)): List of regex patterns to exclude from ingestion. Default: []
view_pattern.ignoreCase (boolean): Whether to ignore case sensitivity during pattern matching. Default: True
usage (BaseUsageConfig): Usage statistics configuration. Default: {'bucket_duration': 'DAY', 'end_time': '2025-01-02...
usage.bucket_duration (Enum): Size of the time window to aggregate usage stats. Default: DAY
usage.end_time (string(date-time)): Latest date of lineage/usage to consider. Default: Current time in UTC
usage.format_sql_queries (boolean): Whether to format SQL queries. Default: False
usage.include_operational_stats (boolean): Whether to display operational stats. Default: True
usage.include_read_operational_stats (boolean): Whether to report read operational stats. Experimental. Default: False
usage.include_top_n_queries (boolean): Whether to ingest the top_n_queries. Default: True
usage.start_time (string(date-time)): Earliest date of lineage/usage to consider. Default: Last full day in UTC (or hour, depending on bucket_duration). You can also specify a relative time with respect to end_time, such as '-7 days' or '-7d'.
usage.top_n_queries (integer): Number of top queries to save to each table. Default: 10
usage.user_email_pattern (AllowDenyPattern): Regex patterns for user emails to filter in usage. Default: {'allow': ['.*'], 'deny': [], 'ignoreCase': True}
usage.user_email_pattern.allow (array(string)): List of regex patterns to include in ingestion. Default: ['.*']
usage.user_email_pattern.deny (array(string)): List of regex patterns to exclude from ingestion. Default: []
usage.user_email_pattern.ignoreCase (boolean): Whether to ignore case sensitivity during pattern matching. Default: True
stateful_ingestion (StatefulStaleMetadataRemovalConfig): Base specialized config for stateful ingestion with stale-metadata removal capability.
stateful_ingestion.enabled (boolean): Whether or not to enable stateful ingestion. Default: True if a pipeline_name is set and either a datahub-rest sink or datahub_api is specified, otherwise False.
stateful_ingestion.remove_stale_metadata (boolean): Soft-deletes entities that were present in the last successful run but are missing in the current run, when stateful_ingestion is enabled. Default: True
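usage.start_time accepts relative offsets such as '-7 days' or '-7d', resolved against usage.end_time. The sketch below shows how such a value resolves arithmetically; it is a hypothetical parser for illustration only (the plugin's actual parsing may accept more units and spellings):

```python
import re
from datetime import datetime, timedelta, timezone

def resolve_start_time(spec: str, end_time: datetime) -> datetime:
    """Resolve a relative day offset like '-7 days' or '-7d' against end_time."""
    m = re.fullmatch(r"-(\d+)\s*(?:d|days?)", spec.strip())
    if m is None:
        raise ValueError(f"Unsupported relative time: {spec}")
    return end_time - timedelta(days=int(m.group(1)))

end = datetime(2025, 1, 2, tzinfo=timezone.utc)
print(resolve_start_time("-7 days", end))  # 7 days before end_time
```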

Code Coordinates

  • Class Name: datahub.ingestion.source.jdbc.source.JDBCSource
  • Browse on GitHub

Questions

If you've got any questions on configuring ingestion for JDBC, feel free to ping us on our Slack.