Databricks Certified Professional Data Engineer (DATABRICKS-CERTIFIED-PROFESSIONAL-DATA-ENGINEER) - DataBricks Actual Exam Questions

Last updated on May 13, 2026

97% Exam Compliance
186 Total Questions
1
Question

A data engineer needs to install the PyYAML Python package within an air-gapped Databricks environment. The workspace has no direct internet access to PyPI. The engineer has downloaded the .whl file locally and wants it available automatically on all new clusters. Which approach should the data engineer use?

Options
A

Upload the PyYAML .whl file to the user home directory and create a cluster-scoped init script to install it.

B

Upload the PyYAML .whl file to a Unity Catalog Volume, ensure it’s allow-listed, and create a cluster-scoped init script that installs it from that path.

C

Set up a private PyPI repository and install via pip index URL.

D

Add the .whl file to Databricks Git Repos and assume automatic installation.
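As an illustration of the mechanics the options mention, the following is a minimal sketch of how a cluster-scoped init script that installs a wheel from a Unity Catalog Volume path might be generated. The catalog, schema, volume, and wheel file names are hypothetical placeholders, and the sketch assumes the cluster's Python environment exposes pip at `/databricks/python/bin/pip`; the `--no-index` flag tells pip not to contact any package index.

```python
def build_init_script(wheel_path: str) -> str:
    """Return the text of a bash init script that installs one local wheel.

    wheel_path is assumed to point at a .whl file inside a Unity Catalog
    Volume, i.e. a path of the form /Volumes/<catalog>/<schema>/<volume>/...
    """
    if not wheel_path.startswith("/Volumes/"):
        raise ValueError("expected a Unity Catalog Volume path under /Volumes/")
    if not wheel_path.endswith(".whl"):
        raise ValueError("expected a path to a .whl file")

    lines = [
        "#!/bin/bash",
        "set -euo pipefail",
        # Assumption: the cluster's pip lives at this location.
        # --no-index keeps pip from trying to reach PyPI.
        f'/databricks/python/bin/pip install --no-index "{wheel_path}"',
        "",
    ]
    return "\n".join(lines)


# Hypothetical Volume location and wheel file name, for illustration only.
EXAMPLE_WHEEL = (
    "/Volumes/main/shared/libs/"
    "PyYAML-6.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)

if __name__ == "__main__":
    print(build_init_script(EXAMPLE_WHEEL))
```

The generated script would itself need to be stored somewhere the cluster is permitted to read init scripts from; that step is not shown here.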

2
Question

Each configuration below is identical to the extent that each cluster has 400 GB total of RAM, 160 total cores and only one Executor per VM. Given a job with at least one wide transformation, which of the following cluster configurations will result in maximum performance?

Options
A

• Total VMs: 1 • 400 GB per Executor • 160 Cores / Executor

B

• Total VMs: 8 • 50 GB per Executor • 20 Cores / Executor

C

• Total VMs: 4 • 100 GB per Executor • 40 Cores / Executor

D

• Total VMs: 2 • 200 GB per Executor • 80 Cores / Executor
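The premise that every configuration has the same total resources can be checked with a little arithmetic. The sketch below is only an illustration: the class and field names are made up for this example, and it multiplies the per-executor figures by the VM count (one executor per VM, as the question states).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterConfig:
    label: str
    vms: int                 # one executor per VM, per the question
    gb_per_executor: int
    cores_per_executor: int

    @property
    def total_gb(self) -> int:
        return self.vms * self.gb_per_executor

    @property
    def total_cores(self) -> int:
        return self.vms * self.cores_per_executor


# Figures copied from options A to D.
CONFIGS = [
    ClusterConfig("A", 1, 400, 160),
    ClusterConfig("B", 8, 50, 20),
    ClusterConfig("C", 4, 100, 40),
    ClusterConfig("D", 2, 200, 80),
]

if __name__ == "__main__":
    for c in CONFIGS:
        print(f"{c.label}: {c.vms} VMs, {c.total_gb} GB total, {c.total_cores} cores total")
```

Since the totals match across all four options, the configurations differ only in how the same memory and cores are divided among VMs.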

3
Question

The data engineering team is configuring environments for development, testing, and production before beginning migration of a new data pipeline. The team requires extensive testing of both the code and the data resulting from code execution, and the team wants to develop and test against data that is as similar to production data as possible. A junior data engineer suggests that production data can be mounted to the development and testing environments, allowing pre-production code to execute against production data. Because all users have Admin privileges in the development environment, the junior data engineer has offered to configure permissions and mount this data for the team. Which statement captures best practices for this situation?

Options
A

Because access to production data will always be verified using passthrough credentials, it is safe to mount data to any Databricks development environment.

B

All developer, testing and production code and data should exist in a single unified workspace; creating separate environments for testing and development further reduces risks.

C

In environments where interactive code will be executed, production data should only be accessible with read permissions; creating isolated databases for each environment further reduces risks.

D

Because Delta Lake versions all data and supports time travel, it is not possible for user error or malicious actors to permanently delete production data; as such, it is generally safe to mount production data anywhere.

4
Question

A data engineer, User A, has promoted a new pipeline to production by using the REST API to programmatically create several jobs. A DevOps engineer, User B, has configured an external orchestration tool to trigger job runs through the REST API. Both users authorized the REST API calls using their personal access tokens. Which statement describes the contents of the workspace audit logs concerning these events?

Options
A

Because the REST API was used for job creation and triggering runs, a Service Principal will be automatically used to identify these events.

B

Because User B last configured the jobs, their identity will be associated with both the job creation events and the job run events.

C

Because these events are managed separately, User A will have their identity associated with the job creation events and User B will have their identity associated with the job run events.

D

Because the REST API was used for job creation and triggering runs, user identity will not be captured in the audit logs.

E

Because User A created the jobs, their identity will be associated with both the job creation events and the job run events.
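To make the scenario concrete, here is a minimal sketch of the two kinds of REST calls described: one request that creates a job and one that triggers a run, each authorized with a different personal access token. The workspace URL, token values, job name, and job ID are hypothetical placeholders, and the requests are only constructed, never sent. The sketch assumes the Jobs API 2.1 endpoints `jobs/create` and `jobs/run-now`.

```python
import json
import urllib.request

# Hypothetical workspace URL and tokens, for illustration only.
HOST = "https://example-workspace.cloud.databricks.com"
USER_A_TOKEN = "token-user-a"
USER_B_TOKEN = "token-user-b"


def build_jobs_request(host: str, token: str, endpoint: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request to a Jobs API 2.1 endpoint."""
    return urllib.request.Request(
        url=f"{host}/api/2.1/jobs/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Each caller authenticates with their own personal access token.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# User A creates a job (the payload is a placeholder, not a complete job spec).
create_req = build_jobs_request(HOST, USER_A_TOKEN, "create", {"name": "example-pipeline-job"})

# User B's orchestration tool triggers a run of an existing job (hypothetical ID).
run_req = build_jobs_request(HOST, USER_B_TOKEN, "run-now", {"job_id": 123})

if __name__ == "__main__":
    for req in (create_req, run_req):
        print(req.get_method(), req.full_url)
```

The point of the sketch is only that the two requests carry different credentials; passing either request to `urllib.request.urlopen` would be what actually sends it.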

5
Question

Review the following error traceback. Which statement describes the error being raised?

[Question image: error traceback, not reproduced in this text]

Options
A

The code executed was PySpark but was executed in a Scala notebook.

B

There is no column in the table named heartrateheartrateheartrate

C

There is a type error because a column object cannot be multiplied.

D

There is a type error because a DataFrame object cannot be multiplied.

E

There is a syntax error because the heartrate column is not correctly identified as a column.
