Databricks Workspace Setup and Data Loading

Objective: Set up a cluster, create a notebook, and load sample data

Step 1: Created a cluster (done via the UI; ensure it is running before continuing)
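
One way to confirm that the cluster is running and attached is to execute a trivial Spark job from the notebook. The sketch below is only an illustration: `cluster_is_responsive` is a hypothetical helper name, and `spark` is assumed to be the session object that Databricks notebooks provide.

```python
def cluster_is_responsive(spark) -> bool:
    """Return True if the attached cluster can run a trivial Spark job.

    `spark` is assumed to be the SparkSession available in the notebook.
    """
    # spark.range(5) builds a one-column DataFrame with ids 0..4;
    # count() forces the job to actually execute on the cluster.
    return spark.range(5).count() == 5
```

In a notebook cell this would be called as `cluster_is_responsive(spark)`. If the cluster is not available, the call is expected to fail or wait rather than return `False`.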

Step 2: Loaded sample data (e.g., NYC Taxi dataset)

data_path = "dbfs:/databricks-datasets/nyctaxi/tripdata/yellow/yellow_tripdata_2019-01.csv.gz"
df = spark.read.csv(data_path, header=True, inferSchema=True)
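
Because `inferSchema=True` lets Spark guess the column types, it can be worth checking that the columns you plan to use actually came through. The helper below is a minimal sketch of such a check; `missing_columns` is a hypothetical name, and it relies only on the `columns` attribute of a DataFrame.

```python
from typing import Iterable, List


def missing_columns(df, expected: Iterable[str]) -> List[str]:
    """Return the expected column names that are absent from `df`.

    `df` is assumed to be a Spark DataFrame (anything with a `.columns` list).
    """
    present = set(df.columns)
    # Preserve the caller's ordering so the result is easy to read.
    return [name for name in expected if name not in present]
```

For example, `missing_columns(df, ["tpep_pickup_datetime", "trip_distance"])` would return an empty list if both columns are present. Those two column names are assumptions about this dataset; run `df.printSchema()` to see the actual schema.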

Step 3: Saved to DBFS in Parquet format

df.write.mode("overwrite").parquet("/mnt/sample-data/nyc-taxi")
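
To confirm that the write succeeded, the Parquet output can be read back and compared with the original DataFrame. This is a sketch of one possible check, not part of the original steps; `parquet_roundtrip_ok` is a hypothetical helper, and it assumes `spark` is the notebook's session.

```python
def parquet_roundtrip_ok(spark, df, path: str) -> bool:
    """Read the Parquet data at `path` and compare it with `df`.

    Compares column names and row counts only, not the row contents.
    """
    written = spark.read.parquet(path)
    same_columns = written.columns == df.columns
    # Both count() calls trigger Spark jobs, so this can be slow on large data.
    same_rows = written.count() == df.count()
    return same_columns and same_rows
```

In the notebook this would be called as `parquet_roundtrip_ok(spark, df, "/mnt/sample-data/nyc-taxi")`.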

Step 4: Displayed sample data

display(df.limit(10))
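
`display` is specific to Databricks notebooks. If the same code needs to run in plain PySpark, the standard `DataFrame.show` method prints a text table instead. The helper below is a small sketch of that alternative; `preview_as_text` is a hypothetical name.

```python
def preview_as_text(df, n: int = 10) -> None:
    """Print the first `n` rows of `df` as a plain-text table."""
    # truncate=False keeps long values (e.g. timestamps) from being cut off.
    df.show(n, truncate=False)
```

Calling `preview_as_text(df)` prints the first 10 rows, roughly matching what `display(df.limit(10))` renders as a table.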
