# synapse-spark-discovery-tooling

**Repository Path**: mirrors_microsoft/synapse-spark-discovery-tooling

## Basic Information

- **Project Name**: synapse-spark-discovery-tooling
- **Description**: A tool for discovering Synapse workloads to support migration assessment and planning.
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-09-12
- **Last Updated**: 2025-10-11

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Synapse Spark Discovery Tool

The **Synapse Spark Discovery Tool** is a migration assistant that helps you explore the configuration of Azure Synapse Analytics environments across one or more Azure subscriptions.
It currently collects metadata for Synapse workspaces, notebooks, Spark pools, and Spark job definitions, and can export the results as JSON to support migration to Microsoft Fabric.

---

## ✨ Features

- **Multi-subscription scanning**  
  Provide one or more Azure subscription IDs in a config file, and the tool scans all Synapse workspaces in those subscriptions.

- **Workspace metadata**  
  Collects key properties including provisioning state, private endpoint connections, and exfiltration settings.

- **Notebook inventory**  
  Lists all notebooks within each workspace, capturing runtime details, Spark pool references, and session properties.

- **Spark pool metadata**  
  Retrieves Spark Big Data pool configurations such as:
  - Node size, count, and family
  - Spark version
  - Autoscale and dynamic executor allocation settings
  - Library requirements and custom libraries

- **Spark job definitions**  
  Captures all Spark job definitions in a workspace, including runtime, target pool, language, files, resources, and arguments.

- **Summaries & reporting**  
  - Console output with detailed workspace, notebook, pool, and job properties  
  - High-level summary of the number of notebooks, Spark job definitions (SJDs), and pools in each workspace
  - JSON export of all collected details
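
As an illustration of the high-level summary described above, here is a minimal sketch that counts notebooks, SJDs, and pools per workspace from exported data. The export layout used here (a list of workspace objects with `name`, `notebooks`, `spark_job_definitions`, and `spark_pools` keys) is an assumption made for this example; the tool's actual JSON schema may differ, so adjust the key names to match your export.

```python
import json


def summarize_workspaces(workspaces):
    """Return per-workspace counts of notebooks, SJDs, and Spark pools.

    The key names below are hypothetical; they are not taken from the tool.
    """
    summary = {}
    for ws in workspaces:
        summary[ws["name"]] = {
            "notebooks": len(ws.get("notebooks", [])),
            "spark_job_definitions": len(ws.get("spark_job_definitions", [])),
            "spark_pools": len(ws.get("spark_pools", [])),
        }
    return summary


# Hand-written sample data standing in for a real export.
sample_export = [
    {
        "name": "workspace-a",
        "notebooks": [{"name": "nb1"}, {"name": "nb2"}],
        "spark_job_definitions": [{"name": "sjd1"}],
        "spark_pools": [{"name": "pool1"}],
    },
    {"name": "workspace-b", "notebooks": [], "spark_pools": [{"name": "pool2"}]},
]

summary = summarize_workspaces(sample_export)
print(json.dumps(summary, indent=2))
```

Missing keys are treated as empty lists, so a workspace with no SJDs reports a count of zero rather than failing.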

---

## 📦 Installation

Make sure you run the tool in **Azure Cloud Shell**. Follow these steps:

1. **Clone the repository**:

~~~bash
git clone https://github.com/microsoft/fabric-migration-assistant.git
cd fabric-migration-assistant
~~~

2. **Install Python dependencies**:

~~~bash
pip install -r requirements.txt
~~~

3. **Update the configuration file**:
~~~bash
cd tooling
~~~
Edit the `config.json` file and add your Azure subscription IDs:

~~~json
{
  "subscription_ids": [
    "id_here"
  ]
}
~~~

4. **Run the script to fetch Synapse Spark metadata**:
~~~bash
python fetch_all_synapse_spark_metadata.py config.json
~~~
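
Before running the script, it can help to sanity-check the configuration from step 3. The sketch below is not part of the tool: `load_subscription_ids` is a hypothetical helper that parses the config text and verifies that each entry in `subscription_ids` is a well-formed GUID, which is the format Azure subscription IDs use. It relies only on the Python standard library.

```python
import json
import uuid
from typing import List


def load_subscription_ids(config_text: str) -> List[str]:
    """Parse config JSON text and return normalized subscription IDs.

    Hypothetical helper for illustration; raises ValueError on bad input.
    """
    config = json.loads(config_text)
    if not isinstance(config, dict):
        raise ValueError("config must be a JSON object")
    ids = config.get("subscription_ids")
    if not isinstance(ids, list) or not ids:
        raise ValueError("'subscription_ids' must be a non-empty list")
    validated = []
    for raw in ids:
        try:
            # uuid.UUID rejects anything that is not a well-formed GUID.
            validated.append(str(uuid.UUID(str(raw))))
        except ValueError:
            raise ValueError(f"not a valid subscription ID (expected a GUID): {raw!r}")
    return validated
```

For example, `load_subscription_ids(open("config.json").read())` returns the normalized IDs, and raises `ValueError` if the `"id_here"` placeholder was left in the file.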

## 🚧 Upcoming Features
- Discovery of pipelines that invoke Notebooks  
- Discovery of pipelines that invoke Spark Job Definitions (SJD)