Commit 7f67fdb
Author: UnicoLab
Message: feat: adding some new features
1 parent c083ca0, commit 7f67fdb

24 files changed: 4,169 additions & 64 deletions

.graphflow_cache/cache_index.json

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+{}
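The commit seeds `.graphflow_cache/cache_index.json` with an empty object. In a content-addressed cache (the scheme the README below marks as just implemented), the index typically maps a hash of a node's identity and inputs to a stored artifact. The helper below is an illustrative sketch of that idea, not GraphFlow's actual implementation; `content_key` and the `.parquet` path are hypothetical names:

```python
import hashlib
import json

def content_key(node_name: str, inputs: dict) -> str:
    """Deterministic key: hash the node name plus a canonical dump of its inputs."""
    payload = json.dumps({"node": node_name, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# The index starts as {} and fills in as nodes run.
index: dict[str, str] = {}
key = content_key("process_data", {"raw/advanced_data": "v1"})
index[key] = f".graphflow_cache/{key}.parquet"
```

Because the key depends only on content, a rerun with unchanged inputs produces the same key and becomes a cache hit.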

README.md

Lines changed: 13 additions & 7 deletions
@@ -133,20 +133,26 @@ pipeline.export_graph(format="html", output="my_pipeline.html")
 - **Testing**: Unit tests with parallel execution support
 - **Development Tools**: Makefile, pre-commit hooks, linting, formatting
 
+### ✅ Recently Implemented
+
+- **Distributed Executors**: Ray and Dask integration ✅
+- **Advanced Caching**: Content-addressed caching system ✅
+- **Data Validation**: Schema validation and data quality checks ✅
+- **Graph Visualization**: Dynamic graph export and visualization ✅
+- **Jupyter Notebooks**: Interactive examples and tutorials ✅
+- **Enhanced CLI**: Rich output and better error handling ✅
+
 ### 🚧 In Development
 
-- **Distributed Executors**: Ray and Dask integration
 - **Cloud Executors**: Vertex AI, AWS Batch, Azure ML support
-- **Advanced Caching**: Content-addressed caching system
-- **Data Validation**: Schema validation and data quality checks
-- **Graph Visualization**: Dynamic graph export and visualization
 - **Performance Profiling**: Built-in performance monitoring
+- **Streaming Support**: Real-time data processing capabilities
 
 ### 🎯 Roadmap
 
-- **v0.2.0**: Distributed execution and cloud backends
-- **v0.3.0**: Advanced caching and data validation
-- **v0.4.0**: Graph visualization and performance profiling
+- **v0.2.0**: Distributed execution, caching, validation, and visualization (Current)
+- **v0.3.0**: Cloud backends and performance profiling
+- **v0.4.0**: Streaming support and advanced ML features
 - **v1.0.0**: Production-ready release with full feature set
 
 ## 🛠️ Development
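The executor entries in the README diff above (local, Ray, Dask, and the planned cloud backends) share one pattern: the pipeline hands a batch of independent tasks to whichever backend is configured. A generic sketch of that pattern follows, with illustrative class names rather than GraphFlow's real API; a real Ray or Dask backend would submit tasks remotely instead of to a thread pool:

```python
from concurrent.futures import ThreadPoolExecutor

class LocalExecutor:
    """Runs tasks sequentially in the current process."""
    def run(self, tasks):
        return [fn(*args) for fn, args in tasks]

class PoolExecutor:
    """Stand-in for a parallel backend; Ray/Dask would dispatch these remotely."""
    def __init__(self, max_workers: int = 4):
        self.max_workers = max_workers

    def run(self, tasks):
        with ThreadPoolExecutor(self.max_workers) as pool:
            futures = [pool.submit(fn, *args) for fn, args in tasks]
            # Collect in submission order so results line up with tasks.
            return [f.result() for f in futures]

tasks = [(pow, (2, 10)), (pow, (3, 3))]
# Both executors return results in task order: [1024, 27]
```

Because both classes expose the same `run(tasks)` interface, the pipeline can swap backends via configuration (the `"executor": "auto"` key in the pipeline JSON below suggests exactly this kind of dispatch).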

advanced_features_pipeline.json

Lines changed: 104 additions & 0 deletions
@@ -0,0 +1,104 @@
{
  "pipeline": {
    "name": "advanced_features_pipeline",
    "base_uri": "data/",
    "context": {
      "data_source": "advanced_sample",
      "random_state": 42,
      "n_estimators": 100,
      "executor": "auto",
      "batch_size": 1000,
      "lookback_days": 30,
      "min_samples": 100,
      "test_size": 0.2,
      "quality_threshold": 0.95,
      "target_column": "target",
      "validation_strict": false,
      "cache_ttl": "24h",
      "max_workers": 4,
      "aggregation_level": "category"
    }
  },
  "graph": {
    "nodes": [
      {
        "name": "validate_data",
        "function": "validate_data",
        "module": "advanced_features",
        "inputs": ["raw/advanced_data"],
        "outputs": ["validated/advanced_data"],
        "tags": ["validation", "quality_check"],
        "cache_ttl": "1h"
      },
      {
        "name": "process_data",
        "function": "process_data",
        "module": "advanced_features",
        "inputs": ["validated/advanced_data"],
        "outputs": ["processed/advanced_data"],
        "tags": ["processing", "aggregation"],
        "cache_ttl": "2h"
      },
      {
        "name": "create_advanced_features",
        "function": "create_advanced_features",
        "module": "advanced_features",
        "inputs": ["processed/advanced_data"],
        "outputs": ["features/advanced_features"],
        "tags": ["feature_engineering", "ml_prep"],
        "cache_ttl": "4h"
      },
      {
        "name": "prepare_advanced_model_data",
        "function": "prepare_advanced_model_data",
        "module": "advanced_features",
        "inputs": ["features/advanced_features"],
        "outputs": ["final/advanced_model_data"],
        "tags": ["final_preparation", "model_ready"],
        "cache_ttl": "8h"
      }
    ],
    "edges": [
      { "source": "validate_data", "target": "process_data" },
      { "source": "process_data", "target": "create_advanced_features" },
      { "source": "create_advanced_features", "target": "prepare_advanced_model_data" }
    ]
  }
}
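The `edges` list above defines a linear DAG. To execute it, a runner needs a topological order of the nodes; with Python's standard-library `graphlib` (edge data copied from the JSON above, the runner itself being an assumption), that looks like:

```python
from graphlib import TopologicalSorter

edges = [
    ("validate_data", "process_data"),
    ("process_data", "create_advanced_features"),
    ("create_advanced_features", "prepare_advanced_model_data"),
]

# Build a predecessor map: node -> set of nodes it depends on.
deps: dict[str, set[str]] = {}
for src, dst in edges:
    deps.setdefault(dst, set()).add(src)

order = list(TopologicalSorter(deps).static_order())
# → ['validate_data', 'process_data', 'create_advanced_features', 'prepare_advanced_model_data']
```

`TopologicalSorter` also raises `CycleError` on cyclic input, which is a cheap validity check before any node runs.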
Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# GraphFlow: Getting Started\n",
        "\n",
        "This notebook demonstrates the basic concepts of GraphFlow, including:\n",
        "- Creating pipelines\n",
        "- Defining nodes with automatic context management\n",
        "- Running pipelines with different executors\n",
        "- Exporting pipeline graphs\n",
        "\n",
        "## Installation\n",
        "\n",
        "First, let's install GraphFlow and its dependencies:\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Install GraphFlow (uncomment if needed)\n",
        "# !pip install graphflow[all]\n",
        "\n",
        "# Import required libraries\n",
        "import pandas as pd\n",
        "import numpy as np\n",
        "from graphflow import Pipeline, context, dataset, node\n",
        "import matplotlib.pyplot as plt\n",
        "import seaborn as sns\n",
        "\n",
        "# Set up plotting\n",
        "plt.style.use('seaborn-v0_8')\n",
        "sns.set_palette(\"husl\")\n",
        "\n",
        "print(\"✅ GraphFlow imported successfully!\")\n"
      ]
    }
  ],
  "metadata": {
    "language_info": {
      "name": "python"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 2
}
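The notebook above promises "nodes with automatic context management". One common way to implement that idea is a decorator that fills in missing keyword arguments from a shared context dict by parameter name. The sketch below is a hypothetical illustration of that pattern, not GraphFlow's actual `node` decorator; `CONTEXT` and `train_model` are invented for the example (the key names mirror the pipeline JSON's `context` block):

```python
import inspect

CONTEXT = {"random_state": 42, "n_estimators": 100}

def node(fn):
    """Inject context values into any parameter the caller did not supply."""
    sig = inspect.signature(fn)

    def wrapper(**kwargs):
        for name in sig.parameters:
            if name not in kwargs and name in CONTEXT:
                kwargs[name] = CONTEXT[name]
        return fn(**kwargs)

    return wrapper

@node
def train_model(data, random_state, n_estimators):
    return {"rows": len(data), "seed": random_state, "trees": n_estimators}

result = train_model(data=[1, 2, 3])
# → {'rows': 3, 'seed': 42, 'trees': 100}
```

The caller passes only `data`; `random_state` and `n_estimators` arrive from the context, which is what lets pipeline-wide settings live in one JSON block.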

0 commit comments