"""
This script sets up and runs a machine learning pipeline for a linear regression model (ridge-regularized) using a specified configuration.

Steps:
1. Imports necessary libraries and modules.
2. Adds the parent directory to the system path to ensure module imports work correctly.
3. Creates a `ModelingConfig` object to define the model configuration, including data preprocessing steps and the model estimator.
4. Loads a dataset from an Azure ML table into a pandas DataFrame.
5. Prepares the feature matrix `X` and the target `y` (the `Price` column) from the DataFrame.
6. Initializes a `ModelingPipelineBuilder` with the configuration.
7. Splits the data into training and testing sets.
8. Fits the model to the training data.

The configuration specifies:
- A data preprocessing step using `sklearn.preprocessing.Normalizer` with `norm="l2"`, which rescales each sample (row) to unit L2 norm.
- A model estimator using `sklearn.linear_model.Ridge` with an alpha parameter of 0.9.
- Experiment tracking enabled (`track_experiment=True`).

Note: The actual path to the Azure ML table should be specified in place of "azureml://path.....".
"""
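import os
import sys

# Add the parent directory to sys.path so the project modules imported below resolve
current_path = os.path.dirname(os.path.abspath(__file__))
sys.path.append(os.path.dirname(current_path))
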
import mltable
from core.modeling.config import ModelingConfig, MethodConfig
from core.modeling.pipeline import ModelingPipelineBuilder
import pandas as pd

# Create the ModelingConfig object
config = ModelingConfig(
    model_name="Linear Regression",
    data_preprocessing_steps=[
        MethodConfig(
            name="sklearn.preprocessing.Normalizer", params=dict(norm="l2")
        ),
    ],
    model_estimator=MethodConfig(
        name="sklearn.linear_model.Ridge", params=dict(alpha=0.9)
    ),
    track_experiment=True,
)

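# Load the Azure ML table into a pandas DataFrame (replace the placeholder path)
tbl = mltable.load("azureml://path.....")
df = tbl.to_pandas_dataframe()

# Separate the features from the target column "Price"
X = df.drop(["Price"], axis=1)
y = pd.DataFrame(df["Price"].copy())
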
# Build the pipeline, split the data into training and testing sets, and fit the model
pipeline_builder = ModelingPipelineBuilder(config)
pipeline_builder.split_data(X, y)
pipeline_builder.fit()
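

# Illustrative sketch, not part of the original pipeline and never called here.
# For readers without access to `core.modeling`, the function below shows roughly
# what the configuration above describes in plain scikit-learn. It assumes that
# `ModelingPipelineBuilder` chains the preprocessing steps and the estimator the
# way `sklearn.pipeline.Pipeline` does; that is a guess, not something this
# script confirms. The name `fit_sklearn_equivalent`, the 80/20 split and the
# fixed random seed are hypothetical choices made only for illustration, and
# experiment tracking is left out.
def fit_sklearn_equivalent(X, y, test_size=0.2, random_state=0):
    """Fit Normalizer(l2) -> Ridge(alpha=0.9); return (fitted model, test R^2)."""
    # Local imports keep the sketch self-contained and out of the script's
    # top-level dependencies.
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import Normalizer

    # Hold out part of the data for evaluation (split ratio is an assumption).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    model = Pipeline(
        steps=[
            # Normalizer rescales each row (sample) to unit L2 norm;
            # it does not standardize columns.
            ("normalizer", Normalizer(norm="l2")),
            ("ridge", Ridge(alpha=0.9)),
        ]
    )
    model.fit(X_train, y_train)
    # Pipeline.score delegates to Ridge.score, which reports R^2 on the held-out data.
    return model, model.score(X_test, y_test)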