Commit 84b0bcb

author
Taylor Miller
committed
* catalyst EDW instructions
* improved getting started, prediction types
* clarification in data pipeline
1 parent 12cfd16 commit 84b0bcb

4 files changed

+78
-41
lines changed

docs/catalyst_edw_instructions.md

+43
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,43 @@
1+
# Health Catalyst EDW Instructions
2+
3+
Many of our users operate within the Health Catalyst ecosystem, which is heavily based on MSSQL. This document outlines ways to use healthcare.ai in these settings beyond what is covered in the [getting started](getting_started.md) docs.
4+
5+
## Preparing Your SAM
6+
7+
- If you plan on deploying a model to an MSSQL server (i.e., pushing predictions to SQL Server), you will need to set up your tables to receive predictions.
8+
9+
```sql
10+
CREATE TABLE [SAM].[dbo].[HCAIPredictionClassificationBASE] (
11+
[BindingID] [int] ,
12+
[BindingNM] [varchar] (255),
13+
[LastLoadDTS] [datetime2] (7),
14+
[PatientEncounterID] [decimal] (38, 0), --< change to your grain col
15+
[PredictedProbNBR] [decimal] (38, 2),
16+
[Factor1TXT] [varchar] (255),
17+
[Factor2TXT] [varchar] (255),
18+
[Factor3TXT] [varchar] (255))
19+
20+
CREATE TABLE [SAM].[dbo].[HCAIPredictionRegressionBASE] (
21+
[BindingID] [int],
22+
[BindingNM] [varchar] (255),
23+
[LastLoadDTS] [datetime2] (7),
24+
[PatientEncounterID] [decimal] (38, 0), --< change to your grain col
25+
[PredictedValueNBR] [decimal] (38, 2),
26+
[Factor1TXT] [varchar] (255),
27+
[Factor2TXT] [varchar] (255),
28+
[Factor3TXT] [varchar] (255))
29+
```
30+
31+
## Writing New Predictions to the SAM
32+
33+
When you pass a raw prediction dataframe and your database info to the `.predict_to_catalyst_sam()` method, the TrainedSupervisedModel generates predictions with binding IDs, the grain column, and factors, and writes them to your database.
34+
35+
```python
36+
# This output is a Health Catalyst EDW specific dataframe that includes the grain column, the prediction, and factors
37+
server = 'localhost'
38+
database = 'SAM'
39+
table = 'HCAIPredictionRegressionBASE'
40+
schema = 'dbo'
41+
42+
trained_model.predict_to_catalyst_sam(prediction_dataframe, server, database, table, schema)
43+
```
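
As a rough illustration of what the receiving table looks like once a prediction row lands in it, here is a minimal sketch. It assumes SQLite (through Python's built-in `sqlite3`) as a stand-in for SQL Server so that it runs anywhere, and it is not the healthcare.ai write path: the helper `write_example_row`, the SQLite column types, and all row values are made up for illustration. Only the table and column names come from the `HCAIPredictionRegressionBASE` definition above.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server so the sketch runs anywhere.
# Column names mirror HCAIPredictionRegressionBASE; types are SQLite equivalents.
CREATE_REGRESSION_TABLE = """
CREATE TABLE HCAIPredictionRegressionBASE (
    BindingID INTEGER,
    BindingNM VARCHAR(255),
    LastLoadDTS TEXT,
    PatientEncounterID INTEGER,  -- change to your grain column
    PredictedValueNBR DECIMAL(38, 2),
    Factor1TXT VARCHAR(255),
    Factor2TXT VARCHAR(255),
    Factor3TXT VARCHAR(255)
)
"""


def write_example_row(connection, row):
    """Insert one prediction row (an 8-tuple in column order). Hypothetical helper."""
    connection.execute(
        "INSERT INTO HCAIPredictionRegressionBASE VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        row,
    )
    connection.commit()


connection = sqlite3.connect(":memory:")
connection.execute(CREATE_REGRESSION_TABLE)

# Made-up values for illustration only.
example_row = (0, "PyTest", "2017-01-01 00:00:00", 1001, 12.34,
               "FactorA", "FactorB", "FactorC")
write_example_row(connection, example_row)

# Read back the grain column and the predicted value.
stored = connection.execute(
    "SELECT PatientEncounterID, PredictedValueNBR FROM HCAIPredictionRegressionBASE"
).fetchall()
print(stored)
```

On a real SAM, the T-SQL `CREATE TABLE` statements in the document remain the reference; this sketch only shows the column order a written row has to follow.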

docs/getting_started.md

+13-34
@@ -1,6 +1,6 @@
1-
# Getting started with healthcare.ai
1+
# Getting Started With Healthcare.ai
22

3-
## What can you do with these tools?
3+
## What Can You Do With These Tools?
44

55
- Fill in missing data via imputation
66
- Train and compare models based on your data
@@ -67,38 +67,17 @@ To verify that *healthcareai* installed correctly:
6767

6868
If you did get an error, or run into other installation issues, please [let us know](http://healthcare.ai/contact.html) or better yet post on [Stack Overflow](http://stackoverflow.com/questions/tagged/healthcare-ai) (with the healthcare-ai tag) so we can help others along this process.
6969

70-
## Getting started
71-
72-
- Read through the docs on this site
73-
- Starting with
74-
- Modify the queries and parameters to match your data
75-
- If you plan on deploying a model to a MSSQL server (ie, pushing predictions to SQL Server), run this in SSMS beforehand:
76-
77-
```sql
78-
CREATE TABLE [SAM].[dbo].[HCAIPredictionClassificationBASE] (
79-
[BindingID] [int] ,
80-
[BindingNM] [varchar] (255),
81-
[LastLoadDTS] [datetime2] (7),
82-
[PatientEncounterID] [decimal] (38, 0), --< change to your grain col
83-
[PredictedProbNBR] [decimal] (38, 2),
84-
[Factor1TXT] [varchar] (255),
85-
[Factor2TXT] [varchar] (255),
86-
[Factor3TXT] [varchar] (255))
87-
88-
CREATE TABLE [SAM].[dbo].[HCAIPredictionRegressionBASE] (
89-
[BindingID] [int],
90-
[BindingNM] [varchar] (255),
91-
[LastLoadDTS] [datetime2] (7),
92-
[PatientEncounterID] [decimal] (38, 0), --< change to your grain col
93-
[PredictedValueNBR] [decimal] (38, 2),
94-
[Factor1TXT] [varchar] (255),
95-
[Factor2TXT] [varchar] (255),
96-
[Factor3TXT] [varchar] (255))
97-
```
98-
99-
- Note that there are examples that write to other databases (MySQL, SQLite)
100-
101-
## For Issues
70+
## Getting Started
71+
72+
1. Read through the docs on this site.
73+
2. Start with either `example_regression_1.py` or `example_classification_1.py`.
74+
3. Modify the queries and parameters to match your data.
75+
4. Decide on what kind of prediction output you want.
76+
5. Set up your database tables to match the output schema. See the [prediction types](prediction_types.md) document for details.
77+
- If you are working in a Health Catalyst EDW ecosystem (primarily MSSQL), please see the [Catalyst EDW Instructions](catalyst_edw_instructions.md) for SAM setup.
78+
- Please see the [databases docs](databases.md) for details about writing to different databases (MSSQL, MySQL, SQLite, CSV).
79+
80+
## Where to Get Help
10281

10382
- Double check that the code follows the examples in these documents.
10483
- If you're still seeing an error, file an issue on [Stack Overflow](http://stackoverflow.com/) using the healthcare-ai tag. Please provide

docs/prediction_types.md

+21-6
@@ -4,9 +4,17 @@ Healthcareai provides a few options when you want to get predictions from a trai
44

55
Please note that you will likely only need one of these prediction output types.
66

7+
## Database Setup
8+
9+
Each prediction type has a different set of columns and types. You will need to set up your database tables to receive these with appropriate data types.
10+
11+
An easy way to understand each of the prediction types is to inspect the `.dtypes` property of each returned dataframe. For example: `print(predictions.dtypes)`.
12+
13+
## Prediction Types
14+
715
Each prediction output format is detailed below.
816

9-
## Predictions Only
17+
### Predictions Only
1018

1119
By passing the `.make_predictions(prediction_dataframe)` method a raw prediction dataframe you'll get back a dataframe containing the grain id and predicted values.
1220

@@ -16,7 +24,7 @@ predictions = trained_model.make_predictions(prediction_dataframe)
1624
print(predictions.head())
1725
```
1826

19-
## Important Factors
27+
### Important Factors
2028

2129
By passing the `.make_factors(prediction_dataframe)` method a raw prediction dataframe you'll get back a dataframe containing the grain id and top predictive factors.
2230

@@ -26,7 +34,7 @@ factors = trained_model.make_factors(prediction_dataframe)
2634
print(factors.head())
2735
```
2836

29-
## Predictions + Factors
37+
### Predictions + Factors
3038

3139
By passing the `.make_predictions_with_k_factors(prediction_dataframe)` method a raw prediction dataframe you'll get back a dataframe containing the grain id and predicted values, and top factors.
3240

@@ -36,7 +44,7 @@ predictions_with_factors_df = trained_model.make_predictions_with_k_factors(pred
3644
print(predictions_with_factors_df.head())
3745
```
3846

39-
## Original Dataframe + Predictions + Factors
47+
### Original Dataframe + Predictions + Factors
4048

4149
By passing the `.make_original_with_predictions_and_factors(prediction_dataframe)` method a raw prediction dataframe you'll get back a dataframe containing all the original data, the predicted values, and top factors.
4250

@@ -47,9 +55,16 @@ original_plus_predictions_and_factors = trained_model.make_original_with_predict
4755
print(original_plus_predictions_and_factors.head())
4856
```
4957

58+
### Health Catalyst EDW Format
5059

60+
Many of our users operate within the Health Catalyst ecosystem, and most have standardized on a table format that others may find useful. Please note that if you do intend to use this specific format, there is an easier and more robust way to save it to your database, outlined in the [Health Catalyst EDW Instructions](catalyst_edw_instructions.md).
5161

62+
By passing the `.create_catalyst_dataframe(prediction_dataframe)` method a raw prediction dataframe you'll get back a dataframe containing all the original data, the predicted values, and top factors.
5263

53-
54-
64+
```python
65+
## Health Catalyst EDW specific instructions. Uncomment to use.
66+
# This output is a Health Catalyst EDW specific dataframe that includes the grain column, the prediction, and factors
67+
catalyst_dataframe = trained_model.create_catalyst_dataframe(prediction_dataframe)
68+
print(catalyst_dataframe.head())
69+
```
5570
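
The "Database Setup" advice added to `docs/prediction_types.md` (inspect `.dtypes`, then create a table with matching column types) could be sketched roughly as follows. This is not part of healthcare.ai: the `DTYPE_TO_SQL` mapping, the `build_create_table` helper, the table name, and the example column names are all hypothetical. The sketch takes dtype names as plain strings, which is what `predictions.dtypes.astype(str).to_dict()` should give for a pandas dataframe, so it runs without pandas installed.

```python
# Assumption: these dtype names are what `predictions.dtypes.astype(str).to_dict()`
# would give for a pandas dataframe; the mapping to SQL types is illustrative.
DTYPE_TO_SQL = {
    "int64": "BIGINT",
    "float64": "FLOAT",
    "object": "VARCHAR(255)",
    "datetime64[ns]": "DATETIME2(7)",
}


def build_create_table(table_name, column_dtypes):
    """Build a CREATE TABLE statement from a {column: dtype name} mapping."""
    column_lines = []
    for column, dtype_name in column_dtypes.items():
        # Fall back to text for any dtype the mapping does not cover.
        sql_type = DTYPE_TO_SQL.get(dtype_name, "VARCHAR(255)")
        column_lines.append("    [{}] {}".format(column, sql_type))
    return "CREATE TABLE {} (\n{}\n)".format(table_name, ",\n".join(column_lines))


# Hypothetical column names and dtypes for a predictions dataframe.
example_dtypes = {"PatientEncounterID": "int64", "Prediction": "float64"}
statement = build_create_table("[SAM].[dbo].[MyPredictions]", example_dtypes)
print(statement)
```

Review the generated statement before running it; precision, nullability, and key constraints depend on your own data and are not inferred here.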

healthcareai/pipelines/data_preparation.py

+1-1
@@ -24,7 +24,7 @@ def full_pipeline(model_type, predicted_column, grain_column, impute=True):
2424
('null_row_filter', hcai_filters.DataframeNullValueFilter(excluded_columns=None)),
2525
('convert_target_to_binary', hcai_transformers.DataFrameConvertTargetToBinary(model_type, predicted_column)),
2626
('prediction_to_numeric', hcai_transformers.DataFrameConvertColumnToNumeric(predicted_column)),
27-
('create_dummy_variables', hcai_transformers.DataFrameCreateDummyVariables([predicted_column])),
27+
('create_dummy_variables', hcai_transformers.DataFrameCreateDummyVariables(excluded_columns=[predicted_column])),
2828
])
2929
return pipeline
3030
