-
Notifications
You must be signed in to change notification settings - Fork 4
Expand file tree
/
Copy pathUsingPythonUploads.html
More file actions
210 lines (181 loc) · 14 KB
/
UsingPythonUploads.html
File metadata and controls
210 lines (181 loc) · 14 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
<!DOCTYPE html>
<!-- Generated by pkgdown: do not edit by hand --><html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<title>Using python for postrgresql uploads • ResultModelManager</title>
<script src="../deps/jquery-3.6.0/jquery-3.6.0.min.js"></script><meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<link href="../deps/bootstrap-5.3.1/bootstrap.min.css" rel="stylesheet">
<script src="../deps/bootstrap-5.3.1/bootstrap.bundle.min.js"></script><link href="../deps/font-awesome-6.4.2/css/all.min.css" rel="stylesheet">
<link href="../deps/font-awesome-6.4.2/css/v4-shims.min.css" rel="stylesheet">
<script src="../deps/headroom-0.11.0/headroom.min.js"></script><script src="../deps/headroom-0.11.0/jQuery.headroom.min.js"></script><script src="../deps/bootstrap-toc-1.0.1/bootstrap-toc.min.js"></script><script src="../deps/clipboard.js-2.0.11/clipboard.min.js"></script><script src="../deps/search-1.0.0/autocomplete.jquery.min.js"></script><script src="../deps/search-1.0.0/fuse.min.js"></script><script src="../deps/search-1.0.0/mark.min.js"></script><!-- pkgdown --><script src="../pkgdown.js"></script><meta property="og:title" content="Using python for postrgresql uploads">
</head>
<body>
<a href="#main" class="visually-hidden-focusable">Skip to contents</a>
<nav class="navbar navbar-expand-lg fixed-top bg-primary" data-bs-theme="dark" aria-label="Site navigation"><div class="container">
<a class="navbar-brand me-2" href="../index.html">ResultModelManager</a>
<small class="nav-text text-muted me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="">0.6.2</small>
<button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbar" aria-controls="navbar" aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>
<div id="navbar" class="collapse navbar-collapse ms-3">
<ul class="navbar-nav me-auto">
<li class="nav-item"><a class="nav-link" href="../reference/index.html">Reference</a></li>
<li class="active nav-item dropdown">
<button class="nav-link dropdown-toggle" type="button" id="dropdown-articles" data-bs-toggle="dropdown" aria-expanded="false" aria-haspopup="true">Articles</button>
<ul class="dropdown-menu" aria-labelledby="dropdown-articles">
<li><a class="dropdown-item" href="../articles/CreatingMigrations.html">Creating Migrations</a></li>
<li><a class="dropdown-item" href="../articles/ExampleProject.html">Example package results spec</a></li>
<li><a class="dropdown-item" href="../articles/PackageDesign.html">Package Design</a></li>
<li><a class="dropdown-item" href="../articles/UploadFunctionality.html">Upload Functionality</a></li>
<li><a class="dropdown-item" href="../articles/UsingAnExportManager.html">Using An Export Manager</a></li>
<li><a class="dropdown-item" href="../articles/UsingConnectionHandlers.html">Using Connection Handlers</a></li>
<li><a class="dropdown-item" href="../articles/UsingPythonUploads.html">Using python for postrgresql uploads</a></li>
<li><a class="dropdown-item" href="../articles/UsingQueryNamespaces.html">Using Query Namespaces</a></li>
</ul>
</li>
<li class="nav-item"><a class="nav-link" href="../news/index.html">Changelog</a></li>
</ul>
<ul class="navbar-nav">
<li class="nav-item"><a class="nav-link" href="https://ohdsi.github.io/ResultModelManager"><img src='https://ohdsi.github.io/Hades/images/hadesMini.png' width=80 height=17 style='vertical-align: top;'></a></li>
<li class="nav-item"><a class="external-link nav-link" href="https://github.com/OHDSI/ResultModelManager/" aria-label="GitHub"><span class="fa fab fa-github fa-lg"></span></a></li>
</ul>
</div>
</div>
</nav><div class="container template-article">
<div class="row">
<main id="main" class="col-md-9"><div class="page-header">
<h1>Using python for postrgresql uploads</h1>
<h4 data-toc-skip class="author">James P.
Gilbert</h4>
<h4 data-toc-skip class="date">2025-10-01</h4>
<small class="dont-index">Source: <a href="https://github.com/OHDSI/ResultModelManager/blob/HEAD/vignettes/UsingPythonUploads.Rmd" class="external-link"><code>vignettes/UsingPythonUploads.Rmd</code></a></small>
<div class="d-none name"><code>UsingPythonUploads.Rmd</code></div>
</div>
<div class="section level2">
<h2 id="introduction">Introduction<a class="anchor" aria-label="anchor" href="#introduction"></a>
</h2>
<p>Note, this feature should be considered experimental until further
notice.</p>
<p>The use of DatabaseConnector with postgresql without bulk uploads
functionality will often be slow and require the installation and
configuration of postgresql binaries on the system. This may be
challenging or restrictred in many environments. Similarly, this method
requires writing a data.frame to disk which will be prohibitively slow
if data is already in a csv format.</p>
<p>As a consequence, this package supports bulk uploading through python
with a small amount of configuration. This uses no r-memory; csv files
are transfered directly through python and will be considerably
faster</p>
<p>This process uses the <code>psycopg2</code> python library, which can
be installed via compilation or in binary form. This process
demonstrates usage with the <code>psycopg2-binary</code> package.</p>
</div>
<div class="section level2">
<h2 id="installing-psycopg2">Installing psycopg2<a class="anchor" aria-label="anchor" href="#installing-psycopg2"></a>
</h2>
<div class="section level3">
<h3 id="using-a-virtualenv">Using a virtualenv<a class="anchor" aria-label="anchor" href="#using-a-virtualenv"></a>
</h3>
<p>Result model manager provides an interactive function for enabling
the python library. If psycopg2 this function will do nothing. However,
if there is no available binary (and the <code>reticulate</code> package
is not installed) you will be asked to install these packages. Do do
this run the following:</p>
<div class="sourceCode" id="cb1"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="fu">ResultModelManager</span><span class="fu">::</span><span class="fu"><a href="../reference/enablePythonUploads.html">enablePythonUploads</a></span><span class="op">(</span><span class="op">)</span></span></code></pre></div>
<p>Alternatively you can specify this manually</p>
<div class="sourceCode" id="cb2"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="fu">ResultModelManager</span><span class="fu">::</span><span class="fu"><a href="../reference/install_psycopg2.html">install_psycopg2</a></span><span class="op">(</span><span class="op">)</span></span></code></pre></div>
</div>
<div class="section level3">
<h3 id="using-conda-or-system-python-installs">Using conda or system python installs<a class="anchor" aria-label="anchor" href="#using-conda-or-system-python-installs"></a>
</h3>
<p>Please consult the <a href="https://rstudio.github.io/reticulate/articles/python_packages.html" class="external-link">reticulate
documentation</a> on how to install the <code>psycopg2-binary</code>
package.</p>
</div>
</div>
<div class="section level2">
<h2 id="usage-within-functions">Usage within functions<a class="anchor" aria-label="anchor" href="#usage-within-functions"></a>
</h2>
<p>By default, this functionality will not be enabled when uploading
tables and the function <code>pyUploadCsv</code> will fail. To enable,
and directly upload a csv, try the following example code.</p>
<div class="sourceCode" id="cb3"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="fu">ResultModelManager</span><span class="fu">::</span><span class="fu"><a href="../reference/enablePythonUploads.html">enablePythonUploads</a></span><span class="op">(</span><span class="op">)</span></span>
<span><span class="va">connectionDetails</span> <span class="op"><-</span> <span class="fu">DabaseConnector</span><span class="fu">::</span><span class="fu">createConnectionDetails</span><span class="op">(</span></span>
<span> dbms <span class="op">=</span> <span class="st">"postgreql"</span>,</span>
<span> server <span class="op">=</span> <span class="st">"myserver.com"</span>,</span>
<span> port <span class="op">=</span> <span class="fl">5432</span>,</span>
<span> password <span class="op">=</span> <span class="st">"s"</span>,</span>
<span> user <span class="op">=</span> <span class="st">"me"</span>,</span>
<span> database <span class="op">=</span> <span class="st">"some_db"</span></span>
<span><span class="op">)</span></span>
<span><span class="va">connection</span> <span class="op"><-</span> <span class="fu">DatabaseConnector</span><span class="fu">::</span><span class="fu"><a href="https://ohdsi.github.io/DatabaseConnector/reference/connect.html" class="external-link">connect</a></span><span class="op">(</span><span class="va">connectionDetails</span><span class="op">)</span></span>
<span><span class="fu">readr</span><span class="fu">::</span><span class="fu"><a href="https://readr.tidyverse.org/reference/write_delim.html" class="external-link">write_csv</a></span><span class="op">(</span></span>
<span> <span class="fu"><a href="https://rdrr.io/r/base/data.frame.html" class="external-link">data.frame</a></span><span class="op">(</span></span>
<span> id <span class="op">=</span> <span class="fl">1</span><span class="op">:</span><span class="fl">1e6</span>,</span>
<span> <span class="fu"><a href="https://rdrr.io/r/base/paste.html" class="external-link">paste</a></span><span class="op">(</span><span class="fl">1</span><span class="op">:</span><span class="fl">1e6</span>, <span class="st">"bottle(s) on the wall"</span><span class="op">)</span></span>
<span> <span class="op">)</span>,</span>
<span> <span class="st">"my_massive_csv.csv"</span></span>
<span><span class="op">)</span></span>
<span></span>
<span><span class="fu">ResultModelManager</span><span class="fu">::</span><span class="fu"><a href="../reference/pyUploadCsv.html">pyUploadCsv</a></span><span class="op">(</span><span class="va">connection</span>,</span>
<span> table <span class="op">=</span> <span class="st">"my_table"</span>,</span>
<span> filepath <span class="op">=</span> <span class="st">"my_massive_csv.csv"</span>,</span>
<span> schema <span class="op">=</span> <span class="st">"my_schema"</span></span>
<span><span class="op">)</span></span></code></pre></div>
<p>Note that you are not required to call
<code><a href="../reference/enablePythonUploads.html">ResultModelManager::enablePythonUploads()</a></code> every time. As an
alternative, add the following line to your .Renviron file (note that
this will automatically assume that setup of python libraries has been
completed)</p>
<pre><code><span><span class="va">RMM_USE_PYTHON_UPLOADS</span><span class="op">=</span><span class="cn">TRUE</span></span></code></pre>
<p>The astute reader will realize that this approach requires an IO
call, writing the CSV to disk. In many situations this will be a major
bottleneck. A much more sensible approach is to use a string buffer.
Thankfully, the author of this package has provided such an
interface!</p>
<div class="sourceCode" id="cb5"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="fu">ResultModelManager</span><span class="fu">::</span><span class="fu">pyUploadDataframe</span><span class="op">(</span><span class="va">connection</span>,</span>
<span> table <span class="op">=</span> <span class="st">"my_table"</span>,</span>
<span> filepath <span class="op">=</span> <span class="st">"my_massive_csv.csv"</span>,</span>
<span> schema <span class="op">=</span> <span class="st">"my_schema"</span></span>
<span><span class="op">)</span></span></code></pre></div>
<p>Note - that this approach is actually already implemented for you
when you use <code>uploadResults</code> functionality. That’s right - if
you call <code><a href="../reference/enablePythonUploads.html">ResultModelManager::enablePythonUploads()</a></code> (and you
are using postgres) you will be able to upload your large R
<code>data.frames</code> to postgres!</p>
<div class="sourceCode" id="cb6"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="fu">ResultModelManager</span><span class="fu">::</span><span class="fu"><a href="../reference/enablePythonUploads.html">enablePythonUploads</a></span><span class="op">(</span><span class="op">)</span></span>
<span></span>
<span><span class="fu">ResultModelManager</span><span class="fu">::</span><span class="fu"><a href="../reference/uploadResults.html">uploadResults</a></span><span class="op">(</span></span>
<span> <span class="va">connectionDetails</span>,</span>
<span> schema <span class="op">=</span> <span class="st">"my_schema"</span>,</span>
<span> resultsFolder <span class="op">=</span> <span class="st">"my_results_folder"</span>,</span>
<span> tablePrefix <span class="op">=</span> <span class="st">"cm_"</span>,</span>
<span> purgeSiteDataBeforeUploading <span class="op">=</span> <span class="cn">FALSE</span>,</span>
<span> specifications <span class="op">=</span> <span class="fu">getResultsDataModelSpec</span><span class="op">(</span><span class="op">)</span></span>
<span><span class="op">)</span></span></code></pre></div>
<p>Better yet, calling
<code><a href="../reference/enablePythonUploads.html">ResultModelManager::enablePythonUploads()</a></code> before uploading
results from any OHDSI package will automatically give you this fast(er)
upload functionality.</p>
</div>
</main><aside class="col-md-3"><nav id="toc" aria-label="Table of contents"><h2>On this page</h2>
</nav></aside>
</div>
<footer><div class="pkgdown-footer-left">
<p>Developed by Jamie Gilbert.</p>
</div>
<div class="pkgdown-footer-right">
<p>Site built with <a href="https://pkgdown.r-lib.org/" class="external-link">pkgdown</a> 2.1.0.</p>
</div>
</footer>
</div>
</body>
</html>