
Commit 6025cd2

rueckstiess and Thomas Rueckstiess authored
Correct PyPI setup (#11)
* updated setup.
* fix long_description in setup.py
* re-add entry_points for CLI, bump version.
* changes to readme, correct URL to github repo.
* cropped logo
* bump version
* fix regex in setup.py

---------

Co-authored-by: Thomas Rueckstiess <[email protected]>
1 parent e08e122 commit 6025cd2


4 files changed, +62 -12 lines changed


README.md

Lines changed: 4 additions & 6 deletions
@@ -1,5 +1,5 @@
 <p align="center">
-<img src="assets/origami_logo.jpg" style="width: 100%; height: auto;">
+<img src="https://github.com/mongodb-labs/origami/assets/origami_logo.jpg" style="width: 100%; height: auto;">
 </p>
 
 # ORiGAMi - Object Representation through Generative Autoregressive Modelling
@@ -14,13 +14,11 @@ Please note: This tool is not officially supported or endorsed by MongoDB, Inc.
 
 ## Overview
 
-ORiGAMi is a transformer-based Machine Learning model to directly process semi-structured data such as MongoDB documents or JSON files and make predictions from this data.
+ORiGAMi is a transformer-based Machine Learning model for supervised classification from semi-structured data such as MongoDB documents or JSON files.
 
-Typically, when working with semi-structured data in a Machine Learning context, the data needs to be flattened
-into a tabular form first. This flattening can be lossy, especially in the presence of arrays and nested objects, and often requires domain expertise to extract meaningful higher-order features from the raw data. This feature extraction step is manual, slow and expensive and doesn't scale well.
-
-ORiGAMi is a transformer model and follows the trend of many other deep learning models by operating directly on the raw data and discovering meaningful features itself. Preprocessing is fully automated (apart from some hyper-parameters that can improve the model performance).
+Typically, when working with semi-structured data in a Machine Learning context, the data needs to be flattened into a tabular format first. This flattening can be lossy, especially in the presence of arrays and nested objects, and often requires domain expertise to extract meaningful higher-order features from the raw data. This feature extraction step is manual, slow and expensive and doesn't scale well.
 
+ORiGAMi circumvents this by directly operating on JSON data. Once a model is trained, it can be used to make predictions on any field in the dataset.
 
 ## Installation
 
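The Overview says that flattening semi-structured data into a table can be lossy with arrays and nested objects. As a rough illustration (a hypothetical `flatten` helper written for this note, not code from ORiGAMi), naive flattening turns nested keys into dotted column names and array positions into indexed columns, so two documents whose arrays differ in length end up with different column sets:

```python
from typing import Any


def flatten(doc: Any, prefix: str = "") -> dict[str, Any]:
    """Naively flatten nested dicts/lists into dotted column names.

    Hypothetical helper for illustration only; it is not part of ORiGAMi.
    """
    flat: dict[str, Any] = {}
    if isinstance(doc, dict):
        items = doc.items()
    elif isinstance(doc, list):
        # Array positions become part of the column name, e.g. "tags.0".
        items = ((str(i), v) for i, v in enumerate(doc))
    else:
        # Scalar leaf: one column named by its full path.
        return {prefix: doc}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else key
        flat.update(flatten(value, path))
    return flat


docs = [
    {"user": {"name": "a"}, "tags": ["x", "y"]},
    {"user": {"name": "b"}, "tags": ["z"]},
]
rows = [flatten(d) for d in docs]

# Documents with different array lengths yield different column sets.
columns = sorted(set().union(*rows))
print(columns)  # ['tags.0', 'tags.1', 'user.name']
```

Here the second document has no `tags.1`, so a table built from both rows needs a missing value in that column, and the fact that `tags` was a single variable-length list survives only in the column names.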
assets/origami_logo.jpg

-286 KB

setup.cfg

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [metadata]
 name = origami
-version = 0.1.0
+version = 0.1.3
 
 [options]
 packages = find:
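After this commit the version string appears only in setup.cfg: setup.py drops its `version="0.1.0"` argument, while setup.cfg moves to `0.1.3`. The two files also declare different names (`origami` in setup.cfg, `origami-ml` in setup.py). A quick stdlib sketch for inspecting what setup.cfg declares, using `configparser` on the post-commit contents copied from the diff (it shows only the file's contents, not how setuptools resolves the two names):

```python
import configparser

# Post-commit setup.cfg contents, copied from the diff.
SETUP_CFG = """\
[metadata]
name = origami
version = 0.1.3

[options]
packages = find:
"""

parser = configparser.ConfigParser()
parser.read_string(SETUP_CFG)

version = parser["metadata"]["version"]
name = parser["metadata"]["name"]
print(name, version)  # origami 0.1.3
```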

setup.py

Lines changed: 57 additions & 5 deletions
@@ -1,15 +1,67 @@
-from distutils.core import setup
+import re
+
+from setuptools import find_packages, setup
+
+# Read README for long description
+with open("README.md", "r", encoding="utf-8") as fh:
+    long_description = fh.read()
+
+# Remove both image and arxiv link sections
+long_description = re.sub(
+    r'<p align="center">(?:\s*<img[^>]*>|\s*\|[^|]*\|)\s*</p>\s*\n?', "", long_description, flags=re.MULTILINE
+)
+
+# Remove the Disclaimer section (from ## Disclaimer to the next ##)
+long_description = re.sub(r"## Disclaimer.*?(?=## \w+)", "", long_description, flags=re.DOTALL)
 
 setup(
     name="origami-ml",
-    version="0.1.0",
-    packages=["origami"],
-    install_requires=[
-        "click",
+    author="Thomas Rueckstiess",
+    author_email="[email protected]",
+    description="An ML classifier model to make predictions from semi-structured data.",
+    long_description=long_description,
+    long_description_content_type="text/markdown",
+    url="https://github.com/mongodb-labs/origami",
+    packages=find_packages(),
+    classifiers=[
+        "Development Status :: 4 - Beta",
+        "Intended Audience :: Science/Research",
+        "License :: OSI Approved :: Apache Software License",
+        "Programming Language :: Python :: 3",
+        "Programming Language :: Python :: 3.10",
+        "Programming Language :: Python :: 3.11",
+        "Topic :: Scientific/Engineering :: Artificial Intelligence",
     ],
+    python_requires=">=3.10",
     entry_points={
         "console_scripts": [
             "origami = origami.cli:main",
         ],
     },
+    install_requires=[
+        "click>=8.1.7",
+        "click-option-group>=0.5.6",
+        "guildai>=0.9.0",
+        "lightgbm>=4.5.0",
+        "matplotlib>=3.9.2",
+        "mdbrtools>=0.1.1",
+        "numpy>=1.26.4",
+        "omegaconf>=2.3.0",
+        "openml>=0.15.1",
+        "pandas>=2.2.3",
+        "pymongo>=4.8.0",
+        "python-dotenv>=1.0.1",
+        "scikit_learn>=1.5.2",
+        "torch>=2.4.1",
+        "tqdm>=4.66.4",
+        "xgboost>=2.1.3",
+    ],
+    extras_require={
+        "dev": [
+            "jupyter>=1.1.1",
+            "jupyter_contrib_nbextensions>=0.7.0",
+            "pytest>=8.3.3",
+            "ruff>=0.9.3",
+        ],
+    },
 )
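The two `re.sub` calls in setup.py strip parts of README.md that are not wanted in the PyPI long description. The sketch below wraps them, copied unchanged, in a hypothetical `clean_long_description` function and runs them on a made-up README excerpt. The logo block, the "Please note" sentence and the section headings come from this commit; the pipe-delimited arXiv row is an assumption inferred from the regex's second alternative.

```python
import re


def clean_long_description(text: str) -> str:
    """Apply the two substitutions from setup.py to a README string (hypothetical wrapper)."""
    # Drop centered <p> blocks that wrap either an <img> tag or a |...| link row,
    # together with the whitespace that follows them.
    text = re.sub(
        r'<p align="center">(?:\s*<img[^>]*>|\s*\|[^|]*\|)\s*</p>\s*\n?', "", text, flags=re.MULTILINE
    )
    # Drop everything from "## Disclaimer" up to (not including) the next "## " heading.
    text = re.sub(r"## Disclaimer.*?(?=## \w+)", "", text, flags=re.DOTALL)
    return text


# Hypothetical README excerpt; the arXiv link row is made up for illustration.
SAMPLE_README = (
    '<p align="center">\n'
    '<img src="https://github.com/mongodb-labs/origami/assets/origami_logo.jpg" style="width: 100%; height: auto;">\n'
    "</p>\n"
    "\n"
    "# ORiGAMi - Object Representation through Generative Autoregressive Modelling\n"
    "\n"
    '<p align="center">\n'
    '| <a href="https://arxiv.org/">Paper</a> |\n'
    "</p>\n"
    "\n"
    "## Disclaimer\n"
    "\n"
    "Please note: This tool is not officially supported or endorsed by MongoDB, Inc.\n"
    "\n"
    "## Overview\n"
    "\n"
    "ORiGAMi is a transformer-based Machine Learning model.\n"
)

cleaned = clean_long_description(SAMPLE_README)
print(cleaned)
```

Two properties of the Disclaimer pattern are worth knowing. If `## Disclaimer` is the last section, the lookahead never matches and nothing is removed. And because `## \w+` also matches inside a `### Heading`, a level-3 subsection within the Disclaimer would end the removal early.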

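The re-added `console_scripts` entry `origami = origami.cli:main` uses the standard entry-point syntax: the text before `=` is the command name, and the `module:attribute` part after it names the callable that the generated `origami` script imports and calls. As a small sanity check, the stdlib `importlib.metadata.EntryPoint` class (Python 3.9+ for `.module` and `.attr`) can parse that string without the package being installed:

```python
from importlib.metadata import EntryPoint

# The console_scripts entry from setup.py. After installation, pip generates an
# "origami" executable that imports origami.cli and calls its main().
ep = EntryPoint(name="origami", value="origami.cli:main", group="console_scripts")

print(ep.module)  # origami.cli
print(ep.attr)    # main
# ep.load() would import origami.cli and return the main callable; it is not
# called here because that requires the package to be installed.
```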