Commit b51c58a

Merge remote-tracking branch 'upstream/master' into io_csv_docstring_fixed
* upstream/master:
  DOC: Removing rpy2 dependencies, and converting examples using it to regular code blocks (pandas-dev#23737)
  BUG: Fix dtype=str converts NaN to 'n' (pandas-dev#22564)
  DOC: update pandas.core.resample.Resampler.nearest docstring (pandas-dev#20381)
  REF/TST: Add more pytest idiom to parsers tests (pandas-dev#23810)
  Added support for Fraction and Number (PEP 3141) to pandas.api.types.is_scalar (pandas-dev#22952)
  DOC: Updating to_timedelta docstring (pandas-dev#23259)
2 parents 5c8a3aa + 01cb440 commit b51c58a

19 files changed, +1379 -1156 lines changed

ci/deps/travis-36-doc.yaml

-4
@@ -2,7 +2,6 @@ name: pandas
 channels:
   - defaults
   - conda-forge
-  - r
 dependencies:
   - beautifulsoup4
   - bottleneck
@@ -31,14 +30,11 @@ dependencies:
   - python-snappy
   - python=3.6*
   - pytz
-  - r
-  - rpy2
   - scipy
   - seaborn
   - sphinx
   - sqlalchemy
   - statsmodels
-  - tzlocal
   - xarray
   - xlrd
   - xlsxwriter

doc/source/r_interface.rst

+25 -12
@@ -33,21 +33,28 @@ See also the documentation of the `rpy2 <http://rpy2.bitbucket.org/>`__ project:

 In the remainder of this page, a few examples of explicit conversion are given. The pandas conversion of rpy2 needs first to be activated:

-.. ipython:: python
+.. code-block:: python

-   from rpy2.robjects import r, pandas2ri
-   pandas2ri.activate()
+   >>> from rpy2.robjects import pandas2ri  # doctest: +SKIP
+   >>> pandas2ri.activate()  # doctest: +SKIP

 Transferring R data sets into Python
 ------------------------------------

 Once the pandas conversion is activated (``pandas2ri.activate()``), many conversions
 of R to pandas objects will be done automatically. For example, to obtain the 'iris' dataset as a pandas DataFrame:

-.. ipython:: python
+.. code-block:: python

-   r.data('iris')
-   r['iris'].head()
+   >>> from rpy2.robjects import r  # doctest: +SKIP
+   >>> r.data('iris')  # doctest: +SKIP
+   >>> r['iris'].head()  # doctest: +SKIP
+      Sepal.Length  Sepal.Width  Petal.Length  Petal.Width Species
+   0           5.1          3.5           1.4          0.2  setosa
+   1           4.9          3.0           1.4          0.2  setosa
+   2           4.7          3.2           1.3          0.2  setosa
+   3           4.6          3.1           1.5          0.2  setosa
+   4           5.0          3.6           1.4          0.2  setosa

 If the pandas conversion was not activated, the above could also be accomplished
 by explicitly converting it with the ``pandas2ri.ri2py`` function
@@ -59,13 +66,19 @@ Converting DataFrames into R objects
 The ``pandas2ri.py2ri`` function supports the reverse operation to convert
 DataFrames into the equivalent R object (that is, **data.frame**):

-.. ipython:: python
+.. code-block:: python
+
+   >>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]},
+   ...                   index=["one", "two", "three"])  # doctest: +SKIP
+   >>> r_dataframe = pandas2ri.py2ri(df)  # doctest: +SKIP
+   >>> print(type(r_dataframe))  # doctest: +SKIP
+   <class 'rpy2.robjects.vectors.DataFrame'>
+   >>> print(r_dataframe)  # doctest: +SKIP
+         A B C
+   one   1 4 7
+   two   2 5 8
+   three 3 6 9

-   df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C':[7,8,9]},
-                     index=["one", "two", "three"])
-   r_dataframe = pandas2ri.py2ri(df)
-   print(type(r_dataframe))
-   print(r_dataframe)

 The DataFrame's index is stored as the ``rownames`` attribute of the
 data.frame instance.

doc/source/whatsnew/v0.24.0.rst

+4
@@ -466,6 +466,7 @@ For situations where you need an ``ndarray`` of ``Interval`` objects, use
    np.asarray(idx)
    idx.values.astype(object)

+
 .. _whatsnew_0240.api.timezone_offset_parsing:

 Parsing Datetime Strings with Timezone Offsets
@@ -1442,6 +1443,7 @@ Reshaping
 - Bug in :func:`merge_asof` where confusing error message raised when attempting to merge with missing values (:issue:`23189`)
 - Bug in :meth:`DataFrame.nsmallest` and :meth:`DataFrame.nlargest` for dataframes that have a :class:`MultiIndex` for columns (:issue:`23033`).
 - Bug in :meth:`DataFrame.append` with a :class:`Series` with a dateutil timezone would raise a ``TypeError`` (:issue:`23682`)
+- Bug in ``Series`` construction when passing no data and ``dtype=str`` (:issue:`22477`)

 .. _whatsnew_0240.bug_fixes.sparse:

@@ -1474,6 +1476,7 @@ Other
 - :meth:`DataFrame.nlargest` and :meth:`DataFrame.nsmallest` now returns the correct n values when keep != 'all' also when tied on the first columns (:issue:`22752`)
 - :meth:`~pandas.io.formats.style.Styler.bar` now also supports tablewise application (in addition to rowwise and columnwise) with ``axis=None`` and setting clipping range with ``vmin`` and ``vmax`` (:issue:`21548` and :issue:`21526`). ``NaN`` values are also handled properly.
 - Logical operations ``&, |, ^`` between :class:`Series` and :class:`Index` will no longer raise ``ValueError`` (:issue:`22092`)
+- Checking PEP 3141 numbers with :func:`~pandas.api.types.is_scalar` now returns ``True`` (:issue:`22903`)
 - Bug in :meth:`DataFrame.combine_first` in which column types were unexpectedly converted to float (:issue:`20699`)

 .. _whatsnew_0.24.0.contributors:
@@ -1482,3 +1485,4 @@ Contributors
 ~~~~~~~~~~~~

 .. contributors:: v0.23.4..HEAD
+

pandas/_libs/lib.pyx

+46 -13
@@ -1,5 +1,8 @@
 # -*- coding: utf-8 -*-
 from decimal import Decimal
+from fractions import Fraction
+from numbers import Number
+
 import sys

 import cython
@@ -15,7 +18,6 @@ from cpython.datetime cimport (PyDateTime_Check, PyDate_Check,
                                PyDateTime_IMPORT)
 PyDateTime_IMPORT

-
 import numpy as np
 cimport numpy as cnp
 from numpy cimport (ndarray, PyArray_GETITEM,
@@ -105,23 +107,54 @@ def is_scalar(val: object) -> bool:
     """
     Return True if given value is scalar.

-    This includes:
-    - numpy array scalar (e.g. np.int64)
-    - Python builtin numerics
-    - Python builtin byte arrays and strings
-    - None
-    - instances of datetime.datetime
-    - instances of datetime.timedelta
-    - Period
-    - instances of decimal.Decimal
-    - Interval
-    - DateOffset
+    Parameters
+    ----------
+    val : object
+        This includes:
+
+        - numpy array scalar (e.g. np.int64)
+        - Python builtin numerics
+        - Python builtin byte arrays and strings
+        - None
+        - datetime.datetime
+        - datetime.timedelta
+        - Period
+        - decimal.Decimal
+        - Interval
+        - DateOffset
+        - Fraction
+        - Number
+
+    Returns
+    -------
+    bool
+        Return True if given object is scalar, False otherwise
+
+    Examples
+    --------
+    >>> dt = pd.Timestamp(2018, 10, 3)
+    >>> pd.api.types.is_scalar(dt)
+    True
+
+    >>> pd.api.types.is_scalar([2, 3])
+    False
+
+    >>> pd.api.types.is_scalar({0: 1, 2: 3})
+    False
+
+    >>> pd.api.types.is_scalar((0, 2))
+    False
+
+    pandas supports PEP 3141 numbers:

+    >>> from fractions import Fraction
+    >>> pd.api.types.is_scalar(Fraction(3, 5))
+    True
     """

     return (cnp.PyArray_IsAnyScalar(val)
             # As of numpy-1.9, PyArray_IsAnyScalar misses bytearrays on Py3.
-            or isinstance(val, bytes)
+            or isinstance(val, (bytes, Fraction, Number))
             # We differ from numpy (as of 1.10), which claims that None is
             # not scalar in np.isscalar().
             or val is None

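Note on the `is_scalar` change: the new `isinstance` check relies on the `numbers` abstract base classes from PEP 3141. The following is a minimal pure-Python sketch of that check only, not the Cython implementation. The helper name `is_pep3141_scalar` is hypothetical and does not exist in pandas.

```python
from decimal import Decimal
from fractions import Fraction
from numbers import Number


def is_pep3141_scalar(val):
    """Hypothetical helper mirroring only the new isinstance check."""
    # Fraction subclasses numbers.Rational, which is itself a Number,
    # so listing Fraction explicitly is redundant but harmless.
    return isinstance(val, (bytes, Fraction, Number))


print(is_pep3141_scalar(Fraction(3, 5)))  # True
print(is_pep3141_scalar(Decimal("1.5")))  # True
print(is_pep3141_scalar([2, 3]))          # False
```

Because `Number` sits at the root of the numeric tower, any third-party type registered with it would pass this check as well.
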
pandas/core/dtypes/cast.py

+10 -5
@@ -6,7 +6,7 @@

 from pandas._libs import lib, tslib, tslibs
 from pandas._libs.tslibs import OutOfBoundsDatetime, Period, iNaT
-from pandas.compat import PY3, string_types, text_type
+from pandas.compat import PY3, string_types, text_type, to_str

 from .common import (
     _INT64_DTYPE, _NS_DTYPE, _POSSIBLY_CAST_DTYPES, _TD_DTYPE, _string_dtypes,
@@ -1216,11 +1216,16 @@ def construct_1d_arraylike_from_scalar(value, length, dtype):
         if not isinstance(dtype, (np.dtype, type(np.dtype))):
             dtype = dtype.dtype

-        # coerce if we have nan for an integer dtype
-        # GH 22858: only cast to float if an index
-        # (passed here as length) is specified
         if length and is_integer_dtype(dtype) and isna(value):
-            dtype = np.float64
+            # coerce if we have nan for an integer dtype
+            dtype = np.dtype('float64')
+        elif isinstance(dtype, np.dtype) and dtype.kind in ("U", "S"):
+            # we need to coerce to object dtype to allow
+            # numpy to take our string as a scalar value
+            dtype = object
+            if not isna(value):
+                value = to_str(value)
+
         subarr = np.empty(length, dtype=dtype)
         subarr.fill(value)


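Note on the `construct_1d_arraylike_from_scalar` change: the new branch switches string dtypes to `object` before filling, and only stringifies values that are not missing. The following is a rough sketch of that branch using NumPy alone. The helper `fill_from_scalar` and its simplified missing-value test are assumptions for illustration and are not the pandas code.

```python
import numpy as np


def fill_from_scalar(value, length, dtype):
    """Hypothetical stand-in for the string branch of the patched function."""
    dtype = np.dtype(dtype)
    # Treat None and float NaN as missing (a simplification of pandas' isna).
    missing = value is None or (isinstance(value, float) and value != value)
    if dtype.kind in ("U", "S"):
        # Fixed-width string dtypes would stringify the fill value, so
        # switch to object and only stringify non-missing values.
        dtype = np.dtype(object)
        if not missing:
            value = str(value)
    out = np.empty(length, dtype=dtype)
    out.fill(value)
    return out


filled = fill_from_scalar(np.nan, 3, str)
print(filled.dtype)                 # object
print(filled[0] != filled[0])       # True: the element is still NaN
print(fill_from_scalar(1, 2, str))  # ['1' '1']
```

Keeping NaN as a float inside an object array is what lets later missing-value checks still see it as missing.
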
pandas/core/dtypes/common.py

+1 -1
@@ -419,7 +419,7 @@ def is_datetime64_dtype(arr_or_dtype):
         return False
     try:
         tipo = _get_dtype_type(arr_or_dtype)
-    except TypeError:
+    except (TypeError, UnicodeEncodeError):
         return False
     return issubclass(tipo, np.datetime64)


pandas/core/resample.py

+46 -6
@@ -418,23 +418,63 @@ def pad(self, limit=None):

     def nearest(self, limit=None):
         """
-        Fill values with nearest neighbor starting from center
+        Resample by using the nearest value.
+
+        When resampling data, missing values may appear (e.g., when the
+        resampling frequency is higher than the original frequency).
+        The `nearest` method will replace ``NaN`` values that appeared in
+        the resampled data with the value from the nearest member of the
+        sequence, based on the index value.
+        Missing values that existed in the original data will not be modified.
+        If `limit` is given, fill only this many values in each direction for
+        each of the original values.

         Parameters
         ----------
-        limit : integer, optional
-            limit of how many values to fill
+        limit : int, optional
+            Limit of how many values to fill.

             .. versionadded:: 0.21.0

         Returns
         -------
-        an upsampled Series
+        Series or DataFrame
+            An upsampled Series or DataFrame with ``NaN`` values filled with
+            their nearest value.

         See Also
         --------
-        Series.fillna
-        DataFrame.fillna
+        backfill : Backward fill the new missing values in the resampled data.
+        pad : Forward fill ``NaN`` values.
+
+        Examples
+        --------
+        >>> s = pd.Series([1, 2],
+        ...               index=pd.date_range('20180101',
+        ...                                   periods=2,
+        ...                                   freq='1h'))
+        >>> s
+        2018-01-01 00:00:00    1
+        2018-01-01 01:00:00    2
+        Freq: H, dtype: int64
+
+        >>> s.resample('15min').nearest()
+        2018-01-01 00:00:00    1
+        2018-01-01 00:15:00    1
+        2018-01-01 00:30:00    2
+        2018-01-01 00:45:00    2
+        2018-01-01 01:00:00    2
+        Freq: 15T, dtype: int64
+
+        Limit the number of upsampled values imputed by the nearest:
+
+        >>> s.resample('15min').nearest(limit=1)
+        2018-01-01 00:00:00    1.0
+        2018-01-01 00:15:00    1.0
+        2018-01-01 00:30:00    NaN
+        2018-01-01 00:45:00    2.0
+        2018-01-01 01:00:00    2.0
+        Freq: 15T, dtype: float64
         """
         return self._upsample('nearest', limit=limit)


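Note on the `nearest` docstring: the `limit` behaviour shown in its examples can be approximated in plain Python. The following sketch is not how pandas implements it. The function `nearest_fill` is hypothetical, index positions are given as minutes past midnight for illustration, and ties are sent to the right-hand neighbour only because that is what the docstring output shows.

```python
from bisect import bisect_left, bisect_right


def nearest_fill(src_pos, src_val, targets, limit=None):
    """Hypothetical pure-Python sketch; src_pos and targets must be sorted."""
    result = []
    for t in targets:
        li = bisect_right(src_pos, t) - 1  # last source position <= t
        ri = bisect_left(src_pos, t)       # first source position >= t
        candidates = []
        if li >= 0:
            # Targets filled forward from this source, up to and including t.
            steps = sum(1 for x in targets if src_pos[li] < x <= t)
            if limit is None or steps <= limit:
                candidates.append((t - src_pos[li], 1, src_val[li]))
        if ri < len(src_pos):
            # Targets filled backward from this source, down to and including t.
            steps = sum(1 for x in targets if t <= x < src_pos[ri])
            if limit is None or steps <= limit:
                # Flag 0 sorts before flag 1, so equal distances go right.
                candidates.append((src_pos[ri] - t, 0, src_val[ri]))
        best = min(candidates, key=lambda c: c[:2]) if candidates else None
        result.append(best[2] if best else None)
    return result


minutes = [0, 15, 30, 45, 60]
print(nearest_fill([0, 60], [1, 2], minutes))           # [1, 1, 2, 2, 2]
print(nearest_fill([0, 60], [1, 2], minutes, limit=1))  # [1, 1, None, 2, 2]
```

Here `None` stands in for the `NaN` that pandas would produce at 00:30 when `limit=1`.
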
pandas/core/tools/timedeltas.py

+34 -22
@@ -17,31 +17,43 @@

 def to_timedelta(arg, unit='ns', box=True, errors='raise'):
     """
-    Convert argument to timedelta
+    Convert argument to timedelta.
+
+    Timedeltas are absolute differences in times, expressed in different
+    units (e.g. days, hours, minutes, seconds). This method converts
+    an argument from a recognized timedelta format / value into
+    a Timedelta type.

     Parameters
     ----------
-    arg : string, timedelta, list, tuple, 1-d array, or Series
-    unit : str, optional
-        Denote the unit of the input, if input is an integer. Default 'ns'.
-        Possible values:
-        {'Y', 'M', 'W', 'D', 'days', 'day', 'hours', hour', 'hr', 'h',
-        'm', 'minute', 'min', 'minutes', 'T', 'S', 'seconds', 'sec', 'second',
-        'ms', 'milliseconds', 'millisecond', 'milli', 'millis', 'L',
-        'us', 'microseconds', 'microsecond', 'micro', 'micros', 'U',
-        'ns', 'nanoseconds', 'nano', 'nanos', 'nanosecond', 'N'}
-    box : boolean, default True
-        - If True returns a Timedelta/TimedeltaIndex of the results
-        - if False returns a np.timedelta64 or ndarray of values of dtype
-          timedelta64[ns]
+    arg : str, timedelta, list-like or Series
+        The data to be converted to timedelta.
+    unit : str, default 'ns'
+        Denotes the unit of the arg. Possible values:
+        ('Y', 'M', 'W', 'D', 'days', 'day', 'hours', 'hour', 'hr',
+        'h', 'm', 'minute', 'min', 'minutes', 'T', 'S', 'seconds',
+        'sec', 'second', 'ms', 'milliseconds', 'millisecond',
+        'milli', 'millis', 'L', 'us', 'microseconds', 'microsecond',
+        'micro', 'micros', 'U', 'ns', 'nanoseconds', 'nano', 'nanos',
+        'nanosecond', 'N').
+    box : bool, default True
+        - If True returns a Timedelta/TimedeltaIndex of the results.
+        - If False returns a numpy.timedelta64 or numpy.ndarray of
+          values of dtype timedelta64[ns].
     errors : {'ignore', 'raise', 'coerce'}, default 'raise'
-        - If 'raise', then invalid parsing will raise an exception
-        - If 'coerce', then invalid parsing will be set as NaT
-        - If 'ignore', then invalid parsing will return the input
+        - If 'raise', then invalid parsing will raise an exception.
+        - If 'coerce', then invalid parsing will be set as NaT.
+        - If 'ignore', then invalid parsing will return the input.

     Returns
     -------
-    ret : timedelta64/arrays of timedelta64 if parsing succeeded
+    timedelta64 or numpy.array of timedelta64
+        Output type returned if parsing succeeded.
+
+    See Also
+    --------
+    DataFrame.astype : Cast argument to a specified dtype.
+    to_datetime : Convert argument to datetime.

     Examples
     --------
@@ -69,10 +81,10 @@ def to_timedelta(arg, unit='ns', box=True, errors='raise'):
     TimedeltaIndex(['0 days', '1 days', '2 days', '3 days', '4 days'],
                    dtype='timedelta64[ns]', freq=None)

-    See Also
-    --------
-    pandas.DataFrame.astype : Cast argument to a specified dtype.
-    pandas.to_datetime : Convert argument to datetime.
+    Returning an ndarray by using the 'box' keyword argument:
+
+    >>> pd.to_timedelta(np.arange(5), box=False)
+    array([0, 1, 2, 3, 4], dtype='timedelta64[ns]')
     """
     unit = parse_timedelta_unit(unit)


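Note on the `to_timedelta` docstring: the roles of `unit` and `errors` can be illustrated with the standard library alone. This is a sketch under stated assumptions, not the pandas implementation. The function `to_timedelta_sketch` and the `_UNIT_TO_KWARG` table are hypothetical, the table covers only the aliases that map onto `datetime.timedelta` keywords (no years, months or nanoseconds), and `None` stands in for `NaT`.

```python
from datetime import timedelta

# Assumed subset of the unit aliases listed in the docstring, mapped onto
# datetime.timedelta keyword arguments.
_UNIT_TO_KWARG = {
    "W": "weeks",
    "D": "days", "day": "days", "days": "days",
    "h": "hours", "hr": "hours", "hour": "hours", "hours": "hours",
    "m": "minutes", "min": "minutes", "minute": "minutes",
    "minutes": "minutes", "T": "minutes",
    "S": "seconds", "sec": "seconds", "second": "seconds",
    "seconds": "seconds",
    "ms": "milliseconds", "milli": "milliseconds",
    "millis": "milliseconds", "millisecond": "milliseconds",
    "milliseconds": "milliseconds", "L": "milliseconds",
    "us": "microseconds", "micro": "microseconds",
    "micros": "microseconds", "microsecond": "microseconds",
    "microseconds": "microseconds", "U": "microseconds",
}


def to_timedelta_sketch(value, unit, errors="raise"):
    """Hypothetical scalar-only illustration of unit and errors."""
    kwarg = _UNIT_TO_KWARG[unit]  # KeyError for units outside the subset
    try:
        return timedelta(**{kwarg: value})
    except TypeError:
        if errors == "raise":
            raise
        # 'coerce' yields the NaT stand-in, 'ignore' returns the input.
        return None if errors == "coerce" else value


print(to_timedelta_sketch(90, "min"))                    # 1:30:00
print(to_timedelta_sketch("abc", "D", errors="coerce"))  # None
print(to_timedelta_sketch("abc", "D", errors="ignore"))  # abc
```
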