rpy2 / R interface
Warning
Up to pandas 0.19, a pandas.rpy module existed with functionality to convert between pandas and rpy2 objects. This functionality now lives in the rpy2 project itself. See the updating section of the previous documentation for a guide to porting your code from the removed pandas.rpy module to rpy2 functions.
rpy2 is an interface to R running embedded in a Python process, and also includes functionality to deal with pandas DataFrames.
Converting data frames back and forth between rpy2 and pandas should be largely
automated (no need to convert explicitly, it will be done on the fly in most
rpy2 functions).
To convert explicitly, use the functions pandas2ri.py2ri() and pandas2ri.ri2py().
See also the documentation of the rpy2 project: https://rpy2.readthedocs.io.
The remainder of this page gives a few examples of explicit conversion. The pandas conversion of rpy2 first needs to be activated:
In [1]: from rpy2.robjects import r, pandas2ri
In [2]: pandas2ri.activate()
Transferring R data sets into Python
Once the pandas conversion is activated (pandas2ri.activate()), many conversions of R objects to pandas objects will be done automatically. For example, to obtain the ‘iris’ dataset as a pandas DataFrame:
In [3]: r.data('iris')
In [4]: r['iris'].head()
If the pandas conversion has not been activated, the same result can be obtained by converting the R object explicitly with the pandas2ri.ri2py function (pandas2ri.ri2py(r['iris'])).
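As a minimal sketch of this explicit route, the helper below loads a named R data set and converts it with pandas2ri.ri2py, without calling pandas2ri.activate(). The helper name load_r_dataset is hypothetical. The sketch assumes a working R installation and an rpy2 release that still exposes ri2py under the name used on this page; other rpy2 releases may name the conversion functions differently. The import is deferred into the function body so that defining the helper does not itself require rpy2.

```python
def load_r_dataset(name):
    """Return the R data set `name` as a pandas DataFrame.

    Hypothetical helper: assumes rpy2 exposes pandas2ri.ri2py,
    as described on this page.
    """
    # Deferred import: rpy2 (and R) are only needed when the helper is called.
    from rpy2.robjects import r, pandas2ri

    # Make the data set available in the embedded R session.
    r.data(name)

    # Explicit conversion; pandas2ri.activate() is not required here.
    return pandas2ri.ri2py(r[name])
```

Calling load_r_dataset('iris').head() would then correspond to the r['iris'].head() example above.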
Converting DataFrames into R objects
The pandas2ri.py2ri function supports the reverse operation, converting a DataFrame into the equivalent R object (that is, a data.frame):
In [5]: df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C':[7,8,9]},
...: index=["one", "two", "three"])
...:
In [6]: r_dataframe = pandas2ri.py2ri(df)
In [7]: print(type(r_dataframe))
In [8]: print(r_dataframe)
The DataFrame’s index is stored as the rownames attribute of the data.frame instance.
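One way to check this is to convert a DataFrame and read the row names back, as in the sketch below. The helper name index_as_rownames is hypothetical. It assumes a working R installation and an rpy2 release that provides pandas2ri.py2ri under the name used on this page, returning an R data.frame wrapper with a rownames attribute. The import is again deferred so that the definition alone does not depend on rpy2.

```python
def index_as_rownames(df):
    """Convert `df` to an R data.frame and return its row names as a list.

    Hypothetical helper: assumes rpy2 exposes pandas2ri.py2ri,
    as described on this page.
    """
    # Deferred import: rpy2 (and R) are only needed when the helper is called.
    from rpy2.robjects import pandas2ri

    # Explicit pandas -> R conversion.
    r_dataframe = pandas2ri.py2ri(df)

    # The DataFrame's index is expected to appear as the R row names.
    return list(r_dataframe.rownames)
```

For the df defined above, the returned list would be expected to contain the index labels "one", "two" and "three".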