-
Notifications
You must be signed in to change notification settings - Fork 53
/
Copy pathrange_read.Rd
168 lines (150 loc) · 6.78 KB
/
range_read.Rd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/range_read.R
\name{range_read}
\alias{range_read}
\alias{read_sheet}
\title{Read a Sheet into a data frame}
\usage{
range_read(
ss,
sheet = NULL,
range = NULL,
col_names = TRUE,
col_types = NULL,
na = "",
trim_ws = TRUE,
skip = 0,
n_max = Inf,
guess_max = min(1000, n_max),
.name_repair = "unique"
)
read_sheet(
ss,
sheet = NULL,
range = NULL,
col_names = TRUE,
col_types = NULL,
na = "",
trim_ws = TRUE,
skip = 0,
n_max = Inf,
guess_max = min(1000, n_max),
.name_repair = "unique"
)
}
\arguments{
\item{ss}{Something that identifies a Google Sheet:
\itemize{
\item its file id as a string or \code{\link[googledrive:drive_id]{drive_id}}
\item a URL from which we can recover the id
\item a one-row \code{\link[googledrive:dribble]{dribble}}, which is how googledrive
represents Drive files
\item an instance of \code{googlesheets4_spreadsheet}, which is what \code{\link[=gs4_get]{gs4_get()}}
returns
}
Processed through \code{\link[=as_sheets_id]{as_sheets_id()}}.}
\item{sheet}{Sheet to read, in the sense of "worksheet" or "tab". You can identify a sheet by name, with a string, or by position, with a number. Ignored if the sheet is specified via \code{range}. If neither argument specifies the sheet, defaults to the first visible sheet.}
\item{range}{A cell range to read from. If \code{NULL}, all non-empty cells are
read. Otherwise specify \code{range} as described in \href{https://developers.google.com/sheets/api/guides/concepts#a1_notation}{Sheets A1 notation}
or using the helpers documented in \link{cell-specification}. Sheets uses
fairly standard spreadsheet range notation, although a bit different from
Excel. Examples of valid ranges: \code{"Sheet1!A1:B2"}, \code{"Sheet1!A:A"},
\code{"Sheet1!1:2"}, \code{"Sheet1!A5:A"}, \code{"A1:B2"}, \code{"Sheet1"}. Interpreted
strictly, even if the range forces the inclusion of leading, trailing, or
embedded empty rows or columns. Takes precedence over \code{skip}, \code{n_max} and
\code{sheet}. Note \code{range} can be a named range, like \code{"sales_data"}, without
any cell reference.}
\item{col_names}{\code{TRUE} to use the first row as column names, \code{FALSE} to get
default names, or a character vector to provide column names directly. If
user provides \code{col_types}, \code{col_names} can have one entry per column or one
entry per unskipped column.}
\item{col_types}{Column types. Either \code{NULL} to guess all from the
spreadsheet or a string of readr-style shortcodes, with one character or
code per column. If exactly one \code{col_type} is specified, it is recycled.
See Column Specification for more.}
\item{na}{Character vector of strings to interpret as missing values. By
default, blank cells are treated as missing data.}
\item{trim_ws}{Logical. Should leading and trailing whitespace be trimmed
from cell contents?}
\item{skip}{Minimum number of rows to skip before reading anything, be it
column names or data. Leading empty rows are automatically skipped, so this
is a lower bound. Ignored if \code{range} is given.}
\item{n_max}{Maximum number of data rows to parse into the returned tibble.
Trailing empty rows are automatically skipped, so this is an upper bound on
the number of rows in the result. Ignored if \code{range} is given. \code{n_max} is
imposed locally, after reading all non-empty cells, so, if speed is an
issue, it is better to use \code{range}.}
\item{guess_max}{Maximum number of data rows to use for guessing column
types.}
\item{.name_repair}{Handling of column names. By default, googlesheets4
ensures column names are not empty and are unique. There is full support
for \code{.name_repair} as documented in \code{\link[tibble:tibble]{tibble::tibble()}}.}
}
\value{
A \link[tibble:tibble-package]{tibble}
}
\description{
This is the main "read" function of the googlesheets4 package. It goes by two
names, because we want it to make sense in two contexts:
\itemize{
\item \code{read_sheet()} evokes other table-reading functions, like
\code{readr::read_csv()} and \code{readxl::read_excel()}. The \code{sheet} in this case
refers to a Google (spread)Sheet.
\item \code{range_read()} is the right name according to the naming convention used
throughout the googlesheets4 package.
}
\code{read_sheet()} and \code{range_read()} are synonyms and you can use either one.
}
\section{Column Specification}{
Column types must be specified in a single string of readr-style short
codes, e.g. "cci?l" means "character, character, integer, guess, logical".
This is not where googlesheets4's col spec will end up, but it gets the
ball rolling in a way that is consistent with readr and doesn't reinvent
any wheels.
Shortcodes for column types:
\itemize{
\item \verb{_} or \code{-}: Skip. Data in a skipped column is still requested from the
API (the high-level functions in this package are rectangle-oriented), but
is not parsed into the data frame output.
\item \verb{?}: Guess. A type is guessed for each cell and then a consensus type is
selected for the column. If no atomic type is suitable for all cells, a
list-column is created, in which each cell is converted to an R object of
"best" type. If no column types are specified, i.e. \code{col_types = NULL},
all types are guessed.
\item \code{l}: Logical.
\item \code{i}: Integer. This type is never guessed from the data, because Sheets
have no formal cell type for integers.
\item \code{d} or \code{n}: Numeric, in the sense of "double".
\item \code{D}: Date. This type is never guessed from the data, because date cells
are just serial datetimes that bear a "date" format.
\item \code{t}: Time of day. This type is never guessed from the data, because time
cells are just serial datetimes that bear a "time" format. \emph{Not implemented
yet; returns POSIXct.}
\item \code{T}: Datetime, specifically POSIXct.
\item \code{c}: Character.
\item \code{C}: Cell. This type is unique to googlesheets4. This returns raw cell
data, as an R list, which consists of everything sent by the Sheets API for
that cell. Has S3 type of \code{"CELL_SOMETHING"} and \code{"SHEETS_CELL"}. Mostly
useful internally, but exposed for those who want direct access to, e.g.,
formulas and formats.
\item \code{L}: List, as in "list-column". Each cell is a length-1 atomic vector of
its discovered type.
\item \emph{Still to come}: duration (code will be \code{:}) and factor (code will be
\code{f}).
}
}
\examples{
\dontshow{if (gs4_has_token()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
ss <- gs4_example("deaths")
read_sheet(ss, range = "A5:F15")
read_sheet(ss, range = "other!A5:F15", col_types = "ccilDD")
read_sheet(ss, range = "arts_data", col_types = "ccilDD")
read_sheet(gs4_example("mini-gap"))
read_sheet(
gs4_example("mini-gap"),
sheet = "Europe",
range = "A:D",
col_types = "ccid"
)
\dontshow{\}) # examplesIf}
}