Google Sheets Fetch Handler
Reads data from Google Sheets spreadsheets using OAuth2 authentication with configurable cell ranges, header detection, and row processing limits.
Architecture
Base Class: Extends FetchHandler (@since v0.2.1)
Inherited Functionality:
- Automatic deduplication via
isItemProcessed()andmarkItemProcessed() - Engine data storage via
storeEngineData()for downstream handlers - Standardized responses via
successResponse(),emptyResponse(),errorResponse() - Centralized logging and error handling
Implementation: Uses DataPacket class for consistent packet structure
Authentication
OAuth2 Required: Uses Google Sheets API v4 with client_id/client_secret authentication.
Centralized Provider (@since v0.2.5): Uses GoogleSheetsAuth from /inc/Core/OAuth/Providers/GoogleSheetsAuth.php for centralized OAuth2 authentication shared across fetch and publish handlers.
Service Integration: OAuth provider handles token refresh, service access, and credential management for seamless bi-directional integration.
Related Documentation: See OAuth Handlers for OAuth2Handler and OAuth Providers architecture.
Configuration Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
spreadsheet_id |
string | Yes | Google Sheets spreadsheet ID from URL |
worksheet_name |
string | No | Target worksheet name (default: "Sheet1") |
cell_range |
string | No | A1 notation range (default: "A1:Z1000") |
has_header_row |
boolean | No | Whether first row contains headers (default: false) |
row_limit |
integer | No | Maximum rows to process (1-1000, default: 100) |
Usage Examples
Basic Spreadsheet Read:
$handler_config = [
'googlesheets_fetch' => [
'spreadsheet_id' => '1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms'
]
];
Advanced Configuration:
$handler_config = [
'googlesheets_fetch' => [
'spreadsheet_id' => '1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms',
'worksheet_name' => 'Data',
'cell_range' => 'A1:F200',
'has_header_row' => true,
'row_limit' => 50
]
];
Data Processing
Row Selection: Processes eligible rows up to the specified limit, skipping empty rows and previously processed entries.
Header Handling: When has_header_row is true, uses first row values as column names. Otherwise generates column names as "Column_A", "Column_B", etc.
Deduplication: Uses unique identifier format: {spreadsheet_id}_{worksheet_name}_row_{row_number} for tracking processed rows.
Content Format
Row Content:
Source: Google Sheets
Spreadsheet: {spreadsheet_id}
Worksheet: {worksheet_name}
Row: {row_number}
{Header_or_Column_Name}: {cell_value}
{Header_or_Column_Name}: {cell_value}
Output Structure
Database Storage + Filter Injection Architecture
The Google Sheets fetch handler generates clean data packets for AI processing while storing empty engine parameters in database (spreadsheet data has no URLs).
Clean Data Packet (AI-Visible)
[
'data' => [
'content_string' => '...', // Formatted row data
'file_info' => null // No file info for spreadsheet data
],
'metadata' => [
'source_type' => 'googlesheets_fetch',
'original_id' => 'unique_row_identifier',
'spreadsheet_id' => 'spreadsheet_id',
'worksheet_name' => 'worksheet_name',
'row_number' => 'row_index',
'row_data' => { // Key-value pairs of row data
'column_name': 'cell_value'
},
'headers' => ['col1', 'col2'], // Column headers if present
'original_date_gmt' => 'current_timestamp'
// Clean data packet for AI processing
]
]
Engine Parameters Storage
// Stored in database via centralized datamachine_engine_data filter (array storage)
if ($job_id) {
apply_filters('datamachine_engine_data', null, $job_id, [
'source_url' => '', // Empty source URL for spreadsheet data
'image_url' => '' // Empty image URL for spreadsheet data
]);
}
Return Structure
return [
'processed_items' => [$clean_data_packet]
// Engine parameters stored separately in database
];
Cell Range Validation
A1 Notation: Supports standard Google Sheets A1 notation (e.g., A1:D100, B2:Z1000).
Range Limits: Validates cell range format using regex pattern for proper column letters and row numbers.
Processing Order: Processes rows sequentially from top to bottom within specified range.
Error Handling
Configuration Errors:
- Missing or invalid spreadsheet ID
- Malformed cell range notation
- Invalid worksheet names
API Errors:
- Authentication failures
- Spreadsheet access permissions
- API rate limiting
- Invalid spreadsheet or worksheet references
Data Errors:
- Empty spreadsheet or range
- Malformed row data
Logging: Uses datamachine_log action with debug/error levels for API calls, authentication, and data processing status.
Row Processing
Empty Row Handling: Automatically skips rows with no meaningful content (all empty cells).
Data Extraction: Extracts only non-empty cells from each row, associating with appropriate column headers or generated column names.
Batch Processing: Returns all eligible rows (up to limit) as separate DataPackets for downstream processing.