FilesRepository Components
Overview
The FilesRepository is a modular component system for file operations in the Data Machine pipeline system. Introduced in version 0.2.1, it centralizes file handling functionality and reduces code duplication across handlers.
Architecture
Location: /inc/Core/FilesRepository/
Components: 6 specialized classes
Since: 0.2.1
Components
DirectoryManager
File: DirectoryManager.php
Purpose: Directory creation and path management
use DataMachine\Core\FilesRepository\DirectoryManager;
$dir_manager = new DirectoryManager();
$pipeline_dir = $dir_manager->get_pipeline_directory($pipeline_id);
$flow_dir = $dir_manager->get_flow_directory($pipeline_id, $flow_id);
$job_dir = $dir_manager->get_job_directory($pipeline_id, $flow_id, $job_id);
Key Methods:
- get_pipeline_directory($pipeline_id): Get pipeline directory
- get_flow_directory($pipeline_id, $flow_id): Get flow directory
- get_job_directory($pipeline_id, $flow_id, $job_id): Get job directory
- get_flow_files_directory($pipeline_id, $flow_id): Get flow file storage directory
- get_pipeline_context_directory($pipeline_id, $pipeline_name): Get pipeline context directory
- ensure_directory_exists($directory): Create directory if it does not exist
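The getters only resolve paths; a minimal sketch of pairing them with ensure_directory_exists() before writing, assuming the getters do not create directories themselves and that ensure_directory_exists() accepts the returned path (its return value is not documented here):
use DataMachine\Core\FilesRepository\DirectoryManager;
$dir_manager = new DirectoryManager();
// Resolve the flow file storage directory and make sure it exists before writing.
// Assumption: the getter returns the path only; ensure_directory_exists() creates it if missing.
$files_dir = $dir_manager->get_flow_files_directory($pipeline_id, $flow_id);
$dir_manager->ensure_directory_exists($files_dir);
file_put_contents(trailingslashit($files_dir) . 'example.txt', 'hello');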
FileStorage
File: FileStorage.php
Purpose: File operations and flow-isolated storage
use DataMachine\Core\FilesRepository\FileStorage;
$storage = new FileStorage();
$context = [
    'pipeline_id' => $pipeline_id,
    'flow_id' => $flow_id,
];
$stored_path = $storage->store_file($source_path, $filename, $context);
Key Methods:
- store_file($source_path, $filename, $context): Copy a local file into flow file storage
- store_pipeline_file($pipeline_id, $pipeline_name, $file_data): Store a pipeline context file
- get_all_files($context): List files for a flow
- get_pipeline_files($pipeline_id, $pipeline_name): List pipeline context files
- delete_file($filename, $context): Delete a stored file
- store_data_packet($data, $job_id, $context): Persist step data for a job
- retrieve_data_packet($reference): Read a persisted data packet
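The data packet methods are not covered by the example above; a minimal sketch of a round trip, assuming store_data_packet() returns a reference that retrieve_data_packet() accepts (the method list implies this but does not state the return type) and using an illustrative payload:
use DataMachine\Core\FilesRepository\FileStorage;
$storage = new FileStorage();
$context = [
    'pipeline_id' => $pipeline_id,
    'flow_id' => $flow_id,
];
// Persist step output for a job, then read it back via the returned reference.
// Assumption: store_data_packet() returns a reference usable by retrieve_data_packet().
$reference = $storage->store_data_packet(['items' => $items], $job_id, $context);
$packet = $storage->retrieve_data_packet($reference);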
FileCleanup
File: FileCleanup.php
Purpose: Retention policy enforcement and cleanup
use DataMachine\Core\FilesRepository\FileCleanup;
$cleanup = new FileCleanup();
// Automatic cleanup via scheduled action
Key Features:
- Scheduled cleanup of old files
- Retention policy enforcement
- Job data cleanup on failure
- Configurable retention periods
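A minimal sketch of wiring FileCleanup to the datamachine_cleanup_old_files hook registered under Scheduled Cleanup below; cleanup_old_files() is a hypothetical entry point used only for illustration and is not confirmed by this document:
use DataMachine\Core\FilesRepository\FileCleanup;
// Run retention cleanup when the scheduled action fires.
// Assumption: cleanup_old_files() is a hypothetical method name; check FileCleanup.php for the actual entry point.
add_action('datamachine_cleanup_old_files', function () {
    $cleanup = new FileCleanup();
    $cleanup->cleanup_old_files();
});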
ImageValidator
File: ImageValidator.php
Purpose: Image validation and metadata extraction
use DataMachine\Core\FilesRepository\ImageValidator;
$validator = new ImageValidator();
$validation = $validator->validate_image_file($file_path);
if ($validation['valid']) {
    $metadata = $validation['metadata'];
    // width, height, mime_type, file_size, etc.
}
Key Methods:
- validate_image_file($file_path): Validate image and extract metadata
- is_valid_image_type($mime_type): Check if MIME type is supported
- get_image_dimensions($file_path): Get image width/height
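The helper methods can also be used on their own; a short sketch, assuming get_image_dimensions() returns width and height values (the exact return shape is not documented here):
use DataMachine\Core\FilesRepository\ImageValidator;
$validator = new ImageValidator();
// Check the MIME type before doing any heavier validation work.
if ($validator->is_valid_image_type('image/jpeg')) {
    // Assumption: the shape of the returned dimensions is illustrative only.
    $dimensions = $validator->get_image_dimensions($file_path);
}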
RemoteFileDownloader
File: RemoteFileDownloader.php
Purpose: Remote file downloading with validation
use DataMachine\Core\FilesRepository\RemoteFileDownloader;
$downloader = new RemoteFileDownloader();
$context = [
    'pipeline_id' => $pipeline_id,
    'flow_id' => $flow_id,
];
$result = $downloader->download_remote_file($url, $filename, $context);
if ($result) {
    $local_path = $result['path'];
    $stored_filename = $result['filename'];
    $file_url = $result['url'];
}
Key Methods:
download_remote_file($url, $filename, $context, $options): Download remote file and store in flow files directory
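The $options parameter is not used in the example above; a hedged sketch of the four-argument form, where 'timeout' is a hypothetical option key shown only to illustrate the argument (the supported keys are not documented here):
use DataMachine\Core\FilesRepository\RemoteFileDownloader;
$downloader = new RemoteFileDownloader();
$context = [
    'pipeline_id' => $pipeline_id,
    'flow_id' => $flow_id,
];
// Assumption: 'timeout' is a hypothetical option key used for illustration only.
$result = $downloader->download_remote_file($url, $filename, $context, ['timeout' => 30]);
if (!$result) {
    // Download or validation failed; handle the falsy return.
}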
FileRetrieval
File: FileRetrieval.php
Purpose: Data retrieval operations from flow-isolated file storage
Separated from FileStorage per the Single Responsibility Principle: FileStorage handles write operations, while FileRetrieval handles read operations.
use DataMachine\Core\FilesRepository\FileRetrieval;
$file_retrieval = new FileRetrieval();
$file_data = $file_retrieval->retrieve_data_by_job_id($job_id, [
    'pipeline_id' => $pipeline_id,
    'pipeline_name' => $pipeline_name,
    'flow_id' => $flow_id,
    'flow_name' => $flow_name
]);
Key Methods:
- retrieve_data_by_job_id($job_id, $context): Retrieve all file data for a specific job
Context Requirements:
- pipeline_id – Pipeline identifier
- pipeline_name – Pipeline name for directory path
- flow_id – Flow identifier
- flow_name – Flow name for directory path
Integration Pattern
Components work together for complete file handling:
Note: When mapping flow_step_id -> flow_id, the REST API uses the datamachine_get_flow_id_from_step filter (see datamachine/inc/Api/Files.php:168). Implement this filter when connecting flow-step-aware file operations from extensions; a sketch follows the integration example below.
use DataMachine\Core\FilesRepository\{
    DirectoryManager,
    FileStorage,
    ImageValidator,
    RemoteFileDownloader
};
// Download and validate image
$downloader = new RemoteFileDownloader();
$context = [
    'pipeline_id' => $pipeline_id,
    'flow_id' => $flow_id,
];
$result = $downloader->download_remote_file($image_url, $filename, $context);
if ($result) {
    $validator = new ImageValidator();
    $validation = $validator->validate_image_file($result['path']);
    if ($validation['valid']) {
        // Image is valid and stored
        $image_path = $result['path'];
    }
}
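A minimal sketch of the datamachine_get_flow_id_from_step filter referenced in the note above, assuming the callback receives a default flow ID and the flow_step_id and should return the resolved flow ID; my_extension_lookup_flow_id() is a hypothetical helper, and the actual signature should be confirmed against datamachine/inc/Api/Files.php:
// Assumption: the filter passes ($flow_id, $flow_step_id); confirm the signature in inc/Api/Files.php.
add_filter('datamachine_get_flow_id_from_step', function ($flow_id, $flow_step_id) {
    // my_extension_lookup_flow_id() is a hypothetical lookup for flow steps owned by this extension.
    return my_extension_lookup_flow_id($flow_step_id) ?: $flow_id;
}, 10, 2);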
Directory Structure
Files are organized under the datamachine-files uploads directory, grouped by pipeline then flow:
wp-content/uploads/datamachine-files/
└── pipeline-5/
├── context/
│ └── example.pdf
└── flow-42/
├── flow-42-files/
│ ├── image1.jpg
│ └── document.pdf
└── jobs/
└── job-123/
└── data.json
Scheduled Cleanup
Automatic cleanup is handled via WordPress Action Scheduler:
// Scheduled daily cleanup
if (!as_next_scheduled_action('datamachine_cleanup_old_files')) {
    as_schedule_recurring_action(
        time(),
        DAY_IN_SECONDS,
        'datamachine_cleanup_old_files'
    );
}
Benefits
- Modularity: Specialized components for different file operations
- Isolation: Flow-specific directories prevent conflicts
- Validation: Built-in image and file validation
- Cleanup: Automatic retention policy enforcement
- Consistency: Standardized file handling across all handlers
Used By
The FilesRepository modular components are used by:
- Files Handler – Primary consumer of all components
- PublishHandler Base Class – Uses ImageValidator for validation
- Engine Actions – Use FileCleanup for retention policy enforcement
The modular architecture allows individual components to be used independently throughout the system.