File Management and Column Mappings¶
Overview¶
The Files module handles uploads (CSV, XLSX, XLS, TXT) and maps each column to semantic meaning through column mappings.
These mappings let pipeline operations find the right data even when source column names vary between uploads.
Uploading a File¶
Files are uploaded through files:upload.
After upload, the system checks prior mapping history for the same organization and pre-populates mappings when possible.
Auto-Matched Mappings¶
When a new upload has exactly the same columns in the same order as a previously uploaded file with saved mappings, the mappings are copied automatically.
Matching Rules¶
- Only files in the same organization are considered.
- Column names and order must match exactly.
- The latest matching file is used.
- Only `ColumnMapping` rows are copied.
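The exact-match lookup can be sketched as a newest-first scan over prior uploads. The function name and the shape of `history` below are illustrative, not the actual implementation:

```python
def find_latest_exact_match(current_columns, history):
    """Return the mappings of the most recent prior upload whose
    columns match exactly (same names, same order, same count).

    `history` is assumed to be ordered newest-first and to contain
    (columns, mappings) pairs; both shapes are illustrative.
    """
    for columns, mappings in history:
        if columns == current_columns:  # exact name-and-order comparison
            return mappings
    return None  # no exact match: fall back to suggestions
```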
User Behavior¶
After upload, users are redirected with ?auto_mapped=1 and see:
- A one-time success banner per file, tracked in browser session storage
- Auto-matched badges on mapped rows
- A badge in the Map Columns card header
- A help modal opened from the "What is this?" action
Users can still edit mappings before saving.
Implementation Notes¶
In prismio.files.views:
- `_get_file_columns(uploaded_file)` returns ordered file columns
- `_get_latest_exact_match_mappings(uploaded_file, current_columns)` finds the most recent exact match and returns a mapping dictionary
Suggested (Default) Mappings¶
If there is no exact prior match, the view still suggests defaults from the latest mapping for matching column names in the same organization.
Suggested mappings are not persisted until users click Save Mappings.
User Behavior¶
- Info banner explains suggestions are not saved yet
- Suggested badge appears per pre-selected row
- Unrecognized columns still default to Custom
Implementation Notes¶
- `_get_latest_column_defaults(uploaded_file, current_columns)` returns `{column_name: column_type}` from the latest history
- Suggestions are passed as `existing_mappings` and are not written to the DB
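A minimal sketch of the defaults lookup, assuming history is a newest-first list of saved `{column: type}` dicts; all names here are illustrative:

```python
def suggest_column_defaults(current_columns, history):
    """Build {column_name: column_type} suggestions for the current file.

    Walks prior mappings newest-first; the latest type saved for a
    matching column name wins. Columns never mapped before get no entry
    (the UI defaults those to Custom).
    """
    suggestions = {}
    for mapping in history:
        for column, column_type in mapping.items():
            if column in current_columns and column not in suggestions:
                suggestions[column] = column_type
    return suggestions
```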
Viewing and Editing Mappings¶
files:view_file provides a data preview and mapping form.
Context values used in the UI:
- `existing_mappings`: saved mappings or suggestions
- `mappings_status`: `Mapped` or `Not Mapped`
- `auto_matched_mappings`: true when loaded with `?auto_mapped=1`
- `auto_matched_columns`: list of auto-filled columns
- `has_mapping_suggestions`: true when suggestions exist without saved mappings
- `suggested_mapping_columns`: list of suggested columns
POSTing to files:view_file saves mappings via ColumnMapping.update_or_create.
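The save is an upsert keyed per file and column, so re-submitting the form updates rows in place rather than duplicating them. A framework-free sketch of that `update_or_create` semantic (store shape and names are illustrative):

```python
def upsert_mapping(store, file_id, column_name, column_type):
    """Mimic update_or_create: one row per (file, column); returns
    (value, created) like Django's helper does."""
    key = (file_id, column_name)
    created = key not in store
    store[key] = column_type
    return column_type, created
```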
Column Mapping Choices¶
ColumnDataMappings (a TextChoices enum on ColumnMapping) includes:
- Personal info fields
- Contact fields
- Address fields
- Birth date components
- Gender and sex
- Employment and education fields
- Social fields and URLs
- Custom for unmapped columns
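The actual members of `ColumnDataMappings` are not listed in this doc, so the sketch below uses a plain `str` enum with illustrative members; the real class is a Django `TextChoices` on `ColumnMapping`, which behaves the same for value lookup. The Custom fallback mirrors the behavior described above:

```python
from enum import Enum


class ColumnDataMappings(str, Enum):
    # Illustrative members only; the real enum lives on ColumnMapping.
    FIRST_NAME = "first_name"
    EMAIL = "email"
    CUSTOM = "custom"


def resolve_mapping(value: str) -> ColumnDataMappings:
    """Fall back to CUSTOM for unrecognized column types."""
    try:
        return ColumnDataMappings(value)
    except ValueError:
        return ColumnDataMappings.CUSTOM
```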
Pipeline Operations¶
Pipelines run ordered PipelineStep.operation_key entries through a registry.
Execution Flow¶
- `run_pipeline` iterates steps by `step_order`, then `id`
- Each `operation_key` resolves from `OPERATION_REGISTRY`
- The operation is instantiated with `uploaded_file` and the current dataframe
- `operation.run()` executes `setup()`, `process()`, `teardown()`
- The output dataframe is passed to the next step
If output_path is provided, final output is written as CSV.
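The flow above can be sketched without the framework. The `Operation` base, the step fields, and the registry shape below are simplified stand-ins for the real classes, not the actual implementation:

```python
class Operation:
    """Minimal stand-in for the real base class (illustrative)."""

    def __init__(self, uploaded_file, dataframe):
        self.uploaded_file = uploaded_file
        self.dataframe = dataframe

    def setup(self):
        pass

    def process(self):
        raise NotImplementedError

    def teardown(self):
        pass

    def run(self):
        # run() drives the three-phase lifecycle described above.
        self.setup()
        self.process()
        self.teardown()
        return self.dataframe


def run_pipeline(steps, registry, uploaded_file, dataframe):
    """Sort steps by (step_order, id), resolve each operation_key from
    the registry, and thread the dataframe through each operation."""
    for step in sorted(steps, key=lambda s: (s["step_order"], s["id"])):
        operation_cls = registry[step["operation_key"]]
        dataframe = operation_cls(uploaded_file, dataframe).run()
    return dataframe
```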
Add a New Operation¶
Operations live in `prismio.files.processing.operations` and inherit from `Operation`.
Each operation should define:
- `key`
- `label`
- `required_fields`
- `requirement_logic`
- `process(self)`
Example:
```python
from prismio.files.models import ColumnDataMappings
from prismio.files.processing.base import FieldRequirement, MatchType, Operation


class NormalizeEmailOperation(Operation):
    key = "normalize_email"
    label = "Normalize Email"
    required_fields = (
        FieldRequirement(
            alias="email",
            field_name=ColumnDataMappings.EMAIL.value,
            match_type=MatchType.CONCEPT,
        ),
    )
    requirement_logic = "email"

    def process(self) -> None:
        self.dataframe["email"] = (
            self.dataframe["email"].astype(str).str.strip().str.casefold()
        )
```
Register the Operation¶
Add the class to prismio.files.processing.registry:
```python
from .operations import NormalizeEmailOperation

OPERATION_REGISTRY: dict[str, type[Operation]] = {
    ProcessFirstNameOperation.key: ProcessFirstNameOperation,
    ProcessFullNameOperation.key: ProcessFullNameOperation,
    NormalizeEmailOperation.key: NormalizeEmailOperation,
}
```
Registration is required for:
- `PipelineStep.clean()` key validation
- Admin dropdown choices
- Runtime operation lookup
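The `clean()` check reduces to a registry membership test; sketched here with a plain `ValueError` in place of Django's `ValidationError`, and with illustrative names:

```python
def validate_operation_key(operation_key, registry):
    """Reject keys that are not registered, as PipelineStep.clean() does,
    so broken pipelines fail at save time rather than at run time."""
    if operation_key not in registry:
        raise ValueError(f"Unknown operation_key: {operation_key!r}")
```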
Compose Pipelines¶
Admin UI flow:
- Create a `Pipeline`
- Add ordered `PipelineStep` rows
- Select operations from registered keys
Programmatic flow:
```python
from prismio.files.models import Pipeline, PipelineStep
from prismio.files.processing.runtime import run_pipeline

pipeline = Pipeline.objects.create(name="Contact Cleanup", slug="contact-cleanup")
PipelineStep.objects.create(pipeline=pipeline, operation_key="process_first_name", step_order=1)
PipelineStep.objects.create(pipeline=pipeline, operation_key="normalize_email", step_order=2)

dataframe = run_pipeline(pipeline=pipeline, uploaded_file=uploaded_file)
```
Composition rules:
- Steps execute in ascending `step_order`
- `(pipeline, step_order)` must be unique
- Each key must exist in `OPERATION_REGISTRY`
- Earlier steps can prepare columns consumed by later steps