
File Management and Column Mappings

Overview

The Files module handles file uploads (CSV, XLSX, XLS, TXT) and maps each uploaded column to a semantic meaning through column mappings.

These mappings let pipeline operations find the right data even when source column names vary between uploads.

Uploading a File

Files are uploaded through files:upload.

After upload, the system checks prior mapping history for the same organization and pre-populates mappings when possible.

Auto-Matched Mappings

When a new upload has exactly the same columns, in the same order, as a previously uploaded file with saved mappings, those mappings are copied automatically.

Matching Rules

  • Only files in the same organization are considered.
  • Column names and order must match exactly.
  • The latest matching file is used.
  • Only ColumnMapping rows are copied.

User Behavior

After upload, users are redirected with ?auto_mapped=1 and see:

  • A one-time success banner, tracked per file in browser session storage
  • An Auto-matched badge on each mapped row
  • A badge in the Map Columns card header
  • A help modal opened from the "What is this?" action

Users can still edit mappings before saving.

Implementation Notes

In prismio.files.views:

  • _get_file_columns(uploaded_file) returns ordered file columns
  • _get_latest_exact_match_mappings(uploaded_file, current_columns) finds the most recent exact match and returns a mapping dictionary
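The helpers above can be approximated in plain Python. This sketch (a hypothetical find_exact_match operating on plain tuples, not the actual ORM queryset) captures the matching rule: same organization, identical column names and order, latest file wins.

```python
def find_exact_match(current_columns, prior_files):
    """Return saved mappings from the most recent prior file whose
    columns match current_columns exactly (names and order).

    prior_files: iterable of (uploaded_at, columns, mappings) tuples,
    assumed already filtered to the same organization.
    """
    candidates = [
        (uploaded_at, mappings)
        for uploaded_at, columns, mappings in prior_files
        if columns == current_columns and mappings
    ]
    if not candidates:
        return None
    # The latest matching file is used.
    return max(candidates, key=lambda item: item[0])[1]
```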

Suggested (Default) Mappings

If there is no exact prior match, the view still suggests defaults: for each current column whose name matches a previously mapped column in the same organization, the latest saved mapping is proposed.

Suggested mappings are not persisted until users click Save Mappings.

User Behavior

  • Info banner explains suggestions are not saved yet
  • Suggested badge appears per pre-selected row
  • Unrecognized columns still default to Custom

Implementation Notes

  • _get_latest_column_defaults(uploaded_file, current_columns) returns {column_name: column_type} from latest history
  • Suggestions are passed as existing_mappings and not written to DB
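A plain-Python sketch of this per-column fallback (a hypothetical latest_column_defaults standing in for the real ORM query): for each current column, the most recently saved type wins.

```python
def latest_column_defaults(current_columns, prior_mappings):
    """Build {column_name: column_type} suggestions from history.

    prior_mappings: iterable of (uploaded_at, column_name, column_type)
    rows for the same organization, in any order.
    """
    latest = {}  # column_name -> (uploaded_at, column_type)
    for uploaded_at, name, column_type in prior_mappings:
        if name not in current_columns:
            continue  # only suggest for columns present in this file
        if name not in latest or uploaded_at > latest[name][0]:
            latest[name] = (uploaded_at, column_type)
    return {name: column_type for name, (_, column_type) in latest.items()}
```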

Viewing and Editing Mappings

files:view_file provides a data preview and mapping form.

Context values used in the UI:

  • existing_mappings: saved mappings or suggestions
  • mappings_status: Mapped or Not Mapped
  • auto_matched_mappings: true when loaded with ?auto_mapped=1
  • auto_matched_columns: list of auto-filled columns
  • has_mapping_suggestions: true when suggestions exist without saved mappings
  • suggested_mapping_columns: list of suggested columns

POSTing to files:view_file saves mappings via ColumnMapping.update_or_create.
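The save can be modeled as an upsert keyed on (file, column). A dict-based sketch of the update_or_create semantics (hypothetical save_mappings; the real view goes through the Django ORM):

```python
def save_mappings(store, file_id, posted):
    """Upsert posted {column_name: column_type} pairs into store.

    store maps (file_id, column_name) -> column_type.  Returns
    (created, updated) counts, mirroring update_or_create's
    per-row created flag.
    """
    created = updated = 0
    for column_name, column_type in posted.items():
        key = (file_id, column_name)
        if key in store:
            updated += 1
        else:
            created += 1
        store[key] = column_type
    return created, updated
```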

Column Mapping Choices

ColumnDataMappings (a TextChoices enum on ColumnMapping) includes:

  • Personal info fields
  • Contact fields
  • Address fields
  • Birth date components
  • Gender and sex
  • Employment and education fields
  • Social fields and URLs
  • Custom for unmapped columns
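As a rough illustration of the value/label pattern, a plain str-backed enum with a few illustrative members (the real ColumnDataMappings is a Django TextChoices covering the full member list above):

```python
from enum import Enum


class ColumnType(str, Enum):
    # Illustrative members only; the actual enum defines many more
    # (personal, contact, address, birth date components, ...).
    FIRST_NAME = "first_name"
    EMAIL = "email"
    CUSTOM = "custom"  # fallback for unmapped columns
```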

Pipeline Operations

Pipelines execute their PipelineStep rows in order; each step's operation_key is resolved through a registry.

Execution Flow

  1. run_pipeline iterates steps by step_order then id
  2. Each operation_key resolves from OPERATION_REGISTRY
  3. Operation is instantiated with uploaded_file and current dataframe
  4. operation.run() executes setup(), process(), teardown()
  5. Output dataframe is passed to next step

If output_path is provided, final output is written as CSV.
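The flow above can be sketched as a minimal lifecycle class plus a loop (a simplification of the real runtime; error handling and the CSV output step are omitted):

```python
class Operation:
    """Minimal lifecycle sketch: run() wraps setup/process/teardown."""

    def __init__(self, uploaded_file, dataframe):
        self.uploaded_file = uploaded_file
        self.dataframe = dataframe

    def setup(self):
        pass

    def process(self):
        raise NotImplementedError

    def teardown(self):
        pass

    def run(self):
        self.setup()
        self.process()
        self.teardown()
        return self.dataframe


def run_pipeline(step_keys, registry, uploaded_file, dataframe):
    """Resolve each step's key in the registry and thread the
    dataframe through the operations in order.

    step_keys is assumed to be pre-sorted by step_order, then id.
    """
    for key in step_keys:
        operation = registry[key](uploaded_file, dataframe)
        dataframe = operation.run()
    return dataframe
```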

Add a New Operation

Operations live in prismio.files.processing.operations and inherit from Operation.

Each operation should define:

  • key
  • label
  • required_fields
  • requirement_logic
  • process(self)

Example:

from prismio.files.models import ColumnDataMappings
from prismio.files.processing.base import FieldRequirement, MatchType, Operation


class NormalizeEmailOperation(Operation):
    key = "normalize_email"
    label = "normalize_email"
    required_fields = (
        FieldRequirement(
            alias="email",
            field_name=ColumnDataMappings.EMAIL.value,
            match_type=MatchType.CONCEPT,
        ),
    )
    requirement_logic = "email"

    def process(self) -> None:
        self.dataframe["email"] = (
            self.dataframe["email"].astype(str).str.strip().str.casefold()
        )

Register the Operation

Add the class to prismio.files.processing.registry:

from .operations import (
    NormalizeEmailOperation,
    ProcessFirstNameOperation,
    ProcessFullNameOperation,
)


OPERATION_REGISTRY: dict[str, type[Operation]] = {
    ProcessFirstNameOperation.key: ProcessFirstNameOperation,
    ProcessFullNameOperation.key: ProcessFullNameOperation,
    NormalizeEmailOperation.key: NormalizeEmailOperation,
}

Registration is required for:

  • PipelineStep.clean() key validation
  • Admin dropdown choices
  • Runtime operation lookup

Compose Pipelines

Admin UI flow:

  • Create a Pipeline
  • Add ordered PipelineStep rows
  • Select operations from registered keys

Programmatic flow:

from prismio.files.models import Pipeline, PipelineStep
from prismio.files.processing.runtime import run_pipeline


pipeline = Pipeline.objects.create(name="Contact Cleanup", slug="contact-cleanup")
PipelineStep.objects.create(pipeline=pipeline, operation_key="process_first_name", step_order=1)
PipelineStep.objects.create(pipeline=pipeline, operation_key="normalize_email", step_order=2)

dataframe = run_pipeline(pipeline=pipeline, uploaded_file=uploaded_file)

Composition rules:

  • Steps execute in ascending step_order
  • (pipeline, step_order) must be unique
  • Each key must exist in OPERATION_REGISTRY
  • Earlier steps can prepare columns consumed by later steps
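These rules can be checked up front. A sketch of the kind of validation PipelineStep.clean() performs (hypothetical validate_steps, not the real method):

```python
def validate_steps(steps, registry):
    """Reject unknown operation keys and duplicate step_order values.

    steps: iterable of (step_order, operation_key) pairs for one pipeline.
    """
    seen_orders = set()
    for step_order, key in steps:
        if key not in registry:
            raise ValueError(f"Unknown operation key: {key}")
        if step_order in seen_orders:
            raise ValueError(f"Duplicate step_order: {step_order}")
        seen_orders.add(step_order)
```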