
File Management and Column Mappings

Overview

The Files module handles file uploads (CSV, XLSX, XLS, TXT) and maps each uploaded column to a semantic meaning through column mappings.

These mappings let pipeline operations find the right data even when source column names vary between uploads.

Uploading a File

Files are uploaded through files:upload.

After upload, the system checks prior mapping history for the same organization and pre-populates mappings when possible.

Auto-Matched Mappings

When a new upload has exactly the same columns, in the same order, as a previously uploaded file with saved mappings, those mappings are copied automatically.

Matching Rules

  • Only files in the same organization are considered.
  • Column names and order must match exactly.
  • The latest matching file is used.
  • Only ColumnMapping rows are copied.

User Behavior

After upload, users are redirected with ?auto_mapped=1 and see:

  • A one-time success banner, tracked per file in browser session storage
  • An Auto-matched badge on each mapped row
  • A badge in the Map Columns card header
  • A help modal opened from the "What is this?" action

Users can still edit mappings before saving.

Implementation Notes

In prismio.files.views:

  • _get_file_columns(uploaded_file) returns ordered file columns
  • _get_latest_exact_match_mappings(uploaded_file, current_columns) finds the most recent exact match and returns a mapping dictionary
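The helpers above can be approximated in plain Python. This sketch (a hypothetical find_exact_match operating on plain tuples, not the actual ORM queryset) captures the matching rule: same organization, identical column names and order, latest file wins.

```python
def find_exact_match(current_columns, prior_files):
    """Return saved mappings from the most recent prior file whose
    columns match current_columns exactly (names and order).

    prior_files: iterable of (uploaded_at, columns, mappings) tuples,
    assumed already filtered to the same organization.
    """
    candidates = [
        (uploaded_at, mappings)
        for uploaded_at, columns, mappings in prior_files
        if columns == current_columns and mappings
    ]
    if not candidates:
        return None
    # The latest matching file is used.
    return max(candidates, key=lambda item: item[0])[1]
```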

Suggested (Default) Mappings

If there is no exact prior match, the view still suggests defaults: for each current column whose name matches a previously mapped column in the same organization, the latest saved mapping is proposed.

Suggested mappings are not persisted until users click Save Mappings.

User Behavior

  • Info banner explains suggestions are not saved yet
  • Suggested badge appears per pre-selected row
  • Unrecognized columns still default to Custom

Implementation Notes

  • _get_latest_column_defaults(uploaded_file, current_columns) returns {column_name: column_type} from latest history
  • Suggestions are passed as existing_mappings and not written to DB
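A plain-Python sketch of this per-column fallback (a hypothetical latest_column_defaults standing in for the real ORM query): for each current column, the most recently saved type wins.

```python
def latest_column_defaults(current_columns, prior_mappings):
    """Build {column_name: column_type} suggestions from history.

    prior_mappings: iterable of (uploaded_at, column_name, column_type)
    rows for the same organization, in any order.
    """
    latest = {}  # column_name -> (uploaded_at, column_type)
    for uploaded_at, name, column_type in prior_mappings:
        if name not in current_columns:
            continue  # only suggest for columns present in this file
        if name not in latest or uploaded_at > latest[name][0]:
            latest[name] = (uploaded_at, column_type)
    return {name: column_type for name, (_, column_type) in latest.items()}
```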

Viewing and Editing Mappings

files:view_file provides a data preview and mapping form.

Context values used in the UI:

  • existing_mappings: saved mappings or suggestions
  • mappings_status: Mapped or Not Mapped
  • auto_matched_mappings: true when loaded with ?auto_mapped=1
  • auto_matched_columns: list of auto-filled columns
  • has_mapping_suggestions: true when suggestions exist without saved mappings
  • suggested_mapping_columns: list of suggested columns

POSTing to files:view_file saves mappings via ColumnMapping.update_or_create.
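The save can be modeled as an upsert keyed on (file, column). A dict-based sketch of the update_or_create semantics (hypothetical save_mappings; the real view goes through the Django ORM):

```python
def save_mappings(store, file_id, posted):
    """Upsert posted {column_name: column_type} pairs into store.

    store maps (file_id, column_name) -> column_type.  Returns
    (created, updated) counts, mirroring update_or_create's
    per-row created flag.
    """
    created = updated = 0
    for column_name, column_type in posted.items():
        key = (file_id, column_name)
        if key in store:
            updated += 1
        else:
            created += 1
        store[key] = column_type
    return created, updated
```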

Column Mapping Choices

ColumnDataMappings (a TextChoices enum on ColumnMapping) includes:

  • Personal info fields
  • Contact fields
  • Address fields
  • Birth date components
  • Gender and sex
  • Employment and education fields
  • Social fields and URLs
  • Custom for unmapped columns
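As a rough illustration of the value/label pattern, a plain str-backed enum with a few illustrative members (the real ColumnDataMappings is a Django TextChoices covering the full member list above):

```python
from enum import Enum


class ColumnType(str, Enum):
    # Illustrative members only; the actual enum defines many more
    # (personal, contact, address, birth date components, ...).
    FIRST_NAME = "first_name"
    EMAIL = "email"
    CUSTOM = "custom"  # fallback for unmapped columns
```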

Pipeline Operations

Pipelines execute their PipelineStep rows in order; each step's operation_key is resolved through a registry.

Execution Flow

  1. run_pipeline iterates steps by step_order then id
  2. Each operation_key resolves from OPERATION_REGISTRY
  3. Operation is instantiated with uploaded_file and current dataframe
  4. operation.run() executes setup(), process(), teardown()
  5. Output dataframe is passed to next step

If output_path is provided, final output is written as CSV.
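The flow above can be sketched as a minimal lifecycle class plus a loop (a simplification of the real runtime; error handling and the CSV output step are omitted):

```python
class Operation:
    """Minimal lifecycle sketch: run() wraps setup/process/teardown."""

    def __init__(self, uploaded_file, dataframe):
        self.uploaded_file = uploaded_file
        self.dataframe = dataframe

    def setup(self):
        pass

    def process(self):
        raise NotImplementedError

    def teardown(self):
        pass

    def run(self):
        self.setup()
        self.process()
        self.teardown()
        return self.dataframe


def run_pipeline(step_keys, registry, uploaded_file, dataframe):
    """Resolve each step's key in the registry and thread the
    dataframe through the operations in order.

    step_keys is assumed to be pre-sorted by step_order, then id.
    """
    for key in step_keys:
        operation = registry[key](uploaded_file, dataframe)
        dataframe = operation.run()
    return dataframe
```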

Add a New Operation

Operations live in prismio.files.processing.operations and inherit from Operation.

Each operation should define:

  • key
  • label
  • required_fields
  • requirement_logic
  • process(self)

Example:

from prismio.files.models import ColumnDataMappings
from prismio.files.processing.base import FieldRequirement, MatchType, Operation


class NormalizeEmailOperation(Operation):
    key = "normalize_email"
    label = "normalize_email"
    required_fields = (
        FieldRequirement(
            alias="email",
            field_name=ColumnDataMappings.EMAIL.value,
            match_type=MatchType.CONCEPT,
        ),
    )
    requirement_logic = "email"

    def process(self) -> None:
        self.dataframe["email"] = (
            self.dataframe["email"].astype(str).str.strip().str.casefold()
        )

Register the Operation

Add the class to prismio.files.processing.registry:

from .operations import (
    NormalizeEmailOperation,
    ProcessFirstNameOperation,
    ProcessFullNameOperation,
)


OPERATION_REGISTRY: dict[str, type[Operation]] = {
    ProcessFirstNameOperation.key: ProcessFirstNameOperation,
    ProcessFullNameOperation.key: ProcessFullNameOperation,
    NormalizeEmailOperation.key: NormalizeEmailOperation,
}

Registration is required for:

  • PipelineStep.clean() key validation
  • Admin dropdown choices
  • Runtime operation lookup

Compose Pipelines

Admin UI flow:

  • Create a Pipeline
  • Add ordered PipelineStep rows
  • Select operations from registered keys

Programmatic flow:

from prismio.files.models import Pipeline, PipelineStep
from prismio.files.processing.runtime import run_pipeline


pipeline = Pipeline.objects.create(name="Contact Cleanup", slug="contact-cleanup")
PipelineStep.objects.create(pipeline=pipeline, operation_key="process_first_name", step_order=1)
PipelineStep.objects.create(pipeline=pipeline, operation_key="normalize_email", step_order=2)

dataframe = run_pipeline(pipeline=pipeline, uploaded_file=uploaded_file)

Composition rules:

  • Steps execute in ascending step_order
  • (pipeline, step_order) must be unique
  • Each key must exist in OPERATION_REGISTRY
  • Earlier steps can prepare columns consumed by later steps
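These rules can be checked up front. A sketch of the kind of validation PipelineStep.clean() performs (hypothetical validate_steps, not the real method):

```python
def validate_steps(steps, registry):
    """Reject unknown operation keys and duplicate step_order values.

    steps: iterable of (step_order, operation_key) pairs for one pipeline.
    """
    seen_orders = set()
    for step_order, key in steps:
        if key not in registry:
            raise ValueError(f"Unknown operation key: {key}")
        if step_order in seen_orders:
            raise ValueError(f"Duplicate step_order: {step_order}")
        seen_orders.add(step_order)
```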