Setting up an Ingestion Pipeline for CSV Files from S3 and Local File Systems

Original Slack Thread

Hi!! I need help! I want to ingest a CSV file. How do I install the plugin for CSV? I couldn't find information on the DataHub website.

I think the S3 source can ingest JSON/CSV files -> https://datahubproject.io/docs/generated/ingestion/sources/s3

I have a CSV file on-premise …

Hi Raphael! The S3 source that Sudhakara linked to also supports local file systems – I know the name can be confusing!

YAML recipe:

```yaml
source:
  type: s3
  config:
    path_specs:
      - include: "/Users/sst/install000.csv"
    profiling:
      enabled: true
    env: "PROD"
```
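To run a recipe like this, the S3 ingestion plugin needs to be installed first. A typical invocation might look like the following (assuming the recipe above is saved as `recipe.yaml`, a hypothetical filename):

```shell
# Install the DataHub CLI with the S3 source plugin
pip install 'acryl-datahub[s3]'

# Run the ingestion recipe saved as recipe.yaml
datahub ingest -c recipe.yaml
```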

Or as a Python pipeline:

```python
from datahub.ingestion.run.pipeline import Pipeline

pipeline = Pipeline.create(
    {
        "source": {
            "type": "s3",
            "config": {
                "path_specs": [{"include": "/Users/sst/install000.csv"}],
                "profiling": {"enabled": True},
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {
                "server": "http://localhost:8080",
            },
        },
    }
)
pipeline.run()
pipeline.raise_from_status()
```