Thursday, June 7, 2018

Steps for importing csv / text data to Postgres database table in Ubuntu / Linux

Steps for importing csv / text data to Postgres database table in Ubuntu / Linux


  • If the file is too big to process in one step, cut the file into multiple files
    • split -l 250000 filename.txt  // Will split the file into multiple files with 250000 lines
  • In case if all columns are not required to be imported, create a new file with required number of columns. Mind the delimiter used.
    • cut -d'|' -f1,3,6,33,34,35 file_name.txt >> out.txt
  • Next two lines are specific to my case
    • sed -e '/^[a-z,A_Z]/d' test.txt > test_1.txt  // To delete the lines which starts with a character.
    • sed -i '/^$/d' file.txt   //To delete empty lines
  • Copy the first line which is the column names to all the splitted file outputs, except the first file. Use "vim".
  • I used a python program to delete lines which contains number of columns not equal to three.
#Python Program
fh = open('test.txt')      # Open the Original file
f = open("test_2.txt","w") # Create a new file to store the output
for line in fh:
    countLine = line.count("|")
    if countLine == 2:
       f.write(line)
fh.close()
  • Create a table with requited columns. Better to give data type as character varying and size as larger than reqired for all columns (eg character varying (400))
  • Copy the formatted txt / csv to postgres table. Here delimiter used is '|'.
    • COPY table_name(phone_no,subscriber_name,address,pincode,place,district) FROM 'local_location/filename.txt' DELIMITER '|' CSV HEADER;
  • Possible Errors
    • Error telling some special characters present, so not able to copy.  /*  "UTF8": 0xbf   */
    • Here i have deleted the lines with that special characters. Go this website and figure out the special character.
    • https://www.utf8-chartable.de/unicode-utf8-table.pl?utf8=0x       ¿
    • Using vim editor search for the special character in file and delete it manually.
  • After copying the file, do the required process in the table
    • --delete FROM table_name WHERE length(phone_no)!=10;
    • --delete FROM table_name WHERE length(phone_no)=0;
    • --delete FROM table_name WHERE phone_no is null
    • --delete FROM table_name WHERE length(subscriber_name)>95;
    • --delete FROM table_name WHERE length(address)>295;
  • Alter the column datatype to required one
    • ALTER TABLE table_name ALTER COLUMN phone_no TYPE bigint USING (phone_no::bigint);  //Column name is phone_no
  • Fill unfilled column in database table with value from another column, delemited value
    • update table_name set pincode = ( SELECT split_part(address, '!', 3));
  • Update Column by replacing occurance of a string
    • UPDATE SET = REPLACE ( , 'string1', 'string2' );

Steps for take postgres table dump and import to another Postgres database

  • Login as root user
    • sudo su
  • Login as postgres user
    • su - postgres
  • Take dump of the table to file named output.sql
    • pg_dump -p 5432 --column-inserts --data-only --table=table_name database_name > output.sql
  • Copy output.sql to postgres/data folder of the machine where it is to be ported.
  • Create table at destination with same table name and properties
  • Go inside the /postgres/data folder and give this command
    • psql -p 5444 -d database_name -f output.sql  
      • -p is for port number
      • -d for database name