python – splitting a very large csv file into hundreds of thousands of csv files based on column values

I would like to split a very big csv file with hundreds of millions of rows into hundreds of thousands of small files based on a column value.

I have tried many options:

  • I have tried keeping every output file open without closing it, using this function from this post, but there is a limit on the number of files that can be open at the same time:
import csv
import os

def split_csv_file(f, dst_dir, keyfunc):
    csv_reader = csv.reader(f)
    csv_writers = {}  # one writer (and one open file) per key
    for row in csv_reader:
        k = keyfunc(row)
        if k not in csv_writers:
            # Every new key opens another file that is never closed
            csv_writers[k] = csv.writer(open(os.path.join(dst_dir, k),
                                             mode='w', newline=''))
        csv_writers[k].writerow(row)
  • I have tried a simple algorithm that iterates through the rows and appends each row to the corresponding file, but it is very, very slow:
import os

with open(filename, 'r') as f:
    for line in f:
        # The second column decides which output file the row goes to
        filename_w = line.split(',')[1] + '.csv'
        if os.path.exists(filename_w):
            with open(filename_w, 'a') as fw:
                fw.write(line)
        else:
            with open(filename_w, 'w') as fw:
                fw.write(line)

  • I have tried pyspark with the partitionBy option, with the same result:
df.coalesce(1).write.partitionBy(colname).format("csv").option("header", "true").mode("overwrite").save(out_dir)
  • I have tried awk, with the same result:
awk -F, '{print >$2".csv"}' something.csv

Thank you
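
One approach that avoids both the open-file limit and reopening a file for every row is to buffer rows in memory per key and flush them in batches. The following is only a minimal sketch under stated assumptions: the function name, `key_index`, and `max_buffered_rows` are hypothetical, the input is assumed to have a header row, and the key values are assumed to be safe to use as file names.

```python
import csv
import os
from collections import defaultdict


def split_csv_by_column(src_path, dst_dir, key_index=1,
                        max_buffered_rows=1_000_000):
    """Write one CSV per distinct value found in column key_index."""
    os.makedirs(dst_dir, exist_ok=True)
    buffers = defaultdict(list)  # key -> rows waiting to be written
    seen = set()                 # keys whose file already has a header
    buffered = 0

    def flush():
        # Only one output file is open at any moment.
        for key, rows in buffers.items():
            path = os.path.join(dst_dir, f"{key}.csv")
            first = key not in seen
            # The first write truncates and adds the header; later ones append.
            with open(path, "w" if first else "a", newline="") as out:
                writer = csv.writer(out)
                if first:
                    writer.writerow(header)
                    seen.add(key)
                writer.writerows(rows)
        buffers.clear()

    with open(src_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)  # assumption: the first line is a header
        for row in reader:
            buffers[row[key_index]].append(row)
            buffered += 1
            if buffered >= max_buffered_rows:
                flush()
                buffered = 0
        flush()  # write whatever is left
```

A larger `max_buffered_rows` means fewer open/close cycles per output file at the cost of more memory, so the value would need tuning for the machine at hand.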

How to create a search page that searches in a custom csv or excel file and shows Exists/Doesn’t Exist result?

My website is on a shared host, and I cannot use Tika there because Java cannot be executed. As a result, the search file attachment module does not work properly.

Any help will be greatly appreciated.

powershell – Can anyone provide a complete script to compare a csv file with a SharePoint list column (People) and extract the records that do not exist in the SharePoint list?

I have one Excel file and a SharePoint list (Employee Master). I want to extract the users in the csv file who are not in the SharePoint list (Employee Master).

If an employee does not exist in the SharePoint list, we treat them as a terminated user for internal tracking.

Please provide a PowerShell solution, or any other way to do this.

pages – How to upload a csv file to a WordPress website, process it with a Python script, and download the result from the WordPress website?

My question has 3 parts:

1) Upload a csv file to the WordPress website (see: How can I upload a csv file into WordPress?)

2) Process it with a Python script (see: Run Python Script on WordPress Website)

3) Download the result from the WordPress website (see: eBooks download website, page or post?)

I also know that MySQL might help with exchanging the data. Each step of my question can be done on its own; however, a detailed code example of exchanging files between WordPress and Python is hard to find. Could you kindly provide a toy example that completes the 3-step process?

For example, code that can:

  1. upload a csv containing only numbers to WordPress (PHP),
  2. add all the numbers up with a Python script and send the sum back to WordPress as a csv,
  3. offer a button to click to download it.

Thank you for your help!
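
For the Python part of the toy example (step 2 only), a minimal sketch could look like the following. It assumes the uploaded file contains nothing but numbers and that WordPress passes in the two file paths somehow; the function name and both path parameters are hypothetical, and the PHP upload and download sides are not shown.

```python
import csv


def sum_csv_numbers(src_path, dst_path):
    """Add up every numeric cell in src_path and write the total to dst_path."""
    total = 0.0
    with open(src_path, newline="") as src:
        for row in csv.reader(src):
            for cell in row:
                if cell.strip():  # skip empty cells
                    total += float(cell)
    # The result file holds a single cell with the sum
    with open(dst_path, "w", newline="") as dst:
        csv.writer(dst).writerow([total])
    return total
```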

import – Parsing DATE while copying csv file into table PostgreSQL

Hi there! I have a long series of .csv files that I want to import into a local database. I believe my query is correct, but there are some problems parsing the DATE and TIMESTAMP columns. PostgreSQL reads these columns expecting the ISO format yyyy-mm-dd, but my data uses another format: dd/mm/yyyy.

I read online and in other Stack Overflow answers that one can SET the datestyle to something different, but that this is not recommended.

I was wondering if there was a way to specify the format of the columns to import. Also, I do not need to import all columns from the csv file: can I leave some out?

First, I wrote the code to create the table (sorry if column names are in Italian, but it’s not important):

CREATE TABLE bikes (
    bici INT,
    tipo_bici VARCHAR(20),
    cliente_anonimizzato INT,
    data_riferimento_prelievo DATE,
    data_prelievo TIMESTAMP,
    numero_stazione_prelievo INT,
    nome_stazione_prelievo TEXT,
    slot_prelievo SMALLINT,
    data_riferimento_restituzione DATE,
    data_restituzione TIMESTAMP,
    numero_stazione_restituzione INT,
    nome_stazione_restituzione TEXT,
    slot_restituzione SMALLINT,
    durata VARCHAR(10),
    distanza_totale REAL,
    co2_evitata REAL,
    calorie_consumate REAL,
    penalità CHAR(2)
);

Then I add the query to copy data into the table:

COPY bikes(
FROM '/Users/luca/tesi/data/2019q3.csv'

The code seems fine, except the following error pops up:

ERROR:  date/time field value out of range: "31/07/2019"
HINT:  Perhaps you need a different "datestyle" setting.
CONTEXT:  COPY bikes, line 25296, column data_riferimento_restituzione: "31/07/2019"
SQL state: 22008

How can I specify in the CREATE TABLE portion of the code the format to parse? Also, I do not actually need all the cols of this csv, how do I leave these out? I tried to specify only those I need but I get an import error:

ERROR:  extra data after last expected column

Thank you!
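
One way to sidestep both problems without touching datestyle is to rewrite each file before running COPY, keeping only the wanted columns and converting the dates to ISO format. The sketch below is a hedged illustration, not a definitive solution: it assumes the csv files have a header row with the column names and use a comma delimiter, and the function and parameter names are hypothetical. The TIMESTAMP columns would need their own input format, which the question does not show.

```python
import csv
from datetime import datetime


def rewrite_csv_for_copy(src_path, dst_path, keep_columns, date_columns,
                         in_format="%d/%m/%Y"):
    """Keep only keep_columns and rewrite date_columns as ISO yyyy-mm-dd."""
    with open(src_path, newline="") as src, \
         open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        # extrasaction="ignore" silently drops columns not in keep_columns
        writer = csv.DictWriter(dst, fieldnames=keep_columns,
                                extrasaction="ignore")
        writer.writeheader()
        for row in reader:
            for name in date_columns:
                if row[name]:  # leave empty values untouched
                    parsed = datetime.strptime(row[name], in_format)
                    row[name] = parsed.date().isoformat()
            writer.writerow(row)
```

The rewritten file then contains exactly the columns listed in `keep_columns`, so the column list given to COPY can match it one to one.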

c# – How to remove " from a .csv record

Hey everyone! I am a beginner in C# and would like to know how to remove the " character when it occurs a certain number of times in a record of a .csv file.

Example: "alfredo"; "05359888"; "porto " alegre";
There are 7 occurrences of the " character where there should be only 6. So I would like to remove the extra " using C#.

How to write to different columns of a CSV using Python

I am trying to write some values to a CSV file, but all the values are getting jammed into one column. How do I use a different column for each value? Please go easy on me, I am an absolute beginner with programming. It would be really helpful if someone could write example code showing how to solve this issue.

Here is my code:

with open('log.csv', 'a',) as file:
    fieldnames = ('Timestamp', 'Overall result', 'Soll-orderno', 'Desired-HW-Version', 'Desired-SF-Version',
     'Desired-productcode', 'Desired-device-type', 'Scancode', 'Wbm-orderno', 'Wbm-HW-Version', 'Wbm-SF-Version',
      'Wbm-mac-address', 'test-product-code', 'combined-product-code' , 'wbm-device-type', 'test-device-type')
    writer = csv.DictWriter(file, fieldnames=fieldnames)

    writer.writerow({'Timestamp':now, 'Overall result':'Blank', 'Soll-orderno':d_ordernum, 'Desired-HW-Version':d_hw_version, 'Desired-SF-Version':d_sf_version,
     'Desired-productcode':pc_praefix, 'Desired-device-type':d_dev_typ, 'Scancode':scancode_string, 'Wbm-orderno':ord_nmr, 'Wbm-HW-Version':v, 'Wbm-SF-Version':b,
      'Wbm-mac-address':mac_addr, 'test-product-code':'Blank', 'combined-product-code':product_code, 'wbm-device-type':dev_typ, 'test-device-type':'Blank'})

When I open the csv file in Excel, it SHOULD look like this:
[screenshot: what the Excel sheet is supposed to look like]

But unfortunately my CSV file looks like this when I open it in Excel:
[screenshot: what the Excel sheet looks like now]

As you can see, all the values have been packed into one column instead of one value per column. I think I am doing everything correctly, but everything still gets written to a single column. I would really appreciate it if you could show me a solution to this problem.
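
One possible cause, offered as an assumption rather than a diagnosis, is that Excel is using a regional list separator other than the comma (often a semicolon), so a comma-separated file lands in a single column. The sketch below writes the log with a configurable delimiter; the function name and its parameters are hypothetical, and the semicolon default is only a guess at what this Excel installation expects.

```python
import csv
import os


def append_log_row(path, fieldnames, row, delimiter=";"):
    """Append one row to a CSV log, writing the header if the file is new."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    # newline="" lets the csv module control line endings itself
    with open(path, "a", newline="") as file:
        writer = csv.DictWriter(file, fieldnames=fieldnames,
                                delimiter=delimiter)
        if is_new:
            writer.writeheader()
        writer.writerow(row)
```

If Excel still shows one column, the separator it expects can be checked in the operating system's regional settings and passed in as `delimiter`.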

sharepoint online – Navigation link in CSV file for termstore

If you want the terms of the term set in the .csv file to become navigation links directly after importing, I think that is impossible.

You have to import the .csv file into the term store first and then set the navigation link for each term, one at a time.

performance – Compare hostnames and leases between CSV and DHCP info

I have 3 nested loops and they do the following:

  1. Get scopes from DHCP
  2. Get hostnames from csv file
  3. Compare hostnames and leases; matches are added to a hash table

There are 5 DHCP servers, hundreds of scopes and thousands of leases.

What would be a good method to speed this up?

$DHServers = Get-DhcpServerInDC #get DHCP info
$hashtable = @{} #create hash table

foreach ($server in $DHServers){
    $scopes = Get-DHCPServerv4Scope -ComputerName $server.dnsname #get all scopes in DHCP
    foreach ($_ in (Import-Csv C:\script\Asset_List.csv | Select-Object -ExpandProperty asset)){ #get hostnames from list
        foreach ($scope in $scopes){
            if($scope | Get-DhcpServerV4Lease -ComputerName $server.dnsname | Where-Object HostName -like "$_*" ){ #compares the hostname to find which lease it is in
                $scopename = $scope.name #matches give scope name
                $hashtable[$_] = $scopename #when a match is found, add keys and values to table
            }
        }
    }
}

python – Reading and analyzing budget and transaction data from user input and .csv

Good first effort!

On currentBudget:

One thing that is easy to point out is Python’s context managers (see the docs).

The context manager lets you write code in a different context and takes care of preparing that context and cleaning it up. In your case, there’s a context to deal with opened files 🙂

So, instead of

def currentBudget():
    fileTransaction = open("transactions.csv", "r")
    info = fileTransaction.readlines()
    # ...

it is recommended that you do

def currentBudget():
    with open("transactions.csv", "r") as f:
        info = f.readlines()
    # ...

The context manager makes sure the file is properly closed even if the code raises an error.
Generally, inside the with statement you write the least amount of code possible, so that the context can get cleaned up ASAP.

Also, if you are dealing with csv files you might want to take a look at the csv library Python provides: csv docs.
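
As a rough illustration of what the csv module offers, the sketch below reads the whole file into a list of rows. The function name is hypothetical, and since the layout of transactions.csv is not shown here, the rows are returned as plain lists of strings.

```python
import csv


def read_transactions(path="transactions.csv"):
    """Return every row of the file as a list of column values."""
    # newline="" is what the csv docs recommend when opening the file
    with open(path, newline="") as f:
        # csv.reader splits each line into columns and handles quoting
        return list(csv.reader(f))
```

Compared with `readlines()` plus manual splitting, this copes with quoted fields that themselves contain commas.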

On organisation:

I really liked that you grouped related prints in functions to call repeatedly! One thing you might consider is factoring the different actions you take, depending on the userInput variable, into separate functions. That way, the body of your main function is cleaner and easier to read, and then you have a separate function for each functionality.
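
One common way to do that factoring is a dictionary that maps each menu letter to a function. This is only a sketch: the two action functions and their return values are placeholders, not your actual features.

```python
def show_budget():
    return "budget"  # placeholder for the real action


def add_transaction():
    return "added"  # placeholder for the real action


# Map each menu letter to the function that handles it
ACTIONS = {"a": show_budget, "b": add_transaction}


def handle_choice(user_input):
    """Run the action for user_input, or report an unknown choice."""
    action = ACTIONS.get(user_input)
    if action is None:
        return "unknown choice"
    return action()
```

Adding a new menu option then means writing one function and adding one entry to `ACTIONS`, with no changes to the main loop.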

On user input:

Clever usage of .lower() to handle capitalised input 😉

You are doing relatively basic I/O, with the user just typing a single letter to pick an action. One thing you might want to do is

userInput = input("\nEnter your Choice: ").strip().lower()

This removes leading/trailing whitespace (with .strip).
Additionally, you may want to add [0] at the end of the line if it is acceptable for the user to write a whole word but you only need the first letter to distinguish. This doesn’t make much sense while the actions are a), b), etc., but would make sense if you renamed the options to reflect what they actually do.
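
Put together, the input clean-up could be wrapped in a small helper like the one below. The function name is hypothetical; it uses Python's `str.strip()` and slices with `[:1]` so that an empty answer yields an empty string rather than an IndexError.

```python
def normalise_choice(raw):
    """Strip whitespace, lower-case, and keep only the first character."""
    # [:1] behaves like [0] but returns "" instead of raising on empty input
    return raw.strip().lower()[:1]
```

It would be called as `normalise_choice(input("\nEnter your Choice: "))`.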

Good luck 🙂