batch file to mass-rename files without extensions using a CSV

All,

I am looking for a batch file to rename approx. 9,700 files that have 5-digit file names with various extensions.

I currently have an Excel spreadsheet with the current file name (without extension) and the new file name (without extension).

Basically, I need this .bat file to scan a directory containing the files to be renamed and use the Excel spreadsheet for the new file names. Extensions would be wildcards because they are not included in the spreadsheet.

I would love to have this done today as I need these new filenames at 0400 CT tomorrow, 3/30.
Will pay via PayPal, Venmo, Cash App...

Thanks in advance,
RK
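
For reference, the same renaming logic can be sketched in a few lines of Python rather than a .bat file. This is only a rough illustration: the folder path and the mapping.csv file name are assumptions, and the spreadsheet would have to be exported to a two-column CSV (old name, new name).

import csv
from pathlib import Path

folder = Path(r"C:\files_to_rename")  # hypothetical folder holding the ~9,700 files

with open("mapping.csv", newline="", encoding="utf-8") as f:
    for old_name, new_name in csv.reader(f):
        # extensions are not in the spreadsheet, so match any extension
        for match in folder.glob(old_name + ".*"):
            match.rename(match.with_name(new_name + match.suffix))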

python – Read a large number of XML files and load into a single CSV

I process a large number of XML files that I got from https://clinicaltrials.gov/ct2/resources/download#DownloadAllData. The download yields around 300,000 XML files of similar structure, which I ultimately want to load into a single data frame / CSV. The code gives the result I want: each row is one XML file, while the columns are the categories / variable names coming from the XML tags. The rows are filled with the text of each XML tag.
My strategy is to first analyze the structure of each XML file to get the lowest-level child of each node and reconstruct the x-path for each of them. Using these x-paths, I get the text for each of these elements. Finally, I number columns of the same name so that the column names are unique.

I am an absolute beginner in Python, and this code is the result of painfully stitching together various forum entries and tutorials. It runs, but given the size of the data source it takes a very long time, presumably because I have many for loops that are certainly avoidable. It would be great if I could get feedback on how to improve the speed, and maybe some general remarks on how to better structure this code. I know it's not good, but that's all I could get out of it for now. 🙂
Cheers!

Find my code here:

#Import packages.
import pandas as pd
from lxml import etree
import numpy as np
import os
from os import listdir
from os.path import isfile, join
import time
from tqdm import tqdm


#Set options for displaying results
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)

global df_final
df_final = pd.DataFrame()

global content
content = pd.DataFrame()


def run(file, csv, df):
    global df_final
    data = etree.parse(file)
    root = data.getroot()


    #create empty lists for names and indices.
    l_first = []
    l_second = []
    l_third = []
    l_fourth = []

    i_first = []
    i_second1 = []
    i_second2 = []
    i_third1 = []
    i_third2 = []
    i_third3 = []
    i_fourth1 = []
    i_fourth2 = []
    i_fourth3 = []
    i_fourth4 = []

    #get the structure of each xml and layout in pandas dataframe
    for i in range(len(root.getchildren())):
        temp = root.getchildren()[i]
        first = root.getchildren()[i].tag
        l_first.append(first)
        i_first.append(i)

        for j in range(len(temp.getchildren())):
            temp2 = temp.getchildren()[j]
            second = temp.getchildren()[j].tag
            l_second.append(second)
            i_second1.append(i)
            i_second2.append(j)

            for x in range(len(temp2.getchildren())):
                temp3 = temp2.getchildren()[x]
                third = temp2.getchildren()[x].tag
                l_third.append(third)
                i_third1.append(i)
                i_third2.append(j)
                i_third3.append(x)

                for y in range(len(temp3.getchildren())):
                    temp4 = temp3.getchildren()[y]
                    fourth = temp3.getchildren()[y].tag
                    l_fourth.append(fourth)
                    i_fourth1.append(i)
                    i_fourth2.append(j)
                    i_fourth3.append(x)
                    i_fourth4.append(y)

    df_first = pd.DataFrame(l_first, columns=['name_1'])
    df_second = pd.DataFrame(l_second, columns=['name_2'])
    df_third = pd.DataFrame(l_third, columns=['name_3'])
    df_fourth = pd.DataFrame(l_fourth, columns=['name_4'])

    df_first['index_1'] = i_first

    df_second['index_21'] = i_second1
    df_second['index_22'] = i_second2

    df_third['index_31'] = i_third1
    df_third['index_32'] = i_third2
    df_third['index_33'] = i_third3

    df_fourth['index_41'] = i_fourth1
    df_fourth['index_42'] = i_fourth2
    df_fourth['index_43'] = i_fourth3
    df_fourth['index_44'] = i_fourth4

    #merge all four layers into one dataframe.
    df = df_first.merge(df_second, how='left', left_on='index_1', right_on='index_21')
    df = df.merge(df_third, how='left', left_on=['index_21', 'index_22'], right_on=['index_31', 'index_32'])
    df = df.merge(df_fourth, how='left', left_on=['index_31', 'index_32', 'index_33'], right_on=['index_41', 'index_42', 'index_43'])

    #create number of children per row.
    children = 0
    df['children'] = np.where((df['index_21'].notna()) & (df['index_31'].isna()), 1, 0)
    df['children'] = np.where((df['index_21'].notna()) & (df['index_31'].notna()), 2, df['children'])
    df['children'] = np.where((df['index_21'].notna()) & (df['index_31'].notna()) & (df['index_41'].notna()), 3, df['children'])

    #create x-path for each row depending on number of children.
    df['x_path'] = "//" + df['name_1'].astype(str)
    df['x_path'] = np.where(df['children'] == 1, df['x_path'].astype(str) + "/" + df['name_2'].astype(str), df['x_path'])
    df['x_path'] = np.where(df['children'] == 2, df['x_path'].astype(str) + "/" + df['name_2'].astype(str) + "/" + df['name_3'].astype(str), df['x_path'])
    df['x_path'] = np.where(df['children'] == 3, df['x_path'].astype(str) + "/" + df['name_2'].astype(str) + "/" + df['name_3'].astype(str) + "/" + df['name_4'].astype(str), df['x_path'])

    #drop comments from dataframe
    df = df[~df["x_path"].str.contains('Comment', na=True)]

    #reset index of dataframe after comments have been dropped.
    df = df.reset_index()
    #df['id'] = df.index.astype(str) + df['x_path']

    content = pd.DataFrame(columns=['x_path', 'content'])
    x_path = df['x_path'].to_list()
    x_path = list(dict.fromkeys(x_path))

    #iterate through all x-paths and get the text assigned to each path.
    for row in x_path:
        e = root.xpath(row)
        for i in e:
            #print(row, ": ", i.text)
            content = content.append({'x_path': row, 'content': i.text}, ignore_index=True)

    content = content.sort_values(by=['x_path'])
    df = df.sort_values(by=['x_path'])
    #print(content)
    df = df.merge(content, on='x_path')
    #print(df)

    #mark duplicates and rename such that names are unique. (Intention: names to be used as column names in later dataset).

    df['duplicate'] = df.duplicated('x_path', keep=False)
    df_unique = df.loc[df['duplicate'] == True]
    df_unique = df_unique.drop_duplicates(subset="x_path", keep="first")
    unique = []
    unique = df_unique['x_path'].to_list()
    #print(unique)

    df = df[['x_path', 'content']]
    df = df.drop_duplicates(subset=["x_path", "content"], keep="first")
    df = df.transpose()

    #get row with variable names and save to list
    df.columns = df.iloc[0]
    df = df.drop(df.index[0])

    cols = pd.Series(df.columns)

    for dup in cols[cols.duplicated()].unique():
        cols[cols[cols == dup].index.values.tolist()] = [dup + '.' + str(i) if i != 0 else dup for i in
                                                         range(sum(cols == dup))]
    # rename the columns with the cols list.
    df.columns = cols
    df_final = df_final.append(df)


def write_csv(df_name, csv):
    df_name.to_csv(csv, sep=";")

################### Run  #####################

mypath = '/Users/Documents/AllPublicXML'

folder_all = os.listdir(mypath)

file_all = []

for folder in tqdm(folder_all):
    mypath2 = mypath + "/" + folder
    if os.path.isdir(mypath2):
        file = [f for f in listdir(mypath2) if isfile(join(mypath2, f))]
        for x in tqdm(file):
            dir = mypath2 + "/" + x
            output = "./Output/" + x + ".csv"
            df_name = x.split(".", 1)[0]
            #print(df_name)
            run(dir, output, df_name)
            #print(output)

write_csv(df_final, output)

and an example of an XML file here:





ClinicalTrials.gov processed this data on March 20, 2020

Link to the current ClinicalTrials.gov record.
https://clinicaltrials.gov/show/NCT03261284


2017-P-032
NCT03261284


D-dimer to Guide Anticoagulation Therapy in Patients With Atrial Fibrillation

DATA-AF

D-dimer to Determine Intensity of Anticoagulation to Reduce Clinical Outcomes in Patients With Atrial Fibrillation



Wuhan Asia Heart Hospital
Other


Wuhan Asia Heart Hospital

Yes
No
No



This was a prospective, three arms, randomized controlled study.




D-dimer testing is performed in AF Patients receiving warfarin therapy (target INR:1.5-2.5) in Wuhan Asia Heart Hospital. Patients with elevated d-dimer levels (>0.5ug/ml FEU) were SCREENED AND RANDOMIZED to three groups at a ratio of 1:1:1. First, NOAC group,the anticoagulant was switched to Dabigatran (110mg,bid) when elevated d-dimer level was detected during warfarin therapy.Second,Higher-INR group, INR was adjusted to higher level (INR:2.0-3.0) when elevated d-dimer level was detected during warfarin therapy. Third, control group, patients with elevated d-dimer levels have no change in warfarin therapy. Warfarin is monitored once a month by INR ,and dabigatran dose not need monitor. All patients were followed up for 24 months until the occurrence of endpoints, including bleeding events, thrombotic events and all-cause deaths.


Enrolling by invitation
March 1, 2019
May 30, 2020
February 28, 2020
N/A
Interventional
No

Randomized
Parallel Assignment
Treatment
None (Open Label)


Thrombotic events
24 months

Stroke, DVT, PE, Peripheral arterial embolism, ACS etc.



hemorrhagic events
24 months
cerebral hemorrhage,Gastrointestinal bleeding etc.


all-cause deaths
24 months

3
600
Atrial Fibrillation
Thrombosis
Hemorrhage
Anticoagulant Adverse Reaction

DOAC group
Experimental

Patients with elevated d-dimer levels was switched to DOAC (dabigatran 150mg, bid).



Higher-INR group
Experimental

Patients' target INR was adjusted from 1.5-2.5 to 2.0-3.0 by adding warfarin dose.



Control group
No Intervention

Patients continue previous strategy without change.



Drug
Dabigatran Etexilate 150 MG (Pradaxa)
Dabigatran Etexilate 150mg,bid
DOAC group
Pradaxa


Drug
Warfarin Pill
Add warfarin dose according to INR values.
Higher-INR group




Inclusion Criteria: - Patients with non-valvular atrial fibrillation - Receiving warfarin therapy Exclusion Criteria: - Patients who had suffered from recent (within 3 months) myocardial infarction, ischemic stroke, deep vein thrombosis, cerebral hemorrhages, or other serious diseases. - Those who had difficulty in compliance or were unavailable for follow-up.


All
18 Years
75 Years
No


Zhenlu ZHANG, MD,PhD
Study Director
Wuhan Asia Heart Hospital



Zhang litao
Wuhan Hubei 430022 China
China
March 2019
August 22, 2017
August 23, 2017
August 24, 2017
March 6, 2019
March 6, 2019
March 7, 2019
Sponsor
D-dimer
Nonvalvular atrial fibrillation
Direct thrombin inhibitor
INR
Atrial Fibrillation
Thrombosis
Hemorrhage
Warfarin
Dabigatran
Fibrin fragment D
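
Returning to the question of speed: a rough single-pass sketch of the same idea (one row per XML file, keyed by element path) could look like the following. lxml's getpath() is assumed to be an acceptable substitute for the hand-built x-paths, and the names xml_to_row / xml_files are illustrative only, not part of the original code.

import pandas as pd
from lxml import etree

def xml_to_row(path):
    tree = etree.parse(path)
    row = {}
    for elem in tree.getroot().iter():
        if not isinstance(elem.tag, str):  # skip comments and processing instructions
            continue
        if len(elem) == 0 and elem.text and elem.text.strip():
            row[tree.getpath(elem)] = elem.text.strip()
    return row

#rows = [xml_to_row(f) for f in xml_files]
#df_final = pd.DataFrame(rows)  # build the frame once instead of appending per file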

powershell – Passing multi-word values from a CSV file as input without quotes

I am running the script below to add users to SharePoint groups in SPO sites:

Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue
$users = ""
$users = Import-Csv D:\User.csv
foreach ($user in $users)
{
    $groupname = $user.group
    Add-SPOUser -Site $user.SiteURL -LoginName $user.UserEmailAddress -Group $groupname
    Write-Host "User added:" $user.SiteURL $groupname $user.UserEmailAddress
}

The input values, namely UserEmailAddress, SiteURL, and Group, are read from a CSV file.

This script works well if the group name is a single word without spaces, like Approvers or SiteOwners, but if the group name has two or more words with spaces between them, the script fails with the error "The group cannot be found".

I understand the problem and have tried different approaches to solve it, but nothing works.

python – Pandas will not read the CSV file correctly

I am trying to process a CSV which contains coronavirus data by province. The problem I have is that it has stopped working for me, and that's because it has stopped reading the CSV data correctly. I have the following:

import pandas as pd
csv = pd.read_csv('2020-03-25.csv', delimiter = ',')

Where the first 2 lines are (they are practically the same):
Line 1:

Ciudad,Latitude,Longitude,Código país,Diagnosticados,Activos,Recuperados,Muertos,IA,Notas,,,

Line 2:

Almería,36.8304075,-2.4637136,Casos detectados,115,105,5,5,"34,57",,,,

It separates the headers (line 1) correctly; the problem is that it stores all of the data lines in the city column.

I'm adding the first 2 lines of a version that worked for me:
Line 1:

Ciudad,Latitude,Longitude,Código país,Diagnosticados,Activos,Recuperados,Muertos,IA,Notas,,,

Line 2:

Almería,36.8304075,-2.4637136,Casos detectados,91,86,72¹,5,"28,52",¹La Junta no especifica el lugar de las altas y algunas de ellas corresponden a seguimiento domiciliario.,,,

I can't find the difference between the two, other than one having more data, but even so, the commas shouldn't be a problem.
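
A quick way to see where the parsing diverges is to load both files and compare the results; the file name used here for the working version is hypothetical:

import pandas as pd

ok = pd.read_csv('2020-03-24.csv', delimiter=',')   # hypothetical name of the version that worked
bad = pd.read_csv('2020-03-25.csv', delimiter=',')

print(ok.shape, bad.shape)            # the column counts should match
print(bad.columns.tolist())           # check whether the header row was split correctly
print(bad.head(2))                    # check whether the city rows landed in the right columns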

Thank you in advance.

magento2 – How can I download CSV in Webkul B2B with a custom attribute

Hey there, I am using Webkul B2B with Bulk Download and Custom Attribute. This allows the seller to download products using CSV and also allows the admin to add a new field or attribute, but I have no idea how to upload a CSV with custom fields in CSV Mass Upload.
Is there anyone who can help me with this?
Thanks in advance

ms office – CSV import works in Numbers, but fails in Mac Excel and Mac OpenOffice Calc

Apple gets this right, and I have no problem with that, but I'm curious as to why Apple does it well where its competitors fall short. What makes macOS better at doing this than its competitors?

If I open a CSV file in Numbers, a particularly long series of cells containing only text works like a charm.

If I open the same CSV file in Microsoft Excel for Mac or OpenOffice Calc for Mac, the long cells break, but at different points.

A glance at the source shows me that the CSV file is correct and complete.

The problem seems to occur when a line reaches 2^16 - 1 characters (65,535), roughly the length of a TEXT block.
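
To test that hypothesis, one could measure the longest line and the longest individual field in the file; this is just a Python sketch, and the file name is a placeholder:

import csv

with open("export.csv", newline="", encoding="utf-8") as f:   # placeholder file name
    longest_line = max(len(line) for line in f)
    f.seek(0)
    longest_field = max((len(cell) for row in csv.reader(f) for cell in row), default=0)

print(longest_line, longest_field)    # compare against 65,535 (2^16 - 1)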

Here are some images, cropped for privacy:

A view of the source CSV

The Numbers app, working correctly

The faulty Calc app (note that the selected cell is empty, although it contains the text "NULL" in the source CSV)

I ask this question partly out of curiosity, but more out of concern for improving the efficiency of my office in the near future. I believe that a better understanding of the MacOS system can help this project.

Thanks in advance!

Google says "duplicate store codes" when updating the hours of operation of hundreds of store locations via CSV

I am trying to change the hours of our locations (206 locations). The hours will change in the same way for all stores. I exported our locations to CSV and made the changes, but when I try to upload them, it reports duplicate store codes. Has anyone done this before? I read in the Google documentation that bulk editing is available.

Does anyone have a step by step for this process?

https://support.google.com/business/answer/3217744?hl=en
https://support.google.com/business/answer/4577733


Do SemanticImport and Import differ on the same CSV file?

Using WL 12.1.0. Does SemanticImport cache data somewhere? I ran SemanticImport for the first time yesterday and it worked well. Today, the last column (a date) is missing. However, Import correctly retrieves everything, including the last column. Is there a way to force SemanticImport to get the latest data? I know I can work around this by using Import, but I would like to use SemanticImport so that I can work with a Dataset. Here is an excerpt from my notebook:

SemanticImport removes the last column from the following import:

cv19Deaths =
  SemanticImport[
   "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Deaths.csv"];

While Import correctly imports the last column:

cv19DeathsCSV =
  Import["https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Deaths.csv"];

Looking at the Keys:

cv19Deaths[1] // Normal // Keys

Out():= {"Province/State", "Country/Region", "Lat", "Long", "1/22/20", 
"1/23/20", "1/24/20", "1/25/20", "1/26/20", "1/27/20", "1/28/20", 
"1/29/20", "1/30/20", "1/31/20", "2/1/20", "2/2/20", "2/3/20", 
"2/4/20", "2/5/20", "2/6/20", "2/7/20", "2/8/20", "2/9/20", 
"2/10/20", "2/11/20", "2/12/20", "2/13/20", "2/14/20", "2/15/20", 
"2/16/20", "2/17/20", "2/18/20", "2/19/20", "2/20/20", "2/21/20", 
"2/22/20", "2/23/20", "2/24/20", "2/25/20", "2/26/20", "2/27/20", 
"2/28/20", "2/29/20", "3/1/20", "3/2/20", "3/3/20", "3/4/20", 
"3/5/20", "3/6/20", "3/7/20", "3/8/20", "3/9/20", "3/10/20", 
"3/11/20", "3/12/20", "3/13/20", "3/14/20", "3/15/20", "3/16/20"}
cv19DeathsCSV[[1]]

Out():= {"Province/State", "Country/Region", "Lat", "Long", "1/22/20", 
"1/23/20", "1/24/20", "1/25/20", "1/26/20", "1/27/20", "1/28/20", 
"1/29/20", "1/30/20", "1/31/20", "2/1/20", "2/2/20", "2/3/20", 
"2/4/20", "2/5/20", "2/6/20", "2/7/20", "2/8/20", "2/9/20", 
"2/10/20", "2/11/20", "2/12/20", "2/13/20", "2/14/20", "2/15/20", 
"2/16/20", "2/17/20", "2/18/20", "2/19/20", "2/20/20", "2/21/20", 
"2/22/20", "2/23/20", "2/24/20", "2/25/20", "2/26/20", "2/27/20", 
"2/28/20", "2/29/20", "3/1/20", "3/2/20", "3/3/20", "3/4/20", 
"3/5/20", "3/6/20", "3/7/20", "3/8/20", "3/9/20", "3/10/20", 
"3/11/20", "3/12/20", "3/13/20", "3/14/20", "3/15/20", "3/16/20", 
"3/17/20"}

(cv19Deaths[1] // Normal // Keys) == cv19DeathsCSV[[1]]

Out[]= False

migration – How can I import fields into the Drupal 8.4 user profile from a CSV file that has identical values in the key column?

I use Migrate Tools, Migrate Plus and Migrate Source CSV. My CSV file looks like this:

"Stg";"Color";"Fruit"
"user1";"red";"apple"
"user1";"blue";"pear"
"user2";"green";"banana"
"user2";"black";"rotten banana"

I use the profile module (https://www.drupal.org/project/profile)

I have a migration within my migration group that looks like this (migrate_plus.migration.user_vorlesungen.yml):

id: user_vorlesungen
langcode: de
status: true
dependencies:
    enforced:
        module:
            - user_migrate
migration_group: hoevwa
label: 'HoeVWA Vorlesungen Import'
source:
    plugin: csv
    track_changes: true
    path: /config/Vorlesungsverzeichnis.csv
    # Column delimiter. Comma (,) by default.
    delimiter: ';'
    # Field enclosure. Double quotation marks (") by default.
    enclosure: '"'
    header_row_count: 1
    keys:
        - Stg
destination:
    plugin: entity:profile
process:
    type:
        plugin: default_value
        default_value: 'vorlesungen'
    uid:
        plugin: migration_lookup
        no_stub: true
        # previous user migration
        migration: user__hoerer
        # property in the source data
        source: Stg
    # These fields have multiple values in D8
    field_color: Color
    field_fruit: Fruit

migration_dependencies:
    required: {  }
    optional: {  }

In my Twig template, the content is printed like this:

...
{{ content.field_color }} {{ content.field_fruit }}
...

When I run drush mim --group=hoevwa, only the last values of user1 (blue, pear) are imported. How can I use a process plugin to iterate over the CSV and get all of the values imported? And finally, how can I loop over all of the values in my Twig template?

magento2.3 – Magento2: How to create configurable products with associated simple products programmatically from CSV data with a root script
