How to filter after parsing CSV values in Google Sheets cells

I would like to match each cell in column A (titles) against one of the comma-separated values in column E (key search terms). This will involve at least two steps: 1) parse each of the respective key-search-term cells into an array of trimmed text values (e.g. cell E10 would become ("substance", "food fluff", "green")); 2) use a regular expression to match the exact term contained under the Tags column (column F) against those values.

The desired result, shown in the image below, is a filter that lets only the article title "Hate Foods: A Viable Source of Catharsis?" pass through. Note that "Being Cool: A Guide" does NOT make the cut: although its key-search-terms cell contains the word "food", it only appears within "food fluff", which is a distinct value from "food" in isolation.

I could probably puzzle over this for a while, but I have a client who needs a quick fix, and I have a hard time seeing how to get that result. I think I have the first step done with the following formula: ARRAYFORMULA(TRIM(SPLIT(E10, ","))). This parses a single cell into an array of values from its CSV content; in this case, "substance", "food fluff", "green".
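To make the matching rule concrete, here it is sketched in Python (not a Sheets formula, purely an illustration; the first cell value is the E10 example above, the second is hypothetical): a tag matches only when it equals one of the trimmed comma-separated terms, so "food" must not match "food fluff".

def tag_matches(search_terms_cell, tag):
    # Mirrors ARRAYFORMULA(TRIM(SPLIT(E10, ","))): split on commas, trim each term.
    terms = [t.strip() for t in search_terms_cell.split(",")]
    # Exact comparison against whole terms, not substrings.
    return tag in terms

print(tag_matches("substance, food fluff, green", "food"))  # False -> row filtered out
print(tag_matches("substance, food, green", "food"))        # True (hypothetical cell)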

Tags to analyze

drupal 8 – Migrating CSV data in tabular form into a multi-value field

How to migrate a data sequence into a multivalued field?

I'm learning D8 migration and working on a large CSV file of historical census data from around the world. Example below:

id,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018
1,Aruba,ABW,"Population, total",SP.POP.TOTL,54211,55438,56225,56695,57032,57360,57715,58055,58386,58726,59063,59440,59840,60243,60528,60657,60586,60366,60103,59980,60096,60567,61345,62201,62836,63026,62644,61833,61079,61032,62149,64622,68235,72504,76700,80324,83200,85451,87277,89005,90853,92898,94992,97017,98737,100031,100834,101222,101358,101455,101669,102046,102560,103159,103774,104341,104872,105366,105845

My yml migration file is as follows:

uuid: 6c54b8f4-96c7-4678-8919-b20c8b318c82
langcode: en
status: true
dependencies: {  }
id: World_Census_Data
class: null
field_plugin_method: null
cck_plugin_method: null
migration_tags:
  - CSV
migration_group: random_migrations
label: 'International census csv data'
source:
  plugin: csv
  path: /var/www/random-migrations/modules/custom/random_migrations/sources/census-58-years.csv
  ids:
    - id
  delimiter: ','
  track_changes: true
process:
  type:
    plugin: default_value
    default_value: census
  title: 'Country Name'
  field_country_code: 'Country Code'
  field_count:
    plugin: array_build
    default_value: 0
    value:
      - 1960
      - 1961
destination:
  plugin: 'entity:node'
migration_dependencies: {}

Instead of manually creating an individual field for each year, I created the field_count field, which is an Integer field set to allow unlimited values. My idea is to import all population counts from 1960 to 2018 as new field_count entries for each country.

My problem is that the population data is not imported. Can anyone advise whether my proposed solution is possible and, if so, what I should change or correct in my yml for the migration to work?

Help me, please.

c++ – What is an efficient way to write large data to a .csv file?

I'm trying to write big data into a CSV file and the number of rows I'm processing is in the millions.

My code generates the rows as an array of doubles. The code takes a long time to write the data to a file (about double the time it takes to write the same number of records into the database with bulk inserts of 10,000 rows on the same machine).

I've tried buffering 10,000 and 100,000 rows at a time in a std::string before writing.

m_Csvfile.open("Fact.csv", std::ios_base::out | std::ios_base::app);
void class::PrepareRow()
{
    for (int i = 1; i < m_nColumnCount; i++)
    {
        if (arrayOfRowVals[i] != NULL_NUMBER)
        {
            char buffer[50] = {};
            sprintf(buffer, "%9.7lf", arrayOfRowVals[i]);
            strBulkcsvString.append(buffer); // strBulkcsvString is std::string
        }

        if (i < m_nColumnCount)
            strBulkcsvString.append(",");
    }
    strBulkcsvString.append("\n");
    m_rowcount++;

    if (m_rowcount == 10000)
    {
        m_Csvfile << strBulkcsvString;
        strBulkcsvString.clear();
        m_rowcount = 0;
    }
}

python – Convert XML to CSV

I would like to transform a folder with .xml files into .csv.

Whatever the language used, I would like to convert the XML files (the folder contains about 1000 of them) directly into .CSV without conversion problems; at the moment, when I open the output there is nothing, just a blank file.

Could you help me ?!
Thanks guys!

Any help is welcome :]

Note: each .XML file contains approximately 100 standard lines.
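In case it helps, here is a minimal sketch of the kind of script that can do this, with the caveat that it assumes a flat layout (every child of the root is one record whose sub-elements become the CSV columns) and uses placeholder file names, since the actual XML structure is not shown:

import csv
import xml.etree.ElementTree as ET
from pathlib import Path

input_dir = Path("xml_folder")  # assumed folder holding the ~1000 .xml files

with open("output.csv", "w", newline="", encoding="utf-8") as out:
    writer = None
    for xml_file in sorted(input_dir.glob("*.xml")):
        root = ET.parse(xml_file).getroot()
        for record in root:
            # one CSV row per record; sub-element tags become column names
            row = {child.tag: (child.text or "") for child in record}
            if writer is None:
                writer = csv.DictWriter(out, fieldnames=list(row))
                writer.writeheader()
            writer.writerow(row)

If the real files nest more deeply, the row-building line is the part that would need to change.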

php – Delete lines from a CSV file

Hi, I am trying to insert the records of a CSV file that I upload to the web, and I want to know how I can handle the rows that are saved in the file.
I've already figured out how to manipulate the columns, but I'd like to know if someone knows how to manipulate the rows. This is the code I have.

line $)
{
    if ($i == 0)
    {
        $data = explode(",", $line);
        $number = trim($data[0]);
        $number2 = trim($data[2]);
        echo $number . "\n";
        echo $number2 . "\n";
    }
}

}
else
{
    echo "\n does not exist";
}
/* Close the created file listsNegras.csv */
fclose($fh);

?>

JSON to CSV in Python with json.loads and json_normalize

I'm trying to convert a JSON file to CSV format with Python. I use json.loads(), then json_normalize() to flatten the objects. The code works correctly for a few lines of input.

I was wondering if there is a better way to do this. By better I mean:

Is it efficient in terms of time and space complexity? If this code needs to process about 10,000 records in a file, is this an optimized solution?

This is the input file; each line has the form:

{"ID": "02","Date": "2019-08-01","Total": 400,"QTY": 12,"Item": ({"NM": "0000000001","CD": "item_CD1","SRL": "25","Disc": ({"CD": "discount_CD1","Amount": 2}),"TxLns": {"TX": ({"TXNM": "000001-001","TXCD": "TX_CD1"})}},{"NM": "0000000002","CD": "item_CD2","SRL": "26","Disc": ({"CD": "discount_CD2","Amount": 4}),"TxLns": {"TX": ({"TXNM": "000002-001","TXCD": "TX_CD2"})}},{"NM": "0000000003","CD": "item_CD3","SRL": "27"}),"Cust": {"CustID": 10,"Email": "01@abc.com"},"Address": ({"FirstName": "firstname","LastName": "lastname","Address": "address"})}

Code

import json
import pandas as pd
from pandas.io.json import json_normalize
data_final=pd.DataFrame()
with open("sample.json") as f:
    for line in f:
        json_obj = json.loads(line)
        ID = json_obj['ID']
        Item = json_obj['Item']
        dataMain = json_normalize(json_obj)
        dataMain=dataMain.drop(['Item','Address'], axis=1)
        #dataMain.to_csv("main.csv",index=False)
        dataItem = json_normalize(json_obj,'Item',['ID'])
        dataItem=dataItem.drop(['Disc','TxLns.TX'],axis=1)
        #dataItem.to_csv("Item.csv",index=False)
        dataDisc = pd.DataFrame()
        dataTx = pd.DataFrame()
        for rt in Item:
            NM=rt['NM']
            rt['ID'] = ID
            if 'Disc' in rt:
                data = json_normalize(rt, 'Disc', ['NM','ID'])
                dataDisc = dataDisc.append(data, sort=False)
            if 'TxLns' in rt:
                tx=rt['TxLns']
                tx['NM'] = NM
                tx['ID'] = ID
                if 'TX' in tx:
                    data = json_normalize(tx, 'TX', ['NM','ID'])
                    dataTx = dataTx.append(data, sort=False)
        dataDIS = pd.merge(dataItem, dataDisc, on=['NM','ID'], how='left')
        dataTX = pd.merge(dataDIS, dataTx, on=['NM','ID'], how='left')
        dataAddress = json_normalize(json_obj,'Address',['ID'])
        data_IT = pd.merge(dataMain, dataTX, on=['ID'])
        data_merge=pd.merge(data_IT,dataAddress, on=['ID'])
        data_final=data_final.append(data_merge,sort=False)
data_final=data_final.drop_duplicates(keep = 'first')
data_final.to_csv("data_merged.csv",index=False)

This is the output:

ID,Date,Total,QTY,Cust.CustID,Cust.Email,NM,CD_x,SRL,CD_y,Amount,TXNM,TXCD,FirstName,LastName,Address
02,2019-08-01,400,12,10,01@abc.com,0000000001,item_CD1,25,discount_CD1,2.0,000001-001,TX_CD1,firstname,lastname,address
02,2019-08-01,400,12,10,01@abc.com,0000000002,item_CD2,26,discount_CD2,4.0,000002-001,TX_CD2,firstname,lastname,address
02,2019-08-01,400,12,10,01@abc.com,0000000003,item_CD3,27,,,,,firstname,lastname,address
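One general observation on the efficiency question, offered as a sketch rather than a tested rewrite: DataFrame.append re-copies the accumulated frame on every call, so with ~10,000 input lines it is usually cheaper to collect the per-line results in a plain list and concatenate once at the end. Here build_frame_for_line is a hypothetical stand-in for the per-line logic in the code above:

import pandas as pd

frames = []
with open("sample.json") as f:
    for line in f:
        # build_frame_for_line would hold the json.loads / json_normalize /
        # merge steps shown above and return one merged DataFrame per line
        frames.append(build_frame_for_line(line))

data_final = pd.concat(frames, ignore_index=True, sort=False)
data_final = data_final.drop_duplicates(keep='first')
data_final.to_csv("data_merged.csv", index=False)

The same idea applies inside the loop to dataDisc and dataTx.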

magento2.3 – Uploading product images via CSV

I want to upload small images, thumbnails and swatch images to many products via CSV. The images will be the same.

When I import the CSV file, the process completes successfully and I can see that the images are uploaded to the server, but they are not displayed.

I have already copied the images to: /pub/media/import/

Please check the picture of my csv.


c++11 – SQL-like database in C++ running on CSV files

I implemented a very constrained, trivial implementation of SQL queries running on user files (in CSV format). Where possible, I tried to use modern C++ features. The goal is to serve MySQL requests so that the user does not know that no MySQL server is installed. Simply put: mock MySQL using the file system.

I've divided the whole concept into 3 classes:

Code below:

#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <algorithm>


/***********************************************************************************
 HELPERS DECLARATIONS
************************************************************************************/
template <typename T>
void DebugPrintVector(std::string vName, std::vector<T> & v);
void RemoveCharsFromStr(std::string &s, char c);


/***********************************************************************************
CLASS IMPLEMENTATION:  CSVSQL 
************************************************************************************/

CsvSql::CsvSql (){};

void CsvSql::Connect(std::string host,  std::string user, std::string password, std::string  dataBase)
{
    std::ifstream dbFile;
    dbFile.exceptions(std::ifstream::failbit | std::ifstream::badbit);
    try {
        dbFile.open(dataBase.c_str(), std::fstream::in | std::fstream::out | std::fstream::app);
    } catch(const std::ifstream::failure& e) {
        std::cerr << "Error: " << e.what();
    }

    if ( dbFile.peek() == std::ifstream::traits_type::eof() ) {
        throw std::runtime_error("Could not open file");
    }

    std::vector<std::string> tables;
    std::string table;
    while (!dbFile.eof() && (dbFile >> table)) {
        tables.push_back(table);
    }

    std::cout << "database in use: " << dataBase << std::endl;
    /* ... */
    std::vector<std::string> tokens = querry.GetTokens();
    DebugPrintVector("tokens", tokens);

    if (*tokens.begin()== "SELECT") {   //select querry should derive from base "QUERRY" class
        std::vector<std::string>::iterator iter = std::find(tokens.begin(), tokens.end(), std::string("FROM"));

        if (iter != tokens.end()) {
            std::vector<std::string> columnsQuerry(++tokens.begin(), iter);
            DebugPrintVector("columns in querry:", columnsQuerry);
            std::string tableInUse = *++iter;

            Table table(tableInUse);
            return   table.GetSelectedColumnsContent(columnsQuerry);
        }
    } else if (*tokens.begin()== "SELECT") {
        ///TODO;
        }
    return std::string(" ");
}



/***********************************************************************************
CLASS IMPLEMENTATION:  QUERRY 
************************************************************************************/


Querry::Querry(std::string querrry) : _querryData {querrry} {};

std::vector<std::string> Querry::GetTokens()
{
    std::stringstream querryStream(_querryData);
    std::string token;
    std::vector<std::string> tokens;
    while(getline(querryStream, token, ' ')) {
        RemoveCharsFromStr(token, ',');
        tokens.push_back(token);
    }
    return tokens;
}


/***********************************************************************************
CLASS IMPLEMENTATION:  Table 
************************************************************************************/

Table::Table (std::string tableName)
{
    _tableFile.open(tableName);
}


std::vector<std::string> Table::GetColumnsNames(void)
{
    std::string header;
    getline(_tableFile, header);
    std::stringstream headerStream(header);
    std::string column;
    std::vector<std::string> columns;
    while(getline(headerStream, column, ',')) {
        RemoveCharsFromStr(column, ' ');
        columns.push_back((column));
    }
    return columns;
}

std::vector<int> Table::GetSelectedColumnsNumbers(std::vector<std::string> tableColumns, std::vector<std::string> querredColumns)
{
    std::vector<int> clmnsNb;
    for (int i = 0; i < querredColumns.size(); i++) {
        for (int j = 0; j < tableColumns.size(); j++) {
            if (tableColumns[j] == querredColumns[i]) {
                clmnsNb.push_back(j);
            }
        }
    }
    return clmnsNb;
}

std::string Table::GetFieldsFromSelectedColumnsNumbers(std::vector<int> clmnsNb)
{
    std::string querredFields;
    std::string line;
    while( getline(_tableFile, line) ) {
        int i = 0;
        std::stringstream ss(line);
        std::string field;
        while( getline(ss, field, ',')) {
            if ( std::find(clmnsNb.begin(), clmnsNb.end(), i) != clmnsNb.end() ) {
                RemoveCharsFromStr(field, ' ');
                querredFields += field + " ";
            }
            i++;
        }
        querredFields += "\n";
    }

    return querredFields;
}

std::string Table::GetSelectedColumnsContent(std::vector<std::string> selectedColumns)
{
    std::vector<std::string> columnsNames = GetColumnsNames();
    std::vector<int> clmnsNb = GetSelectedColumnsNumbers(columnsNames, selectedColumns);
    return GetFieldsFromSelectedColumnsNumbers(clmnsNb);
}



/*************************************************************************************
    HELPERS
*************************************************************************************/


template <typename T>
void DebugPrintVector(std::string vName, std::vector<T> & v)
{
    std::cout /* ... */

I know that "GetSelectedColumnsContent" depends on too complicated methods, but I thought that in this way, I could optimize the memory usage (I did not read the entire file in a separate column to perform the l & rsquo; # 39; operation). Please share your opinion, if this code at least a little follows the C ++ style moder?
Best regards!

CSV -> Python -> Pandas.DF -> PostgreSQL (UPSERT, THEN Cull data)

Statement of the problem:

I have several machines that all run the same software; they produce CSV files. I need to get this information into a database. The challenge is that the user can, with permission, delete a specimen from a set. I need the database to react to this change when the user saves that CSV file again. Also, if something has changed in the data, I need it updated on that second save.

Note: This is my first time with Python and Postgres. If you see something stupid, lmk.

WORKFLOW:

  1. CSV saved in /csv_import
  2. DropIt calls the python script
  3. Manipulate data / convert to df
  4. insert into temp_table
  5. where is your god now? Postgres is not a toy and you are a weekend warrior. *sobs gently*

It went very well until #5.

The python:

import sys
import os
import logging
import pandas as pd
from sqlalchemy import create_engine
logging.basicConfig(filename='csv_import.log', filemode='a', format='%(asctime)s - %(levelname)s - %(message)s',level=logging.DEBUG)

#second argument index because first is the .py filename
filename = sys.argv[1]

# we parse our incoming filename because the machine itself needs differentiation and callbacks
if "_" in filename:
    proj_num = filename.split('_', 1)[0]

    # has an underscore, and something after it
    if len(filename.split('_', 1)) > 1 and len(filename.split('_', 1)[1]) > 4:
        type_cond = filename.split('_', 1)[1]
        type_cond = type_cond[:-4]
    else:
        type_cond = "NO_TYPE"
else:
    proj_num = filename.split('.', 1)[0]
    type_cond = "NO_TYPE"

logging.info('Incoming file processed: (Filename: %s) - (Project: %s) - (Type: %s)',filename,proj_num,type_cond)

#============= CSV -> Pandas.DataFrame =============
with open(filename) as input_file:
    #header = input_file.readline()
    data = pd.read_csv(input_file, header=None)

f_m_s = filename[:-4] + "_" + data[1].astype(str) + "_" + data[0].astype(str)

data.insert(0, "f_m_s", f_m_s)
data.insert(1, "proj_num", proj_num)
data.insert(5, "type", type_cond)

#we intentionally use headless CSVs because I want the column names hardcoded here
data.columns = [
  'f_m_s',
  'proj_num',
  'spec_num',
  'machine',
  'method',
  'type',
  'label',
  'data1',
  'data2',
  'data3',
  'data4',
  'data5',
  'data6',
  'data7'
]
logging.debug('Pandas DataFrame:\n%s',data.to_string())
#============= End CSV Manipulation =============

#put data into temp table
db = create_engine('postgresql://user:password@host_ip:port/database')
data.to_sql('temp_csv_import', con=db, if_exists='replace', index=False)

#UNTESTED CODE BEGINS HERE:   ==================================
#UPSERT?
db.execute("INSERT INTO data SELECT * FROM temp_csv_import")

#DELETE?  data that the user pulled (dangerous and probably wrong approach?)
db.execute("DELETE FROM data WHERE ???  ")

###MAIN TABLE:    data
###TEMP TABLE:    temp_csv_import

Note: To protect the integrity of the data, the variable FILE_MACHINE_SPECIMEN (f_m_s) is created to make each record unique, even if a bonehead user tries to use the same file name on two different machines. The machines themselves do not allow duplicate file names, so it should work (I think).

The question:

Can someone help me to properly formulate this upsert query?

It does not work in pgAdmin (obviously not with the placeholder comment in it, but even variants of it fail):

INSERT INTO data
SELECT
   *
FROM
   temp_csv_import
ON CONFLICT (f_m_s)
    DO UPDATE SET "all the columns please?"
  • PostgreSQL 11
  • Python 3.7
  • Pandas 0.25
  • SQLAlchemy 1.3
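For what it is worth, below is a sketch of the shape such an upsert usually takes. Two assumptions the post does not confirm: the data table needs a UNIQUE constraint (or primary key) on f_m_s for ON CONFLICT to work, and the updated columns have to be listed explicitly, because PostgreSQL has no "all the columns" shorthand in DO UPDATE SET (EXCLUDED refers to the incoming row). The cull statement is equally speculative and assumes every f_m_s for a given file starts with the filename prefix built earlier in the script:

from sqlalchemy import text

upsert_sql = text("""
    INSERT INTO data
    SELECT * FROM temp_csv_import
    ON CONFLICT (f_m_s) DO UPDATE SET
        label = EXCLUDED.label,
        data1 = EXCLUDED.data1,
        data2 = EXCLUDED.data2
        -- ... and so on for the remaining columns
""")
db.execute(upsert_sql)

# remove rows for this file that no longer appear in the freshly staged CSV
cull_sql = text("""
    DELETE FROM data
    WHERE f_m_s LIKE :prefix
      AND f_m_s NOT IN (SELECT f_m_s FROM temp_csv_import)
""")
db.execute(cull_sql, prefix=filename[:-4] + "_%")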

python – Skip an empty element when parsing an XML file to CSV

I am currently trying to parse an XML file of about 10,000 elements into CSV format.

The script we created works until it reaches a child element that does not exist. I have tried, but I cannot figure out how to tell it to skip a missing child element.

Below my code:

import xml.etree.ElementTree as ET
import csv

tree = ET.parse("LNM.xml")
root = tree.getroot()


# open a file for writing

LNM_DATA = open('LNMCSV.csv', 'w')

# create the csv writer object

csvwriter = csv.writer(LNM_DATA)
LNM_head = []
LNM_superhead = []

count = 0
for DISCREPNACIES in root.iter('DISCREPANCIES'):
    for DISCREPANCY in DISCREPNACIES.findall('DISCREPANCY'):
        for AID in DISCREPANCY.findall('AID'):
            LNM = []
            if count == 0:
                AID_UNIQUE_IDENTIFIER = AID.find('AID_UNIQUE_IDENTIFIER').tag
                LNM_head.append(AID_UNIQUE_IDENTIFIER)
                LIGHT_LIST_NUBMER = AID.find('LIGHT_LIST_NUMBER').tag
                LNM_head.append(LIGHT_LIST_NUBMER)
                USCG_DISTRICT = AID.find('USCG_DISTRICT').tag
                LNM_head.append(USCG_DISTRICT)
                AID_NAME = AID.find('AID_NAME').tag
                LNM_head.append(AID_NAME)
                TYPE = AID.find('TYPE').tag
                LNM_head.append(TYPE)
                LATITUDE = AID.find('ASSIGNED_LATITUDE').tag
                LNM_head.append(LATITUDE)
                LONGITUDE = AID.find('ASSIGNED_LONGITUDE').tag
                LNM_head.append(LONGITUDE)
                csvwriter.writerow(LNM_head)
                count = count + 1

            AID_UNIQUE_IDENTIFIER = AID.find('AID_UNIQUE_IDENTIFIER').text
            LNM.append(AID_UNIQUE_IDENTIFIER)
            AID_UNIQUE_IDENTIFIER = AID.find('AID_UNIQUE_IDENTIFIER').text
            LNM.append(AID_UNIQUE_IDENTIFIER)
            LIGHT_LIST_NUBMER = AID.find('LIGHT_LIST_NUMBER').text
            LNM.append(LIGHT_LIST_NUBMER)
            USCG_DISTRICT = AID.find('USCG_DISTRICT').text
            LNM.append(USCG_DISTRICT)
            AID_NAME = AID.find('AID_NAME').text
            LNM.append(AID_NAME)
            TYPE = AID.find('TYPE').text
            LNM.append(TYPE)
            LATITUDE = AID.find('ASSIGNED_LATITUDE').text
            D = int(LATITUDE[0:2])
            M = int(LATITUDE[4:5])
            S = float(LATITUDE[7:12])
            direction = str(LATITUDE[12])
            DDLAT = D + float(M)/60 + float(S)/3600
            if direction == 'S' or direction == 'W':
                DDLAT *= -1
            LNM.append(DDLAT)
            LONGITUDE = AID.find('ASSIGNED_LONGITUDE').text
            D = int(LONGITUDE[0:3])
            M = int(LONGITUDE[5:6])
            S = float(LONGITUDE[8:13])
            direction = str(LONGITUDE[13])
            DDLONG = D + float(M)/60 + float(S)/3600
            if direction == 'S' or direction == 'W':
                DDLONG *= -1
            LNM.append(DDLONG)
            csvwriter.writerow(LNM)

LNM_DATA.close()
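A minimal sketch of one way to guard against the missing children, assuming a missing element should simply become an empty string in the CSV (tag names as in the code above):

def text_or_blank(parent, tag, default=''):
    # Return the child element's text, or the default when the child is absent
    node = parent.find(tag)
    if node is None or node.text is None:
        return default
    return node.text

# used in place of AID.find('USCG_DISTRICT').text, for example:
# LNM.append(text_or_blank(AID, 'USCG_DISTRICT'))

For ASSIGNED_LATITUDE and ASSIGNED_LONGITUDE the loop would also need to skip the degree/minute/second slicing when the returned string is empty.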

Thanks in advance !!!