Comma-separated values (CSV, sometimes called character-separated values because the separator can be a character other than a comma) is a file format that stores tabular data (numbers and text) in plain text. Plain text means that the file is a sequence of characters and contains no data that must be interpreted as binary numbers. A CSV file consists of any number of records separated by line breaks; each record consists of fields separated by some other character or string, most often a comma or a tab. Usually, all records have exactly the same sequence of fields.
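As a rough illustration of records and fields, the sketch below parses the same small table twice with Python's standard csv module, once comma-separated and once tab-separated. The sample text and the helper name parse_records are made up for this example.

```python
import csv
import io

# Hypothetical sample data: the same three records, once comma-separated
# and once tab-separated.
comma_text = "name,age\nAnn,31\nBob,27\n"
tab_text = "name\tage\nAnn\t31\nBob\t27\n"


def parse_records(text, delimiter=","):
    """Return the records in `text` as a list of lists of strings."""
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))


comma_rows = parse_records(comma_text)
tab_rows = parse_records(tab_text, delimiter="\t")

print(comma_rows)              # [['name', 'age'], ['Ann', '31'], ['Bob', '27']]
print(comma_rows == tab_rows)  # True
```

Only the delimiter differs between the two inputs; the parsed records are identical, and every field comes back as a string.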

Data read from a CSV file generally arrives as strings; numeric fields must be converted to numbers explicitly.

Data is read row by row.

Columns are separated by a half-width (ASCII) comma or a tab, most often a comma.

Generally, lines do not begin with spaces, the first line holds the column names, the fields are separated by the delimiter with no surrounding spaces, and there are no blank lines between rows.

Having no blank lines between rows is very important: if the data set contains a blank line or a trailing space at the end of a row, reading the data can fail, typically with a [list index out of range] error.
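The notes above can be sketched with a small in-memory example. The two-column data is hypothetical; it only shows how a blank line leads to the [list index out of range] error, and how skipping empty rows while converting each field to float avoids it.

```python
import csv
import io

# Hypothetical data with a blank line between the two data rows.
text = "AGE,BWT\n19,2523\n\n33,2551\n"

rows = list(csv.reader(io.StringIO(text, newline="")))
header, records = rows[0], rows[1:]

# csv.reader yields an empty list for the blank line, so indexing it fails.
error_message = None
try:
    weights = [float(r[1]) for r in records]
except IndexError as exc:
    error_message = str(exc)

# Skipping empty rows and converting every field explicitly avoids the error.
clean = [[float(x) for x in r] for r in records if r]

print(error_message)  # list index out of range
print(clean)          # [[19.0, 2523.0], [33.0, 2551.0]]
```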

1. Writing and reading CSV files using Python I/O

Writing CSV files using Python I/O

The following code downloads the low birth weight data file "birthweight.dat" from the author's source, processes it, and saves it as a CSV file.

import csv
import os

import numpy as np
import requests

# Name of the CSV data file
birth_weight_file = 'birth_weight.csv'

# If birth_weight.csv does not exist in the current directory,
# download the .dat file and generate the CSV file
if not os.path.exists(birth_weight_file):
    birthdata_url = 'https://github.com/nfmcclure/tensorflow_cookbook/raw/master/01_Introduction/07_Working_with_Data_Sources/birthweight_data/birthweight.dat'
    birth_file = requests.get(birthdata_url)
    # Split the text into lines; the file uses Windows line breaks, so each line ends with '\r\n'
    birth_data = birth_file.text.split('\r\n')
    # The first line holds the column headers, separated by tabs
    birth_header = birth_data[0].split('\t')
    # Convert every non-empty field of every non-empty line to float
    birth_data = [[float(x) for x in y.split('\t') if len(x) >= 1] for y in birth_data[1:] if len(y) >= 1]
    # birth_data is a list, not a NumPy array, so it has no .shape attribute;
    # convert it with np.array first to inspect its shape
    print(np.array(birth_data).shape)
    # (189, 9)
    # newline='' prevents empty lines between rows in the generated file
    with open(birth_weight_file, "w", newline='') as f:
        writer = csv.writer(f)
        writer.writerow(birth_header)
        writer.writerows(birth_data)

Common error: list index out of range

One point deserves attention: with open(birth_weight_file, "w", newline='') as f:. If you omit the newline='' parameter and write with open(birth_weight_file, "w") as f: instead, the generated file will contain an empty line after every row. This happens on Windows, where text mode translates the '\n' of the '\r\n' line ending written by csv.writer into '\r\n', producing '\r\r\n'.
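The effect of newline='' can be reproduced on any platform with a small sketch. It is only an illustration under assumptions: the file names and sample rows are made up, and passing newline='\r\n' to open is used here to imitate how Windows text mode translates '\n' on write.

```python
import csv
import os
import tempfile

rows = [["AGE", "BWT"], [19, 2523]]


def write_rows(path, newline):
    """Write `rows` to `path`, opening the file with the given newline mode."""
    with open(path, "w", newline=newline) as f:
        csv.writer(f).writerows(rows)


def raw_bytes(path):
    """Return the exact bytes stored in the file."""
    with open(path, "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    good_path = os.path.join(tmp, "good.csv")
    bad_path = os.path.join(tmp, "bad.csv")
    write_rows(good_path, newline="")      # recommended for csv.writer
    write_rows(bad_path, newline="\r\n")   # imitates Windows text mode
    good = raw_bytes(good_path)
    bad = raw_bytes(bad_path)
    with open(bad_path, newline="") as f:
        bad_rows = list(csv.reader(f))

print(good)      # b'AGE,BWT\r\n19,2523\r\n'
print(bad)       # b'AGE,BWT\r\r\n19,2523\r\r\n'
print(bad_rows)  # contains empty rows between the data rows
```

The extra '\r' in the second file is what shows up as an empty line after every row when the file is opened in a spreadsheet or read back with csv.reader.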

This applies not only to reading and writing CSV data with Python I/O but also to every other method, and to CSV data sets downloaded from the Internet: check whether any line has trailing spaces and whether there are extra empty lines. Doing so avoids unnecessary errors that would distort the data analysis.

Reading CSV files using Python I/O

To read the file with Python I/O, create an empty list and append each row to it in order (the result resembles a two-dimensional array in C). If needed, convert it to a NumPy array with np.array(list_name).

birth_data = []
with open(birth_weight_file, newline='') as csvfile:
    csv_reader = csv.reader(csvfile)  # Read the file with csv.reader
    birth_header = next(csv_reader)  # Read the column names from the first row
    for row in csv_reader:  # Save the remaining rows to birth_data
        birth_data.append(row)

birth_data = [[float(x) for x in row] for row in birth_data]  # Convert the data from strings to floats
birth_data = np.array(birth_data)  # Convert the list of lists to a NumPy array to inspect its structure
birth_header = np.array(birth_header)
print(birth_data.shape)  # Use .shape to view the structure
print(birth_header.shape)
# (189, 9)
# (9,)
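Once the header and the rows are held in lists like this, a single column can be picked out by name. The sketch below assumes a hypothetical three-column excerpt in the same layout as birth_weight.csv; the helper column is not part of the original code.

```python
import csv
import io

# Hypothetical three-column excerpt in the same layout as birth_weight.csv.
text = "AGE,SMOKE,BWT\n19.0,0.0,2523.0\n33.0,1.0,2551.0\n"

reader = csv.reader(io.StringIO(text, newline=""))
header = next(reader)  # column names from the first row
data = [[float(x) for x in row] for row in reader]


def column(name):
    """Return all values of the column called `name`."""
    idx = header.index(name)  # position of the column within each row
    return [row[idx] for row in data]


print(column("BWT"))  # [2523.0, 2551.0]
```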

2. Reading CSV files using Pandas

import pandas as pd

csv_data = pd.read_csv('birth_weight.csv')  # Read the training data
print(csv_data.shape)  # (189, 9)

N = 5
csv_batch_data = csv_data.tail(N)  # Take the last 5 rows
print(csv_batch_data.shape)  # (5, 9)

# Take columns 3 to 5 of these 5 rows by position (indexes start from 0).
# The columns are labeled by name, so positional selection needs .iloc.
train_batch_data = csv_batch_data.iloc[:, 3:6]
print(train_batch_data)
#      RACE  SMOKE  PTL
# 184   0.0    0.0  0.0
# 185   0.0    0.0  1.0
# 186   0.0    1.0  0.0
# 187   0.0    0.0  0.0
# 188   0.0    0.0  1.0

3. Reading CSV files using TensorFlow

'''Read CSV data using TensorFlow (1.x queue-based input API)'''
import tensorflow as tf

filename = 'birth_weight.csv'
# Create a file name queue so that files can be read from folders in bulk
file_queue = tf.train.string_input_producer([filename])
# Use the TensorFlow text line reader and skip the header line
reader = tf.TextLineReader(skip_header_lines=1)
key, value = reader.read(file_queue)
# Default value, and therefore data type, of each of the 9 columns
defaults = [[0.], [0.], [0.], [0.], [0.], [0.], [0.], [0.], [0.]]
# Decode each line into one tensor per column, using the defaults set above
LOW, AGE, LWT, RACE, SMOKE, PTL, HT, UI, BWT = tf.decode_csv(value, defaults)
# The middle 7 columns are read as the training features
vertor_example = tf.stack([AGE, LWT, RACE, SMOKE, PTL, HT, UI])
# The BWT value is read as the training label
vertor_label = tf.stack([BWT])
# Add a batch dimension to the data and read it in shuffled batches.
# Options include the batch size, the queue capacity, the minimum number of
# elements left in the queue after a dequeue, and the number of reader threads.
example_batch, label_batch = tf.train.shuffle_batch([vertor_example, vertor_label], batch_size=10, capacity=100, min_after_dequeue=10)

# Initialize the session
with tf.Session() as sess:
    coord = tf.train.Coordinator()  # Thread coordinator
    threads = tf.train.start_queue_runners(coord=coord)
    print(sess.run(tf.shape(example_batch)))  # [10  7]
    print(sess.run(tf.shape(label_batch)))  # [10  1]
    print(sess.run(example_batch)[3])  # [ 19.  91.   0.   1.   1.   0.   1.]
    coord.request_stop()
    coord.join(threads)

'''
Starting and stopping the thread coordinator is necessary for all queue-based input operations in TensorFlow

with tf.Session() as sess:
    coord = tf.train.Coordinator()  # Thread coordinator
    threads = tf.train.start_queue_runners(coord=coord)
    #  Your code here
    coord.request_stop()
    coord.join(threads)
'''
