Hi! I'm trying to apply some of the course content to my own problem sets, both for practice and for my personal project. I intend to try using the EVA dataset, which is hosted at https://github.com/kang-gnak/eva-dataset I've used
which I have no idea how to concatenate and unzip directly in there. For repeatability (without having to download it locally and re-upload it somewhere), I would like to do this directly in the notebook. I tried looking for solutions on StackOverflow etc., but the ones I found didn't seem to work for this type of file. Is anyone able to help me out? Thank you!
Replies: 1 comment
Hi @g6k,

I'd start by unzipping the files with Python and then going from there: https://stackoverflow.com/questions/3451111/unzipping-files-in-python

import zipfile
with zipfile.ZipFile(path_to_zip_file, 'r') as zip_ref:
    zip_ref.extractall(directory_to_extract_to)

Once you've unzipped the files you could combine them.

And then once you've combined them, you could add a check to see if they are downloaded already before re-downloading them.

You could also try this code example from ChatGPT:

import os
import shutil
import zipfile

# Extract the contents of the first zip file to a temporary directory
temp_dir = 'temp'
with zipfile.ZipFile('zip_file_1.zip') as zip_file:
    zip_file.extractall(temp_dir)

# Loop through the remaining zip files and extract their contents to the same temporary directory
for zip_name in ['zip_file_2.zip', 'zip_file_3.zip']:  # ... and any further archives
    with zipfile.ZipFile(zip_name) as zip_file:
        zip_file.extractall(temp_dir)

# Loop through the files in the temporary directory and combine them into a single file
# (sorted so the order is deterministic; assumes the extracted files are plain text)
with open('combined_file.txt', 'w') as combined_file:
    for file_name in sorted(os.listdir(temp_dir)):
        with open(os.path.join(temp_dir, file_name)) as f:
            combined_file.write(f.read())

# Delete the temporary directory and its contents
shutil.rmtree(temp_dir)
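
If the dataset is shipped as one zip that was split into several parts (rather than several independent zips), the parts would need to be joined before unzipping. Here is a rough sketch of that, combined with the "check before redoing the work" idea. It assumes the parts are plain byte-wise splits, so that concatenating them in order reproduces the original archive; I haven't verified that this is how the EVA files are packaged. The function names `join_parts` and `extract_once` are made up for illustration, and you would pass in whatever part files you actually downloaded.

```python
import shutil
import zipfile
from pathlib import Path


def join_parts(part_paths, combined_path):
    """Concatenate split-archive parts, in the given order, into one file.

    Assumption: the parts are byte-wise splits of a single zip archive.
    """
    combined_path = Path(combined_path)
    with open(combined_path, "wb") as out:
        for part in part_paths:
            with open(part, "rb") as src:
                # Stream the bytes so large parts are not loaded into memory at once
                shutil.copyfileobj(src, out)
    return combined_path


def extract_once(part_paths, combined_path, extract_dir):
    """Join and extract the parts, unless extract_dir already has content."""
    extract_dir = Path(extract_dir)
    if extract_dir.is_dir() and any(extract_dir.iterdir()):
        return extract_dir  # already extracted on an earlier run, skip the work
    archive = join_parts(part_paths, combined_path)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(extract_dir)
    return extract_dir
```

In a notebook you would call `extract_once` with the list of part files sorted into the right order. Re-running the cell then returns immediately as long as the output directory is non-empty.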