I'm trying to import a large CSV file (around 900 MB) using the command `$Data = import-csv information.txt -delimiter ','`, but it's taking forever to run: about half an hour so far! The command also works without storing the data in a variable, but it still takes a long time to process. I'm looking for ways to either speed up the import or get a progress update on how much data has been processed. Also, I've noticed that the `-delimiter ','` part seems unnecessary, since the output is clean without it. Any suggestions?
6 Answers
If you're familiar with regex, consider processing the file as plain strings rather than using `Import-Csv`. When you only need specific fields, this can be significantly faster on large files, because it skips creating a PowerShell object for every row.
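As a rough sketch of that idea, the helper below streams the file with `switch -Regex -File` and uses a regex to pull out one comma-separated field per line. `Get-CsvField` is a hypothetical name, not a built-in cmdlet, and it assumes a simple file with no quoted fields containing commas or line breaks (which `Import-Csv` handles correctly and this does not).

```powershell
# Hypothetical helper, not a built-in cmdlet. Assumes a simple CSV: no quoted
# fields that contain commas or line breaks.
function Get-CsvField {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [int] $FieldIndex   # 0-based column number
    )
    # Skip $FieldIndex fields (each ending in a comma), then capture the next one.
    # Lines with too few fields don't match and produce no output.
    $pattern = '^(?:[^,]*,){' + $FieldIndex + '}([^,]*)'
    # switch -File reads one line at a time, so the whole file is never in memory
    switch -Regex -File (Convert-Path $Path) {
        $pattern { $Matches[1] }
    }
}
```

For example, `Get-CsvField -Path information.txt -FieldIndex 2 | Select-Object -Skip 1` would return the third column without its header line.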
Have you thought about processing the data one object at a time using `ForEach-Object`? Instead of collecting the whole file in a variable, the pipeline hands each row to your script block as soon as it is read, which keeps memory use low and might save time as well. Here's an example:
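```powershell
Import-Csv file.csv | ForEach-Object {
    # Example processing (assumes the CSV has a Path column):
    # emit an object only for rows whose Path is a directory
    if (Test-Path -LiteralPath $_.Path -PathType Container) {
        [pscustomobject]@{
            Path        = $_.Path
            IsContainer = 1
        }
    }
}
```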
If you mainly need the unique values of certain columns, you can collect them in a `HashSet` while the rows stream past, instead of keeping every row in memory. This version also prints a status line every 1,000 rows:
```powershell
$uniqueValues = [System.Collections.Generic.HashSet[string]]::new()
Import-Csv 'C:\temp\hugefile.csv' | ForEach-Object -Begin { $x = 0 } -Process {
    $x++
    if ($x % 1000 -eq 0) { Write-Host "Processed $x rows" }
    # Add() returns $true/$false; [void] keeps that out of the pipeline output
    [void]$uniqueValues.Add($_.column1)
    # Repeat for other columns
}
```
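`Import-Csv` itself has no built-in progress reporting (there is no `-Progress` parameter, not even in PowerShell 7), but you can report progress yourself by piping the rows through `ForEach-Object` and calling `Write-Progress`. Because the total row count isn't known up front, this shows a running count rather than a percentage:
```powershell
$count = 0
$Data = Import-Csv 'C:\Temp\giant_file.csv' | ForEach-Object {
    $count++
    if ($count % 10000 -eq 0) {
        Write-Progress -Activity 'Importing' -Status "$count rows"
    }
    $_   # pass the row through so it ends up in $Data
}
Write-Progress -Activity 'Importing' -Completed
```
I didn't know that! I'll definitely try it.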
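Instead of loading everything at once, you could read the file line by line with a .NET `StreamReader`. It is lower-level than `Import-Csv` (you get raw strings and have to split the fields yourself), but it is usually faster on large files because no object is created per row: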
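```powershell
# Full path: .NET may not share PowerShell's current directory
$reader = New-Object -TypeName System.IO.StreamReader -ArgumentList (Convert-Path 'information.txt')
try {
    # Compare with $null so an empty line doesn't end the loop early
    while ($null -ne ($line = $reader.ReadLine())) {
        # Process each line here, e.g. $fields = $line.Split(',')
    }
}
finally {
    $reader.Close()
}
```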
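If you also want a percentage rather than just a line count, one option is to compare the reader's byte position with the file length. The sketch below wraps that in a hypothetical `Read-LinesWithProgress` function; treat the figure as an estimate, since the `StreamReader` reads ahead in buffers and its underlying stream position runs slightly ahead of the line you're on.

```powershell
# Hypothetical helper: reads a file line by line and reports an estimated
# percent complete from the underlying stream's byte position.
function Read-LinesWithProgress {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [int] $ReportEvery = 100000   # lines between progress updates
    )
    $reader = New-Object -TypeName System.IO.StreamReader -ArgumentList (Convert-Path $Path)
    try {
        $total = $reader.BaseStream.Length
        $count = 0
        while ($null -ne ($line = $reader.ReadLine())) {
            $count++
            if ($count % $ReportEvery -eq 0 -and $total -gt 0) {
                # Position is never past Length, so this stays within 0..100
                $pct = [int](100 * $reader.BaseStream.Position / $total)
                Write-Progress -Activity 'Reading file' -Status "$count lines" -PercentComplete $pct
            }
            $line   # emit the line; replace with your own processing
        }
    }
    finally {
        $reader.Dispose()
        Write-Progress -Activity 'Reading file' -Completed
    }
}
```

For example, `Read-LinesWithProgress -Path information.txt | ForEach-Object { ... }` keeps streaming lines to your processing code while the progress bar updates.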
Importing roughly 1 GB of data will take a while whatever you do. You might run the work as a background job (`Start-Job`) and check on it periodically with `Receive-Job`; that keeps your console responsive and shows that the import is still progressing rather than hung. Also, avoid using `Where-Object` for heavy data operations; a simple loop is often faster.
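A minimal sketch of the job approach, assuming you only need a summary (here, a row count) back from the job: background jobs run in a separate process and their output is serialized on the way back, so returning every row through `Receive-Job` would add overhead of its own. `Invoke-CsvCountJob` is a hypothetical name, and the 100,000-row reporting interval is arbitrary.

```powershell
# Hypothetical helper: counts CSV rows in a background job and polls it, so the
# console stays responsive and status messages appear while the job runs.
function Invoke-CsvCountJob {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [int] $PollSeconds = 5
    )
    # The job runs in a separate process, so give it an absolute path
    $job = Start-Job -ArgumentList (Convert-Path $Path) -ScriptBlock {
        param($csvPath)
        $count = 0
        Import-Csv -LiteralPath $csvPath | ForEach-Object {
            $count++
            # Write-Host goes to the information stream, not the job's output
            if ($count % 100000 -eq 0) { Write-Host "Processed $count rows" }
        }
        $count   # the only output object: a single number is cheap to send back
    }
    $output = @()
    # Wait-Job -Timeout returns nothing while the job is still running
    while (-not (Wait-Job -Job $job -Timeout $PollSeconds)) {
        # Receive-Job also relays any new status messages to the console
        $output += @(Receive-Job -Job $job)
    }
    $output += @(Receive-Job -Job $job)
    Remove-Job -Job $job
    $output[-1]
}
```

For example, `Invoke-CsvCountJob -Path information.txt` prints a status line every 100,000 rows and then returns the total.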

That's a solid approach! If you're looking to get specific values, you can always track counts or unique values during the loop.
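For example, a hypothetical `Get-ColumnValueCount` helper could tally how many rows hold each value of one column while the rows stream past, so only the tallies stay in memory. This sketch assumes the column exists; note that a plain `@{}` hashtable compares string keys case-insensitively.

```powershell
# Hypothetical helper: tallies how many rows hold each value of one column.
# Rows stream through the pipeline, so only the tallies are kept in memory.
function Get-ColumnValueCount {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $Column
    )
    $counts = @{}   # @{} keys are case-insensitive: 'X' and 'x' share a count
    Import-Csv -LiteralPath (Convert-Path $Path) | ForEach-Object {
        # [string] turns a missing value ($null) into '' so it can be a key
        $value = [string]$_.$Column
        if ($counts.ContainsKey($value)) { $counts[$value]++ } else { $counts[$value] = 1 }
    }
    $counts
}
```

For example, `(Get-ColumnValueCount -Path information.txt -Column type)['x']` would give the number of rows whose `type` column is `x`.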