This C# method splits a comma-separated values (CSV) file into smaller files, each containing part of the original data. Some systems accept uploads of CSV records only in sections of one megabyte. This method is ideal for that case.
Example. Here we see a static class that uses methods from System.IO in the C# programming language. It divides a large input CSV file, such as example.csv, into smaller files of about one megabyte each, always breaking at the end of a line.
Here: Pay attention to the method call in the Main method, which specifies files of 1024 times 1024 bytes, or one megabyte.
C# program that uses CSV files

using System;

class Program
{
    static void Main()
    {
        // Split this CSV file into 1 MB chunks.
        CSVSplitTool.SplitCSV("example.csv", "split", 1024 * 1024);
    }
}

/// <summary>
/// Tool for splitting CSV files at a certain byte size on a line break.
/// </summary>
static class CSVSplitTool
{
    /// <summary>
    /// Split CSV files on line breaks before a certain size in bytes.
    /// </summary>
    public static void SplitCSV(string file, string prefix, int size)
    {
        // Read lines from source file
        string[] arr = System.IO.File.ReadAllLines(file);

        int total = 0;
        int num = 0;
        var writer = new System.IO.StreamWriter(GetFileName(prefix, num));

        // Loop through all source lines
        for (int i = 0; i < arr.Length; i++)
        {
            // Current line
            string line = arr[i];

            // Length of current line in bytes when written as UTF-8,
            // which is the default encoding of StreamWriter
            int length = System.Text.Encoding.UTF8.GetByteCount(line);

            // See if adding this line would exceed the size threshold
            if (total + length >= size)
            {
                // Create a new file
                num++;
                total = 0;
                writer.Dispose();
                writer = new System.IO.StreamWriter(GetFileName(prefix, num));
            }

            // Write the line to the current file
            writer.WriteLine(line);

            // Add length of line in bytes to running size
            total += length;

            // Add size of newlines
            total += Environment.NewLine.Length;
        }
        writer.Dispose();
    }

    /// <summary>
    /// Get an output file name based on a number.
    /// </summary>
    static string GetFileName(string prefix, int num)
    {
        return prefix + "_" + num.ToString("00") + ".txt";
    }
}
SplitCSV receives three parameters. These specify the source file name, the output file name prefix, and the size in bytes you want the output files to be. The second parameter, prefix, is the first part of the output file names.
We use File.ReadAllLines to read the entire source CSV file into an array. We then loop over its lines. In the for-loop, the method adds up the byte length of each line plus its newline. When adding the next line would reach or exceed the maximum size in bytes, it closes the current output file and starts a new one.
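The program reads the whole source file into an array with File.ReadAllLines before it writes anything. For very large inputs, a streaming variant may be preferable. The following is a minimal sketch of that idea, not the article's method: the class name StreamingCsvSplitter, the int return value, and the extra total > 0 check are all assumptions made for illustration. It relies on File.ReadLines, which yields one line at a time.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical streaming variant of SplitCSV (names are illustrative only).
static class StreamingCsvSplitter
{
    // Splits 'file' into numbered output files and returns how many were written.
    public static int Split(string file, string prefix, int size)
    {
        int total = 0;
        int num = 0;
        var writer = new StreamWriter(GetFileName(prefix, num));
        try
        {
            // File.ReadLines enumerates lazily, so only one line is held at a time.
            foreach (string line in File.ReadLines(file))
            {
                // Bytes this line takes once written as UTF-8, plus its newline.
                int length = Encoding.UTF8.GetByteCount(line)
                    + Environment.NewLine.Length;

                // Start a new file if this line would push the current one past
                // the limit. The total > 0 check avoids creating an empty file
                // when a single line is larger than the limit.
                if (total > 0 && total + length > size)
                {
                    writer.Dispose();
                    num++;
                    total = 0;
                    writer = new StreamWriter(GetFileName(prefix, num));
                }

                writer.WriteLine(line);
                total += length;
            }
        }
        finally
        {
            // Close the last file even if reading or writing throws.
            writer.Dispose();
        }
        return num + 1;
    }

    // Same naming rule as the original program.
    static string GetFileName(string prefix, int num)
    {
        return prefix + "_" + num.ToString("00") + ".txt";
    }
}
```

A call such as StreamingCsvSplitter.Split("example.csv", "split", 1024 * 1024) would be used in place of CSVSplitTool.SplitCSV. Because this sketch counts the newline before comparing, its split points can differ slightly from those of the original method.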
Finally: It generates file names with GetFileName. This example will generate file names "split_00.txt", "split_01.txt" and more.
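As a small illustration of the naming rule, the sketch below repeats the GetFileName logic in a separate class so it can be called directly. The class name FileNames is an assumption for this example. The "00" custom format pads the number to at least two digits and does not truncate larger numbers.

```csharp
using System;

// Hypothetical wrapper that exposes the same naming rule as GetFileName.
static class FileNames
{
    public static string GetFileName(string prefix, int num)
    {
        // "00" pads with a leading zero up to two digits.
        return prefix + "_" + num.ToString("00") + ".txt";
    }
}

// FileNames.GetFileName("split", 0)   returns "split_00.txt"
// FileNames.GetFileName("split", 7)   returns "split_07.txt"
// FileNames.GetFileName("split", 100) returns "split_100.txt"
```

Past 99 files the names grow to three digits, so a plain alphabetical sort would place "split_100.txt" before "split_11.txt". A wider format such as "000" is one way to avoid that if many output files are expected.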
Verify. Here we verify the correctness of the method to make sure it works. The example CSV file is a 6,409,636-byte CSV file containing 60,000 lines, each with 10 fields. Each field is a random number.
The output files total 6.11 MB, the same as the input file. A 6,409,636-byte input split at 1,048,576 bytes gives seven files: the first six are about 1024 KB each, which is displayed as 0.99 MB in the file manager, and the final file is 116 KB, containing the remainder.
Also: The lines in the output files were checked for accuracy. The first file split occurs after line 9816.
Therefore: Line 9816 is the final line in the first output file, and line 9817 is the first line in the second output file.
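The article does not show how the line check was done. One way to automate it is sketched below, under the assumption that the output files follow the prefix_NN.txt naming of GetFileName and are numbered without gaps. The class SplitVerifier and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical checker for the output of SplitCSV (names are illustrative only).
static class SplitVerifier
{
    // Lists the existing output files in order, stopping at the first missing number.
    public static List<string> OutputFiles(string prefix)
    {
        var names = new List<string>();
        for (int num = 0; ; num++)
        {
            string name = prefix + "_" + num.ToString("00") + ".txt";
            if (!File.Exists(name))
            {
                break;
            }
            names.Add(name);
        }
        return names;
    }

    // Returns true when the output files, read in order, contain exactly
    // the same lines as the source file.
    public static bool LinesMatch(string source, string prefix)
    {
        var rejoined = new List<string>();
        foreach (string name in OutputFiles(prefix))
        {
            rejoined.AddRange(File.ReadAllLines(name));
        }

        string[] original = File.ReadAllLines(source);
        if (original.Length != rejoined.Count)
        {
            return false;
        }
        for (int i = 0; i < original.Length; i++)
        {
            if (original[i] != rejoined[i])
            {
                return false;
            }
        }
        return true;
    }
}
```

After running the splitter, SplitVerifier.LinesMatch("example.csv", "split") should return true. The size of each file can be read with new FileInfo(name).Length for every name in SplitVerifier.OutputFiles("split"). Leftover files from an earlier run with the same prefix would be picked up too, so it is safest to split into an empty directory.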
Summary. This static method splits CSV files based on byte size. You can use it to split your CSV files at any size boundary, such as one megabyte or two megabytes. This is useful for importing CSV files into a database.