Import using SQLBulkCopy for big CSV files question - sqlserver-faq
This is a discussion on Import using SQLBulkCopy for big CSV files question - sqlserver-faq ; Hi All, Using MS SQL 2005 64 bits. Using the SQLBulkCopy for big CSV files and some of them have 24 mil records. Working with them consumes almost all the memory in the computer. I found how to resolve this ...
![]() |
| | LinkBack | Thread Tools | Display Modes |
|
#1
| |||
| |||
| Using MS SQL 2005 64 bits. Using the SQLBulkCopy for big CSV files and some of them have 24 mil records. Working with them consumes almost all the memory in the computer. I found how to resolve this problem while google that, but no code was added. The person claimed that he flushed all temporary working tables every 100K records to avoid filling up the memory. The import is all automated for multiple files using Visual Studio, and it is pretty quick and about 15 min for 24 mil records for CSV files. (2GB file) If someone dealt with this please write down some basic code for me, or any advise will be appreciated, since this is reall show stopper. Thanks, Steve |
|
#2
| |||
| |||
|
IIRC there is a batch size in SqlBulkCopy that might do what you describe. However, I would keep the batch size closer to 1000 rows. In my example(s) I use a smaller size to generate more events to illustrate the progress bars but this slows down the process. This is an example from my book... perhaps it can help. Try cn = New SqlConnection("data source=betav7;integrated security=sspi;database=biblio") cn.Open() cn2 = New OdbcConnection("DSN=TextFile;DATABASE=c:\MSFTStock.TXT;Extensions= asc,csv,tab,txt") cn2.Open() Dim cmd As New OdbcCommand("SELECT * FROM MSFTStock.TXT", cn2) dr = cmd.ExecuteReader(CommandBehavior.CloseConnection) bc = New SqlBulkCopy(cn) With bc .BatchSize = 10 ' The batch size breaks down the operations into smaller sections .DestinationTableName = "StockData" .NotifyAfter = 5 .WriteToServer(dr) End With Catch exIO As InvalidOperationException Debug.Print(exIO.ToString) Catch ex As Exception MsgBox(ex.ToString) Finally Dim cmd2 As New SqlCommand("SELECT Count(*) FROM StockData", cn) Dim strRows As String strRows = cmd2.ExecuteScalar().ToString lblRowsInTable.Text = strRows cn.Close() cn2.Close() End Try -- __________________________________________________ ________________________ William R. Vaughn President and Founder Beta V Corporation Author, Mentor, Dad, Grandpa Microsoft MVP (425) 556-9205 (Pacific time) Hitchhiker's Guide to Visual Studio and SQL Server (7th Edition) __________________________________________________ __________________________________________ "Steve" news:OIFZy$oQJHA.3932@TK2MSFTNGP02.phx.gbl... > Hi All, > > Using MS SQL 2005 64 bits. > > Using the SQLBulkCopy for big CSV files and some of them have 24 mil > records. Working with them consumes almost all the memory in the computer. > I found how to resolve this problem while google that, but no code was > added. The person claimed that he flushed all temporary working tables > every 100K records to avoid filling up the memory. > > The import is all automated for multiple files using Visual Studio, and it > is pretty quick and about 15 min for 24 mil records for CSV files. (2GB > file) > > If someone dealt with this please write down some basic code for me, or > any advise will be appreciated, since this is reall show stopper. > > Thanks, > > Steve > > |
|
#3
| |||
| |||
|
Hi William, Thanks for the code and I will give it a shot. Any idea how to flush all temporary working tables every 100K records to avoid filling up the memory? That also could be helpful tip. Thanks, Steve "William Vaughn (MVP)" > IIRC there is a batch size in SqlBulkCopy that might do what you > describe. However, I would keep the batch size closer to 1000 rows. In my > example(s) I use a smaller size to generate more events to illustrate the > progress bars but this slows down the process. > > This is an example from my book... perhaps it can help. > > Try > cn = New SqlConnection("data source=betav7;integrated > security=sspi;database=biblio") > cn.Open() > cn2 = New > OdbcConnection("DSN=TextFile;DATABASE=c:\MSFTStock.TXT;Extensions= asc,csv,tab,txt") > cn2.Open() > Dim cmd As New OdbcCommand("SELECT * FROM MSFTStock.TXT", cn2) > dr = cmd.ExecuteReader(CommandBehavior.CloseConnection) > bc = New SqlBulkCopy(cn) > With bc > .BatchSize = 10 ' The batch size breaks down the > operations into smaller sections > .DestinationTableName = "StockData" > .NotifyAfter = 5 > .WriteToServer(dr) > End With > Catch exIO As InvalidOperationException > Debug.Print(exIO.ToString) > Catch ex As Exception > MsgBox(ex.ToString) > Finally > Dim cmd2 As New SqlCommand("SELECT Count(*) FROM StockData", > cn) > Dim strRows As String > strRows = cmd2.ExecuteScalar().ToString > lblRowsInTable.Text = strRows > cn.Close() > cn2.Close() > End Try > > -- > __________________________________________________ ________________________ > William R. Vaughn > President and Founder Beta V Corporation > Author, Mentor, Dad, Grandpa > Microsoft MVP > (425) 556-9205 (Pacific time) > Hitchhiker's Guide to Visual Studio and SQL Server (7th Edition) > __________________________________________________ __________________________________________ > > > > "Steve" > news:OIFZy$oQJHA.3932@TK2MSFTNGP02.phx.gbl... >> Hi All, >> >> Using MS SQL 2005 64 bits. >> >> Using the SQLBulkCopy for big CSV files and some of them have 24 mil >> records. Working with them consumes almost all the memory in the >> computer. I found how to resolve this problem while google that, but no >> code was added. The person claimed that he flushed all temporary working >> tables every 100K records to avoid filling up the memory. >> >> The import is all automated for multiple files using Visual Studio, and >> it is pretty quick and about 15 min for 24 mil records for CSV files. >> (2GB file) >> >> If someone dealt with this please write down some basic code for me, or >> any advise will be appreciated, since this is reall show stopper. >> >> Thanks, >> >> Steve |
|
#4
| |||
| |||
|
I don't agree with the approach taken by http://weblogs.sqlteam.com/mladenp/a.../26/11368.aspx. Since you don't need to create a DataTable (and lay the rows out into memory) there is no need to "flush" anything between batches. Consider that the SqlClient.SqlBulkCopy can suck data in from (almost) any DataReader, there is no need to create a DataTable first. My example (below) shows how to create a DataReader directly from a delimited text file. Yes, it uses JET, but this is one thing it seems to do well. -- __________________________________________________ ________________________ William R. Vaughn President and Founder Beta V Corporation Author, Mentor, Dad, Grandpa Microsoft MVP (425) 556-9205 (Pacific time) Hitchhiker's Guide to Visual Studio and SQL Server (7th Edition) __________________________________________________ __________________________________________ "Steve" news:OYAs89rQJHA.4916@TK2MSFTNGP06.phx.gbl... > Hi William, > > Thanks for the code and I will give it a shot. > > Any idea how to flush all temporary working tables every 100K records to > avoid filling up the memory? > That also could be helpful tip. > > Thanks, > > Steve > > "William Vaughn (MVP)" >> IIRC there is a batch size in SqlBulkCopy that might do what you >> describe. However, I would keep the batch size closer to 1000 rows. In my >> example(s) I use a smaller size to generate more events to illustrate the >> progress bars but this slows down the process. >> >> This is an example from my book... perhaps it can help. >> >> Try >> cn = New SqlConnection("data source=betav7;integrated >> security=sspi;database=biblio") >> cn.Open() >> cn2 = New >> OdbcConnection("DSN=TextFile;DATABASE=c:\MSFTStock.TXT;Extensions= asc,csv,tab,txt") >> cn2.Open() >> Dim cmd As New OdbcCommand("SELECT * FROM MSFTStock.TXT", cn2) >> dr = cmd.ExecuteReader(CommandBehavior.CloseConnection) >> bc = New SqlBulkCopy(cn) >> With bc >> .BatchSize = 10 ' The batch size breaks down >> the operations into smaller sections >> .DestinationTableName = "StockData" >> .NotifyAfter = 5 >> .WriteToServer(dr) >> End With >> Catch exIO As InvalidOperationException >> Debug.Print(exIO.ToString) >> Catch ex As Exception >> MsgBox(ex.ToString) >> Finally >> Dim cmd2 As New SqlCommand("SELECT Count(*) FROM StockData", >> cn) >> Dim strRows As String >> strRows = cmd2.ExecuteScalar().ToString >> lblRowsInTable.Text = strRows >> cn.Close() >> cn2.Close() >> End Try >> >> -- >> __________________________________________________ ________________________ >> William R. Vaughn >> President and Founder Beta V Corporation >> Author, Mentor, Dad, Grandpa >> Microsoft MVP >> (425) 556-9205 (Pacific time) >> Hitchhiker's Guide to Visual Studio and SQL Server (7th Edition) >> __________________________________________________ __________________________________________ >> >> >> >> "Steve" >> news:OIFZy$oQJHA.3932@TK2MSFTNGP02.phx.gbl... >>> Hi All, >>> >>> Using MS SQL 2005 64 bits. >>> >>> Using the SQLBulkCopy for big CSV files and some of them have 24 mil >>> records. Working with them consumes almost all the memory in the >>> computer. I found how to resolve this problem while google that, but no >>> code was added. The person claimed that he flushed all temporary working >>> tables every 100K records to avoid filling up the memory. >>> >>> The import is all automated for multiple files using Visual Studio, and >>> it is pretty quick and about 15 min for 24 mil records for CSV files. >>> (2GB file) >>> >>> If someone dealt with this please write down some basic code for me, or >>> any advise will be appreciated, since this is reall show stopper. >>> >>> Thanks, >>> >>> Steve > > > |
|
#5
| |||
| |||
|
Hi William, I see your point and I used big batches 100,000 for a 2 GB CSV file or 10000 for smaller files and that file could have almost 25 mil records. I also import dbf files but have the same problem of the CSV with the memory. I see that you are using ODBC while I was using OLEDB for the CSV. The below is my code that handles the import. I see some differences. Thanks, Steve 'connection string in accordance to file type and path Private Sub rebuildSrcConn() Select Case DBType Case DBTypeS.DBF Dim connbuilderODBC As New OdbcConnectionStringBuilder() connbuilderODBC("Driver") = "{Microsoft Visual FoxPro Driver}" connbuilderODBC("SourceType") = "DBF" connbuilderODBC("SourceDB") = Path.GetDirectoryName(DBFileName) connbuilderODBC("Exclusive") = "No" ConnStrODBC = connbuilderODBC.ConnectionString Case DBTypeS.CSV Dim connbuilderOLEDB As New OleDbConnectionStringBuilder connbuilderOLEDB("Provider") = "Microsoft.Jet.OLEDB.4.0" connbuilderOLEDB("Extended Properties") = "text;HDR=Yes;FMT=Delimited(',')" connbuilderOLEDB("Data Source") = Path.GetDirectoryName(DBFileName) ConnStrOleDb = connbuilderOLEDB.ConnectionString End Select End Sub ' Read source file table rows count Select Case DBType Case DBTypeS.DBF Using connODBC As New OdbcConnection(ConnStrODBC) connODBC.Open() Dim dbcntcmd As New OdbcCommand(String.Format("select count(*) from {0}", TableName), connODBC) rows_cnt = dbcntcmd.ExecuteScalar() connODBC.Close() End Using Case DBTypeS.CSV Using connOleDb As New OleDbConnection(ConnStrOleDb) connOleDb.Open() Dim dbcntcmd As New OleDbCommand(String.Format("select count(*) from {0}", TableName), connOleDb) rows_cnt = dbcntcmd.ExecuteScalar() connOleDb.Close() End Using End Select ' Field SQLBulkCopy input and destination columns mapping information Dim bulkcopy As New SqlBulkCopy(connbuildersql.ConnectionString, SqlBulkCopyOptions.KeepNulls) For Each FldMappingRow As dsConversion.tblSQLDBFFldsRow In dsConversion1.tblSQLDBFFlds.Rows If Not curTbl.Columns(FldMappingRow.FieldNameSQL).Compute d Then bulkcopy.ColumnMappings.Add(FldMappingRow.FieldNam eDBF, FldMappingRow.FieldNameSQL) End If Next bulkcopy.DestinationTableName = String.Format("{0}.{1}", curTbl.Schema, curTbl.Name) ' Set BatchSize and NotifyAfter values of SQLBulkCopy to current batch size bulkcopy.NotifyAfter = BatchSize bulkcopy.BatchSize = BatchSize ' Run import bulkcopy.BulkCopyTimeout = SQL_TIMEOUT AddHandler bulkcopy.SqlRowsCopied, AddressOf OnSqlRowsCopied Try Select Case DBType Case DBTypeS.DBF Using connODBC As New OdbcConnection(ConnStrODBC) connODBC.Open() Dim dbcmd As New OdbcCommand(String.Format("select * from {0}", TableName), connODBC) dbcmd.CommandTimeout = SQL_TIMEOUT bulkcopy.WriteToServer(dbcmd.ExecuteReader()) connODBC.Close() End Using Case DBTypeS.CSV Using connOleDb As New OleDbConnection(ConnStrOleDb) connOleDb.Open() Dim dbcmd As New OleDbCommand(String.Format("select * from {0}", TableName), connOleDb) dbcmd.CommandTimeout = SQL_TIMEOUT bulkcopy.WriteToServer(dbcmd.ExecuteReader()) connOleDb.Close() End Using End Select |
|
#6
| |||
| |||
|
First, the SELECT Count(*) from the table means the entire file must be processed before a single row is uploaded. I would remove that logic. You don't need to know how many rows are being uploaded--not really. The rest looks reasonable to me. -- __________________________________________________ ________________________ William R. Vaughn President and Founder Beta V Corporation Author, Mentor, Dad, Grandpa Microsoft MVP (425) 556-9205 (Pacific time) Hitchhiker's Guide to Visual Studio and SQL Server (7th Edition) __________________________________________________ __________________________________________ "Steve" Hi William, I see your point and I used big batches 100,000 for a 2 GB CSV file or 10000 for smaller files and that file could have almost 25 mil records. I also import dbf files but have the same problem of the CSV with the memory. I see that you are using ODBC while I was using OLEDB for the CSV. The below is my code that handles the import. I see some differences. Thanks, Steve 'connection string in accordance to file type and path Private Sub rebuildSrcConn() Select Case DBType Case DBTypeS.DBF Dim connbuilderODBC As New OdbcConnectionStringBuilder() connbuilderODBC("Driver") = "{Microsoft Visual FoxPro Driver}" connbuilderODBC("SourceType") = "DBF" connbuilderODBC("SourceDB") = Path.GetDirectoryName(DBFileName) connbuilderODBC("Exclusive") = "No" ConnStrODBC = connbuilderODBC.ConnectionString Case DBTypeS.CSV Dim connbuilderOLEDB As New OleDbConnectionStringBuilder connbuilderOLEDB("Provider") = "Microsoft.Jet.OLEDB.4.0" connbuilderOLEDB("Extended Properties") = "text;HDR=Yes;FMT=Delimited(',')" connbuilderOLEDB("Data Source") = Path.GetDirectoryName(DBFileName) ConnStrOleDb = connbuilderOLEDB.ConnectionString End Select End Sub ' Read source file table rows count Select Case DBType Case DBTypeS.DBF Using connODBC As New OdbcConnection(ConnStrODBC) connODBC.Open() Dim dbcntcmd As New OdbcCommand(String.Format("select count(*) from {0}", TableName), connODBC) rows_cnt = dbcntcmd.ExecuteScalar() connODBC.Close() End Using Case DBTypeS.CSV Using connOleDb As New OleDbConnection(ConnStrOleDb) connOleDb.Open() Dim dbcntcmd As New OleDbCommand(String.Format("select count(*) from {0}", TableName), connOleDb) rows_cnt = dbcntcmd.ExecuteScalar() connOleDb.Close() End Using End Select ' Field SQLBulkCopy input and destination columns mapping information Dim bulkcopy As New SqlBulkCopy(connbuildersql.ConnectionString, SqlBulkCopyOptions.KeepNulls) For Each FldMappingRow As dsConversion.tblSQLDBFFldsRow In dsConversion1.tblSQLDBFFlds.Rows If Not curTbl.Columns(FldMappingRow.FieldNameSQL).Compute d Then bulkcopy.ColumnMappings.Add(FldMappingRow.FieldNam eDBF, FldMappingRow.FieldNameSQL) End If Next bulkcopy.DestinationTableName = String.Format("{0}.{1}", curTbl.Schema, curTbl.Name) ' Set BatchSize and NotifyAfter values of SQLBulkCopy to current batch size bulkcopy.NotifyAfter = BatchSize bulkcopy.BatchSize = BatchSize ' Run import bulkcopy.BulkCopyTimeout = SQL_TIMEOUT AddHandler bulkcopy.SqlRowsCopied, AddressOf OnSqlRowsCopied Try Select Case DBType Case DBTypeS.DBF Using connODBC As New OdbcConnection(ConnStrODBC) connODBC.Open() Dim dbcmd As New OdbcCommand(String.Format("select * from {0}", TableName), connODBC) dbcmd.CommandTimeout = SQL_TIMEOUT bulkcopy.WriteToServer(dbcmd.ExecuteReader()) connODBC.Close() End Using Case DBTypeS.CSV Using connOleDb As New OleDbConnection(ConnStrOleDb) connOleDb.Open() Dim dbcmd As New OleDbCommand(String.Format("select * from {0}", TableName), connOleDb) dbcmd.CommandTimeout = SQL_TIMEOUT bulkcopy.WriteToServer(dbcmd.ExecuteReader()) connOleDb.Close() End Using End Select |
|
#7
| |||
| |||
|
Hi William, I used batch of 500 records batch for a 24 mil CSV file and still the memory goes up. What I realized that the ms sql server 2005 Service is casing the problem for the memory that is filled up. Restart the service release all memory back into the OS. But I need find out how to clear that chunk of memory say every 50000 to 100000 records. Since to import multiple files in different sizes using the SQLBulkCopy will not be possible. Thanks, Steve |
|
#8
| |||
| |||
|
If SQL Sever is allocating the memory, I expect that you'll have to let SQL Server manage it. When memory is nearing the limit, I expect the server instance to do whatever is necessary to recover it. I would test the BCP operation and see if it works for the whole rowset. How much RAM is in the system? What else is running? -- __________________________________________________ ________________________ William R. Vaughn President and Founder Beta V Corporation Author, Mentor, Dad, Grandpa Microsoft MVP (425) 556-9205 (Pacific time) Hitchhiker's Guide to Visual Studio and SQL Server (7th Edition) __________________________________________________ __________________________________________ "Steve" Hi William, I used batch of 500 records batch for a 24 mil CSV file and still the memory goes up. What I realized that the ms sql server 2005 Service is casing the problem for the memory that is filled up. Restart the service release all memory back into the OS. But I need find out how to clear that chunk of memory say every 50000 to 100000 records. Since to import multiple files in different sizes using the SQLBulkCopy will not be possible. Thanks, Steve |
|
#9
| |||
| |||
|
Hi William, I have 8GB of memory in my dev computer and using Vista 64 bits. For small files say 300000 records it is not a problem. In this link: http://weblogs.sqlteam.com/mladenp/a.../26/11368.aspx as you know one guy complained about the memory issue and described how it was done to fix the problem, but with no actual code. Before I started Visual Studio the task manager shows only 2gb memory usage. So I have about 6GB to work with. In one point I did not pay attention it got almost to the 8GB limit using the sqlBulkImport and almost could not even shut down the computer. I also went into the Server Properties and tried to limit the Max memory to 4500 MB but that did not help as well. Not so sure what is that feature all about if it does not work. Thanks, Steve "William Vaughn (MVP)" If SQL Sever is allocating the memory, I expect that you'll have to let SQL Server manage it. When memory is nearing the limit, I expect the server instance to do whatever is necessary to recover it. I would test the BCP operation and see if it works for the whole rowset. How much RAM is in the system? What else is running? -- __________________________________________________ ________________________ William R. Vaughn President and Founder Beta V Corporation Author, Mentor, Dad, Grandpa Microsoft MVP (425) 556-9205 (Pacific time) Hitchhiker's Guide to Visual Studio and SQL Server (7th Edition) __________________________________________________ __________________________________________ "Steve" Hi William, I used batch of 500 records batch for a 24 mil CSV file and still the memory goes up. What I realized that the ms sql server 2005 Service is casing the problem for the memory that is filled up. Restart the service release all memory back into the OS. But I need find out how to clear that chunk of memory say every 50000 to 100000 records. Since to import multiple files in different sizes using the SQLBulkCopy will not be possible. Thanks, Steve |
|
#10
| |||
| |||
|
This does sound like a memory leak of some kind--or simply poor memory management somewhere along the line. I would take this thread to the SQL Server.Programming newsgroup. I expect you'll get more insightful answers there. -- __________________________________________________ ________________________ William R. Vaughn President and Founder Beta V Corporation Author, Mentor, Dad, Grandpa Microsoft MVP (425) 556-9205 (Pacific time) Hitchhiker's Guide to Visual Studio and SQL Server (7th Edition) __________________________________________________ __________________________________________ "Steve" Hi William, I have 8GB of memory in my dev computer and using Vista 64 bits. For small files say 300000 records it is not a problem. In this link: http://weblogs.sqlteam.com/mladenp/a.../26/11368.aspx as you know one guy complained about the memory issue and described how it was done to fix the problem, but with no actual code. Before I started Visual Studio the task manager shows only 2gb memory usage. So I have about 6GB to work with. In one point I did not pay attention it got almost to the 8GB limit using the sqlBulkImport and almost could not even shut down the computer. I also went into the Server Properties and tried to limit the Max memory to 4500 MB but that did not help as well. Not so sure what is that feature all about if it does not work. Thanks, Steve "William Vaughn (MVP)" If SQL Sever is allocating the memory, I expect that you'll have to let SQL Server manage it. When memory is nearing the limit, I expect the server instance to do whatever is necessary to recover it. I would test the BCP operation and see if it works for the whole rowset. How much RAM is in the system? What else is running? -- __________________________________________________ ________________________ William R. Vaughn President and Founder Beta V Corporation Author, Mentor, Dad, Grandpa Microsoft MVP (425) 556-9205 (Pacific time) Hitchhiker's Guide to Visual Studio and SQL Server (7th Edition) __________________________________________________ __________________________________________ "Steve" Hi William, I used batch of 500 records batch for a 24 mil CSV file and still the memory goes up. What I realized that the ms sql server 2005 Service is casing the problem for the memory that is filled up. Restart the service release all memory back into the OS. But I need find out how to clear that chunk of memory say every 50000 to 100000 records. Since to import multiple files in different sizes using the SQLBulkCopy will not be possible. Thanks, Steve |
![]() |
« Previous Thread
|
Next Thread »
| Thread Tools | |
| Display Modes | |
| |
All times are GMT -4. The time now is 07:05 PM.




Linear Mode