VBA⽐较两个Excel数据的异同
代码背景:
由于Excel本⾝⽆法简单的⽐较两个Excel数据的异同,所以⽤VBA编写代码的⽅式来实现。
这⾥的⽐较条件是:数据⾏为单位,假设对应Sheet中没有重复数据,对应数据⾏的所有列的数据都相等,即为此⾏数据相同。
这⾥的两个Sheet的数据⾏量级别⼤约为:50000 * 50000,数据列⼤约:50,对应Cell中的字符串⼤约100以内,中英⽂混合。
如何在Excel中调出VBA的编写⼯具,请参考如下链接:
整体来说,需求⾮常明确,若是不考虑效率的话,代码逻辑⽐较简单,循环⽐较即可。
相关代码:
Sub CompareData()
Dim i As Long
Dim j As Long
Dim fullSheetName As String
fullSheetName = "Sheet1"
Set fullSheet = Sheets(fullSheetName)
Dim fullDataRange As Variant
fullDataRange = fullSheet.Range("A1", "AT80000").CurrentRegion.Value
Dim fullSheetRowMax As Long
fullSheetRowMax = fullSheet.Range("A1", "AT80000").CurrentRegion.Rows.Count
Dim partialSheetName As String
partialSheetName = "Sheet2"
Set partialSheet = Sheets(partialSheetName)
Dim partialDataRange As Variant
partialDataRange = partialSheet.Range("A1", "AT80000").CurrentRegion.Value
Dim partialSheetRowMax As Long
partialSheetRowMax = partialSheet.Range("A1", "AT80000").CurrentRegion.Rows.Count
Dim columnMax As Integer
columnMax = 46
Dim columnMark As Integer
columnMark = 48
Dim sameRow As Boolean
For i = 1To fullSheetRowMax
For j = 1To partialSheetRowMax
sameRow = True
For columnIndex = 1To columnMax
If fullDataRange(i, columnIndex) <> partialDataRange(j, columnIndex) Then
sameRow = False
Exit For
End If
Next columnIndex
If sameRow Then
fullSheet.Cells(i, columnMark) = 1
Exit For
End If
Next j
Next i
MsgBox"Successfully!"
End Sub
View Code
上述代码实际运⾏⼤约⽤30分钟完成此数量级的⽐较,⼤约1000亿次的⽐较。
当然了我们需要更快的⽐较⽅式,那么就需要对⼤数据进⾏结构优化,即:将partial的sheet中的数据进⾏分组,⽐如每1000个row放到⼀个组⾥,然后⽤⼀个标志位标记这个组⾥1000个row是否都有相同的数据,如有都有的话,那么下次再⽐较的时候就可以跳过这个组,进⾏下⼀组的1000个row的循环遍历。相同数量级,⼤约2分钟⽐较完成。
注:实际数据是按照时间进⾏抽取出来的,所以partial的sheet数据⼤致都在full的sheet的前半部分相同,
如果数据⽆规律,⾮常混乱,那么还要对每⼀个row的数据进⾏结构优化,即:⽤另外⼀个标记为进⾏标记此row是否有相同的数据,判断的时候先判断这个标记位】
相关代码如下:
【注:函数中的⼀些变量都是HardCode的,要根据具体数据进⾏修改】
Public Type PartialBasedModule
IsAllSame As Boolean
SheetDataRange As Variant
SameCount As Long
End Type
Sub CompareData()
Dim i As Long
Dim j As Long
Dim k As Long
Dim fullSheetName As String
fullSheetName = "Sheet1"
Set fullSheet = Sheets(fullSheetName)
Dim fullDataRange As Variant
fullDataRange = fullSheet.Range("A1", "AT80000").CurrentRegion.Value
Dim fullSheetRowMax As Long
fullSheetRowMax = fullSheet.Range("A1", "AT80000").CurrentRegion.Rows.Count
Dim partialSheetName As String
partialSheetName = "Sheet2"
Set partialSheet = Sheets(partialSheetName)
Dim PartialDataRange As Variant
PartialDataRange = partialSheet.Range("A1", "AT80000").CurrentRegion.Value
Dim partialSheetRowMax As Long
partialSheetRowMax = partialSheet.Range("A1", "AT80000").CurrentRegion.Rows.Count
Dim partialSheetPages() As PartialBasedModule
partialSheetPages = SeparatePartialSheet(PartialDataRange, partialSheetRowMax)
Dim columnMax As Integer
columnMax = 46
Dim columnMark As Integer
columnMark = 48
Dim sameRow As Boolean
For i = 1To fullSheetRowMax
vba 字符串函数For j = 1To UBound(partialSheetPages)
If partialSheetPages(j).SameCount < 1000Then
For k = 1To UBound(partialSheetPages(j).SheetDataRange)
sameRow = True
For ColumnIndex = 1To columnMax
If fullDataRange(i, ColumnIndex) <> partialSheetPages(j).SheetDataRange(k, ColumnIndex) Then
sameRow = False
Exit For
End If
Next ColumnIndex
If sameRow Then
fullSheet.Cells(i, columnMark) = 1
partialSheetPages(j).SameCount = partialSheetPages(j).SameCount + 1
Exit For
End If
Next k
Else
sameRow = False
End If
If sameRow Then
Exit For
End If
Next j
Next i
MsgBox"Successfully!"
End Sub
Public Function SeparatePartialSheet(ByRef PartialDataRange As Variant, ByVal rowCount As Long) As PartialBasedModule() Dim eachPageCount As Long
eachPageCount = 1000
Dim pageCount As Integer
pageCount = Int(rowCount / eachPageCount) + 1
Dim pageIndex As Long
Dim pageArr() As PartialBasedModule
Dim startIndex As Long
Dim endIndex As Long
For pageIndex = 1To pageCount
Dim seperatedDataRange(1To1000, 1To46) As Variant
Dim seperatedIndex As Long
seperatedIndex = 1
Dim colIndex As Integer
If pageIndex < pageCount Then
endIndex = pageIndex * eachPageCount
Else
endIndex = rowCount
End If
For startIndex = (pageIndex - 1) * eachPageCount + 1To endIndex
For colIndex = 1To46
seperatedDataRange(seperatedIndex, colIndex) = PartialDataRange(startIndex, colIndex)
Next colIndex
seperatedIndex = seperatedIndex + 1
Next startIndex
Dim pageData As PartialBasedModule
pageData.SheetDataRange = seperatedDataRange
pageData.SameCount = 0
pageData.IsAllSame = False
ReDim Preserve pageArr(pageIndex)
pageArr(pageIndex) = pageData
Next pageIndex
SeparatePartialSheet = pageArr
End Function
View Code
给每个Row都加上标记的代码如下所⽰:【相同界别的数据,⼤约1分钟完成⽐较】
Public Type RowModule
IsSame As Boolean
RowData As Variant
End Type
Public Type PartialBasedModule
IsAllSame As Boolean
SheetDataRange() As RowModule
SameCount As Long
End Type
Sub CompareData()
Dim i As Long
Dim j As Long
Dim k As Long
Dim fullSheetName As String
fullSheetName = "Sheet1"
Set fullSheet = Sheets(fullSheetName)
Dim fullDataRange As Variant
fullDataRange = fullSheet.Range("A1", "AT80000").CurrentRegion.Value
Dim fullSheetRowMax As Long
fullSheetRowMax = fullSheet.Range("A1", "AT80000").CurrentRegion.Rows.Count
Dim partialSheetName As String
partialSheetName = "Sheet2"
Set partialSheet = Sheets(partialSheetName)
Dim PartialDataRange As Variant
PartialDataRange = partialSheet.Range("A1", "AT80000").CurrentRegion.Value
Dim partialSheetRowMax As Long
partialSheetRowMax = partialSheet.Range("A1", "AT80000").CurrentRegion.Rows.Count
Dim partialSheetPages() As PartialBasedModule
partialSheetPages = SeparatePartialSheet(PartialDataRange, partialSheetRowMax)
Dim columnMax As Integer
columnMax = 46
Dim columnMark As Integer
columnMark = 48
Dim sameRow As Boolean
For i = 1To fullSheetRowMax
For j = 1To UBound(partialSheetPages)
If partialSheetPages(j).SameCount < 1000Then
For k = 1To UBound(partialSheetPages(j).SheetDataRange)
sameRow = True
If partialSheetPages(j).SheetDataRange(k).IsSame Then
sameRow = False
Else
For ColumnIndex = 1To columnMax
If fullDataRange(i, ColumnIndex) <> partialSheetPages(j).SheetDataRange(k).RowData(ColumnIndex) Then                                sameRow = False
Exit For
End If
Next ColumnIndex
If sameRow Then
fullSheet.Cells(i, columnMark) = 1
partialSheetPages(j).SheetDataRange(k).IsSame = True
partialSheetPages(j).SameCount = partialSheetPages(j).SameCount + 1
Exit For
End If
End If
Next k
Else
sameRow = False
End If
If sameRow Then
Exit For
End If
Next j
Next i
MsgBox"Successfully!"
End Sub
Public Function SeparatePartialSheet(ByRef PartialDataRange As Variant, ByVal rowCount As Long) As PartialBasedModule() Dim eachPageCount As Long
eachPageCount = 1000
Dim pageCount As Integer
pageCount = Int(rowCount / eachPageCount) + 1
Dim pageIndex As Long
Dim pageArr() As PartialBasedModule
Dim startIndex As Long
Dim endIndex As Long
For pageIndex = 1To pageCount
Dim seperatedDataRange(1To1000) As RowModule
Dim dataRows(1To1000) As Variant
Dim seperatedIndex As Long
seperatedIndex = 1
Dim colIndex As Integer
If pageIndex < pageCount Then
endIndex = pageIndex * eachPageCount
Else
endIndex = rowCount
End If
For startIndex = (pageIndex - 1) * eachPageCount + 1To endIndex
Dim dataRow(1To46) As Variant
For colIndex = 1To46
dataRow(colIndex) = PartialDataRange(startIndex, colIndex) Next colIndex
Dim currentRowModule As RowModule
currentRowModule.RowData = dataRow
currentRowModule.IsSame = False
seperatedDataRange(seperatedIndex) = currentRowModule
seperatedIndex = seperatedIndex + 1
Next startIndex
Dim pageData As PartialBasedModule
pageData.SheetDataRange = seperatedDataRange
pageData.SameCount = 0
pageData.IsAllSame = False
ReDim Preserve pageArr(pageIndex)
pageArr(pageIndex) = pageData
Next pageIndex
SeparatePartialSheet = pageArr
End Function
View Code
最终的⼀个简单的数据结构如下图所⽰:

版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系QQ:729038198,我们将在24小时内删除。