
Commit b2d9615

Author: W1thOut
Merge branch 'master' of https://github.com/apache/carbondata
2 parents: 7fa6471 + 1ccf295

289 files changed: +8036 -2063 lines


LICENSE  +3 -1

@@ -210,4 +210,6 @@
 BSD 2-Clause
 ------------

-com.github.luben:zstd-jni
+com.github.luben:zstd-jni
+
+com.github.paul-hammant:paranamer

README.md  +15 -4

@@ -7,7 +7,7 @@
 the License. You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0
-
+
 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

@@ -26,12 +26,13 @@ You can find the latest CarbonData document and learn more at:

 ## Status
 Spark2.4:
-[![Build Status](https://builds.apache.org/buildStatus/icon?job=carbondata-master-spark-2.4)](https://builds.apache.org/view/A-D/view/CarbonData/job/carbondata-master-spark-2.4/lastBuild/testReport)
+[![Build Status](https://ci-builds.apache.org/job/carbondata/job/spark-2.4/badge/icon)](https://ci-builds.apache.org/job/carbondata/job/spark-2.4/)
 [![Coverage Status](https://coveralls.io/repos/github/apache/carbondata/badge.svg?branch=master)](https://coveralls.io/github/apache/carbondata?branch=master)
 <a href="https://scan.coverity.com/projects/carbondata">
   <img alt="Coverity Scan Build Status"
        src="https://scan.coverity.com/projects/13444/badge.svg"/>
 </a>
+
 ## Features
 CarbonData file format is a columnar store in HDFS, it has many features that a modern columnar format has, such as splittable, compression schema, complex data type etc, and CarbonData has following unique features:
 * Stores data along with index: it can significantly accelerate query performance and reduces the I/O scans and CPU resources, where there are filters in the query. CarbonData index consists of multiple level of indices, a processing framework can leverage this index to reduce the task it needs to schedule and process, and it can also do skip scan in more finer grain unit (called blocklet) in task side scanning instead of scanning the whole file.

@@ -93,8 +94,18 @@ This guide document introduces [how to contribute to CarbonData](https://github.
 ## Contact us
 To get involved in CarbonData:

-* First join by emailing to [[email protected]](mailto:[email protected]),then you can discuss issues by emailing to [[email protected]](mailto:[email protected]) or visit http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com
-* Report issues on [Apache Jira](https://issues.apache.org/jira/browse/CARBONDATA).
+* First join by emailing to [[email protected]](mailto:[email protected]), then you can discuss issues by emailing to [[email protected]](mailto:[email protected]).
+You can also directly visit [[email protected]](https://lists.apache.org/[email protected]).
+Or you can visit [Apache CarbonData Dev Mailing List archive](http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/).
+
+* Report issues on [Apache Jira](https://issues.apache.org/jira/browse/CARBONDATA). If you do not already have an Apache JIRA account, sign up [here](https://issues.apache.org/jira/).
+
+* You can also slack to get in touch with the community. After we invite you, you can use this [Slack Link](https://carbondataworkspace.slack.com/) to sign in to CarbonData.
+
+* Of course, you can scan the QR Code to join in our WeChat Group to get in touch.
+![QRCode_WechatGroup](docs/images/QRCode_WechatGroup.png)
+
+

 ## About
 Apache CarbonData is an open source project of The Apache Software Foundation (ASF).

assembly/pom.xml  +1 -1

@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.carbondata</groupId>
     <artifactId>carbondata-parent</artifactId>
-    <version>2.2.0-SNAPSHOT</version>
+    <version>2.3.0-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>


common/pom.xml  +1 -1

@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.carbondata</groupId>
     <artifactId>carbondata-parent</artifactId>
-    <version>2.2.0-SNAPSHOT</version>
+    <version>2.3.0-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>


conf/carbon.properties.template  -3

@@ -17,9 +17,6 @@
 #

 #################### System Configuration ##################
-##Optional. Location where CarbonData will create the store, and write the data in its own format.
-##If not specified then it takes spark.sql.warehouse.dir path.
-#carbon.storelocation
 #Base directory for Data files
 #carbon.ddl.base.hdfs.url
 #Path where the bad records are stored

conf/dataload.properties.template  -4

@@ -16,10 +16,6 @@
 # limitations under the License.
 #

-#carbon store path
-# you should change to the code path of your local machine
-carbon.storelocation=/home/david/Documents/carbondata/examples/spark/target/store
-
 #csv delimiter character
 delimiter=,


core/pom.xml  +1 -1

@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.carbondata</groupId>
     <artifactId>carbondata-parent</artifactId>
-    <version>2.2.0-SNAPSHOT</version>
+    <version>2.3.0-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>


core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java  +33

@@ -37,6 +37,7 @@ private CarbonCommonConstants() {
   /**
    * location of the carbon member, hierarchy and fact files
    */
+  @Deprecated
   @CarbonProperty
   public static final String STORE_LOCATION = "carbon.storelocation";

@@ -122,6 +123,16 @@ private CarbonCommonConstants() {
    */
   public static final String CARBON_TIMESTAMP_MILLIS = "dd-MM-yyyy HH:mm:ss:SSS";

+  /**
+   * CARBON Default format - time segment
+   */
+  public static final String CARBON_TIME_SEGMENT_DEFAULT_FORMAT = " HH:mm:ss";
+
+  /**
+   * CARBON Default data - time segment
+   */
+  public static final String CARBON_TIME_SEGMENT_DATA_DEFAULT_FORMAT = " 00:00:00";
+
   /**
    * Property for specifying the format of DATE data type column.
    * e.g. yyyy/MM/dd , or using default value

@@ -2648,4 +2659,26 @@ private CarbonCommonConstants() {

   public static final String CARBON_SDK_EMPTY_METADATA_PATH = "emptyMetadataFolder";

+  /**
+   * Property to identify if the spark version is above 3.x version
+   */
+  public static final String CARBON_SPARK_VERSION_SPARK3 = "carbon.spark.version.spark3";
+
+  public static final String CARBON_SPARK_VERSION_SPARK3_DEFAULT = "false";
+
+  /**
+   * Carbon Spark 3.x supported data file written version
+   */
+  public static final String CARBON_SPARK3_VERSION = "2.2.0";
+
+  /**
+   * This property is to enable the min max pruning of target carbon table based on input/source
+   * data
+   */
+  @CarbonProperty
+  public static final String CARBON_CDC_MINMAX_PRUNING_ENABLED =
+      "carbon.cdc.minmax.pruning.enabled";
+
+  public static final String CARBON_CDC_MINMAX_PRUNING_ENABLED_DEFAULT = "false";
+
 }
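For context on how the new @CarbonProperty constants above are typically consumed: the sketch below is not part of this commit; it assumes the existing CarbonProperties.getInstance().addProperty/getProperty API, and the class name CdcPruningFlagExample is hypothetical.

import org.apache.carbondata.core.constants.CarbonCommonConstants;
import org.apache.carbondata.core.util.CarbonProperties;

public class CdcPruningFlagExample {
  public static void main(String[] args) {
    // Illustrative only: enable CDC min/max pruning for this JVM; the property
    // can equally be set in carbon.properties.
    CarbonProperties.getInstance()
        .addProperty(CarbonCommonConstants.CARBON_CDC_MINMAX_PRUNING_ENABLED, "true");

    // Read it back, falling back to the declared default ("false") when unset.
    boolean pruningEnabled = Boolean.parseBoolean(
        CarbonProperties.getInstance().getProperty(
            CarbonCommonConstants.CARBON_CDC_MINMAX_PRUNING_ENABLED,
            CarbonCommonConstants.CARBON_CDC_MINMAX_PRUNING_ENABLED_DEFAULT));

    System.out.println("CDC min/max pruning enabled: " + pruningEnabled);
  }
}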

core/src/main/java/org/apache/carbondata/core/datastore/block/AbstractIndex.java  +7

@@ -17,6 +17,7 @@

 package org.apache.carbondata.core.datastore.block;

+import java.util.List;
 import java.util.Map;
 import java.util.concurrent.atomic.AtomicInteger;

@@ -51,6 +52,12 @@ public abstract class AbstractIndex implements Cacheable {
    */
   private long deleteDeltaTimestamp;

+  public List<TableBlockInfo> getBlockInfos() {
+    return blockInfos;
+  }
+
+  protected List<TableBlockInfo> blockInfos;
+
   /**
    * map of blockletIdAndPageId to deleted rows
    */

core/src/main/java/org/apache/carbondata/core/datastore/block/TableBlockInfo.java  +15

@@ -87,6 +87,11 @@ public class TableBlockInfo implements Distributable, Serializable {

   private transient DataFileFooter dataFileFooter;

+  /**
+   * Carbon Data file written version
+   */
+  private String carbonDataFileWrittenVersion = null;
+
   /**
    * comparator to sort by block size in descending order.
    * Since each line is not exactly the same, the size of a InputSplit may differs,

@@ -132,6 +137,7 @@ public TableBlockInfo copy() {
     info.deletedDeltaFilePath = deletedDeltaFilePath;
     info.detailInfo = detailInfo.copy();
     info.indexWriterPath = indexWriterPath;
+    info.carbonDataFileWrittenVersion = carbonDataFileWrittenVersion;
     return info;
   }

@@ -353,4 +359,13 @@ public String toString() {
     sb.append('}');
     return sb.toString();
   }
+
+  public String getCarbonDataFileWrittenVersion() {
+    return carbonDataFileWrittenVersion;
+  }
+
+  public void setCarbonDataFileWrittenVersion(String carbonDataFileWrittenVersion) {
+    this.carbonDataFileWrittenVersion = carbonDataFileWrittenVersion;
+  }
+
 }

core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/DimensionChunkReaderV3.java  +1

@@ -256,6 +256,7 @@ private ColumnPage decodeDimensionByMeta(DataChunk2 pageMetadata, ByteBuffer pag
     if (vectorInfo != null) {
       // set encodings of current page in the vectorInfo, used for decoding the complex child page
       vectorInfo.encodings = encodings;
+      vectorInfo.vector.setCarbonDataFileWrittenVersion(vectorInfo.carbonDataFileWrittenVersion);
       decoder
           .decodeAndFillVector(pageData.array(), offset, pageMetadata.data_page_length, vectorInfo,
               nullBitSet, isLocalDictEncodedPage, pageMetadata.numberOfRowsInpage,

core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/measure/v3/MeasureChunkReaderV3.java  +1

@@ -245,6 +245,7 @@ protected ColumnPage decodeMeasure(DataChunk2 pageMetadata, ByteBuffer pageData,
     ColumnPageDecoder codec =
         encodingFactory.createDecoder(encodings, encoderMetas, compressorName, vectorInfo != null);
     if (vectorInfo != null) {
+      vectorInfo.vector.setCarbonDataFileWrittenVersion(vectorInfo.carbonDataFileWrittenVersion);
       codec.decodeAndFillVector(pageData.array(), offset, pageMetadata.data_page_length, vectorInfo,
           nullBitSet, false, pageMetadata.numberOfRowsInpage, reusableDataBuffer);
       return null;

core/src/main/java/org/apache/carbondata/core/index/IndexInputFormat.java  +20

@@ -36,6 +36,7 @@
 import org.apache.carbondata.core.indexstore.ExtendedBlocklet;
 import org.apache.carbondata.core.indexstore.PartitionSpec;
 import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.mutate.CdcVO;
 import org.apache.carbondata.core.readcommitter.LatestFilesReadCommittedScope;
 import org.apache.carbondata.core.readcommitter.ReadCommittedScope;
 import org.apache.carbondata.core.readcommitter.TableStatusReadCommittedScope;

@@ -102,6 +103,8 @@ public class IndexInputFormat extends FileInputFormat<Void, ExtendedBlocklet>

   private Set<String> missingSISegments;

+  private CdcVO cdcVO;
+
   IndexInputFormat() {

   }

@@ -275,6 +278,10 @@ public void write(DataOutput out) throws IOException {
         out.writeUTF(segment);
       }
     }
+    out.writeBoolean(cdcVO != null);
+    if (cdcVO != null) {
+      cdcVO.write(out);
+    }
   }

   @Override

@@ -330,6 +337,11 @@ public void readFields(DataInput in) throws IOException {
         missingSISegments.add(in.readUTF());
       }
     }
+    boolean isCDCJob = in.readBoolean();
+    if (isCDCJob) {
+      this.cdcVO = new CdcVO();
+      cdcVO.readFields(in);
+    }
   }

   private void initReadCommittedScope() throws IOException {

@@ -353,6 +365,14 @@ public boolean isFallbackJob() {
     return isFallbackJob;
   }

+  public CdcVO getCdcVO() {
+    return cdcVO;
+  }
+
+  public void setCdcVO(CdcVO cdcVO) {
+    this.cdcVO = cdcVO;
+  }
+
   /**
    * @return Whether asyncCall to the IndexServer.
    */
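The write/readFields changes above use the usual Hadoop Writable pattern for an optional nested field: write a presence flag, then the payload only when it is non-null, and mirror that order on read. A minimal self-contained sketch of the pattern follows; the class names OptionalFieldHolder and SomeNestedWritable are hypothetical and not from this commit.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

// Serializes an optional nested Writable the same way IndexInputFormat now
// handles cdcVO: a presence boolean first, then the payload only if present.
public class OptionalFieldHolder implements Writable {

  private SomeNestedWritable nested;  // may be null, e.g. when this is not a CDC job

  @Override
  public void write(DataOutput out) throws IOException {
    // ... mandatory fields would be written here ...
    out.writeBoolean(nested != null);
    if (nested != null) {
      nested.write(out);
    }
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    // ... mandatory fields would be read here, in the same order ...
    if (in.readBoolean()) {
      nested = new SomeNestedWritable();
      nested.readFields(in);
    }
  }

  // Minimal nested Writable so the sketch compiles on its own.
  public static class SomeNestedWritable implements Writable {
    private long value;

    @Override
    public void write(DataOutput out) throws IOException {
      out.writeLong(value);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
      value = in.readLong();
    }
  }
}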

core/src/main/java/org/apache/carbondata/core/indexstore/Blocklet.java  +1 -1

@@ -31,7 +31,7 @@
 public class Blocklet implements Writable, Serializable {

   /** file path of this blocklet */
-  private String filePath;
+  protected String filePath;

   /** id to identify the blocklet inside the block (it is a sequential number) */
   private String blockletId;
