
Commit e217942

Andrew Fogarty committed:
Merge branch 'master' of https://github.com/dotnet/spark into anfog/thread_local

2 parents 1529a27 + 7bcd2a5

File tree

2 files changed (+89 / -155 lines)

CONTRIBUTING.md

Lines changed: 89 additions & 7 deletions
@@ -1,12 +1,23 @@
-# Welcome!
+# Contributing to .NET for Apache Spark!
 
 If you are here, it means you are interested in helping us out. A hearty welcome and thank you! There are many ways you can contribute to the .NET for Apache Spark project:
 
-* Offer PR's to fix bugs or implement new features.
-* Give us feedback and bug reports regarding the software or the documentation.
-* Improve our examples, tutorials, and documentation.
+* Offer PRs to fix bugs or implement new features
+* Review currently [open PRs](https://github.com/dotnet/spark/pulls)
+* Give us feedback and bug reports regarding the software or the documentation
+* Improve our examples, tutorials, and documentation
 
-## Getting started:
+Please start by browsing the [issues](https://github.com/dotnet/spark/issues) and leave a comment to engage us if any of them interests you. And don't forget to take a look at the project [roadmap](https://github.com/dotnet/spark/blob/master/ROADMAP.md).
+
+Here are a few things to consider:
+
+* Before starting work on a major feature or bug fix, please open a GitHub issue describing the work you are proposing. We will make sure no one else is already working on it and that the work aligns with the project [roadmap](https://github.com/dotnet/spark/blob/master/ROADMAP.md).
+* A "major" feature or bug fix is defined as any change that is > 100 lines of code (not including tests) or that changes user-facing behavior (e.g., breaking API changes). Please read [Proposing Major Changes to .NET for Apache Spark](#proposing-major-changes-to-net-for-apache-spark) before you begin any major work.
+* Once you are ready, you can create a PR and the committers will help review it.
+
+**Coding Style**: Please review our [coding guidelines](https://github.com/rapoth/spark-2/blob/master/docs/contributing.md).
+
+## Getting started
 
 Please make sure to take a look at the project [roadmap](ROADMAP.md).
 
@@ -22,10 +33,81 @@ A .NET for Apache Spark team member will be assigned to your pull request once t
 
 All commits in a pull request will be squashed to a single commit with the original creator as author.
 
-# Contributing
+### Contributing
 
 See [Contributing](docs/contributing.md) for information about coding styles, source structure, making pull requests, and more.
 
-# Developers
+### Developers
 
 See the [Developer Guide](docs/developer-guide.md) for details about developing in this repo.
+
+## Proposing major changes to .NET for Apache Spark
+
+The development process in .NET for Apache Spark is design-driven. If you intend to make any significant change, please discuss it with the .NET for Apache Spark community first (and in some cases document it formally) before you open a PR.
+
+The rest of this document describes the process for proposing, documenting, and implementing changes to the .NET for Apache Spark project.
+
+To learn about the motivation behind .NET for Apache Spark, see the following talks:
+- [Introducing .NET Bindings for Apache Spark](https://databricks.com/session/introducing-net-bindings-for-apache-spark) from Spark+AI Summit 2019.
+- [.NET for Apache Spark](https://databricks.com/session_eu19/net-for-apache-spark) from Spark+AI Europe Summit 2019.
+
+### The Proposal Process
+
+The process outlined below is for reviewing a proposal and reaching a decision about whether to accept or decline it.
+
+1. The proposal author [creates a brief issue](https://github.com/dotnet/spark/issues/new?assignees=&labels=untriaged%2C+proposal&template=design-template.md&title=%5BPROPOSAL%5D%3A+) describing the proposal.
+2. A discussion on the issue will aim to triage the proposal into one of three outcomes:
+   - Accept proposal
+   - Decline proposal
+   - Ask for a detailed doc
+   If the proposal is accepted or declined, the process is done. Otherwise, the discussion is expected to identify concerns that should be addressed in a more detailed design.
+3. The proposal author follows up with a detailed description to work out the details of the proposed design and address the concerns raised in the initial discussion.
+4. Once comments and revisions on the design are complete, there is a final discussion on the issue to reach one of two outcomes:
+   - Accept proposal
+   - Decline proposal
+
+After the proposal is accepted or declined (e.g., after Step 2 or Step 4), implementation work proceeds in the same way as any other contribution.
+
+> **Tip:** If you are an experienced committer and are certain that a design description will be required for a particular proposal, you can skip Step 2.
+
+### Writing a Design Document
+
+As noted [above](#the-proposal-process), some (but not all) proposals need to be elaborated in a design description.
+
+- The design description should follow the template outlined below:
+```
+Proposal:
+
+Rationale:
+
+Compatibility:
+
+Design:
+
+Implementation:
+
+Impact on Performance (if applicable):
+
+Open issues (if applicable):
+```
+- Once you have the design description ready and have addressed any specific concerns raised during the initial discussion, please reply to the original issue.
+- Address any additional feedback/questions and update your design description as needed.
+
+### Proposal Review
+
+A group of .NET for Apache Spark team members will review your proposal and CC the relevant developers, raising important questions, pinging lapsed discussions, and generally trying to guide the discussion toward agreement about the outcome. The discussion itself is expected to happen on the issue, so that anyone can take part.
+
+### Consensus and Disagreement
+
+The goal of the proposal process is to reach general consensus about the outcome in a timely manner.
+
+If general consensus cannot be reached, the proposal review group decides the next step by reviewing and discussing the issue and reaching a consensus among themselves.
+
+## Becoming a .NET for Apache Spark Committer
+
+The .NET for Apache Spark team will add new committers from the active contributors, based on their contributions to the .NET for Apache Spark project. The qualifications for new committers are derived from the [Apache Spark Contributor Guide](https://spark.apache.org/contributing.html):
+
+- **Sustained contributions to .NET for Apache Spark**: Committers should have a history of major contributions to .NET for Apache Spark. An ideal committer will have contributed broadly throughout the project, and have contributed at least one major component where they have taken an “ownership” role. An ownership role means that existing contributors feel that they should run patches for this component by this person.
+- **Quality of contributions**: Committers, more than any other community member, should submit simple, well-tested, and well-designed patches. In addition, they should show sufficient expertise to be able to review patches, including making sure they fit within .NET for Apache Spark’s engineering practices (testability, documentation, API stability, code style, etc.). The committership is collectively responsible for the software quality and maintainability of .NET for Apache Spark.
+- **Community involvement**: Committers should have a constructive and friendly attitude in all community interactions. They should also be active on the dev and user list and help mentor newer contributors and users. In design discussions, committers should maintain a professional and diplomatic approach, even in the face of disagreement.
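
As an aside, here is a minimal sketch of what a filled-in design description might look like, following the template added above. The proposal and every detail below are hypothetical, invented purely for illustration:

```
Proposal: Add a configurable Arrow batch size for vector UDFs.

Rationale: Larger batches improve throughput but raise worker memory
pressure; a knob lets users pick the trade-off per job.

Compatibility: Purely additive; the default batch size is unchanged,
so existing pipelines behave exactly as before.

Design: Read a new SparkConf key in the .NET worker when it creates
the Arrow record-batch writer.

Implementation: A small change in Microsoft.Spark.Worker plus unit
tests; no changes required on the Scala side.

Impact on Performance (if applicable): Throughput/memory benchmarks
will be attached to the proposal issue.

Open issues (if applicable): Should the setting be per-session or
per-UDF?
```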

azure-pipelines.yml

Lines changed: 0 additions & 148 deletions
@@ -11,42 +11,14 @@ variables:
   _SignType: real
   _TeamName: DotNetSpark
   MSBUILDSINGLELOADCONTEXT: 1
-  # forwardCompatibleRelease/backwardCompatibleRelease is the "oldest" releases that work with the current release
-  forwardCompatibleRelease: '0.9.0'
-  backwardCompatibleRelease: '0.9.0'
-  forwardCompatibleTestsToFilterOut: "(FullyQualifiedName!=Microsoft.Spark.E2ETest.UdfTests.UdfComplexTypesTests.TestUdfWithArrayType)"
-  backwardCompatibleTestsToFilterOut: "(FullyQualifiedName!=Microsoft.Spark.E2ETest.IpcTests.DataFrameTests.TestDataFrameGroupedMapUdf)&\
-    (FullyQualifiedName!=Microsoft.Spark.E2ETest.IpcTests.DataFrameTests.TestDataFrameVectorUdf)&\
-    (FullyQualifiedName!=Microsoft.Spark.E2ETest.IpcTests.BroadcastTests.TestDestroy)&\
-    (FullyQualifiedName!=Microsoft.Spark.E2ETest.IpcTests.BroadcastTests.TestMultipleBroadcast)&\
-    (FullyQualifiedName!=Microsoft.Spark.E2ETest.IpcTests.BroadcastTests.TestUnpersist)&\
-    (FullyQualifiedName!=Microsoft.Spark.E2ETest.UdfTests.UdfComplexTypesTests.TestUdfWithArrayType)&\
-    (FullyQualifiedName!=Microsoft.Spark.E2ETest.UdfTests.UdfComplexTypesTests.TestUdfWithArrayOfArrayType)&\
-    (FullyQualifiedName!=Microsoft.Spark.E2ETest.UdfTests.UdfComplexTypesTests.TestUdfWithRowArrayType)&\
-    (FullyQualifiedName!=Microsoft.Spark.E2ETest.UdfTests.UdfComplexTypesTests.TestUdfWithSimpleArrayType)&\
-    (FullyQualifiedName!=Microsoft.Spark.E2ETest.UdfTests.UdfComplexTypesTests.TestUdfWithMapType)&\
-    (FullyQualifiedName!=Microsoft.Spark.E2ETest.UdfTests.UdfComplexTypesTests.TestUdfWithMapOfMapType)&\
-    (FullyQualifiedName!=Microsoft.Spark.E2ETest.UdfTests.UdfComplexTypesTests.TestUdfWithReturnAsMapType)&\
-    (FullyQualifiedName!=Microsoft.Spark.E2ETest.UdfTests.UdfSimpleTypesTests.TestUdfWithReturnAsTimestampType)&\
-    (FullyQualifiedName!=Microsoft.Spark.E2ETest.UdfTests.UdfSimpleTypesTests.TestUdfWithTimestampType)&\
-    (FullyQualifiedName!=Microsoft.Spark.E2ETest.IpcTests.SparkSessionTests.TestCreateDataFrameWithTimestamp)"
   ArtifactPath: '$(Build.ArtifactStagingDirectory)\Microsoft.Spark.Binaries'
   CurrentDotnetWorkerDir: '$(ArtifactPath)\Microsoft.Spark.Worker\netcoreapp3.1\win-x64'
-  BackwardCompatibleDotnetWorkerDir: $(Build.BinariesDirectory)\Microsoft.Spark.Worker-$(backwardCompatibleRelease)
 
   # Azure DevOps variables are transformed into environment variables, with these variables we
   # avoid the first time experience and telemetry to speed up the build.
   DOTNET_CLI_TELEMETRY_OPTOUT: 1
   DOTNET_SKIP_FIRST_TIME_EXPERIENCE: 1
 
-resources:
-  repositories:
-  - repository: forwardCompatibleRelease
-    type: github
-    endpoint: dotnet
-    name: dotnet/spark
-    ref: refs/tags/v$(forwardCompatibleRelease)
-
 stages:
 - stage: Build
   displayName: Build Sources
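
Worth noting before the next hunk: the deleted `forwardCompatibleTestsToFilterOut`/`backwardCompatibleTestsToFilterOut` variables show how a long `dotnet test --filter` expression can be kept in a single YAML variable, chaining vstest `FullyQualifiedName!=` clauses with `&` (logical AND) and continuing the double-quoted string across lines with a trailing `\`. A minimal sketch of the pattern, with hypothetical variable and test names not taken from this repo:

```yaml
# Sketch only: exclude known-bad tests via a single filter variable.
variables:
  # '&' is logical AND in a vstest --filter expression; the trailing '\'
  # continues this double-quoted YAML scalar onto the next line.
  testsToFilterOut: "(FullyQualifiedName!=MyProject.Tests.SlowTest)&\
    (FullyQualifiedName!=MyProject.Tests.FlakyTest)"

steps:
- script: dotnet test --filter "$(testsToFilterOut)"
  displayName: Run all tests except the excluded ones
```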
@@ -210,123 +182,3 @@ stages:
         - '2.4.7'
         - '3.0.0'
         - '3.0.1'
-
-- stage: ForwardCompatibility
-  displayName: E2E Forward Compatibility Tests
-  dependsOn: Build
-  jobs:
-  - job: Run
-    pool: Hosted VS2017
-
-    variables:
-      ${{ if and(ne(variables['System.TeamProject'], 'public'), notin(variables['Build.Reason'], 'PullRequest')) }}:
-        _OfficialBuildIdArgs: /p:OfficialBuildId=$(BUILD.BUILDNUMBER)
-      HADOOP_HOME: $(Build.BinariesDirectory)\hadoop
-      DOTNET_WORKER_DIR: $(CurrentDotnetWorkerDir)
-
-    steps:
-    - checkout: forwardCompatibleRelease
-      path: s\$(forwardCompatibleRelease)
-
-    - task: Maven@3
-      displayName: 'Maven build src for forward compatible release v$(forwardCompatibleRelease)'
-      inputs:
-        mavenPomFile: src/scala/pom.xml
-
-    - task: DownloadBuildArtifacts@0
-      displayName: Download Build Artifacts
-      inputs:
-        artifactName: Microsoft.Spark.Binaries
-        downloadPath: $(Build.ArtifactStagingDirectory)
-
-    - task: BatchScript@1
-      displayName: Download Spark Distros & Winutils.exe
-      inputs:
-        filename: script\download-spark-distros.cmd
-        arguments: $(Build.BinariesDirectory)
-
-    - template: azure-pipelines-e2e-tests-template.yml
-      parameters:
-        versions:
-        - '2.3.0'
-        - '2.3.1'
-        - '2.3.2'
-        - '2.3.3'
-        - '2.3.4'
-        - '2.4.0'
-        - '2.4.1'
-        - '2.4.3'
-        - '2.4.4'
-        - '2.4.5'
-        testOptions: '--filter $(forwardCompatibleTestsToFilterOut)'
-
-# Forward compatibility is tested only up to Spark 2.4.5 since it is the lastest Spark version
-# tested for "forwardCompatibleRelease". This can be updated when "forwardCompatibleRelease" is updated.
-
-- stage: BackwardCompatibility
-  displayName: E2E Backward Compatibility Tests
-  dependsOn: Build
-  jobs:
-  - job: Run
-    pool: Hosted VS2017
-
-    variables:
-      ${{ if and(ne(variables['System.TeamProject'], 'public'), notin(variables['Build.Reason'], 'PullRequest')) }}:
-        _OfficialBuildIdArgs: /p:OfficialBuildId=$(BUILD.BUILDNUMBER)
-      HADOOP_HOME: $(Build.BinariesDirectory)\hadoop
-      DOTNET_WORKER_DIR: $(BackwardCompatibleDotnetWorkerDir)
-
-    steps:
-    - task: DownloadBuildArtifacts@0
-      displayName: Download Build Artifacts
-      inputs:
-        artifactName: Microsoft.Spark.Binaries
-        downloadPath: $(Build.ArtifactStagingDirectory)
-
-    - task: CopyFiles@2
-      displayName: Copy jars
-      inputs:
-        sourceFolder: $(ArtifactPath)/Jars
-        contents: '**/*.jar'
-        targetFolder: $(Build.SourcesDirectory)/src/scala
-
-    - task: BatchScript@1
-      displayName: Download Spark Distros & Winutils.exe
-      inputs:
-        filename: script\download-spark-distros.cmd
-        arguments: $(Build.BinariesDirectory)
-
-    - task: BatchScript@1
-      displayName: Download backward compatible worker v$(backwardCompatibleRelease)
-      inputs:
-        filename: script\download-worker-release.cmd
-        arguments: '$(Build.BinariesDirectory) $(backwardCompatibleRelease)'
-
-    - template: azure-pipelines-e2e-tests-template.yml
-      parameters:
-        versions:
-        - '2.3.0'
-        - '2.3.1'
-        - '2.3.2'
-        - '2.3.3'
-        - '2.3.4'
-        - '2.4.0'
-        - '2.4.1'
-        - '2.4.3'
-        - '2.4.4'
-        - '2.4.5'
-        - '2.4.6'
-        - '2.4.7'
-        testOptions: '--filter $(backwardCompatibleTestsToFilterOut)'
-
-    # Spark 3.0.* uses Arrow 0.15.1, which contains a new Arrow spec. This breaks backward
-    # compatibility when using Microsoft.Spark.Worker with incompatible versions of Arrow.
-    # Skip Arrow tests until the backward compatibility Worker version is updated.
-    - template: azure-pipelines-e2e-tests-template.yml
-      parameters:
-        versions:
-        - '3.0.0'
-        - '3.0.1'
-        testOptions: "--filter $(backwardCompatibleTestsToFilterOut)&\
-          (FullyQualifiedName!=Microsoft.Spark.E2ETest.IpcTests.DataFrameTests.TestGroupedMapUdf)&\
-          (FullyQualifiedName!=Microsoft.Spark.E2ETest.IpcTests.DataFrameTests.TestVectorUdf)"
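
The two deleted stages shared a pattern that may be useful elsewhere: pin a GitHub repository resource to a release tag, check it out inside a job, and instantiate a shared step template once per Spark version. A minimal sketch of that shape, assuming hypothetical resource, tag, template, and stage names:

```yaml
# Sketch only: test the current build against a pinned earlier release.
resources:
  repositories:
  - repository: pinnedRelease          # alias referenced by 'checkout:'
    type: github
    endpoint: dotnet                   # a GitHub service connection
    name: dotnet/spark
    ref: refs/tags/v1.0.0              # hypothetical release tag

stages:
- stage: CompatibilityTests
  jobs:
  - job: Run
    steps:
    - checkout: pinnedRelease          # fetch the pinned release sources
    - template: e2e-tests-template.yml # hypothetical shared step template
      parameters:
        versions:                      # the template runs the suite per version
        - '2.4.7'
        - '3.0.1'
        testOptions: '--filter $(testsToFilterOut)'
```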
