Commit b1c7bc5

initial import
1 parent 7b8cda8 commit b1c7bc5

11 files changed

+1199
-1
lines changed

CODE_OF_CONDUCT.md

+105
@@ -0,0 +1,105 @@
# Salesforce Open Source Community Code of Conduct

## About the Code of Conduct

Equality is a core value at Salesforce. We believe a diverse and inclusive
community fosters innovation and creativity, and are committed to building a
culture where everyone feels included.

Salesforce open-source projects are committed to providing a friendly, safe, and
welcoming environment for all, regardless of gender identity and expression,
sexual orientation, disability, physical appearance, body size, ethnicity, nationality,
race, age, religion, level of experience, education, socioeconomic status, or
other similar personal characteristics.

The goal of this code of conduct is to specify a baseline standard of behavior so
that people with different social values and communication styles can work
together effectively, productively, and respectfully in our open source community.
It also establishes a mechanism for reporting issues and resolving conflicts.

All questions and reports of abusive, harassing, or otherwise unacceptable behavior
in a Salesforce open-source project may be reported by contacting the Salesforce
Open Source Conduct Committee at [email protected].

## Our Pledge

In the interest of fostering an open and welcoming environment, we as
contributors and maintainers pledge to making participation in our project and
our community a harassment-free experience for everyone, regardless of gender
identity and expression, sexual orientation, disability, physical appearance,
body size, ethnicity, nationality, race, age, religion, level of experience, education,
socioeconomic status, or other similar personal characteristics.

## Our Standards

Examples of behavior that contributes to creating a positive environment
include:

* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy toward other community members

Examples of unacceptable behavior by participants include:

* The use of sexualized language or imagery and unwelcome sexual attention or
advances
* Personal attacks, insulting/derogatory comments, or trolling
* Public or private harassment
* Publishing, or threatening to publish, others' private information—such as
a physical or electronic address—without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
* Advocating for or encouraging any of the above behaviors

## Our Responsibilities

Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.

Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned with this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.

## Scope

This Code of Conduct applies both within project spaces and in public spaces
when an individual is representing the project or its community. Examples of
representing a project or community include using an official project email
address, posting via an official social media account, or acting as an appointed
representative at an online or offline event. Representation of a project may be
further defined and clarified by project maintainers.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the Salesforce Open Source Conduct Committee
at [email protected]. All complaints will be reviewed and investigated
and will result in a response that is deemed necessary and appropriate to the
circumstances. The committee is obligated to maintain confidentiality with
regard to the reporter of an incident. Further details of specific enforcement
policies may be posted separately.

Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership and the Salesforce Open Source Conduct
Committee.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][contributor-covenant-home],
version 1.4, available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html.
It includes adaptions and additions from [Go Community Code of Conduct][golang-coc],
[CNCF Code of Conduct][cncf-coc], and [Microsoft Open Source Code of Conduct][microsoft-coc].

This Code of Conduct is licensed under the [Creative Commons Attribution 3.0 License][cc-by-3-us].

[contributor-covenant-home]: https://www.contributor-covenant.org (https://www.contributor-covenant.org/)
[golang-coc]: https://golang.org/conduct
[cncf-coc]: https://github.com/cncf/foundation/blob/master/code-of-conduct.md
[microsoft-coc]: https://opensource.microsoft.com/codeofconduct/
[cc-by-3-us]: https://creativecommons.org/licenses/by/3.0/us/

LICENSE.txt

+14
@@ -0,0 +1,14 @@
BSD 3-Clause License

Copyright (c) 2022, Salesforce.com, Inc.
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

3. Neither the name of Salesforce.com nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

README.md

+51-1
@@ -1,2 +1,52 @@
<p align="center">
<img src="img/codegen_logo_3.png" width="25%" height="25%">
</p>

# CodeGen
This repo includes an official code release for the **CodeGen** models, as presented in the paper [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474).


## Setup
```
git clone https://github.com/salesforce/CodeGen
cd CodeGen

mkdir checkpoints
cd checkpoints
wget https://storage.googleapis.com/sfr-codegen-research/checkpoints/codegen-350M-mono.tar.gz && tar -xzf codegen-350M-mono.tar.gz
# return to the repository root before installing dependencies
cd ..

python3.8 -m venv .venv
source .venv/bin/activate
pip3 install --upgrade pip setuptools
pip3 install -r requirements.txt
python3 -m jaxformer.hf.sample --model codegen-350M-mono --context "def hello_world():"
```


## Released Models
We release models of various sizes trained on various datasets. The models are named in the following format:
```
codegen-{model-size}-{data}
```

`model-size` has 4 options: `350M`, `2B`, `6B`, `16B`.

`data` has 3 options: `nl`, `multi`, `mono`. `nl` models are randomly initialized and trained on [the Pile](https://github.com/EleutherAI/the-pile), an 825.18 GB English text corpus. `multi` models are initialized from `nl` models and then trained on a corpus with code data in multiple programming languages. `mono` models are initialized from `multi` models and then trained on a corpus of Python code.

The model names can be provided to the `--model` flag for `sample.py`. See the sample usage above in Setup.
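
The naming scheme above can be enumerated mechanically. The following is a minimal sketch, not part of the repository: the helper names `model_names` and `is_valid_model_name` are hypothetical, and the code only spells out the size/data grid described in this README. It does not check that a checkpoint is actually published for every combination.

```python
from itertools import product

# Options as listed in this README; helper names are illustrative only.
MODEL_SIZES = ("350M", "2B", "6B", "16B")
DATA_VARIANTS = ("nl", "multi", "mono")


def model_names():
    """Return every name of the form codegen-{model-size}-{data}."""
    return [
        f"codegen-{size}-{data}"
        for size, data in product(MODEL_SIZES, DATA_VARIANTS)
    ]


def is_valid_model_name(name):
    """Check a candidate --model value against the naming scheme."""
    return name in model_names()


if __name__ == "__main__":
    for name in model_names():
        print(name)
```

A name that passes `is_valid_model_name`, such as `codegen-350M-mono`, has the shape expected by the `--model` flag used in Setup.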


## Citation
If you find our code or paper useful, please cite the paper:
```
@article{Nijkamp2022ACP,
  title={A Conversational Paradigm for Program Synthesis},
  author={Erik Nijkamp and Bo Pang and Hiroaki Hayashi and Lifu Tu and Huan Wang and Yingbo Zhou and Silvio Savarese and Caiming Xiong},
  journal={arXiv preprint},
  year={2022}
}
```


## License
Our code is BSD-3 licensed. See LICENSE.txt for details.

SECURITY.md

+7
@@ -0,0 +1,7 @@
## Security

Please report any security issue to [[email protected]](mailto:[email protected])
as soon as it is discovered. This library limits its runtime dependencies in
order to reduce the total cost of ownership as much as can be, but all consumers
should remain vigilant and have their security stakeholders review all third-party
products (3PP) like this one and their dependencies.

img/codegen_logo_1.png

105 KB

img/codegen_logo_2.jpg

411 KB

img/codegen_logo_3.png

118 KB
@@ -0,0 +1,87 @@
# coding=utf-8
# Copyright 2021 The EleutherAI and HuggingFace Teams. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Modified configuration implementation based on https://github.com/huggingface/transformers/blob/main/src/transformers/models/gptj/configuration_gptj.py

from transformers.configuration_utils import PretrainedConfig
from transformers.utils import logging

logger = logging.get_logger(__name__)


class CodeGenConfig(PretrainedConfig):
    model_type = "codegen"

    def __init__(
        self,
        vocab_size=50400,
        n_positions=2048,
        n_ctx=2048,
        n_embd=4096,
        n_layer=28,
        n_head=16,
        rotary_dim=64,
        n_inner=None,
        activation_function="gelu_new",
        resid_pdrop=0.0,
        embd_pdrop=0.0,
        attn_pdrop=0.0,
        layer_norm_epsilon=1e-5,
        initializer_range=0.02,
        scale_attn_weights=True,
        gradient_checkpointing=False,
        use_cache=True,
        bos_token_id=50256,
        eos_token_id=50256,
        **kwargs
    ):
        super().__init__(bos_token_id=bos_token_id, eos_token_id=eos_token_id, **kwargs)

        self.vocab_size = vocab_size
        self.n_ctx = n_ctx
        self.n_positions = n_positions
        self.n_embd = n_embd
        self.n_layer = n_layer
        self.n_head = n_head
        self.n_inner = n_inner
        self.rotary_dim = rotary_dim
        self.activation_function = activation_function
        self.resid_pdrop = resid_pdrop
        self.embd_pdrop = embd_pdrop
        self.attn_pdrop = attn_pdrop
        self.layer_norm_epsilon = layer_norm_epsilon
        self.initializer_range = initializer_range
        self.gradient_checkpointing = gradient_checkpointing
        self.scale_attn_weights = scale_attn_weights
        self.use_cache = use_cache

        self.bos_token_id = bos_token_id
        self.eos_token_id = eos_token_id

    @property
    def max_position_embeddings(self):
        return self.n_positions

    @property
    def hidden_size(self):
        return self.n_embd

    @property
    def num_attention_heads(self):
        return self.n_head

    @property
    def num_hidden_layers(self):
        return self.n_layer
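
The read-only properties at the end of `CodeGenConfig` expose the GPT-2-style fields (`n_positions`, `n_embd`, `n_head`, `n_layer`) under longer alias names. The sketch below illustrates that aliasing pattern with a dependency-free stand-in so it runs without `transformers`. `MiniConfig` and `head_dim` are hypothetical names that do not appear in this commit, only a handful of fields are mirrored, and the even split of `n_embd` across heads is an assumption about the model code, which is not shown here.

```python
class MiniConfig:
    """Hypothetical stand-in mirroring a few CodeGenConfig fields and aliases."""

    def __init__(self, n_positions=2048, n_embd=4096, n_layer=28, n_head=16):
        self.n_positions = n_positions
        self.n_embd = n_embd
        self.n_layer = n_layer
        self.n_head = n_head

    @property
    def max_position_embeddings(self):
        return self.n_positions

    @property
    def hidden_size(self):
        return self.n_embd

    @property
    def num_attention_heads(self):
        return self.n_head

    @property
    def num_hidden_layers(self):
        return self.n_layer

    @property
    def head_dim(self):
        # Assumption: the hidden size is split evenly across attention heads.
        return self.n_embd // self.n_head


if __name__ == "__main__":
    config = MiniConfig()
    # Each alias is computed from the underlying field on every access,
    # so changing n_embd is reflected in hidden_size immediately.
    config.n_embd = 1024
    print(config.hidden_size, config.num_attention_heads, config.head_dim)
```

Because the aliases are properties rather than stored copies, there is a single source of truth for each value and the two spellings cannot drift apart.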
