Commit ef4c5eb

Author: twitter-team
Twitter Recommendation Algorithm
Please note we have force-pushed a new initial commit in order to remove some publicly-available Twitter user information. Note that this process may be required in the future.
0 parents  commit ef4c5eb

5,364 files changed: +460239 -0 lines changed


.gitignore

+2
@@ -0,0 +1,2 @@
.DS_Store

COPYING

+661
Large diffs are not rendered by default.

README.md

+39
@@ -0,0 +1,39 @@
# Twitter Recommendation Algorithm

The Twitter Recommendation Algorithm is a set of services and jobs that are responsible for constructing and serving the
Home Timeline. For an introduction to how the algorithm works, please refer to our [engineering blog](https://blog.twitter.com/engineering/en_us/topics/open-source/2023/twitter-recommendation-algorithm). The
diagram below illustrates how major services and jobs interconnect.

![](docs/system-diagram.png)

These are the main components of the Recommendation Algorithm included in this repository:

| Type | Component | Description |
|------------|------------|------------|
| Feature | [simclusters-ann](simclusters-ann/README.md) | Community detection and sparse embeddings into those communities. |
| | [TwHIN](https://github.com/twitter/the-algorithm-ml/blob/main/projects/twhin/README.md) | Dense knowledge graph embeddings for Users and Tweets. |
| | [trust-and-safety-models](trust_and_safety_models/README.md) | Models for detecting NSFW or abusive content. |
| | [real-graph](src/scala/com/twitter/interaction_graph/README.md) | Model to predict likelihood of a Twitter User interacting with another User. |
| | [tweepcred](src/scala/com/twitter/graph/batch/job/tweepcred/README) | Page-Rank algorithm for calculating Twitter User reputation. |
| | [recos-injector](recos-injector/README.md) | Streaming event processor for building input streams for [GraphJet](https://github.com/twitter/GraphJet) based services. |
| | [graph-feature-service](graph-feature-service/README.md) | Serves graph features for a directed pair of Users (e.g. how many of User A's following liked Tweets from User B). |
| Candidate Source | [search-index](src/java/com/twitter/search/README.md) | Find and rank In-Network Tweets. ~50% of Tweets come from this candidate source. |
| | [cr-mixer](cr-mixer/README.md) | Coordination layer for fetching Out-of-Network tweet candidates from underlying compute services. |
| | [user-tweet-entity-graph](src/scala/com/twitter/recos/user_tweet_entity_graph/README.md) (UTEG)| Maintains an in memory User to Tweet interaction graph, and finds candidates based on traversals of this graph. This is built on the [GraphJet](https://github.com/twitter/GraphJet) framework. Several other GraphJet based features and candidate sources are located [here](src/scala/com/twitter/recos) |
| | [follow-recommendation-service](follow-recommendations-service/README.md) (FRS)| Provides Users with recommendations for accounts to follow, and Tweets from those accounts. |
| Ranking | [light-ranker](src/python/twitter/deepbird/projects/timelines/scripts/models/earlybird/README.md) | Light ranker model used by search index (Earlybird) to rank Tweets. |
| | [heavy-ranker](https://github.com/twitter/the-algorithm-ml/blob/main/projects/home/recap/README.md) | Neural network for ranking candidate tweets. One of the main signals used to select timeline Tweets post candidate sourcing. |
| Tweet mixing & filtering | [home-mixer](home-mixer/README.md) | Main service used to construct and serve the Home Timeline. Built on [product-mixer](product-mixer/README.md) |
| | [visibility-filters](visibilitylib/README.md) | Responsible for filtering Twitter content to support legal compliance, improve product quality, increase user trust, protect revenue through the use of hard-filtering, visible product treatments, and coarse-grained downranking. |
| | [timelineranker](timelineranker/README.md) | Legacy service which provides relevance-scored tweets from the Earlybird Search Index and UTEG service. |
| Software framework | [navi](navi/navi/README.md) | High performance, machine learning model serving written in Rust. |
| | [product-mixer](product-mixer/README.md) | Software framework for building feeds of content. |
| | [twml](twml/README.md) | Legacy machine learning framework built on TensorFlow v1. |

We include Bazel BUILD files for most components, but not a top level BUILD or WORKSPACE file.

## Contributing

We invite the community to submit GitHub issues and pull requests for suggestions on improving the recommendation algorithm. We are working on tools to manage these suggestions and sync changes to our internal repository. Any security concerns or issues should be routed to our official [bug bounty program](https://hackerone.com/twitter) through HackerOne. We hope to benefit from the collective intelligence and expertise of the global community in helping us identify issues and suggest improvements, ultimately leading to a better Twitter.

Read our blog on the open source initiative [here](https://blog.twitter.com/en_us/topics/company/2023/a-new-era-of-transparency-for-twitter).
@@ -0,0 +1,15 @@
target(
    name = "faiss",
    dependencies = [
        "ann/src/main/java/com/twitter/ann/faiss/swig:swig-artifactory",
    ],
)

java_library(
    name = "swig-native-utils",
    sources = ["*.java"],
    compiler_option_sets = ["fatal_warnings"],
    platform = "java8",
    tags = ["bazel-compatible"],
    dependencies = [],
)
@@ -0,0 +1,151 @@
package com.twitter.ann.faiss;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
import java.util.Locale;

public final class NativeUtils {

  private static final int MIN_PREFIX_LENGTH = 3;
  public static final String NATIVE_FOLDER_PATH_PREFIX = "nativeutils";

  public static File temporaryDir;

  private NativeUtils() {
  }

  private static File unpackLibraryFromJarInternal(String path) throws IOException {
    if (null == path || !path.startsWith("/")) {
      throw new IllegalArgumentException("The path has to be absolute (start with '/').");
    }

    String[] parts = path.split("/");
    String filename = (parts.length > 1) ? parts[parts.length - 1] : null;

    if (filename == null || filename.length() < MIN_PREFIX_LENGTH) {
      throw new IllegalArgumentException("The filename has to be at least 3 characters long.");
    }

    if (temporaryDir == null) {
      temporaryDir = createTempDirectory(NATIVE_FOLDER_PATH_PREFIX);
      temporaryDir.deleteOnExit();
    }

    File temp = new File(temporaryDir, filename);

    try (InputStream is = NativeUtils.class.getResourceAsStream(path)) {
      Files.copy(is, temp.toPath(), StandardCopyOption.REPLACE_EXISTING);
    } catch (IOException e) {
      temp.delete();
      throw e;
    } catch (NullPointerException e) {
      temp.delete();
      throw new FileNotFoundException("File " + path + " was not found inside JAR.");
    }

    return temp;
  }

  /**
   * Unpack library from JAR into temporary path
   *
   * @param path The path of file inside JAR as absolute path (beginning with
   *             '/'), e.g. /package/File.ext
   * @throws IOException              If temporary file creation or read/write
   *                                  operation fails
   * @throws IllegalArgumentException If source file (param path) does not exist
   * @throws IllegalArgumentException If the path is not absolute or if the
   *                                  filename is shorter than three characters
   *                                  (restriction of
   *                                  {@link File#createTempFile(java.lang.String, java.lang.String)}).
   * @throws FileNotFoundException    If the file could not be found inside the
   *                                  JAR.
   */
  public static void unpackLibraryFromJar(String path) throws IOException {
    unpackLibraryFromJarInternal(path);
  }

  /**
   * Loads library from current JAR archive
   * <p>
   * The file from JAR is copied into system temporary directory and then loaded.
   * The temporary file is deleted after
   * exiting.
   * Method uses String as filename because the pathname is "abstract", not
   * system-dependent.
   *
   * @param path The path of file inside JAR as absolute path (beginning with
   *             '/'), e.g. /package/File.ext
   * @throws IOException              If temporary file creation or read/write
   *                                  operation fails
   * @throws IllegalArgumentException If source file (param path) does not exist
   * @throws IllegalArgumentException If the path is not absolute or if the
   *                                  filename is shorter than three characters
   *                                  (restriction of
   *                                  {@link File#createTempFile(java.lang.String, java.lang.String)}).
   * @throws FileNotFoundException    If the file could not be found inside the
   *                                  JAR.
   */
  public static void loadLibraryFromJar(String path) throws IOException {
    File temp = unpackLibraryFromJarInternal(path);

    try (InputStream is = NativeUtils.class.getResourceAsStream(path)) {
      Files.copy(is, temp.toPath(), StandardCopyOption.REPLACE_EXISTING);
    } catch (IOException e) {
      temp.delete();
      throw e;
    } catch (NullPointerException e) {
      temp.delete();
      throw new FileNotFoundException("File " + path + " was not found inside JAR.");
    }

    try {
      System.load(temp.getAbsolutePath());
    } finally {
      temp.deleteOnExit();
    }
  }

  private static File createTempDirectory(String prefix) throws IOException {
    String tempDir = System.getProperty("java.io.tmpdir");
    File generatedDir = new File(tempDir, prefix + System.nanoTime());

    if (!generatedDir.mkdir()) {
      throw new IOException("Failed to create temp directory " + generatedDir.getName());
    }

    return generatedDir;
  }

  public enum OSType {
    Windows, MacOS, Linux, Other
  }

  protected static OSType detectedOS;

  /**
   * detect the operating system from the os.name System property and cache
   * the result
   *
   * @returns - the operating system detected
   */
  public static OSType getOperatingSystemType() {
    if (detectedOS == null) {
      String osname = System.getProperty("os.name", "generic").toLowerCase(Locale.ENGLISH);
      if ((osname.contains("mac")) || (osname.contains("darwin"))) {
        detectedOS = OSType.MacOS;
      } else if (osname.contains("win")) {
        detectedOS = OSType.Windows;
      } else if (osname.contains("nux")) {
        detectedOS = OSType.Linux;
      } else {
        detectedOS = OSType.Other;
      }
    }
    return detectedOS;
  }
}
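
The class above validates a resource path, copies the resource into a temporary directory, and classifies the operating system from `os.name`. The following is a minimal standalone sketch of those steps, not code from the commit. The class `NativeLoadingSketch`, its method names, and the resource path `/com/example/libdemo.so` are hypothetical, and the sketch takes an `InputStream` parameter instead of reading from the JAR so that it runs without a packaged native library. It also uses `Files.createTempDirectory` rather than the `System.nanoTime()`-based directory name used above, and it copies the stream once, whereas `loadLibraryFromJar` above copies the resource again after the internal helper has already done so.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Locale;

// Hypothetical illustration only; names here are assumptions, not part of the commit.
final class NativeLoadingSketch {

  enum OsType { WINDOWS, MACOS, LINUX, OTHER }

  private NativeLoadingSketch() {
  }

  // Same checks as the class above: absolute resource path, filename of at least 3 characters.
  static String fileNameOf(String resourcePath) {
    if (resourcePath == null || !resourcePath.startsWith("/")) {
      throw new IllegalArgumentException("The path has to be absolute (start with '/').");
    }
    String[] parts = resourcePath.split("/");
    String fileName = (parts.length > 1) ? parts[parts.length - 1] : null;
    if (fileName == null || fileName.length() < 3) {
      throw new IllegalArgumentException("The filename has to be at least 3 characters long.");
    }
    return fileName;
  }

  // Copies the stream into a fresh temporary directory and returns the new file's path.
  static Path unpack(InputStream in, String resourcePath) throws IOException {
    String fileName = fileNameOf(resourcePath);
    Path dir = Files.createTempDirectory("nativeutils");
    dir.toFile().deleteOnExit();
    Path target = dir.resolve(fileName);
    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    // Registered after the directory: deleteOnExit runs in reverse registration
    // order, so the file is removed first and the directory is empty by then.
    target.toFile().deleteOnExit();
    return target;
  }

  // "darwin" contains "win", so the macOS check has to run before the Windows check.
  static OsType classify(String osName) {
    String name = osName.toLowerCase(Locale.ENGLISH);
    if (name.contains("mac") || name.contains("darwin")) {
      return OsType.MACOS;
    } else if (name.contains("win")) {
      return OsType.WINDOWS;
    } else if (name.contains("nux")) {
      return OsType.LINUX;
    }
    return OsType.OTHER;
  }

  public static void main(String[] args) throws IOException {
    byte[] fakeLibrary = "not a real shared object".getBytes(StandardCharsets.UTF_8);
    Path unpacked = unpack(new ByteArrayInputStream(fakeLibrary), "/com/example/libdemo.so");
    System.out.println("Unpacked to " + unpacked);
    System.out.println("Detected OS: " + classify(System.getProperty("os.name", "generic")));
  }
}
```

In real use the stream would come from `SomeClass.class.getResourceAsStream(path)` and the returned path would be passed to `System.load`, as `loadLibraryFromJar` does above.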
@@ -0,0 +1,98 @@
/* ----------------------------------------------------------------------------
 * This file was automatically generated by SWIG (http://www.swig.org).
 * Version 4.0.2
 *
 * Do not make changes to this file unless you know what you are doing--modify
 * the SWIG interface file instead.
 * ----------------------------------------------------------------------------- */

package com.twitter.ann.faiss;

public class AlignedTableFloat32 {
  private transient long swigCPtr;
  protected transient boolean swigCMemOwn;

  protected AlignedTableFloat32(long cPtr, boolean cMemoryOwn) {
    swigCMemOwn = cMemoryOwn;
    swigCPtr = cPtr;
  }

  protected static long getCPtr(AlignedTableFloat32 obj) {
    return (obj == null) ? 0 : obj.swigCPtr;
  }

  @SuppressWarnings("deprecation")
  protected void finalize() {
    delete();
  }

  public synchronized void delete() {
    if (swigCPtr != 0) {
      if (swigCMemOwn) {
        swigCMemOwn = false;
        swigfaissJNI.delete_AlignedTableFloat32(swigCPtr);
      }
      swigCPtr = 0;
    }
  }

  public void setTab(SWIGTYPE_p_faiss__AlignedTableTightAllocT_float_32_t value) {
    swigfaissJNI.AlignedTableFloat32_tab_set(swigCPtr, this, SWIGTYPE_p_faiss__AlignedTableTightAllocT_float_32_t.getCPtr(value));
  }

  public SWIGTYPE_p_faiss__AlignedTableTightAllocT_float_32_t getTab() {
    long cPtr = swigfaissJNI.AlignedTableFloat32_tab_get(swigCPtr, this);
    return (cPtr == 0) ? null : new SWIGTYPE_p_faiss__AlignedTableTightAllocT_float_32_t(cPtr, false);
  }

  public void setNumel(long value) {
    swigfaissJNI.AlignedTableFloat32_numel_set(swigCPtr, this, value);
  }

  public long getNumel() {
    return swigfaissJNI.AlignedTableFloat32_numel_get(swigCPtr, this);
  }

  public static long round_capacity(long n) {
    return swigfaissJNI.AlignedTableFloat32_round_capacity(n);
  }

  public AlignedTableFloat32() {
    this(swigfaissJNI.new_AlignedTableFloat32__SWIG_0(), true);
  }

  public AlignedTableFloat32(long n) {
    this(swigfaissJNI.new_AlignedTableFloat32__SWIG_1(n), true);
  }

  public long itemsize() {
    return swigfaissJNI.AlignedTableFloat32_itemsize(swigCPtr, this);
  }

  public void resize(long n) {
    swigfaissJNI.AlignedTableFloat32_resize(swigCPtr, this, n);
  }

  public void clear() {
    swigfaissJNI.AlignedTableFloat32_clear(swigCPtr, this);
  }

  public long size() {
    return swigfaissJNI.AlignedTableFloat32_size(swigCPtr, this);
  }

  public long nbytes() {
    return swigfaissJNI.AlignedTableFloat32_nbytes(swigCPtr, this);
  }

  public SWIGTYPE_p_float get() {
    long cPtr = swigfaissJNI.AlignedTableFloat32_get__SWIG_0(swigCPtr, this);
    return (cPtr == 0) ? null : new SWIGTYPE_p_float(cPtr, false);
  }

  public SWIGTYPE_p_float data() {
    long cPtr = swigfaissJNI.AlignedTableFloat32_data__SWIG_0(swigCPtr, this);
    return (cPtr == 0) ? null : new SWIGTYPE_p_float(cPtr, false);
  }

}
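
The generated proxy above pairs a native pointer (`swigCPtr`) with an ownership flag (`swigCMemOwn`): `delete()` frees the native object only when the proxy owns it, and zeroing the pointer makes repeated calls harmless. The sketch below imitates that pattern in plain Java so it can run without the native library. Everything in it is hypothetical: `FakeNative` stands in for the JNI layer and merely tracks which handles are live, and `OwnedHandle` stands in for a proxy class. It also implements `AutoCloseable` so the handle can be released with try-with-resources, which is a substitution made here for illustration; the generated class above relies on `finalize()` instead.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of the pointer-plus-ownership-flag pattern; not part of the commit.
final class SwigOwnershipSketch {

  // Stand-in for the JNI layer: hands out handles and records which are still live.
  static final class FakeNative {
    private static long nextHandle = 1;
    private static final Set<Long> LIVE = new HashSet<>();

    static synchronized long allocate() {
      long handle = nextHandle++;
      LIVE.add(handle);
      return handle;
    }

    static synchronized void free(long handle) {
      if (!LIVE.remove(handle)) {
        throw new IllegalStateException("double free of handle " + handle);
      }
    }

    static synchronized int liveCount() {
      return LIVE.size();
    }
  }

  // Stand-in for a generated proxy class.
  static final class OwnedHandle implements AutoCloseable {
    private long cPtr;
    private boolean ownsMemory;

    private OwnedHandle(long cPtr, boolean ownsMemory) {
      this.cPtr = cPtr;
      this.ownsMemory = ownsMemory;
    }

    // Owning proxy, like the public constructors above that pass `true`.
    static OwnedHandle allocate() {
      return new OwnedHandle(FakeNative.allocate(), true);
    }

    // Non-owning view of the same handle, like the wrappers above created with `false`.
    synchronized OwnedHandle borrow() {
      return new OwnedHandle(cPtr, false);
    }

    // Frees the native object at most once, and only if this proxy owns it.
    synchronized void delete() {
      if (cPtr != 0) {
        if (ownsMemory) {
          ownsMemory = false;
          FakeNative.free(cPtr);
        }
        cPtr = 0;
      }
    }

    synchronized boolean isReleased() {
      return cPtr == 0;
    }

    @Override
    public void close() {
      delete();
    }
  }

  public static void main(String[] args) {
    try (OwnedHandle table = OwnedHandle.allocate()) {
      OwnedHandle view = table.borrow();
      view.delete(); // non-owning: detaches the view without freeing anything
      System.out.println("live handles inside try: " + FakeNative.liveCount());
    }
    System.out.println("live handles after try: " + FakeNative.liveCount());
  }
}
```

The check on the pointer before the check on the flag is what makes a second `delete()` a no-op, and the non-owning wrapper is why accessors such as `getTab()` or `data()` above can hand out views without risking a double free.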
