Skip to content

[Hubspot] Add de-duplication logic in Hubspot #2924

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

harsh-joshi99
Copy link
Contributor

This PR adds a function to de-duplicate properties in hubspot's custom object v2 action.
Hubspot does not allow properties with same ID.

JIRA -> https://twilio-engineering.atlassian.net/browse/STRATCONN-5746?atlOrigin=eyJpIjoiMzU3ZWIxYzkzODU2NDIyMjg0MWFkYzM4OTc2MjEwZjkiLCJwIjoiaiJ9

Testing

https://docs.google.com/document/d/17cett1m8KfzEvLxxBfzbhiFBgeRwDhzMZYnTzuRZG30/edit?usp=sharing

Include any additional information about the testing you have completed to
ensure your changes behave as expected. For a speedy review, please check
any of the tasks you completed below during your testing.

  • Added unit tests for new functionality
  • Tested end-to-end using the local server
  • [If destination is already live] Tested for backward compatibility of destination. Note: New required fields are a breaking change.
  • [Segmenters] Tested in the staging environment
  • [Segmenters] [If applicable for this change] Tested for regression with Hadron.

Copy link

codecov bot commented May 20, 2025

Codecov Report

Attention: Patch coverage is 28.57143% with 25 lines in your changes missing coverage. Please review.

Project coverage is 78.05%. Comparing base (90998c5) to head (b8d9b76).
Report is 18 commits behind head on main.

Files with missing lines Patch % Lines
...pot/upsertObject/functions/validation-functions.ts 21.87% 22 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2924      +/-   ##
==========================================
- Coverage   78.11%   78.05%   -0.07%     
==========================================
  Files        1050     1064      +14     
  Lines       19438    19629     +191     
  Branches     3734     3770      +36     
==========================================
+ Hits        15184    15321     +137     
- Misses       2972     3021      +49     
- Partials     1282     1287       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@joe-ayoub-segment
Copy link
Contributor

Hi @harsh-joshi99 please let me know when this PR is ready to review. I think I should be involved with this PR as I contributed a lot to the Actions being updated.

@joe-ayoub-segment
Copy link
Contributor

Hi @harsh-joshi99 can we schedule a call please?
I'm concerned about this change and would like to discuss it with you.

type: 'string',
required: false,
default: { '@path': '$.timestamp' },
dynamic: true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @harsh-joshi99 why is this set to be a dynamic field?

@@ -52,6 +52,10 @@ const send = async (
subscriptionMetadata?: SubscriptionMetadata,
statsContext?: StatsContext
) => {
if (syncMode === 'upsert') {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't we also have the same issue if the customer is using syncMode === 'update' ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also @harsh-joshi99 @varadarajan-tw can we put this code behind a feature flag to control rollout?

const id = incoming.object_details?.id_field_value
if (!id) continue

const incomingTimestamp = incoming.timestamp ? new Date(incoming.timestamp).getTime() : 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the customer misconfigures the timestamp field or sends something that isn't a timestamp, then you're falling back to creating a timestamp. However the timestamp you create might correctly order the event.

It might be better to grab the timestamp directly from the raw payload data instead. i.e not via a field. Note that there will always be a timestamp in the raw payload data.

Have a look at how to access the rawData object here.

At least this way you will always be guaranteed to order the events in the order they arrived at the Segment ingestion point.

Copy link
Contributor

@joe-ayoub-segment joe-ayoub-segment Jun 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what I'm recommending is:
const incomingTimestamp = isTimestamp(incoming.timestamp) ? incoming.timestamp : getTimestampFromRawPayload()

}

const existing = mergedMap.get(id)!
const existingTimestamp = existing.timestamp ? new Date(existing.timestamp).getTime() : 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we should ensure that there's always a timestamp in the payload that gets added to the mergeMap (see my comment above). This means that if there is an existing payload, you can assume it has a timestamp.

// Merge properties
const mergedProps: Record<string, unknown> = { ...existing.properties }
for (const [key, value] of Object.entries(incoming.properties || {})) {
if (!(key in mergedProps) || incomingTimestamp >= existingTimestamp) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this timestamp comparison work if the values are strings (e.g. an ISO8601 string)?

incomingTimestamp >= existingTimestamp

Does it not need to be converted into a format where it can be compared first?

const existingAssociations = existing.associations || []
const incomingAssociations = incoming.associations || []

const associationKey = (assoc: any) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does the linter not complain about the user of the 'any' type here?

@joe-ayoub-segment
Copy link
Contributor

hi @harsh-joshi99 and @varadarajan-tw I reviewed the PR and left some comments.

As well as the comments I think it would be a good idea to add some unit tests to test the new mergeAndDeduplicateById function. And I recommend we put this code behind a feature flag - but that's up to you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants