Common reasons code insights may not match search results
There are a few reasons why the most recent datapoint of a chart data series may show a different match count than the same search query run manually in Sourcegraph.
If the chart data point shows higher counts than a manual search
[For versions pre-3.40] Not including fork:no and archived:no in your insight query
Because code insights historical search defaults to fork:yes and archived:yes, but a Sourcegraph search via the web interface or CLI does not, your insight data series may include results from repositories that a Sourcegraph search excludes. Try running the same search again manually with the fork:yes and archived:yes filters.
NOTE: Versions 3.40 and later default to fork:no and archived:no, the same way the search UI does.
Manual search will not include unindexed repositories
All repositories in a historical search are unindexed, but a manual Sourcegraph search only includes indexed repositories. It's possible your manual searches are missing results from unindexed repositories.
To investigate this, compare the list of repositories in the manual search (use a select:repository filter) with the list of repositories in the insight series_points database table. To see why a repository may not be indexing, refer to this guide.
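The comparison described above can be sketched in Python. This is a minimal illustration, assuming you have already exported both repository lists yourself (for example, by copying the select:repository results and by querying the series_points table); the function name and the input format (one repository name per list entry) are hypothetical.

```python
def compare_repo_lists(manual_repos, insight_repos):
    """Return the repository names that appear in only one of the two lists.

    Both arguments are iterables of repository names (assumed format,
    e.g. "github.com/org/repo"). Blank entries are ignored.
    """
    manual = {name.strip() for name in manual_repos if name.strip()}
    insight = {name.strip() for name in insight_repos if name.strip()}
    return {
        # Repositories the insight searched but the manual search did not return.
        "only_in_insight": sorted(insight - manual),
        # Repositories the manual search returned but the insight did not record.
        "only_in_manual": sorted(manual - insight),
    }
```

Repositories listed under only_in_insight are candidates for the unindexed repositories that the manual search may be skipping.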
If the chart data point shows lower counts than a manual search
New matches created since the insight datapoint ran
Currently, a data series' most recent datapoint defaults to the end of the prior month. It's possible that in the time between when your insight ran and when you ran a manual search, new matches have been added to your codebase. To confirm this, you can run type:diff or type:commit searches using the after: filter, but note that those filters only support up to 10,000 repositories, so you may first need to limit your search repository set.
NOTE: Future releases of Code Insights may include an always-up-to-date present-time point.
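As a rough sketch, the cutoff date and the follow-up query could be derived as shown below. The helper names are hypothetical, and the ISO date format passed to after: is an assumption; check the timestamp shown on your chart and adjust the date string to whatever your Sourcegraph version accepts.

```python
from datetime import date, timedelta


def end_of_prior_month(today):
    """Return the last day of the month before `today`."""
    # Stepping back one day from the first of this month lands on the
    # last day of the previous month, whatever its length.
    return today.replace(day=1) - timedelta(days=1)


def build_diff_query(base_query, today):
    """Append type:diff and an after: filter to an existing search query."""
    cutoff = end_of_prior_month(today)
    # Assumption: after: accepts a quoted ISO date such as "2021-12-31".
    return f'{base_query} type:diff after:"{cutoff.isoformat()}"'
```

Running the resulting query manually should list the diffs that added or removed matches after the insight's most recent datapoint.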
Repository timeouts caused a datapoint to miss results
If your code insight is very large, it is possible that a few (<1% in 100+ manual tests over 26,000 repositories) repositories failed to return match counts due to timing out while searching. To check this, you can run the following GraphQL query in the Sourcegraph GraphQL API:
query debug {
  insightViews(id: "INSIGHT_ID") {
    nodes {
      dataSeries {
        label
        status {
          pendingJobs
          completedJobs
          failedJobs
        }
      }
    }
  }
  insightViewDebug(id: "INSIGHT_ID") {
    raw
  }
}
Here, INSIGHT_ID can be found in the URL of the insight's "edit" page (selectable from the three-dot dropdown on the insight), after ...edit/. It will look like https://yourdomain.sourcegraph.com/insights/edit/INSIGHT_ID?dashboardId=all. The INSIGHT_ID can also be found in the URL of the single insight view, which you reach by clicking the title of the insight, for example https://sourcegraph.yourdomain.com/insights/insight/{INSIGHT_ID}.
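If you need to pull the ID out of many URLs, a small helper along these lines may be enough. It is only a sketch based on the two URL shapes described above; the function name is hypothetical, and other URL layouts are not handled.

```python
from urllib.parse import unquote, urlparse


def extract_insight_id(url):
    """Return the insight ID from an edit-page or single-insight URL, or None."""
    # The path excludes the query string, so "?dashboardId=all" is dropped here.
    parts = [segment for segment in urlparse(url).path.split("/") if segment]
    # Expected shapes: /insights/edit/<ID> and /insights/insight/<ID>.
    if len(parts) >= 3 and parts[0] == "insights" and parts[1] in ("edit", "insight"):
        return unquote(parts[2])
    return None
```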
If there are failedJobs, there may be timeouts or similar issues affecting your insight.
insightViewDebug was added in 4.2 to give you more raw information on your insight.
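To check the response for failures programmatically, something like the following sketch could help. It assumes the response has been parsed into a Python dict with the standard GraphQL "data" wrapper and the same field names as the query above; the function name is hypothetical.

```python
def series_with_failed_jobs(response):
    """Map each data series label to its failedJobs count, keeping only counts above zero."""
    # "data" can be missing or null when the request itself failed.
    data = response.get("data") or {}
    views = (data.get("insightViews") or {}).get("nodes") or []
    failed = {}
    for view in views:
        for series in view.get("dataSeries") or []:
            count = (series.get("status") or {}).get("failedJobs") or 0
            if count > 0:
                failed[series.get("label", "")] = count
    return failed
```

An empty result suggests that no jobs failed for the insight; any entries point to the series worth investigating further with the raw insightViewDebug output.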